                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                 AOS 19/20 PROJECT PRESENTATION

                        Leonardo Tamiano
                ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


Table of Contents
─────────────────

1. Introduction
2. Assumptions and Notes
3. Project Structure
.. 1. Installation
.. 2. Usage
4. Implementation Details
.. 1. Data Structures
..... 1. Message
..... 2. Queue
..... 3. Session State
..... 4. Object State
.. 2. Global Variables
.. 3. External Parameters
.. 4. Main Ideas
..... 1. Queue FIFO
..... 2. Referencing custom I/O session data
..... 3. IOCTL interface
..... 4. Writes
..... 5. Reads
..... 6. IOCTL commands
..... 7. Flushing the Device
..... 8. Cleanup of session_state


1 Introduction
══════════════

  This file describes the project for the Advanced Operating Systems
  course taught by Francesco Quaglia during the academic year 2019-2020
  at the University of Tor Vergata, as part of the Computer Engineering
  master's degree.

  All the content of this file, as well as all the content within this
  folder, was written by Leonardo Tamiano, a student currently enrolled
  at Tor Vergata in a master's degree in Computer Science.


2 Assumptions and Notes
═══════════════════════

  Before a more detailed discussion of the project structure, here are a
  couple of assumptions and notes that I think are worth mentioning:

  • The flushing of the device is done on a per-instance basis. This
    means that we can choose to flush only a specific instance of the
    device file at a time (identified by a minor number). To flush all
    the instances we would have to repeat the flushing for each
    instance, but the code does not guarantee the atomicity of such an
    operation.
  • Also regarding the flushing, in the current implementation I have
    commented out the dev_flush() operation when defining the
    operations for the driver

    ┌────
    │ static struct file_operations fops = {
    │     .owner = THIS_MODULE,
    │     .write = dev_write,
    │     .read = dev_read,
    │     .open = dev_open,
    │     .release = dev_release,
    │     .unlocked_ioctl = dev_ioctl,
    │     // .flush = dev_flush,
    │ };
    └────

    This choice is a consequence of a test in which I issued a simple
    write() syscall and discovered that it triggers the dev_flush()
    function. A likely explanation is that the kernel calls the .flush
    operation each time a file descriptor is closed, which also happens
    when the test program exits after the write(). To make the device
    testable I have thus commented it out.

    A possible "fix" would be to add a command to the IOCTL interface
    that deals with the flushing of the device instance. This would
    give more control to the final user, as it would allow the user of
    the device instance to specify exactly when the instance should be
    flushed.

    Since I believe these considerations to be a bit "out of scope" for
    the project at hand, I have decided to simply mention them here and
    ignore them afterwards.


3 Project Structure
═══════════════════

  The project is structured as follows

  • ./src/main.c, main portion of the device driver code.

  • ./src/include/, contains all the header files, which are
    • ./src/include/utils.h
    • ./src/include/chardev.h
    • ./src/include/message.h
    • ./src/include/queue.h

  • ./src/lib/, contains all the remaining source files.
    • ./src/lib/message.c
    • ./src/lib/queue.c

  • ./src/user/, contains the user portion of the code used to test the
    device driver.


3.1 Installation
────────────────

  To install the device you first have to compile it.
  This can be done with the make utility

  ┌────
  │ cd ./src
  │ make
  └────

  You can then insert the module into the kernel, and later remove it,
  with

  ┌────
  │ sudo insmod timed-messaging-driver.ko
  │ sudo rmmod timed-messaging-driver
  └────

  Finally, to use the driver you have to create a device file for it,
  which you can do with the following command

  ┌────
  │ mknod my-dev c <major> <minor>
  └────

  where <major> and <minor> are defined in the file
  ./src/include/chardev.h (read 'Implementation Details -> Main Ideas ->
  IOCTL interface' for more information).

  Since the driver can manage more than a single device-file instance,
  you can create multiple instances with the same major number but with
  different minor numbers. The chosen minor number, however, has to be
  less than the constant DEVICE_MINORS defined in
  ./src/include/chardev.h.

  ┌────
  │ mknod dev0 c 239 0
  │ mknod dev1 c 239 1
  └────


3.2 Usage
─────────

  Once you have inserted the module and created the device files that
  use the driver, you have to compile the user space code present in
  ./src/user/ if you also want to test it.

  ┌────
  │ cd ./src/user
  │ make
  └────

  In particular you can write to and read from the device using the
  user space code provided in ./src/user. For example, suppose we have
  created the device files directly in the ./src/user folder. Then we
  can do the following

  ┌────
  │ # write a message on the device file
  │ ./write dev0 hello-0
  │
  │ # read from the device file the message written
  │ ./read dev0 # prints hello-0
  │
  │ # write a message on another instance of the device file
  │ ./write dev1 hello-1
  └────


4 Implementation Details
════════════════════════

4.1 Data Structures
───────────────────

  Throughout the project the following main data structures have been
  used


4.1.1 Message
╌╌╌╌╌╌╌╌╌╌╌╌╌

  The struct message is defined in ./src/include/message.h.

  This struct is used to represent a message. Further information could
  be added, depending on future needs for the driver.
  ┌────
  │ typedef struct {
  │     int length;
  │     char *data;
  │ } message;
  └────

  Currently there are only two functions to interact with this struct,
  both of them present in ./src/lib/message.c

  ┌────
  │ message *create_message(const char *buff, size_t len, int user);
  │ void destroy_message(message *m);
  └────


4.1.2 Queue
╌╌╌╌╌╌╌╌╌╌╌

  The struct queue is defined in ./src/include/queue.h.

  This struct is used to represent a queue.

  ┌────
  │ typedef struct queue {
  │     int total_size;
  │     int len;
  │     message *m;
  │     struct list_head list;
  │ } queue;
  └────

  Currently the following functions are implemented and can be found in
  ./src/lib/queue.c

  ┌────
  │ queue *init_queue(void);
  │ void enqueue(queue *q, message *m);
  │ message *dequeue(queue *q);
  │ void flush_queue(queue *q);
  └────


4.1.3 Session State
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The struct session_state is defined in main.c.

  This struct contains custom data required to handle a single I/O
  session.

  ┌────
  │ typedef struct session_state {
  │     unsigned long send_timeout; // in nanoseconds
  │     unsigned long recv_timeout; // in nanoseconds
  │
  │     atomic_t valid;
  │
  │     int minor;
  │
  │     struct mutex writes_synchronizer;
  │     atomic_t mtimer_size; // num of msg yet to be posted
  │     struct mtimer {
  │         // message to be posted
  │         message *m;
  │
  │         // stored here so that we don't have to iterate
  │         // the list every time in order to get the
  │         // session_state pointer.
  │         struct session_state *state;
  │
  │         int valid;
  │
  │         struct hrtimer hr_timer;
  │         struct list_head list;
  │     } m_timer;
  │
  │ } session_state;
  └────

  Of particular interest are the fields send_timeout and recv_timeout,
  which the driver checks to understand whether or not the ioctl()
  interface was used to change the behavior of the dev_write() and
  dev_read() functions for the current session.

  Another thing to notice is the struct mtimer, which is used to
  reference the data of a message that was "written" by an I/O session
  with a send_timeout > 0.
  A single I/O session contains a list of such structs in order to
  enable the device driver to access all the pending writes for the
  current session. This is done for example in the implementation of
  the IOCTL_REVOKE_DELAYED_MESSAGES command.


4.1.4 Object State
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The struct _object_state is defined in main.c.

  This struct contains all the data regarding a single device file
  instance.

  ┌────
  │ typedef struct _object_state {
  │     // FIFO queue
  │     struct mutex queue_synchronizer;
  │     queue *q;
  │
  │     // list of active I/O sessions
  │     struct mutex session_list_synchronizer;
  │     struct active_session {
  │         session_state *session;
  │         struct list_head list;
  │     } active_session;
  │
  │     // waitqueue in which to place read I/O sessions in timeout
  │     wait_queue_head_t wq;
  │
  │     // list of read I/O sessions in timeout. Used to set the
  │     // correct value for next_reader_id.
  │     struct mutex readers_synchronizer;
  │     struct reader {
  │         int valid;
  │         int id;
  │         struct list_head list;
  │     } reader;
  │
  │     // next I/O session ready to read in case anyone writes a
  │     // message to the queue.
  │     unsigned int next_reader_id;
  │
  │     // indicates when the device is being flushed
  │     int flushing;
  │
  │ } object_state;
  └────

  As we can see, for each instance the driver can access two lists: one
  containing the session_state of every I/O session currently active on
  that instance, and another containing one struct reader for every
  session that is currently waiting in the waitqueue wq.

  The choice to have two separate lists enables the driver to wake up
  only the first session that went to sleep on a read() with
  recv_timeout > 0, rather than the first one that opened the device
  among the currently active sessions.

  This is done by the function wake_readers(), defined in main.c, which
  internally sets the next_reader_id field. Further details are
  discussed in the Main Ideas -> Reads subsection.
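
  The selection rule behind the list of waiting readers can be
  illustrated outside the kernel. The following is only a minimal
  user-space sketch, not the driver's code: struct reader_list,
  add_reader() and select_next_reader() are hypothetical names, and a
  plain singly linked list stands in for the kernel's struct list_head.
  It shows that appending at the tail and picking the first valid entry
  selects readers in arrival order, each of them at most once.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the driver's struct reader (no list_head). */
struct reader {
	int valid;            /* 1 while the reader may still be selected */
	unsigned int id;      /* identifier compared against next_reader_id */
	struct reader *next;  /* next reader in arrival order */
};

/* Hypothetical, reduced stand-in for the reader part of object_state. */
struct reader_list {
	struct reader *head;
	struct reader *tail;
	unsigned int next_reader_id;
};

/* Append at the tail so that the list keeps arrival (FIFO) order. */
static void add_reader(struct reader_list *l, struct reader *r)
{
	assert(l != NULL && r != NULL);

	r->valid = 1;
	r->next = NULL;
	if (l->tail)
		l->tail->next = r;
	else
		l->head = r;
	l->tail = r;
}

/*
 * Select the oldest reader that is still valid, record its id and
 * invalidate it so that it cannot be selected twice.
 * Returns 1 if a reader was selected, 0 otherwise.
 */
static int select_next_reader(struct reader_list *l)
{
	struct reader *r;

	for (r = l->head; r != NULL; r = r->next) {
		if (r->valid) {
			l->next_reader_id = r->id;
			r->valid = 0;
			return 1;
		}
	}
	return 0;
}
```

  In the driver the equivalent steps run under readers_synchronizer and
  are followed by a wake_up() on the waitqueue; the sketch leaves both
  out to keep the ordering logic visible.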


4.2 Global Variables
────────────────────

  To manage the driver the following global variables were used

  ┌────
  │ // ... main.c
  │
  │ int MAX_MESSAGE_SIZE = 4096;
  │ int MAX_STORAGE_SIZE = 4096 * 20;
  │ int size_so_far = 0;
  │
  │ object_state objects[DEVICE_MINORS];
  └────

  where DEVICE_MINORS is a constant defined in ./src/include/chardev.h.


4.3 External Parameters
───────────────────────

  The variables MAX_MESSAGE_SIZE and MAX_STORAGE_SIZE are also exposed
  as parameters of the module with the following code

  ┌────
  │ // ... main.c
  │
  │ module_param(MAX_MESSAGE_SIZE, int, 0660);
  │ module_param(MAX_STORAGE_SIZE, int, 0660);
  └────

  This allows the user to read and modify the value of these variables
  through the /sys VFS

  ┌────
  │ # read current value of MAX_MESSAGE_SIZE
  │ cat /sys/module/timed_messaging_driver/parameters/MAX_MESSAGE_SIZE
  │
  │ # modify current value of MAX_MESSAGE_SIZE
  │ echo "24" > /sys/module/timed_messaging_driver/parameters/MAX_MESSAGE_SIZE
  └────


4.4 Main Ideas
──────────────

  I will now discuss the main ideas behind the implementation of the
  device driver in order to meet the specified behavior. During the
  discussion pieces of code will be shown. These pieces are not meant
  to be complete, but the file containing the complete code is always
  referenced for a more thorough analysis.


4.4.1 Queue FIFO
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  To make the queue follow a FIFO (First-In-First-Out) behavior, the
  list_add_tail() API offered by the kernel was used. The specific
  piece of code is found in ./src/lib/queue.c

  ┌────
  │ void enqueue(queue *q, message *m)
  │ {
  │     queue *new_q;
  │
  │     new_q = kmalloc(sizeof(queue), GFP_KERNEL);
  │     // ...
  │
  │     new_q->m = m;
  │
  │     q->len++;
  │     q->total_size += m->length;
  │
  │     // new entry is added at the end of the list
  │     list_add_tail(&(new_q->list), &(q->list));
  │ }
  └────


4.4.2 Referencing custom I/O session data
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  In order to quickly access the struct session_state for the current
  I/O session, the private_data pointer of filp was used. In particular
  it is set in dev_open(), during the initialization of the session.

  ┌────
  │ static int dev_open(struct inode *inode, struct file *filp)
  │ {
  │     // ...
  │
  │     // initialize session data
  │     // ...
  │
  │     // access session state quickly from filp
  │     filp->private_data = (void *)state;
  │
  │     // add session to list of active sessions
  │     // ...
  │ }
  └────

  With this, as long as we have a reference to a filp pointer, for
  example when we are in dev_write(), we can quickly access the
  session_state with filp->private_data and the object_state with
  objects[get_minor(filp)], where get_minor() is a macro defined in
  ./src/include/utils.h that extracts the minor number of the device
  file from the filp pointer.


4.4.3 IOCTL interface
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The IOCTL interface uses the macros offered by the kernel to define
  the following commands, which are present in ./src/include/chardev.h

  ┌────
  │ // commands to be exposed to the ioctl interface
  │ #define IOCTL_SET_SEND_TIMEOUT _IOW(DEVICE_MAJOR, 0, unsigned long)
  │ #define IOCTL_SET_RECV_TIMEOUT _IOW(DEVICE_MAJOR, 1, unsigned long)
  │ #define IOCTL_REVOKE_DELAYED_MESSAGES _IO(DEVICE_MAJOR, 2)
  └────

  It is worth noting that this file is also shared with the user code
  that wants to use this device. Also, since these macros embed
  DEVICE_MAJOR at compile time and have to be included in the module,
  the major number assigned to the device cannot be obtained
  dynamically. This is why ./src/include/chardev.h also defines a
  static value for it.
  ┌────
  │ #define DEVICE_MAJOR 239
  │ #define DEVICE_MINORS 2
  └────

  For more information look here:

  •


4.4.4 Writes
╌╌╌╌╌╌╌╌╌╌╌╌

  Regardless of whether the write has a timeout or not, the first thing
  dev_write() does is check the storage currently in use against the
  length of the message to be posted, returning an error if there is
  not enough space left

  ┌────
  │ // dev_write()...
  │
  │ if (atomic_read((atomic_t *)&size_so_far) + len > MAX_STORAGE_SIZE) {
  │     AUDIT
  │     printk("%s - [ERROR] dev_write() for 0x%p not enough space left on device\n",
  │            MODNAME, filp);
  │
  │     // no space on device
  │     return -ENOSPC;
  │ }
  │
  │ atomic_set((atomic_t *)&size_so_far, size_so_far + len);
  └────

  After that the message is created using the message API implemented
  in ./src/lib/message.c. Notice the third parameter: it is set to 1,
  indicating that buff points to a user-space buffer, and thus that
  copy_from_user() has to be used.

  ┌────
  │ // dev_write()...
  │
  │ // ...
  │ m = create_message(buff, len, 1);
  │ // ...
  └────

  Then, to decide whether the write has to be delayed, we check whether
  the session_state struct for the current session has send_timeout >
  0. If it does not, we simply add the message to the queue using the
  enqueue() API implemented in ./src/lib/queue.c. Having written a new
  message, we wake up any waiting readers (read the next section for
  more information on the readers' side)

  ┌────
  │ // dev_write()...
  │
  │ mutex_lock(&(the_object->queue_synchronizer));
  │ enqueue(q, m);
  │ mutex_unlock(&(the_object->queue_synchronizer));
  │
  │ // wake-up readers
  │ wake_readers(the_object);
  └────

  If instead send_timeout > 0, then we use the high-resolution timers
  offered by the kernel API to start a timer for the given timeout. In
  particular the writer allocates memory for the mtimer struct, adds it
  to the list associated with the current I/O session and starts the
  hr-timer.

  ┌────
  │ // dev_write()...
  │
  │ ktime_interval = ktime_set(0, state->send_timeout);
  │
  │ hrtimer_init(&(m_timer->hr_timer), CLOCK_MONOTONIC, HRTIMER_MODE_REL);
  │ m_timer->hr_timer.function = &hrtimer_callback;
  │
  │ // add timer to list of timers
  │ mutex_lock(&(state->writes_synchronizer));
  │ list_add_tail(&(m_timer->list), &(state->m_timer.list));
  │ atomic_inc(&state->mtimer_size);
  │ mutex_unlock(&(state->writes_synchronizer));
  │
  │ // start timer
  │ hrtimer_start(&(m_timer->hr_timer), ktime_interval, HRTIMER_MODE_REL);
  └────

  Once the hr-timer expires, hrtimer_callback(), defined in main.c, is
  called to finish the write and actually post the message to the
  queue. This function checks whether, in the meantime,
  IOCTL_REVOKE_DELAYED_MESSAGES was issued to invalidate the message;
  if it was not, it posts the message to the queue.

  ┌────
  │ // hrtimer_callback()...
  │
  │ // if message still valid, post message in the queue
  │ if (m_timer->valid) {
  │     mutex_lock(&(the_object->queue_synchronizer));
  │     enqueue(the_object->q, m_timer->m);
  │     // if there are readers waiting, wake them up
  │     wake_readers(the_object);
  │     mutex_unlock(&(the_object->queue_synchronizer));
  │ } else {
  │     AUDIT
  │     printk("%s: hrtimer_callback() for state 0x%p with hrtimer 0x%p cancelled\n",
  │            MODNAME, state, timer);
  │
  │     // otherwise update device's size and de-allocate mem
  │     atomic_set((atomic_t *)&size_so_far,
  │                size_so_far - m_timer->m->length);
  │
  │     destroy_message(m_timer->m);
  │ }
  └────


4.4.5 Reads
╌╌╌╌╌╌╌╌╌╌╌

  A read with recv_timeout == 0 simply checks whether there are any
  messages in the queue. If there are no messages it returns
  immediately, otherwise it takes the first one available using the
  dequeue() API implemented in ./src/lib/queue.c.

  If instead recv_timeout > 0, then a struct reader is created and
  added to the list of waiting readers _object_state->reader.list
  associated with the device instance being used by the session.
  Upon the creation of the reader, a random int is generated using the
  kernel API get_random_int(). This id is later used to identify which
  reader to wake up.

  ┌────
  │ // dev_read()...
  │
  │ reader = kmalloc(sizeof(struct reader), GFP_KERNEL);
  │ reader->valid = 1;
  │ reader->id = get_random_int();
  │
  │ // add session to list of read I/O session on timeout
  │ mutex_lock(&(the_object->readers_synchronizer));
  │ list_add_tail(&(reader->list), &(the_object->reader.list));
  │ mutex_unlock(&(the_object->readers_synchronizer));
  └────

  After that the reader goes to sleep using the wait queue API offered
  by the kernel

  ┌────
  │ // dev_read()...
  │
  │ ktime_interval = ktime_set(0, state->recv_timeout);
  │ // wait until we're next reader, device is being
  │ // flushed, or timeout is over.
  │ wait_event_hrtimeout(the_object->wq,
  │                      the_object->next_reader_id == reader->id || the_object->flushing == 1,
  │                      ktime_interval);
  └────

  Notice the main event we are waiting on:

  ┌────
  │ the_object->next_reader_id == reader->id || the_object->flushing == 1,
  └────

  This event is true if the device is being flushed (read further), or
  if the read is the first among all the reads that are currently
  waiting. To implement this idea, the next_reader_id value is updated
  by the function wake_readers(), defined in main.c. This function
  simply picks the first valid element from the list
  _object_state->reader.list and sets next_reader_id accordingly

  ┌────
  │ static void wake_readers(object_state *the_object)
  │ {
  │     struct reader *reader;
  │     struct list_head *head, *pos;
  │
  │     head = &the_object->reader.list;
  │
  │     mutex_lock(&(the_object->readers_synchronizer));
  │     list_for_each (pos, head) {
  │         reader = list_entry(pos, struct reader, list);
  │
  │         if (reader->valid) {
  │             the_object->next_reader_id = reader->id;
  │             // invalidates reader so next time it is not
  │             // taken even if it was not deleted in time.
  │             reader->valid = 0;
  │             wake_up(&(the_object->wq));
  │             break;
  │         }
  │     }
  │     mutex_unlock(&(the_object->readers_synchronizer));
  │ }
  └────

  After the reader is awoken it first checks whether it was awoken
  because of a flushing of the device, in which case it simply returns.
  Otherwise it checks whether there are any messages left, and if there
  are, it picks the first one to be delivered and returns it to the
  user.


4.4.6 IOCTL commands
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The IOCTL_SET_SEND_TIMEOUT and IOCTL_SET_RECV_TIMEOUT commands simply
  set the corresponding field of the session_state struct to the
  supplied timeout value.

  ┌────
  │ // dev_ioctl()...
  │
  │ switch (ioctl_num) {
  │
  │ case IOCTL_SET_SEND_TIMEOUT:
  │     // do not directly store messages upon write() but wait a timeout
  │     state->send_timeout = param;
  │
  │     AUDIT
  │     printk("%s: dev_ioctl() for 0x%p with IOCTL_SET_SEND_TIMEOUT of %lu\n",
  │            MODNAME, filp, param);
  │
  │     break;
  │
  │ case IOCTL_SET_RECV_TIMEOUT:
  │     // do not directly return if read on empty queue but wait a timeout
  │     state->recv_timeout = param;
  │
  │     AUDIT
  │     printk("%s: dev_ioctl() for 0x%p with IOCTL_SET_RECV_TIMEOUT of %lu\n",
  │            MODNAME, filp, param);
  │
  │     break;
  │
  │ // ...
  └────

  The IOCTL_REVOKE_DELAYED_MESSAGES implementation instead iterates
  over the list of mtimer structs found in session_state->m_timer.list
  and invalidates each message by setting its valid field to 0. This is
  done so that when hrtimer_callback() is executed, it knows that the
  message is no longer valid and thus does not post it to the queue.

  ┌────
  │ // dev_ioctl()...
  │
  │ // ...
  │ case IOCTL_REVOKE_DELAYED_MESSAGES:
  │     head = &(state->m_timer.list);
  │
  │     // iterate list and invalidate all messages
  │     mutex_lock(&(state->writes_synchronizer));
  │     list_for_each (pos, head) {
  │         m_timer = list_entry(pos, struct mtimer, list);
  │         m_timer->valid = 0;
  │     }
  │     mutex_unlock(&(state->writes_synchronizer));
  │
  │     break;
  │
  │ }
  └────

  The IOCTL_REVOKE_DELAYED_MESSAGES command can also be issued from the
  dev_flush() function during the flushing of a device instance. For
  more information, read the next subsection.


4.4.7 Flushing the Device
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The flushing of the device takes place in two different situations:

  1. By calling dev_flush() with a valid filp. This happens for example
     when we set .flush = dev_flush in the struct file_operations of
     the driver.

  2. During cleanup_module(), in order to release the resources used.
     In this case dev_flush() is called with filp == NULL.

  Worth noting is the fact that the way to get a reference to the
  object_state pointer changes depending on how we were called. In
  particular, if dev_flush() is called from cleanup_module(), then the
  minor number of the device is passed as the second argument, and thus
  a type conversion is needed.

  ┌────
  │ // dev_flush()...
  │
  │ // check if dev_flush() is being called by cleanup_module()
  │ the_object = filp ? &objects[get_minor(filp)] : &objects[(int)id];
  └────

  Other than that, we have to make sure that dev_flush() is only
  executed by a single thread at any given time. To guarantee this the
  following barrier was used

  ┌────
  │ // dev_flush()...
  │
  │ ret = __sync_bool_compare_and_swap(&(the_object->flushing), 0, 1);
  │ if (!ret) {
  │     // dev_flush() already executing
  │     return 0;
  │ }
  │
  │ // starts actual execution
  └────

  The actual execution of dev_flush() is pretty straightforward:

  1. All resources get locked
     ┌────
     │ // dev_flush()...
     │
     │ mutex_lock(&the_object->session_list_synchronizer);
     │ mutex_lock(&the_object->readers_synchronizer);
     │ mutex_lock(&the_object->queue_synchronizer);
     └────

  2. All delayed messages for all active I/O sessions are revoked by
     iterating over the list _object_state->active_session.list and
     invoking dev_ioctl() with the IOCTL_REVOKE_DELAYED_MESSAGES
     command.
     ┌────
     │ // dev_flush()...
     │
     │ head = &the_object->active_session.list;
     │ list_for_each (pos, head) {
     │     session = list_entry(pos, struct active_session, list);
     │
     │     // NOTE: instead of using filp we pass directly the session as the
     │     // third argument
     │     dev_ioctl(NULL, IOCTL_REVOKE_DELAYED_MESSAGES,
     │               (unsigned long)session->session);
     │ }
     └────

  3. The queue gets flushed using the flush_queue() API implemented in
     ./src/lib/queue.c.
     ┌────
     │ // dev_flush()...
     │
     │ total_size = the_object->q->total_size;
     │ flush_queue(the_object->q);
     └────
     After flushing the queue, the size of the device is updated
     accordingly
     ┌────
     │ // dev_flush()...
     │
     │ atomic_set((atomic_t *)&size_so_far, size_so_far - total_size);
     └────

  4. All the readers are woken up
     ┌────
     │ // dev_flush()...
     │
     │ wake_up(&(the_object->wq));
     └────

  5. All resources get unlocked
     ┌────
     │ // dev_flush()...
     │
     │ mutex_unlock(&(the_object->session_list_synchronizer));
     │ mutex_unlock(&(the_object->readers_synchronizer));
     │ mutex_unlock(&(the_object->queue_synchronizer));
     └────

  6. The variable flushing, indicating whether or not a flushing is
     taking place, is reset to its default value.
     ┌────
     │ // dev_flush()...
     │
     │ atomic_set((atomic_t *)&the_object->flushing, 0);
     └────


4.4.8 Cleanup of session_state
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Whenever an I/O session is about to be closed in dev_release(), a
  check is made for writes on the session that still have to be
  executed (writes with send_timeout > 0).
  If there are no such writes, then the memory used for the
  session_state struct can be freed, and the session can be removed
  from the list of active sessions using the remove_session() function,
  defined in main.c.

  ┌────
  │ // dev_release()
  │
  │ // ...
  │ if (atomic_read(&state->mtimer_size) == 0) {
  │     // no delayed write to execute
  │     remove_session(state);
  │     kfree((session_state *)filp->private_data);
  │ }
  │
  │ // ...
  └────

  Otherwise the memory cannot be freed yet. In this case the valid
  field of the session is set to 0, indicating that the session is no
  longer active.

  ┌────
  │ // dev_release()
  │
  │ // ...
  │ else {
  │     atomic_set(&state->valid, 0);
  │ }
  └────

  In this second case the actual deallocation of the memory is handled
  by the hrtimer_callback() function. At the end of its execution it
  checks whether the session is no longer active and whether there are
  no more writes pending; if both conditions hold, it takes care of
  releasing the resources associated with the session

  ┌────
  │ // hrtimer_callback()
  │
  │     // de-allocate memory
  │     kfree(m_timer);
  │
  │     // signal we are done
  │     atomic_dec(&state->mtimer_size);
  │
  │     if ((atomic_read(&state->valid) == 0) &&
  │         (atomic_read(&state->mtimer_size) == 0)) {
  │         // we are the last hr-timer for current I/O session,
  │         // thus we have to free memory and remove our session
  │         // from the list of active sessions.
  │         remove_session(state);
  │         kfree(state);
  │     }
  │
  │
  │     return HRTIMER_NORESTART;
  │ }
  └────
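
  The ownership rule described above (whoever finishes last, between
  dev_release() and the last pending hr-timer, frees the session) can
  also be expressed with a single reference count. The following is
  only a user-space sketch of that alternative technique, written with
  C11 atomics; it is not the driver's code, which uses the separate
  valid and mtimer_size fields. The names struct session,
  session_open(), session_get() and session_put() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for the driver's session_state. */
struct session {
	/* one reference for the open file, one per pending delayed write */
	atomic_int refs;
};

/* Models dev_open(): the open file itself owns the first reference. */
static struct session *session_open(void)
{
	struct session *s = malloc(sizeof(*s));

	if (s == NULL)
		return NULL;
	atomic_init(&s->refs, 1);
	return s;
}

/* Models scheduling a delayed write: the timer takes a reference. */
static void session_get(struct session *s)
{
	assert(s != NULL);
	atomic_fetch_add(&s->refs, 1);
}

/*
 * Models both dev_release() and the end of a timer callback: drop one
 * reference and free the session if it was the last one.
 * Returns 1 if the session was freed by this call, 0 otherwise.
 */
static int session_put(struct session *s)
{
	assert(s != NULL);

	/* atomic_fetch_sub() returns the value held before the decrement */
	if (atomic_fetch_sub(&s->refs, 1) == 1) {
		free(s);
		return 1;
	}
	return 0;
}
```

  Because the decrement and the "am I the last one" test happen in a
  single atomic operation, exactly one caller observes the transition
  to zero, regardless of whether the release or the last timer runs
  first.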