Learning common data structures of Linux kernel

Common data structures of the Linux kernel

         The Linux kernel provides many practical built-in data structures for developers. Using them saves the time of designing and implementing ad-hoc alternatives, and they are rich in functionality and well tested. This article uses the 4.1 kernel as its reference version.

The most commonly used ones are: linked lists, queues, red-black trees, and maps.

1. Linked list

         The linked list is the simplest and most common data structure in the Linux kernel. It stores and operates on a variable number of nodes. Unlike a static array, its nodes are created, inserted and deleted dynamically: the number of nodes does not need to be known at compile time, nodes can be created at different times, and they do not occupy contiguous memory.

What makes the Linux kernel linked list unusual: traditionally, a linked list is built by adding a pointer to the previous node and a pointer to the next node directly to the structure, for example:

struct dog {
    unsigned int age;
    unsigned int weight;
    struct dog *next;
    struct dog *prev;
};

         Although this approach is common, it is not general enough: every new structure needs its own list pointers, and matching list-manipulation routines have to be written that take that particular structure as a parameter. The Linux kernel instead provides a single generic linked-list structure with a corresponding set of list operations.

         The linked-list structure is defined in <linux/list.h>:

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

        It looks no different from before, but the trick is that it can be embedded directly into any structure to turn that structure into a list node. Here is how it is used:

struct dog {
    unsigned int age;
    unsigned int weight;
    struct list_head list;
};

         From now on, every such data structure can perform all list operations simply by calling the routines that operate on struct list_head. Before describing those operations, one common question needs answering: the list operations work on struct list_head, but that member is not the payload we actually care about.

         In C, the offset of a member within a given structure is fixed at compile time; that is, the offset between a pointer to struct dog and a pointer to its embedded list_head is constant. So, given a pointer to the list_head, we can recover a pointer to the struct dog that contains the current node. The macro that does this is container_of:

#define container_of(ptr, type, member) ({              \
    const typeof(((type *)0)->member) *__mptr = (ptr);  \
    (type *)((char *)__mptr - offsetof(type, member)); })

         This macro returns a pointer to the parent structure that embeds the given list_head. Built on it, the kernel provides routines for creating, manipulating and managing linked lists without caring which data structure the list_head is embedded in.

#define list_entry(ptr,type,member) container_of(ptr,type,member)
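For example, given a pointer to the list member embedded in a struct dog (as defined above), list_entry recovers a pointer to the containing structure. A minimal sketch; print_dog is a hypothetical helper, not a kernel function:

    /* p points at the 'list' member embedded inside some struct dog */
    static void print_dog(struct list_head *p)
    {
        struct dog *d = list_entry(p, struct dog, list); /* same as container_of */

        printk(KERN_INFO "dog: age=%u weight=%u\n", d->age, d->weight);
    }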

Initializing a linked list: since most nodes are created dynamically, the most common approach is to initialize the list at run time. There are several ways to do this.

Initializing through a pointer to the list head, passed as a parameter:

static inline void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

    /* yellow_dog must point to valid memory before its list member can be initialized */
    struct dog *yellow_dog = kmalloc(sizeof(*yellow_dog), GFP_KERNEL);

    INIT_LIST_HEAD(&yellow_dog->list);

Initializing the structure members directly, for a statically allocated object:

struct dog yellow_dog = {

    .age = 4,
    .weight = 6,
    .list = LIST_HEAD_INIT(yellow_dog.list),

};
#define LIST_HEAD_INIT(name)  { &(name), &(name) }

         By now it should be clear that the first thing to initialize is the list head. A question arises here: is a head not only needed for a linear list? Why does a circular doubly linked list need one as well?

         Although, with the scheme above, traversal of the whole list can start from any node acting as the head, a dedicated head is sometimes needed. For example, the list routines in the Linux kernel expect a well-known anchor pointer that designates the list.

static LIST_HEAD(dog_list);

#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)
  • Operations on linked lists

         The most basic list operations are adding nodes, deleting nodes (and relinking their neighbours), traversing forwards and backwards, and so on. We continue with the example above and assume the list head is dog_list.

Addition of linked list nodes

         There are two functions for adding a node to a list: one inserts the new node right after the head, the other right before it.

static inline void __list_add(struct list_head *new, struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}

         This function inserts the new node right after the head. Because a circular list has no truly fixed head, if the most recently inserted node is always treated as the head, list_add can be used to build a stack (LIFO). (In practice you may keep an auxiliary pointer that always points at the most recently inserted node.)

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
    __list_add(new, head->prev, head);
}

         This function inserts the new node right before the head and is otherwise similar to list_add. If the first inserted element is always regarded as the head, list_add_tail can be used to build a queue (FIFO).
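As a small sketch of the difference (assuming the dog_list head declared above and a node allocated with kmalloc; error handling omitted):

    struct dog *d = kmalloc(sizeof(*d), GFP_KERNEL);

    d->age = 2;
    d->weight = 10;

    list_add(&d->list, &dog_list);        /* insert right after the head: stack-like (LIFO) */
    /* list_add_tail(&d->list, &dog_list);   insert right before the head: queue-like (FIFO) */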

Deletion of linked list nodes

         Besides adding nodes, the most important operation is removing a node from a list with list_del. This function unlinks the node from the list but does not free it; moreover, the prev and next pointers of the removed node still point into the list, so after the call the removed node is usually re-initialized.

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

static inline void list_del(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
}

static inline void list_del_init(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    INIT_LIST_HEAD(entry);
}

Move and merge linked list nodes

         list_move removes a node from one list and inserts it right after the head of another list; list_move_tail removes a node from one list and inserts it right before the head of another list:

void list_move(struct list_head *list, struct list_head *head);

void list_move_tail(struct list_head *list, struct list_head *head);

Both are implemented by first unlinking the node and then adding it after or before the destination head.

Check whether the linked list is empty

int list_empty(const struct list_head *head);

Its implementation is essentially return head->next == head; it returns non-zero if the list is empty and 0 otherwise.

Linked list merge

void list_splice(struct list_head *list, struct list_head *head);

This function merges two lists: the list pointed to by list is inserted right after the head node.

static inline void __list_splice(const struct list_head *list,
                                 struct list_head *prev,
                                 struct list_head *next)
{
    struct list_head *first = list->next;
    struct list_head *last = list->prev;

    first->prev = prev;
    prev->next = first;

    last->next = next;
    next->prev = last;
}

static inline void list_splice(const struct list_head *list, struct list_head *head)
{
    if (!list_empty(list))
        __list_splice(list, head, head->next);
}

In addition, there is a variant that also re-initializes the source list:

static inline void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (!list_empty(list)) {
        __list_splice(list, head, head->next);
        INIT_LIST_HEAD(list);
    }
}

Traversing a linked list

         A linked list is just a container. If we cannot reach the data stored in it, it is useless, so we need a way to move along the list and access the structures that contain the data we need.

         The most basic way to traverse a list is the list_for_each macro:

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

         However, a pointer to the embedded list_head itself is usually of little use; what we want is a pointer to the structure that contains the list_head. So the kernel combines the list_entry macro described earlier into a traversal macro:

#define list_for_each_entry(pos, head, member)                   \
        for (pos = list_first_entry(head, typeof(*pos), member); \
             &pos->member != (head);                             \
             pos = list_next_entry(pos, member))

#define list_first_entry(ptr, type, member) \
        list_entry((ptr)->next, type, member)

#define list_next_entry(pos, member) \
        list_entry((pos)->member.next, typeof(*(pos)), member)

         Here pos is a pointer to the object containing the list_head node (the kind of pointer list_entry returns), head is a pointer to the list head, and member is the name of the list_head member inside pos.

         The reverse-traversal macro list_for_each_entry_reverse works the same way, except that it walks the list backwards:

#define list_for_each_entry_reverse(pos, head, member)           \
    for (pos = list_last_entry(head, typeof(*pos), member);      \
         &pos->member != (head);                                 \
         pos = list_prev_entry(pos, member))

         Deleting nodes while traversing is not possible with the macros above, because they assume the list is not being modified: if the current entry is deleted during the traversal, the next iteration can no longer read its next (or prev) pointer. With list_for_each_entry, for instance, the next node has to be reached through the node that has just been removed, so the traversal cannot continue. The usual solution is to save the next (or prev) pointer in a temporary variable before the potential deletion, and the kernel already provides a routine that does this:

list_for_each_entry_safe(pos,next,head,member);

         This macro is used like the one above but takes an extra next pointer of the same type as pos. It stores the next entry so that the current entry can be deleted safely.

#define list_for_each_entry_safe(pos, n, head, member)             \
        for (pos = list_first_entry(head, typeof(*pos), member),   \
             n = list_next_entry(pos, member);                     \
             &pos->member != (head);                               \
             pos = n, n = list_next_entry(n, member))

         Similarly, for deleting nodes while traversing in reverse the kernel provides list_for_each_entry_safe_reverse. Note that the safe variants only protect against deletions performed inside the loop body; if other code can delete entries concurrently, they are not enough and the list must be locked.
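Putting the pieces together, here is a rough end-to-end sketch. The struct dog definition is repeated for completeness, and the helper names add_dog, print_dogs and free_dogs are illustrative, not kernel functions:

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/kernel.h>

    struct dog {
        unsigned int age;
        unsigned int weight;
        struct list_head list;
    };

    static LIST_HEAD(dog_list);

    static int add_dog(unsigned int age, unsigned int weight)
    {
        struct dog *d = kmalloc(sizeof(*d), GFP_KERNEL);

        if (!d)
            return -ENOMEM;
        d->age = age;
        d->weight = weight;
        list_add_tail(&d->list, &dog_list);     /* append at the tail: FIFO order */
        return 0;
    }

    static void print_dogs(void)
    {
        struct dog *d;

        list_for_each_entry(d, &dog_list, list)
            printk(KERN_INFO "dog age=%u weight=%u\n", d->age, d->weight);
    }

    static void free_dogs(void)
    {
        struct dog *d, *tmp;

        /* the _safe variant caches the next entry, so deleting d inside the loop is allowed */
        list_for_each_entry_safe(d, tmp, &dog_list, list) {
            list_del(&d->list);
            kfree(d);
        }
    }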

Other list methods can be found in <linux/list.h>.

2. Queue

         A queue is a data structure that no operating system can do without. When processing network packets, for example, the data must be handled in order: the packet that arrives first should be processed first, and the one that arrives last should be processed last. That is exactly what a queue provides: FIFO, first in, first out.

         The kernel's general-purpose queue is implemented in kfifo.c, with the declarations and macros in kfifo.h. The APIs of the 2.6 and 4.1 kernels differ, so check the code of the version you actually use, but the ideas are in the same spirit and remain instructive. This section follows the 4.1 kernel.

  • kfifo

         The Linux kfifo provides two main operations: enqueue (in) and dequeue (out). A kfifo object maintains two offsets: the in offset, which is where the next enqueue will write, and the out offset, which is where the next dequeue will read.

         Under normal circumstances the out offset is always less than or equal to the in offset; anything else would be meaningless.

struct __kfifo {
    unsigned int    in;
    unsigned int    out;
    unsigned int    mask;
    unsigned int    esize;  /* size of one element, usually sizeof(unsigned char) */
    void            *data;
};

struct kfifo {
    union {
        struct __kfifo          kfifo;
        unsigned char           *type;
        const unsigned char     *const_type;
        char                    (*rectype)[0];
        void                    *ptr;
        void const              *ptr_const;
    };
    unsigned char buf[0];
};

The structure above is what the macros in the 4.1 kfifo.h expand to; in the actual code, struct kfifo is built up dynamically by macros.

Layered macro definitions are a simple and efficient way of letting the element type and pointer type vary per queue:

struct kfifo __STRUCT_KFIFO_PTR(unsigned char, 0, void);

#define __STRUCT_KFIFO_PTR(type, recsize, ptrtype) \
{ \
    __STRUCT_KFIFO_COMMON(type, recsize, ptrtype); \
    type buf[0]; \
}

#define __STRUCT_KFIFO_COMMON(datatype, recsize, ptrtype) \
    union { \
        struct __kfifo  kfifo; \
        datatype        *type; \
        const datatype  *const_type; \
        char            (*rectype)[recsize]; \
        ptrtype         *ptr; \
        ptrtype const   *ptr_const; \
    }

Create queue

         A kfifo must be defined and initialized before it can be used. There are static and dynamic methods; the dynamic method is more common.

The dynamic method uses the kfifo_alloc macro:

#define kfifo_alloc(fifo, size, gfp_mask) \
__kfifo_int_must_check_helper( \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    __is_kfifo_ptr(__tmp) ? \
    __kfifo_alloc(__kfifo, size, sizeof(*__tmp->type), gfp_mask) : \
    -EINVAL; \
}) \
)

         The macro takes three parameters: a pointer to the struct kfifo, the size of the queue to create and initialize, and a memory-allocation flag (set that last one aside for now). It returns 0 on success and a negative error code otherwise.

         The macro works because of the union inside struct kfifo, which makes the embedded __kfifo directly usable. Writing (fifo) + 1 ensures that a pointer to the queue structure is passed in rather than, say, the first address of an array. The macro then calls __kfifo_alloc to allocate the space; the sizeof(*__tmp->type) in that call is where the buf[0]-style element type declared in the structure above comes in, supplying the element size. This is what makes the kfifo structure so flexible.

The following is an example of normal use of this macro:

    struct kfifo fifo;
    int ret = 0;

    ret = kfifo_alloc(&fifo, PAGE_SIZE, GFP_KERNEL);
    if (ret)
        return ret;

Now fifo is a queue of size PAGE_SIZE.

The second method is as follows,

         The buffer-based initializer kfifo_init creates and initializes a kfifo object that uses the size bytes of memory pointed to by buffer. This lets the caller allocate the memory however it likes:

#define kfifo_init(fifo, buffer, size) \
({ \
    typeof((fifo) + 1) __tmp = (fifo); /* forces fifo to be a pointer, catching wrong argument types */ \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    __is_kfifo_ptr(__tmp) ? \
    __kfifo_init(__kfifo, buffer, size, sizeof(*__tmp->type)) : \
    -EINVAL; \
})

There are also methods for statically declaring kfifo

DECLARE_KFIFO(name, type, size);

INIT_KFIFO(name);

         The two macros are used together. After initialization, the kfifo's buffer is the buf array inside struct kfifo, and as before the size must be a power of two. Why is mask defined as size - 1, with size a power of two?

         Because this is a ring queue: whenever an index later has to be reduced modulo size, a bitwise AND with mask gives the result with size as the modulus directly, which is much faster than a division.
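For example, a static declaration looks like the sketch below (my_fifo, fifo_setup and the size of 128 are illustrative names and values). With size 128, mask is 127, so taking an index modulo the size becomes a single AND:

    #include <linux/kfifo.h>

    /* statically declare a fifo of 128 unsigned chars; the size must be a power of two */
    static DECLARE_KFIFO(my_fifo, unsigned char, 128);

    static void fifo_setup(void)
    {
        INIT_KFIFO(my_fifo);
        /* here size == 128 and mask == 127, so for any index
         * (index & 127) == (index % 128), with no division needed */
    }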

Push queue data

After the kfifo object has been created and initialized, data is pushed into the queue with dedicated routines.

The 4.1 kernel uses the macro kfifo_in(fifo, buf, n); the 2.6 kernel used an enqueue function:

unsigned int kfifo_in(struct kfifo *fifo, const void *from, unsigned int len);

The first parameter is a pointer to the queue, the second is a pointer to the data to be queued, and the third, len, requests that len bytes starting at from be copied into the queue. On success the number of bytes actually queued is returned. If the free space in the queue is smaller than len, only as much as fits is copied, so the return value may be smaller than len, or even 0 when there is no free space left in the queue.

#define kfifo_in(fifo, buf, n) \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    typeof(__tmp->ptr_const) __buf = (buf); \
    unsigned long __n = (n); \
    const size_t __recsize = sizeof(*__tmp->rectype); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    (__recsize) ? \
    __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \
    __kfifo_in(__kfifo, __buf, __n); \
})

__kfifo_in first clamps len to the unused space remaining in the queue and then calls kfifo_copy_in to do the actual enqueue:

unsigned int __kfifo_in(struct __kfifo *fifo, const void *buf, unsigned int len)
{
    unsigned int l;

    l = kfifo_unused(fifo);
    if (len > l)
        len = l;

    kfifo_copy_in(fifo, buf, len, fifo->in);
    fifo->in += len;
    return len;
}
/*
 * __kfifo_poke_n: internal helper that stores the length of the record
 * into the fifo
 */

unsigned int __kfifo_in_r(struct __kfifo *fifo, const void *buf,
                          unsigned int len, size_t recsize)
{
    if (len + recsize > kfifo_unused(fifo))
        return 0;

    __kfifo_poke_n(fifo, len, recsize);

    kfifo_copy_in(fifo, buf, len, fifo->in + recsize);
    fifo->in += len + recsize;
    return len;
}
static void kfifo_copy_in(struct __kfifo *fifo, const void *src,
                          unsigned int len, unsigned int off)
{
    unsigned int size = fifo->mask + 1;
    unsigned int esize = fifo->esize;
    unsigned int l;

    off &= fifo->mask;
    if (esize != 1) {
        off *= esize;
        size *= esize;
        len *= esize;
    }

    l = min(len, size - off);

    memcpy(fifo->data + off, src, l);
    memcpy(fifo->data, src + l, len - l);

    /*
     * make sure that the data in the fifo is up to date before
     * incrementing the fifo->in index counter
     */
    smp_wmb(); /* memory barrier */
}

Similarly, another way to enqueue is kfifo_put. It takes no length parameter; it enqueues a single element.

The smp_wmb() barrier guarantees that the data has been written into the buffer before the in index is updated.

#define kfifo_put(fifo, val) \
({ \
        typeof((fifo) + 1) __tmp = (fifo); \
        typeof(*__tmp->const_type) __val = (val); \
        unsigned int __ret; \
        size_t __recsize = sizeof(*__tmp->rectype); \
        struct __kfifo *__kfifo = &__tmp->kfifo; \
        if (__recsize) \
                __ret = __kfifo_in_r(__kfifo, &__val, sizeof(__val), \
                        __recsize); \
        else { \
                __ret = !kfifo_is_full(__tmp); \
                if (__ret) { \
                        (__is_kfifo_ptr(__tmp) ? \
                        ((typeof(__tmp->type))__kfifo->data) : \
                        (__tmp->buf) \
                        )[__kfifo->in & __tmp->kfifo.mask] = \
                                (typeof(*__tmp->type))__val; \
                        smp_wmb(); \
                        __kfifo->in++; \
                } \
        } \
        __ret; \
})

Extract queue data

         Data is extracted with kfifo_out(struct kfifo *fifo, void *to, unsigned int len). It copies up to len bytes from the queue pointed to by fifo into the buffer pointed to by to and returns the number of bytes actually copied; if the queue holds less data than len, fewer bytes are copied.

#define kfifo_out(fifo, buf, n) \
__kfifo_uint_must_check_helper( \
({ \
        typeof((fifo) + 1) __tmp = (fifo); /* temporary pointer to the queue structure */ \
        typeof(__tmp->ptr) __buf = (buf); /* temporary pointer to the destination buffer */ \
        unsigned long __n = (n); /* amount of data to extract */ \
        const size_t __recsize = sizeof(*__tmp->rectype); \
        struct __kfifo *__kfifo = &__tmp->kfifo; \
        (__recsize) ? \
        __kfifo_out_r(__kfifo, __buf, __n, __recsize) : \
        __kfifo_out(__kfifo, __buf, __n); \
}) \
)

Depending on __recsize, one of two functions is executed:

unsigned int __kfifo_out_r(struct __kfifo *fifo, void *buf,
                           unsigned int len, size_t recsize)
{
        unsigned int n;

        if (fifo->in == fifo->out)
                return 0;

        len = kfifo_out_copy_r(fifo, buf, len, recsize, &n);
        fifo->out += n + recsize;
        return len;
}
static unsigned int kfifo_out_copy_r(struct __kfifo *fifo,
        void *buf, unsigned int len, size_t recsize, unsigned int *n)
{
        *n = __kfifo_peek_n(fifo, recsize);

        if (len > *n)
                len = *n;

        kfifo_copy_out(fifo, buf, len, fifo->out + recsize);
        return len;
}
/*
 * __kfifo_peek_n internal helper function for determinate the length of
 * the next record in the fifo
 */
static unsigned int __kfifo_peek_n(struct __kfifo *fifo, size_t recsize)
{
        unsigned int l;
        unsigned int mask = fifo->mask;
        unsigned char *data = fifo->data;

        l = __KFIFO_PEEK(data, fifo->out, mask);

        if (--recsize)
                l |= __KFIFO_PEEK(data, fifo->out + 1, mask) << 8;

        return l;
}
static void kfifo_copy_out(struct __kfifo *fifo, void *dst,
                           unsigned int len, unsigned int off)
{
        unsigned int size = fifo->mask + 1;
        unsigned int esize = fifo->esize;
        unsigned int l;

        off &= fifo->mask;
        if (esize != 1) {
                off *= esize;
                size *= esize;
                len *= esize;
        }

        l = min(len, size - off);

        memcpy(dst, fifo->data + off, l);
        memcpy(dst + l, fifo->data, len - l);

        /*
         * make sure that the data is copied before
         * incrementing the fifo->out index counter
         */
        smp_wmb();
}
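Putting enqueue and dequeue together, a minimal round-trip sketch (fifo_roundtrip and the 8-byte payload are illustrative; error handling is trimmed):

    #include <linux/kernel.h>
    #include <linux/kfifo.h>
    #include <linux/slab.h>

    static int fifo_roundtrip(void)
    {
        struct kfifo fifo;
        unsigned char in[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        unsigned char out[8];
        unsigned int copied;
        int ret;

        ret = kfifo_alloc(&fifo, PAGE_SIZE, GFP_KERNEL);
        if (ret)
            return ret;

        copied = kfifo_in(&fifo, in, sizeof(in));     /* returns how many bytes were queued */
        copied = kfifo_out(&fifo, out, sizeof(out));  /* returns how many bytes were copied out */

        kfifo_free(&fifo);
        return 0;
    }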

Single value dequeue

#define kfifo_get(fifo, val) \
__kfifo_uint_must_check_helper( \
({ \
        typeof((fifo) + 1) __tmp = (fifo); \
        typeof(__tmp->ptr) __val = (val); \
        unsigned int __ret; \
        const size_t __recsize = sizeof(*__tmp->rectype); \
        struct __kfifo *__kfifo = &__tmp->kfifo; \
        if (__recsize) \
            __ret = __kfifo_out_r(__kfifo, __val, sizeof(*__val), \
            __recsize); \
        else { \
            __ret = !kfifo_is_empty(__tmp); \
            if (__ret) { \
                *(typeof(__tmp->type))__val = \
                (__is_kfifo_ptr(__tmp) ? \
                ((typeof(__tmp->type))__kfifo->data) : \
                (__tmp->buf) \
                )[__kfifo->out & __tmp->kfifo.mask]; \
                smp_wmb(); \
                __kfifo->out++; \
            } \
        } \
        __ret; \
}) \
)
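A short sketch of the single-element variants, continuing with a byte fifo like the one in the example above (the values are illustrative):

    unsigned char c;

    if (kfifo_put(&fifo, 42))     /* non-zero on success, 0 if the fifo was full */
        pr_info("queued one byte\n");

    if (kfifo_get(&fifo, &c))     /* non-zero on success, 0 if the fifo was empty */
        pr_info("dequeued %u\n", c);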

Get queue length

The kfifo_size macro returns the total capacity of the space used to store the queue (in elements; for the usual byte fifo, in bytes):

#define kfifo_size(fifo)  ((fifo)->kfifo.mask + 1)

The kfifo_len macro returns the amount of data currently in the queue:

#define kfifo_len(fifo) \
({ \
    typeof((fifo) + 1) __tmpl = (fifo); \
    __tmpl->kfifo.in - __tmpl->kfifo.out; \
})

To find out how much free space is left in the queue, use kfifo_avail:

#define kfifo_avail(fifo) \
__kfifo_uint_must_check_helper( \
({ \
    typeof((fifo) + 1) __tmpq = (fifo); \
    const size_t __recsize = sizeof(*__tmpq->rectype); \
    unsigned int __avail = kfifo_size(__tmpq) - kfifo_len(__tmpq); \
    (__recsize) ? ((__avail <= __recsize) ? 0 : \
    __kfifo_max_r(__avail - __recsize, __recsize)) : \
    __avail; \
}) \
)

Macros to check whether the queue is empty or full:

#define kfifo_is_empty(fifo) \
({ \
    typeof((fifo) + 1) __tmpq = (fifo); \
    __tmpq->kfifo.in == __tmpq->kfifo.out; \
})

#define kfifo_is_full(fifo) \
({ \
    typeof((fifo) + 1) __tmpq = (fifo); \
    kfifo_len(__tmpq) > __tmpq->kfifo.mask; \
})

Resetting and destroying queues

Resetting a queue discards everything currently in it; call kfifo_reset:

/*
 * kfifo_reset - removes the entire fifo content
 * @fifo: address of the fifo to be used
 *
 * Note: usage of kfifo_reset() is dangerous. It should be only called when the
 * fifo is exclusived locked or when it is secured that no other thread is
 * accessing the fifo.
 */
#define kfifo_reset(fifo) \
    (void)({ \
        typeof((fifo) + 1) __tmp = (fifo); \
        __tmp->kfifo.in = __tmp->kfifo.out = 0; \
    })
/**
 * kfifo_reset_out - skip fifo content
 * @fifo: address of the fifo to be used
 *
 * Note: The usage of kfifo_reset_out() is safe until it will be only called
 * from the reader thread and there is only one concurrent reader. Otherwise
 * it is dangerous and must be handled in the same way as kfifo_reset().
 */
#define kfifo_reset_out(fifo) \
    (void)({ \
        typeof((fifo) + 1) __tmp = (fifo); \
        __tmp->kfifo.out = __tmp->kfifo.in; \
    })

         To destroy a queue allocated with kfifo_alloc, call kfifo_free. If the queue was created with kfifo_init instead, the caller must release the buffer it supplied itself.

#define kfifo_free(fifo) \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    if (__is_kfifo_ptr(__tmp)) \
        __kfifo_free(__kfifo); \
})

void __kfifo_free(struct __kfifo *fifo)
{
    kfree(fifo->data);
    fifo->in = 0;
    fifo->out = 0;
    fifo->esize = 0;
    fifo->data = NULL;
    fifo->mask = 0;
}

3. Binary tree

         Mathematically, a tree is an acyclic, connected, directed graph in which every node has zero or more outgoing edges and zero or one incoming edge. A binary tree is a tree in which each node has at most two outgoing edges, i.e. zero, one or two children.

Binary search tree

         The binary search tree (BST) is a classic data structure that combines the fast insertion and deletion of a linked list with the fast lookup of a sorted array, which is why it is so widely used; file systems and database systems, for example, commonly rely on it for efficient sorting and retrieval.

         In the binary search tree:

         1. If the left subtree of any node is not empty, the values of all nodes in the left subtree are not greater than the value of that node.

         2. If the right subtree of any node is not empty, the values of all nodes in the right subtree are not less than the value of that node.

         3. The left and right subtrees of any node are themselves binary search trees.

         Searching, deleting and inserting all take time proportional to the height of the tree, so with n elements each operation takes O(log(n)) time on average.

Self balanced binary search tree

A balanced binary tree has the following properties:

        1. It may be an empty tree;

        2. If it is not empty, the left and right subtrees of every node are themselves balanced binary trees, and the absolute difference of their heights is at most 1.

         It can also be described as a binary search tree in which the depths of any two leaf nodes differ by at most 1.

         A self-balancing binary search tree is a binary search tree in which every operation on the tree tries to keep it balanced, or at least semi-balanced.

Red black tree

         A red-black tree is a self-balancing binary search tree, and it is the main balanced-tree data structure used in Linux. A red-black tree has the following characteristics:

  1. Every node is either red or black;
  2. The root node is black;
  3. All leaf nodes are black;
  4. Leaf nodes contain no data;
  5. All non-leaf nodes have two children;
  6. If a node is red, both of its children are black;
  7. Every path from a given node down to its leaves contains the same number of black nodes.

         The result is a semi-balanced tree: the deepest leaf is at most twice as deep as the shallowest leaf.

Notes on the properties:

  1. Obvious.
  2. Obvious.
  3. The leaves are black and are usually represented as NULL nodes.
  4. All nodes other than the leaf (NULL) nodes store data.
  5. Whenever a data-carrying (non-leaf) node is added or removed, the NULL leaves are filled in automatically.
  6. Obvious.
  7. Another way to state this is that every simple path from a node to each of its descendant leaves contains the same number of black nodes. The full analysis is involved and has little to do with the Linux kernel data structure, so it is omitted here.

rbtree

         The red-black tree implemented by the Linux kernel is called rbtree. It is defined in lib/rbtree.c and declared in rbtree.h. Apart from some optimizations, rbtree is the red-black tree described above and maintains its balance, so insertion cost is logarithmic in the number of nodes in the tree, O(log(n)).

         The rbtree implementation does not provide search and insert routines; users of rbtree are expected to define their own. This is partly because generic programming is awkward in C, and partly because the kernel developers believe the most efficient search and insert routines are written by the user for the specific data. Helper functions are provided, but the comparison logic must be supplied by the caller.

         The root of an rbtree is described by the rb_root structure. To create a red-black tree, declare a new rb_root and initialize it to the special value RB_ROOT:

struct rb_root root = RB_ROOT;

struct rb_root {
    struct rb_node *rb_node;
};

#define RB_ROOT (struct rb_root) { NULL, }

The other nodes in the tree are described by the rb_node structure:

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

         Consider a real scenario: looking up a region of a file in the page cache (identified by an inode and an offset). Each inode has its own rbtree, keyed by the page offset within the file.

static inline struct page *rb_search_page_cache(struct inode *inode,
                                                unsigned long offset)
{
    struct rb_node *n = inode->i_rb_page_cache.rb_node;
    struct page *page;

    while (n) {
        page = rb_entry(n, struct page, rb_page_cache);

        if (offset < page->offset)
            n = n->rb_left;
        else if (offset > page->offset)
            n = n->rb_right;
        else
            return page;
    }
    return NULL;
}

#define rb_entry(ptr, type, member) container_of(ptr, type, member)
static inline struct page *rb_insert_page_cache(struct inode *inode,
                                                unsigned long offset,
                                                struct rb_node *node)
{
    struct rb_node **p = &inode->i_rb_page_cache.rb_node;
    struct rb_node *parent = NULL;
    struct page *page;

    while (*p) {
        parent = *p;
        page = rb_entry(parent, struct page, rb_page_cache);

        if (offset < page->offset)
            p = &(*p)->rb_left;
        else if (offset > page->offset)
            p = &(*p)->rb_right;
        else
            return page;
    }

    rb_link_node(node, parent, p);
    rb_insert_color(node, &inode->i_rb_page_cache);
    return NULL;
}

         The code above shows how an earlier version of the kernel searched the page cache for a file page and inserted a node; the 4.1 kernel has since moved to a different scheme, but the example still illustrates how rbtree is used.

         First, the i_rb_page_cache member of the inode holds the root of a red-black tree, while the page structure embeds an rb_node member named rb_page_cache.

         Second, the rb_entry macro uses the fixed relationship between the embedded rb_page_cache member and its containing page structure to derive the page pointer from the rb_node pointer, after which the lookup proceeds by comparing page offsets. Insertion works the same way.

        Using the rb_node structure is really the same pattern as using the list_head structure for lists: list_head is embedded into the structure to be linked, the kernel-provided routines add, delete, traverse, move and merge the nodes, and the list_head pointer is then converted back into a pointer to the original structure. What matters in a list is the payload carried by the nodes, not the node bookkeeping itself; the node operations exist to reach that payload. In the same way, when some information needs to be organized as a red-black tree, an rb_node is embedded into the structure that holds it; the user's search and insert code, together with the kernel's recolouring and rotation routines, maintains the tree, and when the business logic needs the information it finds it through the tree node, which is what makes the processing efficient.
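For completeness, here is a sketch of walking an rbtree in sorted order and emptying it, using the rb_first/rb_next/rb_erase helpers the kernel does provide (struct mytype, my_root and walk_and_clear are illustrative names):

    #include <linux/rbtree.h>
    #include <linux/slab.h>
    #include <linux/kernel.h>

    struct mytype {
        unsigned long key;
        struct rb_node node;
    };

    static struct rb_root my_root = RB_ROOT;

    static void walk_and_clear(void)
    {
        struct rb_node *n = rb_first(&my_root);     /* leftmost node, i.e. smallest key */

        while (n) {
            struct mytype *m = rb_entry(n, struct mytype, node);
            struct rb_node *next = rb_next(n);      /* save the successor before erasing */

            printk(KERN_INFO "key=%lu\n", m->key);
            rb_erase(n, &my_root);                  /* unlink the node and rebalance the tree */
            kfree(m);
            n = next;
        }
    }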

4. Mapping

         A map, also known as an associative array, is a collection of unique keys, each of which is associated with a value; the association from key to value is the mapping.

         The mapping must support at least three operations:

        * Add(key, value)

        * Remove(key)

        * value = Lookup(key)

         The Linux kernel provides a simple and efficient map data structure, but it is not a general-purpose map.

         Its goal is to map a unique identifier (UID) to a pointer.

         In addition to the three standard operations, Linux provides an allocate operation built on top of add: it not only adds a key/value pair to the map but also generates the UID.

         The idr data structure is what maps the UIDs handed out to user space.

Initialize an idr

         Building an idr is simple: first define one statically or allocate one dynamically, then call the idr_init function:

struct idr id_huh;

idr_init(&id_huh);

struct idr {
    struct idr_layer __rcu *hint;   /* the last layer allocated from */
    struct idr_layer __rcu *top;
    int layers;                     /* only valid w/o concurrent changes */
    int cur;                        /* current pos for cyclic allocation */
    spinlock_t lock;
    int id_free_cnt;
    struct idr_layer *id_free;
};

void idr_init(struct idr *idp)
{
    memset(idp, 0, sizeof(struct idr));
    spin_lock_init(&idp->lock);
}

Assign a new UID

After the idr is established, a new UID is allocated in two steps:

         First, tell the idr that a new UID may be needed, so that it can resize its backing tree if necessary; second, actually request the new UID.

         The first step, growing the backing tree, is done with idr_pre_get(); the second step, obtaining the new UID and adding it to the idr, is done with idr_get_new():

int idr_get_new( struct idr* idp, void *ptr, int *id);

         This method allocates a UID using the idr pointed to by idp and associates it with the pointer ptr. On success it returns 0 and stores the new UID in id. On error it returns a non-zero error code: -EAGAIN means idr_pre_get() needs to be called again, and -ENOSPC means the idr is full.

         Here is a specific example

    int id, ret;

    do {
        if (!idr_pre_get(&idr_huh, GFP_KERNEL))
            return -ENOSPC;
        ret = idr_get_new(&idr_huh, ptr, &id);
    } while (ret == -EAGAIN);

         The function idr_get_new_above() lets the caller specify a minimum value for the returned UID. It works like idr_get_new(), except that the new UID is guaranteed to be greater than or equal to the specified value. Using this variant ensures that UIDs are not reused: the value is unique not only among the currently allocated IDs but over the whole lifetime of the system.

For example:

    int id, ret;

    do {
        if (!idr_pre_get(&idr_huh, GFP_KERNEL))
            return -ENOSPC;
        ret = idr_get_new_above(&idr_huh, ptr, next_id, &id);
    } while (ret == -EAGAIN);

    if (!ret)
        next_id = id + 1;
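Note that the idr_pre_get()/idr_get_new() pair shown above is the older interface described in the reference book; by the 4.1 kernel it had been replaced by idr_preload() and idr_alloc(). A rough sketch of the newer pattern (my_idr and store_ptr are illustrative names, and a real caller would normally hold a lock around idr_alloc()):

    #include <linux/idr.h>

    static DEFINE_IDR(my_idr);

    static int store_ptr(void *ptr)
    {
        int id;

        idr_preload(GFP_KERNEL);                        /* preallocate so the next step need not sleep */
        id = idr_alloc(&my_idr, ptr, 0, 0, GFP_NOWAIT); /* end == 0 means no upper limit */
        idr_preload_end();

        return id;                                      /* new UID (>= 0), or a negative errno */
    }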

Find UID

         Once some UIDs have been allocated in an idr, they need to be looked up: the caller supplies the UID and the idr returns the associated pointer. Lookup is simpler than allocating a new UID; just call idr_find():

void *idr_find(struct idr *idp, int id)

          On success the function returns the associated pointer; otherwise it returns NULL. Note that if idr_get_new() or idr_get_new_above() was previously used to map a NULL pointer to a UID, the function returns NULL even when it succeeds, so the two cases cannot be distinguished. Usage is straightforward:

    struct my_struct *ptr = idr_find(&idr_huh, id);

    if (!ptr)
        return -EINVAL;

Delete UID

         To remove a UID from an idr, use idr_remove():

void idr_remove(struct idr* idp, int id);

       If idr_remove() succeeds, the pointer associated with id is removed from the map. The function has no way to return an error code, however, for example when id is not present in idp.

Destroying an idr

         Destroying an idr is simple: call the idr_destroy function. On success it frees only the unused memory held by the idr; it does not free any memory currently in use by allocated UIDs. Kernel code normally does not destroy an idr until shutdown or module unload, when it has no remaining users. To force the issue, idr_remove_all() can be called to remove all UIDs first, after which idr_destroy() releases the rest of the memory used by the idr.

5. Selection of data structure

         If the main operation on a data set is traversal, use a linked list. No data structure can do better than linear time for visiting every element, so the simplest structure that does the simple job should win. Also choose a linked list when performance is not the primary concern, when relatively few items are stored, or when you need to interoperate with other kernel code that already uses linked lists.

         If your code follows the producer/consumer pattern, use a queue, especially if you are happy with a fixed-size buffer. A queue makes adding and removing items simple and efficient, and it also provides FIFO semantics, which is exactly what producer/consumer code usually needs. If instead you need to store a data set of unknown size, a linked list is better, because it can grow and shrink dynamically.

         If you need to map a UID to an object, use a map. The map structure is simple and efficient, and it also maintains and allocates the UIDs for you. The Linux map interface is specialized for UID-to-pointer mappings and is not well suited to other uses; if you are handing descriptors out to user space, consider using it.

         If you need to store a large amount of data and retrieve it quickly, use a red-black tree. It guarantees logarithmic lookup time and it also gives linear-time in-order traversal. Although it is more complex than the other structures, its memory overhead is not bad. If you do not perform many time-critical lookups, though, a linked list is probably the better choice.

         If none of these structures fits, the kernel also implements some less commonly used ones, such as radix trees and bitmaps. Only when nothing the kernel provides meets your needs should you design your own. One data structure frequently implemented in individual source files is the hash table, mainly because a hash table is little more than some buckets and a hash function.

6. Algorithm complexity

         Big-O notation and big-Θ notation: the first gives an upper bound, the second both an upper and a lower bound.

         f(n) is O(g(n)) means there is a constant c such that f(n) <= c * g(n) for all sufficiently large n.

         f(n) is Θ(g(n)) means there are constants a and b such that a * g(n) <= f(n) <= b * g(n) for all sufficiently large n.

       Common complexity classes, from best to worst: O(1), O(log(n)), O(n), O(n^2), O(n^3), O(2^n), O(n!).

         Strictly speaking, big-O is less precise than big-Θ because it says nothing about a lower bound on the running time; in practice, however, big-O is commonly used as if it were big-Θ.

         Although algorithms with high time complexity should be avoided, always weigh the algorithm's cost against the size of its typical input set, and do not blindly optimize for scalability requirements that will never need to be supported. Time complexity and space complexity should also be considered together: an O(1) algorithm that takes hours regardless of input size is not necessarily better than an O(n) algorithm run on small inputs.

References:

        1. Linux Kernel Development, 3rd edition, by Robert Love; Chinese translation by Chen Lijun and Kang Hua.

        2. The Linux 4.1.25 kernel source.

appendix

The appendix collects some of the functions referred to earlier. Placing them in the main text would hurt readability, so the miscellaneous supporting code is gathered here.

int __kfifo_alloc(struct __kfifo *fifo, unsigned int size,
                  size_t esize, gfp_t gfp_mask)
{
        /*
         * round down to the next power of 2, since our 'let the indices
         * wrap' technique works only in this case.
         */
        size = roundup_pow_of_two(size);   /* smallest power of two not less than size */

        fifo->in = 0;
        fifo->out = 0;
        fifo->esize = esize;

        if (size < 2) {
                fifo->data = NULL;
                fifo->mask = 0;
                return -EINVAL;
        }

        fifo->data = kmalloc(size * esize, gfp_mask);
        if (!fifo->data) {
                fifo->mask = 0;
                return -ENOMEM;
        }

        fifo->mask = size - 1;
        return 0;
}
#define roundup_pow_of_two(n)                   \
(                                               \
        __builtin_constant_p(n) ? (             \
                (n == 1) ? 1 :                  \
                (1UL << (ilog2((n) - 1) + 1))   \
                                   ) :          \
        __roundup_pow_of_two(n)                 \
)

This macro returns the smallest power of two that is not less than n; both branches compute the same value. GCC's built-in __builtin_constant_p(n) reports whether its argument is a compile-time constant: it returns 1 if the value is constant and 0 otherwise.

unsigned long __roundup_pow_of_two(unsigned long n)
{
    return 1UL << fls_long(n - 1);
}

static inline unsigned fls_long(unsigned long l)
{
    if (sizeof(l) == 4)
        return fls(l);
    return fls64(l);
}

fls_long is essentially a dispatcher: depending on the architecture, long is either 4 or 8 bytes, so it forwards to fls() or fls64() accordingly.

static __always_inline int fls(int x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) {
        x <<= 16;
        r -= 16;
    }
    if (!(x & 0xff000000u)) {
        x <<= 8;
        r -= 8;
    }
    if (!(x & 0xf0000000u)) {
        x <<= 4;
        r -= 4;
    }
    if (!(x & 0xc0000000u)) {
        x <<= 2;
        r -= 2;
    }
    if (!(x & 0x80000000u)) {
        x <<= 1;
        r -= 1;
    }
    return r;
}

fls returns the position of the most significant set bit of its argument. For a 32-bit value the bits are counted from the least significant end starting at 1 rather than 0, and fls(0) returns 0.
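For example (values purely for illustration), and tying it back to __roundup_pow_of_two:

    /* fls(x) = 1-based index of the highest set bit; fls(0) == 0 */
    fls(1);        /* == 1  (bit 0 is the highest set bit) */
    fls(0x10);     /* == 5  (bit 4)                        */
    fls(0x8000);   /* == 16 (bit 15)                       */

    /* hence __roundup_pow_of_two(100) == 1UL << fls_long(99) == 1UL << 7 == 128 */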

#define DECLARE_KFIFO(fifo, type, size) STRUCT_KFIFO(type, size) fifo
#define STRUCT_KFIFO(type, size) struct __STRUCT_KFIFO(type, size, 0, type)
#define __STRUCT_KFIFO(type, size, recsize, ptrtype) \
{ \
    __STRUCT_KFIFO_COMMON(type, recsize, ptrtype); \
    type buf[((size < 2) || (size & (size - 1))) ? -1 : size]; \
}

#define __STRUCT_KFIFO_COMMON(datatype, recsize, ptrtype) \
        union { \
                struct __kfifo  kfifo; \
                datatype        *type; \
                const datatype  *const_type; \
                char            (*rectype)[recsize]; \
                ptrtype         *ptr; \
                ptrtype const   *ptr_const; \
        }
#define kfifo_in(fifo, buf, n) \
({ \
        typeof((fifo) + 1) __tmp = (fifo); \
        typeof(__tmp->ptr_const) __buf = (buf); \
        unsigned long __n = (n); \
        const size_t __recsize = sizeof(*__tmp->rectype); \
        struct __kfifo *__kfifo = &__tmp->kfifo; \
        (__recsize) ? \
        __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \
        __kfifo_in(__kfifo, __buf, __n); \
})
#define __KFIFO_PEEK(data, out, mask) ((data)[(out) & (mask)])
