Linux Operating System Study Notes: Creation and Derivation of Processes and Threads

1, Foreword

In the previous article we analyzed task_struct, the structure the kernel uses for both processes and threads. This article goes on to analyze how processes and threads are created and derived. It first explains how a program is compiled into an executable file and finally run as a process, then how threads are created, and finally how new processes and threads are derived from existing ones. Because processes and threads have many similarities and differences, the article compares them throughout to deepen understanding.

2, Process creation

The path from a source program to a running process is described in detail below. Under Linux, binary programs must follow a strict format called ELF (Executable and Linkable Format). Depending on the compilation result, an ELF file is one of the following types:

  1. Relocatable file

    .o files generated by the assembler

  2. Executable file

    An executable program

  3. Shared object file

    Dynamic library files, i.e. .so files

These three file types are described in detail below, following the process creation flow.
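As a small illustration of the three ELF types, the sketch below reads the e_type field of an ELF header using the constants from glibc's <elf.h> (ET_REL, ET_EXEC, ET_DYN). The function name elf_kind is hypothetical, and it examines only the identification bytes and e_type. Position-independent executables, the default on many distributions, are also reported as ET_DYN.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a short name for the ELF file type recorded in the header at
 * buf, or NULL if buf is not an ELF header. Only e_ident and e_type are
 * read; e_type sits at the same offset (16) in Elf32_Ehdr and Elf64_Ehdr. */
const char *elf_kind(const unsigned char *buf, size_t len)
{
    const size_t off = offsetof(Elf64_Ehdr, e_type);
    if (len < off + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return NULL;

    uint16_t type;
    if (buf[EI_DATA] == ELFDATA2LSB)        /* little-endian file */
        type = (uint16_t)(buf[off] | (buf[off + 1] << 8));
    else if (buf[EI_DATA] == ELFDATA2MSB)   /* big-endian file */
        type = (uint16_t)((buf[off] << 8) | buf[off + 1]);
    else
        return NULL;

    switch (type) {
    case ET_REL:  return "relocatable";    /* .o produced by gcc -c */
    case ET_EXEC: return "executable";     /* fixed-address executable */
    case ET_DYN:  return "shared object";  /* .so, and also PIE executables */
    case ET_CORE: return "core";
    default:      return "other";
    }
}
```

Passing the first bytes of a .o file built with gcc -c should yield "relocatable".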

2.1 Compilation

The compilation command is as follows:

gcc -c -fPIC xxxx.c

-c means compile and assemble the given source file without linking. -fPIC means generate position-independent code: the code uses relative addresses instead of absolute ones, which shared libraries need so that they can be loaded at any address. Compilation first runs the preprocessor, which inserts the header files into the source and expands the macros. Then comes the actual compilation, and finally assembly into a .o file. This is the first ELF type, the Relocatable File. It is called relocatable because the addresses of its code and variables are not fixed yet. Calling a function means jumping to the address of that function's code, and modifying a global variable means writing to the variable's address. A .o file, however, is only a collection of code fragments, not a program that can run directly. The final positions of its code and data are therefore still unknown, so the file must be relocatable to allow them to be filled in later.

The header of an ELF file (the ELF header) describes the entire file. It is defined in the kernel as struct elf32_hdr and struct elf64_hdr. The other sections serve the following purposes:

  • .text: the compiled binary executable code
  • .rodata: read-only data, such as string constants and const variables
  • .data: initialized global variables
  • .bss: uninitialized global variables, which take no space in the file and are set to 0 at run time
  • .symtab: the symbol table, which records functions and variables
  • .rel.text: the relocation table for the .text section
  • .rel.data: the relocation table for the .data section
  • .strtab: the string table, holding names such as function and variable names

The metadata of these sections also has to be stored somewhere: in the Section Header Table at the end of the file. Each section has an entry in this table, defined in the code as struct elf32_shdr and struct elf64_shdr. The ELF header records where the Section Header Table is located in the file, how many entries it has, and so on.
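To make the ELF header and Section Header Table concrete, here is a minimal sketch that loads a file into memory and looks up a section by name through e_shoff, e_shnum and e_shstrndx. It assumes a 64-bit ELF file in the machine's native byte order. It uses the Elf64_Ehdr and Elf64_Shdr types from <elf.h>, which are the user-space counterparts of struct elf64_hdr and struct elf64_shdr. read_file and find_section are hypothetical helper names, and validation is kept minimal.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a malloc'ed buffer; returns NULL on failure. */
static unsigned char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size > 0 && fseek(f, 0, SEEK_SET) == 0 &&
            (buf = malloc((size_t)size)) != NULL) {
            if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
                *len = (size_t)size;
            } else {
                free(buf);
                buf = NULL;
            }
        }
    }
    fclose(f);
    return buf;
}

/* Finds a section by name: e_shoff/e_shnum locate the Section Header
 * Table, and e_shstrndx selects the section holding the section names. */
static const Elf64_Shdr *find_section(const unsigned char *img, size_t len,
                                      const char *name)
{
    if (len < sizeof(Elf64_Ehdr) || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return NULL;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (eh->e_shoff == 0 || eh->e_shentsize != sizeof(Elf64_Shdr) ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > len ||
        eh->e_shstrndx >= eh->e_shnum)
        return NULL;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    const Elf64_Shdr *strsec = &sh[eh->e_shstrndx];
    if (strsec->sh_offset + strsec->sh_size > len)
        return NULL;
    const char *names = (const char *)img + strsec->sh_offset;

    for (Elf64_Half i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_name < strsec->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```

For example, calling find_section(img, len, ".text") on the contents of /proc/self/exe should return the header of the running program's code section.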

2.2 Linking

Linking is either static or dynamic. With static linking, the static library and the object files are linked into a single executable file. With dynamic linking, the executable only records which shared libraries it needs, and the dynamic linker loads some or all of their functions when the program runs. Their advantages and disadvantages are as follows:

  • Advantages of static libraries

    (1) Code loads quickly, and execution is slightly faster than with a dynamic library;

    (2) Only the developer's machine needs the correct .LIB files (on Linux, .a files). When a program is released in binary form, there is no need to worry about whether these files exist on the user's machine or which version they are, which avoids problems such as DLL hell.

  • Disadvantages of static libraries

    An executable produced by static linking is large, and every program carries its own copy of the same common code, which wastes space.

  • Advantages of dynamic libraries

    (1) They save memory and reduce paging;

    (2) DLL files are independent of EXE files. As long as the exported interface stays the same (i.e. the names, parameters, return types and calling conventions are unchanged), replacing a DLL has no effect on the EXE, which greatly improves maintainability and extensibility;

    (3) Programs written in different programming languages can call the same DLL function, as long as they follow its calling convention;

    (4) They suit large-scale software development: parts can be developed independently with less coupling, which makes development and testing across different developers and organizations easier.

  • Disadvantages of dynamic libraries

    A program that uses dynamic libraries is not self-contained: the DLL modules it depends on must also be present. With load-time dynamic linking, if a DLL is missing when the program starts, the system terminates the program with an error message. With run-time dynamic linking, the system does not terminate the program, but the functions exported by the missing DLL are unavailable, so loading them fails. Dynamic linking is also somewhat slower than static linking. Finally, if a module is updated and the new version is incompatible with the old one, software that depends on it can no longer run. This was common on early Windows.

Static and dynamic linking are introduced in turn below.

2.2.1 Static linking

A static library is a .a file (an archive) and is created with the following command:

ar cr libXXX.a XXX.o XXXX.o 

When the static library is used, the needed .o files are extracted from the .a file and linked into the program with the following command:

gcc -o XXXX XXX.o -L. -lXXX

-L. tells the linker to search the current directory for library files, and -lXXX is expanded automatically into a file name by adding the prefix lib and the suffix .a, giving libXXX.a. After finding the .a file, the linker takes out the object file that is needed (here XXXX.o) and links it with XXX.o to form the binary executable XXXX. During this step, relocation combines the functions from XXX.o with those extracted from the .a file, works out the actual address of each call, and produces the final executable. This is the second ELF type.

Here the ELF file is divided into code segments, data segments and parts that are not loaded into memory. A Segment Header Table (also called the program header table) is added to record and manage them; it is defined in the code as struct elf32_phdr and struct elf64_phdr. Besides describing each segment, its most important field is p_vaddr, the virtual address at which the segment is loaded into memory. This will be described in detail in the memory management chapter.
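The Program Header Table can be inspected the same way. The sketch below assumes a 64-bit, native-endian ELF file on Linux. It reads the ELF header with pread() and counts the PT_LOAD entries, which are the segments actually mapped into memory. It also records the p_vaddr of the first such entry. count_load_segments is a hypothetical name.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Counts the PT_LOAD entries in the Program Header Table of a 64-bit ELF
 * file and stores the p_vaddr of the first one in *first_vaddr.
 * Returns -1 on error. */
static int count_load_segments(const char *path, Elf64_Addr *first_vaddr)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    Elf64_Ehdr eh;
    int count = -1;
    if (pread(fd, &eh, sizeof eh, 0) == (ssize_t)sizeof eh &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        count = 0;
        for (Elf64_Half i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            /* e_phoff is where the Program Header Table starts. */
            off_t off = (off_t)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
            if (pread(fd, &ph, sizeof ph, off) != (ssize_t)sizeof ph) {
                count = -1;
                break;
            }
            if (ph.p_type == PT_LOAD) {
                if (count == 0 && first_vaddr)
                    *first_vaddr = ph.p_vaddr;  /* load address of segment */
                count++;
            }
        }
    }
    close(fd);
    return count;
}
```

A non-PIE x86-64 executable typically reports a first p_vaddr such as 0x400000. A PIE executable reports 0 because its real base address is chosen at load time.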

2.2.2 Dynamic linking

The main purpose of shared libraries is to avoid the space wasted when many programs statically link the same code: a shared library is designed to be shared by multiple programs. It is built with the following command:

gcc -shared -fPIC -o libXXX.so XXX.o

When a program is linked against a dynamic library, the resulting program file does not contain the library's code, only a reference to it. It does not even store the library's full path, just its name:

gcc -o XXX XXX.o -L. -lXXX

When the program runs, its dynamic libraries are first located and then loaded. By default the system looks for them in /lib and /usr/lib, and if a library cannot be found, the program fails to start with an error. The LD_LIBRARY_PATH environment variable can be set; the dynamic linker then also searches the directories it lists. A dynamic library is the third ELF type, the Shared Object file.
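Once the dynamic linker has found and mapped the libraries, a program can list them at run time. The sketch below uses glibc's dl_iterate_phdr() from <link.h>, which calls back once for each loaded object: the executable first, then its shared objects. library_loaded and struct lib_search are hypothetical names. The check for "libc.so" assumes a dynamically linked glibc program.

```c
#define _GNU_SOURCE
#include <link.h>
#include <string.h>

/* State passed through dl_iterate_phdr() to the callback. */
struct lib_search {
    const char *needle;  /* substring to look for in object paths */
    int objects;         /* number of loaded objects seen */
    int found;           /* set to 1 if some path contains needle */
};

/* Called once per loaded object: the executable itself (empty name),
 * then every shared object the dynamic linker has mapped. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct lib_search *s = data;
    s->objects++;
    if (info->dlpi_name && strstr(info->dlpi_name, s->needle))
        s->found = 1;
    return 0;  /* 0 means: keep iterating */
}

/* Returns 1 if a loaded object whose path contains needle exists. */
static int library_loaded(const char *needle, int *objects)
{
    struct lib_search s = { needle, 0, 0 };
    dl_iterate_phdr(visit_object, &s);
    if (objects)
        *objects = s.objects;
    return s.found;
}
```

Running a program with LD_LIBRARY_PATH pointing at a directory containing libXXX.so and printing dlpi_name in the callback is a quick way to check which copy of a library was actually picked up.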

Compared with a statically linked ELF file, a dynamically linked one has the following additional parts:

  • .interp section: holds the path of the dynamic linker ld-linux.so, which performs the linking at run time
  • .plt (Procedure Linkage Table)
  • .got.plt (Global Offset Table)

When the program is compiled, an entry PLT[n] is created in the PLT for each function that lives in a dynamic library, and the function's actual address is recorded in a GOT entry, GOT[m]. The lookup works as follows:

  1. PLT[n] takes the address from GOT[m]
  2. Initially GOT[m] does not yet hold the real address (it points back into the PLT), so the address is obtained as follows:
    1. Jump back to PLT[0]
    2. PLT[0] calls the resolver whose address is stored in GOT[2], i.e. code in ld-linux.so
    3. ld-linux.so finds the actual address of the function and stores it in GOT[m]

In this way PLT[n] is bound to GOT[m]. From then on, calls through PLT[n] jump directly to the address stored in GOT[m], which is how dynamic linking is achieved.
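The lazy binding steps above can be modeled in plain C. This is only a toy. got[] stands in for GOT entries, call_plt() for a PLT stub, and resolve() for PLT[0] plus ld-linux.so. In the real mechanism an unresolved GOT entry points back into the PLT rather than holding NULL, and the stubs are generated by the linker, not written by hand. All names are hypothetical.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

/* Hypothetical "symbol table" that the resolver searches. */
static fn_t const real_functions[] = { square, negate };
#define NFUNCS (sizeof real_functions / sizeof real_functions[0])

static fn_t got[NFUNCS];        /* toy GOT: NULL means not yet resolved */
static int resolver_calls = 0;  /* how often the slow path ran */

/* Plays the role of PLT[0] + ld-linux.so: find the address, patch GOT[m]. */
static fn_t resolve(size_t m)
{
    resolver_calls++;
    got[m] = real_functions[m];
    return got[m];
}

/* Plays the role of PLT[n]: jump through GOT[m], resolving on first use. */
static int call_plt(size_t m, int arg)
{
    fn_t target = got[m];
    if (target == NULL)
        target = resolve(m);
    return target(arg);
}
```

The first call for each slot goes through resolve(). Later calls jump straight through the patched entry, which is why resolver_calls stops growing.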

2.3 Loading and running

Once the executable has been produced, it can be loaded and run. In the kernel, the following data structure defines how a binary format is loaded:

struct linux_binfmt {
        struct list_head lh;
        struct module *module;
        int (*load_binary)(struct linux_binprm *);
        int (*load_shlib)(struct file *);
        int (*core_dump)(struct coredump_params *cprm);
        unsigned long min_coredump;     /* minimal dump size */
} __randomize_layout;

For the ELF format, the corresponding implementation is:

static struct linux_binfmt elf_format = {
        .module         = THIS_MODULE,
        .load_binary    = load_elf_binary,
        .load_shlib     = load_elf_library,
        .core_dump      = elf_core_dump,
        .min_coredump   = ELF_EXEC_PAGESIZE,
};
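The kernel keeps the registered linux_binfmt structures in a list. When a program is executed, it tries their load_binary hooks in turn until one does not return -ENOEXEC (search_binary_handler() in fs/exec.c). The user-space model below imitates that loop with two hypothetical handlers that only check the file's magic bytes. It loads nothing, and none of its names are kernel APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct linux_binprm: the first bytes of the file. */
struct toy_binprm {
    const unsigned char *buf;
    size_t len;
    const char *loaded_by;  /* set by the handler that accepts the file */
};

/* Toy stand-in for struct linux_binfmt with only a load_binary hook. */
struct toy_binfmt {
    const char *name;
    int (*load_binary)(struct toy_binprm *);
};

static int toy_load_elf(struct toy_binprm *bprm)
{
    if (bprm->len < 4 || memcmp(bprm->buf, "\177ELF", 4) != 0)
        return -ENOEXEC;  /* not mine: let the next format try */
    bprm->loaded_by = "elf";
    return 0;
}

static int toy_load_script(struct toy_binprm *bprm)
{
    if (bprm->len < 2 || memcmp(bprm->buf, "#!", 2) != 0)
        return -ENOEXEC;
    bprm->loaded_by = "script";
    return 0;
}

static const struct toy_binfmt toy_formats[] = {
    { "elf", toy_load_elf },
    { "script", toy_load_script },
};

/* Try each format in turn; stop at the first one that does not return -ENOEXEC. */
static int toy_search_binary_handler(struct toy_binprm *bprm)
{
    for (size_t i = 0; i < sizeof toy_formats / sizeof toy_formats[0]; i++) {
        int ret = toy_formats[i].load_binary(bprm);
        if (ret != -ENOEXEC)
            return ret;
    }
    return -ENOEXEC;
}
```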

Loading is triggered through the exec family of functions. All of them end up in the execve system call, where the kernel tries the registered linux_binfmt handlers; for an ELF file, load_elf_binary does the work. The letters in each function name have the following meanings:

  • Functions containing p (execvp, execlp) search for the program in the directories listed in PATH; functions without p need the full path of the program;
  • Functions containing v (execv, execvp, execve) receive the arguments as an array;
  • Functions containing l (execl, execlp, execle) receive the arguments as a list;
  • Functions containing e (execve, execle) receive the environment variables as an array.

When we run an executable from the shell, or when a child created by fork runs another program, the program is loaded through one of these functions.
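The following sketch combines fork() with execvp() and waits for the child with waitpid(). execvp() is the variant that searches PATH (p) and takes an argument array (v). The sketch assumes a POSIX system where the programs true and false are on PATH. run_program is a hypothetical name, and the exit code 127 for a failed exec simply mirrors the common shell convention.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) with the argument array argv in a
 * child created by fork(). Returns the child's exit status, or -1 if
 * fork() or waitpid() failed or the child did not exit normally. */
static int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* replaces the child's program image */
        _exit(127);             /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```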

3, Thread creation in user mode

The function used to create a thread is pthread_create(). Threads are not implemented entirely by the kernel: creation is completed cooperatively by kernel mode and user mode. pthread_create() is not a system call but a function in Glibc, so we start with Glibc. Note first that once thread creation reaches kernel mode, it uses the same function as process derivation: _do_fork(). This is easy to understand, because to the kernel both threads and processes are task_struct structures. This section describes thread creation in user mode; the kernel-mode part is explained together with process derivation.
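Because a thread is just another task_struct in the kernel, a thread created with pthread_create() has its own task ID. It still shares the process ID returned by getpid(), which is the thread group ID. The sketch below shows this on Linux. It reads the task ID through syscall(SYS_gettid) because the gettid() wrapper only exists in newer glibc versions. The struct and function names are hypothetical. On older glibc, link with -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* What a thread records about itself. */
struct ids {
    pid_t pid;  /* getpid(): the thread group id, shared by all threads */
    pid_t tid;  /* gettid: the id of this thread's own task_struct */
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Fills in the ids of the calling thread and of one newly created thread.
 * Returns 0 on success or an error number from pthread_create/join. */
static int compare_thread_ids(struct ids *self, struct ids *child)
{
    record_ids(self);
    pthread_t t;
    int err = pthread_create(&t, NULL, record_ids, child);
    if (err != 0)
        return err;
    return pthread_join(t, NULL);
}
```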

The function __pthread_create_2_1() is defined in nptl/pthread_create.c in Glibc and mainly performs the following operations:

  1. Process the thread attributes. For example, a program may set the thread stack size; if no attributes are passed in, the default values are used.
const struct pthread_attr *iattr = (struct pthread_attr *) attr;
struct pthread_attr default_attr;
//c11 thrd_create
bool c11 = (attr == ATTR_C11_THREAD);
if (iattr == NULL || c11)
{
  ......
  iattr = &default_attr;
}
  2. Just as every process or thread has a task_struct in the kernel, every thread also has a structure that maintains it in user mode: struct pthread.
struct pthread *pd = NULL;
  3. Every function call uses the stack, and each thread has its own stack, so the next step is to create the thread stack.
int err = ALLOCATE_STACK (iattr, &pd);

ALLOCATE_STACK is a macro; the corresponding function allocate_stack() mainly does the following:

  • If a stack size is set in the thread attributes, take that value;
  • To catch accesses beyond the end of the stack, add a guard area of guardsize bytes at the end of the stack; any access to it raises an error;
  • The thread stack is created in the process's address space with __mmap. If a process keeps creating and destroying threads, it would be wasteful to keep allocating and freeing the memory used for thread stacks, so a cache is used: get_cached_stack checks, based on the computed size, whether a cached stack fits. If none does, __mmap is called to create a new one. As mentioned in the system call article, large blocks of memory are allocated with __mmap;
  • The thread stack also grows from high addresses toward low ones. Each thread has a pthread structure, which is also placed in the stack block, at the "bottom" of the stack, which is actually its highest address;
  • Calculate the position of the guard area and call setup_stack_prot to set the memory protection: the block is first mapped PROT_NONE, then everything except the guard area is made accessible;
  • Fill in the members of the pthread structure, including stackblock, stackblock_size, guardsize and specific. specific stores Thread Specific Data, i.e. global variables that belong to a single thread;
  • Put the thread stack on the stack_used list. Two lists manage thread stacks. stack_used holds the stacks currently in use. stack_cache, as mentioned above, keeps the stack of a finished thread instead of freeing it, so that it can be reused when another thread is created.
# define ALLOCATE_STACK(attr, pd) allocate_stack (attr, pd, &stackaddr)

static int
allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
                ALLOCATE_STACK_PARMS)
{
  struct pthread *pd;
  size_t size;
  size_t pagesize_m1 = __getpagesize () - 1;
......
  /* Get the stack size from the attribute if it is set.  Otherwise we
     use the default we determined at start time.  */
  if (attr->stacksize != 0)
    size = attr->stacksize;
  else
    {
      lll_lock (__default_pthread_attr_lock, LLL_PRIVATE);
      size = __default_pthread_attr.stacksize;
      lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
    }
......
  /* Allocate some anonymous memory.  If possible use the cache.  */
  size_t guardsize;
  void *mem;
  const int prot = (PROT_READ | PROT_WRITE
                   | ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0));
  /* Adjust the stack size for alignment.  */
  size &= ~__static_tls_align_m1;
  /* Make sure the size of the stack is enough for the guard and
  eventually the thread descriptor.  */
  guardsize = (attr->guardsize + pagesize_m1) & ~pagesize_m1;
  size += guardsize;
......    
  /* Try to get a stack from the cache.  */  
  pd = get_cached_stack (&size, &mem);
  if (pd == NULL)
  {
    /* If a guard page is required, avoid committing memory by first
    allocate with PROT_NONE and then reserve with required permission
    excluding the guard page.  */
    mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    /* Place the thread descriptor at the end of the stack.  */
#if TLS_TCB_AT_TP
    pd = (struct pthread *) ((char *) mem + size) - 1;
#elif TLS_DTV_AT_TP
    pd = (struct pthread *) ((((uintptr_t) mem + size - __static_tls_size) & ~__static_tls_align_m1) - TLS_PRE_TCB_SIZE);
#endif
    /* Now mprotect the required region excluding the guard area. */
    char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1);
    setup_stack_prot (mem, size, guard, guardsize, prot);
    pd->stackblock = mem;
    pd->stackblock_size = size;
    pd->guardsize = guardsize;
    pd->specific[0] = pd->specific_1stblock;
    /* And add to the list of stacks in use.  */
    stack_list_add (&pd->list, &stack_used);
  }
  
  *pdp = pd;
  void *stacktop;
# if TLS_TCB_AT_TP
  /* The stack begins before the TCB and the static TLS block.  */
  stacktop = ((char *) (pd + 1) - __static_tls_size);
# elif TLS_DTV_AT_TP
  stacktop = (char *) (pd - 1);
# endif
  *stack = stacktop;
...... 
}
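The core idea of the guard area in allocate_stack() can be reproduced in user space. First map the whole block PROT_NONE. Then mprotect everything above the lowest guard pages to read/write, and hand out the highest address as the stack top. This sketch assumes Linux with MAP_STACK available and leaves out the stack cache, TLS and the pthread descriptor. toy_allocate_stack and struct toy_stack are hypothetical names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct toy_stack {
    void  *mem;        /* start of the mapping; the guard area lives here */
    size_t size;       /* total mapping size, guard included */
    size_t guardsize;  /* bytes left PROT_NONE at the low end */
};

/* Maps size bytes of stack plus a guard area rounded up to whole pages,
 * as allocate_stack() does. Returns the usable stack top (highest
 * address), or NULL on failure. */
static void *toy_allocate_stack(struct toy_stack *st, size_t size, size_t guard)
{
    size_t pagesize_m1 = (size_t)sysconf(_SC_PAGESIZE) - 1;
    guard = (guard + pagesize_m1) & ~pagesize_m1;
    size = ((size + pagesize_m1) & ~pagesize_m1) + guard;

    /* Reserve everything without access first, like the PROT_NONE mmap. */
    void *mem = mmap(NULL, size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    /* The stack grows down, so the guard stays at the low end. */
    if (mprotect((char *)mem + guard, size - guard, PROT_READ | PROT_WRITE) != 0) {
        munmap(mem, size);
        return NULL;
    }
    st->mem = mem;
    st->size = size;
    st->guardsize = guard;
    return (char *)mem + size;
}

static void toy_free_stack(struct toy_stack *st)
{
    munmap(st->mem, st->size);
}
```

Touching st->mem directly, which lies inside the guard, would raise SIGSEGV. That is how a thread that overflows its stack is caught.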

4, Thread creation in kernel mode and process derivation

Multi-process programming is a common way to implement programs, and the corresponding system call is fork(). The whole system call process was described in detail in the previous article. For fork(), the kernel finds the corresponding entry sys_fork in the system call table to create the child process, and sys_fork calls _do_fork().

SYSCALL_DEFINE0(fork)
{
......
  return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);
}
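fork() passes SIGCHLD to _do_fork() as the signal the child sends its parent on exit, which is what lets the parent reap it with a plain waitpid(). The sketch below also shows the effect of copying the address space. The child increments its copy of a global variable and reports the new value through its exit status, while the parent's copy is unchanged. counter and child_sees_after_increment are hypothetical names.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 100;  /* in .data; the child gets a copy-on-write copy */

/* The child increments its copy of counter and returns it as its exit
 * status; the parent's copy is untouched. Returns the child's value,
 * or -1 on error. */
static int child_sees_after_increment(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter += 1;
        _exit(counter & 0xff);  /* only 8 bits of exit status survive */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```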

Let us set _do_fork() aside for now and return to threads, continuing with pthread_create(). Now that the user-mode stack exists, the remaining question is where the thread's user-mode code starts running. start_routine() is the function the thread executes. start_routine, its argument arg and the scheduling policy are all stored in the pthread structure. Then __nptl_nthreads is incremented by one, indicating that there is one more thread.

pd->start_routine = start_routine;
pd->arg = arg;
pd->schedpolicy = self->schedpolicy;
pd->schedparam = self->schedparam;
/* Pass the descriptor to the caller.  */
*newthread = (pthread_t) pd;
atomic_increment (&__nptl_nthreads);
retval = create_thread (pd, iattr, &stopped_start, STACK_VARIABLES_ARGS, &thread_ran);

What actually creates the thread is create_thread(), defined below. It also specifies where the thread starts running once it has been created in the kernel: start_thread().

static int
create_thread (struct pthread *pd, const struct pthread_attr *attr,
bool *stopped_start, STACK_VARIABLES_PARMS, bool *thread_ran)
{
  const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | CLONE_SIGHAND | CLONE_THREAD | CLONE_SETTLS | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | 0);
......
  if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
                                    clone_flags, pd, &pd->tid, tp, &pd->tid) == -1))
    return errno;
  /* It's started now, so if we fail below, we'll have to cancel it
     and let it clean itself up.  */
  *thread_ran = true;
......
  return 0;
}
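The flags are what make clone() produce a thread rather than a process. The minimal sketch below calls glibc's clone() wrapper directly with only CLONE_VM, plus SIGCHLD so that the child can be reaped with waitpid(). It shows what sharing the address space does. This is not how glibc creates threads: there is no CLONE_THREAD and no TLS setup. It assumes Linux and a downward-growing stack, and run_clone is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value = 0;

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;  /* the parent sees this only with CLONE_VM */
    return 0;
}

/* Creates a child task with clone(), waits for it, and returns what the
 * parent sees in shared_value afterwards (-1 on error). */
static int run_clone(int extra_flags, int value)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;
    shared_value = 0;
    /* The stack grows down on common architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + stack_size, extra_flags | SIGCHLD, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    pid_t w = waitpid(pid, NULL, 0);
    free(stack);
    return w == pid ? shared_value : -1;
}
```

Without CLONE_VM the child writes to its own copy of shared_value, so the parent still reads 0. With CLONE_VM both tasks use one mm_struct, and the parent reads the child's value.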

The entry function start_thread() is where the function supplied by the user is actually called. After the user's function returns, the thread's data is released. For example, destructors run for thread_local variables and thread-specific data, and the thread count is decremented by one. If this was the last thread, the process exits directly; otherwise __free_tcb() releases the pthread structure.

#define START_THREAD_DEFN \
  static int __attribute__ ((noreturn)) start_thread (void *arg)

START_THREAD_DEFN
{
    struct pthread *pd = START_THREAD_SELF;
    /* Run the code the user provided.  */
    THREAD_SETMEM (pd, result, pd->start_routine (pd->arg));
    /* Call destructors for the thread_local TLS variables.  */
    /* Run the destructor for the thread-local data.  */
    __nptl_deallocate_tsd ();
    if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
        /* This was the last thread.  */
        exit (0);
    __free_tcb (pd);
    __exit_thread ();
}
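The thread-specific data destructors run by __nptl_deallocate_tsd() in start_thread() can be observed through the public API. pthread_key_create() registers a destructor. Each thread that stored a non-NULL value with pthread_setspecific() has that destructor called when it exits. The sketch below counts those calls. The function and variable names are hypothetical, and threads are joined one at a time so the counter needs no locking.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>

static pthread_key_t key;
static int destructor_runs = 0;  /* written by exiting threads, read after join */

/* Called by glibc when a thread exits with a non-NULL value under key. */
static void on_thread_exit(void *value)
{
    destructor_runs += (int)(intptr_t)value;
}

static void *worker(void *arg)
{
    pthread_setspecific(key, arg);  /* thread-specific data for this thread */
    return NULL;
}

/* Starts n threads that each store 1 under key, joins them, and returns
 * the number of destructor calls observed (-1 on error). */
static int count_tsd_destructors(int n)
{
    if (pthread_key_create(&key, on_thread_exit) != 0)
        return -1;
    destructor_runs = 0;
    for (int i = 0; i < n; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, (void *)(intptr_t)1) != 0)
            return -1;
        pthread_join(t, NULL);  /* destructors have run once join returns */
    }
    pthread_key_delete(key);
    return destructor_runs;
}
```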

__free_tcb() calls __deallocate_stack() to release the entire thread stack. The stack is removed from stack_used, the list of thread stacks in use, and put into stack_cache, the list of cached thread stacks. This ends the thread's life cycle.

void
internal_function
__free_tcb (struct pthread *pd)
{
  ......
  __deallocate_stack (pd);
}

void
internal_function
__deallocate_stack (struct pthread *pd)
{
  /* Remove the thread from the list of threads with user defined
     stacks.  */
  stack_list_del (&pd->list);
  /* Not much to do.  Just free the mmap()ed memory.  Note that we do
     not reset the 'used' flag in the 'tid' field.  This is done by
     the kernel.  If no thread has been created yet this field is
     still zero.  */
  if (__glibc_likely (! pd->user_stack))
    (void) queue_stack (pd);
}

ARCH_CLONE actually calls __clone().

# define ARCH_CLONE __clone

/* The userland implementation is:
   int clone (int (*fn)(void *arg), void *child_stack, int flags, void *arg),
   the kernel entry is:
   int clone (long flags, void *child_stack).

   The parameters are passed in register and on the stack from userland:
   rdi: fn
   rsi: child_stack
   rdx: flags
   rcx: arg
   r8d: TID field in parent
   r9d: thread pointer
%esp+8: TID field in child

   The kernel expects:
   rax: system call number
   rdi: flags
   rsi: child_stack
   rdx: TID field in parent
   r10: TID field in child
   r8:  thread pointer  */
        .text
ENTRY (__clone)
        movq    $-EINVAL,%rax
......
        /* Insert the argument onto the new stack.  */
        subq    $16,%rsi
        movq    %rcx,8(%rsi)

        /* Save the function pointer.  It will be popped off in the
           child in the ebx frobbing below.  */
        movq    %rdi,0(%rsi)

        /* Do the system call.  */
        movq    %rdx, %rdi
        movq    %r8, %rdx
        movq    %r9, %r8
        mov     8(%rsp), %R10_LP
        movl    $SYS_ify(clone),%eax
......
        syscall
......
PSEUDO_END (__clone)

clone() in the kernel is defined as follows. When the main thread of a process makes any other system call, the current user-mode stack is the process stack, the stack pointer points into it, and the instruction pointer points into the main thread's code. When clone is called, these are likewise those of the main thread, just as in any other system call. For a new thread, however, all of this has to change. When clone succeeds, besides the task_struct created for the thread in the kernel, the return to user mode should land on the thread stack. The stack pointer should point into that stack, and the instruction pointer should point to the function the thread will run. __clone therefore arranges this itself: it pushes the thread function and its argument onto the new thread's stack. When the child returns from the kernel and pops them off, it starts running that function with that argument.

SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
     int __user *, parent_tidptr,
     int __user *, child_tidptr,
     unsigned long, tls)
{
    return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
}

Threads and processes thus arrive at the same place and are both handled by _do_fork(). Its source is shown below. The main work consists of two parts: copying the task structure with copy_process() and waking up the new task with wake_up_new_task(). The clone_flags set in the thread's create_thread() determine the subtle differences between threads and processes. The new stack pointer passed in produces the stack switch described above.

long _do_fork(unsigned long clone_flags,
        unsigned long stack_start,
        unsigned long stack_size,
        int __user *parent_tidptr,
        int __user *child_tidptr,
        unsigned long tls)
{
    struct task_struct *p;
    int trace = 0;
    long nr;

......
    p = copy_process(clone_flags, stack_start, stack_size,
       child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
......
	if (IS_ERR(p))
		return PTR_ERR(p);
    struct pid *pid;
    pid = get_task_pid(p, PIDTYPE_PID);
    nr = pid_vnr(pid);

    if (clone_flags & CLONE_PARENT_SETTID)
      put_user(nr, parent_tidptr);
	if (clone_flags & CLONE_VFORK) {
		p->vfork_done = &vfork;
		init_completion(&vfork);
		get_task_struct(p);
	}
    wake_up_new_task(p);
......
    put_pid(pid);
	return nr;
};
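The CLONE_PARENT_SETTID branch in _do_fork() (put_user(nr, parent_tidptr)) can be seen from user space. With that flag, the ID of the new task is written to the parent_tid argument of glibc's clone() wrapper, which is how create_thread() gets pd->tid filled in. The sketch assumes Linux and a downward-growing stack; parent_settid_matches is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int noop(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 1 if the id stored in ptid by the kernel equals the value
 * clone() returned, 0 if not, -1 on error. */
static int parent_settid_matches(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;
    pid_t ptid = 0;
    /* Arguments after arg: parent_tid (used because of CLONE_PARENT_SETTID). */
    pid_t pid = clone(noop, stack + stack_size,
                      CLONE_PARENT_SETTID | SIGCHLD, NULL, &ptid);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return ptid == pid;
}
```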

4.1 Copying the task structure

The following is a reduced version of copy_process(). Because task_struct is complex, copying it is complex too, so much is omitted here and only the main functions called in each part are kept:

static __latent_entropy struct task_struct *copy_process(
          unsigned long clone_flags,
          unsigned long stack_start,
          unsigned long stack_size,
          int __user *child_tidptr,
          struct pid *pid,
          int trace,
          unsigned long tls,
          int node)
{
    int retval;
    struct task_struct *p;
......
    //Assign task_struct structure
    p = dup_task_struct(current, node);  
......
    //Permission processing
    retval = copy_creds(p, clone_flags);
......
    //Set scheduling related variables
    retval = sched_fork(clone_flags, p);    
......
    //Initializing file and file system related variables
    retval = copy_files(clone_flags, p);
    retval = copy_fs(clone_flags, p);  
......
    //Initialize signal dependent variables
    init_sigpending(&p->pending);
    retval = copy_sighand(clone_flags, p);
    retval = copy_signal(clone_flags, p);  
......
    //Copy process memory space
    retval = copy_mm(clone_flags, p);
...... 
    //Initialize affinity variable
    INIT_LIST_HEAD(&p->children);
    INIT_LIST_HEAD(&p->sibling);
......
    //Establish kinship
	//Source code to be explained later  
};
  1. copy_process() first calls dup_task_struct() to allocate a task_struct. dup_task_struct() mainly does the following:
  • Call alloc_task_struct_node to allocate a task_struct;
  • Call alloc_thread_stack_node to create the kernel stack. It calls __vmalloc_node_range to allocate a contiguous region of THREAD_SIZE bytes and assigns it to the void *stack member of task_struct;
  • Call arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) to copy the task_struct, which in practice is a memcpy;
  • Call setup_thread_stack to set up thread_info.
static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
    struct task_struct *tsk;
	unsigned long *stack;
......
   	tsk = alloc_task_struct_node(node);
	if (!tsk)
		return NULL;

	stack = alloc_thread_stack_node(tsk, node);
	if (!stack)
		goto free_tsk; 
    if (memcg_charge_kernel_stack(tsk))
		goto free_stack;

	stack_vm_area = task_stack_vm_area(tsk);

	err = arch_dup_task_struct(tsk, orig);
......    
 	setup_thread_stack(tsk, orig);
......    
};    
  2. Next, copy_creds is called to handle the permission-related content:
  • Call prepare_creds to prepare a new struct cred *new. It allocates a new struct cred and copies the parent's cred into it with memcpy.
  • Then p->cred = p->real_cred = get_cred(new). The credentials used when the new process acts on other objects ("what can I operate on") now point to the new cred. So do those used when others act on it ("who can operate on me").
/*
 * Copy credentials for the new process created by fork()
 *
 * We share if we can, but under some circumstances we have to generate a new
 * set.
 *
 * The new process gets the current process's subjective credentials as its
 * objective and subjective credentials
 */
int copy_creds(struct task_struct *p, unsigned long clone_flags)
{
	struct cred *new;
	int ret;
......
	new = prepare_creds();
	if (!new)
		return -ENOMEM;
......
	atomic_inc(&new->user->processes);
	p->cred = p->real_cred = get_cred(new);
	alter_cred_subscribers(new, 2);
	validate_creds(new);
	return 0;
}
  3. Set scheduling-related variables. The source is not shown here; it will be covered in the article on process scheduling. sched_fork mainly does the following:
  • Call __sched_fork, which sets on_rq to 0 and initializes sched_entity, setting exec_start, sum_exec_runtime, prev_sum_exec_runtime and vruntime to 0. These variables track the actual and virtual run time of the process, and they decide when it is due to be scheduled;
  • Set the process state: p->state = TASK_NEW;
  • Initialize the priorities prio, normal_prio and static_prio;
  • For a normal process, set the scheduling class: p->sched_class = &fair_sched_class;
  • Call the scheduling class's task_fork function; for CFS this is task_fork_fair. That function first calls update_curr to update the statistics of the current process, then sets the child's vruntime to the parent's, and finally calls place_entity to initialize the sched_entity. The variable sysctl_sched_child_runs_first controls whether the parent or the child runs first. If the child is set to run first, the child's sched_entity is placed in front even when both have the same vruntime. resched_curr is then called to mark the current process TIF_NEED_RESCHED. In other words, the parent is flagged for rescheduling so that at the next scheduling point it is preempted by the child.
  4. Initialize file and file-system related variables
  • copy_files copies the information about the files opened by the task.
    • For a process, this information is maintained in a files_struct, and each open file has a file descriptor. copy_files calls dup_fd, which creates a new files_struct and copies the whole file descriptor table fdtable.
    • For a thread, CLONE_FILES is set, so the reference count of the existing files_struct is simply incremented and no files are copied.
static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
{
	struct files_struct *oldf, *newf;
	int error = 0;

	/*
	 * A background process may not have any files ...
	 */
	oldf = current->files;
	if (!oldf)
		goto out;
	if (clone_flags & CLONE_FILES) {
		atomic_inc(&oldf->count);
		goto out;
	}
	newf = dup_fd(oldf, &error);
	if (!newf)
		goto out;

	tsk->files = newf;
	error = 0;
out:
	return error;
}
  • copy_fs copies the directory information of the task.
    • For a process, this information is maintained in an fs_struct, which holds the process's root directory and its file system as well as its current working directory (pwd) and its file system. copy_fs calls copy_fs_struct, which creates a new fs_struct and copies the original process's fs_struct into it.
    • For a thread, CLONE_FS is set, so the user count of the existing fs_struct is simply incremented and the structure is not copied.
static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
{
	struct fs_struct *fs = current->fs;
	if (clone_flags & CLONE_FS) {
		/* tsk->fs is already what we want */
		spin_lock(&fs->lock);
		if (fs->in_exec) {
			spin_unlock(&fs->lock);
			return -EAGAIN;
		}
		fs->users++;
		spin_unlock(&fs->lock);
		return 0;
	}
	tsk->fs = copy_fs_struct(fs);
	if (!tsk->fs)
		return -ENOMEM;
	return 0;
}
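The difference made by CLONE_FILES and CLONE_FS is visible from user space. fork() sets neither flag. In the sketch below, the child's close() and chdir() therefore act on its own copies of the file table and fs_struct and leave the parent untouched. It assumes a POSIX system with /dev/null; parent_unaffected_by_child is a hypothetical name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that closes fd and changes directory to "/". Returns 1
 * if the parent's fd is still open and its cwd unchanged afterwards,
 * 0 if not, -1 on error. */
static int parent_unaffected_by_child(int fd)
{
    char before[PATH_MAX], after[PATH_MAX];
    if (!getcwd(before, sizeof before))
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fd);                        /* acts on the child's fd table */
        _exit(chdir("/") == 0 ? 0 : 1);   /* acts on the child's fs_struct */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    if (!getcwd(after, sizeof after))
        return -1;
    int fd_still_open = fcntl(fd, F_GETFD) != -1;
    return fd_still_open && strcmp(before, after) == 0;
}
```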
  5. Initialize signal-related variables
  • All threads of a process share one shared_pending, a list of signals sent to the process as a whole, and it does not matter which thread handles them. This guarantees that a signal sent to the process is handled by some thread, while its effect applies to the whole process: for example, killing a process kills all of its threads. A signal sent to one specific thread with pthread_kill, by contrast, is received only by that thread.
  • copy_sighand
    • For a process, a new sighand_struct is allocated. Its most important job is to hold the signal handlers: copy_sighand calls memcpy to copy the handlers sighand->action from the parent process to the child.
    • For a thread, CLONE_SIGHAND is set, so the reference count is incremented and the function returns without allocating a new structure.
static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
{
	struct sighand_struct *sig;
	if (clone_flags & CLONE_SIGHAND) {
		refcount_inc(&current->sighand->count);
		return 0;
	}
	sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
	rcu_assign_pointer(tsk->sighand, sig);
	if (!sig)
		return -ENOMEM;
	refcount_set(&sig->count, 1);
	spin_lock_irq(&current->sighand->siglock);
	memcpy(sig->action, current->sighand->action, sizeof(sig->action));
	spin_unlock_irq(&current->sighand->siglock);
	return 0;
}
  • init_sigpending and copy_signal initialize the pending-signal lists and set up the data structure that tracks signals sent to the whole process. copy_signal allocates a new signal_struct and initializes it, including its shared_pending list. For a thread (CLONE_THREAD set), copy_signal returns immediately without allocating anything, so the thread keeps using its process's signal_struct.
static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
{
	struct signal_struct *sig;
	if (clone_flags & CLONE_THREAD)
		return 0;
	sig = kmem_cache_zalloc(signal_cachep, GFP_KERNEL);
......
     /* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
	sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
	tsk->thread_node = (struct list_head)LIST_HEAD_INIT(sig->thread_head);
	init_waitqueue_head(&sig->wait_chldexit);
	sig->curr_target = tsk;
	init_sigpending(&sig->shared_pending);
	INIT_HLIST_HEAD(&sig->multiprocess);
	seqlock_init(&sig->stats_lock);
	prev_cputime_init(&sig->prev_cputime);
......
};
  1. Copy process memory space

    • Every process has its own memory space, described by an mm_struct. copy_mm() calls dup_mm(), which allocates a new mm_struct and copies the parent's mm_struct into it with memcpy. dup_mmap() then copies the memory-mapped regions of the address space. When discussing system calls earlier, we said that mmap can allocate a large block of memory; mmap can also map a file into memory, so that the file can be read and written just like memory. This will be covered in the memory management section.
    • Threads do not copy the memory space: because CLONE_VM is set, copy_mm() takes a reference on the parent's mm_struct with mmget() and points the new task directly at it.
    static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
    {
    	struct mm_struct *mm, *oldmm;
    	int retval;
    ......
    	/*
    	 * Are we cloning a kernel thread?
    	 * We need to steal a active VM for that..
    	 */
    	oldmm = current->mm;
    	if (!oldmm)
    		return 0;
    	/* initialize the new vmacache entries */
    	vmacache_flush(tsk);
    	if (clone_flags & CLONE_VM) {
    		mmget(oldmm);
    		mm = oldmm;
    		goto good_mm;
    	}
    	retval = -ENOMEM;
    	mm = dup_mm(tsk);
    	if (!mm)
    		goto fail_nomem;
    good_mm:
    	tsk->mm = mm;
    	tsk->active_mm = mm;
    	return 0;
    fail_nomem:
    	return retval;
    }
    
  2. Assign the pid, set tgid and group_leader, and establish the relationships between tasks.

  • group_leader: for a process, the group_leader is the new task itself, which makes it independent of the old process. For a thread, it is set to the group_leader of the current process.
  • tgid: for a process, its own pid; for a thread, the tgid of the current process.
  • real_parent: for a process, the current process; for a thread (and also when CLONE_PARENT is set), the real_parent of the current process.
static __latent_entropy struct task_struct *copy_process(......) {
......    
    p->pid = pid_nr(pid);
    if (clone_flags & CLONE_THREAD) {
        p->exit_signal = -1;
        p->group_leader = current->group_leader;
        p->tgid = current->tgid;
    } else {
        if (clone_flags & CLONE_PARENT)
          p->exit_signal = current->group_leader->exit_signal;
        else
          p->exit_signal = (clone_flags & CSIGNAL);
        p->group_leader = p;
        p->tgid = p->pid;
    }
......
    if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
        p->real_parent = current->real_parent;
        p->parent_exec_id = current->parent_exec_id;
    } else {
        p->real_parent = current;
        p->parent_exec_id = current->self_exec_id;
    } 
......  
};

4.2 Waking up the new process

The second thing _do_fork does is call wake_up_new_task() to wake up the new process.

void wake_up_new_task(struct task_struct *p)
{
    struct rq_flags rf;
    struct rq *rq;
......
    p->state = TASK_RUNNING;
......
    activate_task(rq, p, ENQUEUE_NOCLOCK);
    trace_sched_wakeup_new(p);
    check_preempt_curr(rq, p, WF_FORK);
......
}

First, the state of the new process is set to TASK_RUNNING. Then activate_task() calls enqueue_task().

void activate_task(struct rq *rq, struct task_struct *p, int flags)
{
	if (task_contributes_to_load(p))
		rq->nr_uninterruptible--;

	enqueue_task(rq, p, flags);

	p->on_rq = TASK_ON_RQ_QUEUED;
}

static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
{
.....
    p->sched_class->enqueue_task(rq, p, flags);
}

If the task belongs to the CFS scheduling class, this calls enqueue_task_fair(). enqueue_task_fair() takes the cfs_rq of the scheduling entity and calls enqueue_entity(). enqueue_entity() calls update_curr() to update the runtime statistics, then calls __enqueue_entity() to insert the sched_entity into the red-black tree, and sets se->on_rq = 1 to mark it as queued. Back in enqueue_task_fair(), the count of runnable tasks on the queue (h_nr_running) is incremented. Finally, wake_up_new_task() calls check_preempt_curr() to check whether the currently running process should be preempted.

static void
enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
    struct cfs_rq *cfs_rq;
    struct sched_entity *se = &p->se;
......
	for_each_sched_entity(se) {
		if (se->on_rq)
			break;
		cfs_rq = cfs_rq_of(se);
		enqueue_entity(cfs_rq, se, flags);

		cfs_rq->h_nr_running++;
		cfs_rq->idle_h_nr_running += idle_h_nr_running;

		/* end evaluation on encountering a throttled cfs_rq */
		if (cfs_rq_throttled(cfs_rq))
			goto enqueue_throttle;

		flags = ENQUEUE_WAKEUP;
	}
......
}

check_preempt_curr() calls the hook of the corresponding scheduling class, rq->curr->sched_class->check_preempt_curr(rq, p, flags). For the CFS scheduling class this is check_preempt_wakeup(). In check_preempt_wakeup(), if task_fork_fair() was called earlier with sysctl_sched_child_runs_first set, the current parent process has already been marked TIF_NEED_RESCHED, and the function returns immediately. Otherwise, check_preempt_wakeup() calls update_curr() to update the statistics once more, and then wakeup_preempt_entity() compares the parent and the child to decide whether the child should preempt. If so, resched_curr() marks the parent process with TIF_NEED_RESCHED. If the newly created process should preempt the parent, when does that happen? Remember that fork is a system call, and returning from a system call is a good opportunity for preemption. On that return path, if the parent finds that it has been marked TIF_NEED_RESCHED, it lets the child run first and preempts itself.

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
    struct task_struct *curr = rq->curr;
    struct sched_entity *se = &curr->se, *pse = &p->se;
    struct cfs_rq *cfs_rq = task_cfs_rq(curr);
......
    if (test_tsk_need_resched(curr))
        return;
......
    find_matching_se(&se, &pse);
    update_curr(cfs_rq_of(se));
    if (wakeup_preempt_entity(se, pse) == 1) {
        goto preempt;
    }
    return;
preempt:
    resched_curr(rq);
......
}


5, Summary

This article introduced the creation of processes and threads. Because the kernel derives new processes with the same function it uses to create threads, the two were explained side by side by contrasting their differences. With this, both the structures of processes and threads and their creation have been analyzed. The next article will continue with process and thread scheduling.




Posted on Fri, 12 Jun 2020 23:34:39 -0400 by prashanth0626