1, Processes and threads
Process:
- A process is what you get once a program has been loaded into memory and started
- It is one running instance of a program operating on a set of data
Thread:
- An execution unit in a process
- A schedulable entity in an operating system
- A relatively independent flow of control within a process
Relationship between the two:
- The process is the basic unit of system resource allocation; the thread is the basic unit of CPU scheduling
- A process can contain multiple threads, which share the resources of the process
- A thread cannot exist apart from a process; it can only run inside one
- Any thread can create a new thread
- A thread has a life cycle: it is created, runs, and terminates
A single thread is often not enough to complete a complex task, so should we prefer multithreading or multiprocessing? Processes consume far more resources than threads. Each time the system starts a process, it must allocate a full set of resources for it (its own address space among them), which takes a lot of memory. Starting a thread is much cheaper: the system only needs a thread object and the corresponding stack. For this reason, programs usually start multiple threads to get work done more efficiently, and the number of threads tends to grow with the complexity of the program.

Threads share the resources of their process. When several threads access those resources, the problems of synchronization and mutual exclusion arise. Deadlock is one of the problems that mutual exclusion can cause.
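To make the point about shared resources concrete, here is a minimal sketch of mutual exclusion in C++. The function name `count_with_mutex` and its parameters are hypothetical and chosen only for illustration: several threads increment one shared counter, and a `std::mutex` ensures that only one of them touches it at a time.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: thread_count threads each add 1 to a shared counter
// increments_per_thread times. Returns the final counter value.
long count_with_mutex(int thread_count, int increments_per_thread)
{
    assert(thread_count > 0 && increments_per_thread >= 0);

    long counter = 0;          // resource shared by all threads
    std::mutex counter_mutex;  // guards counter

    auto worker = [&]() {
        for (int i = 0; i < increments_per_thread; ++i) {
            // Only one thread at a time gets past this line.
            std::lock_guard<std::mutex> guard(counter_mutex);
            ++counter;
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back(worker);
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

With the lock in place, `count_with_mutex(4, 10000)` returns 40000. Without it, the unsynchronized increments would be a data race and the result would be unpredictable.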
2, Deadlock
Deadlock concept
- Two or more threads cannot continue executing because each is waiting for a critical resource held by another.
Conditions for deadlock
- There are multiple critical resources in the system, and the resources cannot be preempted
- A thread needs several resources to continue, and keeps the ones it already holds while waiting for the rest
Critical resource
- A resource that can be used by only one thread at a time
Cannot be preempted
- While a thread holds the resource, other threads cannot force it to give the resource up; they must wait until the holder releases it voluntarily.
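The two definitions above can be observed with a `std::mutex`. The sketch below is illustrative only, and the helper names `other_thread_can_take` and `demonstrate_no_preemption` are made up for this example. While one thread holds the lock, a second thread's `try_lock()` fails and has no way to take the lock away; it can only succeed after the holder has unlocked.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: returns true if a second thread manages to
// acquire m with try_lock().
bool other_thread_can_take(std::mutex& m)
{
    bool acquired = false;
    std::thread prober([&]() {
        if (m.try_lock()) {
            acquired = true;
            m.unlock();
        }
    });
    prober.join();  // join() makes the write to acquired visible here
    return acquired;
}

// Returns true if the resource became available once the holder released it.
bool demonstrate_no_preemption()
{
    std::mutex resource_lock;  // stands for a critical resource

    resource_lock.lock();
    // The resource is held, so nobody else can get it or force a release.
    assert(!other_thread_can_take(resource_lock));

    resource_lock.unlock();    // voluntary release by the holder
    // Note: try_lock() is permitted to fail spuriously, so a false result
    // is possible in principle even though the lock is free.
    return other_thread_can_take(resource_lock);
}
```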
3, Deadlock detection
3.1. Create deadlock program
Following the conditions above, the program below creates two threads and two mutexes that guard the critical resources. Each thread locks both mutexes, does its work, and unlocks them, but the two threads acquire the mutexes in opposite order:
#include <iostream>
#include <mutex>
#include <thread>

std::mutex g_mutex1;
std::mutex g_mutex2;

void thread1()
{
    while(1)
    {
        g_mutex1.lock();   // thread1 takes g_mutex1 first ...
        g_mutex2.lock();   // ... then g_mutex2

        std::cout << "thread1 do work ..." << std::endl;

        g_mutex2.unlock();
        g_mutex1.unlock();
    }
}

void thread2()
{
    while(1)
    {
        g_mutex2.lock();   // thread2 takes g_mutex2 first ...
        g_mutex1.lock();   // ... then g_mutex1 (opposite order)

        std::cout << "thread2 do work ..." << std::endl;

        g_mutex1.unlock();
        g_mutex2.unlock();
    }
}

int main()
{
    std::thread t1(thread1);
    std::thread t2(thread2);

    t1.join();
    t2.join();

    return 0;
}
Compile
g++ -g main.cpp -lpthread
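When the program runs, it prints for a while and then stops. In the run described here only thread1 produced output, because t1 is created first and thread1 starts running first. Once thread2 is running, the two threads can end up with thread1 holding g_mutex1 and waiting for g_mutex2 while thread2 holds g_mutex2 and waits for g_mutex1. Neither releases what it holds, so both wait forever, which is a deadlock.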
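The circular wait can also be reproduced deterministically, without hanging the program. The following is a variant of the program above, not the original: it swaps in `std::timed_mutex` and `try_lock_for()` so that each thread gives up after a timeout, and it uses atomic counters to force both threads to hold their first lock before either asks for the second. The function name `count_blocked_threads` and the timeout parameter are assumptions made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: returns how many of the two threads failed to get
// their second lock within the timeout.
int count_blocked_threads(std::chrono::milliseconds timeout)
{
    assert(timeout.count() > 0);

    std::timed_mutex m1, m2;
    std::atomic<int> holding_first{0};  // threads that hold their first lock
    std::atomic<int> finished{0};       // threads done with their attempt
    std::atomic<int> blocked{0};        // threads whose second lock timed out

    auto worker = [&](std::timed_mutex& first, std::timed_mutex& second) {
        first.lock();
        ++holding_first;
        // Wait until BOTH threads hold their first lock.
        while (holding_first.load() < 2) {
            std::this_thread::yield();
        }
        // The other thread holds second, so this wait cannot succeed.
        if (second.try_lock_for(timeout)) {
            second.unlock();
        } else {
            ++blocked;
        }
        // Keep first locked until the other thread has finished its attempt,
        // mirroring "hold what you have while waiting".
        ++finished;
        while (finished.load() < 2) {
            std::this_thread::yield();
        }
        first.unlock();
    };

    std::thread t1(worker, std::ref(m1), std::ref(m2));  // order: m1, m2
    std::thread t2(worker, std::ref(m2), std::ref(m1));  // order: m2, m1
    t1.join();
    t2.join();
    return blocked.load();
}
```

Both threads time out, so the function returns 2. With plain `std::mutex::lock()` instead of the timed attempt, this is exactly the point at which the original program hangs.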
3.2. Analyze deadlock using GDB
The moment at which the program deadlocks is unpredictable, and if you restart the program the problem may be hard to reproduce. Instead, attach gdb to the process while it is hung.
Find the process ID
ps -aux | grep a.out
Start gdb
sudo gdb
Attach to the target process; it is suspended as soon as gdb attaches
(gdb) attach <process ID>
View the current state of all threads
(gdb) info threads
  Id   Target Id                                Frame
* 1    Thread 0x7f34d8997740 (LWP 8915) "a.out" __pthread_clockjoin_ex (threadid=139864948958976, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
  2    Thread 0x7f34d8996700 (LWP 8916) "a.out" __lll_lock_wait (futex=futex@entry=0x55dc8faa11a0 <g_mutex2>, private=0) at lowlevellock.c:52
  3    Thread 0x7f34d8195700 (LWP 8917) "a.out" __lll_lock_wait (futex=futex@entry=0x55dc8faa1160 <g_mutex1>, private=0) at lowlevellock.c:52
View the stack backtrace of every thread
(gdb) thread apply all bt

Thread 3 (Thread 0x7f27ca9ef700 (LWP 10122)):
#0  __lll_lock_wait (futex=futex@entry=0x560a953c6160 <g_mutex1>, private=0) at lowlevellock.c:52
#1  0x00007f27cb7400a3 in __GI___pthread_mutex_lock (mutex=0x560a953c6160 <g_mutex1>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000560a953c3541 in __gthread_mutex_lock (__mutex=0x560a953c6160 <g_mutex1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x0000560a953c3596 in std::mutex::lock (this=0x560a953c6160 <g_mutex1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x0000560a953c33aa in thread2 () at main.cpp:27
#5  0x0000560a953c3ebe in std::__invoke_impl<void, void (*)()> (__f=@0x560a962d4008: 0x560a953c338a <thread2()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x0000560a953c3e56 in std::__invoke<void (*)()> (__fn=@0x560a962d4008: 0x560a953c338a <thread2()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x0000560a953c3de8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x560a962d4008) at /usr/include/c++/9/thread:244
#8  0x0000560a953c3da5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x560a962d4008) at /usr/include/c++/9/thread:251
#9  0x0000560a953c3d76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x560a962d4000) at /usr/include/c++/9/thread:195
#10 0x00007f27cb628de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f27cb73d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f27cb467293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f27cb1f0700 (LWP 10121)):
#0  __lll_lock_wait (futex=futex@entry=0x560a953c61a0 <g_mutex2>, private=0) at lowlevellock.c:52
#1  0x00007f27cb7400a3 in __GI___pthread_mutex_lock (mutex=0x560a953c61a0 <g_mutex2>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000560a953c3541 in __gthread_mutex_lock (__mutex=0x560a953c61a0 <g_mutex2>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x0000560a953c3596 in std::mutex::lock (this=0x560a953c61a0 <g_mutex2>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x0000560a953c3348 in thread1 () at main.cpp:13
#5  0x0000560a953c3ebe in std::__invoke_impl<void, void (*)()> (__f=@0x560a962d3eb8: 0x560a953c3328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x0000560a953c3e56 in std::__invoke<void (*)()> (__fn=@0x560a962d3eb8: 0x560a953c3328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x0000560a953c3de8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x560a962d3eb8) at /usr/include/c++/9/thread:244
#8  0x0000560a953c3da5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x560a962d3eb8) at /usr/include/c++/9/thread:251
#9  0x0000560a953c3d76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x560a962d3eb0) at /usr/include/c++/9/thread:195
#10 0x00007f27cb628de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f27cb73d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f27cb467293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f27cb1f1740 (LWP 10120)):
#0  __pthread_clockjoin_ex (threadid=139808888260352, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f27cb629047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x0000560a953c343a in main () at main.cpp:41
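Thread 2 (thread1) is blocked at main.cpp:13 waiting for g_mutex2, and thread 3 (thread2) is blocked at main.cpp:27 waiting for g_mutex1, while the main thread waits in join(). Each worker is waiting for the mutex the other one holds, which confirms that the program is deadlocked.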
3.3. Analyze deadlock using core dump file
When attaching a debugger directly is not possible, for example after the program has been deployed at a customer site, you can ask the customer to generate a core dump file and analyze that instead.
Find the process ID
ps -aux | grep a.out
Start gdb
sudo gdb
Attach to the target process; it is suspended as soon as gdb attaches
(gdb) attach <process ID>
Generate the core dump file
(gdb) gcore <output file name>
Once the customer has sent the file, load it together with the executable to debug
gdb <executable name> <core dump file name>
Result
(gdb) thread apply all bt

Thread 3 (Thread 0x7f12864d0700 (LWP 10871)):
#0  __lll_lock_wait (futex=futex@entry=0x560150530160 <g_mutex1>, private=0) at lowlevellock.c:52
#1  0x00007f12872210a3 in __GI___pthread_mutex_lock (mutex=0x560150530160 <g_mutex1>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000056015052d541 in __gthread_mutex_lock (__mutex=0x560150530160 <g_mutex1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x000056015052d596 in std::mutex::lock (this=0x560150530160 <g_mutex1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000056015052d3aa in thread2 () at main.cpp:27
#5  0x000056015052debe in std::__invoke_impl<void, void (*)()> (__f=@0x56015130c008: 0x56015052d38a <thread2()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x000056015052de56 in std::__invoke<void (*)()> (__fn=@0x56015130c008: 0x56015052d38a <thread2()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x000056015052dde8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56015130c008) at /usr/include/c++/9/thread:244
#8  0x000056015052dda5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56015130c008) at /usr/include/c++/9/thread:251
#9  0x000056015052dd76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56015130c000) at /usr/include/c++/9/thread:195
#10 0x00007f1287109de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f128721e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1286f48293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f1286cd1700 (LWP 10870)):
#0  __lll_lock_wait (futex=futex@entry=0x5601505301a0 <g_mutex2>, private=0) at lowlevellock.c:52
#1  0x00007f12872210a3 in __GI___pthread_mutex_lock (mutex=0x5601505301a0 <g_mutex2>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000056015052d541 in __gthread_mutex_lock (__mutex=0x5601505301a0 <g_mutex2>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x000056015052d596 in std::mutex::lock (this=0x5601505301a0 <g_mutex2>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000056015052d348 in thread1 () at main.cpp:13
#5  0x000056015052debe in std::__invoke_impl<void, void (*)()> (__f=@0x56015130beb8: 0x56015052d328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x000056015052de56 in std::__invoke<void (*)()> (__fn=@0x56015130beb8: 0x56015052d328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x000056015052dde8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56015130beb8) at /usr/include/c++/9/thread:244
#8  0x000056015052dda5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56015130beb8) at /usr/include/c++/9/thread:251
#9  0x000056015052dd76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56015130beb0) at /usr/include/c++/9/thread:195
#10 0x00007f1287109de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f128721e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1286f48293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f1286cd2740 (LWP 10869)):
#0  __pthread_clockjoin_ex (threadid=139717547726592, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f128710a047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000056015052d43a in main () at main.cpp:41
4, Deadlock avoidance
Deadlock can only occur when all of the conditions described earlier hold, so breaking one or more of them is enough to avoid it.
Method 1
- Assign a unique sequence number (r1, ..., rn) to every critical resource
- Assign the lock of each resource a matching unique sequence number (m1, ..., mn)
- Every thread in the system acquires locks strictly in increasing sequence-number order, so a cycle of waiting threads cannot form
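The ordering rule can be sketched as follows. This is an illustrative example rather than a fixed recipe: the array `g_locks`, the helper `do_work_with`, and the driver `run_ordered_demo` are hypothetical names, and the array index plays the role of the sequence number. The two threads ask for the resources in opposite order, as in the deadlock program, but the helper always locks the lower index first.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

constexpr std::size_t kResourceCount = 2;
// g_locks[i] guards resource i; the index i is the lock's sequence number.
std::array<std::mutex, kResourceCount> g_locks;

// Locks resources a and b in increasing index order, whatever order the
// caller names them in, then does one unit of work.
void do_work_with(std::size_t a, std::size_t b, long& shared_total)
{
    assert(a != b && a < kResourceCount && b < kResourceCount);
    const std::size_t low = std::min(a, b);
    const std::size_t high = std::max(a, b);

    std::lock_guard<std::mutex> first(g_locks[low]);
    std::lock_guard<std::mutex> second(g_locks[high]);
    ++shared_total;  // protected: both locks are held here
}

// Two threads request the same resources in opposite order.
long run_ordered_demo(int iterations)
{
    long total = 0;
    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i) do_work_with(0, 1, total);
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i) do_work_with(1, 0, total);
    });
    t1.join();
    t2.join();
    return total;
}
```

`run_ordered_demo(10000)` completes and returns 20000. When the set of locks is known at one call site, `std::lock` (or `std::scoped_lock` in C++17) can also acquire several mutexes together without deadlocking, which saves maintaining the numbering by hand.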
Method 2
- Use a single lock for everything. A thread then never waits for a second lock while holding the first, so deadlock cannot occur, but system efficiency drops (it is equivalent to having only one critical resource)
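A minimal sketch of the single-lock approach, with hypothetical names (`Resources`, `run_single_lock_demo`): two resources that could each have had their own mutex are guarded by one coarse lock instead.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two resources that are always accessed under the same lock.
struct Resources {
    long a = 0;
    long b = 0;
};

long run_single_lock_demo(int iterations)
{
    assert(iterations >= 0);

    Resources res;
    std::mutex big_lock;  // the only lock in the program

    auto worker = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> guard(big_lock);
            ++res.a;
            ++res.b;
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return res.a + res.b;
}
```

`run_single_lock_demo(10000)` returns 40000. The cost is that a thread that only needs `a` still blocks a thread that only needs `b`, which is the loss of efficiency mentioned above.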
Method 3
- If acquiring a resource fails, release the resources already held and retry later. Any work already done with those resources may have to be rolled back, so the cost can outweigh the benefit.
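One way to sketch this back-off strategy is with `std::mutex::try_lock()`. The helper names `lock_both_with_backoff` and `run_backoff_demo` are assumptions made for this example. A thread blocks on its first mutex, only tries the second, and releases the first again if the attempt fails.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Returns with both mutexes locked. Never waits for second while
// holding first.
void lock_both_with_backoff(std::mutex& first, std::mutex& second)
{
    assert(&first != &second);
    for (;;) {
        first.lock();
        if (second.try_lock()) {
            return;                 // success: both locks are held
        }
        first.unlock();             // give up what we hold ...
        std::this_thread::yield();  // ... and let the other thread run
    }
}

// Two threads take the locks in opposite order, as in the deadlock program.
long run_backoff_demo(int iterations)
{
    std::mutex m1, m2;
    long total = 0;

    auto worker = [&](std::mutex& first, std::mutex& second) {
        for (int i = 0; i < iterations; ++i) {
            lock_both_with_backoff(first, second);
            ++total;  // protected: both locks are held here
            second.unlock();
            first.unlock();
        }
    };

    std::thread t1(worker, std::ref(m1), std::ref(m2));
    std::thread t2(worker, std::ref(m2), std::ref(m1));
    t1.join();
    t2.join();
    return total;
}
```

In this sketch nothing has been done with the first resource before backing off, so there is no state to undo. In real code the retry loop can waste CPU time, and in principle two threads can keep backing off from each other for a while, which is part of why this method may not pay off.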