[GDB debugging-3] multi thread deadlock

1, Processes and threads

Process:

  • What a program becomes once it has been loaded into memory and started
  • The running activity of a program over its data set

Thread:

  • An execution unit in a process
  • A schedulable entity in an operating system
  • A relatively independent sequence of control flows in a process

Connection between the two:

  • The basic unit of system resource allocation is the process, while the basic unit of CPU scheduling is the thread
  • A process can contain multiple threads, all of which share the resources of that process
  • Threads cannot exist apart from a process; they can only run inside one
  • Any thread can create another new thread
  • Threads have a life cycle: they are created and they terminate

When a single thread can no longer handle a complex task, should we prefer multithreading or multiprocessing? Processes use far more resources than threads. Each time the system starts a process, it has to allocate a new set of resources for it, which takes up a lot of memory. The overhead of a thread is much smaller: only a thread object and its stack. For this reason programs often start multiple threads to work more efficiently, and the more complex the functionality, the more threads tend to be used. Threads share the resources of their process, so accessing those resources raises the problems of synchronization and mutual exclusion. Deadlock is one of the problems that mutual exclusion can cause.
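
As a minimal sketch of mutual exclusion on a shared resource, the block below lets two threads increment one counter under a std::mutex. The function name count_with_mutex and its parameter are made up for this illustration and do not appear elsewhere in the article.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: two threads each add `iterations` to one shared counter.
long count_with_mutex(long iterations)
{
    long counter = 0;          // the shared (critical) resource
    std::mutex counter_mutex;  // only the thread holding this may touch counter

    auto worker = [&]() {
        for (long i = 0; i < iterations; ++i)
        {
            // lock_guard locks here and unlocks when it goes out of scope
            std::lock_guard<std::mutex> guard(counter_mutex);
            ++counter;
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return counter;
}
```

Called from main as count_with_mutex(100000), the function returns 200000 because every increment happens while the mutex is held. Without the lock_guard the two threads would race on counter and the total could come out lower.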

2, Deadlock

Deadlock concept

  • A state in which threads cannot continue to execute because each one is waiting for a critical resource held by another.

Conditions for deadlock

  • There are multiple critical resources in the system and the resources cannot be preempted
  • A thread requires multiple resources to continue execution

Critical resources

  • A resource that can only be used by one thread at a time

Can't be preempted

  • When a thread holds the resource, if the thread does not actively release the resource, other threads cannot force it to release and preempt the resource.

3, Deadlock detection

3.1. Create deadlock program

Following the conditions above, the program below creates two threads that use two mutexes to guard critical resources. Each thread acquires the two locks in the opposite order from the other thread and then releases them:

#include <iostream>
#include <mutex>
#include <thread>

std::mutex g_mutex1;
std::mutex g_mutex2;

void thread1()
{
    while(1)
    {
        g_mutex1.lock(); // thread1 takes mutex1 first ...
        g_mutex2.lock(); // ... then mutex2 (may block forever if thread2 holds it)

        std::cout << "thread1 do work ..." << std::endl;

        g_mutex2.unlock();
        g_mutex1.unlock();
    }
}

void thread2()
{
    while(1)
    {
        g_mutex2.lock(); // thread2 takes mutex2 first ...
        g_mutex1.lock(); // ... then mutex1 (may block forever if thread1 holds it)

        std::cout << "thread2 do work ..." << std::endl;

        g_mutex1.unlock();
        g_mutex2.unlock();
    }
}

int main()
{
    std::thread t1(thread1);
    std::thread t2(thread2);

    t1.join();
    t2.join();
    return 0;
}

Compile

g++ -g main.cpp -lpthread

The output shows only thread1 doing work, and it stops after a while. The t1 object is created first, so thread1 starts running first. Once thread2 is running as well, a moment comes when thread1 holds g_mutex1 and waits for g_mutex2 while thread2 holds g_mutex2 and waits for g_mutex1. Each thread waits for the other to release its mutex, and the result is a deadlock.
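
The interleaving that causes the hang can be forced rather than left to chance. The sketch below is an illustration only and is not part of the program above: it swaps std::mutex for std::timed_mutex so that the second acquisition times out instead of blocking forever, and it uses two atomic counters to make sure both threads hold their first mutex before either tries the second. All names in it are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper: returns how many threads managed to take their second lock.
int count_second_lock_successes()
{
    std::timed_mutex m1;
    std::timed_mutex m2;
    std::atomic<int> holding_first{0}; // threads that hold their first mutex
    std::atomic<int> finished_try{0};  // threads done trying the second mutex
    std::atomic<int> successes{0};

    auto worker = [&](std::timed_mutex& first, std::timed_mutex& second) {
        first.lock();
        ++holding_first;
        while (holding_first.load() < 2) // wait until the other thread holds its first mutex
            std::this_thread::yield();

        // A plain lock() would block forever here; the timeout lets the sketch return.
        if (second.try_lock_for(std::chrono::milliseconds(50)))
        {
            ++successes;
            second.unlock();
        }

        ++finished_try;
        while (finished_try.load() < 2)  // keep holding first until both attempts are over
            std::this_thread::yield();
        first.unlock();
    };

    std::thread t1([&] { worker(m1, m2); }); // same order as thread1: mutex1 then mutex2
    std::thread t2([&] { worker(m2, m1); }); // same order as thread2: mutex2 then mutex1
    t1.join();
    t2.join();
    return successes.load();
}
```

The function returns 0: neither thread can obtain its second mutex while the other still holds it. With std::mutex::lock() in place of try_lock_for, both threads would wait forever, which is the state the original program ends up in.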

3.2. Analyze deadlock using GDB

The moment at which the program deadlocks is not deterministic. If the program is restarted, the problem may be hard to reproduce, so it is better to attach gdb to the process that is already hung.

View process number

ps -aux | grep a.out

Start gdb

sudo gdb

Attach to the target process; the process is suspended once gdb has attached

attach <process number>

View the current status of all threads

(gdb) info threads
  Id   Target Id                                Frame 
* 1    Thread 0x7f34d8997740 (LWP 8915) "a.out" __pthread_clockjoin_ex (threadid=139864948958976, thread_return=0x0, 
    clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
  2    Thread 0x7f34d8996700 (LWP 8916) "a.out" __lll_lock_wait (futex=futex@entry=0x55dc8faa11a0 <g_mutex2>, private=0)
    at lowlevellock.c:52
  3    Thread 0x7f34d8195700 (LWP 8917) "a.out" __lll_lock_wait (futex=futex@entry=0x55dc8faa1160 <g_mutex1>, private=0)
    at lowlevellock.c:52

View the stack backtrace of every thread

thread apply all bt

Thread 3 (Thread 0x7f27ca9ef700 (LWP 10122)):
#0  __lll_lock_wait (futex=futex@entry=0x560a953c6160 <g_mutex1>, private=0) at lowlevellock.c:52
#1  0x00007f27cb7400a3 in __GI___pthread_mutex_lock (mutex=0x560a953c6160 <g_mutex1>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000560a953c3541 in __gthread_mutex_lock (__mutex=0x560a953c6160 <g_mutex1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x0000560a953c3596 in std::mutex::lock (this=0x560a953c6160 <g_mutex1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x0000560a953c33aa in thread2 () at main.cpp:27
#5  0x0000560a953c3ebe in std::__invoke_impl<void, void (*)()> (__f=@0x560a962d4008: 0x560a953c338a <thread2()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x0000560a953c3e56 in std::__invoke<void (*)()> (__fn=@0x560a962d4008: 0x560a953c338a <thread2()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x0000560a953c3de8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x560a962d4008) at /usr/include/c++/9/thread:244
#8  0x0000560a953c3da5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x560a962d4008) at /usr/include/c++/9/thread:251
#9  0x0000560a953c3d76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x560a962d4000) at /usr/include/c++/9/thread:195
#10 0x00007f27cb628de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f27cb73d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f27cb467293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f27cb1f0700 (LWP 10121)):
#0  __lll_lock_wait (futex=futex@entry=0x560a953c61a0 <g_mutex2>, private=0) at lowlevellock.c:52
#1  0x00007f27cb7400a3 in __GI___pthread_mutex_lock (mutex=0x560a953c61a0 <g_mutex2>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000560a953c3541 in __gthread_mutex_lock (__mutex=0x560a953c61a0 <g_mutex2>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x0000560a953c3596 in std::mutex::lock (this=0x560a953c61a0 <g_mutex2>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x0000560a953c3348 in thread1 () at main.cpp:13
#5  0x0000560a953c3ebe in std::__invoke_impl<void, void (*)()> (__f=@0x560a962d3eb8: 0x560a953c3328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x0000560a953c3e56 in std::__invoke<void (*)()> (__fn=@0x560a962d3eb8: 0x560a953c3328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x0000560a953c3de8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x560a962d3eb8) at /usr/include/c++/9/thread:244
#8  0x0000560a953c3da5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x560a962d3eb8) at /usr/include/c++/9/thread:251
#9  0x0000560a953c3d76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x560a962d3eb0) at /usr/include/c++/9/thread:195
#10 0x00007f27cb628de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f27cb73d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f27cb467293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f27cb1f1740 (LWP 10120)):
#0  __pthread_clockjoin_ex (threadid=139808888260352, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f27cb629047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x0000560a953c343a in main () at main.cpp:41                              

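The backtraces confirm the deadlock: Thread 2 is blocked in thread1() at main.cpp:13 waiting for g_mutex2, while Thread 3 is blocked in thread2() at main.cpp:27 waiting for g_mutex1. Each thread already holds the mutex the other is waiting for, and the main thread is blocked in join() waiting for them.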

3.3. Analyze deadlock using core dump file

When live debugging is not possible, especially after the program has been deployed on site, you can ask the customer to generate a core dump file for analysis.

View process number

ps -aux | grep a.out

Start gdb

sudo gdb

Attach to the target process; the process is suspended once gdb has attached

attach <process number>

Generate the core dump file

gcore <core dump file name>

Once the customer has sent the file, load it into gdb together with the executable

gdb <executable name> <core dump file name>

Result

(gdb) thread apply all bt

Thread 3 (Thread 0x7f12864d0700 (LWP 10871)):
#0  __lll_lock_wait (futex=futex@entry=0x560150530160 <g_mutex1>, private=0) at lowlevellock.c:52
#1  0x00007f12872210a3 in __GI___pthread_mutex_lock (mutex=0x560150530160 <g_mutex1>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000056015052d541 in __gthread_mutex_lock (__mutex=0x560150530160 <g_mutex1>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x000056015052d596 in std::mutex::lock (this=0x560150530160 <g_mutex1>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000056015052d3aa in thread2 () at main.cpp:27
#5  0x000056015052debe in std::__invoke_impl<void, void (*)()> (__f=@0x56015130c008: 0x56015052d38a <thread2()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x000056015052de56 in std::__invoke<void (*)()> (__fn=@0x56015130c008: 0x56015052d38a <thread2()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x000056015052dde8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56015130c008) at /usr/include/c++/9/thread:244
#8  0x000056015052dda5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56015130c008) at /usr/include/c++/9/thread:251
#9  0x000056015052dd76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56015130c000) at /usr/include/c++/9/thread:195
#10 0x00007f1287109de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f128721e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1286f48293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f1286cd1700 (LWP 10870)):
#0  __lll_lock_wait (futex=futex@entry=0x5601505301a0 <g_mutex2>, private=0) at lowlevellock.c:52
#1  0x00007f12872210a3 in __GI___pthread_mutex_lock (mutex=0x5601505301a0 <g_mutex2>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000056015052d541 in __gthread_mutex_lock (__mutex=0x5601505301a0 <g_mutex2>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x000056015052d596 in std::mutex::lock (this=0x5601505301a0 <g_mutex2>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000056015052d348 in thread1 () at main.cpp:13
#5  0x000056015052debe in std::__invoke_impl<void, void (*)()> (__f=@0x56015130beb8: 0x56015052d328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:60
#6  0x000056015052de56 in std::__invoke<void (*)()> (__fn=@0x56015130beb8: 0x56015052d328 <thread1()>) at /usr/include/c++/9/bits/invoke.h:95
#7  0x000056015052dde8 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x56015130beb8) at /usr/include/c++/9/thread:244
#8  0x000056015052dda5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x56015130beb8) at /usr/include/c++/9/thread:251
#9  0x000056015052dd76 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x56015130beb0) at /usr/include/c++/9/thread:195
#10 0x00007f1287109de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f128721e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1286f48293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f1286cd2740 (LWP 10869)):
#0  __pthread_clockjoin_ex (threadid=139717547726592, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1  0x00007f128710a047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000056015052d43a in main () at main.cpp:41

4, Deadlock avoidance

Deadlock only occurs when all of the conditions above hold, so breaking one or more of them avoids it.

Method 1

  • Assign a unique sequence number (r1,... rn) to all critical resources
  • The lock for each resource is also assigned a unique sequence number (m1,... mn)
  • Every thread in the system requests resources, and therefore acquires locks, in strictly increasing order of sequence number
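
A minimal sketch of Method 1, assuming two mutexes whose "sequence numbers" are simply 1 and 2: both threads take the lower-numbered lock first, so neither can hold the second while waiting for the first. The names ordered_m1, ordered_m2 and run_ordered are invented for this example.

```cpp
#include <mutex>
#include <thread>

std::mutex ordered_m1; // sequence number 1
std::mutex ordered_m2; // sequence number 2

// Hypothetical helper: two threads each do `iterations` units of work
// and return the total number of units completed.
int run_ordered(int iterations)
{
    int work_done = 0; // only modified while both mutexes are held

    auto worker = [&]() {
        for (int i = 0; i < iterations; ++i)
        {
            ordered_m1.lock();   // always the lower number first
            ordered_m2.lock();   // then the higher number
            ++work_done;
            ordered_m2.unlock(); // release in reverse order
            ordered_m1.unlock();
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return work_done;
}
```

Calling run_ordered(10000) returns 20000 and never hangs, because a thread that waits for ordered_m2 can only be waiting on a thread that already passed ordered_m1 and will release both. When the order is hard to enforce by hand, std::lock or C++17 std::scoped_lock can acquire several mutexes together using a deadlock-avoidance algorithm.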

Method 2

  • Use only one lock for all threads. This cannot deadlock, but it reduces system efficiency (it is equivalent to having a single critical resource)
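
A rough sketch of Method 2, assuming two resources that are both guarded by one mutex. The names big_lock, resource_a, resource_b and run_single_lock are hypothetical.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: two threads update two resources under one lock
// and return the sum of both resources afterwards.
int run_single_lock(int iterations)
{
    std::mutex big_lock; // the only lock in the program
    int resource_a = 0;
    int resource_b = 0;

    // The threads touch the resources in opposite order, which is harmless
    // because there is no second lock to wait for.
    auto worker_ab = [&]() {
        for (int i = 0; i < iterations; ++i)
        {
            std::lock_guard<std::mutex> guard(big_lock);
            ++resource_a;
            ++resource_b;
        }
    };
    auto worker_ba = [&]() {
        for (int i = 0; i < iterations; ++i)
        {
            std::lock_guard<std::mutex> guard(big_lock);
            ++resource_b;
            ++resource_a;
        }
    };

    std::thread t1(worker_ab);
    std::thread t2(worker_ba);
    t1.join();
    t2.join();
    return resource_a + resource_b;
}
```

run_single_lock(10000) returns 40000. The price is that a thread which only needs resource_a still has to wait while another thread is using resource_b.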

Method 3

  • When acquiring a resource fails, release the resources already held and try again later. After the release, the thread's context state may need to be rolled back or adjusted, so the cost can outweigh the benefit.
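
Method 3 could look roughly like the following, using std::mutex::try_lock for the second lock and backing off when it fails. The helper names lock_both_with_backoff and run_backoff, and the mutexes backoff_m1 and backoff_m2, are assumptions made for this sketch.

```cpp
#include <mutex>
#include <thread>

std::mutex backoff_m1;
std::mutex backoff_m2;

// Hypothetical helper: returns with both mutexes held.
// If the second cannot be taken, the first is released before retrying.
void lock_both_with_backoff(std::mutex& first, std::mutex& second)
{
    for (;;)
    {
        first.lock();
        if (second.try_lock())
            return;                // success: both are held
        first.unlock();            // failure: give up what we hold
        std::this_thread::yield(); // let the other thread make progress
    }
}

// Hypothetical helper: two threads lock in opposite orders, as in the
// deadlocking program, and return the total units of work completed.
int run_backoff(int iterations)
{
    int work_done = 0; // only modified while both mutexes are held

    auto worker = [&](std::mutex& first, std::mutex& second) {
        for (int i = 0; i < iterations; ++i)
        {
            lock_both_with_backoff(first, second);
            ++work_done;
            second.unlock();
            first.unlock();
        }
    };

    std::thread t1([&] { worker(backoff_m1, backoff_m2); });
    std::thread t2([&] { worker(backoff_m2, backoff_m1); });
    t1.join();
    t2.join();
    return work_done;
}
```

run_backoff(10000) returns 20000 even though the two threads request the locks in opposite orders. The retries are wasted work, and if both threads keep backing off in step they can spin for a while without progressing, which is one reason this method may not pay off.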

Tags: C C++ Linux

Posted on Thu, 18 Nov 2021 14:08:14 -0500 by manichean