Analysis of three Tomcat thread dump samples

"Pick up flowers day and night and leave no regrets. If you never forget, there will be an echo."

By reading through the source code of Tomcat's request task handling, the Tomcat thread pool, TaskQueue, ReentrantLock, and AQS, and understanding how they work, we can follow the clues in a Tomcat thread dump.

In addition, for sample 3, the inference has not been confirmed by an experimental reproduction, so please treat it with extra skepticism.

Sample 1

This case is simple: start Tomcat locally, run jstack, and you will see it. There is not much to discuss, so let's go through it quickly.

"http-nio-8081-exec-27" #42 daemon prio=1 os_prio=-2 tid=0x000000001a549000 nid=0x4a70 waiting on condition [0x000000001dc0f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000008144aab0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:1)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:58)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

There are 10 threads named "http-nio-8081-exec-xx", matching the thread pool's corePoolSize. The key frame above is TaskQueue.take, a blocking dequeue with no timeout: as long as the number of worker threads does not exceed corePoolSize, getTask dequeues with take().

Looking further up the stack, take is blocked in ConditionObject.await, which in turn parks the thread via LockSupport.park. From the source we can tell the Condition is notEmpty, i.e. the queue is currently empty. Core threads wait for the notEmpty condition to be met and then compete for the task queue's takeLock (a non-fair ReentrantLock built on AQS exclusive mode); whichever thread acquires the lock dequeues a task and executes it.
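
For reference, here is JDK 8's ThreadPoolExecutor.getTask (lightly re-commented), containing the decision that switches between the untimed take() of this sample and the timed poll() we will see in sample 2:

private Runnable getTask() {
    boolean timedOut = false; // did the last poll() time out?

    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // check whether the pool is shutting down
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }

        int wc = workerCountOf(c);

        // are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }

        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) : // sample 2: timed wait
                workQueue.take();                                     // sample 1: wait with no timeout
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}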

Sample 2

This state appears after a business peak, when under normal circumstances some idle threads should be released until the pool settles at an equilibrium; it also appears when there is little traffic, such as in the middle of the night.

"http-nio-8081-exec-162" #177 daemon prio=1 os_prio=-2 tid=0x000000001bbec000 nid=0x2580 waiting on condition [0x00000000272df000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000008144aab0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:89)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:1)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:58)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

In a word: LinkedBlockingQueue.poll. When the number of threads is greater than corePoolSize, the task queue is dequeued with poll instead of take. Next, ConditionObject.awaitNanos: reading the source of TaskQueue and LinkedBlockingQueue shows the thread is still blocked on the notEmpty condition, but this time with a timeout. In other words, a thread that stays idle past the pool's keepAlive time will eventually exit. As for why such threads go through a slow, drawn-out release process, a more detailed analysis is given in "Idle determination and release of worker thread of tomcat" on Jianshu (jianshu.com).
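
To see both idle states side by side, here is a minimal, self-contained sketch (our own demo, not from the production system; the pool sizes and sleep durations are arbitrary). Run it and attach jstack while it sleeps:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IdleWorkerDemo {
    public static void main(String[] args) throws InterruptedException {
        // 2 core threads, up to 50 max; the tiny bounded queue makes a burst spill into new threads
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 50, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1));
        for (int i = 0; i < 40; i++) {
            pool.execute(() -> {
                try { Thread.sleep(500); } catch (InterruptedException ignored) {}
            });
        }
        // After the burst drains, jstack shows the 2 core threads WAITING in
        // LinkedBlockingQueue.take (like sample 1) and the extra threads
        // TIMED_WAITING in LinkedBlockingQueue.poll (like sample 2), until the
        // 60-second keep-alive expires and the extra threads exit.
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so jstack can attach
    }
}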

Sample 3

373 http-nio-8081-exec threads in WAITING:

"http-nio-8081-exec-385" #459 daemon prio=5 os_prio=0 tid=0x00007f4b6819a800 nid=0x674e waiting on condition [0x00007f4b2e977000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c8ddb088> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2083)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)

A thread in RUNNABLE:

"http-nio-8081-exec-386" #461 daemon prio=5 os_prio=0 tid=0x00007f4b74511000 nid=0x683f runnable [0x00007f4b2e773000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)

Let's focus on sample 3 this time. As the author recalls, this also occurred during a business peak. The Spring Boot embedded Tomcat was in a "zombie" state: the process was still alive and the port still accepted connections, but requests got essentially no response.

Sample 3 differs from sample 2 inside awaitNanos. In LinkedBlockingQueue.poll(timeout), the thread takes the lock first and then waits on the notEmpty Condition by calling notEmpty.awaitNanos(nanos); notEmpty is a Condition created on the queue's ReentrantLock takeLock.

The poll(timeout) method of LinkedBlockingQueue:

public E poll(long timeout, TimeUnit unit) throws InterruptedException {
    final E x;
    final int c;
    long nanos = unit.toNanos(timeout);
    final AtomicInteger count = this.count;
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly(); // acquire takeLock first
    try {
        while (count.get() == 0) { // queue is empty
            if (nanos <= 0L)
                return null;  // still empty after the timeout elapsed: return null
            nanos = notEmpty.awaitNanos(nanos); // release the lock and wait on the notEmpty condition
        }
        x = dequeue();  // the queue has a task: dequeue it
        c = count.getAndDecrement();
        if (c > 1)
            notEmpty.signal(); // tasks remain after this dequeue: signal another thread waiting on notEmpty, which will contend for the lock and resume from await
    } finally {
        takeLock.unlock();  // release the lock
    }
    if (c == capacity)
        signalNotFull();
    return x;
}

Let's go inside awaitNanos:

public final long awaitNanos(long nanosTimeout)
        throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    // We don't check for nanosTimeout <= 0L here, to allow
    // awaitNanos(0) as a way to "yield the lock".
    final long deadline = System.nanoTime() + nanosTimeout;
    long initialNanos = nanosTimeout;
    Node node = addConditionWaiter(); // new Node(Node.CONDITION), appended to the tail of the condition queue
    int savedState = fullyRelease(node); // fully release the exclusive lock and unpark a successor so it can contend for the lock
    int interruptMode = 0;

	// While inside this loop, the node is still in the condition queue.
	// When a task is enqueued, notEmpty is signalled and transferForSignal moves the node from the condition queue to the synchronization queue.
	// In essence, Condition.await puts the thread into the condition queue and releases the lock so that successors in the sync queue can check their conditions; once signalled, the node re-enters the sync queue.
    while (!isOnSyncQueue(node)) { // loops while the node has not yet been transferred to the sync queue
        if (nanosTimeout <= 0L) {
            transferAfterCancelledWait(node);
            break;
        }
        if (nanosTimeout > SPIN_FOR_TIMEOUT_THRESHOLD)
            LockSupport.parkNanos(this, nanosTimeout);   // sample 2's threads are parked here, still waiting in the condition queue
        if ((interruptMode = checkInterruptWhileWaiting(node)) != 0)
            break;
        nanosTimeout = deadline - System.nanoTime();
    }

	// Reaching here means the node is in the synchronization queue, waiting to reacquire the lock
    if (acquireQueued(node, savedState) && interruptMode != THROW_IE) // sample 3's threads are here
        interruptMode = REINTERRUPT;
    if (node.nextWaiter != null)
        unlinkCancelledWaiters();
    if (interruptMode != 0)
        reportInterruptAfterWait(interruptMode);
    long remaining = deadline - System.nanoTime(); // avoid overflow
    return (remaining <= initialNanos) ? remaining : Long.MIN_VALUE;
}

Next, let's jump to the enqueue side of the task queue. Tomcat's TaskQueue overrides offer, as shown below, and ultimately delegates to the offer method of LinkedBlockingQueue.
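
For context, here is TaskQueue.offer as of Tomcat 8.5/9 (quoted for reference; parent is Tomcat's own org.apache.tomcat.util.threads.ThreadPoolExecutor). It decides whether to queue the task or force creation of a new thread before falling through to super.offer:

@Override
public boolean offer(Runnable o) {
    // no parent pool to consult: just queue
    if (parent==null) return super.offer(o);
    // we are maxed out on threads, simply queue the object
    if (parent.getPoolSize() == parent.getMaximumPoolSize()) return super.offer(o);
    // we have idle threads, just add it to the queue
    if (parent.getSubmittedCount()<=(parent.getPoolSize())) return super.offer(o);
    // if we have fewer threads than maximum, force creation of a new thread
    if (parent.getPoolSize()<parent.getMaximumPoolSize()) return false;
    // if we reached here, we need to add it to the queue
    return super.offer(o);
}

And here is the LinkedBlockingQueue.offer it delegates to: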

public boolean offer(E e) {
    if (e == null) throw new NullPointerException();
    final AtomicInteger count = this.count;
    if (count.get() == capacity)
        return false; // queue is at capacity: reject
    int c = -1;
    Node<E> node = new Node<E>(e);
    final ReentrantLock putLock = this.putLock;
    putLock.lock(); // acquire putLock first
    try {
        if (count.get() < capacity) {
            enqueue(node); // enqueue the node
            c = count.getAndIncrement();
            if (c + 1 < capacity)
                notFull.signal(); // queue still not full: wake a thread waiting on the notFull condition
        }
    } finally {
        putLock.unlock();
    }
    if (c == 0) // c == 0 means the queue was empty before this enqueue; only on this empty-to-non-empty transition does offer signal notEmpty
        signalNotEmpty(); // wake a thread waiting on the notEmpty condition
    return c >= 0;
}

The signalNotEmpty() above eventually executes the transferForSignal(Node node) method, which moves the waiting thread's Node from the condition queue to the synchronization queue.
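
For reference, here is transferForSignal from JDK 8's AbstractQueuedSynchronizer (javadoc comments condensed):

final boolean transferForSignal(Node node) {
    // If the CAS fails, the node was cancelled while waiting.
    if (!compareAndSetWaitStatus(node, Node.CONDITION, 0))
        return false;

    // Splice the node onto the sync queue and set its predecessor's status
    // to SIGNAL so the predecessor will unpark it on release; if the
    // predecessor is cancelled or the CAS fails, unpark the thread
    // immediately so it can resync.
    Node p = enq(node);
    int ws = p.waitStatus;
    if (ws > 0 || !compareAndSetWaitStatus(p, Node.SIGNAL))
        LockSupport.unpark(node.thread);
    return true;
}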

To analyze these two methods we need the AQS model. A thread in poll that finds count.get() == 0 enters the condition queue, releases takeLock, and spins in while (!isOnSyncQueue(node)), repeatedly checking (until the timeout) whether it has been moved to the synchronization queue. When a task is enqueued and the queue becomes non-empty, the threads in the condition queue are transferred to the synchronization queue, where they reacquire takeLock and get back to work.

From the thread stack of sample 3 we can see the threads have exited the while loop and reached if (acquireQueued(node, savedState) && interruptMode != THROW_IE). In other words, at least 300 threads that had been waiting in the condition queue were transferred to the synchronization queue and are now waiting for takeLock so they can work! Let's continue into acquireQueued:

final boolean acquireQueued(final Node node, int arg) {
    boolean interrupted = false;
    try {
        for (;;) {
            final Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) {
                setHead(node);
                p.next = null; // help GC
                return interrupted;
            }
			// Either the predecessor is not the head node, or tryAcquire failed to get the lock
            if (shouldParkAfterFailedAcquire(p, node)) // if the predecessor's status is SIGNAL, it is safe to park
                interrupted |= parkAndCheckInterrupt();  // park suspends the current thread until its predecessor unparks it; the 373 exec threads are parked right here!
        } // the loop exits only when the predecessor is the head node and the lock is won; with a non-fair lock, threads arriving from outside the queue can barge in and compete with the queued successors
    } catch (Throwable t) {
        cancelAcquire(node);
        if (interrupted)
            selfInterrupt();
        throw t;
    }
}

At this point we have the overall picture: when the notEmpty condition is not met, a worker blocks on takeLock's condition. Once the condition is met, it first tries to acquire the lock directly, since it is a non-fair lock; if that succeeds it dequeues a task and executes it. If not, it enters the synchronization queue, where every thread except the one at the front parks until its predecessor unparks it.
From the thread stack, these 300+ threads are in exactly that last scenario: parked in the AQS synchronization queue, waiting to be woken by their predecessor so they can take the lock.

Is this a thread pool starvation problem? Or is it a performance bottleneck?

The author envisages two scenarios in which the situation above could arise:

1. In non-fair lock mode, with 300+ threads, one thread (386) keeps winning the lock for some reason. Tomcat's task queue is a subclass of the JDK's LinkedBlockingQueue, whose dequeue lock takeLock is a non-fair lock. Thread 386 is the current lock holder: after polling, takeLock.unlock() releases the lock and unparks its successor in the queue; 386 then executes its task, after which it can re-enter poll -> tryAcquire and barge in ahead of the queued successors, winning the lock again and again.

Miscellaneous (JVM thread stack analysis, socketRead0 blocking caused by a JDK bug, JVM tuning, etc.) - fat rabbit loves beans and livestock - blog Garden (cnblogs.com)

Possible situation: thread 386 stays RUNNABLE and the scheduler keeps favoring it, so the other threads in the synchronization queue never get a time slice and never win the lock; meanwhile thread 386 itself remains stuck in socketRead0 because of the JDK bug above. The whole process therefore plays dead. A minimal sketch of this non-fair contention shape follows.
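
As a rough illustration (a hypothetical sketch of ours, not the author's reproduction), the following program makes hundreds of threads contend for a non-fair ReentrantLock. A jstack of this process shows nearly all of them parked in acquireQueued waiting on a ReentrantLock$NonfairSync, the same parking frames as the 373 exec threads in sample 3, though reached via lock() rather than awaitNanos:

import java.util.concurrent.locks.ReentrantLock;

public class NonfairBargeDemo {
    public static void main(String[] args) {
        final ReentrantLock takeLock = new ReentrantLock(); // non-fair by default, like LinkedBlockingQueue's takeLock
        for (int i = 0; i < 300; i++) {
            new Thread(() -> {
                while (true) {
                    takeLock.lock();
                    try {
                        // a thread that releases and immediately relocks can barge in
                        // via tryAcquire, ahead of the parked, queued waiters
                    } finally {
                        takeLock.unlock();
                    }
                }
            }, "contender-" + i).start();
        }
    }
}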

2. There are many tasks backed up in the task queue. Threads that had been waiting in the condition queue find the notEmpty condition satisfied and are woken one after another; in synchronization-queue order they take the lock, dequeue a task, and execute the requested business logic. At the instant jstack ran, the parked, not-yet-woken threads in the synchronization queue were captured in the dump.
With requests streaming in, worker threads keep finding the notEmpty condition satisfied and keep entering the AQS synchronization queue, where they park. As long as threads enter this synchronization queue faster than they leave it, 300+ threads will be queued up for the TaskQueue's ReentrantLock$NonfairSync.

That would be the classic performance bottleneck: a large backlog in the task queue (Tomcat appears unresponsive to new requests and plays dead, but the requests actually do enter the task queue, which has grown very long), while worker threads queue in AQS for takeLock and then drain the tasks one by one.
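
A cheap way to tell the two scenarios apart, had it been available at the time, is to sample the queue depth: scenario 2 implies a deep backlog, while scenario 1 can show starvation even with an empty queue. A hypothetical diagnostic sketch (how you obtain the pool reference is an assumption; the methods used are standard java.util.concurrent.ThreadPoolExecutor API):

import java.util.concurrent.ThreadPoolExecutor;

public class PoolProbe implements Runnable {
    private final ThreadPoolExecutor pool; // assumed to be wired to the executor elsewhere

    public PoolProbe(ThreadPoolExecutor pool) { this.pool = pool; }

    @Override
    public void run() {
        // deep queue + busy workers -> backlog (scenario 2);
        // shallow queue + parked workers -> lock starvation (scenario 1)
        System.out.println("queue depth = " + pool.getQueue().size()
                + ", active threads = " + pool.getActiveCount()
                + ", pool size = " + pool.getPoolSize());
    }
}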

But if scenario 2 were the case, the stack should not pass through awaitNanos. If the worker threads were busy the whole time, count.get() != 0 would hold and they would dequeue straight away; even when they park, they would get there from takeLock.lockInterruptibly() into acquireQueued, not through the condition queue.

Therefore, judging from the thread stack, what the dump captured is this: threads that had gone idle moved from the condition queue to the synchronization queue; 300+ of them failed to grab the lock and parked; and thread 386, which did grab the lock, kept occupying the CPU time slice (although the author cannot prove this thread starvation from the underlying thread-scheduling principles), leaving the other threads no chance to win the lock and run.

Therefore, the author currently leans toward a JDK bug that lets one thread occupy the CPU for a long time while a large number of other threads starve, parked and blocked.

Tags: Tomcat Multithreading
