1, Foreword
Recently, we optimized the thread pool of a refactored Dubbo service and switched its worker threads to the CachedThreadPool strategy. However, after going live, the number of threads kept growing, which nearly caused a production incident.
This article therefore takes a closer look at how thread pools actually work.
2, Introduction to Dubbo thread pool
The source code of CachedThreadPool in Dubbo is as follows:
package org.apache.dubbo.common.threadpool.support.cached;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.threadlocal.NamedInternalThreadFactory;
import org.apache.dubbo.common.threadpool.ThreadPool;
import org.apache.dubbo.common.threadpool.support.AbortPolicyWithReport;

import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import static org.apache.dubbo.common.constants.CommonConstants.ALIVE_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.CORE_THREADS_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_ALIVE;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_CORE_THREADS;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_QUEUES;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_THREAD_NAME;
import static org.apache.dubbo.common.constants.CommonConstants.QUEUES_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.THREADS_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.THREAD_NAME_KEY;

/**
 * This thread pool is self-tuned. Thread will be recycled after idle for one minute, and new thread will be created for
 * the upcoming request.
 *
 * @see java.util.concurrent.Executors#newCachedThreadPool()
 */
public class CachedThreadPool implements ThreadPool {

    @Override
    public Executor getExecutor(URL url) {
        // 1. Thread name prefix; defaults to "Dubbo"
        String name = url.getParameter(THREAD_NAME_KEY, DEFAULT_THREAD_NAME);
        // 2. Number of core threads
        int cores = url.getParameter(CORE_THREADS_KEY, DEFAULT_CORE_THREADS);
        // 3. Maximum number of threads; defaults to Integer.MAX_VALUE
        int threads = url.getParameter(THREADS_KEY, Integer.MAX_VALUE);
        // 4. Queue size
        int queues = url.getParameter(QUEUES_KEY, DEFAULT_QUEUES);
        // 5. Idle time, in milliseconds, after which a non-core thread is recycled
        int alive = url.getParameter(ALIVE_KEY, DEFAULT_ALIVE);
        // 6. Build the pool with ThreadPoolExecutor from the JUC package
        return new ThreadPoolExecutor(cores, threads, alive, TimeUnit.MILLISECONDS,
                queues == 0 ? new SynchronousQueue<Runnable>() :
                        (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                                : new LinkedBlockingQueue<Runnable>(queues)),
                new NamedInternalThreadFactory(name, true), new AbortPolicyWithReport(name, url));
    }
}
As you can see, Dubbo simply creates a ThreadPoolExecutor from the JUC (java.util.concurrent) package. Its constructor looks like this:
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler) {
    if (corePoolSize < 0 ||
        maximumPoolSize <= 0 ||
        maximumPoolSize < corePoolSize ||
        keepAliveTime < 0)
        throw new IllegalArgumentException();
    if (workQueue == null || threadFactory == null || handler == null)
        throw new NullPointerException();
    this.acc = System.getSecurityManager() == null ?
            null :
            AccessController.getContext();
    this.corePoolSize = corePoolSize;
    this.maximumPoolSize = maximumPoolSize;
    this.workQueue = workQueue;
    this.keepAliveTime = unit.toNanos(keepAliveTime);
    this.threadFactory = threadFactory;
    this.handler = handler;
}
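To see the argument checks at the top of this constructor in action, here is a small sketch; the class and method names are made up for illustration. It uses the shorter five-argument constructor, which delegates to the one above with the default thread factory and the default AbortPolicy handler.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative helper, not part of the JDK or Dubbo.
class ConstructorChecks {

    // Returns true if the ThreadPoolExecutor constructor accepts these arguments,
    // false if it throws IllegalArgumentException.
    static boolean accepts(int core, int max, long keepAliveMillis) {
        try {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    core, max, keepAliveMillis, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());
            pool.shutdown(); // no threads have been started yet
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(150, 800, 60_000)); // true
        System.out.println(accepts(150, 100, 60_000)); // false: max < core
        System.out.println(accepts(0, 0, 60_000));     // false: max must be > 0
        System.out.println(accepts(1, 1, -1));         // false: negative keepAlive
    }
}
```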
The general execution flow is as follows:
1. While the number of threads in the pool is below corePoolSize, each newly submitted task creates a new thread, even if other threads in the pool are idle.
2. Once the pool has reached corePoolSize, newly submitted tasks are put into workQueue and wait until a thread takes them.
3. When workQueue is full and the number of threads is still below maximumPoolSize, a newly submitted task creates a new (non-core) thread to run it.
4. When workQueue is full and the pool has already reached maximumPoolSize, newly submitted tasks are handed to the RejectedExecutionHandler.
5. When the pool has more than corePoolSize threads, any thread that stays idle for keepAliveTime is terminated.
In addition, when allowCoreThreadTimeOut(true) is set, core threads are also terminated once they have been idle for keepAliveTime.
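The steps above can be observed on a scaled-down pool. This is a sketch with arbitrary sizes (core 2, max 3, queue capacity 2), not Dubbo code; every task blocks on a latch, so no thread frees up while tasks are being submitted, which keeps the recorded state deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutionFlowDemo {

    // Submits six tasks to a pool with core=2, max=3 and a queue of capacity 2,
    // recording the pool state after each submission.
    static List<String> run() {
        CountDownLatch gate = new CountDownLatch(1); // keeps every task busy
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 3, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(2));
        List<String> log = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            try {
                pool.execute(() -> awaitQuietly(gate));
                log.add("task " + i + ": pool=" + pool.getPoolSize()
                        + ", queued=" + pool.getQueue().size());
            } catch (RejectedExecutionException e) {
                log.add("task " + i + ": rejected"); // default AbortPolicy
            }
        }
        gate.countDown(); // let all accepted tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Tasks 1 and 2 each start a core thread, tasks 3 and 4 wait in the queue, task 5 finds the queue full and starts a third thread, and task 6 is rejected because the pool is already at maximumPoolSize.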
The JDK provides four built-in RejectedExecutionHandler implementations, as nested classes of ThreadPoolExecutor:
1. AbortPolicy: throws a RejectedExecutionException directly, so the caller learns immediately that the task was not accepted. This is ThreadPoolExecutor's default policy;
2. CallerRunsPolicy: runs the rejected task directly in the thread that called execute (unless the executor has been shut down, in which case the task is discarded). This slows down the rate of new submissions and acts as simple back pressure;
3. DiscardOldestPolicy: discards the oldest task in the queue, that is, the task at the head that would have run next, and then retries submitting the new task;
4. DiscardPolicy: silently discards the rejected task without any notification. Only use it when losing tasks is acceptable for the business scenario.
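The four policies can be compared side by side with a sketch like the following; the class name and task labels are illustrative. A pool with a single thread and a one-slot queue is saturated on purpose, so the third task, C, is always the one handed to the handler.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionPolicyDemo {

    // One worker thread and a one-slot queue: A occupies the thread, B fills the
    // queue, so C is always rejected. Returns what happened, in order.
    static List<String> run(RejectedExecutionHandler handler) {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);
        Thread caller = Thread.currentThread();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(1), handler);
        for (String name : new String[] {"A", "B", "C"}) {
            try {
                pool.execute(() -> {
                    if (name.equals("A")) {
                        awaitQuietly(gate); // hold the only worker thread
                    }
                    boolean inCaller = Thread.currentThread() == caller;
                    events.add("ran " + name + (inCaller ? " in caller" : ""));
                });
            } catch (RejectedExecutionException e) {
                events.add("rejected " + name);
            }
        }
        gate.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return events;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("AbortPolicy:         " + run(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("CallerRunsPolicy:    " + run(new ThreadPoolExecutor.CallerRunsPolicy()));
        System.out.println("DiscardOldestPolicy: " + run(new ThreadPoolExecutor.DiscardOldestPolicy()));
        System.out.println("DiscardPolicy:       " + run(new ThreadPoolExecutor.DiscardPolicy()));
    }
}
```

With the JDK classes this gives [rejected C, ran A, ran B] for AbortPolicy, [ran C in caller, ran A, ran B] for CallerRunsPolicy, [ran A, ran C] for DiscardOldestPolicy (B was dropped) and [ran A, ran B] for DiscardPolicy (C was dropped).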
Note that Dubbo's rejection policy, AbortPolicyWithReport, extends ThreadPoolExecutor.AbortPolicy; it mainly adds more key diagnostic information and thread stack information before throwing the exception.
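The same idea can be sketched in a few lines: a handler that extends ThreadPoolExecutor.AbortPolicy, reports the pool state, and then delegates to the parent so the RejectedExecutionException is still thrown. LoggingAbortPolicy is a hypothetical class whose message loosely imitates the alarm shown in section 4; it is not Dubbo's actual AbortPolicyWithReport, which also reports stack information.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical example, not Dubbo's code: an AbortPolicy that reports pool state first.
class LoggingAbortPolicy extends ThreadPoolExecutor.AbortPolicy {

    private final String poolName;
    private volatile String report = "";

    LoggingAbortPolicy(String poolName) {
        this.poolName = poolName;
    }

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        report = String.format(
                "Thread pool is EXHAUSTED! Thread Name: %s, Pool Size: %d (active: %d, core: %d, max: %d, largest: %d)",
                poolName, e.getPoolSize(), e.getActiveCount(), e.getCorePoolSize(),
                e.getMaximumPoolSize(), e.getLargestPoolSize());
        System.err.println(report);    // a real service would use its logger here
        super.rejectedExecution(r, e); // still throws RejectedExecutionException
    }

    String lastReport() {
        return report;
    }

    // Occupies the only thread of a pool that has no queue, then submits one more task.
    static String demo() {
        LoggingAbortPolicy policy = new LoggingAbortPolicy("demo-pool");
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>(), policy);
        try {
            pool.execute(() -> {
                try {
                    gate.await(); // hold the only worker thread
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { }); // no free thread and no queue slot: rejected
            return "accepted";
        } catch (RejectedExecutionException ex) {
            return policy.lastReport();
        } finally {
            gate.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("caught: " + demo());
    }
}
```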
3, About thread pool configuration
Thread pool configuration is very important, but it is easy to overlook. If it is configured unreasonably, threads are rarely reused and end up being created and destroyed frequently anyway, which defeats the purpose of a pool.
- How do we calculate a reasonable number of core threads?
Start from the average response time (RT) of the interface and the QPS the service must support. For example, if the average RT of our interface is 0.005 s, one worker thread can process 1 ÷ 0.005 = 200 requests per second. If a single machine needs to support 30,000 QPS, it therefore needs 30,000 ÷ 200 = 150 core threads.
Formula: core threads = QPS ÷ (1 ÷ average RT) = QPS × average RT
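The formula is simple enough to write down directly. In this sketch the helper name is hypothetical; RT is kept in whole milliseconds so the arithmetic stays exact, and the result is rounded up. It only estimates the average number of busy threads and ignores bursts, queuing and CPU limits, so some headroom on top of it is still advisable.

```java
// Hypothetical sizing helper based on: threads = QPS x average RT.
class CoreThreadSizing {

    // qps: requests per second to support; avgRtMillis: average response time in ms.
    static long coreThreads(long qps, long avgRtMillis) {
        long busyThreadMillisPerSecond = qps * avgRtMillis; // thread-ms of work arriving each second
        return (busyThreadMillisPerSecond + 999) / 1000;    // divide by 1000 ms, rounding up
    }

    public static void main(String[] args) {
        System.out.println(coreThreads(30_000, 5)); // 150, the example above
        System.out.println(coreThreads(10_000, 5)); // 50
    }
}
```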
- The easily overlooked @Async annotation
When @Async is used in Spring without configuring an executor, the default is SimpleAsyncTaskExecutor. That is effectively no thread pool at all: it creates a new thread for every task and never reuses threads. (Spring Boot 2.1 and later auto-configure a ThreadPoolTaskExecutor for @Async when no executor bean is defined, but plain Spring does not.)
So remember: if you use @Async, configure a thread pool explicitly, for example:
import java.util.concurrent.ThreadPoolExecutor;

import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@EnableAsync
@Configuration
@Slf4j
public class ThreadPoolConfig {

    private static final int CORE_POOL_SIZE = 100;      // Number of core threads
    private static final int MAX_POOL_SIZE = 400;       // Maximum number of threads
    private static final int KEEP_ALIVE_SECONDS = 60;   // Idle time before a non-core thread is reclaimed, in seconds
    private static final int QUEUE_CAPACITY = 0;        // 0 means a SynchronousQueue (no buffering)
    private static final String THREAD_NAME_PREFIX = "Async-Service-"; // Thread name prefix

    @Bean("taskExecutor") // "taskExecutor" is also the default bean name that @Async looks up
    public ThreadPoolTaskExecutor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(CORE_POOL_SIZE);
        executor.setMaxPoolSize(MAX_POOL_SIZE);
        executor.setQueueCapacity(QUEUE_CAPACITY);
        executor.setKeepAliveSeconds(KEEP_ALIVE_SECONDS);
        executor.setThreadNamePrefix(THREAD_NAME_PREFIX);
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        // Initialize the underlying ThreadPoolExecutor
        executor.initialize();
        return executor;
    }
}
4, What caused the thread pool surge?
The worker threads of the Dubbo server were configured as follows:
corethreads: 150
threads: 800
threadpool: cached
queues: 10
Setting a small queue seemed reasonable: it would absorb short bursts caused by jitter and keep the pool from running short of threads. On paper nothing looked wrong, and for daytime traffic (RT < 5 ms, QPS < 10,000) the core threads were more than sufficient. However, after going live, the thread pool kept growing until it hit the maximum of 800 threads, and the following alarm was raised:
org.apache.dubbo.remoting.RemotingException("Server side(IP,20880) thread pool is exhausted, detail msg:Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-IP:20880, Pool Size: 800 (active: 4, core: 300, max: 800, largest: 800), Task: 4101304 (completed: 4101301), Executor status:(isShutdown:false, isTerminated:false, isTerminating:false), in dubbo://IP:20880!"
As the alarm shows, the pool had reached its maximum size, yet only 4 threads were active, which was completely unexpected.
5, Scenario simulation
From the source code:
queues == 0 ? new SynchronousQueue<Runnable>() : (queues < 0 ? new LinkedBlockingQueue<Runnable>() : new LinkedBlockingQueue<Runnable>(queues))
In other words:
1. queues == 0: a SynchronousQueue is used. It has no capacity; a task is handed directly to a waiting idle thread, and if no thread is waiting, a new one is created (up to maximumPoolSize).
2. queues < 0: an unbounded LinkedBlockingQueue is used.
3. queues > 0: a bounded LinkedBlockingQueue with a capacity of queues is used.
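The queue selection can be isolated into a helper to see what each setting produces. This is a sketch: the class and method names are illustrative, and the logic mirrors the ternary above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Illustrative helper mirroring the queue selection in CachedThreadPool#getExecutor.
class QueueSelection {

    static BlockingQueue<Runnable> createQueue(int queues) {
        if (queues == 0) {
            return new SynchronousQueue<Runnable>();      // direct hand-off, no storage
        }
        if (queues < 0) {
            return new LinkedBlockingQueue<Runnable>();   // unbounded (capacity Integer.MAX_VALUE)
        }
        return new LinkedBlockingQueue<Runnable>(queues); // bounded
    }

    public static void main(String[] args) {
        for (int queues : new int[] {0, -1, 10}) {
            BlockingQueue<Runnable> q = createQueue(queues);
            System.out.println("queues=" + queues + " -> " + q.getClass().getSimpleName()
                    + ", remainingCapacity=" + q.remainingCapacity());
        }
    }
}
```

The capacity matters because ThreadPoolExecutor only creates a thread beyond corePoolSize when offer() on the queue returns false. A SynchronousQueue returns false whenever no thread is waiting to take the task, while a bounded LinkedBlockingQueue returns false only once it is full.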
The core and maximum thread counts were clearly not the cause, so I suspected the queues setting.
To reproduce the problem, I wrote a simple simulation:
package com.bytearch.fast.cloud;

import java.util.concurrent.*;

public class TestThreadPool {

    public final static int queueSize = 10;

    public static void main(String[] args) {
        ExecutorService executorService = getThreadPool(queueSize);
        for (int i = 0; i < 100000; i++) {
            int finalI = i;
            try {
                executorService.execute(new Runnable() {
                    @Override
                    public void run() {
                        doSomething(finalI);
                    }
                });
            } catch (Exception e) {
                System.out.println("emsg:" + e.getMessage());
            }
            // Pause for about 1 ms after every 20 submissions
            if (i % 20 == 0) {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println("all done!");
        try {
            // Keep the JVM alive so the pool can be observed
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    // Same sizes and queue selection as the production config; default AbortPolicy
    public static ExecutorService getThreadPool(int queues) {
        int cores = 150;
        int threads = 800;
        int alive = 60 * 1000;
        return new ThreadPoolExecutor(cores, threads, alive, TimeUnit.MILLISECONDS,
                queues == 0 ? new SynchronousQueue<Runnable>() :
                        (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                                : new LinkedBlockingQueue<Runnable>(queues)));
    }

    public static void doSomething(final int i) {
        try {
            Thread.sleep(5); // simulate a 5 ms RT
            System.out.println("thread:" + Thread.currentThread().getName()
                    + ", active:" + Thread.activeCount() + ", do:" + i);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
Simulation results: with queueSize = 0 there was no exception, with queueSize = 10 tasks were rejected, and with queueSize = 100 there was again no exception.
The rejection output was as follows:
emsg:Task com.bytearch.fast.cloud.TestThreadPool$1@733aa9d8 rejected from java.util.concurrent.ThreadPoolExecutor@6615435c[Running, pool size = 800, active threads = 32, queued tasks = 9, completed tasks = 89755]
all done!
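Clearly, under high concurrency, a bounded LinkedBlockingQueue with a small capacity causes trouble. Short bursts easily fill its few slots, and every time the queue is full, the pool creates another non-core thread, up to maximumPoolSize; beyond that, tasks are rejected. Threads above corePoolSize are only reclaimed after staying idle for keepAliveTime, so under continuous traffic they tend to stay around, which is consistent with the alarm above: 800 threads in the pool, but only 4 of them active.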
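The mechanism can be reproduced deterministically on a tiny pool. In this sketch the sizes are hypothetical (core 2, max 10, queue capacity 1, keepAlive 60 s) and latches stand in for real traffic: a burst of five tasks arrives while the core threads are busy, the one-slot queue fills, and two extra threads are started. After every task has finished, the pool still holds four threads, because the extra ones only exit after being idle for keepAliveTime.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SmallQueueBurst {

    // Scaled-down model: core=2, max=10, queue capacity=1, keepAlive=60 s.
    // Returns the pool size once every task of a 5-task burst has finished.
    static int poolSizeAfterBurst() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(1));
        CountDownLatch gate = new CountDownLatch(1); // keeps tasks running during the burst
        CountDownLatch done = new CountDownLatch(5);
        for (int i = 0; i < 5; i++) {
            // Tasks 1-2 start core threads, task 3 fills the queue,
            // tasks 4-5 find the queue full and start non-core threads.
            pool.execute(() -> {
                awaitQuietly(gate);
                done.countDown();
            });
        }
        gate.countDown();
        awaitQuietly(done);            // every task has finished
        int size = pool.getPoolSize(); // idle extra threads wait up to 60 s before exiting
        pool.shutdown();
        return size;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("core=2, pool size after burst=" + poolSizeAfterBurst());
    }
}
```

It should report a pool size of 4 against a core size of 2, even though no task is running any more.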
After changing the queues setting to 0 and redeploying, the service returned to normal.
As for the deeper causes, interested readers are welcome to dig further or discuss them with me on my official account.
6, Summary
This article covered the basic principles of ThreadPoolExecutor, how to calculate a thread pool configuration, and the easily overlooked configuration of @Async.
It also described a strange problem we ran into with a thread pool, where a single parameter (queues) led to unexpected consequences.