Source: https://my.oschina.net/xiaomu0082/blog/2990388
First, the problem: for about a week, the APIs in our intranet sandbox environment kept freezing, with every API becoming unresponsive.
At first, when the testers complained that the environment was responding slowly, we restarted the application, it returned to normal, and we left it at that.
But the problem kept coming back more and more often, and more colleagues began to complain, so we suspected a problem in the code and started investigating.
First, the developers' local IDE environments showed no problems. When the application was stuck, the database and Redis were working normally, and there were no unusual error logs. I began to suspect the sandbox machine itself (test environments are fragile!).
So I SSHed into the server and ran:
top
The machine itself looked normal, so I decided to look at the JVM's thread stacks.
First, find the threads that consume the most resources by running
top -H -p 12798
and note the top three resource-consuming threads.
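Then use jstack to view the thread stacks. top -H shows thread IDs in decimal, whereas jstack prints them in hexadecimal as nid, so thread 12799 becomes 31ff:
jstack 12798 | grep 31ff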
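A tiny Java equivalent of the decimal-to-hex step, included only as an illustration (the class name NidLookup is made up; on the shell, printf '%x\n' 12799 does the same thing):

```java
public class NidLookup {
    // top -H reports thread IDs in decimal; jstack prints the same ID as nid=0x<hex>
    static String toNid(int decimalTid) {
        return "0x" + Integer.toHexString(decimalTid);
    }

    public static void main(String[] args) {
        // 12799 is the busy thread reported by top -H -p 12798
        System.out.println(toNid(12799)); // prints 0x31ff
    }
}
```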
Nothing stood out, so I widened the search to 10 lines above and below the match and ran it again.
Some threads were waiting on locks, but none of them were in business code, so I set them aside. With no further clues, I decided to stop investigating the stuck machine for the time being.
To preserve the scene, I dumped the full heap of the problem process, then restarted the test environment application in debug mode, planning to debug it remotely the next time the problem appeared.
The next day the problem reappeared, so I asked the ops team to take the problem instance out of nginx forwarding, and then attached a remote debugger to Tomcat.
I picked an interface at random and set a breakpoint at its entry point. Then the frustrating part began: nothing happened. The API request hung waiting for a response and never reached the breakpoint.
Somewhat confused, I took a moment to think, then set a breakpoint in the AOP advice that runs before the entry point and debugged again. This time the breakpoint was hit. After stepping with F8 many times, I found that the request was stuck executing a Redis command.
Following the calls further, I finally found where it stopped, in Spring Data Redis's JedisConnectionFactory:
/**
 * Returns a Jedis instance to be used as a Redis connection. The instance can be newly created or retrieved from a
 * pool.
 *
 * @return Jedis instance ready for wrapping into a {@link RedisConnection}.
 */
protected Jedis fetchJedisConnector() {
    try {
        if (usePool && pool != null) {
            return pool.getResource();
        }
        Jedis jedis = new Jedis(getShardInfo());
        // force initialization (see Jedis issue #82)
        jedis.connect();
        return jedis;
    } catch (Exception ex) {
        throw new RedisConnectionFailureException("Cannot get Jedis connection", ex);
    }
}
The thread starts waiting inside the pool.getResource() call above:
public T getResource() {
    try {
        return internalPool.borrowObject();
    } catch (Exception e) {
        throw new JedisConnectionException("Could not get a resource from the pool", e);
    }
}
internalPool.borrowObject() borrows an object from the underlying Apache Commons Pool (GenericObjectPool). Stepping into it:
public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    this.assertOpen();
    AbandonedConfig ac = this.abandonedConfig;
    if (ac != null && ac.getRemoveAbandonedOnBorrow() && this.getNumIdle() < 2
            && this.getNumActive() > this.getMaxTotal() - 3) {
        this.removeAbandoned(ac);
    }
    PooledObject p = null;
    boolean blockWhenExhausted = this.getBlockWhenExhausted();
    long waitTime = 0L;
    while (p == null) {
        boolean create = false;
        if (blockWhenExhausted) {
            p = (PooledObject) this.idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = this.create();
            }
            if (p == null) {
                if (borrowMaxWaitMillis < 0L) {
                    p = (PooledObject) this.idleObjects.takeFirst();
                } else {
                    waitTime = System.currentTimeMillis();
                    p = (PooledObject) this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                    waitTime = System.currentTimeMillis() - waitTime;
                }
            }
            if (p == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            // ... (rest of the method omitted)
The key part is:
if (p == null) {
    if (borrowMaxWaitMillis < 0L) {
        p = (PooledObject) this.idleObjects.takeFirst();
    } else {
        waitTime = System.currentTimeMillis();
        p = (PooledObject) this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
        waitTime = System.currentTimeMillis() - waitTime;
    }
}
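When there is no idle object and create() returns null (which happens once the pool has reached maxTotal), a negative borrowMaxWaitMillis sends the thread into idleObjects.takeFirst(), which waits indefinitely until an object is returned to the pool. I suspected this value had not been configured (commons-pool2 defaults it to -1).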
Checking the Redis pool configuration confirmed that maxWaitMillis was not set. But configuring it would only send execution down the else branch, which throws an exception when the wait times out; that would not fix the underlying problem.
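The sketch below is a heavily simplified, hypothetical model of the waiting logic in borrowObject() above (no object creation, validation or abandoned-object handling), built on java.util.concurrent.LinkedBlockingDeque rather than commons-pool2's own deque class. It shows the two branches: a negative wait blocks in takeFirst() until another thread returns an object, while a non-negative wait gives up and throws NoSuchElementException.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified pool that mimics only the waiting logic of borrowObject()
public class BorrowSketch {
    private final LinkedBlockingDeque<String> idleObjects = new LinkedBlockingDeque<>();

    void giveBack(String conn) {
        idleObjects.offerLast(conn);
    }

    String borrow(long borrowMaxWaitMillis) {
        String p = idleObjects.pollFirst(); // fast path: an idle connection is available
        try {
            if (p == null) {
                if (borrowMaxWaitMillis < 0L) {
                    // negative wait (commons-pool2 default is -1): block until a connection is returned
                    p = idleObjects.takeFirst();
                } else {
                    // bounded wait: yields null if nothing is returned within the timeout
                    p = idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            throw new IllegalStateException("interrupted while waiting for an idle object", e);
        }
        if (p == null) {
            throw new NoSuchElementException("Timeout waiting for idle object");
        }
        return p;
    }

    public static void main(String[] args) {
        BorrowSketch pool = new BorrowSketch(); // empty: every connection is already lent out
        try {
            pool.borrow(200);
        } catch (NoSuchElementException e) {
            System.out.println("gave up after 200 ms: " + e.getMessage());
        }
        pool.giveBack("conn-1");
        // prints conn-1; on an empty pool, borrow(-1) would never return
        System.out.println(pool.borrow(-1));
    }
}
```

This is why a timeout only turns a silent hang into an exception: if borrowed objects are never given back, neither branch can produce a connection.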
Stepping further with F8 leads into takeFirst():
public E takeFirst() throws InterruptedException {
    this.lock.lock();
    Object var2;
    try {
        Object x;
        while ((x = this.unlinkFirst()) == null) {
            this.notEmpty.await();
        }
        var2 = x;
    } finally {
        this.lock.unlock();
    }
    return var2;
}
When I saw the lock here, I began to suspect that all API requests were blocked on it.
So I SSHed into the server again, installed Arthas (Alibaba's open-source Java diagnostic tool), and ran its thread command.
A large number of http-nio threads were waiting. The http-nio-8083-exec-* threads are Tomcat's worker threads that handle incoming HTTP requests.
I picked one of them at random and looked at its stack:
thread 428
This confirmed that the API was hanging in the code that acquires a Redis connection.
Reading this stack: all of these threads were waiting for the object @53e5504e to release the lock. A global search for 53e5504e in the jstack output found no thread holding that object.
At this point I could be sure the problem lay in acquiring a Redis connection, but not yet why connections could not be obtained.
So I ran Arthas's thread -b (which finds the thread currently blocking other threads).
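It returned nothing. That was not what I expected; I thought it would find a blocking thread. So I read the documentation for this command, which says that thread -b currently only finds threads blocked by the synchronized keyword, not threads blocked on a java.util.concurrent Lock.
Well, ours happened to be the latter: takeFirst() waits on a java.util.concurrent lock.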
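A stdlib-only sketch of why no culprit showed up, assuming the stuck threads were parked in takeFirst() as shown above (the class and thread names are hypothetical). Inside await(), a thread releases the deque's lock and parks on the notEmpty Condition. The JVM reports it as WAITING with no lock owner, because nobody holds anything: it is waiting for a signal that only comes when an object is returned.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical demo: a worker stuck in takeFirst() on an empty deque, like the Tomcat threads
public class WaitingOnCondition {

    static ThreadInfo parkedTakerInfo() {
        LinkedBlockingDeque<Object> idle = new LinkedBlockingDeque<>(); // no idle "connections"
        Thread taker = new Thread(() -> {
            try {
                idle.takeFirst(); // parks on the deque's internal notEmpty Condition
            } catch (InterruptedException e) {
                // interrupted at the end of the demo; just exit
            }
        }, "http-nio-demo-exec-1");
        taker.setDaemon(true);
        taker.start();
        try {
            // poll until the worker has actually parked
            while (taker.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            return ManagementFactory.getThreadMXBean().getThreadInfo(taker.getId());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            taker.interrupt(); // release the demo thread
        }
    }

    public static void main(String[] args) {
        ThreadInfo info = parkedTakerInfo();
        System.out.println("state: " + info.getThreadState());         // WAITING
        System.out.println("lock owner id: " + info.getLockOwnerId()); // -1: no thread owns anything
    }
}
```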
Rethinking the approach, I changed the Redis pool configuration to set the connection acquisition timeout to 2 s, so that the next time the problem appeared I could see what the application was actually doing.
I added the following configuration:
JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory();
.......
JedisPoolConfig config = new JedisPoolConfig();
// wait at most 2 s for a pooled connection, then throw instead of blocking forever
config.setMaxWaitMillis(2000);
.......
jedisConnectionFactory.afterPropertiesSet();
Then I restarted the service and waited.
A day later, the problem appeared again.
I SSHed into the server, checked the Tomcat access log, and found a large number of API requests returning 500, along with the following exception:
org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:140)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:229)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:57)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:128)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:91)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:78)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:177)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:152)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:85)
    at org.springframework.data.redis.core.DefaultHashOperations.get(DefaultHashOperations.java:48)
Tracing the first 500 error back to its source, I found the following code:
.......
Cursor c = stringRedisTemplate.getConnectionFactory().getConnection().scan(options);
while (c.hasNext()) {
    .....
}
Analyzing this code: stringRedisTemplate.getConnectionFactory().getConnection() borrows a RedisConnection from the pool, but nothing afterwards releases it.
In other words, the connection is leased from the Redis connection pool and never released or returned. Even after the business logic has finished and the connection is no longer in use, the pool still counts it as active rather than idle.
Normally, every borrowed connection should be returned to the pool once the work is done.
That was the root cause.
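A stdlib-only simulation of the leak, for illustration only (the class is hypothetical; the pool size of 8 matches the commons-pool2 default maxTotal, and the 50 ms timeout is arbitrary). Each simulated request borrows a connection and never returns it, so after eight requests the pool is empty and every further borrow has to wait.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-size pool: connections borrowed but never returned exhaust it
public class LeakDemo {
    private final LinkedBlockingDeque<Integer> idle = new LinkedBlockingDeque<>();

    LeakDemo(int size) {
        for (int i = 0; i < size; i++) {
            idle.offerLast(i); // each Integer stands in for one Redis connection
        }
    }

    /** Returns a connection, or null if none became idle within timeoutMillis. */
    Integer borrow(long timeoutMillis) {
        try {
            return idle.pollFirst(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void giveBack(Integer conn) {
        idle.offerLast(conn);
    }

    int idleCount() {
        return idle.size();
    }

    /** Mimics the buggy scan code: borrows and never gives back. Returns how many requests were served. */
    static int leakyRequests(LeakDemo pool, int requests) {
        int served = 0;
        for (int i = 0; i < requests; i++) {
            Integer conn = pool.borrow(50);
            if (conn == null) {
                break; // pool exhausted: with no timeout, this request would wait forever
            }
            served++;
            // ... use conn, then forget to call pool.giveBack(conn)
        }
        return served;
    }

    public static void main(String[] args) {
        LeakDemo pool = new LeakDemo(8);
        System.out.println("served before exhaustion: " + leakyRequests(pool, 100)); // 8
        System.out.println("idle connections left: " + pool.idleCount()); // 0
    }
}
```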
Summary: Spring's StringRedisTemplate wraps the common Redis operations, but (at least in the version used here) it does not directly expose some commands, such as SCAN. For those, you need to work with the underlying connection yourself.
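Obtaining a connection with
stringRedisTemplate.getConnectionFactory().getConnection()
is not recommended, because the connection is never returned to the pool unless you release it yourself.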
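Instead, run the command through a callback:
stringRedisTemplate.execute(new RedisCallback<Cursor<byte[]>>() {
    @Override
    public Cursor<byte[]> doInRedis(RedisConnection connection) throws DataAccessException {
        // the template obtains the connection and releases it after this callback returns
        return connection.scan(options);
    }
});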
Alternatively, if you do take a connection directly, release it after use with
RedisConnectionUtils.releaseConnection(conn, factory);
which returns it to the pool.
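RedisTemplate.execute() follows a borrow, run-callback, release-in-finally pattern. The sketch below reproduces that pattern with a hypothetical pool (the class name, pool size and timeout are illustrative; this is not Spring's API), so a connection goes back to the pool even if the callback throws. Any work that needs the connection should be finished inside the callback, since the connection is released as soon as the callback returns.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical template: lends a pooled "connection" to a callback and always returns it
public class CallbackTemplate {
    private final LinkedBlockingDeque<StringBuilder> idle = new LinkedBlockingDeque<>();
    private final long maxWaitMillis;

    CallbackTemplate(int size, long maxWaitMillis) {
        for (int i = 0; i < size; i++) {
            idle.offerLast(new StringBuilder()); // stand-in for a connection
        }
        this.maxWaitMillis = maxWaitMillis;
    }

    <T> T execute(Function<StringBuilder, T> callback) {
        StringBuilder conn;
        try {
            conn = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        if (conn == null) {
            // bounded wait, like setMaxWaitMillis: fail loudly instead of hanging
            throw new IllegalStateException("Could not get a resource from the pool");
        }
        try {
            return callback.apply(conn);
        } finally {
            idle.offerLast(conn); // returned even if the callback throws
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        CallbackTemplate template = new CallbackTemplate(8, 2000);
        for (int i = 0; i < 100; i++) {
            template.execute(conn -> conn.length());
        }
        System.out.println(template.idleCount()); // 8: every connection came back
    }
}
```

Combined with a borrow timeout, a leak elsewhere then surfaces as an exception in the logs rather than as threads hanging silently.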
Also, avoid the KEYS command in Redis (it walks the whole keyspace in a single blocking call), and configure the Redis pool sensibly, in particular the borrow timeout. Otherwise a connection leak produces no error log and no exception, which makes the problem very hard to locate.