A colleague misused Redis and froze the application. I'm speechless

Source: https://my.oschina.net/xiaomu0082/blog/2990388

First, the problem: the APIs in our intranet sandbox environment kept freezing for a week, and when it happened every API stopped responding.

At the beginning, when the testers complained that the environment was slow, we restarted the application, it returned to normal, and we left it at that.

But the problem kept coming back more and more often, and more and more colleagues began to complain, so we suspected a problem in the code and started to investigate.

First, I could not reproduce the problem in my local IDE. While the application was stuck, the database and Redis were both healthy, and there were no unusual error logs. I began to suspect the machine in the sandbox environment (the test environment itself is fragile!).

So I SSHed into the server and ran the following command:

top

The machine looked normal, so I decided to look at the JVM thread stacks.

First, find the threads that consume the most resources:

top -H -p 12798

I picked the top three resource-consuming threads.

Then I used jstack to look at their thread stacks. jstack prints thread IDs in hexadecimal, and 12799 in hexadecimal is 31ff:

jstack 12798 | grep 31ff
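The decimal-to-hexadecimal step is easy to get wrong by hand. Here is a minimal sketch of the conversion, assuming the thread ID comes from `top -H` and that jstack shows it as `nid=0x...`; the class and method names are made up for illustration.

```java
/** Converts a decimal thread ID from `top -H` into the hex form that jstack prints. */
final class NidHex {
    static String toNid(long decimalThreadId) {
        // jstack shows native thread IDs as lowercase hex with a 0x prefix
        return "0x" + Long.toHexString(decimalThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(12799)); // prints 0x31ff
    }
}
```

On a shell, `printf '%x\n' 12799` gives the same digits.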

I didn't see any problem, so I ran the search again with 10 lines of context above and below each match.

Some threads were in a locked state, but none of their stacks contained business code, so I ignored them. With no leads at this point, I thought it over and decided to give up on the stuck machine.

To preserve the scene, I dumped the heap of the problem process and then restarted the test environment application in debug mode, planning to debug the machine remotely when the problem came back.

The next day, the problem reappeared. I asked the operations team to change the nginx forwarding so that the problem instance was taken out of rotation, and then debugged Tomcat remotely.

I picked an API at random and set a breakpoint at its entry point. Then the tragedy began: nothing happened! The API call just waited for a response and never reached the breakpoint.

At this point I was a little confused and took a moment to calm down. Then I set a breakpoint in the AOP code that runs before the entry point and debugged again. This time the breakpoint was hit. After pressing F8 N times, I found that execution got stuck while running a Redis command.

I kept stepping in and finally found the problem at this spot in the Jedis connection code:

/**
 * Returns a Jedis instance to be used as a Redis connection. The instance can be newly created or retrieved from a
 * pool.
 *
 * @return Jedis instance ready for wrapping into a {@link RedisConnection}.
 */
protected Jedis fetchJedisConnector() {
    try {
        if (usePool && pool != null) {
            return pool.getResource();
        }
        Jedis jedis = new Jedis(getShardInfo());
        // force initialization (see Jedis issue #82)
        jedis.connect();
        return jedis;
    } catch (Exception ex) {
        throw new RedisConnectionFailureException("Cannot get Jedis connection", ex);
    }
}

After the call to pool.getResource() above, the thread starts to wait:

public T getResource() {
    try {
        return internalPool.borrowObject();
    } catch (Exception e) {
        throw new JedisConnectionException("Could not get a resource from the pool", e);
    }
}

internalPool.borrowObject() is the call that borrows a connection from the pool. Stepping into it:

public T borrowObject(long borrowMaxWaitMillis) throws Exception {
    this.assertOpen();
    AbandonedConfig ac = this.abandonedConfig;
    if (ac != null && ac.getRemoveAbandonedOnBorrow() && this.getNumIdle() < 2 && this.getNumActive() > this.getMaxTotal() - 3) {
        this.removeAbandoned(ac);
    }
    PooledObject p = null;
    boolean blockWhenExhausted = this.getBlockWhenExhausted();
    long waitTime = 0L;
    while(p == null) {
        boolean create = false;
        if (blockWhenExhausted) {
            p = (PooledObject)this.idleObjects.pollFirst();
            if (p == null) {
                create = true;
                p = this.create();
            }
            if (p == null) {
                if (borrowMaxWaitMillis < 0L) {
                    p = (PooledObject)this.idleObjects.takeFirst();
                } else {
                    waitTime = System.currentTimeMillis();
                    p = (PooledObject)this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
                    waitTime = System.currentTimeMillis() - waitTime;
                }
            }
            if (p == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            // ... (rest of the method omitted)

The part that matters is this:

if (p == null) {
    if (borrowMaxWaitMillis < 0L) {
        p = (PooledObject)this.idleObjects.takeFirst();
    } else {
        waitTime = System.currentTimeMillis();
        p = (PooledObject)this.idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
        waitTime = System.currentTimeMillis() - waitTime;
    }
}

When borrowMaxWaitMillis < 0, the takeFirst() branch is taken every time, and the thread waits there indefinitely. I suspected that this value had never been configured.

I checked the Redis pool configuration and MaxWaitMillis was indeed not set. Setting it only sends execution through the else branch, which ends in an exception once the wait times out, so that alone does not solve the problem.
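The difference between the two branches can be reproduced with nothing but the JDK. The sketch below is a hypothetical stand-in for the pool's idle queue, not the commons-pool code: a negative wait calls takeFirst() and blocks until something is returned, while a non-negative wait calls the timed pollFirst() and fails with an exception.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the pool's idle-object queue; not commons-pool code. */
final class BorrowWaitDemo {
    /** Negative maxWaitMillis blocks indefinitely; otherwise the wait is bounded. */
    static String borrow(LinkedBlockingDeque<String> idle, long maxWaitMillis) {
        try {
            String conn;
            if (maxWaitMillis < 0) {
                conn = idle.takeFirst(); // waits until another thread returns a connection
            } else {
                conn = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS); // null on timeout
            }
            if (conn == null) {
                throw new NoSuchElementException("Timeout waiting for idle object");
            }
            return conn;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a connection", e);
        }
    }

    public static void main(String[] args) {
        LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();
        idle.add("conn-1");
        System.out.println(borrow(idle, 100)); // the only connection is handed out
        try {
            borrow(idle, 100); // nothing was returned, so this gives up after the timeout
        } catch (NoSuchElementException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

A bounded wait does not fix an exhausted pool, but it turns a silent hang into an exception with a stack trace.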

Continuing with F8:

public E takeFirst() throws InterruptedException {
    this.lock.lock();
    Object var2;
    try {
        Object x;
        while((x = this.unlinkFirst()) == null) {
            this.notEmpty.await();
        }
        var2 = x;
    } finally {
        this.lock.unlock();
    }
    return var2;
}

When I saw the word lock here, I began to suspect that all API requests were blocked.

So I SSHed into the server again and installed Arthas (an open-source Java diagnostic tool from Alibaba).

I ran the thread command.

A large number of http-nio threads were in the waiting state. The http-nio-8083-exec-N threads are the Tomcat worker threads that handle HTTP requests.

I picked one of them at random and looked at its stack:

thread 428

This confirms that the API requests hang in the code that obtains a Redis connection.

Reading the stack, all of these threads are waiting on the object @53e5504e. I searched the jstack output for 53e5504e but could not find a thread that holds it.

At this point, the cause of the problem can be narrowed down to acquiring a Redis connection. Why no connection can be obtained is still unknown.

Next I ran Arthas's thread -b (thread -b finds the thread that is currently blocking other threads).

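No result. This was not what I expected, since there should have been a blocking thread to find. So I read the documentation for this command, which notes that it currently only finds threads blocked by the synchronized keyword and does not yet support java.util.concurrent.Lock.

Well, we happen to be the latter....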
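One likely reason no owner turned up: a thread parked inside takeFirst() is waiting on a Condition object, and a condition has no owning thread to report. The following sketch, using only the JDK's ThreadMXBean and a hypothetical demo thread, shows what the JVM reports for such a thread. It is an illustration of the mechanism, not a reconstruction of the dump from the incident.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.LinkedBlockingDeque;

/** Hypothetical demo: a thread parked in takeFirst() waits on a Condition, which has no owner. */
final class ConditionWaitDemo {
    /** Starts a thread that blocks on an empty deque and returns its ThreadInfo once it is waiting. */
    static ThreadInfo parkedThreadInfo() {
        LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();
        Thread waiter = new Thread(() -> {
            try {
                idle.takeFirst(); // blocks: nothing is ever put into the deque
            } catch (InterruptedException ignored) {
                // interrupted during cleanup below
            }
        }, "demo-waiter");
        waiter.setDaemon(true);
        waiter.start();
        try {
            // Poll until the thread has actually parked.
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            return ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the demo thread", e);
        } finally {
            waiter.interrupt(); // let the demo thread exit
        }
    }

    public static void main(String[] args) {
        ThreadInfo info = parkedThreadInfo();
        System.out.println("waiting on: " + info.getLockName());    // a ConditionObject@<hash>
        System.out.println("owner id:   " + info.getLockOwnerId()); // -1, meaning no owner
    }
}
```

So searching a thread dump for the hash of the waited-on object finds only the waiters, which matches what the jstack search showed.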

I organized my thoughts again. This time I changed the Redis pool configuration, set the connection acquisition timeout to 2 seconds, and planned to observe what the application was doing when the problem came back.

The added configuration:

JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory();
.......
JedisPoolConfig config = new JedisPoolConfig();
config.setMaxWaitMillis(2000);
.......
jedisConnectionFactory.afterPropertiesSet();

I restarted the service and waited....

A day later, it happened again.

I SSHed into the server, checked the Tomcat access log, and found a large number of API requests returning 500:

org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:140)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:229)
    at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:57)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:128)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:91)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:78)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:177)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:152)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:85)
    at org.springframework.data.redis.core.DefaultHashOperations.get(DefaultHashOperations.java:48)

I traced back to where the first 500 occurred and found the following code:

.......
Cursor c = stringRedisTemplate.getConnectionFactory().getConnection().scan(options);
while (c.hasNext()) {
    .......
}

Analyzing this code: stringRedisTemplate.getConnectionFactory().getConnection() takes a RedisConnection from the pool, and nothing is done with that connection afterwards.

In other words, connections borrowed from the Redis connection pool are never released or returned to the pool. Even though the business logic has finished and the RedisConnection is no longer doing anything, the pool still counts it as in use and never marks it idle again.
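The effect is easy to reproduce with a toy pool. The following is a minimal sketch with a hypothetical two-connection pool built on a JDK deque (it is not Jedis or commons-pool): the leaky request borrows and never gives back, so the third call finds nothing, while the safe request returns the connection in a finally block and can run any number of times.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-size pool used only to illustrate the leak; not Jedis or commons-pool. */
final class LeakDemo {
    private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

    LeakDemo(int size) {
        for (int i = 1; i <= size; i++) {
            idle.add("conn-" + i);
        }
    }

    /** Returns a connection, or null if none comes back to the pool within 50 ms. */
    String borrow() {
        try {
            return idle.pollFirst(50, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void giveBack(String conn) {
        idle.addFirst(conn);
    }

    /** Leaky usage: the connection is borrowed and never given back. */
    boolean leakyRequest() {
        String conn = borrow();
        return conn != null; // work would happen here, but the connection is not returned
    }

    /** Correct usage: the connection always goes back, even if the work throws. */
    boolean safeRequest() {
        String conn = borrow();
        if (conn == null) {
            return false;
        }
        try {
            return true; // work would happen here
        } finally {
            giveBack(conn);
        }
    }

    public static void main(String[] args) {
        LeakDemo leaky = new LeakDemo(2);
        System.out.println(leaky.leakyRequest()); // true
        System.out.println(leaky.leakyRequest()); // true
        System.out.println(leaky.leakyRequest()); // false: both connections have leaked

        LeakDemo safe = new LeakDemo(2);
        for (int i = 0; i < 5; i++) {
            safe.safeRequest();
        }
        System.out.println(safe.safeRequest()); // true: connections keep coming back
    }
}
```

With an unbounded wait instead of the 50 ms timeout, the third leaky request would hang forever, which is what the sandbox APIs were doing.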

Normally, a connection should be returned to the pool as soon as the work that borrowed it is done.

At this point, the problem has been found.

Summary: Spring's stringRedisTemplate wraps the common Redis operations, but it did not directly support commands such as SCAN and SETNX for us. For those special commands you need to get hold of the underlying Jedis connection.

Using

stringRedisTemplate.getConnectionFactory().getConnection()

for this is not recommended.

Instead, we can use

stringRedisTemplate.execute(new RedisCallback<Cursor>() {
    @Override
    public Cursor doInRedis(RedisConnection connection) throws DataAccessException {
        return connection.scan(options);
    }
});

to run the command, or, after using a connection obtained directly, call

RedisConnectionUtils.releaseConnection(conn, factory);

to release it.
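The callback style works because the template, not the caller, owns the borrow and release steps. The sketch below imitates that pattern with a hypothetical template and fake connection (it is not Spring's RedisTemplate). It also shows one thing to watch for: anything lazy that still needs the connection, as scan cursors generally do, should be consumed inside the callback, because the connection is released as soon as the callback returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical template that mimics the callback style; it is not Spring's RedisTemplate. */
final class CallbackTemplateDemo {
    /** Minimal fake connection whose scan results can only be read while it is open. */
    static final class FakeConnection {
        boolean open = true;

        Iterator<String> scan() {
            Iterator<String> keys = Arrays.asList("k1", "k2").iterator();
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    requireOpen();
                    return keys.hasNext();
                }

                @Override
                public String next() {
                    requireOpen();
                    return keys.next();
                }
            };
        }

        private void requireOpen() {
            if (!open) {
                throw new IllegalStateException("connection already released");
            }
        }
    }

    /** Borrows a connection, runs the callback, and releases the connection on every path. */
    static <T> T execute(Function<FakeConnection, T> callback) {
        FakeConnection conn = new FakeConnection();
        try {
            return callback.apply(conn);
        } finally {
            conn.open = false; // stands in for returning the connection to the pool
        }
    }

    /** Consumes the lazy scan results inside the callback, while the connection is still open. */
    static List<String> scanKeys() {
        return execute(conn -> {
            List<String> keys = new ArrayList<>();
            conn.scan().forEachRemaining(keys::add);
            return keys;
        });
    }

    public static void main(String[] args) {
        System.out.println(scanKeys()); // prints [k1, k2]
    }
}
```

Returning the iterator itself from the callback and reading it afterwards fails in this sketch, since the fake connection has been released by then.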

Also, using the keys command in Redis is not recommended, and the Redis pool should be configured sensibly. Otherwise a problem like this produces no error log and no exception, and it is very hard to locate.

25 October 2021, 07:13 | Views: 1375
