Caching in one article: consistency strategies, avalanche, breakdown, and penetration

1, Cache principle

The first optimization to consider in a high-concurrency scenario is adding a cache. Redis in particular can hold a copy of hot database data in memory, which reduces reads against the database, lowers its load, and speeds up system response. However, it also introduces new problems: you must consider data consistency and guard against cache breakdown, penetration, and avalanche.

1. Implementation steps

First, check whether the requested data is in the cache. If it is, return the cached value directly. If not, query the database, write the result into the cache, and then return it. If the database has no matching row either, return null.

With data consistency in mind, the read path follows a fairly standard pattern: try Redis first; on a hit, return the cached value; on a miss, query the database and synchronize the result back into the cache.

public Result query(String id) {
    Result result = null;
    //1. Fetch data from Redis cache
    result = (Result)redisTemplate.opsForValue().get(id);
    if (null != result) {
        System.out.println("Get data from cache");
        return result;
    }
    //2. Query the DB. If a row is found, write it back to Redis; otherwise return null
    System.out.println("Get data from database");
    result = Dao.query(id);
    if (null != result) {
        //Write back with a TTL in a single call, so the key can never be left without an expiration
        redisTemplate.opsForValue().set(id, result, 20, TimeUnit.SECONDS);
    }
    return result;
}

For write operations (add, delete, update), you can simply delete the cached value for the key before performing the DB operation. The logic stays clear and simple and maintenance cost is low; the price is that the next read must query the database again.

public Entity update(Entity entity) {
    //Evict the stale cache entry first, then write through to the DB
    redisTemplate.delete(entity.getId());
    Dao.update(entity);
    return entity;
}

public Entity add(Entity entity) {
    redisTemplate.delete(entity.getId());
    Dao.insert(entity);
    return entity;
}

2. Cache update strategy

In general, caching suits data that is accessed frequently, read far more often than it is written, and tolerant of relaxed consistency. If those conditions are not met, maintaining a cached copy adds little value. In practice, choose a caching scheme that fits the business scenario. The four update strategies below are ordered from strongest to weakest consistency.

Update strategy, characteristics, and applicable scenarios:

- Real-time synchronous update: strong consistency, but intrusive and tightly coupled to business code; suits financial transfer business and the like.
- Real-time asynchronous update (MQ / publish-subscribe / observer pattern): decoupled from the business, but consistency lags; not suitable for write-heavy scenarios.
- Expiration-based invalidation (TTL): some staleness, and a risk of avalanche; suits read-heavy, write-light data that can tolerate a delay.
- Scheduled full refresh: a scheduled task rebuilds the whole cache; suits statistics that are accessed frequently and updated periodically.
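As a minimal sketch of the asynchronous-update strategy, a plain in-process observer pattern can stand in for the MQ / publish-subscribe machinery (all class and method names here are illustrative, not from the article's codebase; in production the publish step would go through a message queue):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

//Sketch: the DB writer publishes an invalidation event, and cache
//listeners evict the stale entry instead of the writer touching the cache directly.
public class CacheInvalidationBus {
    public interface Listener { void onUpdate(String key); }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    public void subscribe(Listener l) { listeners.add(l); }

    //In production this would be a message published to MQ; here it is an
    //in-process call so the sketch stays self-contained.
    public void publish(String key) {
        for (Listener l : listeners) l.onUpdate(key);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("42", "stale-value");

        CacheInvalidationBus bus = new CacheInvalidationBus();
        bus.subscribe(cache::remove);   //the listener evicts the key

        //The writer updates the DB (omitted) and then publishes the event
        bus.publish("42");
        System.out.println(cache.containsKey("42")); //prints false
    }
}
```

The writer never needs to know who is caching the data, which is exactly the decoupling (and the delay) the strategy list describes.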

2, Cache avalanche and breakdown

1. Cache avalanche concept

A cache avalanche happens when many keys are given the same expiration time, so they all become invalid at the same moment; every request then falls through to the DB, which is suddenly overloaded. It differs from cache breakdown: breakdown is many concurrent queries for one expired key, while an avalanche is many different keys expiring together so many lookups miss the cache at once.

Solution

Disperse cache expiration times. For example, add a random offset of 1-5 minutes to the base TTL, so that keys rarely share an expiration time and mass invalidation becomes unlikely.
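A minimal sketch of that jitter, assuming the 20-second base TTL used in the earlier example (the method name and the 1-5 minute range are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

//Sketch: add 1-5 minutes of random jitter to the base TTL so keys
//written at the same time do not all expire at the same instant.
public class TtlJitter {
    static final long BASE_TTL_MS = 20_000L;

    static long ttlWithJitter() {
        //Random offset in [60_000, 300_000] ms, i.e. 1 to 5 minutes
        long jitter = ThreadLocalRandom.current().nextLong(60_000L, 300_001L);
        return BASE_TTL_MS + jitter;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long ttl = ttlWithJitter();
            //Every TTL must land in [80s, 320s]
            System.out.println(ttl >= 80_000L && ttl <= 320_000L);
        }
    }
}
```

With Spring Data Redis this would plug into the earlier read path as roughly `redisTemplate.opsForValue().set(id, result, ttlWithJitter(), TimeUnit.MILLISECONDS);`.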

Use locking or queuing to guarantee single-threaded (single-process) cache writes, so that a miss does not translate into a flood of concurrent requests against the underlying storage system.

The first scheme is easy to implement. The second adds a blocking exclusive lock: when a cache lookup misses, only one thread is allowed to query the DB, preventing a large number of concurrent requests for the same id from landing on the database.

public Result query(String id) {
    //1. Fetch data from the cache
    Result result = (Result) redisTemplate.opsForValue().get(id);
    if (result != null) {
        logger.info("Get data from cache");
        return result;
    }

    //2. Queue on a blocking exclusive lock: one lock per id, so only one thread per id reaches the DB
    doLock(id);
    try {
        //Double check: threads that waited on the lock can now hit the cache directly
        result = (Result) redisTemplate.opsForValue().get(id);
        if (result != null) {
            logger.info("Get data from cache");
            return result; //Waiting threads return here
        }

        result = dao.query(id);
        //3. If the database returned a row, cache it with a TTL for subsequent queries
        if (result != null) {
            redisTemplate.opsForValue().set(id, result, 20, TimeUnit.SECONDS);
        }
        return result;
    } finally {
        //4. Unlock
        releaseLock(id);
    }
}

//One lock per id, so requests for different ids do not block each other
private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

private void releaseLock(String id) {
    ReentrantLock oldLock = locks.get(id);
    if (oldLock != null && oldLock.isHeldByCurrentThread()) {
        oldLock.unlock();
    }
}

private void doLock(String id) {
    //Requests for the same id must share one lock; different ids must use different locks
    ReentrantLock newLock = new ReentrantLock();
    //If a lock already exists for this id, newLock is discarded and the existing one is used
    ReentrantLock oldLock = locks.putIfAbsent(id, newLock);
    if (oldLock == null) {
        newLock.lock();
    } else {
        oldLock.lock();
    }
}

Note: this lock-queuing scheme only serializes threads within a single JVM; in a distributed environment you would also need a distributed lock. Threads block while waiting, which makes for a poor user experience, so it is rarely used in truly high-concurrency scenarios.

2. Cache breakdown concept

Cache breakdown: for a key that does exist, a large number of requests arrive just as its cache entry expires. They all break through to the DB at the same moment, causing an instantaneous spike in database load.

Solution

Before going to the DB for the key, use SETNX (set if not exists) to create a separate short-lived lock key. Only the caller that succeeds in creating it queries the DB and rebuilds the cache; the lock key is deleted once the rebuild finishes.
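A self-contained sketch of this mutex idea, using `ConcurrentHashMap.putIfAbsent` as a stand-in for Redis SETNX so it runs without a Redis server (class and key names are illustrative; with Spring Data Redis the lock step would be roughly `redisTemplate.opsForValue().setIfAbsent(lockKey, "1", ttl, unit)`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

//Sketch: when a hot key expires, many callers race to rebuild it.
//Each tries to create a short-lived lock key; only the winner queries the DB.
public class BreakdownGuard {
    static final Map<String, String> redis = new ConcurrentHashMap<>(); //stand-in for Redis
    static final AtomicInteger dbQueries = new AtomicInteger();

    static String load(String key) {
        String cached = redis.get(key);
        if (cached != null) return cached;

        String lockKey = "lock:" + key;
        if (redis.putIfAbsent(lockKey, "1") == null) { //SETNX succeeded
            try {
                //Double check after winning the lock: a previous winner may have rebuilt already
                cached = redis.get(key);
                if (cached != null) return cached;

                dbQueries.incrementAndGet();           //simulate the single DB query
                String value = "value-for-" + key;
                redis.put(key, value);                 //rebuild the cache
                return value;
            } finally {
                redis.remove(lockKey);                 //delete the short-lived lock key
            }
        }
        //Losers could sleep briefly and retry; here they just return what is cached, if anything
        return redis.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) ts[i] = new Thread(() -> load("42"));
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();
        System.out.println(dbQueries.get() == 1); //exactly one thread reached the "DB"
    }
}
```

With real Redis, the lock key would also carry a short TTL so a crashed rebuilder cannot leave the lock stuck forever.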

3, Cache penetration

1. Cache penetration concept

Cache penetration refers to requests for data that exists neither in the cache nor in the database, such as an id of "-1" or an absurdly large nonexistent id, yet is requested over and over. Such traffic is often an attack, and because every request lands on the database, it puts the database under excessive pressure.

Solution: Bloom filter

A Bloom filter is used much like a Java Set: it answers whether an element (key) may be in a set. Unlike an ordinary hash set, it does not need to store the key's value; each key occupies only k bits, a small fingerprint used to test whether the key is in the set.
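To make the k-bits-per-key idea concrete, here is a toy Bloom filter built on `java.util.BitSet` (for illustration only; the Guava BloomFilter used in the steps below is the production-grade equivalent):

```java
import java.util.BitSet;

//Toy Bloom filter: each key sets k bit positions. Membership tests can
//give false positives but never false negatives.
public class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int k;   //number of hash functions

    public ToyBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    //Simple seeded string hash mapped into the bit array
    private int hash(String key, int seed) {
        int h = seed;
        for (int i = 0; i < key.length(); i++) h = h * 31 + key.charAt(i);
        return Math.floorMod(h, size);
    }

    public void put(String key) {
        for (int i = 1; i <= k; i++) bits.set(hash(key, i));
    }

    public boolean mightContain(String key) {
        for (int i = 1; i <= k; i++) if (!bits.get(hash(key, i))) return false;
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter bf = new ToyBloomFilter(1 << 16, 3);
        bf.put("1001");
        System.out.println(bf.mightContain("1001")); //a stored key always matches: true
        System.out.println(bf.mightContain("-1"));   //unknown id rejected here: false
    }
}
```

The false-positive rate depends on the bit-array size and k; real implementations size both from the expected number of insertions.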

Use steps:

Load the full list of valid ids into the Bloom filter at startup

//Guava classes: com.google.common.hash.BloomFilter, com.google.common.hash.Funnels,
//com.google.common.base.Charsets
private BloomFilter<String> bf = null;

//@PostConstruct makes the container call this method automatically after the bean is created
@PostConstruct
public void init() {
    //After bean initialization, instantiate the Bloom filter and load the data into it
    List<Entity> entities = initList();
    //Size the filter for the expected number of insertions
    bf = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), entities.size());
    for (Entity entity : entities) {
        bf.put(entity.getId());
    }
}

A request reaches the database only after it passes through the Bloom filter

public Entity query(String id) {
    //Check the Bloom filter before touching the cache or the database
    if (!bf.mightContain(id)) {
        log.info("Illegal access: " + id);
        return null;
    }
    log.info("Get data from database");
    Entity entity = super.query(id);
    return entity;
}

In this way, malicious requests for nonexistent data are intercepted at the filter layer and never reach the underlying database.

Tags: Database Redis Cache

Posted on Thu, 21 Oct 2021 09:43:50 -0400 by acac