After reading about these three Redis cache problems, you will be able to discuss them confidently with an interviewer.


In daily development, we use a database to store data. Since most ordinary systems do not face high concurrency, this usually causes no problems.

However, once a large volume of requests is involved, such as a flash sale of certain products or a sudden surge of visits to the home page, a system that stores data only in a database runs into serious performance problems, because the database is disk-oriented and disk reads and writes are slow. For details of how disk reads and writes work, please refer to this article [].

At such a moment thousands of requests arrive, and the system must complete thousands of reads and writes in a very short time. The database often cannot bear this load, which can easily bring the database system down and ultimately cause a serious production incident: service downtime.

To overcome these problems, projects usually introduce NoSQL technology, such as an in-memory database that also provides some persistence.

Redis is one such NoSQL technology. Using Redis as a cache greatly improves application performance and efficiency, especially for data queries.

At the same time, it also brings some problems. The most critical is data consistency between the cache and the database. Strictly speaking, this problem has no complete solution: if strong data consistency is required, caching cannot be used.

Other typical problems are cache penetration, cache breakdown and cache avalanche. This article proposes solutions to these three problems through actual code. After all, Redis cache problems are high-frequency interview questions, so we should master both the theory and the practice.

Cache penetration

Cache penetration refers to querying data that exists in neither the cache nor the database. Because nothing is ever cached for such a request, every query goes straight to the database, and the access pressure on the database keeps growing. There are two solutions for cache penetration:

  1. Cache empty objects: the code is simple to maintain, but the effect is limited.

  2. Bloom filter: the code is complex to maintain, but the effect is very good.

Cache empty objects

Caching an empty object means that when the requested data exists neither in the cache nor in the database, the first request misses the cache, queries the database, and gets an empty result. That empty result is then cached as an empty object.

When the same data is requested again, the request hits the cached empty object and no longer reaches the database. The schematic diagram of caching empty objects is as follows:


The implementation code of caching empty objects is as follows:

```java
import org.springframework.beans.factory.annotation.Autowired;

public class UserServiceImpl {

    @Autowired
    UserDAO userDAO;

    @Autowired
    RedisCache redisCache;

    public User findUser(Integer id) {
        Object object = redisCache.get(Integer.toString(id));
        // Exists in cache: return directly
        if (object != null) {
            // If it is a cached empty object, return null directly
            if (object instanceof NullValueResultDO) {
                return null;
            }
            return (User) object;
        } else {
            // Not in cache: query the database
            User user = userDAO.getUser(id);
            if (user != null) {
                // Store the real result in the cache
                redisCache.put(Integer.toString(id), user);
            } else {
                // Store an empty object in the cache
                redisCache.put(Integer.toString(id), new NullValueResultDO());
            }
            return user;
        }
    }
}
```

The code for caching empty objects is very simple, but it brings a significant problem: the cache can fill up with many empty objects, which occupy memory and waste resources. One solution is to give empty objects a short expiration time, as follows:

```java
// When caching an empty object, also give it an expiration time of 60 seconds
redisCache.put(Integer.toString(id), new NullValueResultDO(), 60);
```
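To see what the short expiry buys, here is a minimal self-contained sketch that replaces Redis with an in-memory map. The classes `TtlCache` and `UserLookup`, the `NULL_MARKER` sentinel (standing in for `NullValueResultDO`) and the injectable clock are hypothetical, introduced only so the behavior can be run and tested; a real project would rely on Redis's own key expiration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical in-memory stand-in for a Redis cache with per-key expiry.
class TtlCache {
    private static final class Entry {
        final Object value;
        final long expiresAtMillis;
        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> map = new HashMap<>();
    private final LongSupplier clock; // injectable clock so expiry can be simulated

    TtlCache(LongSupplier clock) { this.clock = clock; }

    void put(String key, Object value, long ttlMillis) {
        map.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    Object get(String key) {
        Entry e = map.get(key);
        if (e == null) return null;
        if (clock.getAsLong() >= e.expiresAtMillis) { // expired: behave like a miss
            map.remove(key);
            return null;
        }
        return e.value;
    }
}

class UserLookup {
    static final Object NULL_MARKER = new Object(); // plays the role of NullValueResultDO
    private final TtlCache cache;
    private final Function<Integer, String> database; // returns null for unknown ids
    int databaseHits = 0;

    UserLookup(TtlCache cache, Function<Integer, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String findUser(int id) {
        String key = Integer.toString(id);
        Object cached = cache.get(key);
        if (cached != null) {
            return cached == NULL_MARKER ? null : (String) cached;
        }
        databaseHits++;
        String user = database.apply(id);
        if (user != null) {
            cache.put(key, user, 3_600_000);     // real data: long TTL (1 hour)
        } else {
            cache.put(key, NULL_MARKER, 60_000); // empty object: short TTL (60 s)
        }
        return user;
    }
}
```

A repeated lookup of a missing id is served by the cached empty object, and only after 60 simulated seconds does the next lookup reach the database again, so the junk entry frees itself.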

Bloom filter

A Bloom filter is a probabilistic data structure mainly used to determine whether an element is in a set. It is fast (time efficient) and uses little memory (space efficient), but it has a false positive rate and makes deletion difficult. It can only tell you that an element is definitely not in the set, or that it may be in the set.

Computer science has a well-known trade-off: trade space for time, or time for space. Usually you cannot have both, yet a Bloom filter is efficient in both running time and space. How does it achieve this?

A Bloom filter accepts a certain false positive rate: it may judge that an element not in the set belongs to the set, but it never judges that an element in the set does not belong to it. The characteristics of a Bloom filter are as follows:

  1. A very large bit array (the array contains only 0s and 1s)

  2. Several hash functions

  3. High space efficiency and query efficiency

  4. No false negatives: an element that is in the set is always reported as present.

  5. Possible false positives: an element that is not in the set may still be reported as present.

  6. No deletion method is provided, which makes the data hard to maintain.

  7. The bit array is initialized to 0 and does not store the elements' actual values. When an element is added, each hash function maps it to an array index, and the bit at that index is set to 1.

The following diagram shows how a Bloom filter actually stores and queries data:


After reading the features and diagram above, many readers may still not understand it. Don't worry: next we explain the Bloom filter step by step with diagrams. In a word, a Bloom filter is a large bit array that stores only 0s and 1s.

The structure of an initialized Bloom filter is as follows:


The diagram shows only a very small part of the Bloom filter. In practice, a Bloom filter is a very large array ("large" refers to its length; since each element is a single bit, it does not take much memory).

So how is a piece of data stored in the Bloom filter?

When a piece of data is stored, it is hashed by a hash function (if you are not familiar with hash functions, please refer to this article []), and the resulting hash value is used as an array index; the bit at that index in the initialized bit array is then set to 1. The result is as follows:


When a second value is stored, the result is as follows:


So each time data is stored, every hash function is computed and each result is used as an index: a Bloom filter with m hash functions produces m indexes per element. The insertion process is as follows:

  1. Feed the element to be added to each of the m hash functions

  2. Obtain the m corresponding positions in the bit array

  3. Set these m positions to 1
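The three insertion steps can be sketched as below. Deriving the m positions with double hashing (position i = h1 + i·h2 mod size) is a common technique swapped in here for illustration, not necessarily how the hash functions in the diagrams work; the class name `BitArrayFilter` and the seeds 31 and 131 are hypothetical.

```java
import java.util.BitSet;

class BitArrayFilter {
    private final BitSet bits;
    private final int size;      // length of the bit array
    private final int hashCount; // m, the number of hash functions

    BitArrayFilter(int size, int hashCount) {
        this.bits = new BitSet(size); // every bit starts at 0
        this.size = size;
        this.hashCount = hashCount;
    }

    // Simple polynomial string hash; different seeds give different hash functions.
    private static int seededHash(String value, int seed) {
        int h = 0;
        for (int i = 0; i < value.length(); i++) {
            h = h * seed + value.charAt(i);
        }
        return h;
    }

    // Steps 1 and 2: run the element through m hash functions to get m positions.
    int[] positions(String value) {
        int h1 = seededHash(value, 31);
        int h2 = seededHash(value, 131) | 1; // forced odd: positions differ when size is a power of two
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            result[i] = Math.floorMod(h1 + i * h2, size);
        }
        return result;
    }

    // Step 3: set all m positions to 1.
    void add(String value) {
        for (int p : positions(value)) {
            bits.set(p);
        }
    }

    // An element may be present only if all m of its positions are 1.
    boolean mightContain(String value) {
        for (int p : positions(value)) {
            if (!bits.get(p)) return false;
        }
        return true;
    }
}
```

Because adding only ever turns bits on, every added element keeps all of its positions at 1, which is why there are no false negatives.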

So why is there a false positive rate?

Suppose that after several insertions the Bloom filter contains three values x, y and z. Its storage structure is as follows:


Now suppose we query a value a that does not exist in the Bloom filter. Two hash functions map a to positions 2 and 13, as shown below:


The query finds that positions 2 and 13 both hold 1, but they were set by x and z respectively, not by a. Although a was never added, the Bloom filter judges that it may exist. Because a Bloom filter does not store the element values themselves, it cannot tell these cases apart, and so it has a false positive rate.
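The scenario can be replayed with the hash results written out by hand instead of computed. Only a's positions (2 and 13) and z's positions (10 and 13) come from the text; the positions for x and y are hypothetical values chosen so that x covers position 2.

```java
import java.util.BitSet;

class FalsePositiveDemo {
    // Hash results written out by hand.
    static final int[] X = {2, 10};  // hypothetical
    static final int[] Y = {5, 10};  // hypothetical
    static final int[] Z = {10, 13}; // from the text
    static final int[] A = {2, 13};  // from the text; a is never added

    // A value "might be present" only if all of its positions are 1.
    static boolean allSet(BitSet bits, int[] positions) {
        for (int p : positions) {
            if (!bits.get(p)) return false;
        }
        return true;
    }

    // Store x, y and z by setting each of their positions to 1.
    static BitSet storeXYZ() {
        BitSet bits = new BitSet(16);
        for (int[] element : new int[][] {X, Y, Z}) {
            for (int p : element) bits.set(p);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = storeXYZ();
        // Position 2 was set by x and position 13 by z, so a looks present.
        System.out.println("a might be present: " + allSet(bits, A)); // prints "a might be present: true"
    }
}
```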

The accuracy of a Bloom filter's judgment depends on two factors:

  1. Size of the Bloom filter: the larger the bit array, the lower the false positive rate. This is why Bloom filters are usually very long.

  2. Number of hash functions: more hash functions lower the false positive rate up to an optimal number that depends on the array size and the number of stored elements; beyond that point, each element sets too many bits and the rate rises again.
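Both factors can be checked against the standard approximation of the false positive rate, p ≈ (1 − e^(−k·n/s))^k, where s is the bit-array size, n the number of stored elements and k the number of hash functions (the m of the insertion steps), with the optimum at k ≈ (s/n)·ln 2. The class name `FalsePositiveRate` and the sample sizes below are illustrative assumptions.

```java
class FalsePositiveRate {
    // Standard approximation p = (1 - e^(-k*n/s))^k
    // s = bit-array size, n = stored elements, k = number of hash functions
    static double rate(long s, long n, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / s), k);
    }

    // k that minimises p for given s and n: (s / n) * ln 2, rounded, at least 1
    static int optimalHashCount(long s, long n) {
        return Math.max(1, (int) Math.round((double) s / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000;
        // Factor 1: a larger bit array (with its best k) gives a lower rate.
        for (long s : new long[] {4_000, 8_000, 16_000}) {
            int k = optimalHashCount(s, n);
            System.out.printf("s=%d bits, k=%d, p=%.5f%n", s, k, rate(s, n, k));
        }
        // Factor 2: with s fixed, too few or too many hash functions are both worse.
        for (int k : new int[] {1, 6, 20}) {
            System.out.printf("s=8000 bits, k=%d, p=%.5f%n", k, rate(8_000, n, k));
        }
    }
}
```

This is also what parameters such as the expected data volume and error rate passed to ready-made libraries are used for: they let the library choose the array size and hash count.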

So why can't elements be deleted?

The reason is simple: deleting an element means setting its positions back to 0, but other elements may share those positions, so the judgment for those other elements would be affected. The schematic diagram is as follows:


Deleting element z sets its positions 10 and 13 back to 0, which also clears bits that x and y rely on and makes the judgments for them inaccurate. Therefore, no API for deleting elements is provided.
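The effect can be reproduced with hand-written positions. z's positions 10 and 13 come from the text; x's and y's positions are hypothetical values that share position 10 with z. For comparison, the sketch also shows a counting Bloom filter, a known variant (not the structure described above) that replaces each bit with a small counter so deletion becomes possible at the cost of more memory.

```java
import java.util.BitSet;

class DeletionDemo {
    static final int[] X = {2, 10};  // hypothetical
    static final int[] Y = {5, 10};  // hypothetical
    static final int[] Z = {10, 13}; // from the text

    // Plain Bloom filter: one bit per position.
    static boolean plainContainsXAfterDeletingZ() {
        BitSet bits = new BitSet(16);
        for (int[] e : new int[][] {X, Y, Z}) {
            for (int p : e) bits.set(p);
        }
        for (int p : Z) bits.clear(p); // naive "delete" also clears position 10, shared with x and y
        for (int p : X) {
            if (!bits.get(p)) return false;
        }
        return true;
    }

    // Counting Bloom filter: one counter per position.
    static boolean countingContainsXAfterDeletingZ() {
        int[] counters = new int[16];
        for (int[] e : new int[][] {X, Y, Z}) {
            for (int p : e) counters[p]++;
        }
        for (int p : Z) counters[p]--; // position 10 drops from 3 to 2 and stays non-zero
        for (int p : X) {
            if (counters[p] == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(plainContainsXAfterDeletingZ());    // false: x is now wrongly reported absent
        System.out.println(countingContainsXAfterDeletingZ()); // true
    }
}
```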

That is the principle of the Bloom filter. Only when you understand the principle can you apply it with ease in practice. Below, we write a simple Bloom filter in code.

To implement a Bloom filter, first define its core parts:

  • Several hash functions

  • An API for storing values

  • An API for checking whether a value exists

The implementation code is as follows:

```java
import java.util.BitSet;

public class MyBloomFilter {
    // Bloom filter length: 2 << 10 = 2048 bits
    private static final int SIZE = 2 << 10;
    // Seeds simulating different hash functions (hypothetical values; the originals were lost)
    private static final int[] num = new int[] {5, 19, 23, 31, 47, 71};
    // Initialize the bit array
    private BitSet bits = new BitSet(SIZE);
    // Hash functions used for storing and querying
    private MyHash[] function = new MyHash[num.length];

    // Initialize the hash functions
    public MyBloomFilter() {
        for (int i = 0; i < num.length; i++) {
            function[i] = new MyHash(SIZE, num[i]);
        }
    }

    // API for storing a value
    public void add(String value) {
        for (MyHash f : function) {
            // Set the bit at each hashed position to 1
            bits.set(f.hash(value), true);
        }
    }

    // API for checking whether a value may exist
    public boolean contains(String value) {
        if (value == null) {
            return false;
        }
        boolean result = true;
        for (MyHash f : function) {
            result = result && bits.get(f.hash(value));
        }
        return result;
    }
}
```

The hash function is a static nested class of MyBloomFilter:

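```java
// Static nested class inside MyBloomFilter
public static class MyHash {
    private int cap;
    private int seed;

    // Initialization
    public MyHash(int cap, int seed) {
        this.cap = cap;
        this.seed = seed;
    }

    // Hash function: polynomial hash of the string, masked to the range [0, cap)
    public int hash(String value) {
        int result = 0;
        int len = value.length();
        for (int i = 0; i < len; i++) {
            result = seed * result + value.charAt(i);
        }
        // cap is a power of two, so (cap - 1) & result is always a valid index
        return (cap - 1) & result;
    }
}
```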

The test code for the Bloom filter is as follows:

```java
public static void main(String[] args) {
    String value = "4243212355312";
    MyBloomFilter filter = new MyBloomFilter();
    System.out.println(filter.contains(value)); // false: nothing has been added yet
    filter.add(value);
    System.out.println(filter.contains(value)); // true
}
```

The above is a very simple Bloom filter, but in real projects you rarely need to write your own: experts and large companies have already done it for you, for example Google's Guava library. You only need to add the following dependency to the project:

```xml
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>27.0.1-jre</version>
</dependency>
```

The code used in an actual project is as follows:

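```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;

public class MyBloomFilterSysConfig {

    @Autowired
    OrderMapper orderMapper;

    // 1. Create the Bloom filter: the second argument is the expected data volume (10,000,000),
    //    the third is the error rate (0.00001)
    BloomFilter<CharSequence> bloomFilter = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 10000000, 0.00001);

    // Hypothetical init method, run by a startup or scheduled task
    public void init() {
        // 2. Get all orders and put each order id into the Bloom filter
        List<Order> orderList = orderMapper.findAll();
        for (Order order : orderList) {
            Long id = order.getId();
            bloomFilter.put("" + id);
        }
    }
}
```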

In an actual project, a system task or scheduled task is started to initialize the Bloom filter and put the ids of hot query data into it. When a user request arrives, the Bloom filter is used to check whether the requested order id exists in it; if it does not, null is returned directly. The code is as follows:

```java
// Check whether the order id exists in the Bloom filter; if not, return null directly
if (!bloomFilter.mightContain("" + id)) {
    return null;
}
```

The drawback of a Bloom filter is that its data must be maintained: order data changes frequently, so the Bloom filter has to be kept up to date in real time.

Cache breakdown

Cache breakdown means that a single key is very hot and constantly carries high concurrency, with all that traffic concentrated on one point. When the key expires, the sustained concurrent requests break through the cache and hit the database directly, sharply increasing the database's access pressure at that moment.

Cache breakdown emphasizes concurrency. There are two causes of cache breakdown:

  1. The data has never been queried and is accessed concurrently for the first time (cold data).

  2. The data was cached with an expiration time set in Redis; it has just expired, and there are large numbers of concurrent accesses (hot data).

The solution to cache breakdown is locking, as shown in the following diagram:


When there is heavy concurrent access, the process of querying the cache and then the database is locked, so only the first incoming request executes it. Once that first request has put the data into the cache, subsequent requests are served from the cache, which prevents cache breakdown.

This is a common practice in the industry: when the value fetched for a key is empty, acquire a lock, load the data from the database, and then release the lock. Threads that fail to acquire the lock wait for a while and then retry. Note that a distributed lock should be used in a distributed environment; on a single machine, an ordinary lock (synchronized or Lock) is enough.
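The lock-on-miss practice described above can be sketched for a single machine as follows, with the cache and the database replaced by an in-memory map and a loader function so it runs on its own. `HotKeyCache`, `loadFromDatabase` and the 50 ms retry interval are hypothetical; in a cluster the `ReentrantLock` would be replaced by a distributed lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

class HotKeyCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Function<String, String> loadFromDatabase; // hypothetical slow query
    final AtomicInteger databaseLoads = new AtomicInteger();

    HotKeyCache(Function<String, String> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    String get(String key) throws InterruptedException {
        while (true) {
            String value = cache.get(key);
            if (value != null) return value;   // cache hit
            if (lock.tryLock()) {              // only one thread rebuilds the entry
                try {
                    value = cache.get(key);     // double-check: another thread may have loaded it
                    if (value == null) {
                        databaseLoads.incrementAndGet();
                        value = loadFromDatabase.apply(key);
                        if (value != null) {
                            cache.put(key, value);
                        }
                    }
                    return value;
                } finally {
                    lock.unlock();
                }
            }
            Thread.sleep(50);                   // lock is busy: wait, then retry from the cache
        }
    }
}
```

The second cache read inside the lock is what keeps the database load at one: a thread that missed the cache just before the first thread finished still finds the value once it gets the lock.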

The following code demonstrates the case of getting product inventory. The stand-alone lock version is implemented as follows:

```java
// Get inventory quantity (redisTemplate is assumed to be an injected StringRedisTemplate)
public String getProduceNum(String key) {
    try {
        synchronized (this) { // Lock
            // Read the inventory from the cache
            int num = Integer.parseInt(redisTemplate.opsForValue().get(key));
            if (num > 0) {
                // Decrement the inventory by 1 and write it back to the cache
                redisTemplate.opsForValue().set(key, (num - 1) + "");
                System.out.println("The remaining inventory is num: " + (num - 1));
            } else {
                System.out.println("Inventory is 0");
            }
        }
    } catch (NumberFormatException e) {
        e.printStackTrace();
    }
    return "OK";
}
```

The distributed lock version is implemented as follows:

```java
public String getProduceNum(String key) {
    // Get a distributed lock. A separate lock name (the "lock:" prefix is an illustrative choice)
    // keeps Redisson's lock data from colliding with the inventory value stored at key.
    RLock lock = redissonClient.getLock("lock:" + key);
    // Lock before reading, so no other instance can act on a stale inventory value
    lock.lock();
    try {
        // Get inventory
        int num = Integer.parseInt(redisTemplate.opsForValue().get(key));
        if (num > 0) {
            // Reduce inventory and store it in the cache
            redisTemplate.opsForValue().set(key, (num - 1) + "");
            System.out.println("Remaining inventory is num: " + (num - 1));
        } else {
            System.out.println("Inventory is already 0");
        }
    } catch (NumberFormatException e) {
        e.printStackTrace();
    } finally {
        // Unlock
        lock.unlock();
    }
    return "OK";
}
```

Cache avalanche

Cache avalanche means that a large part of the cache expires within the same period of time, so at that moment countless requests bypass the cache and go directly to the database.

There are two causes of cache avalanche:

  1. Redis goes down

  2. A large amount of cached data expires at the same time

For example, at Tmall's Double 11 shopping festival, a wave of rush buying starts as soon as midnight of November 11 arrives. Suppose these products are put into the cache at 23:00 and cached for one hour; then all their cache entries expire at 24:00, midnight.

All accesses and queries for these products then fall on the database, producing periodic pressure peaks that strain the database and may even bring it down.

The schematic diagrams of cache avalanche are as follows. Under normal conditions, when there is no large number of expired keys, accesses are as shown below:


When a large number of keys expire at the same point in time, cache avalanche occurs, as shown below:


There are two solutions for cache avalanche:

  1. Build a highly available cluster so that the failure of a single Redis instance does not take the cache down.

  2. Set different expiration times to prevent a large number of keys from expiring at the same time.
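The second solution is often implemented by adding random jitter to a base expiration time. The sketch below is a minimal version under that assumption; `ExpiryPolicy`, the one-hour base and the ten-minute jitter window are hypothetical values, and a seeded `Random` is used only to make the example reproducible.

```java
import java.util.Random;

class ExpiryPolicy {
    private final int baseSeconds;   // e.g. 3600 for "cache for one hour"
    private final int jitterSeconds; // width of the random spread, must be > 0
    private final Random random;

    ExpiryPolicy(int baseSeconds, int jitterSeconds, Random random) {
        this.baseSeconds = baseSeconds;
        this.jitterSeconds = jitterSeconds;
        this.random = random;
    }

    // Base TTL plus a random offset in [0, jitterSeconds), so keys cached together expire apart
    long nextTtlSeconds() {
        return baseSeconds + random.nextInt(jitterSeconds);
    }
}
```

In the Double 11 example, products cached at 23:00 with `new ExpiryPolicy(3600, 600, new Random())` would expire spread across the ten minutes after midnight instead of all at 24:00. With Spring Data Redis, the result would be passed as the timeout of `opsForValue().set(key, value, timeout, TimeUnit.SECONDS)`.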

For a business system, the right choice always depends on the specific situation: there is no best solution, only the most appropriate one. We will go on to cover other cache problems, such as a full cache and data loss. Finally, three terms deserve a mention: LRU, RDB and AOF. Generally, an LRU eviction policy is used to handle memory overflow, and Redis's RDB and AOF persistence mechanisms are used to keep data safe in certain situations.
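Redis itself implements an approximated LRU by sampling keys and is configured through `maxmemory-policy` (for example `allkeys-lru`). The Java sketch below only illustrates the LRU idea, using `LinkedHashMap` in access order; `LruCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry once over capacity
    }
}
```

For example, with capacity 2, putting a and b, reading a and then putting c evicts b, because b is the entry that was used least recently.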

6 November 2021, 20:05
