Understanding Redis's SCAN command in depth

As we all know, Redis is an in-memory database that executes commands on a single thread. Therefore, you need to be very careful when executing O(N) commands; otherwise, Redis may block for a long time and affect the normal operation of other services.

Sometimes we need to process the keys in Redis that match some condition, for example to delete or traverse them. Before Redis 2.8, we could use the KEYS command to run a glob-style pattern match over all keys and get the data we want, but this command has fatal drawbacks:

  1. There is no limit on the number of returned entries: all matching keys are collected and returned at once.
  2. The command is a full traversal with a time complexity of O(N).

Once the amount of matching data is huge, traversing millions or tens of millions of keys will certainly block the Redis process. Therefore, we must avoid using KEYS on production machines.

Redis 2.8 introduced the SCAN command, which meets the business need described above while avoiding blocking the Redis process. Compared with the KEYS command, SCAN has the following advantages:

  1. Although the time complexity is also O(N), the traversal is split across multiple calls instead of being executed all at once.
  2. You can use the COUNT option to limit how much work each call does, so each call returns only part of the data and never blocks the process for long.

This command is not perfect, though: to some extent it may return duplicate keys, which the client has to de-duplicate itself, as in the sketch below.
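Here is a minimal sketch of such a loop using the hiredis C client. The connection details and the MATCH pattern are assumptions for illustration; the de-duplication container is left out, since the right choice (a hash set, a sorted file, and so on) depends on the caller.

#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    char cursor[64] = "0";
    do {
        /* One SCAN step; COUNT is only a hint for how many slots to visit */
        redisReply *reply = redisCommand(c, "SCAN %s MATCH %s COUNT %d",
                                         cursor, "*", 100);
        if (reply == NULL) break;
        if (reply->type != REDIS_REPLY_ARRAY || reply->elements != 2) {
            freeReplyObject(reply);
            break;
        }
        /* element[0] is the next cursor, element[1] the batch of keys */
        snprintf(cursor, sizeof(cursor), "%s", reply->element[0]->str);
        for (size_t i = 0; i < reply->element[1]->elements; i++)
            printf("%s\n", reply->element[1]->element[i]->str); /* may repeat */
        freeReplyObject(reply);
    } while (strcmp(cursor, "0") != 0); /* a returned cursor of "0" ends the scan */

    redisFree(c);
    return 0;
}

The loop terminates only when the server hands back cursor "0"; stopping early is safe but leaves the traversal incomplete.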

1. Redis's data structure

Redis uses a hash table as the underlying implementation of its keyspace, because lookups are efficient and the implementation is simple. When it comes to hash tables, many people's first reaction is Java's HashMap. Indeed, the storage structure underlying Redis's keys is an array-plus-linked-list structure similar to HashMap. The size of the first-dimension array is always 2^n (n >= 0), and each expansion doubles the array's length.
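For reference, this array-plus-linked-list layout can be seen in the hash table structures of dict.h (quoted from the Redis versions this article is based on; newer releases have since reorganized these types):

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    // Next entry in the same bucket (the linked list)
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    // The one-dimensional array of buckets
    dictEntry **table;
    // Number of slots; always a power of 2
    unsigned long size;
    // size - 1, used to mask a hash into a slot index
    unsigned long sizemask;
    // Number of entries stored
    unsigned long used;
} dictht;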

The SCAN command traverses this one-dimensional array. The cursor value returned by each call is also an index into this array. The COUNT option indicates how many array slots to traverse, and the matching entries chained under those slots are returned. Because the linked list attached to each slot has a different length, the number of results returned by each call also differs.

2. Traversal order of the SCAN command

127.0.0.1:6379> keys *
1) "hahaha"
2) "hehehe"
3) "zezeze"
127.0.0.1:6379> scan 0 MATCH * COUNT 1
1) "2"
2) 1) "hahaha"
127.0.0.1:6379> scan 2 MATCH * COUNT 1
1) "1"
2) 1) "hehehe"
127.0.0.1:6379> scan 1 MATCH * COUNT 1
1) "3"
2) 1) "zezeze"
127.0.0.1:6379> scan 3 MATCH * COUNT 1
1) "0"
2) (empty list or set)

There are three keys in our Redis instance, and each call traverses only one slot of the one-dimensional array (COUNT 1). As shown above, the traversal order of the SCAN cursor is

0->2->1->3

This order looks strange, but it is easier to understand once we convert the cursors to binary.

00->10->01->11

We find that each step of this sequence adds 1 to the high-order bit: ordinary binary addition adds and carries from right to left, while this sequence adds and carries from left to right. This is confirmed in the Redis source code:
in the dictScan function of dict.c, the cursor is updated as follows.

v = rev(v);
v++;
v = rev(v);

That is, the cursor's bits are reversed, incremented by one, and reversed back, which is exactly the "high-order plus 1" operation described above.
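A small standalone sketch reproduces this order for a 4-slot table; the rev helper below is the bit-reversal routine from dict.c, and the cursor update mirrors dictScan (the mask value is an assumption for the example):

#include <stdio.h>
#include <limits.h>

/* Bit reversal, as implemented by rev() in dict.c */
static unsigned long rev(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v); /* bit size; must be a power of 2 */
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

int main(void) {
    unsigned long m = 3;  /* sizemask of a 4-slot table */
    unsigned long v = 0;  /* the cursor */
    do {
        printf("%lu ", v & m);
        v |= ~m;                      /* as in dictScan: set the bits above the mask */
        v = rev(v); v++; v = rev(v);  /* "high-order plus 1" */
    } while (v != 0);
    printf("\n");                     /* prints: 0 2 1 3 */
    return 0;
}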

You may wonder why traversal uses this order rather than the ordinary 0, 1, 2... order. The reason is that the dictionary may expand or shrink during the traversal, and this order copes with both (you have to admire the thoroughness of the developers).

3. How SCAN handles hash table expansion and shrinking

Let's look at how a SCAN traversal proceeds when an expansion happens. Suppose the original array has 4 slots, so the cursor index has 2 bits; the table then grows to 3 bits (8 slots) and all entries are rehashed.

1. Expansion
All elements originally attached to slot xx are redistributed to slots 0xx and 1xx (see the sketch below). Suppose the dict rehashes just as we are about to traverse slot 10: SCAN then resumes from slot 010, while slots 000 and 100 (the images of the already-visited slot 00) are not traversed again.
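Since the new slot index is simply hash & new_sizemask, each old slot splits into exactly two new slots; a tiny sketch of the mapping for a 4-to-8 expansion:

#include <stdio.h>

/* During a 4-to-8 expansion the new index is hash & 7 where the old one
 * was hash & 3, so everything in old slot xx lands in 0xx or 1xx. */
int main(void) {
    for (unsigned long s = 0; s < 4; s++)
        printf("old slot %lu%lu -> new slots %lu and %lu\n",
               (s >> 1) & 1, s & 1, s, s | 4);
    return 0;
}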

2. Shrinking
Now consider shrinking. Suppose the dict shrinks from 3 bits back to 2 bits just as we are about to traverse slot 110: SCAN resumes from slot 10. The elements that were attached under slot 010 are then iterated a second time, but elements from the slots visited before 010 are not repeated. So shrinking may return some duplicate elements.

4. The rehash operation of Redis

Rehash is a costly process. In order not to block the Redis process, Redis adopts a progressive rehash mechanism.

/* Dictionaries */
typedef struct dict {
    // Type-specific functions
    dictType *type;
    // Private data
    void *privdata;
    // The two hash tables
    dictht ht[2];
    // Rehash index
    // When no rehash is in progress, the value is -1
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // Number of safe iterators currently running
    int iterators; /* number of iterators currently running */
} dict;

In Redis's dictionary structure there are two hash tables: an old table (ht[0]) and a new one (ht[1]). During a rehash, Redis gradually migrates the entries of the old table to the new table. Next, let's look at the source code of the dict rehash operation.

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

The rehash process migrates one bucket at a time; a bucket is simply one slot of the one-dimensional array mentioned above, together with the linked list attached to it. Let's walk through this code.

  1. First, check whether a rehash is in progress. If so, continue; otherwise, return directly.
  2. Next, perform up to n steps of progressive rehash, checking at each step that the old table still holds elements.
  3. Before migrating a bucket, assert that the bucket index to migrate is not out of bounds.
  4. Then skip empty buckets. The empty_visits variable bounds how many empty buckets may be visited in one call; it mainly ensures that Redis is not blocked for too long.
  5. The next step is the migration itself: rehash every element of the current bucket into the new table and update the element counts of both tables.
  6. After each bucket is migrated, point that bucket in the old table to NULL.
  7. Finally, check whether the whole migration is complete. If so, free the old table's space and reset the rehash index; otherwise, tell the caller that there is still data to migrate.
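These steps are driven incrementally: ordinary dictionary operations each perform one step as a side effect, so the table migrates itself while it is being used. The single-step helper from the same era of dict.c looks like this:

/* Perform a single step of rehashing, but only when no safe iterators
 * are bound to the dictionary; otherwise entries could be moved or
 * duplicated under the iterator's feet. Called by common lookup and
 * update operations. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}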

Because Redis uses this progressive rehash mechanism, entries can live in both tables at the same time, so the SCAN command needs to scan both the new and the old table and merge the results before returning them to the client.
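Conceptually, while a rehash is in progress dictScan visits the cursor's bucket in the smaller table and then every bucket of the larger table whose low bits match it. Here is a small runnable illustration of that index walk (the mask and cursor values are assumptions for the example; the names m0 and m1 follow dict.c):

#include <stdio.h>

int main(void) {
    unsigned long m0 = 3, m1 = 7;  /* sizemasks of a 4-slot and an 8-slot table */
    unsigned long v = 2;           /* example cursor */

    printf("small table bucket: %lu\n", v & m0);
    do {
        printf("large table bucket: %lu\n", v & m1);
        /* increment only the bits above m0, keeping the low bits fixed */
        v = (((v | m0) + 1) & ~m0) | (v & m0);
    } while (v & (m0 ^ m1));       /* visits buckets 2 and 6: 010 and 110 */
    return 0;
}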
