Redis (3) - what are the data structures of redis
Redis has five basic data structures: string, list, hash, set, and zset (sorted set). These five are the most fundamental and important part of Redis knowledge.
1. string:
A Redis string is a dynamic string, which means that it can be modified. Its underlying implementation is similar to ArrayList in Java: it is backed by a character array. The definition can be found in sds.h, where the sdshdr structures describe the Simple Dynamic String (SDS):
    /* Note: sdshdr5 is never used, we just access the flags byte directly.
     * However is here to document the layout of type 5 SDS strings. */
    struct __attribute__ ((__packed__)) sdshdr5 {
        unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr8 {
        uint8_t len; /* used */
        uint8_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr16 {
        uint16_t len; /* used */
        uint16_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr32 {
        uint32_t len; /* used */
        uint32_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr64 {
        uint64_t len; /* used */
        uint64_t alloc; /* excluding the header and null terminator */
        unsigned char flags; /* 3 lsb of type, 5 unused bits */
        char buf[];
    };
You will notice that Redis defines the same structure several times, each time with a different integer width for len and alloc. Why not simply use int everywhere?
Because when a string is short, len and alloc fit in a single byte (uint8_t) or a short (uint16_t). To save memory, Redis uses different header structures for strings of different lengths.
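To make the memory argument concrete, here is a minimal sketch of how a header size could be chosen from the string length. The function name sds_header_bytes is hypothetical, and the thresholds are an assumption derived only from the field widths in the structs above (5 length bits for sdshdr5, then 8, 16, 32 and 64-bit len fields); it is not the actual Redis routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of header bytes needed for a string of
 * `len` bytes, assuming the packed layouts shown above. */
size_t sds_header_bytes(uint64_t len) {
    if (len < (1u << 5))   return 1;   /* sdshdr5: flags byte only, length in 5 bits */
    if (len <= UINT8_MAX)  return 3;   /* sdshdr8:  1 + 1 + 1 */
    if (len <= UINT16_MAX) return 5;   /* sdshdr16: 2 + 2 + 1 */
    if (len <= UINT32_MAX) return 9;   /* sdshdr32: 4 + 4 + 1 */
    return 17;                         /* sdshdr64: 8 + 8 + 1 */
}
```

Under these assumptions, a 100-byte string pays 3 bytes of header, whereas a single fixed layout with 64-bit fields would pay 17 bytes for every string, however short.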
Difference between SDS and C string:
Why not use C strings directly? Because C's plain string representation does not meet Redis's requirements for string safety, efficiency, and functionality. C represents a string of length N with a character array of length N+1 whose last element is always '\0'. For example, the string "Redis" is stored as the six characters 'R', 'e', 'd', 'i', 's', '\0'.
Such a simple data structure may cause the following problems:
- Getting the string length is O(N) → because C does not store the length of the array, the whole array has to be traversed every time;
- Buffer overflows and memory leaks are hard to rule out → for the same reason, a concatenation or truncation that mismanages the buffer size can easily cause these problems;
- A C string can only hold text data → because a C string must conform to some encoding (such as ASCII), a '\0' in the middle of the data is taken as the end of the string, and the bytes after it are not recognized;
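The first and third problems can be reproduced in a few lines. The sketch below is only an illustration: lenstr and its helper functions are hypothetical names, not Redis types. It compares a plain C string with a buffer that carries its own length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed view of a byte buffer (not Redis's real type). */
typedef struct {
    size_t len;        /* number of payload bytes, may include '\0' */
    const char *buf;   /* payload */
} lenstr;

/* O(N): scans until the first '\0', so an embedded zero byte cuts the string short. */
size_t c_string_length(const char *s) {
    return strlen(s);
}

/* O(1): the length is stored, so the content is never inspected. */
size_t lenstr_length(const lenstr *s) {
    return s->len;
}

/* Binary-safe comparison: compares every stored byte, including '\0'. */
int lenstr_equal(const lenstr *a, const lenstr *b) {
    return a->len == b->len && memcmp(a->buf, b->buf, a->len) == 0;
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', c_string_length reports 2 while lenstr_length reports 5. Likewise, strcmp cannot tell that buffer apart from 'a', 'b', '\0', 'x', 'y', but lenstr_equal can.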
Take appending to a string as an example. The Redis source code is as follows:
    /* Append the specified binary-safe string pointed by 't' of 'len' bytes to the
     * end of the specified sds string 's'.
     *
     * After the call, the passed sds string is no longer valid and all the
     * references must be substituted with the new pointer returned by the call. */
    sds sdscatlen(sds s, const void *t, size_t len) {
        // Get the length of the original string
        size_t curlen = sdslen(s);

        // Adjust the space as needed. If the capacity is not enough for the additional
        // content, the byte array is reallocated and the original content is copied over
        s = sdsMakeRoomFor(s,len);
        if (s == NULL) return NULL;    // Insufficient memory
        memcpy(s+curlen, t, len);      // Append the target string to the byte array
        sdssetlen(s, curlen+len);      // Set the length after appending
        s[curlen+len] = '\0';          // End the string with \0 so it can be printed for debugging
        return s;
    }
- Note: Redis specifies that the string length should not exceed 512 MB.
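The append routine above relies on helpers that are not shown here. As a self-contained illustration of the same idea (track len and alloc, grow when needed, keep a trailing '\0'), here is a minimal sketch. The dynstr type, its functions, and the doubling growth policy are all assumptions made for this example, not the Redis implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: bookkeeping fields plus a heap buffer. */
typedef struct {
    size_t len;    /* bytes in use */
    size_t alloc;  /* bytes available, excluding the trailing '\0' */
    char *buf;
} dynstr;

/* Create an empty string; returns 0 on success, -1 on allocation failure. */
int dynstr_init(dynstr *s) {
    s->len = 0;
    s->alloc = 0;
    s->buf = malloc(1);
    if (s->buf == NULL) return -1;
    s->buf[0] = '\0';
    return 0;
}

/* Append `len` bytes from `t`, growing the buffer first if needed. */
int dynstr_append(dynstr *s, const void *t, size_t len) {
    size_t needed = s->len + len;
    if (needed > s->alloc) {
        size_t new_alloc = needed * 2;   /* over-allocate (assumed growth policy) */
        char *p = realloc(s->buf, new_alloc + 1);
        if (p == NULL) return -1;        /* the old buffer is still valid */
        s->buf = p;
        s->alloc = new_alloc;
    }
    memcpy(s->buf + s->len, t, len);     /* binary-safe copy */
    s->len = needed;
    s->buf[s->len] = '\0';               /* keep it printable as a C string */
    return 0;
}

/* Release the buffer and reset the bookkeeping fields. */
void dynstr_free(dynstr *s) {
    free(s->buf);
    s->buf = NULL;
    s->len = 0;
    s->alloc = 0;
}
```

Appending "Redis" (5 bytes) to an empty dynstr allocates room for 10 bytes, so a following 4-byte append fits without another reallocation. Over-allocating in this way is what amortizes the cost of repeated appends.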
Basic operation on string:
After installing Redis, we can use redis-cli to run commands against Redis from the command line. The Redis project also provides an online debugger in which you can type commands: http://try.redis.io/#run
- Set and get key value pairs:
    > SET key value
    OK
    > GET key
    "value"
As you can see, we usually use SET and GET to write and read string values.
The value can be any kind of string (including binary data). For example, you can store a .jpeg image under a key; just be careful not to exceed the 512 MB maximum.
When the key already exists, the SET command overwrites the previously stored value:
    > SET key newValue
    OK
    > GET key
    "newValue"
In addition, you can use the EXISTS and DEL commands to check whether a key exists and to delete a key-value pair:
    > EXISTS key
    (integer) 1
    > DEL key
    (integer) 1
    > GET key
    (nil)
- Set key value pairs in batch:
    > SET key1 value1
    OK
    > SET key2 value2
    OK
    > MGET key1 key2 key3    # Returns a list
    1) "value1"
    2) "value2"
    3) (nil)
    > MSET key1 value1 key2 value2
    > MGET key1 key2
    1) "value1"
    2) "value2"
- Expiration and SET command extension:
You can set an expiration time on a key, and the key is deleted automatically when the time is up. This feature is often used to control how long cached data stays valid. (An expiration can be set on a key of any data type.)
    > SET key value1
    > GET key
    "value1"
    > EXPIRE key 5    # Expire after 5s
    ... # wait for 5s
    > GET key
    (nil)
The SETEX command is equivalent to SET + EXPIRE, and the SETNX command sets a key only if it does not exist:
    > SETEX key 5 value1    # SET and EXPIRE in one command
    ... # wait for 5s, then read the key
    > GET key
    (nil)
    > SETNX key value1    # SET succeeds if key does not exist
    (integer) 1
    > SETNX key value2    # SET fails if key exists
    (integer) 0
    > GET key
    "value1"    # No change
- Count:
If the value is an integer, you can also use the INCR command to increment it atomically. This means that even when multiple clients operate on the same key at the same time, no race condition occurs:
    > SET counter 100
    > INCR counter
    (integer) 101
    > INCRBY counter 50
    (integer) 151
- GETSET command to return the original value:
Another interesting string command is GETSET. It does what its name says: it sets a new value for the key and returns the original value:
    > SET key value
    > GETSET key value1
    "value"
This is convenient for keys that need to be counted at intervals. For example, each time a user enters the system you run INCR on a key. When it is time to collect the statistics, you use GETSET to reset the key to 0 and read the accumulated count in the same step.
2. list:
Redis's list is equivalent to LinkedList in Java. Note that it is a linked list rather than an array. This means that insertion and deletion at either end of the list are very fast, with O(1) time complexity, whereas locating an element by index is slow, with O(n) time complexity.
We can see the definition of adlist.h/listNode from the source code:
    /* Node, List, and Iterator are the only data structures used currently. */
    typedef struct listNode {
        struct listNode *prev;
        struct listNode *next;
        void *value;
    } listNode;

    typedef struct listIter {
        listNode *next;
        int direction;
    } listIter;

    typedef struct list {
        listNode *head;
        listNode *tail;
        void *(*dup)(void *ptr);
        void (*free)(void *ptr);
        int (*match)(void *ptr, void *key);
        unsigned long len;
    } list;
As you can see, multiple listNode structures can form a doubly linked list through their prev and next pointers.
Although listNode structures alone are enough to form a linked list, holding the list in an adlist.h/list structure (with head and tail pointers and a length) makes it more convenient to operate on.
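The complexity claims (O(1) at both ends, O(n) by index) follow directly from this layout. Here is a minimal sketch with hypothetical names (dnode, dlist and the dlist_* functions), storing plain int values instead of void pointers to keep it short. It is not the adlist.c code.

```c
#include <stdlib.h>

/* Simplified node and list (hypothetical names); values are plain ints. */
typedef struct dnode {
    struct dnode *prev;
    struct dnode *next;
    int value;
} dnode;

typedef struct {
    dnode *head;
    dnode *tail;
    unsigned long len;
} dlist;

/* O(1): link a new node in front of the current head. */
int dlist_push_head(dlist *l, int value) {
    dnode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->prev = NULL;
    n->next = l->head;
    if (l->head != NULL) l->head->prev = n;
    else l->tail = n;                 /* list was empty */
    l->head = n;
    l->len++;
    return 0;
}

/* O(1): link a new node behind the current tail. */
int dlist_push_tail(dlist *l, int value) {
    dnode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL) l->tail->next = n;
    else l->head = n;                 /* list was empty */
    l->tail = n;
    l->len++;
    return 0;
}

/* O(n): walk from the head until the requested index is reached. */
dnode *dlist_index(const dlist *l, unsigned long index) {
    dnode *n = l->head;
    while (n != NULL && index-- > 0) n = n->next;
    return n;                         /* NULL if index is out of range */
}

/* Free every node and reset the list to empty. */
void dlist_free(dlist *l) {
    dnode *n = l->head;
    while (n != NULL) {
        dnode *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->len = 0;
}
```

Both push functions touch only the head or tail pointer, regardless of the list length, while dlist_index has no shortcut and must follow next pointers one node at a time.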
Basic operation of linked list:
- LPUSH and RPUSH can add a new element to the left (head) and right (tail) of the list respectively;
- The LRANGE command can take a certain range of elements from the list;
- The LINDEX command returns the element at the specified index of the list, which is equivalent to the get(int index) operation on a Java linked list;
Demonstration:
    > rpush mylist A
    (integer) 1
    > rpush mylist B
    (integer) 2
    > lpush mylist first
    (integer) 3
    > lrange mylist 0 -1    # -1 means the last element, so 0 -1 covers the first to the last element, i.e. all of them
    1) "first"
    2) "A"
    3) "B"
Implementing a queue with list:
A queue is a first-in, first-out data structure, commonly used for message queuing and asynchronous processing, because it preserves the order in which elements are accessed:
    > RPUSH books python java golang
    (integer) 3
    > LPOP books
    "python"
    > LPOP books
    "java"
    > LPOP books
    "golang"
    > LPOP books
    (nil)
Implementing a stack with list:
A stack is a first-in, last-out data structure, the opposite of a queue:
    > RPUSH books python java golang
    > RPOP books
    "golang"
    > RPOP books
    "java"
    > RPOP books
    "python"
    > RPOP books
    (nil)
Performance summary:
- It is a list of strings; elements can be inserted at both the left and the right end
- If the key does not exist, a new linked list is created
- If the key exists, the new content is added to the existing list
- If all the values are removed, the corresponding key disappears
- Operations at the head and the tail of the linked list are very efficient, but operations on elements in the middle are very inefficient.
3. hash (dictionary):
The dictionary in Redis is equivalent to HashMap in Java, and its internal implementation is almost the same. It resolves hash collisions by separate chaining, using an "array + linked list" structure, which combines the advantages of both data structures. The hash table is defined in dict.h as dictht:
    typedef struct dictht {
        // Hash table array
        dictEntry **table;
        // Hash table size
        unsigned long size;
        // Hash table size mask, used to calculate index value, always equal to size - 1
        unsigned long sizemask;
        // The number of existing nodes in the hash table
        unsigned long used;
    } dictht;

    typedef struct dict {
        dictType *type;
        void *privdata;
        // There are two dictht structures inside
        dictht ht[2];
        long rehashidx; /* rehashing not in progress if rehashidx == -1 */
        unsigned long iterators; /* number of iterators currently running */
    } dict;
The table attribute is an array. Each element in the array is a pointer to the dict.h/dictEntry structure, and each dictEntry structure holds a key value pair:
    typedef struct dictEntry {
        // key
        void *key;
        // value
        union {
            void *val;
            uint64_t u64;
            int64_t s64;
            double d;
        } v;
        // Point to the next hash table node to form a linked list
        struct dictEntry *next;
    } dictEntry;
As the source code above shows, the dictionary structure actually contains two hash tables. Normally only one of them holds data. When the dictionary is expanded or shrunk, however, a new hash table has to be allocated and the entries are relocated gradually, for the reasons described below.
Progressive rehash:
Expanding a large dictionary takes a lot of time: a new array has to be allocated, and every element in the old dictionary's linked lists has to be reattached to the new array, which is an O(n) operation. A single-threaded Redis can hardly afford such a time-consuming step, so it uses progressive rehash to move the data in small steps.
Progressive rehash keeps both the old and the new hash structures during the rehash, and queries consult both of them. Then, in subsequent scheduled tasks and hash operations, the contents of the old dictionary are gradually migrated to the new one. When the migration is complete, the new hash structure replaces the old one.
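The mechanism can be sketched in a few dozen lines. What follows is a simplified model under stated assumptions, not the dict.c implementation: the names (pdict, htable, entry, pdict_*) are hypothetical, keys are plain unsigned long values used directly as their own hash, table sizes are assumed to be powers of two, and each step migrates exactly one non-empty bucket.

```c
#include <stdlib.h>

typedef struct entry { unsigned long key; int val; struct entry *next; } entry;
typedef struct { entry **table; unsigned long size, used; } htable;
typedef struct { htable ht[2]; long rehashidx; } pdict;   /* rehashidx == -1: idle */

int pdict_init(pdict *d, unsigned long size) {
    d->ht[0].table = calloc(size, sizeof(entry *));
    if (d->ht[0].table == NULL) return -1;
    d->ht[0].size = size;
    d->ht[0].used = 0;
    d->ht[1].table = NULL;
    d->ht[1].size = d->ht[1].used = 0;
    d->rehashidx = -1;
    return 0;
}

/* New entries go to the new table while a rehash is in progress. */
int pdict_add(pdict *d, unsigned long key, int val) {
    htable *h = d->rehashidx >= 0 ? &d->ht[1] : &d->ht[0];
    entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = key & (h->size - 1);
    e->key = key;
    e->val = val;
    e->next = h->table[idx];
    h->table[idx] = e;
    h->used++;
    return 0;
}

/* Allocate the second table and mark the rehash as started. */
int pdict_start_rehash(pdict *d, unsigned long new_size) {
    entry **t = calloc(new_size, sizeof(entry *));
    if (t == NULL) return -1;
    d->ht[1].table = t;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Migrate one non-empty bucket; returns 1 while work remains, 0 when done. */
int pdict_rehash_step(pdict *d) {
    if (d->rehashidx < 0) return 0;
    htable *old = &d->ht[0], *dst = &d->ht[1];
    while ((unsigned long)d->rehashidx < old->size &&
           old->table[d->rehashidx] == NULL)
        d->rehashidx++;                          /* skip empty buckets */
    if ((unsigned long)d->rehashidx < old->size) {
        entry *e = old->table[d->rehashidx];
        while (e != NULL) {                      /* relink the whole chain */
            entry *next = e->next;
            unsigned long idx = e->key & (dst->size - 1);
            e->next = dst->table[idx];
            dst->table[idx] = e;
            old->used--;
            dst->used++;
            e = next;
        }
        old->table[d->rehashidx++] = NULL;
    }
    if (old->used == 0) {                        /* finished: swap tables */
        free(old->table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

/* Lookups check ht[0], and also ht[1] while a rehash is in progress. */
entry *pdict_find(const pdict *d, unsigned long key) {
    for (int i = 0; i < 2; i++) {
        const htable *h = &d->ht[i];
        if (h->size > 0)
            for (entry *e = h->table[key & (h->size - 1)]; e != NULL; e = e->next)
                if (e->key == key) return e;
        if (d->rehashidx < 0) break;
    }
    return NULL;
}
```

In this model a caller would run pdict_rehash_step once per regular operation or timer tick. Because pdict_find looks in both tables, every key stays reachable at every point of the migration.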
Conditions for expansion and contraction:
Under normal circumstances, expansion starts when the number of elements in the hash table equals the length of the first-dimension array, and the new array is twice the size of the original one. However, if Redis is running bgsave (the persistence command), it tries not to expand, in order to avoid excessive separation of memory pages (copy-on-write). If the hash table is very full and the number of elements reaches 5 times the length of the first-dimension array, expansion is forced anyway.
When the hash table becomes more and more sparse as elements are deleted, Redis shrinks it to reduce the space taken by the first-dimension array. The condition is that the number of elements is less than 10% of the length of the array. Whether Redis is running bgsave is not taken into account when shrinking.
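These rules can be written down as two small predicates. The sketch below encodes only the thresholds stated above (equal to the array length, 5 times the array length during bgsave, below 10% for shrinking). The function names are hypothetical, and the exact comparisons used in the Redis source may differ.

```c
#include <stdbool.h>

/* Threshold taken from the description above; the name is hypothetical. */
#define FORCE_EXPAND_RATIO 5

/* used: number of elements; size: length of the first-dimension array. */
bool should_expand(unsigned long used, unsigned long size, bool bgsave_running) {
    if (size == 0) return true;                /* nothing allocated yet */
    if (used < size) return false;             /* load factor below 1 */
    if (!bgsave_running) return true;          /* normal case: expand at load factor 1 */
    return used >= size * FORCE_EXPAND_RATIO;  /* during bgsave: only when forced */
}

/* Shrinking ignores bgsave: fewer elements than 10% of the array length. */
bool should_shrink(unsigned long used, unsigned long size) {
    return size > 0 && used * 10 < size;
}
```

For example, a table with 4 buckets and 4 elements expands immediately in the normal case, but while bgsave is running it waits until it holds 20 elements.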
Basic operations on a hash:
Hashes also have a drawback: a hash structure consumes more storage than a single string. Whether to use a hash or a string should therefore be weighed against the actual situation.
    > HSET books java "think in java"    # Strings on the command line need quotation marks if they contain spaces
    (integer) 1
    > HSET books python "python cookbook"
    (integer) 1
    > HGETALL books    # Keys and values alternate in the output
    1) "java"
    2) "think in java"
    3) "python"
    4) "python cookbook"
    > HGET books java
    "think in java"
    > HSET books java "head first java"
    (integer) 0    # Returns 0 because it is an update
    > HMSET books java "effective java" python "learning python"    # Batch operation
    OK
4. set:
The Redis set is equivalent to HashSet in Java. Its members are unordered and unique. Internally it is implemented as a special dictionary in which all values are NULL.
Basic use of set:
As the structure is relatively simple, let's take a look at how to use it directly:
    > SADD books java
    (integer) 1
    > SADD books java    # Duplicate
    (integer) 0
    > SADD books python golang
    (integer) 2
    > SMEMBERS books    # Note the order: a set is unordered
    1) "java"
    2) "python"
    3) "golang"
    > SISMEMBER books java    # Check whether a value exists, equivalent to contains
    (integer) 1
    > SCARD books    # Get the number of members
    (integer) 3
    > SPOP books    # Pop one member
    "java"
5. Sorted set zset:
This is probably the most distinctive data structure in Redis. It is similar to a combination of SortedSet and HashMap in Java. On the one hand, it is a set, which guarantees that its values are unique. On the other hand, each value can be given a score, which serves as its sorting weight.
Internally it is implemented with a data structure called a skip list. Because the structure is fairly complex, only the basic principle is outlined here:
Imagine you are the boss of a start-up company. At the beginning there are only a few people, and everyone is on the same footing. As the company grows, the headcount increases and so does the cost of communication. Gradually a team leader system is introduced to divide people into teams, so some employees also take on the role of team leader.
Later the company grows further and needs another level: the department. Each department then elects a department head from among its team leaders.
The skip list works in a similar way. All elements are linked together at the bottom level; these are the employees. Every few elements, one representative is picked, and the representatives are linked together with another level of pointers. Second-level representatives are then picked from among these representatives and linked together in turn. In the end a pyramid-like structure is formed.
Think about how your current location is described: Asia > China > a province > a city. It is exactly this kind of structure!
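The pyramid of representatives can be turned into code fairly directly. The following is a minimal sketch, not the Redis zset implementation: it stores only a score per node, uses a fixed maximum of 8 levels, and promotes a node to each extra level with probability 1/4. All names (sknode, skiplist, skiplist_*) and both constants are assumptions made for this illustration.

```c
#include <stdlib.h>

#define MAX_LEVEL 8   /* assumed cap on the height of the pyramid */

typedef struct sknode {
    double score;
    struct sknode *forward[MAX_LEVEL];   /* forward[i]: next node on level i */
} sknode;

typedef struct {
    sknode header;   /* sentinel in front of all elements; its score is unused */
    int level;       /* number of levels currently in use */
} skiplist;

void skiplist_init(skiplist *sl) {
    sl->level = 1;
    for (int i = 0; i < MAX_LEVEL; i++) sl->header.forward[i] = NULL;
}

/* Pick how many levels a new node joins: each extra level with probability 1/4. */
static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 3) == 0) lvl++;
    return lvl;
}

int skiplist_insert(skiplist *sl, double score) {
    sknode *update[MAX_LEVEL];           /* last node before `score` on each level */
    sknode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;

    /* Descend from the top level, moving right while the next score is smaller. */
    sknode *x = &sl->header;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->score < score)
            x = x->forward[i];
        update[i] = x;
    }

    int lvl = random_level();
    for (int i = sl->level; i < lvl; i++) update[i] = &sl->header;
    if (lvl > sl->level) sl->level = lvl;

    n->score = score;
    for (int i = 0; i < MAX_LEVEL; i++) n->forward[i] = NULL;
    for (int i = 0; i < lvl; i++) {      /* splice the node into each of its levels */
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    return 0;
}

/* Search uses the upper levels to skip ahead, then checks the bottom level. */
int skiplist_contains(const skiplist *sl, double score) {
    const sknode *x = &sl->header;
    for (int i = sl->level - 1; i >= 0; i--)
        while (x->forward[i] != NULL && x->forward[i]->score < score)
            x = x->forward[i];
    x = x->forward[0];
    return x != NULL && x->score == score;
}
```

Level 0 links every element in score order (the employees), and each higher level links a random subset (the representatives). A search moves right on a high level until it would overshoot, then drops down one level, which is what lets it skip most of the bottom-level elements.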
Basic operations on a sorted set zset:
    > ZADD books 9.0 "think in java"
    > ZADD books 8.9 "java concurrency"
    > ZADD books 8.6 "java cookbook"
    > ZRANGE books 0 -1    # List in score order; the arguments are a rank range
    1) "java cookbook"
    2) "java concurrency"
    3) "think in java"
    > ZREVRANGE books 0 -1    # List in reverse score order; the arguments are a rank range
    1) "think in java"
    2) "java concurrency"
    3) "java cookbook"
    > ZCARD books    # Equivalent to count()
    (integer) 3
    > ZSCORE books "java concurrency"    # Get the score of the specified value
    "8.9000000000000004"    # Scores are stored internally as double, so there is a floating-point precision issue
    > ZRANK books "java concurrency"    # Rank
    (integer) 1
    > ZRANGEBYSCORE books 0 8.91    # Traverse the zset by score range
    1) "java cookbook"
    2) "java concurrency"
    > ZRANGEBYSCORE books -inf 8.91 withscores    # Traverse the zset over the score range (-∞, 8.91] and return the scores as well; inf means infinity
    1) "java cookbook"
    2) "8.5999999999999996"
    3) "java concurrency"
    4) "8.9000000000000004"
    > ZREM books "java concurrency"    # Delete a value
    (integer) 1
    > ZRANGE books 0 -1
    1) "java cookbook"
    2) "think in java"
6. Application scenarios of the five data types:
- String: the most common set/get operations. The value can be a string or a number. It is typically used for caching and for counting features (the INCR command performs an atomic increment).
- Hash: the value stores a structured object, which makes it convenient to operate on a single field.
- List: the list structure can serve as a simple message queue. In addition, the LRANGE command can be used to implement Redis-based paging, which performs well and gives a good user experience. A list is also a good fit for first-in, first-out queuing.
- Set: a set stores a collection of non-repeating values, so it can be used for global de-duplication. Its intersection, union, and difference operations can compute common hobbies, all hobbies, and the hobbies unique to one user.
- Sorted set: it adds a weight parameter, the score, and the elements in the set are ordered by score. It suits leaderboard applications and Top N queries.
Thank you and refer to: java bosom friend