https://github.com/ning1875/falcon-plus This is my modified version of open-source falcon
===
The most important thing, which I forgot to write earlier: why does a consistent hash algorithm migrate only a small number of keys when the set of nodes changes? Because the sorted key list (sortedKeys) is in effect a hash ring. The relative positions of a client key's hash and the hashes of the surviving nodes in that ordered list do not change. The only keys that move are those whose hashes fall between a hash of the offline node and the hash just before it on the ring. So when m nodes shrink to n nodes, the migration rate is approximately 1 - n/m
=================================================================================================================================================
The transfer component of Open-Falcon builds two consistent hash rings, one for judge and one for graph:
func initNodeRings() {
    cfg := g.Config()

    JudgeNodeRing = rings.NewConsistentHashNodesRing(int32(cfg.Judge.Replicas), cutils.KeysOfMap(cfg.Judge.Cluster))
    GraphNodeRing = rings.NewConsistentHashNodesRing(int32(cfg.Graph.Replicas), cutils.KeysOfMap(cfg.Graph.Cluster))
}
The purpose of the hash rings is to assign each reported counter (endpoint + metric + tag) to one of the back-end judge and graph instances by consistent hashing, and to apply the same hashing at query time to find the data again. This is a typical distributed-cache idea, and it is the basis of falcon's ability to carry high concurrency. Consistent hashing is also common in load balancers such as LVS and nginx (see nginx's consistent-hash load balancing).
Hashing transforms an input of arbitrary length into an output of fixed length through a hash algorithm; the output is the hash value.
Variable-length input --> hash function --> fixed-length hash value
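As a small illustration of the fixed-length property, here is a minimal Python sketch that hashes inputs of different lengths with MD5 and CRC32, the two algorithms that appear later in this article. The helper name `digest_sizes` and the sample inputs are made up for this example.

```python
import hashlib
import zlib


def digest_sizes(data: bytes) -> tuple:
    """Return (MD5 digest length in bytes, CRC32 value) for the input."""
    md5_digest = hashlib.md5(data).digest()  # always 16 bytes
    crc = zlib.crc32(data)                   # always fits in 32 bits
    return len(md5_digest), crc


if __name__ == "__main__":
    for text in (b"a", b"endpoint/metric/tag", b"x" * 10000):
        size, crc = digest_sizes(text)
        print(len(text), "bytes in ->", size, "byte MD5, CRC32 =", crc)
```

Whatever the input length, the MD5 digest stays at 16 bytes and the CRC32 value stays within 32 bits.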
1. The essence of a hash algorithm is lossy compression of the original data
2. Hash operations include additive hash, bitwise hash, multiplicative hash, division hash, table-lookup hash and mixed hash
3. The fixed length of the hash value comes from splitting the input into fixed-length blocks, hashing them in turn, and combining the results iteratively. If the last block is too short, it is padded
4. Hash table lookup: an ordinary search takes an element out of the set, compares it, and narrows the range if it does not match. A hash lookup computes the location of the element in the set directly from the key, which is close to O(1) time complexity
5. Collision resistance: for any two different data blocks, the probability of having the same hash value is very small; for a given data block, it is very difficult to find another data block with the same hash value.
6. Tamper resistance: for a data block, even if only one bit is changed, its hash value changes greatly.
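Point 6 can be observed directly. The following is a minimal Python sketch, assuming MD5 as the hash; the helper names `differing_bits` and `flip_lowest_bit` and the sample string are invented for this illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the MD5 digests of a and b."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of the last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 1])


if __name__ == "__main__":
    original = b"cpu.idle/host=web01"
    changed = flip_lowest_bit(original)
    print(original, changed)
    print("differing digest bits out of 128:", differing_bits(original, changed))
```

For a well-mixed hash, roughly half of the 128 digest bits are expected to differ even though only one input bit changed.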
Now let's look at the consistent hash algorithm
When the set of nodes changes (a node is added or removed), only a small number of keys need to be rebalanced to another node.
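A good consistent hash algorithm should meet the following requirements:
Balance: the hash results are evenly distributed among all nodes
Monotonicity: when a new node is added, a key either stays on its old node or moves to the new node; it never moves between two old nodes
Dispersion (spread): different terminals may map the same content to different nodes. A good consistent hash algorithm should avoid this
Load: conversely, a given node may be mapped to different content by different terminals. A good consistent hash algorithm should keep this load low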
1. The data structures required by the consistent hash algorithm are a map and a sorted list of hash keys
2. Generating the hash ring: compute a key for each node with a hash algorithm (md5, crc32), put it into the map and append it to the key list
3. Lookup: compute key2 from the string to be stored, use binary search to find the index of the smallest key1 that is greater than key2, then look up key1 in the map to get the corresponding node
4. Virtual nodes are introduced to solve data skew: when there are too few service nodes, the segments of the ring owned by each node are uneven, which easily causes data skew
5. With virtual nodes, several hash keys (usually 30 or more) are generated for the same node. It is like searching Taobao for a product: you see it in one store, then in another store selling the same thing, and the seller gives you a list of stores and tells you they are all his. Different entrances, same destination
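The five points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code falcon uses: MD5 is taken as the hash, and the names `hash_key`, `build_ring`, `get_node` and `distribution`, the virtual-node naming scheme and the sample keys are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def hash_key(text: str) -> int:
    """Hash a string to an integer position on the ring (MD5 here)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, replicas):
    """Return (ring map, sorted key list) with `replicas` virtual nodes per node."""
    ring = {}
    for node in nodes:
        for i in range(replicas):
            ring[hash_key("%s-%d" % (node, i))] = node
    return ring, sorted(ring)


def get_node(ring, sorted_keys, data_key):
    """Find the first ring key greater than the data key, wrapping to index 0."""
    pos = bisect.bisect_right(sorted_keys, hash_key(data_key))
    if pos == len(sorted_keys):
        pos = 0
    return ring[sorted_keys[pos]]


def distribution(nodes, replicas, n_keys=20000):
    """Count how many of n_keys sample keys land on each node."""
    ring, sorted_keys = build_ring(nodes, replicas)
    return Counter(get_node(ring, sorted_keys, "key-%d" % i) for i in range(n_keys))


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]
    for replicas in (1, 120):
        print(replicas, "virtual node(s) per node:", dict(distribution(nodes, replicas)))
```

With a single hash key per node the counts are typically quite uneven; with many virtual nodes per node they tend to be much closer to one third each.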
Now let's go from the theory to the source code of a consistent hash implementation -->
Hash ring source code used in falcon
https://github.com/toolkits/consistent
1. First look at the data structure: remember the map and sortkeys, because they are the core
type HashKey uint32
type HashKeyOrder []HashKey

type HashRing struct {
    ring       map[HashKey]string // map in the hash ring
    sortedKeys []HashKey          // hash key list
    nodes      []string           // node list
    weights    map[string]int
}
Let's see if we find something similar in falcon
// Consistent holds the information about the members of the consistent hash circle.
type Consistent struct {
    circle           map[uint32]string
    members          map[string]bool
    sortedHashes     uints
    NumberOfReplicas int
    count            int64
    scratch          [64]byte
    sync.RWMutex
}
2. Let's look at the process of generating hash rings:
First, initialize the structure, then call the function that generates the ring.
func New(nodes []string) *HashRing {
    hashRing := &HashRing{
        ring:       make(map[HashKey]string),
        sortedKeys: make([]HashKey, 0),
        nodes:      nodes,
        weights:    make(map[string]int),
    }
    // Generate the hash ring
    hashRing.generateCircle()
    return hashRing
}
The logic here: loop over all nodes, generate the virtual-node hash keys for each node, insert them into both the map and sortedKeys, and finally sort sortedKeys
func (h *HashRing) generateCircle() {
    totalWeight := 0
    // This part about weights can be ignored
    for _, node := range h.nodes {
        if weight, ok := h.weights[node]; ok {
            totalWeight += weight
        } else {
            totalWeight += 1
            h.weights[node] = 1
        }
    }

    for _, node := range h.nodes {
        weight := h.weights[node]

        // With three nodes whose weights are all 1, factor = 40; this is where virtual nodes are added
        factor := math.Floor(float64(40*len(h.nodes)*weight) / float64(totalWeight))

        for j := 0; j < int(factor); j++ {
            // nodeKey: 'node01-00' 'node01-01' 'node01-02'
            nodeKey := fmt.Sprintf("%s-%d", node, j)
            // bKey: [236 120 185 49 156 84 249 99 169 176 131 185 148 230 91 141]
            bKey := hashDigest(nodeKey)

            for i := 0; i < 3; i++ {
                // key: 3261919718
                // key: 2087224356
                // key: 2167064686
                key := hashVal(bKey[i*4 : i*4+4])
                fmt.Printf("Akey:%v\n", key)
                // Insert 3*factor = 120 values into the h.ring map as keys for this node
                h.ring[key] = node
                // Append to the key list
                h.sortedKeys = append(h.sortedKeys, key)
            }
        }
    }
    // h.sortedKeys (ring.keys()) is [31575610 64842500 65702829 80981415...]
    sort.Sort(HashKeyOrder(h.sortedKeys))
}
Take a look at hashDigest here: it computes the MD5 digest of the key as a byte array
func hashDigest(key string) [md5.Size]byte {
    return md5.Sum([]byte(key))
}
falcon uses crc32.ChecksumIEEE
func (c *Consistent) hashKey(key string) uint32 {
    if len(key) < 64 {
        var scratch [64]byte
        copy(scratch[:], key)
        return crc32.ChecksumIEEE(scratch[:len(key)])
    }
    return crc32.ChecksumIEEE([]byte(key))
}
Take a look at hashVal here: every four bytes of the generated MD5 digest are combined by shift and OR operations into one hash key
func hashVal(bKey []byte) HashKey {
    // Shift and OR operations
    return ((HashKey(bKey[3]) << 24) |
        (HashKey(bKey[2]) << 16) |
        (HashKey(bKey[1]) << 8) |
        (HashKey(bKey[0])))
}
At this point we have a clear picture: for each node, 3 * 40 = 120 uint32 values are computed as keys and inserted into the map and the hash key list, and finally the hash key list is sorted to prepare for the binary search
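To make the shift-and-OR step concrete, here is a small Python sketch of the same byte arithmetic. The function names `hash_val` and `virtual_node_keys` are chosen for this example; the four input bytes are the first four bytes of the sample bKey quoted in the Go comments.

```python
import hashlib


def hash_val(four_bytes: bytes) -> int:
    """Combine 4 bytes into a uint32 with byte 0 as the lowest byte."""
    return (four_bytes[3] << 24) | (four_bytes[2] << 16) | (four_bytes[1] << 8) | four_bytes[0]


def virtual_node_keys(node: str, index: int) -> list:
    """Return the three uint32 ring keys derived from one MD5 digest."""
    digest = hashlib.md5(("%s-%d" % (node, index)).encode("utf-8")).digest()
    return [hash_val(digest[i * 4:i * 4 + 4]) for i in range(3)]


if __name__ == "__main__":
    print(hash_val(bytes([236, 120, 185, 49])))  # 834238700
    print(virtual_node_keys("node01", 0))
```

The shift-and-OR expression is simply a little-endian read of four bytes, so `int.from_bytes(four_bytes, "little")` gives the same value. Only the first 12 of the 16 digest bytes are used, which is where the factor of 3 keys per virtual node comes from.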
3. Finally, let's look at the search process:
The lookup generates a hash key from the given key, finds the position for that hash key in the sorted key list, reads the ring key at that position, and then looks it up in the map to get the corresponding node
func (h *HashRing) GetNode(stringKey string) (node string, ok bool) {
    // First, get the index of the key in the sortedKeys list
    pos, ok := h.GetNodePos(stringKey)
    if !ok {
        return "", false
    }
    return h.ring[h.sortedKeys[pos]], true
}

func (h *HashRing) GetNodePos(stringKey string) (pos int, ok bool) {
    if len(h.ring) == 0 {
        return 0, false
    }
    // key is a hash key such as 2880865363
    key := h.GenKey(stringKey)

    nodes := h.sortedKeys
    /*
        Here the index of the hash key in h.sortedKeys is obtained by binary search.
        The second parameter of sort.Search is interesting: it is a function that returns bool.
    */
    pos = sort.Search(len(nodes), func(i int) bool { return nodes[i] > key })

    if pos == len(nodes) {
        // Wrap the search, should return first node
        return 0, true
    } else {
        return pos, true
    }
}
Let's look more closely at the lookup, which uses sort.Search:
/*
    Here the index of the hash key in h.sortedKeys is obtained by binary search.
    The second parameter of sort.Search is interesting: it is a function that returns bool.
*/
pos = sort.Search(len(nodes), func(i int) bool { return nodes[i] > key })
Let's take a look at the description of Search in the source code. Even with my poor English it is clear that this is a binary search:
The process: given the computed key1 and the ordered list, binary search finds the smallest key greater than key1
>> 1 is a right shift by one bit, i.e. division by 2, which halves the range
// Search uses binary search to find and return the smallest index i
// in [0, n) at which f(i) is true, assuming that on the range [0, n),
// f(i) == true implies f(i+1) == true. That is, Search requires that
// f is false for some (possibly empty) prefix of the input range [0, n)
// and then true for the (possibly empty) remainder; Search returns
// the first true index. If there is no such index, Search returns n.
// (Note that the "not found" return value is not -1 as in, for instance,
// strings.Index.)
// Search calls f(i) only for i in the range [0, n).
func Search(n int, f func(int) bool) int {
    // Define f(-1) == false and f(n) == true.
    // Invariant: f(i-1) == false, f(j) == true.
    i, j := 0, n
    for i < j {
        h := int(uint(i+j) >> 1) // avoid overflow when computing h
        // i ≤ h < j
        if !f(h) {
            i = h + 1 // preserves f(i-1) == false
        } else {
            j = h // preserves f(j) == true
        }
    }
    // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
    return i
}

Let's also write a binary search in Python:

def bin_search(data_set, val):
    # low and high are the smallest and largest index still in range
    low = 0
    high = len(data_set) - 1
    while low <= high:  # while low <= high there are still elements left to check
        mid = (low + high) // 2
        print("low:%d,mid:%d,high:%d" % (low, mid, high))
        if data_set[mid] == val:
            return mid  # return its index
        elif data_set[mid] > val:
            high = mid - 1
        else:
            low = mid + 1
    return None  # None means not found

data_set = list(range(100))
print(bin_search(data_set, 34))
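The exact-match binary search above differs from what the hash ring needs, which is the first key strictly greater than the target, with wrap-around. As a sketch, here is a Python port of the predicate-style search together with that wrap-around rule. The names `search` and `ring_position` are chosen for this example; Python's standard `bisect.bisect_right` returns the same index as `search` with the "greater than" predicate.

```python
def search(n, f):
    """Return the smallest i in [0, n) with f(i) true, or n if there is none.

    Assumes f is false for a prefix of the range and true for the rest.
    """
    i, j = 0, n
    while i < j:
        h = (i + j) >> 1  # midpoint; >> 1 halves the sum
        if not f(h):
            i = h + 1     # the answer is to the right of h
        else:
            j = h         # h may be the answer
    return i


def ring_position(sorted_keys, key):
    """Index of the first ring key greater than key, wrapping to 0."""
    pos = search(len(sorted_keys), lambda i: sorted_keys[i] > key)
    return 0 if pos == len(sorted_keys) else pos


if __name__ == "__main__":
    keys = [10, 20, 30]
    for k in (5, 10, 25, 30, 35):
        print(k, "->", ring_position(keys, k))
```

A key equal to or larger than the last ring key wraps around to index 0, which is what closes the sorted list into a ring.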
Let's test how many keys migrate when the nodes of the consistent hash ring change
func RingInit(server_arr []string) *hashring.HashRing {
    return hashring.New(server_arr)
}

func PengzhuangCeshi() {
    servers1 := []string{
        "192.168.0.241:11212",
        "192.168.0.242:11212",
        "192.168.0.243:11212",
        "192.168.0.244:11212",
        "192.168.0.245:11212",
    }
    servers2 := []string{
        "192.168.0.241:11212",
        "192.168.0.242:11212",
        "192.168.0.243:11212",
        "192.168.0.244:11212",
    }
    r1 := RingInit(servers1)
    r2 := RingInit(servers2)
    test_num := 10000000
    client_ip := "10.10.10.10"
    migr_num := 0
    for i := 0; i < test_num; i++ {
        key := fmt.Sprintf("%s_%v", client_ip, i)
        choose_server1, _ := r1.GetNode(key)
        choose_server2, _ := r2.GetNode(key)
        // A key has migrated if the two rings pick different servers
        if choose_server1 != choose_server2 {
            migr_num += 1
        }
    }
    fmt.Println("migr_num", migr_num)
    fmt.Printf("migr_rate %.3f", float32(migr_num)/float32(test_num))
}

func main() {
    PengzhuangCeshi()
    //Test()
}
test_num := 10000000
5 nodes vs 4 nodes: migr_num 1839416, migr_rate 0.184
5 nodes vs 2 nodes: migr_num 5737265, migr_rate 0.574
3 nodes vs 2 nodes: migr_num 3072919, migr_rate 0.307
4 nodes vs 3 nodes: migr_num 2491462, migr_rate 0.249
If a server is not available, the affected data is only the data between this server and the previous server in its ring space (i.e. the first server encountered when walking in a counterclockwise direction), and the rest will not be affected.
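We speculate that the migration rate is rate = 1 - n/m, where m is the larger node count and n the smaller one (n < m). For the four runs above this predicts 0.200, 0.600, 0.333 and 0.250, which is close to the measured values.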
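The same experiment can be repeated on a small scale in Python to compare the measured rate with 1 - n/m. This is a rough sketch, not a port of the Go library: the `Ring` class and `migration_rate` function are hypothetical, MD5 is used for both nodes and keys, 120 virtual nodes per node is an assumption that mirrors the 3 * 40 keys per node seen earlier, and the key count is reduced to keep the run short.

```python
import bisect
import hashlib


def hash_key(text):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=120):
        self.ring = {}
        for node in nodes:
            for i in range(replicas):
                self.ring[hash_key("%s-%d" % (node, i))] = node
        self.sorted_keys = sorted(self.ring)

    def get_node(self, data_key):
        # First ring key greater than the data key; the modulo wraps to index 0.
        pos = bisect.bisect_right(self.sorted_keys, hash_key(data_key))
        return self.ring[self.sorted_keys[pos % len(self.sorted_keys)]]


def migration_rate(m, n, n_keys=20000):
    """Fraction of keys whose node changes when m nodes shrink to the first n."""
    servers = ["192.168.0.%d:11212" % (241 + i) for i in range(m)]
    before, after = Ring(servers), Ring(servers[:n])
    moved = sum(
        1 for i in range(n_keys)
        if before.get_node("10.10.10.10_%d" % i) != after.get_node("10.10.10.10_%d" % i)
    )
    return moved / n_keys


if __name__ == "__main__":
    for m, n in ((5, 4), (5, 2), (3, 2), (4, 3)):
        print("%d -> %d nodes: measured %.3f, 1 - n/m = %.3f"
              % (m, n, migration_rate(m, n), 1 - n / m))
```

In this sketch the keys that move are exactly the keys that were stored on the removed nodes, because the ring points of the surviving nodes do not change. The measured rate deviates from 1 - n/m only as far as the removed nodes owned more or less than their even share of the ring.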
Finally, without further ado, here is a consistent hash ring in Python
#coding:utf-8
import hashlib


class ConsistentHashRing(object):
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = {}
        self.sort_keys = []
        if nodes:
            for node in nodes:
                self.add_nodes(node)

    def add_nodes(self, node):
        # Generate `replicas` virtual nodes for this node
        for i in range(self.replicas):
            key = '%s_%d' % (node, i)
            hashkey = self.gen_key(key)
            self.ring[hashkey] = node
            self.sort_keys.append(hashkey)
        self.sort_keys.sort()

    def gen_key(self, key):
        # MD5 digest of the key, read as one big integer
        m = hashlib.md5()
        m.update(key.encode('utf-8'))
        return int(m.hexdigest(), 16)

    def get_node(self, data_key):
        return self.get_node_pos(data_key)[0]

    def get_node_pos(self, data_key):
        key = self.gen_key(data_key)
        nodes = self.sort_keys
        # Linear scan for the first ring key that is >= key
        for i in range(0, len(nodes)):
            node = nodes[i]
            if key <= node:
                return self.ring[node], i
        # Past the last key: wrap around to the first key on the ring
        return self.ring[nodes[0]], 0


if __name__ == '__main__':
    nodes = ["node-1", "node-2", "node-3"]
    Ring = ConsistentHashRing(nodes)
    for i in range(10000):
        print(Ring.get_node("key1-%d" % i))