Interpretation of two consistent hash algorithms

This is my change to the open source version of falcon


The most important thing, which I forgot to write: why does a consistent hash algorithm migrate only a small number of keys when the nodes change? Because the sortkeys list is actually a hash ring. The relative positions of a client's hash value and the remaining nodes' hash values in the ordered sortkeys list do not change. What changes is only the range between the hash of the offline node and the hash just before it on the ring, so the change rate is roughly 1 - n/m, where n is the smaller node count and m the larger.


The transfer component in Open-Falcon generates two consistent hash rings, one for judge and one for graph:

func initNodeRings() {
    cfg := g.Config()

    // one ring per back-end cluster, built from the cluster's node names
    JudgeNodeRing = rings.NewConsistentHashNodesRing(int32(cfg.Judge.Replicas), cutils.KeysOfMap(cfg.Judge.Cluster))
    GraphNodeRing = rings.NewConsistentHashNodesRing(int32(cfg.Graph.Replicas), cutils.KeysOfMap(cfg.Graph.Cluster))
}

The purpose of the hash ring is to map each reported counter (endpoint + metric + tag) by consistent hashing to one of the back-end judge and graph instances, and to apply the same mapping again when the data is looked up. This is a typical distributed cache idea, and it is the basis on which falcon bears high concurrency. Consistent hashing is also common in load balancers such as LVS and Nginx; see, for example, the consistent hash load balancing of Nginx.

Hashing transforms an input of arbitrary length into an output of fixed length through a hash algorithm; that output is the hash value.

Arbitrary-length input --> hash function --> fixed-length hash value

1. The essence of a hash algorithm is lossy compression of the original data

2. Hash operations include additive hash, bit-operation hash, multiplicative hash, division hash, table-lookup hash and mixed hash

3. The fixed length of the hash value is obtained by dividing the input into fixed-length blocks, hashing the blocks in turn and iterating in some way. If the last block is too short, it is padded

4. Hash table lookup: an ordinary lookup takes an element out of the set for comparison and narrows the range if it does not match. A hash lookup calculates the location of the element in the set directly from the key value, which is close to O(1) time complexity

5. Collision resistance: for any two different data blocks, the possibility of having the same hash value is very small; for a given data block, it is very difficult to find another data block with the same hash value.

6. Tamper resistance: for a data block, even if only one bit is changed, its hash value changes greatly.
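
To make the fixed-length output and point 6 concrete, here is a minimal Go sketch. It is not taken from falcon; the helper name md5Bits and the sample strings are made up for illustration. It prints the MD5 and CRC32 values of inputs of different lengths, then changes a single input bit and counts how many of the 128 MD5 output bits differ.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"hash/crc32"
	"math/bits"
)

// md5Bits counts how many of the 128 MD5 output bits differ between a and b.
// (Hypothetical helper, used only for this illustration.)
func md5Bits(a, b string) int {
	da, db := md5.Sum([]byte(a)), md5.Sum([]byte(b))
	n := 0
	for i := range da {
		n += bits.OnesCount8(da[i] ^ db[i])
	}
	return n
}

func main() {
	inputs := []string{"a", "endpoint/metric/tag", "a much longer input string than the other two"}
	for _, in := range inputs {
		// Whatever the input length, MD5 yields 16 bytes and CRC32 yields 4 bytes.
		fmt.Printf("len=%2d md5=%x crc32=%08x\n",
			len(in), md5.Sum([]byte(in)), crc32.ChecksumIEEE([]byte(in)))
	}

	// 'a' (0x61) and 'c' (0x63) differ in exactly one bit.
	fmt.Printf("bits changed: %d of %d\n", md5Bits("counter-a", "counter-c"), md5.Size*8)
}
```

The exact number of changed bits depends on the input, so no expected value is given here.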

Now let's look at the consistent hash algorithm

When there is a node change (a node is added or removed), only a few keys need to be rebalanced to another node.

A good consistent hash algorithm should meet the following requirements:

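Balance: the hash results are distributed evenly among all nodes

Dispersion: different terminals may hash the same content to different nodes, so their results are inconsistent. A good consistent hash algorithm should avoid this

Load: the other side of dispersion. Because different terminals may map the same content to different nodes, one node may be handed different content by different terminals. This should be kept low as well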

1. The data structure required by the consistent hash algorithm is a map and a sorted hash key list

2. The process of generating the hash ring is: generate a key for each node with a hash algorithm (md5, crc32), put it into the map and append it to the key list

3. Search process: calculate key2 from the string to be stored, use binary search to find the index of key1, the smallest key larger than key2, and look up key1 in the map to get the corresponding node

4. Virtual nodes are introduced to solve the problem of data skew: when there are too few service nodes, the ring segments owned by the nodes are uneven and the data easily becomes skewed

5. The virtual node method generates multiple hashkeys (generally 30 or more) that all correspond to the same node. It is like searching Taobao for a product: you see one store, then another store selling the same thing, and the seller gives you a list of stores and tells you that they are all his. Different roads, same destination
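
The five points above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code of either library discussed in this post: the type Ring, the functions NewRing and Get, the node names and the use of CRC32 for every hash are all made up for the example.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a hypothetical minimal hash ring: a map plus a sorted key list.
type Ring struct {
	owner map[uint32]string // hash key -> real node
	keys  []uint32          // all hash keys, sorted ascending
}

// NewRing places `replicas` virtual nodes on the ring for every real node.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, node := range nodes {
		for i := 0; i < replicas; i++ {
			k := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-%d", node, i)))
			if _, dup := r.owner[k]; dup {
				continue // hash collision: the first owner keeps the key
			}
			r.owner[k] = node
			r.keys = append(r.keys, k)
		}
	}
	sort.Slice(r.keys, func(a, b int) bool { return r.keys[a] < r.keys[b] })
	return r
}

// Get returns the node owning the first hash key greater than hash(key),
// wrapping around to the first key when the end of the list is reached.
func (r *Ring) Get(key string) (string, bool) {
	if len(r.keys) == 0 {
		return "", false
	}
	h := crc32.ChecksumIEEE([]byte(key))
	pos := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] > h })
	if pos == len(r.keys) {
		pos = 0
	}
	return r.owner[r.keys[pos]], true
}

func main() {
	ring := NewRing([]string{"node01", "node02", "node03"}, 40)
	counts := map[string]int{}
	for i := 0; i < 30000; i++ {
		node, _ := ring.Get(fmt.Sprintf("endpoint-%d/cpu.idle", i))
		counts[node]++
	}
	fmt.Println(counts) // shows how evenly the keys spread over the three nodes
}
```

Running it with replicas set to 1 instead of 40 is a quick way to see the data skew that point 4 describes.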

Let's look at the source code of a consistent hash implementation:


Hash ring source code used in falcon

1. First look at the data structure: remember the map and sortkeys, because they are the core

type HashKey uint32
type HashKeyOrder []HashKey

type HashRing struct {
    ring       map[HashKey]string  // the map of the hash ring: hash key -> node
    sortedKeys []HashKey           // sorted hash key list
    nodes      []string            // node list
    weights    map[string]int      // optional weight per node
}

Let's see if we find something similar in falcon

// Consistent holds the information about the members of the consistent hash circle.
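type Consistent struct {
    circle           map[uint32]string // hash key -> member, same role as ring above
    members          map[string]bool
    sortedHashes     uints             // sorted hash key list, same role as sortedKeys above
    NumberOfReplicas int
    count            int64
    scratch          [64]byte
}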

2. Let's look at the process of generating hash rings:

First initialize the struct, then call the function that generates the ring.

func New(nodes []string) *HashRing {
    hashRing := &HashRing{
        ring:       make(map[HashKey]string),
        sortedKeys: make([]HashKey, 0),
        nodes:      nodes,
        weights:    make(map[string]int),
    }
    // Generate the hash ring
    hashRing.generateCircle()
    return hashRing
}

Take a look at the logic here: loop over all virtual nodes, generate hashkeys from the node names, and insert them into both the map and sortkeys.

func (h *HashRing) generateCircle() {
    totalWeight := 0

    // This part about weights can be ignored
    for _, node := range h.nodes {
        if weight, ok := h.weights[node]; ok {
            totalWeight += weight
        } else {
            totalWeight += 1
            h.weights[node] = 1
        }
    }

    for _, node := range h.nodes {
        weight := h.weights[node]
        // With three nodes that all have weight 1, factor = 40*3*1/3 = 40.
        // factor controls how many virtual nodes are added per node.
        factor := math.Floor(float64(40*len(h.nodes)*weight) / float64(totalWeight))
        for j := 0; j < int(factor); j++ {
            // nodeKey: "node01-0", "node01-1", "node01-2", ...
            nodeKey := fmt.Sprintf("%s-%d", node, j)
            // bKey: [236 120 185 49 156 84 249 99 169 176 131 185 148 230 91 141]
            bKey := hashDigest(nodeKey)
            for i := 0; i < 3; i++ {
                key := hashVal(bKey[i*4 : i*4+4])
                // In total 3*factor = 120 keys are put into the h.ring map for this node
                h.ring[key] = node
                // Append the key to the list
                h.sortedKeys = append(h.sortedKeys, key)
            }
        }
    }
    // Sort the key list so that it can be binary searched
    sort.Sort(HashKeyOrder(h.sortedKeys))
    // After sorting, h.sortedKeys is [31575610 64842500 65702829 80981415...]
}


Take a look at hashDigest here: it generates the MD5 digest as a byte array

func hashDigest(key string) [md5.Size]byte {
    // md5.Sum returns the 16-byte MD5 digest of the key
    return md5.Sum([]byte(key))
}

falcon uses crc32.ChecksumIEEE

func (c *Consistent) hashKey(key string) uint32 {
    if len(key) < 64 {
        // short keys are copied into a fixed-size array to avoid a heap allocation
        var scratch [64]byte
        copy(scratch[:], key)
        return crc32.ChecksumIEEE(scratch[:len(key)])
    }
    return crc32.ChecksumIEEE([]byte(key))
}

Take a look at hashVal here: every four bytes of the generated MD5 digest are combined with shift and OR operations into one hashkey

func hashVal(bKey []byte) HashKey {
    // Shift and OR: bKey[0] is the lowest byte, bKey[3] the highest
    return ((HashKey(bKey[3]) << 24) |
        (HashKey(bKey[2]) << 16) |
        (HashKey(bKey[1]) << 8) |
        (HashKey(bKey[0])))
}

At this point we have a clear picture: for each node, 3 * 40 = 120 uint32 numbers are calculated as keys and inserted into the map and the hashkey list, and finally the hashkey list is sorted to prepare for the binary search
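
As a small worked example of the shift and OR step, the sketch below combines the first four bytes of the sample digest quoted in the code comment above (236 120 185 49) and checks that the result matches reading those bytes as a little-endian uint32. The function name combine is made up for this illustration, and plain uint32 is used instead of the HashKey type.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// combine mirrors the shift-and-OR step: b[0] becomes the lowest byte,
// b[3] the highest. (Hypothetical name, uint32 instead of HashKey.)
func combine(b []byte) uint32 {
	return uint32(b[3])<<24 | uint32(b[2])<<16 | uint32(b[1])<<8 | uint32(b[0])
}

func main() {
	b := []byte{236, 120, 185, 49} // first four bytes of the sample digest

	// 49<<24 | 185<<16 | 120<<8 | 236 = 0x31B978EC
	fmt.Println(combine(b))                                   // 834238700
	fmt.Println(combine(b) == binary.LittleEndian.Uint32(b)) // true
}
```

So each node key contributes three such uint32 values, taken from bytes 0-3, 4-7 and 8-11 of its digest.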

3. Finally, let's look at the search process:

The search process is: generate a hash key from the key to look up, find its position in the sorted keys list, take the hash key stored at that index, and then look it up in the map to get the corresponding node

func (h *HashRing) GetNode(stringKey string) (node string, ok bool) {
    // First, get the index of the key in the sortedKeys list
    pos, ok := h.GetNodePos(stringKey)
    if !ok {
        return "", false
    }
    // Then map the hash key at that index back to its node
    return h.ring[h.sortedKeys[pos]], true
}

func (h *HashRing) GetNodePos(stringKey string) (pos int, ok bool) {
    if len(h.ring) == 0 {
        return 0, false
    }
    // key is a hashkey, e.g. 2880865363
    key := h.GenKey(stringKey)

    nodes := h.sortedKeys
    // Binary search for the index of the first hashkey in h.sortedKeys that is greater than key.
    // The second parameter of sort.Search is interesting: it is a function that returns a bool.
    pos = sort.Search(len(nodes), func(i int) bool { return nodes[i] > key })

    if pos == len(nodes) {
        // Wrap the search, should return first node
        return 0, true
    } else {
        return pos, true
    }
}

Let's look at the lookup here. It uses sort.Search:

    // The index of the hashkey in h.sortedKeys is obtained by binary search.
    // The second parameter of sort.Search is interesting: it is a function that returns a bool.
    pos = sort.Search(len(nodes), func(i int) bool { return nodes[i] > key })
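
To see how this call behaves at the edges, here is a small self-contained sketch with a made-up key list. The function name successor and the sample values are assumptions for illustration only; the sort.Search call and the wrap-around follow the same pattern as GetNodePos.

```go
package main

import (
	"fmt"
	"sort"
)

// successor returns the index of the first ring key strictly greater than h,
// or 0 when h is past the last key (the ring wraps around).
func successor(keys []uint32, h uint32) int {
	pos := sort.Search(len(keys), func(i int) bool { return keys[i] > h })
	if pos == len(keys) {
		return 0
	}
	return pos
}

func main() {
	keys := []uint32{100, 200, 300, 400} // must already be sorted for sort.Search
	for _, h := range []uint32{50, 200, 250, 999} {
		fmt.Printf("hash %d -> index %d\n", h, successor(keys, h))
	}
	// hash 50 -> index 0
	// hash 200 -> index 2
	// hash 250 -> index 2
	// hash 999 -> index 0
}
```

Note that a hash equal to an existing key (200 here) goes to the next key, because the predicate uses > rather than >=.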

Let's take a look at the description of Search in the Go source code. Even with my poor English I can tell that this is a binary search:

The process is: given the calculated key1 and the ordered list, use binary search to find the smallest key greater than key1

>> 1 shifts right by one bit, which is division by 2, i.e. halving

// Search uses binary search to find and return the smallest index i
// in [0, n) at which f(i) is true, assuming that on the range [0, n),
// f(i) == true implies f(i+1) == true. That is, Search requires that
// f is false for some (possibly empty) prefix of the input range [0, n)
// and then true for the (possibly empty) remainder; Search returns
// the first true index. If there is no such index, Search returns n.
// (Note that the "not found" return value is not -1 as in, for instance,
// strings.Index.)
// Search calls f(i) only for i in the range [0, n).

func Search(n int, f func(int) bool) int {
    // Define f(-1) == false and f(n) == true.
    // Invariant: f(i-1) == false, f(j) == true.
    i, j := 0, n
    for i < j {
        h := int(uint(i+j) >> 1) // avoid overflow when computing h
        // i ≤ h < j
        if !f(h) {
            i = h + 1 // preserves f(i-1) == false
        } else {
            j = h // preserves f(j) == true
        }
    }
    // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
    return i
}

Let's write a binary search in Python:

def bin_search(data_set, val):
    # low and high are the smallest and largest index of the remaining range
    low = 0
    high = len(data_set) - 1
    while low <= high:  # as long as low <= high there are still candidates left
        mid = (low + high) // 2
        print("low:%d,mid:%d,high:%d" % (low, mid, high))
        if data_set[mid] == val:
            return mid  # return its index
        elif data_set[mid] > val:
            high = mid - 1  # continue in the left half
        else:
            low = mid + 1  # continue in the right half
    return  # returning None means not found

data_set = list(range(100))
print(bin_search(data_set, 34))

Let's test how many keys migrate when the nodes of the consistent hash ring change

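func RingInit(server_arr []string) *hashring.HashRing {
    return hashring.New(server_arr)
}

// PengzhuangCeshi builds two rings with different node lists and counts
// how many keys are mapped to a different node.
func PengzhuangCeshi() {
    servers1 := []string{
        // node list of the first ring (not preserved in this post)
    }
    servers2 := []string{
        // node list of the second ring (not preserved in this post)
    }

    r1 := RingInit(servers1)
    r2 := RingInit(servers2)
    test_num := 10000000
    client_ip := ""
    migr_num := 0
    for i := 0; i < test_num; i++ {
        key := fmt.Sprintf("%s_%v", client_ip, i)
        choose_server1, _ := r1.GetNode(key)
        choose_server2, _ := r2.GetNode(key)
        if choose_server1 != choose_server2 {
            // the key landed on a different node: count it as migrated
            migr_num++
        }
    }
    fmt.Printf("migr_rate %.3f", float32(migr_num)/float32(test_num))
}

func main() {
    PengzhuangCeshi()
}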

Results with test_num := 10000000:

node change    migr_num    migr_rate
4 / 5          1839416     0.184
5 / 2          5737265     0.574
3 / 2          3072919     0.307
4 / 3          2491462     0.249

If a server becomes unavailable, the affected data is only the data between this server and the previous server on the ring (i.e. the first server encountered when walking counterclockwise); the rest is not affected.

We speculate that the migration rate is roughly rate = 1 - n/m, where n is the smaller node count and m the larger. For the four runs above this gives 0.200, 0.600, 0.333 and 0.250, against measured rates of 0.184, 0.574, 0.307 and 0.249.
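
The speculation can be checked with a self-contained experiment. The sketch below is only an approximation of the test above: it does not use the hashring package, the helpers buildRing, lookup and migrationRate and the server names are made up, CRC32 replaces MD5, and the smaller ring is assumed to be a subset of the larger one. It compares the measured rate with 1 - n/m for the same four node-count pairs.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// buildRing returns the sorted virtual-node hashes of an n-node ring and
// their owners. Node names are hypothetical: server00, server01, ...
func buildRing(n, replicas int) ([]uint32, map[uint32]string) {
	owner := make(map[uint32]string)
	var keys []uint32
	for s := 0; s < n; s++ {
		for v := 0; v < replicas; v++ {
			k := crc32.ChecksumIEEE([]byte(fmt.Sprintf("server%02d-%d", s, v)))
			if _, dup := owner[k]; dup {
				continue // hash collision: the first owner keeps the key
			}
			owner[k] = fmt.Sprintf("server%02d", s)
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(a, b int) bool { return keys[a] < keys[b] })
	return keys, owner
}

// lookup finds the owner of the first ring key greater than hash(key).
func lookup(keys []uint32, owner map[uint32]string, key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	pos := sort.Search(len(keys), func(i int) bool { return keys[i] > h })
	if pos == len(keys) {
		pos = 0 // wrap around the ring
	}
	return owner[keys[pos]]
}

// migrationRate is the fraction of sample keys whose node differs between
// an n-node ring and an m-node ring.
func migrationRate(n, m, samples int) float64 {
	k1, o1 := buildRing(n, 120)
	k2, o2 := buildRing(m, 120)
	moved := 0
	for i := 0; i < samples; i++ {
		key := fmt.Sprintf("key_%d", i)
		if lookup(k1, o1, key) != lookup(k2, o2, key) {
			moved++
		}
	}
	return float64(moved) / float64(samples)
}

func main() {
	for _, c := range [][2]int{{4, 5}, {2, 5}, {2, 3}, {3, 4}} {
		n, m := c[0], c[1]
		fmt.Printf("%d vs %d nodes: measured %.3f, 1-n/m = %.3f\n",
			n, m, migrationRate(n, m, 100000), 1-float64(n)/float64(m))
	}
}
```

Because the smaller ring's nodes all exist in the larger ring, a key only moves when it lands on one of the extra nodes, which is why the rate should stay close to the share of the ring those extra nodes own.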

Finally, we can't leave out a consistent hash ring in Python

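import hashlib


class ConsistentHashRing(object):

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = {}
        self.sort_keys = []
        if nodes:
            for node in nodes:
                self.add_nodes(node)

    def add_nodes(self, node):
        for i in range(self.replicas):
            # virtual node name; the original key format is missing here, "node-i" is assumed
            key = "%s-%d" % (node, i)
            hashkey = self.gen_key(key)
            self.ring[hashkey] = node
            self.sort_keys.append(hashkey)
        # keep the key list ordered so that lookups can walk the ring
        self.sort_keys.sort()

    def gen_key(self, key):
        # MD5 digest of the key, interpreted as one big integer
        m = hashlib.md5(key.encode("utf-8"))
        return int(m.hexdigest(), 16)

    def get_node(self, data_key):
        return self.get_node_pos(data_key)[0]

    def get_node_pos(self, data_key):
        key = self.gen_key(data_key)
        nodes = self.sort_keys
        # linear scan for the first ring key that is not smaller than key
        for i in range(0, len(nodes)):
            node = nodes[i]
            if key <= node:
                return self.ring[node], i
        # past the last key: wrap around to the first one
        return self.ring[nodes[0]], 0


if __name__ == '__main__':
    # hypothetical node names; the original list is missing here
    nodes = ["node1", "node2", "node3"]
    Ring = ConsistentHashRing(nodes)
    for i in range(10000):
        print(Ring.get_node("key1-%d" % i))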


Posted on Wed, 10 Jun 2020 23:46:24 -0400 by assonitis