The snowflake algorithm used by big tech companies

Preface

Earlier posts used rand and srand to generate pseudo-random numbers. For a given seed, the pseudo-random sequence is always the same. Today we look at how to generate random numbers that are not tied to a fixed seed.
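
To make the "fixed sequence" point concrete, here is a minimal sketch. The helper name same_seed_repeats is made up for this illustration; it seeds rand twice with the same value and checks that the same numbers come out both times.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if seeding rand() twice with the same
 * seed yields the same first `count` values (at most 16), 0 otherwise. */
int same_seed_repeats(unsigned int seed, int count)
{
    int first[16];
    if (count > 16)
        count = 16;

    srand(seed);
    for (int i = 0; i < count; i++)
        first[i] = rand();

    srand(seed);                     /* restart the sequence from the same seed */
    for (int i = 0; i < count; i++)
        if (rand() != first[i])
            return 0;
    return 1;
}
```

Calling same_seed_repeats(42, 5) is expected to return 1, because srand resets the generator to the same state each time.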

Entropy pool

Random values can be read from /dev/urandom. On Linux, /dev/urandom is backed by the kernel's entropy pool, which collects environmental noise from the running system; entropy describes how disordered a system is. The noise is gathered from unpredictable activity on the machine, such as the timing of device events and interrupts.

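#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>   /* read, close */

int main(void)
{
    int randNum = 0;

    /* Open the device once and reuse the descriptor for every read. */
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
    {
        perror("open /dev/urandom");
        return 1;
    }

    for (int i = 0; i < 5; i++)
    {
        /* Fill randNum with sizeof(int) random bytes. */
        if (read(fd, &randNum, sizeof(randNum)) != (ssize_t)sizeof(randNum))
        {
            perror("read /dev/urandom");
            close(fd);
            return 1;
        }
        printf("randNum is %d\n", randNum);
    }

    close(fd);
    return 0;
}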

Output:

mapan@mapan-virtual-machine:~/c++$ ./a.out 
randNum is 94961710
randNum is -523780773
randNum is 1542169420
randNum is -1632410867

The random numbers printed on each run are different. Random values like these are not a good fit for identifiers, though: nothing guarantees that two of them will never collide, and they carry no ordering. The snowflake algorithm addresses this. It is typically used to generate unique IDs in a distributed system.

Snowflake algorithm

An ID generated by the SnowFlake algorithm is a 64-bit integer with the following structure (the parts are separated by "-"):
0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000

The 1-bit sign part: in Java, the highest bit of a long is the sign bit (0 for positive, 1 for negative). Generated IDs are meant to be positive, so this bit is always 0.

The 41-bit timestamp part holds a time in milliseconds. It usually stores not the current timestamp itself but the difference (current time - fixed start time), so that generated IDs start from a smaller value. A 41-bit timestamp lasts about 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.

The 10-bit node part: in the Twitter implementation, the first 5 bits are the data center ID and the last 5 bits are the machine ID, so up to 1024 nodes can be deployed.

The 12-bit sequence number part lets one node generate up to 4096 IDs within the same millisecond.
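
The four parts above can be written down as a handful of constants and one packing function. This is only a sketch of the layout, not the generator itself; the names TIMESTAMP_BITS, NODE_BITS, SEQUENCE_BITS, MAX_NODE, MAX_SEQUENCE, timestamp_years and compose_id are invented for this illustration.

```c
#include <stdint.h>

/* Field widths taken from the layout described above (assumed names). */
#define TIMESTAMP_BITS 41
#define NODE_BITS      10
#define SEQUENCE_BITS  12

/* Largest value each field can hold. */
#define MAX_TIMESTAMP ((1ULL << TIMESTAMP_BITS) - 1)
#define MAX_NODE      ((1ULL << NODE_BITS) - 1)       /* 1023 */
#define MAX_SEQUENCE  ((1ULL << SEQUENCE_BITS) - 1)   /* 4095 */

/* Whole 365-day years covered by a 41-bit millisecond counter. */
uint64_t timestamp_years(void)
{
    uint64_t ms_per_year = 1000ULL * 60 * 60 * 24 * 365;
    return (1ULL << TIMESTAMP_BITS) / ms_per_year;
}

/* Pack the three fields into one 64-bit value. Each field is masked to
 * its width, so the top (sign) bit always stays 0. */
uint64_t compose_id(uint64_t elapsed_ms, uint64_t node, uint64_t sequence)
{
    return ((elapsed_ms & MAX_TIMESTAMP) << (NODE_BITS + SEQUENCE_BITS))
         | ((node & MAX_NODE) << SEQUENCE_BITS)
         | (sequence & MAX_SEQUENCE);
}
```

Here timestamp_years() evaluates to 69, which matches the figure quoted for the 41-bit timestamp, and compose_id(1, 1, 1) sets exactly bits 22, 12 and 0.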

/*
    snowflake

    ID generation strategy:
    41 bits of millisecond time + 10 bits of machine ID + 12 bits of per-millisecond sequence.

    0           41     51     64
    +-----------+------+------+
    |time       |pc    |inc   |
    +-----------+------+------+

    The first 41 bits are a timestamp in milliseconds.
    The next 10 bits are the machine ID, configured in advance.
    The last 12 bits are an incrementing counter.
    The 10-bit machine ID means at most 1024 machines can generate IDs at the same time,
    and the 12-bit sequence number means one machine can generate at most 4096 IDs per millisecond.

    Note: the ID must be held in a 64-bit integer type (uint64_t here), because the fields
    are placed with shift operations; with a narrower type the generated ID would be wrong.
*/


/* snowflake.h */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <sys/time.h>


/* Generator state shared by all calls. */
struct globle
{
    int global_int:12;      /* unused in this example */
    uint64_t last_stamp;    /* millisecond timestamp of the last generated ID */
    int workid;             /* machine (node) ID, low 10 bits are used */
    int seqid;              /* sequence number within the current millisecond */
};


void set_workid(int workid);
pid_t gettid(void);
uint64_t get_curr_ms(void);
uint64_t wait_next_ms(uint64_t lastStamp);
int atomic_incr(int id);
uint64_t get_unique_id(void);

/* snowflake.c */
#include "snowflake.h"


struct globle g_info;


#define sequenceMask ((1L << 12) - 1)  // L indicates long; low 12 bits set, value 4095


void set_workid(int workid)
{
 g_info.workid = workid;
}


pid_t gettid( void )//Get thread ID
{
  return syscall( __NR_gettid );
}


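uint64_t get_curr_ms()  //Get the current time in milliseconds since the Unix epoch
{
  struct timeval time_now;
  gettimeofday(&time_now, NULL);
  //Widen to 64 bits before multiplying so the product cannot overflow a 32-bit time_t
  uint64_t ms_time = (uint64_t)time_now.tv_sec * 1000 + (uint64_t)time_now.tv_usec / 1000;
  return ms_time;
}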


uint64_t wait_next_ms(uint64_t lastStamp)
{
  uint64_t cur = 0;
  do {
    cur = get_curr_ms();
  } while (cur <= lastStamp);
  return cur;
}


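int atomic_incr(int id)//Returns id + 1
{
  //id is a by-value copy, so the atomic builtin only protects this local variable;
  //the caller's update of g_info.seqid is not made thread-safe by this call.
  __sync_add_and_fetch(&id, 1);
  return id;
}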


uint64_t get_unique_id()
{
  uint64_t uniqueId = 0;
  uint64_t nowtime = get_curr_ms();//Gets the current number of milliseconds


  if (nowtime < g_info.last_stamp)  //The clock moved backwards: refuse to generate
  {
    fprintf(stderr, "error: clock moved backwards\n");
    exit(-1);
  }


  if (nowtime == g_info.last_stamp)
  {
    //sequenceMask is 4095, binary 1111 1111 1111, so the sequence wraps to 0 after 4095
    g_info.seqid = atomic_incr(g_info.seqid) & sequenceMask;
    if (g_info.seqid == 0)  //Sequence exhausted within this millisecond
    {
      nowtime = wait_next_ms(g_info.last_stamp);//Spins until the next millisecond
    }
  }
  else
  {
    g_info.seqid = 0;
  }
  g_info.last_stamp = nowtime;


  //Assemble the ID only after nowtime is final, so that an ID issued after
  //waiting carries the new millisecond instead of the exhausted one.
  uniqueId = nowtime << 22;   //Fill in the timestamp section

  //0x3ff is 1023, binary 11 1111 1111
  //A workid of 100 is binary 00 0110 0100
  //Mask to 10 bits first, then shift past the 12-bit sequence
  uniqueId |= (uint64_t)(g_info.workid & 0x3ff) << 12;   //Fill node part

  uniqueId |= (uint64_t)g_info.seqid;//Fill in the serial number section
  return uniqueId;
}


int main()
{
  set_workid(100);
  int i;
  for(i=0;i<10;i++)
  {
    uint64_t unique = get_unique_id();
    //Cast so the arguments match the format specifiers on every platform
    printf("pthread_id:%d, id [%llu]\n", (int)gettid(), (unsigned long long)unique);
  }


  return 0;
}

Output:

mapan@mapan-virtual-machine:~/c++$ ./a.out 
pthread_id:4970, id [6595660141600063488]
pthread_id:4970, id [6595660141600063489]
pthread_id:4970, id [6595660141600063490]
pthread_id:4970, id [6595660141600063491]
pthread_id:4970, id [6595660141600063492]
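
The IDs printed above can be split back into their parts by reversing the shifts. The following is a small sketch; the struct id_parts and the function decode_id are names invented for this illustration and are not part of the listing above.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of one ID. */
struct id_parts {
    uint64_t timestamp_ms;  /* upper bits: millisecond timestamp */
    unsigned node;          /* 10-bit node (work) ID */
    unsigned sequence;      /* 12-bit per-millisecond sequence */
};

/* Undo the packing: mask off the low fields, shift down the rest. */
struct id_parts decode_id(uint64_t id)
{
    struct id_parts p;
    p.sequence     = (unsigned)(id & 0xFFF);          /* low 12 bits */
    p.node         = (unsigned)((id >> 12) & 0x3FF);  /* next 10 bits */
    p.timestamp_ms = id >> 22;                        /* remaining bits */
    return p;
}
```

For the first ID in the output, 6595660141600063488, this yields a timestamp of 1572527919197 ms, a node of 100 (the value passed to set_workid) and a sequence of 0. The following IDs share the same timestamp and node and differ only in the sequence, which counts 1, 2, 3, 4.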

Conclusion

The snowflake algorithm is used by many large tech companies. Unlike values read from the entropy pool, its IDs are unique per node and roughly ordered by time. The idea behind it is also useful in everyday work: packing several pieces of data into a single value is a common technique.

Tags: Java Algorithm data structure

Posted on Wed, 08 Sep 2021 19:03:45 -0400 by volant