Snowflake: an auto-incrementing ID algorithm

IDs generated as UUIDs or GUIDs follow no ordering rule.

The Snowflake algorithm was implemented by Twitter engineers to generate IDs that increase over time and never repeat.

Overview
In distributed systems, some scenarios require a globally unique ID. To prevent ID conflicts, a UUID (36 characters in its usual string form) can be used, but it has drawbacks: it is relatively long, and UUIDs are generally unordered. Sometimes we want a simpler ID, and we want IDs to be generated in time order. Twitter's Snowflake addresses this need. When Twitter migrated its storage system from MySQL to Cassandra, which has no sequential ID generation mechanism, it developed this globally unique ID generation service.
The project is at https://github.com/twitter/snowflake and is implemented in Scala.
A Python version is available as the open source project https://github.com/erans/pysnowflake.

Structure
The structure of a Snowflake ID is as follows (the parts are separated by -):

0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000

The first bit is unused. The next 41 bits hold a millisecond timestamp (41 bits last about 69 years), followed by 5 bits of datacenterId and 5 bits of workerId (10 bits support deployments of up to 1024 nodes). The last 12 bits are a counter within the millisecond (a 12-bit sequence number allows each node to generate 4096 IDs per millisecond).

These parts add up to exactly 64 bits, which fits in a long. (Converted to a decimal string, an ID is typically 18 or 19 characters long.)
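To make the layout concrete, here is a minimal Java sketch that packs the four fields into one long and unpacks them again. The class and method names (SnowflakeLayout, compose, timestampOf and so on) are hypothetical and chosen for illustration; only the bit widths come from the description above.

```java
final class SnowflakeLayout {
    // Field widths taken from the layout described above
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    // Each field is shifted left past all the fields to its right
    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;        // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    /** Packs the four fields into one 64-bit value; inputs are assumed to be in range. */
    static long compose(long millisSinceEpoch, long datacenterId, long workerId, long sequence) {
        return (millisSinceEpoch << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    static long timestampOf(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long datacenterOf(long id) {
        return (id >>> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1);
    }

    static long workerOf(long id) {
        return (id >>> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1);
    }

    static long sequenceOf(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }
}
```

Because the timestamp occupies the highest bits, an ID from a later millisecond always compares greater than any ID from an earlier one, whatever the node and sequence values are.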

The IDs generated by Snowflake are sorted by time overall, do not collide anywhere in the distributed system (nodes are distinguished by datacenterId and workerId), and are efficient to generate. Snowflake is said to be able to generate about 260,000 IDs per second.

From the layout we can see that, apart from the unused first bit, there are three groups of bits: the 41-bit timestamp covers about 69 years from the chosen epoch, the 10-bit machine part allows machine IDs from 0 to 1023, and the 12-bit sequence number runs from 0 to 4095 within 1 millisecond.
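The limits quoted above follow directly from the bit widths. The following Java sketch reproduces the arithmetic; the class name SnowflakeCapacity and its method names are hypothetical, and a year is assumed to have 365 days.

```java
final class SnowflakeCapacity {
    /** Number of distinct millisecond values that fit in 41 bits. */
    static long timestampValues() {
        return 1L << 41;
    }

    /** Largest machine ID that fits in 10 bits (5 datacenter bits plus 5 worker bits). */
    static long maxNodeId() {
        return (1L << 10) - 1;
    }

    /** Largest sequence number that fits in 12 bits. */
    static long maxSequence() {
        return (1L << 12) - 1;
    }

    /** Years covered by the 41-bit timestamp, assuming 365-day years. */
    static double yearsCovered() {
        double millisPerYear = 1000.0 * 60 * 60 * 24 * 365;
        return timestampValues() / millisPerYear;
    }

    /** Theoretical ceiling of IDs per node per second: 4096 per millisecond times 1000. */
    static long maxIdsPerNodePerSecond() {
        return (maxSequence() + 1) * 1000;
    }
}
```

yearsCovered() comes out at a little under 70, which matches the "69 years" figure, and the per-second ceiling is a theoretical bound set by the layout, not a measured throughput.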

In multithreaded code, ID generation must be protected by a lock.

 

Before reading the code, a few basics. << is the left-shift operator: 1 << 2 shifts 1 left by 2 bits, giving 1*2^2 = 4 (here ^ denotes exponentiation, not the XOR operator described next).

^ is XOR: true^true=false; false^false=false; true^false=true; false^true=true. Example: 1001^0001=1000

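Binary representation of a negative number (two's complement):

Step 1: Write the absolute value in binary, using as many bits as the representation needs
Step 2: Invert every bit: 0 becomes 1, 1 becomes 0
Step 3: Add 1 to the result
Example: for -1 in 4 bits, the absolute value is 0001, inverted it is 1110, and adding 1 gives 1111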
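These rules can be checked with a few lines of Java. This is only a sketch: BitBasics and its method names are hypothetical. The last method builds a mask the same way the ID generator's -1L ^ -1L << bits expression does.

```java
final class BitBasics {
    /** Left shift: shifting by n bits multiplies by 2 to the power n. */
    static long shiftLeft(long value, int bits) {
        return value << bits;
    }

    /** Bitwise XOR: a result bit is 1 only where the two input bits differ. */
    static long xor(long a, long b) {
        return a ^ b;
    }

    /** Two's complement negation done by hand: invert every bit, then add 1. */
    static long negate(long value) {
        return ~value + 1;
    }

    /**
     * Mask with the lowest `bits` bits set. -1L is all ones; shifting it left
     * clears the low bits, and XOR with all ones flips that result.
     */
    static long lowBitsMask(int bits) {
        return -1L ^ (-1L << bits);
    }
}
```

For example, lowBitsMask(4) is 15 and lowBitsMask(10) is 1023, the largest values that fit in 4 and 10 bits.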

Enough talk. Here is the code:

using System;

public class IdWorker
{
    // Custom epoch in milliseconds. The value is arbitrary, but it must stay fixed
    // and must not be later than the current time.
    private const long twepoch = 687888001020L;
    private const int workerIdBits = 4; // Number of bits for the machine ID
    public static readonly long maxWorkerId = -1L ^ (-1L << workerIdBits); // Maximum machine ID (15)
    private const int sequenceBits = 10; // Number of bits for the per-millisecond counter
    private const int workerIdShift = sequenceBits; // The machine ID is shifted left past the counter bits
    private const int timestampLeftShift = sequenceBits + workerIdBits; // The timestamp is shifted left past the machine ID and counter bits
    public static readonly long sequenceMask = -1L ^ (-1L << sequenceBits); // Maximum counter value within one millisecond (1023)
    private static readonly DateTime unixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Per-instance state, guarded by syncRoot
    private readonly object syncRoot = new object();
    private readonly long workerId; // Machine ID
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    /// <summary>
    /// Creates a generator for the given machine ID
    /// </summary>
    /// <param name="workerId">Machine ID, between 0 and maxWorkerId</param>
    public IdWorker(long workerId)
    {
        if (workerId > maxWorkerId || workerId < 0)
            throw new ArgumentOutOfRangeException("workerId", string.Format("worker Id can't be greater than {0} or less than 0", maxWorkerId));
        this.workerId = workerId;
    }

    public long nextId()
    {
        lock (syncRoot)
        {
            long timestamp = timeGen();
            if (timestamp < lastTimestamp)
            {
                // The clock moved backwards: refuse to generate, because uniqueness
                // against earlier IDs can no longer be guaranteed
                throw new InvalidOperationException(string.Format("Clock moved backwards.  Refusing to generate id for {0} milliseconds",
                    lastTimestamp - timestamp));
            }
            if (lastTimestamp == timestamp)
            {
                // Same millisecond as the previous ID: advance the counter; the & wraps it to 0 on overflow
                sequence = (sequence + 1) & sequenceMask;
                if (sequence == 0)
                {
                    // Counter exhausted for this millisecond: wait for the next millisecond
                    timestamp = tillNextMillis(lastTimestamp);
                }
            }
            else
            {
                // New millisecond: reset the counter
                sequence = 0;
            }
            lastTimestamp = timestamp; // Remember the timestamp of the last generated ID
            return ((timestamp - twepoch) << timestampLeftShift) | (workerId << workerIdShift) | sequence;
        }
    }

    /// <summary>
    /// Spins until the clock reaches the next millisecond
    /// </summary>
    /// <param name="lastTimestamp">Timestamp of the last generated ID</param>
    /// <returns>The first timestamp greater than lastTimestamp</returns>
    private long tillNextMillis(long lastTimestamp)
    {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp)
        {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /// <summary>
    /// Returns the current time in milliseconds since the Unix epoch
    /// </summary>
    private long timeGen()
    {
        return (long)(DateTime.UtcNow - unixEpoch).TotalMilliseconds;
    }
}

Call:
IdWorker idworker = new IdWorker(1);
for (int i = 0; i < 1000; i++)
{
    Console.WriteLine(idworker.nextId());
}
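To see why the lock matters, the same logic can be driven from several threads at once. The following is a hedged Java port of the C# class above, not Twitter's implementation: SnowflakeSketch and distinctIds are hypothetical names, the 4-bit machine ID and 10-bit counter are copied from the C# listing, and synchronized plays the role of the C# lock.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SnowflakeSketch {
    private static final long EPOCH = 687888001020L; // same custom epoch as the C# listing
    private static final int WORKER_ID_BITS = 4;
    private static final int SEQUENCE_BITS = 10;
    private static final long MAX_WORKER_ID = -1L ^ (-1L << WORKER_ID_BITS);
    private static final long SEQUENCE_MASK = -1L ^ (-1L << SEQUENCE_BITS);

    private final long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be between 0 and " + MAX_WORKER_ID);
        }
        this.workerId = workerId;
    }

    /** Only one thread at a time may read and update sequence and lastTimestamp. */
    synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - timestamp) + " ms");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Counter exhausted: spin until the next millisecond
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << (SEQUENCE_BITS + WORKER_ID_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    /** Generates IDs from several threads and returns how many distinct IDs were seen. */
    static int distinctIds(int threads, int idsPerThread) {
        SnowflakeSketch worker = new SnowflakeSketch(1);
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < idsPerThread; i++) {
                    seen.add(worker.nextId());
                }
            });
            pool[t].start();
        }
        try {
            for (Thread thread : pool) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return seen.size();
    }
}
```

With the lock in place, distinctIds(4, 5000) should report 20000 distinct values as long as the system clock does not step backwards during the run. Without synchronized, two threads could read the same sequence value and emit the same ID.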

Other algorithms:

Method 1: UUID

A UUID (Universally Unique Identifier), called a GUID on some platforms, is a 128-bit globally unique identifier, usually written as 32 hexadecimal digits.

 

String uuid = UUID.randomUUID().toString();

 

Example results:

 

046b6c7f-0b8a-43b9-b35d-6489e6daee91
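The size of a UUID can be checked directly. In this small Java sketch the class name UuidFacts and its helper methods are hypothetical; UUID.randomUUID() is the standard java.util call used above.

```java
import java.util.UUID;

final class UuidFacts {
    /** Returns a freshly generated random (version 4) UUID in its string form. */
    static String newUuidString() {
        return UUID.randomUUID().toString();
    }

    /** Counts the hexadecimal digits, ignoring the four hyphens. */
    static int hexDigits(String uuid) {
        return uuid.replace("-", "").length();
    }

    /** Each hexadecimal digit carries 4 bits. */
    static int bits(String uuid) {
        return hexDigits(uuid) * 4;
    }
}
```

The string form is 36 characters long: 32 hexadecimal digits plus 4 hyphens, which is 128 bits of data. Two UUIDs generated one after the other have no guaranteed ordering, which is what matters for the index behavior discussed next.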

 

 

Why do unordered UUIDs cause poor database performance?

This has to do with node splits in the B+tree index:

As is well known, most indexes in relational databases use a B+tree structure. Taking the ID field as an example, each node of the index tree stores several IDs.

 

If IDs are inserted in increasing order, such as 8, 9, 10, each new ID is simply appended to the last node. When the last node is full, a new node is split off. This is the more efficient kind of insert, because it causes the fewest splits and makes full use of the space in each node.

However, if the insertions are completely out of order, not only will intermediate nodes split, but many partially filled nodes will also be created needlessly, which greatly reduces the insert performance of the database.
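The effect can be illustrated with a toy model of the leaf level only. This is a simplified sketch, not how any particular database implements its index: LeafSplitDemo, the capacity of 10 keys per leaf, and the rule that an append to a full rightmost leaf starts a fresh leaf are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class LeafSplitDemo {
    static final int CAPACITY = 10; // hypothetical number of keys per leaf

    /** Builds the keys 1..n, optionally shuffled with a fixed seed for repeatability. */
    static List<Long> keys(int n, boolean shuffle) {
        List<Long> keys = new ArrayList<>();
        for (long k = 1; k <= n; k++) {
            keys.add(k);
        }
        if (shuffle) {
            Collections.shuffle(keys, new Random(42));
        }
        return keys;
    }

    /** Inserts the keys in the given order and returns the number of leaves afterwards. */
    static int countLeaves(List<Long> keys) {
        List<List<Long>> leaves = new ArrayList<>();
        leaves.add(new ArrayList<>());
        for (long key : keys) {
            // Find the last leaf whose first key is <= key (linear scan for clarity)
            int idx = 0;
            for (int i = 1; i < leaves.size(); i++) {
                if (leaves.get(i).get(0) <= key) {
                    idx = i;
                } else {
                    break;
                }
            }
            List<Long> leaf = leaves.get(idx);
            int pos = Collections.binarySearch(leaf, key);
            if (pos < 0) {
                pos = -pos - 1; // convert "not found" result into the insertion point
            }
            boolean appendAtEnd = idx == leaves.size() - 1 && pos == leaf.size();
            if (leaf.size() < CAPACITY) {
                leaf.add(pos, key);
            } else if (appendAtEnd) {
                // Appending past a full rightmost leaf: start a new leaf, the old one stays full
                List<Long> fresh = new ArrayList<>();
                fresh.add(key);
                leaves.add(fresh);
            } else {
                // Insert into the middle of a full leaf: split it into two half-full leaves
                leaf.add(pos, key);
                int mid = leaf.size() / 2;
                List<Long> right = new ArrayList<>(leaf.subList(mid, leaf.size()));
                leaf.subList(mid, leaf.size()).clear();
                leaves.add(idx + 1, right);
            }
        }
        return leaves.size();
    }
}
```

In this model, inserting 1000 keys in increasing order ends with exactly 100 completely full leaves, while inserting the same keys in shuffled order ends with more leaves, many of them only partly filled.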

 

 

Method 2: Database auto-increment primary key

 

Suppose a table named table has the following structure:

 

id        feild

35        a

 

Each time an ID is needed, access the database and execute the following statements:

 

begin;

REPLACE INTO `table` ( feild )  VALUES ( 'a' );

SELECT LAST_INSERT_ID();

commit;


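REPLACE INTO inserts a record and, if the new values conflict with an existing row on a unique index, replaces the old row. Assuming feild has a unique index, the table keeps a single row whose auto-increment id grows with every call.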

 

This gives you an incremental ID each time.

 

To improve performance in a distributed system, a DB proxy can route requests to different databases, each configured with a different initial value and with a step equal to the number of databases:

Thus, with a step of 3, DB1 generates the IDs 1, 4, 7, 10, 13, ... and DB2 generates 2, 5, 8, 11, 14, ...
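The sequences above can be reproduced with a few lines of Java. This sketch only models the arithmetic and does not touch a database: SteppedIds and firstIds are hypothetical names. In MySQL, the initial value and step would typically be set through the auto_increment_offset and auto_increment_increment system variables.

```java
import java.util.ArrayList;
import java.util.List;

final class SteppedIds {
    /** Returns the first `count` IDs of a database that starts at `initial` and advances by `step`. */
    static List<Long> firstIds(long initial, long step, int count) {
        List<Long> ids = new ArrayList<>();
        for (int k = 0; k < count; k++) {
            ids.add(initial + k * step);
        }
        return ids;
    }
}
```

firstIds(1, 3, 5) yields 1, 4, 7, 10, 13 and firstIds(2, 3, 5) yields 2, 5, 8, 11, 14. Because every database uses the same step and a different initial value, the sequences never overlap.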

Tags: C# Database github MySQL Scala

Posted on Thu, 19 Mar 2020 03:20:03 -0400 by Tea_J