Several common distributed ID solutions

1, Distributed ID concept

The defining property of an ID is uniqueness. For people, the ID is the identity card number, which is unique to each person. In complex distributed systems, large volumes of data and messages often need to be uniquely identified as well.

For example, with a single database, the ID column can simply use auto-increment. Once the data is split across multiple databases and tables, auto-increment values from different shards can collide, so a globally unique ID is needed to identify each record. This is the distributed ID. A distributed ID generator must also have the qualities expected of a distributed system: high concurrency, high availability, high performance, and so on.

2, Distributed ID implementation scheme

The following table compares some common schemes:

[Table: comparison of common distributed ID schemes; the table is missing from this copy]

At present, there are two popular distributed ID solutions: "segment mode" and "snowflake algorithm".

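Segment mode depends on the database, but unlike primary key auto-increment it does not hit the database for every ID. Instead, each database request reserves a whole range (segment) of IDs. With a segment size of 100, for example, the stored maximum advances through 100, 200, 300, and so on, and each fetch yields 100 IDs that can then be handed out from memory, which improves performance significantly.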
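The segment idea can be sketched roughly as follows. This is a minimal in-memory sketch, not Leaf's or any other library's implementation: `SegmentStore` is a hypothetical stand-in for the database row that records the highest reserved ID, and `SegmentIdGenerator` is an illustrative name. In a real deployment the `reserve` step would presumably be a single atomic SQL update.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the database row holding the highest ID reserved so far. */
final class SegmentStore {
    private final AtomicLong maxId = new AtomicLong(0);

    /** Atomically reserves the next `step` IDs and returns the new upper bound. */
    long reserve(int step) {
        return maxId.addAndGet(step);
    }
}

/** Hands out IDs from a locally cached segment; contacts the store once per `step` IDs. */
final class SegmentIdGenerator {
    private final SegmentStore store;
    private final int step;
    private long next = 1; // next ID to return
    private long max = 0;  // last ID (inclusive) of the cached segment; 0 forces a first fetch

    SegmentIdGenerator(SegmentStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {
            // Cached segment is used up: reserve a fresh one from the store.
            max = store.reserve(step);
            next = max - step + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        SegmentStore store = new SegmentStore();
        SegmentIdGenerator a = new SegmentIdGenerator(store, 100);
        SegmentIdGenerator b = new SegmentIdGenerator(store, 100);
        System.out.println(a.nextId()); // first ID of the segment reserved by a
        System.out.println(b.nextId()); // first ID of the segment reserved by b
    }
}
```

Two generators that share one store receive disjoint segments, so their IDs never overlap even though each of them only touches the store once per 100 IDs.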

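The "snowflake algorithm" builds a 64-bit ID from a sign bit + timestamp + worker machine ID + sequence number, as shown in the figure:

[Figure: bit layout of a snowflake ID: 1 sign bit, 41 timestamp bits, 10 worker machine ID bits, 12 sequence number bits]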

The sign bit is always 0, so the ID is always a positive number.

The timestamp bits store a timestamp in milliseconds, measured from a chosen starting time.

The worker machine ID bits store the machine ID, usually split into 5 data center bits + 5 machine bits.

The sequence number bits hold a counter that increments for each ID generated within the same millisecond.

How many IDs can the snowflake algorithm represent?

  • Time range: 2^41 / (365 × 24 × 60 × 60 × 1000) ≈ 69 years.
  • Worker machine range: 2^10 = 1024.
  • Sequence number range: 2^12 = 4096, meaning each worker can generate 4096 IDs in 1 ms.
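These capacity figures can be checked with a few lines of arithmetic. The sketch below assumes the 41 / 10 / 12 bit split described above; the class and method names are illustrative only.

```java
/** Recomputes the capacity figures for a 41 / 10 / 12 bit snowflake layout. */
final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Milliseconds in a 365-day year.
    static final long MILLIS_PER_YEAR = 365L * 24 * 60 * 60 * 1000;

    /** Whole years that fit into the timestamp field. */
    static long years() {
        return (1L << TIMESTAMP_BITS) / MILLIS_PER_YEAR;
    }

    /** Number of distinct worker machine IDs. */
    static long workers() {
        return 1L << WORKER_BITS;
    }

    /** Number of IDs one worker can generate per millisecond. */
    static long idsPerMilli() {
        return 1L << SEQUENCE_BITS;
    }

    public static void main(String[] args) {
        System.out.println("years: " + years());
        System.out.println("workers: " + workers());
        System.out.println("IDs per ms per worker: " + idsPerMilli());
    }
}
```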

Following this logic, the algorithm only needs to be implemented in Java and wrapped as a utility method. Each business application can then call that method directly to obtain distributed IDs, as long as each application instance has its own worker machine ID; there is no need to deploy a separate service just to hand out IDs. Here is a Java implementation of Twitter's Snowflake algorithm:

public class SnowFlake {

    /**
     * Starting timestamp (custom epoch, in ms); IDs store the time elapsed since this instant
     */
    private final static long START_STMP = 1480166465631L;

    /**
     * Number of bits occupied by each part
     */
    private final static long SEQUENCE_BIT = 12; //Number of digits occupied by serial number
    private final static long MACHINE_BIT = 5;   //Number of digits occupied by machine identification
    private final static long DATACENTER_BIT = 5;//Number of bits occupied by data center

    /**
     * Maximum value of each part
     */
    private final static long MAX_DATACENTER_NUM = -1L ^ (-1L << DATACENTER_BIT);
    private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
    private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

    /**
     * Displacement of each part to the left
     */
    private final static long MACHINE_LEFT = SEQUENCE_BIT;
    private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
    private final static long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

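    private final long datacenterId;  //Data center ID (0..MAX_DATACENTER_NUM)
    private final long machineId;     //Machine ID within the data center (0..MAX_MACHINE_NUM)
    private long sequence = 0L; //Sequence number within the current millisecond
    private long lastStmp = -1L;//Timestamp (ms) of the most recently generated ID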

    public SnowFlake(long datacenterId, long machineId) {
        if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) {
            throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0");
        }
        if (machineId > MAX_MACHINE_NUM || machineId < 0) {
            throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

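    /**
     * Generate the next ID. Synchronized so that concurrent callers never
     * receive the same (timestamp, sequence) pair.
     *
     * @return the next unique 64-bit ID
     * @throws RuntimeException if the system clock has moved backwards
     */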
    public synchronized long nextId() {
        long currStmp = getNewstmp();
        if (currStmp < lastStmp) {
            throw new RuntimeException("Clock moved backwards.  Refusing to generate id");
        }

        if (currStmp == lastStmp) {
            //Within the same milliseconds, the serial number increases automatically
            sequence = (sequence + 1) & MAX_SEQUENCE;
            //The number of sequences in the same millisecond has reached the maximum
            if (sequence == 0L) {
                currStmp = getNextMill();
            }
        } else {
            //Within different milliseconds, the serial number is set to 0
            sequence = 0L;
        }

        lastStmp = currStmp;

        return (currStmp - START_STMP) << TIMESTMP_LEFT //Timestamp part
                | datacenterId << DATACENTER_LEFT       //Data center part
                | machineId << MACHINE_LEFT             //Machine identification part
                | sequence;                             //Serial number part
    }

    private long getNextMill() {
        long mill = getNewstmp();
        while (mill <= lastStmp) {
            mill = getNewstmp();
        }
        return mill;
    }

    private long getNewstmp() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        SnowFlake snowFlake = new SnowFlake(2, 3);

        for (int i = 0; i < (1 << 12); i++) {
            System.out.println(snowFlake.nextId());
        }

    }
}
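
To see how the fields are packed, it can help to split a generated ID back into its parts. The following is a minimal sketch that assumes the same bit widths and starting timestamp as the SnowFlake class above; `SnowflakeIdParts` and `decode` are illustrative names, not part of that class.

```java
/** Splits a snowflake ID back into its fields (same layout as the SnowFlake class above). */
final class SnowflakeIdParts {
    static final int SEQUENCE_BIT = 12;
    static final int MACHINE_BIT = 5;
    static final int DATACENTER_BIT = 5;
    static final long START_STMP = 1480166465631L;

    final long timestamp;    // absolute time in ms
    final long datacenterId;
    final long machineId;
    final long sequence;

    private SnowflakeIdParts(long timestamp, long datacenterId, long machineId, long sequence) {
        this.timestamp = timestamp;
        this.datacenterId = datacenterId;
        this.machineId = machineId;
        this.sequence = sequence;
    }

    /** Mask with the lowest `bits` bits set. */
    private static long mask(int bits) {
        return ~(-1L << bits);
    }

    static SnowflakeIdParts decode(long id) {
        long sequence = id & mask(SEQUENCE_BIT);
        long machineId = (id >>> SEQUENCE_BIT) & mask(MACHINE_BIT);
        long datacenterId = (id >>> (SEQUENCE_BIT + MACHINE_BIT)) & mask(DATACENTER_BIT);
        // The remaining high bits are the milliseconds elapsed since START_STMP.
        long elapsed = id >>> (SEQUENCE_BIT + MACHINE_BIT + DATACENTER_BIT);
        return new SnowflakeIdParts(elapsed + START_STMP, datacenterId, machineId, sequence);
    }

    public static void main(String[] args) {
        // Hand-built ID: data center 2, machine 3, sequence 7, at time 1600000000000 ms.
        long id = ((1600000000000L - START_STMP) << 22) | (2L << 17) | (3L << 12) | 7L;
        SnowflakeIdParts p = decode(id);
        System.out.println(p.timestamp + " " + p.datacenterId + " " + p.machineId + " " + p.sequence);
    }
}
```

Because the timestamp occupies the highest bits, IDs from the same generator sort by creation time, which is why snowflake IDs are described as trend-increasing.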

3, Distributed ID open source component

3.1 How to select open source components

When selecting an open source component, first check whether its features meet the requirements, mainly in terms of compatibility and scalability.

Second, consider whether you or your team can adopt it smoothly given your current technology stack and skills.

Third, look at the community around the component: whether it is updated frequently, whether the project is still maintained, whether you can get help when you run into problems, whether it is widely used in the industry, and so on.

3.2 Meituan Leaf

Leaf is a distributed ID generation service released by Meituan's basic R&D platform. Its name comes from a saying attributed to the German philosopher and mathematician Leibniz: "There are no two identical leaves in the world." Leaf offers high reliability, low latency, and global uniqueness.

It is already widely used in Meituan Finance, Meituan food delivery, Meituan hotel and travel, and other departments. For technical details, see the article "Leaf Meituan distributed ID generation service" on the Meituan technology blog.

The Leaf project is open source on GitHub:

https://github.com/Meituan-Dianping/Leaf

The features of Leaf are as follows:

  • Globally unique. Duplicate IDs never occur, and IDs are trend-increasing overall.
  • Highly available. The service is built entirely on a distributed architecture; even if MySQL goes down, it can tolerate the database being unavailable for a period of time.
  • High concurrency and low latency. On a 4-core, 8 GB CentOS virtual machine, remote calls can reach 50,000+ QPS, with TP99 within 1 ms.
  • Easy to integrate. It can be called directly through the company's RPC service or over HTTP.

3.3 Baidu UidGenerator

UidGenerator is Baidu's open source, high-performance distributed unique ID generator based on the Snowflake algorithm. According to the project's own description, UidGenerator runs inside the application as a component and supports custom workerId bit widths and initialization strategies, which makes it suitable for virtualized environments such as Docker where instances restart automatically or drift between hosts.

In terms of implementation, UidGenerator works around the inherent concurrency limit of the sequence field by borrowing future time. It uses a RingBuffer to cache generated UIDs, so that UID production and consumption run in parallel, and it pads data to the CacheLine size to avoid the hardware-level false sharing that the RingBuffer would otherwise cause. As a result, single-machine QPS can reach 6 million.
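
The caching idea can be illustrated with a small ring buffer. This is only a sketch of the general technique and not UidGenerator's code: the class name `UidRingBufferSketch` is hypothetical, and it uses `synchronized` for brevity where a production implementation would more likely rely on lock-free counters and cache-line padding.

```java
/** Fixed-size ring buffer of pre-generated IDs: a producer fills it ahead of time, consumers take from it. */
final class UidRingBufferSketch {
    private final long[] slots;
    private final int mask;   // capacity - 1; valid because capacity is a power of two
    private long tail = 0;    // total number of IDs written so far
    private long cursor = 0;  // total number of IDs consumed so far

    UidRingBufferSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.slots = new long[capacity];
        this.mask = capacity - 1;
    }

    /** Producer side: stores an ID, or returns false when the buffer is full. */
    synchronized boolean put(long uid) {
        if (tail - cursor == slots.length) {
            return false;
        }
        slots[(int) (tail & mask)] = uid;
        tail++;
        return true;
    }

    /** Consumer side: returns the oldest cached ID. */
    synchronized long take() {
        if (cursor == tail) {
            throw new IllegalStateException("buffer is empty");
        }
        long uid = slots[(int) (cursor & mask)];
        cursor++;
        return uid;
    }

    public static void main(String[] args) {
        UidRingBufferSketch buffer = new UidRingBufferSketch(4);
        for (long uid = 1; uid <= 4; uid++) {
            buffer.put(uid); // pre-fill, as a background producer would
        }
        System.out.println(buffer.take()); // oldest cached ID comes out first
    }
}
```

Callers of `take` only read a cached value, so the cost of generating an ID is paid ahead of time by whatever fills the buffer.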

GitHub address of UidGenerator:

https://github.com/baidu/uid-generator

3.4 Comparison of open source components

Baidu UidGenerator is written in Java; its last commit was two years ago and it is essentially unmaintained; it supports only the snowflake algorithm.

Meituan Leaf is also written in Java; it was last maintained in 2020; it supports both segment mode and the snowflake algorithm.

To sum up, of the two open source components, Meituan Leaf is the slightly better choice.

 

Source: cnblogs.com/SmallStrange/p/14277333.html

Posted on Tue, 23 Nov 2021 10:02:10 -0500 by chick3n