Why doesn't MySQL recommend using uuid or snowflake id as the primary key?

Preface

When designing tables in MySQL, the official recommendation is not to use UUIDs or non-continuous, non-repeating snowflake ids (long, globally unique, monotonically increasing per machine) as the primary key, but a continuous, auto-incrementing primary key id (AUTO_INCREMENT). So why is UUID not recommended? What are the disadvantages of using a UUID?

In this blog, we will analyze this problem and explore the internal reasons.

Contents of this blog

  • MySQL and program instances

  • Comparison of the index structures of UUID and auto-increment id

  • Summary

1, MySQL and program instances

1.1. To illustrate this problem, let's first create three tables

They are user_auto_key, user_uuid and user_random_key, using an auto-increment primary key, a UUID primary key and a random-key primary key respectively; everything else is unchanged.

Following the controlled-variable method, only the primary-key generation strategy differs between the tables, while all other fields are exactly the same. We then test each table's insert and query speed:

Note: the random key here actually refers to a non-continuous, non-repeating, irregular id: an 18-digit long value computed by the snowflake algorithm.
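For reference, a minimal sketch of a snowflake-style generator, using the common 41-bit timestamp / 10-bit machine id / 12-bit sequence layout. The epoch and machine id below are arbitrary assumptions, and this is not necessarily the exact generator used in the benchmark:

```java
public class SnowflakeDemo {
    // Illustrative assumptions, not values from the original benchmark:
    private static final long EPOCH = 1600000000000L; // custom epoch (2020-09-13)
    private static final long MACHINE_ID = 1L;        // 10-bit machine id: 0..1023
    private static long lastTimestamp = -1L;
    private static long sequence = 0L;

    // Layout: 41-bit timestamp | 10-bit machine id | 12-bit sequence.
    // (Clock-rollback handling omitted for brevity.)
    public static synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;  // 12-bit sequence within one ms
            if (sequence == 0) {                // sequence exhausted: spin to next ms
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (MACHINE_ID << 12) | sequence;
    }

    public static void main(String[] args) {
        // Successive ids are unique and increasing on one machine,
        // but not contiguous, and ids from different machines interleave.
        System.out.println(nextId());
        System.out.println(nextId());
    }
}
```

Such ids sort by generation time on a single machine, but across machines (and relative to existing data) they land at arbitrary positions in the index, which is what matters for the discussion below.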

The auto-increment id table:

The user_uuid table:

The random primary-key table:
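The original post showed the three table definitions as screenshots. A hedged reconstruction of the DDL, with column names inferred from the INSERT statements in the test program and column types assumed, might look like:

```sql
-- Hypothetical reconstruction; column types are assumptions.
CREATE TABLE user_key_auto (
    id        BIGINT NOT NULL AUTO_INCREMENT,
    user_id   BIGINT,
    user_name VARCHAR(64),
    sex       TINYINT,
    address   VARCHAR(255),
    city      VARCHAR(64),
    email     VARCHAR(64),
    state     TINYINT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Same columns, but the key is a 36-character UUID string:
CREATE TABLE user_uuid (
    id        VARCHAR(36) NOT NULL,
    user_id   BIGINT,
    user_name VARCHAR(64),
    sex       TINYINT,
    address   VARCHAR(255),
    city      VARCHAR(64),
    email     VARCHAR(64),
    state     TINYINT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Same columns, key is an 18-digit snowflake value supplied by the application:
CREATE TABLE user_random_key (
    id        BIGINT NOT NULL,
    user_id   BIGINT,
    user_name VARCHAR(64),
    sex       TINYINT,
    address   VARCHAR(255),
    city      VARCHAR(64),
    email     VARCHAR(64),
    state     TINYINT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;
```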

1.2. Theory alone is not enough, so let's go straight to the program and use Spring's JdbcTemplate to run the insert/query test:

Technical stack: Spring Boot + JdbcTemplate + JUnit + hutool. The program connects to a local test database and writes the same amount of data into each table under the same conditions, then compares the time taken by the inserts to measure efficiency. To make the test realistic, all data (name, email, address, and so on) is randomly generated.

package com.wyq.mysqldemo;

import cn.hutool.core.collection.CollectionUtil;
import com.wyq.mysqldemo.databaseobject.UserKeyAuto;
import com.wyq.mysqldemo.databaseobject.UserKeyRandom;
import com.wyq.mysqldemo.databaseobject.UserKeyUUID;
import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService;
import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService;
import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService;
import com.wyq.mysqldemo.util.JdbcTemplateService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.StopWatch;
import java.util.List;

@SpringBootTest
class MysqlDemoApplicationTests {

    @Autowired
    private JdbcTemplateService jdbcTemplateService;
    @Autowired
    private AutoKeyTableService autoKeyTableService;
    @Autowired
    private UUIDKeyTableService uuidKeyTableService;
    @Autowired
    private RandomKeyTableService randomKeyTableService;

    @Test
    void testDBTime() {
        StopWatch stopwatch = new StopWatch("SQL execution time");

        // auto_increment key task
        final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?)";
        List<UserKeyAuto> insertData = autoKeyTableService.getInsertData();
        stopwatch.start("auto-increment key table task");
        long start1 = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData)) {
            boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false);
            System.out.println(insertResult);
        }
        long end1 = System.currentTimeMillis();
        System.out.println("auto key time consumed: " + (end1 - start1));
        stopwatch.stop();

        // UUID key task
        final String insertSql2 = "INSERT INTO user_uuid(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData();
        stopwatch.start("UUID key table task");
        long begin = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData2)) {   // fixed: original checked insertData
            boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true);
            System.out.println(insertResult);
        }
        long over = System.currentTimeMillis();
        System.out.println("UUID key time consumed: " + (over - begin));
        stopwatch.stop();

        // random long-value key task
        final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";
        List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData();
        stopwatch.start("random long-value key table task");
        long start = System.currentTimeMillis();
        if (CollectionUtil.isNotEmpty(insertData3)) {   // fixed: original checked insertData
            boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true);
            System.out.println(insertResult);
        }
        long end = System.currentTimeMillis();
        System.out.println("random key time consumed: " + (end - start));
        stopwatch.stop();

        String result = stopwatch.prettyPrint();
        System.out.println(result);
    }
}

1.3. Program write results

user_key_auto write result:

user_random_key write result:

user_uuid table write result:

1.4. Efficiency test results

With 1.3 million rows of existing data, let's test inserting another 100,000 rows and see what results we get:

As you can see, at around 1 million rows the insert efficiency of UUID is already the worst, and as the data grew to 1.3 million rows in the later runs, UUID's efficiency kept falling sharply.

The overall efficiency ranking is auto_key > random_key > uuid: UUID is the least efficient, and with large data volumes its efficiency drops dramatically. Why does this happen? With that question in mind, let's dig into the problem:

2, Comparison of the index structures of UUID and auto-increment id

2.1. Internal index structure with an auto-increment id

Auto-increment primary-key values are sequential, so InnoDB stores each record right after the previous one. When the page's maximum fill factor is reached (InnoDB's default is 15/16 of the page size, with 1/16 reserved for later modifications; with the default 16 KB page that is about 1 KB reserved):

① The next record is written to a new page. When data is loaded in this order, index pages end up almost completely filled with sequential records, which maximizes page utilization and wastes no pages

② A newly inserted row always lands right after the current maximum row; MySQL can locate and address it quickly, with no extra work to compute a position for the new row

③ Page splits and fragmentation are reduced

2.2. Internal index structure with a UUID

Because a UUID is irregular compared with a sequential auto-increment id, and a new row's key is not necessarily larger than the previous primary-key value, InnoDB cannot simply append new rows at the end of the index; it has to find a suitable position for each new row and allocate space for it.

This requires a lot of extra work. The out-of-order keys scatter the data, which leads to the following problems:

① The target page may already have been flushed to disk and evicted from the buffer pool, or may never have been loaded into it. InnoDB then has to find and read the target page from disk into memory before inserting, causing a large amount of random I/O

② Because writes are out of order, InnoDB has to split pages frequently to make room for new rows. Page splits move a lot of data, and a single insert may require modifying at least three pages

③ Frequent page splits leave pages sparse and irregularly filled, eventually producing fragmented data
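The contrast between points ① to ③ and the sequential case can be illustrated with a toy model: insert the same keys into fixed-capacity sorted "pages", once in order and once shuffled, and compare the resulting page counts and fill. This is a deliberately simplified sketch, not InnoDB's actual algorithm; it assumes a rightmost-append optimization like InnoDB's sequential-insert path and a 50/50 split elsewhere:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PageSplitDemo {

    // Insert keys one by one into fixed-capacity sorted "pages";
    // returns {number of pages, number of splits}.
    static int[] insertAll(List<Long> keys, int capacity) {
        List<List<Long>> pages = new ArrayList<>();
        pages.add(new ArrayList<>());
        int splits = 0;
        for (long key : keys) {
            // find the page whose key range should hold the new key
            int p = pages.size() - 1;
            for (int i = 0; i < pages.size(); i++) {
                List<Long> candidate = pages.get(i);
                if (candidate.isEmpty() || key <= candidate.get(candidate.size() - 1)) {
                    p = i;
                    break;
                }
            }
            List<Long> page = pages.get(p);
            int pos = Collections.binarySearch(page, key);
            int idx = pos < 0 ? -pos - 1 : pos;
            page.add(idx, key);
            if (page.size() > capacity) {
                // rightmost append (sequential insert): leave the left page full;
                // otherwise split roughly 50/50, leaving both halves part-empty
                boolean rightmostAppend = p == pages.size() - 1 && idx == page.size() - 1;
                int cut = rightmostAppend ? capacity : (capacity + 1) / 2;
                List<Long> right = new ArrayList<>(page.subList(cut, page.size()));
                page.subList(cut, page.size()).clear();
                pages.add(p + 1, right);
                splits++;
            }
        }
        return new int[]{pages.size(), splits};
    }

    public static void main(String[] args) {
        int n = 10_000, capacity = 4;
        List<Long> ordered = new ArrayList<>();
        for (long i = 0; i < n; i++) ordered.add(i);
        List<Long> shuffled = new ArrayList<>(ordered);
        Collections.shuffle(shuffled, new Random(42));

        int[] seq = insertAll(ordered, capacity);
        int[] rnd = insertAll(shuffled, capacity);
        System.out.printf("sequential: pages=%d splits=%d fill=%.0f%%%n",
                seq[0], seq[1], 100.0 * n / (seq[0] * capacity));
        System.out.printf("random:     pages=%d splits=%d fill=%.0f%%%n",
                rnd[0], rnd[1], 100.0 * n / (rnd[0] * capacity));
    }
}
```

Running it shows the ordered run packing pages to 100% fill, while the shuffled run needs noticeably more pages (and therefore more splits) to store the same keys, which is exactly the sparse, fragmented layout described above.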

After random values (UUIDs or snowflake ids) have been loaded into the clustered index (InnoDB's default index organization), you sometimes need to run OPTIMIZE TABLE to rebuild the table and repack the pages, which takes a certain amount of time.
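For example, on the UUID table from the test:

```sql
-- Rebuild the table and its indexes to repack pages after bulk random-key inserts.
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... ENGINE=InnoDB
-- (recreate the table, then analyze it).
OPTIMIZE TABLE user_uuid;
```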

Conclusion: with InnoDB, insert rows in increasing primary-key order whenever possible, and use a monotonically increasing clustered-key value for new rows.

2.3. Disadvantages of an auto-increment id

Is an auto-increment id then completely harmless? No, it has problems of its own:

① If someone crawls your database, they can read your business growth off the auto-increment ids, making it easy to analyze your business situation

② Under highly concurrent load, inserting by primary key causes noticeable lock contention in InnoDB: the upper end of the primary key becomes a hot spot because all inserts happen there, and concurrent inserts contend for gap locks

③ The AUTO_INCREMENT lock mechanism itself creates contention for the auto-increment lock, causing some performance loss

Note: to alleviate the AUTO_INCREMENT lock contention, you need to tune the innodb_autoinc_lock_mode configuration.
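This variable is read-only at runtime, so it has to be set at server startup (in my.cnf, e.g. innodb_autoinc_lock_mode = 2); you can inspect the current mode like this:

```sql
SELECT @@innodb_autoinc_lock_mode;
-- 0 = traditional: table-level AUTO-INC lock held for the whole statement
-- 1 = consecutive: lock held only for bulk inserts (default before MySQL 8.0)
-- 2 = interleaved: no AUTO-INC lock, best concurrency, but ids from concurrent
--     statements interleave (safe with row-based replication; default in 8.0)
```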

3, Summary

This blog started from the question, created the test tables, and used JdbcTemplate to benchmark the insert performance of the different id-generation strategies at scale; it then analyzed how the different id schemes behave inside MySQL's index structure, with their advantages and disadvantages, explaining in depth why UUIDs and random non-repeating ids lose performance on insertion.

In real development, it is best to follow MySQL's official recommendation and use an auto-increment id. MySQL is broad and deep, and there are many more optimization points worth learning.



Link: https://www.jianshu.com/p/5da1146c8aa3


Posted on Sun, 19 Sep 2021 02:47:35 -0400 by er0x