Influence of unordered uuid on Database
Due to the recent performance test of large tables, it is found that unordered UUIDs as primary keys have a certain impact on the table insertion performance. Combined with the actual situation, it is found that the larger the amount of data in the table, the greater the impact on the table insertion performance.
testing environment
PostgreSQL creates an insert script to test tps in various cases.
Database version: PostgreSQL 10.4 (artrybase 5.0.0, Thunisoft)
Operating system configuration: CentOS Linux release 7, 32GB memory, 8 cpu
Test parameters: pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb
Various primary key tests for empty tables, 1000w data, 5000w data and 100 million data.
Test unordered UUIDs, ordered UUIDs, sequences, common btree, unique index and no primary key
test
1. Create table
--Disordered uuid pgbenchdb=# create table test_uuid_v4(id char(32) primary key); CREATE TABLE --Orderly uuid pgbenchdb=# create table test_time_nextval(id char(32) primary key); CREATE TABLE --Incremental sequence pgbenchdb=# create table test_seq_bigint(id int8 primary key); CREATE TABLE --Create sequence create sequence test_seq start with 1 ;
2. Test script
--Test disorder uuid script vi pgbench_uuid_v4.sql insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-','')); --Test order uuid script vi pgbench_time_nextval.sql insert into test_time_nextval (id) values (replace(uuid_time_nextval()::text,'-','')); --Test sequence script vi pgbench_seq_bigint.sql insert into test_seq_bigint (id) values (nextval('test_seq'::regclass));
Unordered uuid, no data
Disk usage avg-cpu: %user %nice %system %iowait %steal %idle 0.76 0.00 0.38 4.67 0.00 94.19 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda 0.00 0.00 0.00 96.00 0.00 2048.00 42.67 1.02 10.67 0.00 10.67 10.33 99.20 dm-0 0.00 0.00 0.00 96.00 0.00 2048.00 42.67 1.02 10.66 0.00 10.66 10.32 99.10 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 tps: [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb transaction type: /opt/thunisoft/pgbench_uuid_v4.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 53494 latency average = 8.974 ms tps = 891.495404 (including connections establishing) tps = 891.588967 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 9.006 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));
tps without data
category | for the first time | The second time | third time | average value(tps) |%util |await ---------------+---------+---------+---------+---------+-------+------- disorder uuid | 919 | 907 | 891 | 906 | 99.2% | 10.66 Orderly uuid | 985 | 882 | 932 | 933 | 98.7% | 4.4 sequence | 1311 | 1277 | 1280 | 1289 | 97.5% | 3.4
Initialize 100w data to the table
pgbenchdb=# insert into test_uuid_v4 (id) select replace(uuid_generate_v4()::text,'-','') from generate_series(1,1000000); INSERT 0 1000000 Time: 43389.817 ms (00:43.390) pgbenchdb=# insert into test_time_nextval (id) select replace(uuid_time_nextval()::text,'-','') from generate_series(1,1000000); INSERT 0 1000000 Time: 30585.134 ms (00:30.585) pgbenchdb=# insert into test_seq_bigint select generate_series (1,1000000); INSERT 0 1000000 Time: 9818.639 ms (00:09.819) disorder uuid Insert 100 w Need 43 s,Order requires 30 s,The sequence requires 10 s.
tps after inserting one million data
category | for the first time | The second time | third time | average value(tps) |%util |await ---------------+---------+---------+---------+---------+-------+------- disorder uuid | 355 | 440 | 302 | 365 | 98.8% | 13 Orderly uuid | 948 | 964 | 870 | 927 | 97.2% | 4.0 sequence | 1159 | 1234 | 1115 | 1169 | 96.6% | 3.5
tps after inserting 10 million data
category | for the first time | The second time | third time | average value(tps) |%util |await ---------------+---------+---------+---------+---------+-------+------- disorder uuid | 260 | 292 | 227 | 260 | 99.2% | 16.8 Orderly uuid | 817 | 960 | 883 | 870 | 97.7% | 3.9 sequence | 1305 | 1261 | 1270 | 1278 | 96.8% | 3.0
After inserting 50 million data
Insert 5 into the table kw Data and add a primary key pgbenchdb=# insert into test_time_nextval (id) select replace(uuid_time_nextval()::text,'-','') from generate_series(1,50000000); INSERT 0 50000000 Time: 453985.318 ms (07:33.985) pgbenchdb=# insert into test_seq_bigint select generate_series (1,50000000); INSERT 0 50000000 Time: 352206.160 ms (05:52.206) pgbenchdb=# insert into test_uuid_v4 (id) select replace(uuid_generate_v4()::text,'-','') from generate_series(1,50000000); INSERT 0 50000000 Time: 1159689.338 ms (00:19:19.689) When there is no primary key, insert 50 million data in order uuid It takes 7 minutes, the sequence takes 6 minutes, and it is out of order uuid It took nearly 20 minutes. pgbenchdb=# alter table test_uuid_v4 add primary key ("id"); ALTER TABLE Time: 845199.296 ms (14:05.199) pgbenchdb=# alter table test_time_nextval add primary key ("id"); ALTER TABLE Time: 932151.103 ms (15:32.151) pgbenchdb=# alter table test_seq_bigint add primary key ("id"); ALTER TABLE Time: 148138.871 ms (02:28.139) pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_uuid_v4')); pg_size_pretty ---------------- 6072 MB (1 row) Time: 0.861 ms pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_time_nextval')); pg_size_pretty ---------------- 6072 MB (1 row) Time: 0.942 ms pgbenchdb=# select pg_size_pretty(pg_total_relation_size('test_seq_bigint')); pg_size_pretty ---------------- 2800 MB (1 row) Time: 0.699 ms
After 5kw insertion
category | for the first time | The second time | third time | average value(tps) |%util |await ---------------+---------+---------+---------+---------+-------+------- disorder uuid | 162 | 163 | 163 | 163 | 99.6% | 18.4 Orderly uuid | 738 | 933 | 979 | 883 | 97.7% | 3.9 sequence | 1132 | 1264 | 1265 | 1220 | 96.8% | 3.5
After inserting 100 million pieces of data
category | for the first time | The second time | third time | average value(tps) |%util |await ---------------+---------+---------+---------+---------+-------+------- disorder uuid | 121 | 131 | 143 | 131 | 99.6% | 28.2 Orderly uuid | 819 | 795 | 888 | 834 | 99.2% | 28.7 sequence | 1193 | 1115 | 1109 | 1139 | 96.8% | 11.3
Ordinary btree index
The unordered uuid is measured above. In 1kw case, the tps with primary key is 260 and the tps without primary key is 1234. Try to test the normal index and the unique index tps
--Create normal index pgbenchdb=# create index i_test_uuid_v4_id on test_uuid_v4(id); CREATE INDEX Time: 316367.010 ms (05:16.367) --After creating a normal index [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb transaction type: /opt/thunisoft/pgbench_uuid_v4.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 13308 latency average = 36.080 ms tps = 221.727391 (including connections establishing) tps = 221.749660 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 38.512 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-','')); --Create unique index pgbenchdb=# drop index i_test_uuid_v4_id; DROP INDEX Time: 267.451 ms pgbenchdb=# create unique index i_test_uuid_v4_id on test_uuid_v4(id); CREATE INDEX Time: 153372.622 ms (02:33.373) [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb ^[[3~transaction type: /opt/thunisoft/pgbench_uuid_v4.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 13847 latency average = 34.693 ms tps = 230.593988 (including connections establishing) tps = 230.620469 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 36.410 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-',''));
Both ordinary btree indexes and unique indexes will affect the efficiency of insertion.
Delete all primary key indexes
--Delete all primary keys alter table test_uuid_v4 drop constraint "test_uuid_v4_pkey"; alter table test_time_nextval drop constraint "test_time_nextval_pkey" ; alter table test_seq_bigint drop constraint "test_seq_bigint_pkey"; 1,--disorder uuid: test pgbench_uuid_v4.sql [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_uuid_v4.sql -U sa pgbenchdb transaction type: /opt/thunisoft/pgbench_uuid_v4.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 74109 latency average = 6.479 ms tps = 1234.842229 (including connections establishing) tps = 1235.042674 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 6.112 insert into test_uuid_v4 (id) values (replace(uuid_generate_v4()::text,'-','')); 2,--Orderly uuid,test pgbench_time_nextval.sql [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_time_nextval.sql -U sa pgbenchdb transaction type: /opt/thunisoft/pgbench_time_nextval.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 74027 latency average = 6.486 ms tps = 1233.364360 (including connections establishing) tps = 1233.482292 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 6.186 insert into test_time_nextval (id) values (replace(uuid_time_nextval()::text,'-','')); 3,--Sequence, test pgbench_seq_bigint.sql [thunisoft@localhost thunisoft]$ pgbench -M prepared -r -n -j 8 -c 8 -T 60 -f /opt/thunisoft/pgbench_seq_bigint.sql -U sa pgbenchdb transaction type: /opt/thunisoft/pgbench_seq_bigint.sql scaling factor: 1 query mode: prepared number of clients: 8 number of threads: 8 duration: 60 s number of transactions actually processed: 76312 latency average = 6.290 ms tps = 1271.832907 (including connections establishing) tps = 1272.124397 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 5.916 insert into test_seq_bigint (id) values (nextval('test_seq'::regclass));
After deleting the primary key constraint, tps is very close in the three cases, reaching 1200 +.
Btree index, average tps comparison of insertion operations
category/average tps | No data | ten million | Fifty million | one hundred million | ---------------+---------+---------+---------+---------+ disorder uuid | 960 | 260 | 163 | 131 | Orderly uuid | 933 | 870 | 883 | 834 | sequence | 1289 | 1278 | 1220 | 1139 |
According to the test data, it can be seen that the tps of the inserted data decreases sharply after the unordered uuid reaches 1kw, while the ordered uuid and the increasing sequence decrease less. The tps ordered uuid of 100 million data is 6 times that of disorder, and the sequence is 9 times that of disorder uuid.
Create a separate tablespace to store index information
If you have a fast disk, you can store indexes and data separately to speed up writing.
Create a separate index space:
create tablespace indx_test owner sa location '/home/tablespace/index_test';
Specify index storage directory:
create index i_test_uuid_v4_id on test_uuid_v4 using btree(id) tablespace indx_test;
About ordered uuid
Test the sequential UUIDs plug-in used to generate ordered UUIDs.
The structure of ordered uuid is (block ID; random data). In fact, it is to divide the data into two parts, one is self increasing and the other is random.
Two algorithms are provided:
1.uuid_sequence_nextval(sequence regclass, block_size int default 65536, block_count int default 65536)
The prefix is a self incrementing sequence. If the BLOCK ID is stored in 2 bytes, 256 records can be stored in an index BLOCK (assuming an 8K BLOCK, one record includes uuid VALUE (16 bytes) and ctid (6 bytes), so an index page stores about 363 records (8000 / (16 + 6))
2.uuid_time_nextval(interval_length int default 60, interval_count int default 65536) RETURNS uuid
By default, the prefix of the data every 60 seconds is the same. The prefix is incremented by 1 and circulates after 65535.
use uuid_time_nextval Generated order uuid pgbenchdb=# select id from test_time_nextval; id ---------------------------------- a18b7dd0ca92b0b5c1844a402f9c6999 a18b540b8bbe0ddb2b6d0189b2e393c6 a18b83eb7320b0a90e625185421e065e a18bade4ff15e05dab81ecd3f4c2dee4 a18b79e41c3bc8d2d4ba4b70447e6b29 a18bdad18d9e0d2fa1d9d675bc7129f0 a18b13723ec7be9a2f1a3aec5345a88b a18bd9d866047aec69a064d30e9493d2 a18bd76e8c787c7464479502f381e6d7 a18ba5c0c966f81cfdbeff866618da8d ......
The first four bits of ordered uuid are ordered, and the following ones are randomly generated.
epilogue
1. For ordered UUIDs, the first four bits are ordered and the rest are randomly generated.
2. In this environment, it is found that with the increasing amount of data, tps declines sharply.
3. Due to the existence of btree index, unordered UUIDs will lead to a large number of discrete io. Resulting in high disk utilization. This affects the insertion efficiency. It becomes more obvious with the increase of table data.
4. The test is conducted on ordinary disks, not on SSDs.
5. If you want to use ordered UUIDs, there are many ways to implement them. You also need to consider generating global ordered UUIDs under distributed conditions.