Author: Xu Yaorong
DBA team member in the southern-region delivery services department of aikesheng, mainly responsible for MySQL troubleshooting and related technical support. Enjoys movies, games, travel, and billiards.
Source: original contribution
*Produced by the aikesheng open source community. The original content may not be used without authorization. For reprints, please contact the editor and cite the source.
Expansion approach
Description of the internal_replication parameter
Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas).
When this parameter is set to true, the write operation picks a single healthy replica and writes the data only to it. Set it to true when the local tables behind the Distributed table are replicated tables (Replicated*MergeTree): replication between replicas is then handled by ReplicatedMergeTree itself rather than by the Distributed table, which reduces the Distributed table's load.
If this parameter is set to false (the default), the write operation writes the data to every replica. In essence, the Distributed table replicates the data itself. This is inferior to using replicated tables, because replica consistency is not checked and the replicas may drift apart over time (node downtime, failed inserts).
1. ReplicatedMergeTree ENGINE
The ReplicatedMergeTree table engine synchronizes data itself, so it does not need the Distributed table to replicate data between replicas (internal_replication is true). Replica synchronization is coordinated through ZooKeeper. The expansion steps are as follows:
- On the new replica node, add the complete configuration of the expanded cluster to its cluster configuration.
- On the existing replica nodes, modify the configuration file and add the new replica to the cluster configuration (the existing replicas do not need to be taken offline; the configuration file is hot-reloaded).
- Start the new replica node and create the corresponding replicated local table and distributed table (from this point, query requests can be routed to all replica nodes normally).
- Coordinated through ZooKeeper, the data already on the existing replicas is automatically synchronized to the new replica node (a verification sketch follows this list).
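For the last step, one way to confirm that the new replica has registered and caught up is to query system.replicas on the new node. This is only a minimal sketch, not part of the original procedure; the table name table_test is the one used in the case study below.

-- run on the new replica node after the replicated local table is created
SELECT database, table, is_readonly, total_replicas, active_replicas, queue_size, absolute_delay
FROM system.replicas
WHERE table = 'table_test';

A healthy, fully caught-up replica shows is_readonly = 0, an empty queue, and active_replicas equal to total_replicas.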
2. MergeTree ENGINE
Because the MergeTree table engine cannot synchronize replicas on its own, replication across the cluster's replicas has to be handled by the Distributed table (internal_replication is false).
As a result, the new replica node does not receive historical data from the existing replicas; only newly written data is replicated normally across the cluster. The historical data therefore has to be synchronized through a backup. The specific steps are as follows:
- On the new replica node, add the complete configuration of the current cluster to its cluster configuration.
- On the existing replica nodes, modify the configuration file and add the new replica to the cluster configuration (the existing replicas do not need to be taken offline; the configuration file is hot-reloaded).
- Start the new replica node and create the corresponding local table and distributed table.
- On an existing replica, export the historical data with a filter and import it into the local table on the new replica to bring the data into consistency (an alternative backfill approach is sketched after this list). Writes to the cluster are not affected during this step.
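The export/import in the last step can be done with TSV files, as shown later in this article. As an alternative sketch (not the method used in the article), the remote() table function can pull the historical rows directly from an existing replica; the host node1, the table t_cluster, and the id < 5 filter below are taken from the case study, and the default user with an empty password is assumed.

-- run on the new replica node: backfill historical rows from node1's local table
INSERT INTO default.t_cluster
SELECT *
FROM remote('node1:9000', 'default', 't_cluster', 'default', '')
WHERE id < 5;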
Case verification
Environment introduction:
Operating system version: CentOS Linux release 7.5.1804 (Core) (4C4G)
Software version: ClickHouse version 21.8.4.51, zookeeper-3.7.0
hostname | ip | port | role |
---|---|---|---|
node1 | 10.186.63.71 | 9000 | replica(clickhouse) |
node2 | 10.186.63.74 | 9000 | replica(clickhouse) |
node3 | 10.186.63.48 | 9000 | Node to be added (clickhouse) |
node1 | 10.186.63.71 | 2181 | zookeeper (single node) |
For this test, two ClickHouse nodes were set up in advance as a single shard with two replicas (multiple replicas, one shard). The test objects are ReplicatedMergeTree and MergeTree. Each node creates a local table with the corresponding engine and a Distributed table on top of it; writes, queries, and other operations on the local tables of each node are routed and forwarded through the Distributed table.
Because the two table engines have different requirements for the internal_replication parameter, they are tested separately.
ReplicatedMergeTree ENGINE
1. Cluster information
(1) The following configuration (metrika.xml) defines a single-shard, two-replica cluster named test_action.
<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>node1</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <remote_servers>
        <test_action>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_action>
    </remote_servers>
    <networks>
        <ip>::/0</ip>
    </networks>
    <macros>
        <cluster>test_action</cluster>
        <shard>1</shard>
        <replica>node1</replica>
    </macros>
</yandex>
Note: apart from the macros tag, the cluster configuration is identical on every node in the cluster. macros is the unique identity of each replica; the details are below.
macros tag of node1:

<macros>
    <cluster>test_action</cluster>   <!-- cluster name -->
    <shard>1</shard>                 <!-- shard number -->
    <replica>node1</replica>         <!-- replica name -->
</macros>

macros tag of node2:

<macros>
    <cluster>test_action</cluster>
    <shard>1</shard>
    <replica>node2</replica>
</macros>
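To confirm which macro values a node has picked up, each node can query the system.macros table. This is a quick check sketched here for convenience, not a step from the original procedure.

-- each node returns its own cluster/shard/replica substitutions
SELECT * FROM system.macros;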
(2) Cluster information and table structure data
Cluster information:
node1 :) select * from system.clusters where cluster = 'test_action';

SELECT *
FROM system.clusters
WHERE cluster = 'test_action'

Query id: 8495c696-9060-4aac-acc6-c641b2ec7aa2

┌─cluster─────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ test_action │ 1 │ 1 │ 1 │ node1 │ 10.186.63.71 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 2 │ node2 │ 10.186.63.74 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
└─────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.
2. Add the new replica
(1) On node3, create metrika.xml and add the full cluster information.
<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>node1</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <remote_servers>
        <test_action>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node2</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node3</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_action>
    </remote_servers>
    <macros>
        <cluster>test_action</cluster>
        <shard>1</shard>
        <replica>node3</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
</yandex>
(2) Modify metrika.xml on node1 and node2 to add the node3 replica to the cluster configuration.
Add the following under the shard tag:
<replica>
    <host>node3</host>
    <port>9000</port>
</replica>
(3) After node3 is started, check the cluster information on every node; the replica information is fully synchronized.
node1 :) select * from system.clusters where cluster = 'test_action';

SELECT *
FROM system.clusters
WHERE cluster = 'test_action'

Query id: c901b3a0-743e-4cfc-b5f8-5add5c21ba42

┌─cluster─────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ test_action │ 1 │ 1 │ 1 │ node1 │ 10.186.63.71 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 2 │ node2 │ 10.186.63.74 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 3 │ node3 │ 10.186.63.48 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
└─────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘

3 rows in set. Elapsed: 0.005 sec.

node2 :) select * from system.clusters where cluster = 'test_action';

SELECT *
FROM system.clusters
WHERE cluster = 'test_action'

Query id: bbdc533d-996c-4c46-a660-13a1b3e7f5cf

┌─cluster─────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ test_action │ 1 │ 1 │ 1 │ node1 │ 10.186.63.71 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 2 │ node2 │ 10.186.63.74 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 3 │ node3 │ 10.186.63.48 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
└─────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘

3 rows in set. Elapsed: 0.003 sec.

node3 :) select * from system.clusters where cluster = 'test_action';

SELECT *
FROM system.clusters
WHERE cluster = 'test_action'

Query id: e676da7c-faf0-4876-aba8-c2fb29d9adb5

┌─cluster─────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ test_action │ 1 │ 1 │ 1 │ node1 │ 10.186.63.71 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 2 │ node2 │ 10.186.63.74 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │
│ test_action │ 1 │ 1 │ 3 │ node3 │ 10.186.63.48 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │
└─────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘

3 rows in set. Elapsed: 0.004 sec.
3. Establish synchronization
(1) On node3, create the corresponding replicated local table and distributed table.
Create the replicated local table:
create table table_test(
    label_id UInt32,
    label_name String,
    insert_time Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/test_action/1/table_test', 'node3', insert_time, (label_id, insert_time), 8192);
Create the distributed table:
CREATE TABLE table_test_all AS table_test ENGINE = Distributed(test_action, default, table_test, rand());
Note: do not use the distributed DDL syntax (ON CLUSTER) when creating these tables; create them locally on node3 only, otherwise the statement will fail because the tables already exist on the other nodes.
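As a side note (a sketch, not the DDL used in this article): because metrika.xml already defines the cluster, shard, and replica macros, the same replicated table can also be written with macro placeholders, so an identical statement works on every node. This assumes the expanded ZooKeeper path matches the path used by the existing replicas.

-- {cluster}/{shard}/{replica} are substituted per node from the macros configuration
CREATE TABLE table_test(
    label_id UInt32,
    label_name String,
    insert_time Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/table_test', '{replica}', insert_time, (label_id, insert_time), 8192);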
(2) After the local and distributed tables are created, the data is synchronized to the new replica automatically in the background (coordinated through ZooKeeper). Once synchronization completes, the local table data is consistent and the distributed table routes queries normally.
node3 :) select * from table_test;

SELECT *
FROM table_test

Query id: dffc5029-0a45-49fb-a5c6-c39f488b2a40

┌─label_id─┬─label_name─┬─insert_time─┐
│ 4 │ 111 │ 2021-07-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 1 │ 111 │ 2021-06-28 │
│ 2 │ 111 │ 2021-06-29 │
│ 3 │ 111 │ 2021-06-30 │
└──────────┴────────────┴─────────────┘

4 rows in set. Elapsed: 0.005 sec.

node3 :) select * from table_test_all;

SELECT *
FROM table_test_all

Query id: b590017e-a502-4010-86f6-190ab899b15f

┌─label_id─┬─label_name─┬─insert_time─┐
│ 4 │ 111 │ 2021-07-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 1 │ 111 │ 2021-06-28 │
│ 2 │ 111 │ 2021-06-29 │
│ 3 │ 111 │ 2021-06-30 │
└──────────┴────────────┴─────────────┘

4 rows in set. Elapsed: 0.006 sec.
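Besides comparing the rows, it can also be confirmed that node3 has registered itself as a replica in ZooKeeper. The sketch below is an extra check, not part of the original steps; the path comes from the ReplicatedMergeTree DDL above.

-- list the replicas registered under the table's ZooKeeper path
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/test_action/1/table_test/replicas';

This should return node1, node2, and node3 once the new replica has been created.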
(3) Insert data through the distributed table to verify that the cluster replicates data normally.
node1 inserts new data.
node1 :) insert into table_test_all values (5,'111','2021-07-02'),(6,'111','2021-07-03');

INSERT INTO table_test_all VALUES

Query id: b3047373-13e4-4667-bda6-17f21c96ace7

Ok.

2 rows in set. Elapsed: 0.017 sec.

node1 :) select * from table_test_all;

SELECT *
FROM table_test_all

Query id: e942c27d-3193-4303-8ff6-f0da86939558

┌─label_id─┬─label_name─┬─insert_time─┐
│ 1 │ 111 │ 2021-06-28 │
│ 2 │ 111 │ 2021-06-29 │
│ 3 │ 111 │ 2021-06-30 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 4 │ 111 │ 2021-07-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 5 │ 111 │ 2021-07-02 │
│ 6 │ 111 │ 2021-07-03 │
└──────────┴────────────┴─────────────┘

6 rows in set. Elapsed: 0.006 sec.

node2 :) select * from table_test_all;

SELECT *
FROM table_test_all

Query id: 21f9e176-d314-4823-b4c7-3f4a0fadee4f

┌─label_id─┬─label_name─┬─insert_time─┐
│ 1 │ 111 │ 2021-06-28 │
│ 2 │ 111 │ 2021-06-29 │
│ 3 │ 111 │ 2021-06-30 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 4 │ 111 │ 2021-07-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 5 │ 111 │ 2021-07-02 │
│ 6 │ 111 │ 2021-07-03 │
└──────────┴────────────┴─────────────┘

6 rows in set. Elapsed: 0.004 sec.

node3 :) select * from table_test_all;

SELECT *
FROM table_test_all

Query id: 9e66470b-11bb-4b7d-bc4c-40b07ee01ae8

┌─label_id─┬─label_name─┬─insert_time─┐
│ 4 │ 111 │ 2021-07-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 1 │ 111 │ 2021-06-28 │
│ 2 │ 111 │ 2021-06-29 │
│ 3 │ 111 │ 2021-06-30 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│ 5 │ 111 │ 2021-07-02 │
│ 6 │ 111 │ 2021-07-03 │
└──────────┴────────────┴─────────────┘

6 rows in set. Elapsed: 0.005 sec.
MergeTree ENGINE (single shard, two replicas)
Apart from internal_replication being set to false in metrika.xml on node1, node2, and node3, the configuration, expansion, and replica-adding operations are the same as above. The walkthrough below starts from establishing synchronization.
Establish synchronization
(1) On node3, create the corresponding local table and distributed table.
Create the local table:
CREATE TABLE t_cluster(
    id Int16,
    name String,
    birth Date
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(birth)
ORDER BY id;
Create the distributed table:
CREATE TABLE dist_t_cluster as t_cluster engine = Distributed(test_action, default, t_cluster,rand());
(2) Query the local table and the distributed table on node3: the historical data is not synchronized on its own. Insert new data through the distributed table and it is replicated to all replicas normally.
node1 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: 42b3e629-8cf0-4ea5-a9ca-a26caafa03c2

┌─id─┬─name─┬──────birth─┐
│ 1 │ aaa │ 2021-02-01 │
│ 2 │ bbb │ 2021-02-02 │
└────┴──────┴────────────┘

node2 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: 8fbe57b7-c375-439f-a235-9feb07b33d83

┌─id─┬─name─┬──────birth─┐
│ 1 │ aaa │ 2021-02-01 │
│ 2 │ bbb │ 2021-02-02 │
└────┴──────┴────────────┘

node3 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: ffa76681-9f09-4016-b7ca-3b97119f9580

Ok.

0 rows in set. Elapsed: 0.005 sec.

## node3 inserts new data:
insert into dist_t_cluster values(3, 'aaa', '2021-03-01'), (4, 'bbb', '2021-03-02');

## When querying the data, replica synchronization is normal, but the new node node3 only sees the newly inserted data and does not get the earlier data by itself.
node3 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: 7ad9f34b-e2a8-47ed-9d6e-a7ed76a6f8c6

┌─id─┬─name─┬──────birth─┐
│ 3 │ aaa │ 2021-03-01 │
│ 4 │ bbb │ 2021-03-02 │
└────┴──────┴────────────┘

node2 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: 8fbe57b7-c375-439f-a235-9feb07b33d83

┌─id─┬─name─┬──────birth─┐
│ 1 │ aaa │ 2021-02-01 │
│ 2 │ bbb │ 2021-02-02 │
└────┴──────┴────────────┘
┌─id─┬─name─┬──────birth─┐
│ 3 │ aaa │ 2021-03-01 │
│ 4 │ bbb │ 2021-03-02 │
└────┴──────┴────────────┘

node1 :) select * from dist_t_cluster;

SELECT *
FROM dist_t_cluster

Query id: 42b3e629-8cf0-4ea5-a9ca-a26caafa03c2

┌─id─┬─name─┬──────birth─┐
│ 1 │ aaa │ 2021-02-01 │
│ 2 │ bbb │ 2021-02-02 │
└────┴──────┴────────────┘
┌─id─┬─name─┬──────birth─┐
│ 3 │ aaa │ 2021-03-01 │
│ 4 │ bbb │ 2021-03-02 │
└────┴──────┴────────────┘
(3) On any existing replica, export the historical data to a TSV file with a filter, transfer it to node3, import it into the local table, and check that the data is consistent.
## Export the data on node1
[root@node1 ~]# clickhouse-client --query="select * from t_cluster where id < 5" > /var/lib/clickhouse/backup/t_cluster.tsv

## Transfer the tsv file to node3 and import it
[root@node3 config.d]# cat /tmp/t_cluster.tsv | clickhouse-client --query="insert into t_cluster FORMAT TSV"

node3 :) select * from t_cluster;

SELECT *
FROM t_cluster

Query id: 0713ab9e-6594-41f0-917d-8b968c609766

┌─id─┬─name─┬──────birth─┐
│ 1 │ aaa │ 2021-02-01 │
│ 2 │ bbb │ 2021-02-02 │
└────┴──────┴────────────┘
┌─id─┬─name─┬──────birth─┐
│ 3 │ aaa │ 2021-03-01 │
│ 4 │ bbb │ 2021-03-02 │
└────┴──────┴────────────┘
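As a final check, row counts on every replica can be compared in a single query. This is only a sketch and assumes the clusterAllReplicas table function is available in this ClickHouse version.

-- query every replica of the cluster and count rows per host
SELECT hostName() AS host, count() AS row_count
FROM clusterAllReplicas('test_action', 'default', 't_cluster')
GROUP BY host
ORDER BY host;

Matching counts on node1, node2, and node3 indicate that the replicas are consistent.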