ClickHouse distributed deployment: three shards, one replica
1. Cluster node information
192.168.18.237 ck1
192.168.18.238 ck2
192.168.18.239 ck3
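The later steps (scp and the cluster configuration) refer to the nodes by the hostnames ck1, ck2 and ck3, so every server must be able to resolve these names. A minimal sketch, assuming they are not already resolvable via DNS and need to be added to /etc/hosts on each node:

# Run on every node so that ck1/ck2/ck3 resolve to the addresses above
cat >> /etc/hosts <<EOF
192.168.18.237 ck1
192.168.18.238 ck2
192.168.18.239 ck3
EOF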
2. Installing zookeeper
2.1. Download the apache-zookeeper-3.6.2-bin.tar.gz installation package
Place it in the /opt/zk directory on all servers.
2.2. Enter the /opt/zk directory and unzip the tar package
tar -zxf apache-zookeeper-3.6.2-bin.tar.gz
mv apache-zookeeper-3.6.2-bin zookeeper
The extracted directory is renamed to zookeeper so that it matches the /opt/zk/zookeeper path used in the later steps.
2.3. Modify zoo.cfg file
Enter the conf directory of ZooKeeper and copy zoo_sample.cfg to zoo.cfg
cp zoo_sample.cfg zoo.cfg
Modify the zoo.cfg file:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
dataLogDir=/data/zookeeper/log
clientPort=2181
server.1=192.168.18.237:2888:3888
server.2=192.168.18.238:2888:3888
server.3=192.168.18.239:2888:3888
2.4. Create the required directory
mkdir -p /data/zookeeper/log
After configuration, scp the zookeeper directory and the data directory to the other two nodes (create /data on ck2 and ck3 first, e.g. mkdir -p /data):
scp -r /opt/zk/zookeeper root@ck2:/opt/zk/
scp -r /opt/zk/zookeeper root@ck3:/opt/zk/
scp -r /data/zookeeper root@ck2:/data/
scp -r /data/zookeeper root@ck3:/data/
2.5. Setting myid
vi /data/zookeeper/myid
The myid file must contain 1 on ck1, 2 on ck2 and 3 on ck3, matching the server.N entries in zoo.cfg.
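Instead of editing the file by hand, the id can also be written directly; a minimal sketch (run the matching line on its node):

echo 1 > /data/zookeeper/myid   # on ck1
echo 2 > /data/zookeeper/myid   # on ck2
echo 3 > /data/zookeeper/myid   # on ck3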
2.6. Start zookeeper service
Enter the zookeeper bin directory and start the ZooKeeper service. This must be done on every node.
./zkServer.sh start
Use ./zkServer.sh status to check each node's role. One node should report leader and the other two follower, which confirms that the ZooKeeper cluster has been deployed successfully.
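To check all three nodes in one go, a small loop over ssh can be used; a minimal sketch, assuming passwordless ssh as root and the install path used above:

# Print the role (leader/follower) of every ZooKeeper node
for h in ck1 ck2 ck3; do
  echo "== $h =="
  ssh root@$h /opt/zk/zookeeper/bin/zkServer.sh status 2>/dev/null | grep Mode
done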
3. Install ClickHouse
3.1 Install a stand-alone instance on all three nodes
rpm -ivh clickhouse-common-static-20.12.5.14-2.x86_64.rpm
rpm -ivh clickhouse-server-20.12.5.14-2.noarch.rpm
rpm -ivh clickhouse-client-20.12.5.14-2.noarch.rpm

cat > /etc/metrika.xml <<EOF
<yandex>
    <networks>
        <ip>::/0</ip>
    </networks>
</yandex>
EOF

sed -i 's/<!-- <listen_host>::<\/listen_host> -->/<listen_host>::<\/listen_host>/g' /etc/clickhouse-server/config.xml
sed -i 's/<tcp_port>9000<\/tcp_port>/<tcp_port>8999<\/tcp_port>/g' /etc/clickhouse-server/config.xml
sed -i 's/<password><\/password>/<password>123456<\/password>/g' /etc/clickhouse-server/users.xml
At this point the ClickHouse TCP port on each node has been changed to 8999 and the password of the default user has been set to 123456.
Start the clickhouse-server service:
systemctl start clickhouse-server
systemctl enable clickhouse-server
You can verify the stand-alone installation by logging in with the client:
clickhouse-client --host localhost --user default --password 123456 --port 8999
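For a quick non-interactive check, clickhouse-client's --query option can be used; a minimal sketch:

clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "select version()"

It should print the installed server version (20.12.5.14 here).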
3.2 cluster deployment
Modify metrika.xml for all nodes
Taking one node as an example, edit /etc/metrika.xml (the file created during installation above) and set its content as follows:
<yandex>
    <clickhouse_remote_servers>
        <!-- Cluster name -->
        <perftest_3shards_1replicas>
            <!-- Shard 1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck1</host>
                    <port>8999</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- Shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck2</host>
                    <port>8999</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- Shard 3 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ck3</host>
                    <port>8999</port>
                    <user>default</user>
                    <password>123456</password>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <!-- index corresponds to the myid content of each ZooKeeper node -->
        <node index="1">
            <host>ck1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>ck2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>ck3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <!-- Replica name of this node; change it on each node to that node's IP/hostname -->
        <replica>ck1</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- Compression algorithm -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
Explanation:
- clickhouse_remote_servers corresponds to the incl attribute value in config.xml (see the sketch after this list)
- perftest_3shards_1replicas is the cluster name and can be customized.
- shard defines a data shard
- internal_replication=true controls how writes are replicated. In production the usual pattern is to use replicated tables and write through the local tables, so set it to true here; other setups are not recommended and need not be considered.
- clickhouse_compression configures data compression.
- When there are no replicas, the node tag can be used instead of the shard tag to define a node
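How metrika.xml is picked up depends on config.xml: by default the server reads substitutions from /etc/metrika.xml, and each incl attribute in config.xml points at a section of that file. A sketch of the relevant config.xml fragments, assuming the default include path (adjust if include_from is set to a different location):

<!-- /etc/clickhouse-server/config.xml (fragments) -->
<remote_servers incl="clickhouse_remote_servers" />
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />
<compression incl="clickhouse_compression" />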
Restart the clickhouse-server service on all nodes
systemctl restart clickhouse-server
3.3 Verify the cluster
Log in to the clickhouse client
clickhouse-client --host localhost --user default --password 123456 --port 8999
SQL query
select * from system.clusters
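The output should list three shards for the perftest_3shards_1replicas cluster, each with a single replica. A more focused check, as a sketch:

select cluster, shard_num, replica_num, host_name, port
from system.clusters
where cluster = 'perftest_3shards_1replicas';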
If the three shards of perftest_3shards_1replicas show up, the cluster has been built successfully.
3.4 testing
Create the database and the local table on each node
create database testdb;
create table testdb.person_local (ID Int8, Name String, BirthDate Date) ENGINE = MergeTree(BirthDate, (Name, BirthDate), 8192);
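The MergeTree(date, (key), granularity) form above is the legacy engine syntax; it still works on this version but is deprecated in newer releases. An equivalent definition in the modern syntax, as a sketch:

create table testdb.person_local
(
    ID Int8,
    Name String,
    BirthDate Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(BirthDate)   -- legacy syntax partitions by month of the date column
ORDER BY (Name, BirthDate)
SETTINGS index_granularity = 8192;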
Create the Distributed table on each node
create table testdb.person_all as testdb.person_local ENGINE = Distributed(perftest_3shards_1replicas, testdb, person_local, rand());
The Distributed table itself does not store data; it acts as a router. You need to specify the cluster name, the database name, the local table name and the sharding key.
Here rand() is used as the sharding key, so rows are distributed randomly (see the sketch after the notes below for a column-based key).
A query against the Distributed table is routed to the underlying local tables according to the cluster configuration, and the results are then merged.
- person_local is the local table; its data lives only on that node
- person_all is the Distributed table; querying it makes the engine automatically fetch and merge data from the whole cluster
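If a deterministic distribution is preferred over rand(), a column-based sharding key can be used instead; a sketch, with the illustrative table name person_all_by_id and ID chosen as the key:

-- rows with the same ID always land on the same shard (table name is illustrative)
create table testdb.person_all_by_id as testdb.person_local
ENGINE = Distributed(perftest_3shards_1replicas, testdb, person_local, intHash32(ID));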
Insert data
insert into testdb.person_all values (1, 'a', '2021-10-01');
insert into testdb.person_all values (2, 'b', '2021-10-01');
insert into testdb.person_all values (3, 'c', '2021-10-01');
insert into testdb.person_all values (4, 'd', '2021-10-01');
insert into testdb.person_all values (5, 'e', '2021-10-01');
insert into testdb.person_all values (6, 'f', '2021-10-01');
insert into testdb.person_all values (7, 'g', '2021-10-01');
insert into testdb.person_all values (8, 'h', '2021-10-01');
insert into testdb.person_all values (9, 'i', '2021-10-01');
insert into testdb.person_all values (10, 'j', '2021-10-01');
insert into testdb.person_all values (11, 'k', '2021-10-01');
insert into testdb.person_all values (12, 'l', '2021-10-01');
Then query the Distributed table and the local table
select * from testdb.person_all;
select * from testdb.person_local;
You will see that the rows have been sharded across the local tables of the three machines: person_all returns all 12 rows, while each person_local holds only part of them.
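A quick way to see how the rows were distributed is to group by the host that stores them; a minimal sketch:

-- hostName() is evaluated on each remote server in a distributed query
select hostName() as host, count() as rows
from testdb.person_all
group by host;

Each of ck1, ck2 and ck3 should report a share of the 12 rows; the split is uneven and random because rand() is the sharding key.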