ClickHouse distributed deployment: three shards, one replica

1. Cluster node information

192.168.18.237	ck1

192.168.18.238	ck2

192.168.18.239	ck3
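
The ClickHouse and ZooKeeper configuration below refers to the nodes by hostname, so every server must be able to resolve ck1, ck2 and ck3. A minimal sketch of the /etc/hosts entries, assuming no internal DNS is available:

# Run on every node so the hostnames resolve to the cluster IPs
cat >> /etc/hosts <<EOF
192.168.18.237 ck1
192.168.18.238 ck2
192.168.18.239 ck3
EOF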

2. Installing zookeeper

2.1 Download the apache-zookeeper-3.6.2-bin.tar.gz installation package

Place it in the /opt/zk directory on all servers.
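
One possible way to distribute the package from ck1 to the other nodes, assuming the tarball has already been downloaded to /opt/zk on ck1 (a sketch, not part of the original steps):

# Create the target directory on the other nodes and copy the tarball over
ssh root@ck2 "mkdir -p /opt/zk"
ssh root@ck3 "mkdir -p /opt/zk"
scp /opt/zk/apache-zookeeper-3.6.2-bin.tar.gz root@ck2:/opt/zk/
scp /opt/zk/apache-zookeeper-3.6.2-bin.tar.gz root@ck3:/opt/zk/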

2.2. Enter the /opt/zk directory and extract the tar package

tar -zxf apache-zookeeper-3.6.2-bin.tar.gz
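
The archive extracts to a directory named apache-zookeeper-3.6.2-bin, while later steps refer to /opt/zk/zookeeper; renaming keeps the paths consistent (an assumption about the layout the author used):

# Rename the extracted directory so it matches the /opt/zk/zookeeper path
# used in the scp step later (assumed layout)
mv apache-zookeeper-3.6.2-bin zookeeper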

2.3. Modify zoo.cfg file

Enter the ZooKeeper conf directory and copy zoo_sample.cfg to zoo.cfg:

cp zoo_sample.cfg zoo.cfg 

Modify the zoo.cfg file:

tickTime=2000

initLimit=10

syncLimit=5

dataDir=/data/zookeeper

dataLogDir=/data/zookeeper/log

clientPort=2181

server.1=192.168.18.237:2888:3888
server.2=192.168.18.238:2888:3888
server.3=192.168.18.239:2888:3888

2.4. Create the required directory

mkdir -p /data/zookeeper/log

After configuration, scp the current zookeeper directory to the other two nodes

scp -r /opt/zk/zookeeper root@ck2:/opt/zk/
scp -r /opt/zk/zookeeper root@ck3:/opt/zk/
scp -r /data/zookeeper root@ck2:/data/
scp -r /data/zookeeper root@ck3:/data/

2.5. Setting myid

vi /data/zookeeper/myid

The myid content of ck1 is 1, the myid content of ck2 is 2, and the myid content of ck3 is 3
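
For example, run the matching command on each node:

# On ck1
echo 1 > /data/zookeeper/myid
# On ck2
echo 2 > /data/zookeeper/myid
# On ck3
echo 3 > /data/zookeeper/myid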

2.6. Start zookeeper service

Enter the ZooKeeper bin directory and start the ZooKeeper service on each node:

./zkServer.sh start

Use zkServer.sh status to check each node: one node should be the leader and the other two followers, which confirms that the ZooKeeper cluster has been deployed successfully.
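
The status output ends with a Mode line, roughly like the following (exact wording can vary by version):

./zkServer.sh status
# Mode: leader      <- on one node
# Mode: follower    <- on the other two nodes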

3. Install clickhouse

3.1 Install the standalone packages on all three nodes

rpm -ivh clickhouse-common-static-20.12.5.14-2.x86_64.rpm

rpm -ivh clickhouse-server-20.12.5.14-2.noarch.rpm

rpm -ivh clickhouse-client-20.12.5.14-2.noarch.rpm


cat > /etc/metrika.xml <<EOF
<yandex>
        <networks>
        <ip>::/0</ip>
        </networks>
</yandex>
EOF


sed -i 's/<!-- <listen_host>::<\/listen_host> -->/<listen_host>::<\/listen_host>/g' /etc/clickhouse-server/config.xml

sed -i 's/<tcp_port>9000<\/tcp_port>/<tcp_port>8999<\/tcp_port>/g' /etc/clickhouse-server/config.xml

sed -i  's/<password><\/password>/<password>123456<\/password>/g' /etc/clickhouse-server/users.xml

At this point, the ClickHouse TCP port on each node has been changed to 8999 and the default user's password has been set to 123456.
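
You can verify the substitutions before starting the service, for example:

grep -nE '<listen_host>|<tcp_port>' /etc/clickhouse-server/config.xml
grep -n '<password>' /etc/clickhouse-server/users.xml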

Start the Clickhouse server service

systemctl start clickhouse-server
systemctl enable clickhouse-server

You can verify the standalone installation by logging in with the client:

clickhouse-client --host localhost --user default --password 123456 --port 8999
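
Alternatively, a non-interactive check with the same credentials (a minimal sketch):

clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "select version()"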

3.2 cluster deployment

Modify metrika.xml for all nodes

Taking one node as an example, edit /etc/metrika.xml (the file created in step 3.1, which config.xml includes by default; if <include_from> is set in config.xml, edit the file it points to instead) and add the following configuration:

<yandex>

<clickhouse_remote_servers>
    <!-- Cluster name -->
    <perftest_3shards_1replicas>
        <!-- Shard 1 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>ck1</host>
                <port>8999</port>
                <user>default</user>
                <password>123456</password>
            </replica>
        </shard>

        <!-- Shard 2 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>ck2</host>
                <port>8999</port>
                <user>default</user>
                <password>123456</password>
            </replica>
        </shard>

        <!-- Shard 3 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>ck3</host>
                <port>8999</port>
                <user>default</user>
                <password>123456</password>
            </replica>
        </shard>
    </perftest_3shards_1replicas>
</clickhouse_remote_servers>

<!-- ZooKeeper configuration -->
<zookeeper-servers>
    <!-- index corresponds to the content of each node's ZooKeeper myid file -->
    <node index="1">
        <host>ck1</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>ck2</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>ck3</host>
        <port>2181</port>
    </node>
</zookeeper-servers>

<macros>
    <!-- Replica name of this node; change it on each node to that node's hostname/IP -->
    <replica>ck1</replica>
</macros>

<networks>
    <ip>::/0</ip>
</networks>

<!-- Compression settings -->
<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method>
    </case>
</clickhouse_compression>

</yandex>

Notes:

  • clickhouse_remote_servers corresponds to the incl attribute value in config.xml (see the check after this list).
  • perftest_3shards_1replicas is the cluster name and can be customized.
  • Each shard element defines one data shard.
  • internal_replication=true controls how writes are replicated. In production, replicated tables are written through their local tables, so simply set it to true here; other setups are not recommended and do not need to be considered.
  • clickhouse_compression configures data compression.
  • When there are no replicas, the node tag can be used instead of the shard tag to define a node.
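
To confirm that config.xml actually picks up this file, you can check the incl attributes and the include path; in this version the default include file is /etc/metrika.xml unless <include_from> says otherwise (a quick check, assuming the default config layout):

# <remote_servers> should carry incl="clickhouse_remote_servers"
grep -n 'remote_servers' /etc/clickhouse-server/config.xml
# If <include_from> is set, it must point at the metrika.xml edited above
grep -n 'include_from' /etc/clickhouse-server/config.xml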

Restart the Clickhouse server service on all nodes

systemctl restart clickhouse-server

3.3 Validate the cluster

Log in to the clickhouse client

clickhouse-client --host localhost --user default --password 123456 --port 8999

SQL query

select * from system.clusters

If all three shards appear in the output, the cluster has been built correctly.
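
A quick way to check just this cluster from the shell, using columns of the system.clusters table; it should return three rows, one per shard:

clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "select cluster, shard_num, replica_num, host_name, port from system.clusters where cluster = 'perftest_3shards_1replicas'"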

3.4 testing

Create the database and the local table on each node

create database testdb;
create table testdb.person_local (ID Int8, Name String, BirthDate Date) ENGINE = MergeTree(BirthDate, (Name, BirthDate), 8192);

Create a distributed table on each node

create table testdb.person_all as testdb.person_local ENGINE = Distributed(perftest_3shards_1replicas, testdb, person_local, rand());

The Distributed table itself does not store any data; it acts as a routing layer. You need to specify the cluster name, database name, table name and sharding key.
Here, rand() is used as the sharding key for random distribution.
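
rand() spreads rows randomly across the shards; if deterministic placement by a column is preferred, a hash of that column can be used as the sharding key instead. A sketch (the table name person_all_by_id is hypothetical, not part of the original setup):

# Shard by a hash of the ID column instead of randomly (hypothetical table name)
clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "create table testdb.person_all_by_id as testdb.person_local ENGINE = Distributed(perftest_3shards_1replicas, testdb, person_local, intHash32(ID))"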

Queries against the distributed table are routed to the underlying local tables according to the cluster configuration, and the results are then merged.

  • person_local is the local table; its data is stored only on the local node
  • person_all is the distributed table; querying it makes the engine automatically gather and return data from the whole cluster

insert data

insert into testdb.person_all values (1, 'a', '2021-10-01');
insert into testdb.person_all values (2, 'b', '2021-10-01');
insert into testdb.person_all values (3, 'c', '2021-10-01');
insert into testdb.person_all values (4, 'd', '2021-10-01');
insert into testdb.person_all values (5, 'e', '2021-10-01');
insert into testdb.person_all values (6, 'f', '2021-10-01');
insert into testdb.person_all values (7, 'g', '2021-10-01');
insert into testdb.person_all values (8, 'h', '2021-10-01');
insert into testdb.person_all values (9, 'i', '2021-10-01');
insert into testdb.person_all values (10, 'j', '2021-10-01');
insert into testdb.person_all values (11, 'k', '2021-10-01');
insert into testdb.person_all values (12, 'l', '2021-10-01');

Then query the distributed table and the local table

select * from testdb.person_all;
select * from testdb.person_local;

You will see that the data has been sharded across the local tables on the three machines.
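
One way to confirm the distribution, assuming the same credentials on every node, is to compare the row counts; the three local counts should add up to the distributed total of 12:

# Total through the distributed table (run on any node)
clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "select count() from testdb.person_all"
# Rows stored on this node only (run on each of ck1/ck2/ck3)
clickhouse-client --host localhost --user default --password 123456 --port 8999 --query "select count() from testdb.person_local"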
