Building a fully distributed CDH cluster with Docker

Preface

        In the previous setup, all three containers ended up on a single physical machine, so the CDH "cluster" really ran on one host. That is fine for testing, but in a real environment the resources of one machine are nowhere near enough.

        Building on the earlier steps, this article uses the installation packages to build a fully distributed CDH cluster spread across multiple physical machines.

        The hard part of a fully distributed CDH cluster is communication between containers on different hosts; a Docker Swarm overlay network is used here to solve that.

1. Copy the installation packages

Copy the image installation packages to each node.

The master node gets master-server.tar.gz and hadoop_CDH.zip.

The slave nodes get agent-server.tar.gz.
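One simple way to distribute them is scp from the machine where the images were exported; a minimal sketch, assuming the packages sit in /root there and the node hostnames resolve (hosts and paths are illustrative):

# to the master node
scp /root/master-server.tar.gz /root/hadoop_CDH.zip root@server001:/root/
# to each slave node
scp /root/agent-server.tar.gz root@server002:/root/
scp /root/agent-server.tar.gz root@server003:/root/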

2. Uninstall Docker (each node)

systemctl stop docker
yum -y remove docker-ce docker-ce-cli containerd.io
rm -rf /var/lib/docker
# Uninstall old versions:
yum -y remove docker docker-client docker-client-latest docker-common docker-latest \
    docker-latest-logrotate docker-logrotate docker-engine

3. Install Docker (each node)

Install required packages: yum install -y yum-utils
Add a domestic (Aliyun) yum source: yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
Refresh the yum cache: yum makecache fast
Install docker: yum -y install docker-ce docker-ce-cli containerd.io
Test:
    systemctl start docker \
    && docker version
result:
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41

# Configure a registry mirror to speed up image pulls (write daemon.json fresh so it stays valid JSON)
sudo mkdir -p /etc/docker \
&& ( cat <<EOF
{"registry-mirrors":["https://qiyb9988.mirror.aliyuncs.com"]}
EOF
) > /etc/docker/daemon.json \
&& sudo systemctl daemon-reload \
&& sudo systemctl restart docker \
&& systemctl status docker
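An optional check that the mirror configuration was picked up (assumes the daemon.json written above):

docker info | grep -A 1 "Registry Mirrors"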

4. Initialize the swarm (the master node asrserver001 acts as the manager)

# If the node was part of a swarm before, force it to leave first (each node)
docker swarm leave --force

# Note: the --advertise-addr must be an intranet address, and the nodes must be able to ping each other
# (SSH login between them is not required); otherwise containers on different nodes cannot reach each other
# (for example, connections to port 22 get refused)
[root@server001 ~]# docker swarm init --advertise-addr 172.16.0.6

result:
Swarm initialized: current node (iqs8gjyc6rbecu8isps4i5xv9) is now a manager.
To add a worker to this swarm, run the following command:
    # This is the command to run on each worker
    docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Note: if this is deployed on cloud servers, remember to open port 2377 in the security group so the workers can reach the manager
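Besides 2377, swarm needs a few more ports reachable between the nodes. If firewalld is running on the hosts (or a cloud security group sits in front of them), a sketch like the following on every node should cover it; the --opt encrypted overlay network created later additionally relies on ESP (IP protocol 50):

# swarm cluster management
firewall-cmd --permanent --add-port=2377/tcp
# node-to-node communication (gossip / service discovery)
firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
# overlay (VXLAN) data traffic
firewall-cmd --permanent --add-port=4789/udp
# ESP, required by the encrypted overlay network
firewall-cmd --permanent --add-protocol=esp
firewall-cmd --reload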

5. Join the worker nodes (the slave nodes server002 and server003 act as workers)

# Take the join command printed in the previous step and run it on each slave node
[root@server002 ~]# docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
[root@server003 ~]# docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
 # As shown above, worker nodes join the swarm cluster through port 2377, so make sure that port is open
 
 result:
 This node joined a swarm as a worker.
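An optional sanity check on each worker after joining (a sketch, assuming Docker 20.10 as above):

docker info --format '{{.Swarm.LocalNodeState}} {{.Swarm.NodeID}}'
# expected output: active <node-id>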

6. View cluster node information (master node)

[root@server001 ~]# docker node ls

# result
ID                            HOSTNAME       STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
iqs8gjyc6rbecu8isps4i5xv9 *   asrserver001   Ready     Active         Leader           20.10.8
rqrxol843ojajibfqbfcsk1l8     asrserver002   Ready     Active                          20.10.8
yu7udkwkul8nujgwdsx1tp5yo     asrserver003   Ready     Active                          20.10.8
 # As shown above, the nodes now form a cluster: server001 is the Leader (manager) and the other nodes are workers

7. Create the overlay network (master node)

[root@server001 ~]# docker network create --opt encrypted --driver overlay --attachable cdh-net && docker network ls

result:
31qxzd9bs57my40deif8j3hsu
NETWORK ID     NAME              DRIVER    SCOPE
d3b73b97d240   bridge            bridge    local
31qxzd9bs57m   cdh-net           overlay   swarm
3be8470b3027   docker_gwbridge   bridge    local
f2fcf804158d   host              host      local
1oaefqouo4sv   ingress           overlay   swarm
e927f8141ece   none              null      local
# From the output above: ingress is Docker Swarm's built-in network, used for communication between cluster nodes;
# cdh-net is the custom network just created, which will carry the cross-host container traffic later;
# networks with SCOPE=local are Docker's single-host modes, used only between containers on one node.
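Here the default IPAM driver handed out 10.0.1.0/24. If a specific range is needed (for example to avoid overlapping an existing network), it can be pinned when the network is created; a sketch of the same command with the subnet made explicit:

docker network create --driver overlay --attachable --opt encrypted \
    --subnet 10.0.1.0/24 --gateway 10.0.1.1 cdh-net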

# View cdh-net network details
[root@server001 ~]# docker network inspect cdh-net
 result:
[{"Name": "cdh-net",
        "Id": "s3q5ldynr8riytkq3a4beyazc",
        "Created": "2021-09-13T05:50:12.398783253Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [{
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"}]},
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""},
        "ConfigOnly": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097",
            "encrypted": ""},
        "Labels": null}]
# The inspect output shows the cdh-net subnet and gateway. When each node starts its container, it attaches it to this network and assigns it an address from this subnet, which is what makes cross-host container communication work.
# Note that cdh-net is not visible from a worker node until a container on that worker has joined the network.
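The docker run commands in the next three sections bind-mount /usr/local/src/host-config/hosts into each container as /etc/hosts, so that file must exist on every physical node before the containers are started. A minimal sketch using the static IPs assigned below (the exact contents are an assumption; adjust them to the IPs actually chosen):

mkdir -p /usr/local/src/host-config \
&& cat > /usr/local/src/host-config/hosts <<EOF
127.0.0.1   localhost
10.0.1.4    server001
10.0.1.6    server002
10.0.1.8    server003
EOF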

8. Start the container on the server001 node

# Load the image package into docker:
[root@server001 ~]# docker load -i /root/master-server.tar.gz && docker images
 result:
Loaded image: master-server/cdh:6.3.2
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
master-server/cdh   6.3.2     d4f3e4ee3f9e   26 minutes ago   3.62GB

# Create and start the container, attaching it to the swarm custom network cdh-net
[root@server001 ~]# docker run \
--restart always \
-d --name server001 \
--hostname server001 \
--net cdh-net \
--ip 10.0.1.4 \
-p 8020:8020 \
-p 8088:8088 \
-p 19888:19888 \
-p 9870:9870 \
-p 9000:9000 \
-p 7180:7180 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
master-server/cdh:6.3.2 \
/usr/sbin/init  \
&& docker ps

9. Start the container on the server002 node

# Load the image package into docker:
[root@server002 ~]# docker load -i /root/agent-server.tar.gz && docker images
 result:
Loaded image: agent-server/cdh:6.3.2
REPOSITORY         TAG       IMAGE ID       CREATED          SIZE
agent-server/cdh   6.3.2     5d91a7f659a1   11 minutes ago   2.8GB

# Create and start the container:
# attach it to the swarm custom network cdh-net so it joins the cluster network
[root@server002 ~]# docker run -d \
--restart always \
--hostname server002 \
--name server002 \
--net cdh-net \
--ip 10.0.1.6 \
-p 10000:10000 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
agent-server/cdh:6.3.2  \
/usr/sbin/init \
&& docker ps

10. Start the container on the server003 node

# Load the image package into docker:
[root@server003 ~]# docker load -i agent-server.tar.gz && docker images
 result:
Loaded image: agent-server/cdh:6.3.2
REPOSITORY         TAG       IMAGE ID       CREATED          SIZE
agent-server/cdh   6.3.2     5d91a7f659a1   11 minutes ago   2.8GB

# Create and start the container:
# attach it to the swarm custom network cdh-net so it joins the cluster network
[root@server003 ~]# docker run -d \
--restart always \
--hostname server003 \
--name server003 \
--net cdh-net \
--ip 10.0.1.8 \
-p 12345:12345 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
agent-server/cdh:6.3.2  \
/usr/sbin/init \
&& docker ps

11. View the cluster network (each node)

Check that the cluster network formed by the nodes is configured correctly.

Normally, for every container attached to cdh-net, the intranet IP of its node shows up under "Peers"; that is what keeps communication across the swarm working.

# All three nodes can now inspect cdh-net; only the container IPs and names differ, the rest of the output is the same
[root@server001 ~]# docker network inspect cdh-net

result:
[{"Name": "cdh-net",
        "Id": "enzbj3sg1wg1s5vn5gextjc59",
        "Created": "2021-09-13T14:27:55.559422055+08:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [{
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"}]},
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""},
        "ConfigOnly": false,
        "Containers": {
            "b8e1b1f987f1af38946018f77dfb8429a9d41ae503f4d42f4391fbfae53d0b46": {
                "Name": "server003",  # Container name 
                "EndpointID": "5da0812008ec5af9fac93ed7e8e4ceeb09a1ffb59e3d8b6be83c7bd319a1c3ea",
                "MacAddress": "02:42:0a:00:01:06",
                "IPv4Address": "10.0.1.8/24", # The ip address of the new container is incremented
                "IPv6Address": ""
            },
            "lb-cdh-net": {
                "Name": "cdh-net-endpoint",
                "EndpointID": "48ec1b73e478b7c6475048229a8d803646d66b71a7e7f5b0719641a906d0e07b",
                "MacAddress": "02:42:0a:00:01:07",
                "IPv4Address": "10.0.1.9/24", # 
                "IPv6Address": ""}},
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097",
            "encrypted": ""},
        "Labels": {},
        "Peers": [{
                "Name": "a0f495c4d7a7",
                "IP": "172.16.0.6"        # Intranet ip address of node server001
            },{
                "Name": "973c153cd191",
                "IP": "172.16.0.16"       # Intranet ip address of node server002
            },{
                "Name": "d4f899e63511",
                "IP": "172.16.0.2"}]}]    # Intranet ip address of node server003
       # The nodes communicate over port 2377 (cluster management) and the container network over port 7946 (with VXLAN data on 4789/udp)

       # If a node's IP does not show up under Peers after its container starts, the container is probably not attached to cdh-net; reattach it with:
       # docker network disconnect cdh-net server001
       # docker network connect cdh-net server001
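A quicker way to check the Peers list on each node without reading the whole JSON (a sketch; it works on any node that already has a container attached to cdh-net):

docker network inspect cdh-net --format '{{range .Peers}}{{.Name}} {{.IP}}{{"\n"}}{{end}}'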

12. Test network communication between cross-host containers (each node)

# Enter each container:
[root@server001 ~]# docker exec -ti --privileged=true server001 /bin/bash
[root@server002 ~]# docker exec -ti --privileged=true server002 /bin/bash
[root@server003 ~]# docker exec -ti --privileged=true server003 /bin/bash
# or, generically, enter the most recently listed container on the current node:
docker exec -ti --privileged=true $(docker ps | awk 'NR==2 {print $1}') /bin/bash

ping server001 -c 3 && ping server002 -c 3 && ping server003 -c 3

result: PING server001 (10.0.1.2) 56(84) bytes of data.
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=1 ttl=64 time=0.419 ms
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=2 ttl=64 time=0.342 ms
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=3 ttl=64 time=0.368 ms

--- server001 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.342/0.376/0.419/0.035 ms
PING server002 (10.0.1.4) 56(84) bytes of data.
64 bytes from server002 (10.0.1.4): icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from server002 (10.0.1.4): icmp_seq=2 ttl=64 time=0.035 ms
64 bytes from server002 (10.0.1.4): icmp_seq=3 ttl=64 time=0.036 ms

--- server002 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.025/0.032/0.036/0.005 ms
PING server003 (10.0.1.8) 56(84) bytes of data.
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=1 ttl=64 time=0.230 ms
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=2 ttl=64 time=0.297 ms
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=3 ttl=64 time=0.319 ms

--- server003 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.230/0.282/0.319/0.037 ms

13. Set up SSH between cross-host containers (each node)

# Initialize the root password of each container.
# Set it uniformly to 12345678 or 123456. The ASR admin service uploads flinkx JSON files over SSH,
# so after changing the password here it must be kept in sync on the Java side.
passwd root
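The same change can be made non-interactively, which is handy for scripting all three containers (a sketch, using the 123456 password mentioned above):

echo 'root:123456' | chpasswd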

# Add the following entries to /etc/hosts
10.0.1.4 server001
10.0.1.6 server002
10.0.1.8 server003 


# Generate an SSH key in each container and distribute it to all three containers:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa \
&& ssh-copy-id server001 \
&& ssh-copy-id server002 \
&& ssh-copy-id server003

# Test passwordless access to each container
ssh server001
ssh server002
ssh server003
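If any of these steps stall on the first-connection host key prompt, the host keys can be collected up front in each container (a sketch):

ssh-keyscan server001 server002 server003 >> ~/.ssh/known_hosts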

14. Copy the MySQL JDBC driver (into the master node container)

Copy the parcel installation packages and the MySQL driver package to the server001 container, then work inside server001.

The driver is what lets the CDH services connect to the MySQL database where they store their operational data.

First, on the host, check that the MySQL jar and the packages to be installed are laid out as below, then copy them into the container's /root directory:
[root@server001 ~]# tree /root/hadoop_CDH/
hadoop_CDH/
├── flink-csd
│   ├── FLINK-1.10.2.jar
│   └── FLINK_ON_YARN-1.10.2.jar
├── mysql-jdbc
│   └── mysql-connector-java.jar
└── parcel
    ├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
    ├── FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
    └── manifest.json
3 directories, 6 files

# Execute on the host to start copying:
[root@server001 ~]# docker cp /root/hadoop_CDH/ server001:/root

# Re-enter the container
[root@server001 ~]# docker exec -ti --privileged=true server001 /bin/bash

# Back inside the server001 container:
[root@server001 ~]# mkdir -p /usr/share/java/ \
&& cp /root/hadoop_CDH/mysql-jdbc/mysql-connector-java.jar /usr/share/java/ \
&& rm -rf /root/hadoop_CDH/mysql-jdbc/ \
&& ls /usr/share/java/
result:
mysql-connector-java.jar

15. Configure the parcel installation packages (in the master node container)

cd /opt/cloudera/parcel-repo/;mv /root/hadoop_CDH/parcel/* ./ \
&& sha1sum CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel| awk '{ print $1 }' > CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha \
&& sha1sum FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel | awk '{ print $1 }' > FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha \
&& rm -rf /root/hadoop_CDH/parcel/ \
&& chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/* \
&& ll /opt/cloudera/parcel-repo/ 
result:
total 2330372
-rw-r--r-- 1 cloudera-scm cloudera-scm 2082186246 Jun 15 16:15 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm         41 Dec  9 18:11 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm  304055379 Jan 12  2020 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm         41 Dec  9 18:11 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm      34411 Sep  7 09:53 manifest.json
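Since the .sha files written above contain only the bare hash, a quick consistency check can compare them directly (a sketch, run inside /opt/cloudera/parcel-repo):

cd /opt/cloudera/parcel-repo \
&& [ "$(sha1sum CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel | awk '{print $1}')" = "$(cat CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha)" ] \
&& echo "CDH parcel hash OK"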

16. Configure the Flink installation package (in the master node container)

# Copy the Flink CSD jars to /opt/cloudera/csd/:
cp /root/hadoop_CDH/flink-csd/* /opt/cloudera/csd/ \
    && ll /opt/cloudera/csd/ \
    && rm -rf /root/hadoop_CDH/flink-csd/
result:
total 20
-rw-r--r-- 1 root root 7737 Sep  7 10:01 FLINK-1.10.2.jar
-rw-r--r-- 1 root root 8260 Sep  7 10:01 FLINK_ON_YARN-1.10.2.jar

17. Initialize the CDH scm database (in the master node container)

MySQL and CDH live in the same container here to make migration easier. I tried running MySQL as a separate container before, but ran into unexplained problems during migration, so that approach is set aside for now.

CDH's operational data can also be stored in Oracle and other databases; see scm_prepare_database.sh.

The database connection information can be seen in /etc/cloudera-scm-server/db.properties.

# mysql and cdh are in the same container:
    /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm 123456
    
result:
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/java/jdk1.8.0_181-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Tue Jul 06 08:58:16 UTC 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[                          main] DbCommandExecutor              INFO  Successfully connected to database.
All done, your SCM database is configured correctly!
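As mentioned above, the connection settings written by the script can be reviewed afterwards:

# host, database name, user and password used by cloudera-scm-server
cat /etc/cloudera-scm-server/db.properties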

18. Start the agent service on all nodes (in each node's container)

# Restart the agent service in each container to avoid problems when the parcel packages are distributed and unpacked
systemctl enable cloudera-scm-agent \
&& systemctl restart cloudera-scm-agent \
&& systemctl status cloudera-scm-agent
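If an agent fails to show up in CM later, its log is the first place to look (a sketch; this is the standard Cloudera Manager agent log location):

tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log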

19. Start the cloudera-scm-server service (in the master node container)

systemctl enable cloudera-scm-server \
&& systemctl restart cloudera-scm-server \
    && sleep 2 \
    && tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
# result: wait for the startup to finish; the "Started Jetty server" line below marks a successful start
2021-07-06 09:01:33,685 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
2021-07-06 09:02:23,792 INFO avro-servlet-hb-processor-2:com.cloudera.server.common.AgentAvroServlet: (5 skipped) AgentAvroServlet: heartbeat processing stats: average=46ms, min=11ms, max=192ms.

# Run inside the container: check whether the server has started successfully
[root@server001 ~]# curl http://server001:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>

# Run on the host: check whether the port is mapped out of the container
[root@server001 ~]# curl http://server001:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
# This shows scm started successfully and the CM web UI can now be logged into

Once the CM service is up, continue the installation following the steps in the previous articles.
