Flink standalone cluster deployment and HA deployment

Scenario description

172.19.9.202   master node   JobManager (master / slave)
172.19.9.201   slave node    TaskManager (master / slave)
172.19.9.203   slave node    TaskManager (master / slave)

1, Configure SSH the same way on the master node and the slave nodes

ssh-keygen -t rsa -P ""    # -P "" means no passphrase is set
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

ssh localhost

2, Copy the master node's public key to the slave nodes (do the same for the other nodes)

scp /root/.ssh/id_rsa.pub root@172.19.9.203:/root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub root@172.19.9.201:/root/.ssh/authorized_keys
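
Note that the scp commands above overwrite the remote authorized_keys file, replacing any keys already in it. A minimal alternative sketch, assuming ssh-copy-id is available, appends the key instead:

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.19.9.201
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.19.9.203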

3, Verify that the master node can log in to the two slave nodes without entering a password

ssh 172.19.9.201
ssh 172.19.9.203

Steps 2 and 3 above can be repeated on the other machines in the same way, as sketched below.
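
For example, the same setup run from 172.19.9.201 might look like the following sketch (the remaining node follows the same pattern):

# on 172.19.9.201
ssh-keygen -t rsa -P ""
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.19.9.202
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.19.9.203
ssh 172.19.9.202   # verify passwordless login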

4, Enter the Flink configuration directory on 172.19.9.202

cd /usr/local/flink/flink-1.11.2/conf/

Modify flink-conf.yaml as follows. Other settings depend on your environment; the key point is that this address points to the master node:

jobmanager.rpc.address: 172.19.9.202 

1. If every machine sets jobmanager.rpc.address: localhost and the masters file lists 202 / 201 / 203, the result is the HA deployment.
2. If every machine sets jobmanager.rpc.address: 172.19.9.202 and the masters file lists only 202, the result is a standalone cluster with 202 as the single master node; see the sketch below.
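
A sketch of the two variants, using the addresses from the scenario above:

# HA deployment: on every node
jobmanager.rpc.address: localhost
# conf/masters lists all JobManager candidates
172.19.9.202:8081
172.19.9.201:8081
172.19.9.203:8081

# Standalone cluster: on every node
jobmanager.rpc.address: 172.19.9.202
# conf/masters lists only the single master
172.19.9.202:8081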

1. Responsibilities of the Job Manager

The Job Manager, also known as the Master node, is responsible for coordinating distributed execution: it schedules tasks, coordinates checkpoints, handles failure recovery, and so on. The Job Manager splits a job into multiple tasks and communicates with the Task Managers through the Actor system to deploy, stop and cancel tasks. In a high-availability deployment there are multiple Job Managers: one Leader and several Followers. The Leader is always in Active status and serves the cluster; the Followers are in Standby status. If the Leader goes down, one of the Followers is elected as the new Leader and continues to serve the cluster. Job Manager election is implemented through ZooKeeper.
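
As a hedged way to see the metadata this election writes, you can browse ZooKeeper with the zkCli.sh shell that ships with it; /flink below is Flink's default high-availability.zookeeper.path.root and is an assumption if you changed that setting:

# from the ZooKeeper bin directory
./zkCli.sh -server 172.19.9.202:2181
# inside the zkCli shell, list Flink's HA znodes
ls /flink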

2. Responsibilities of the Task Manager
The Task Manager, also known as a Worker node, executes the tasks (more precisely, SubTasks) assigned by the Job Manager. Each Task Manager divides its system resources (CPU, network, memory) into multiple Task Slots (computation slots), and tasks run inside specific Task Slots. The Task Manager communicates with the Job Manager through the Actor system and regularly reports the running status of its tasks and of the Task Manager itself to the Job Manager. Tasks running on different Task Managers exchange data and state with each other via DataStreams.
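
How many Task Slots each Task Manager offers is set in flink-conf.yaml; a minimal sketch (the value 2 is only an illustrative assumption):

taskmanager.numberOfTaskSlots: 2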

This defines the maximum amount of main memory the JVM is allowed to allocate on each node, in MB:

jobmanager.heap.size: 1024m
taskmanager.heap.size: 2048m
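
Note that in Flink 1.11+ (including the 1.14.0 installation used in the scp commands later) these heap keys are deprecated in favor of the unified memory options; a hedged equivalent would be:

# process size covers heap plus off-heap and JVM overhead,
# so these values are not a 1:1 translation of the heap sizes above
jobmanager.memory.process.size: 1024m
taskmanager.memory.process.size: 2048m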

Modify the masters file as follows:

172.19.9.202:8081
#172.19.9.201:8081
#172.19.9.203:8081 

Modify the workers file as follows. If the master node 202 is not also listed as a worker, not all of the TaskManagers will show up in the web UI:

172.19.9.203
172.19.9.202
172.19.9.201

5, Download and install ZooKeeper

Version: apache-zookeeper-3.5.6-bin.tar.gz. Extract the archive and configure the zoo.cfg file.
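
A minimal sketch of downloading and unpacking this version; the mirror URL and the /usr/local install path are assumptions:

wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
tar -zxvf apache-zookeeper-3.5.6-bin.tar.gz -C /usr/local/
cd /usr/local/apache-zookeeper-3.5.6-bin/conf
cp zoo_sample.cfg zoo.cfg

The zoo.cfg used here is: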

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper/data
dataLogDir=/tmp/zookeeper/log
# the port at which the clients will connect
clientPort=2181
#
#If you want to set up a cluster, configure IP here
#server.1=192.168.180.132:2888:3888
#server.2=192.168.180.133:2888:3888
#server.3=192.168.180.134:2888:3888
#
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

After the configuration is done, remember to create the data and log directories that zoo.cfg points to under /tmp.
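
For the dataDir and dataLogDir configured above, that is:

mkdir -p /tmp/zookeeper/data /tmp/zookeeper/log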
Then enter the bin directory and start ZooKeeper:

 ./zkServer.sh start 
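
To confirm that it came up, zkServer.sh can also report its status (it should run in standalone mode here, since the server.N lines in zoo.cfg are commented out):

./zkServer.sh status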

6, Modify Flink's configuration file flink-conf.yaml to enable ZooKeeper-based high availability, pointing at the ZooKeeper instance on 172.19.9.202:

high-availability: zookeeper
# high-availability.storageDir: hdfs:///flink/ha/
# State storage address in high availability mode
high-availability.storageDir: file:///data/flink/checkpoints
high-availability.zookeeper.quorum: 172.19.9.202:2181
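
Two related options you may also want to set explicitly are the ZooKeeper root path and the cluster id; the values below are Flink's defaults and are only a hedged illustration:

high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /default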

7, Copy the modified flink-conf.yaml, masters and workers files to the corresponding Flink conf directory on the other two machines: /usr/local/flink/flink-1.14.0/conf

scp /usr/local/flink/flink-1.14.0/conf/flink-conf.yaml root@172.19.9.201:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/flink-conf.yaml root@172.19.9.203:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/masters root@172.19.9.201:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/masters root@172.19.9.203:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/workers root@172.19.9.201:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/workers root@172.19.9.203:/usr/local/flink/flink-1.14.0/conf/

8, Start the cluster

In theory, since passwordless SSH is configured, running the start script on the master node alone should bring the slaves up with it. In my case, however, the first start on the master node 202 only started 202; 203 only came up after a second start and 201 after a third. I had configured all three nodes as slaves in the workers file, so why didn't they start together? According to the logs, everything was eventually started.

After 203 and 201 are started, three TaskManagers are displayed.

Run ./start-cluster.sh on the 202 server alone, as shown below:

[root@localhost bin]# ./start-cluster.sh 
Starting HA cluster with 3 masters.
root@172.19.9.202's password: 
Starting standalonesession daemon on host localhost.localdomain3.
root@172.19.9.201's password: 
Starting standalonesession daemon on host localhost.localdomain2.
root@172.19.9.203's password: 
Starting standalonesession daemon on host localhost.localdomain4.
root@172.19.9.203's password: 
Starting taskexecutor daemon on host localhost.localdomain4.
root@172.19.9.202's password: 
Starting taskexecutor daemon on host localhost.localdomain3.
root@172.19.9.201's password: 
Starting taskexecutor daemon on host localhost.localdomain2.
[root@localhost bin]# 

The startup log is as follows:

[root@localhost bin]# ./start-cluster.sh 
Starting HA cluster with 3 masters.
[INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain3.
Starting standalonesession daemon on host localhost.localdomain3.
[INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain2.
Starting standalonesession daemon on host localhost.localdomain2.
root@172.19.9.203's password: 
[INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain4.
Starting standalonesession daemon on host localhost.localdomain4.
root@172.19.9.203's password: 
[INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain4.
Starting taskexecutor daemon on host localhost.localdomain4.
[INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain3.
Starting taskexecutor daemon on host localhost.localdomain3.
[INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain2.
Starting taskexecutor daemon on host localhost.localdomain2.

Then I suddenly understood: I had put three server addresses in the masters file, which means all three servers are both master and slave nodes. Therefore starting one master node alone does not also start the processes of the other two master nodes.

Then I set Flink's masters file to the single node 202. After the change I started the cluster with start-cluster.sh on the 202 server alone, and only one TaskManager appeared:

I found that I had forgotten to change the masters file on 201 and 203. Even after changing it, starting only from the master node still did not bring the other nodes up with it, so I had to start them manually.
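
For reference, the missing daemons can be started manually with the scripts in Flink's bin directory, for example:

cd /usr/local/flink/flink-1.14.0/bin
./jobmanager.sh start      # on a master node
./taskmanager.sh start     # on a worker node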

Tags: Big Data Hadoop Spark flink

Posted on Thu, 02 Dec 2021 16:30:34 -0500 by angelssin