172.19.9.202 master node JobManager master / slave
172.19.9.201 slave node TaskManager master / slave
172.19.9.203 slave node TaskManager master / slave
1, SSH settings should be identical on the master node and the slave nodes
ssh-keygen -t rsa -P ""    # do not set a passphrase
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
ssh localhost
2, Add the master node's public key to the slave nodes (do the same for the other nodes)
scp /root/.ssh/id_rsa.pub email@example.com:/root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub firstname.lastname@example.org:/root/.ssh/authorized_keys
3, Verify that the master node can log in to the two slave nodes without entering a password
ssh 172.19.9.201
ssh 172.19.9.203
Repeat steps 2 and 3 on the other machines as well.
4, Enter the Flink configuration directory on 172.19.9.202
Modify the content of flink-conf.yaml as follows. Other settings depend on your situation; the key point is that jobmanager.rpc.address determines which node is the master:
1. If every machine sets jobmanager.rpc.address: localhost and the masters file also lists 201 / 203, the cluster runs in HA mode.
2. If every machine sets jobmanager.rpc.address: 172.19.9.202 and the masters file lists only 202, then 202 is the single master node of a standalone cluster.
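As a sketch, the relevant flink-conf.yaml lines for the two variants described above might look like this (jobmanager.rpc.port shown here is Flink's default, not taken from this guide; adjust to your setup):

```yaml
# HA variant: every node keeps localhost, and the masters file lists all three nodes
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123

# Standalone variant: every node points at 202, and the masters file lists only 202
# jobmanager.rpc.address: 172.19.9.202
```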
1. Responsibilities of the Job Manager
The Job Manager, also known as the Master node, is responsible for coordinating the distributed computing nodes: it schedules tasks, coordinates checkpoints, handles failure recovery, and so on. The Job Manager splits a job into multiple tasks and communicates with the Task Managers through the Actor system to deploy, stop and cancel tasks.
In a high-availability deployment there are multiple Job Managers: one leader and several followers. The leader is always in Active status and serves the cluster; the followers are in Standby status. If the leader goes down, one of the followers is elected as the new leader and continues to serve the cluster. Job Manager election is implemented through ZooKeeper.
2. Responsibilities of the Task Manager
The Task Manager, also known as the Worker node, executes the tasks (more precisely, SubTasks) allocated by the Job Manager. The Task Manager divides its system resources (CPU, network, memory) into multiple Task Slots (computation slots), and each task runs in a specific Task Slot. The Task Manager communicates with the Job Manager through the Actor system and periodically reports the status of its tasks and of the Task Manager itself to the Job Manager. Tasks on different Task Managers exchange data streams with each other for state computation and result interaction.
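The slot division described above is controlled in flink-conf.yaml. A minimal example, assuming a value chosen to match the number of CPU cores (the value 4 is illustrative, not from this guide):

```yaml
# number of Task Slots each Task Manager offers; often set to the number of CPU cores
taskmanager.numberOfTaskSlots: 4
```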
The following settings define the maximum heap memory that the JVM is allowed to allocate on each node, in megabytes:
jobmanager.heap.size: 1024m
taskmanager.heap.size: 2048m
Modify the masters file as follows
172.19.9.202:8081
172.19.9.201:8081
172.19.9.203:8081
Modify the workers file as follows. If master node 202 is not also listed as a worker, some Task Managers will not be displayed in the web UI.
172.19.9.203
172.19.9.202
172.19.9.201
5, Download and install ZooKeeper
Version: apache-zookeeper-3.5.6-bin.tar.gz. Extract it and configure the zoo.cfg file:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper/data
dataLogDir=/tmp/zookeeper/log
# the port at which the clients will connect
clientPort=2181
#
# If you want to set up a cluster, configure IP here
#server.1=192.168.180.132:2888:3888
#server.2=192.168.180.133:2888:3888
#server.3=192.168.180.134:2888:3888
#
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
After configuration, remember to create the corresponding data and log directories under /tmp.
Then enter the bin directory and start ZooKeeper.
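The two steps above can be sketched as follows (the zkServer.sh invocation assumes you are in the bin directory of the extracted apache-zookeeper-3.5.6-bin archive):

```shell
# create the snapshot and transaction-log directories referenced in zoo.cfg
mkdir -p /tmp/zookeeper/data /tmp/zookeeper/log

# then, from the ZooKeeper bin directory, start the server:
# ./zkServer.sh start
# and check that it came up:
# ./zkServer.sh status
```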
6, Modify Flink's configuration file flink-conf.yaml and point the high-availability settings at the ZooKeeper instance on 172.19.9.202:
high-availability: zookeeper
# high-availability.storageDir: hdfs:///flink/ha/
# State storage address in high-availability mode
high-availability.storageDir: file:///data/flink/checkpoints
high-availability.zookeeper.quorum: 172.19.9.202:2181
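Note that a single ZooKeeper instance on 202 is used here. For a fault-tolerant setup, the quorum line would instead list an odd number of ZooKeeper servers; a hypothetical example reusing this cluster's three addresses:

```yaml
high-availability.zookeeper.quorum: 172.19.9.201:2181,172.19.9.202:2181,172.19.9.203:2181
```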
7, Copy the modified flink-conf.yaml, masters and workers files to the corresponding Flink directory on the other two machines: /usr/local/flink/flink-1.14.0/conf
scp /usr/local/flink/flink-1.14.0/conf/flink-conf.yaml email@example.com:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/flink-conf.yaml firstname.lastname@example.org:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/masters email@example.com:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/masters firstname.lastname@example.org:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/workers email@example.com:/usr/local/flink/flink-1.14.0/conf/
scp /usr/local/flink/flink-1.14.0/conf/workers firstname.lastname@example.org:/usr/local/flink/flink-1.14.0/conf/
8, Start the cluster
In theory, since passwordless SSH is configured, starting the cluster on the master node alone should bring up the slave nodes as well. On my setup, however, starting on master node 202 brought up only one Task Manager; after also starting on 203 there were two, and after 201, three. I had configured three slave nodes in the workers file, so why didn't they all start together? The logs all claimed the daemons had started.
After 203 and 201 are started as well, three Task Managers are displayed.
Start ./start-cluster.sh on the 202 server alone, as shown below:
[root@localhost bin]# ./start-cluster.sh
Starting HA cluster with 3 masters.
email@example.com's password: Starting standalonesession daemon on host localhost.localdomain3.
firstname.lastname@example.org's password: Starting standalonesession daemon on host localhost.localdomain2.
email@example.com's password: Starting standalonesession daemon on host localhost.localdomain4.
firstname.lastname@example.org's password: Starting taskexecutor daemon on host localhost.localdomain4.
email@example.com's password: Starting taskexecutor daemon on host localhost.localdomain3.
firstname.lastname@example.org's password: Starting taskexecutor daemon on host localhost.localdomain2.
[root@localhost bin]#
The startup log is as follows:
[root@localhost bin]# ./start-cluster.sh
Starting HA cluster with 3 masters.
[INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain3.
Starting standalonesession daemon on host localhost.localdomain3.
[INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain2.
Starting standalonesession daemon on host localhost.localdomain2.
email@example.com's password: [INFO] 1 instance(s) of standalonesession are already running on localhost.localdomain4.
Starting standalonesession daemon on host localhost.localdomain4.
firstname.lastname@example.org's password: [INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain4.
Starting taskexecutor daemon on host localhost.localdomain4.
[INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain3.
Starting taskexecutor daemon on host localhost.localdomain3.
[INFO] 1 instance(s) of taskexecutor are already running on localhost.localdomain2.
Starting taskexecutor daemon on host localhost.localdomain2.
I suddenly understood: I had set three server addresses in the masters file, which means all three servers act as both master and slave nodes. Starting one master node alone therefore does not start the processes on the other two master nodes at the same time.
Then I set Flink's masters file to the single node 202. After the change, I started the cluster with start-cluster.sh on the 202 server alone, and there was only one TaskManager:
I then found that I had forgotten to change the masters files on 201 and 203. But even after changing them, starting from the master node alone still did not bring up the other nodes, so I had to start them manually.
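For reference, such a manual start can use the per-node scripts that ship in Flink's bin directory; a sketch assuming the same install path as in step 7:

```shell
# on the master node (202), from /usr/local/flink/flink-1.14.0/bin:
# ./jobmanager.sh start
# on each worker node (201 / 203), from the same directory:
# ./taskmanager.sh start
```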