mysql implementation of high availability architecture MHA

mysql implementation of high availability architecture MHA  





Back to the top

1, Introduction

MHA (Master HA) is an open source MySQL high availability program. It provides the automatic master failover function for MySQL master-slave replication architecture. When MHA monitors a master node failure, it will promote the slave node with the latest data to become a new master node. During this period, MHA will avoid consistency problems by obtaining additional information from other slave nodes. MHA also provides the online switching function of the master node, that is, switching the master/slave node on demand.
MHA is a relatively mature MySQL high availability solution developed by Japanese yoshinorim (formerly worked in DeNA and now on FaceBook). MHA can achieve failover within 30 seconds and ensure data consistency as much as possible during failover. At present, Taobao is also developing a similar product TMHA, which supports one master and one slave.

Back to the top

2, MHA services

2.1 service role

The MHA service has two roles, MHA Manager (management node) and MHA node (data node):
MHA Manager:
Generally, multiple master/slave clusters (groups) are deployed separately on an independent machine to manage. Each master/slave cluster is called one   application is used to manage and coordinate the whole cluster.
MHA node:
Running on each MySQL server (master/slave/manager), it speeds up failover by monitoring scripts with the functions of parsing and cleaning logs.
It is mainly the agent that receives the instructions sent by the management node. The agent needs to run on each mysql node. In short, node is used to collect bin logs generated from the node server. Compare whether the slave node intended to be promoted to the new master node owns and completes the operation. If it is not sent to the new master node, it will be promoted to the master node after local application.

As can be seen from the above figure, ssh is required within each replication group and between the Manager to realize password free interconnection. Only in this way can the Manager connect smoothly and realize the Master-slave switching function when the Master fails.

2.2 tools provided

MHA will provide many tools and programs, the common of which are as follows:
Manager node:
  masterha_check_ssh: MHA dependent ssh environment monitoring tool;
  masterha_check_repl: MYSQL replication environment detection tool;
  masterga_manager: MHA service main program;
  masterha_check_status: MHA operation status detection tool;
  masterha_master_monitor: MYSQL master node availability monitoring tool;
  masterha_ master_ Switch: Master: node switching tool;
  masterha_conf_host: add or delete the configured node;
  masterha_stop: tool to close MHA service.
Node: (these tools are usually triggered by the script of MHA Manager without human operation)
  save_binary_logs: save and copy the binary logs of the master;
  apply_diff_relay_logs: identify different relay log events and apply them to other slave;
  purge_relay_logs: clear relay logs (SQL threads will not be blocked);
Custom extension:
  secondary_check_script: check the availability of the master through multiple network routes;
  master_ip_failover_script: update the master IP used by the application;
  report_script: send report;
  init_conf_load_script: load initial configuration parameters;
  master_ip_online_change_script; Update the IP address of the master node.

2.3 working principle

The working principle of MHA is summarized as follows:
(1) Save binary log events from the master of downtime and crash;
(2) Identify the slave with the latest update;
(3) Apply different relay logs to other slave;
(4) Apply binary log events saved from the master;
(5) Upgrade a slave to a new master;
(6) Use another slave to connect to the new master for replication.

Back to the top

3, Implementation process

3.1 prepare Replication environment for Mysql experiment

3.1.1 relevant configuration

MHA has special requirements for MYSQL replication environment. For example, each node must enable binary log and relay log. Each slave node must display and enable its read only attribute and turn off relay_log_purge function, etc. the configuration is described in advance here.
There are four nodes in the experimental environment, and their roles are assigned as follows (the experimental machines are centos 7.3):

In order to facilitate our later operations, we add the following contents to the configuration contents of the / etc/hosts file of each node: node1 node2 node3 node4

In this way, we can get through the private key access through the host parsing node, which will be much more convenient.
This step is complete.

3.1.2 configuration of the initial master node

We need to modify the database configuration file of the master to initialize it:

[root@master ~]# vim /etc/my.cnf
	server-id = 1				//The id of each node in the replication cluster must be unique
	log-bin = master-log		//Open binary log
	relay-log = relay-log		//Enable relay log
	skip_name_resolve			//Turn off name resolution (not required)
[root@master ~]# systemctl restart mariadb

This step is complete.

3.1.3 configuration of all slave nodes

We modify the database configuration files of the two slave machines. Both machines do the following:

[root@slave1 ~]# vim /etc/my.cnf
	server-id = 2 				//The id of each node in the replication cluster must be unique;
	relay-log = relay-log		//Enable relay log
	log-bin = master-log		//Open binary log
	read_only = ON				//Enable read-only properties
	relay_log_purge = 0			//Do you want to automatically clear the relay logs that are no longer needed
	skip_name_resolve			//Turn off name resolution (not required)
	log_slave_updates = 1       //Causes the updated data to be written to the binary log
[root@slave1 ~]# systemctl restart mariadb
[root@slave2 ~]# vim /etc/my.cnf
	server-id = 3 				//The id of each node in the replication cluster must be unique;
	relay-log = relay-log		//Enable relay log
	log-bin = master-log		//Open binary log
	read_only = ON				//Enable read-only properties
	relay_log_purge = 0			//Do you want to automatically clear the relay logs that are no longer needed
	skip_name_resolve			//Turn off name resolution (not required)
	log_slave_updates = 1       //Causes the updated data to be written to the binary log
[root@slave2 ~]# systemctl restart mariadb

This step is complete.

3.1.4 configure a master-slave replication architecture

The following will only give commands. See my last blog for detailed knowledge and process—— Practical project -- implementation of mysql master-slave architecture.
On the master node:

MariaDB [(none)]>grant replication slave,replication client on *.* to 'slave'@'192.168.%.%' identified by 'keer';
MariaDB [(none)]> show master status;

On the slave node:

MariaDB [(none)]> change master to master_host='', 
    -> master_user='slave', 
    -> master_password='keer',
    -> master_log_file='mysql-bin.000001',
    -> master_log_pos=415;
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G;

This step is complete.

3.2 installation and configuration of MHA

3.2.1 authorization on master

On all Mysql nodes, users with administrative permissions are authorized to remotely access on other nodes in the local network. Of course, at this time, you only need to run SQL statements like the following on the master node.

MariaDB [(none)]> grant all on *.* to 'mhaadmin'@'192.168.%.%' identified by 'mhapass';

This step is complete.

3.2.2 prepare ssh interworking environment

All nodes in MHA cluster need to communicate with each other based on ssh mutual trust to realize remote control and data management functions. For simplicity, after the Manager node generates a key pair and sets it to remotely connect to the local host, the private key file and authorized_ The keys file can be copied to all the remaining nodes.
The following operations are performed on all nodes:

[root@manager ~]# ssh-keygen -t rsa
[root@manager ~]# ssh-copy-id -i .ssh/ root@node1

After the above operations are performed on all four machines, we can see the following files on the manager machine:

[root@manager ~]# cd .ssh/
[root@manager .ssh]# ls
authorized_keys  id_rsa  known_hosts
[root@manager .ssh]# cat authorized_keys 

The public keys of the four machines are already authorized_keys. Then, we only need to send this file to the other three machines, which can realize ssh password free interworking:

[root@manager .ssh]# scp authorized_keys root@node2:~/.ssh/
[root@manager .ssh]# scp authorized_keys root@node3:~/.ssh/
[root@manager .ssh]# scp authorized_keys root@node4:~/.ssh/

Of course, we can also experiment on the machine to see if ssh still needs to enter a password.
This step is complete.

3.2.3 installation of MHA package

In this step, the Manager node needs to install an additional package. The specific contents to be installed are as follows:

All four nodes need to be installed: mha4mysql-node-0.56-0.el6.norch.rpm
The Manager node needs to be installed separately: mha4mysql-manager-0.56-0.el6.noarch.rpm

The package I need to install has been uploaded to Baidu cloud disk , password: mkcr, you need to download and use by yourself~
We use the rz command to upload them respectively, and then use yum to install them.

[root@manager ~]# rz
[root@manager ~]# ls
anaconda-ks.cfg  initial-setup-ks.cfg                     Pictures
Desktop          mha4mysql-manager-0.56-0.el6.noarch.rpm  Public
Documents        mha4mysql-node-0.56-0.el6.noarch.rpm     Templates
Downloads        Music                                    Videos
[root@manager ~]# yum install -y mha4mysql-node-0.56-0.el6.noarch.rpm 
[root@manager ~]# yum install -y mha4mysql-manager-0.56-0.el6.noarch.rpm 

Other machines are also installed separately, so there are no examples one by one.
This step is complete.

3.2.4 initialize MHA and configure it

The Manager node needs to provide a special configuration file for each monitored master/slave cluster, and all master/slave clusters can also share the global configuration. The global configuration file defaults to / etc/masterha_default.cnf, which is an optional configuration. If you only monitor a group of master/slave clusters, you can also directly provide the default configuration information of each server through the configuration of application. The configuration file path of each application is custom. See the next step for details.

3.2.5 define MHA Management Profile

Create a special management user for MHA for future use. On the master node of mysql, the three nodes are automatically synchronized:

	mkdir /etc/mha_master
	vim /etc/mha_master/mha.cnf

The contents of the configuration file are as follows:;

[server default] 			//It is applicable to the configuration of server1, 2 and 3 servers
user=mhaadmin 				//mha management user
password=mhapass 			//mha management password
manager_workdir=/etc/mha_master/app1 		//mha_master's own work path
manager_log=/etc/mha_master/manager.log 	// mha_master's own log file
remote_workdir=/mydata/mha_master/app1		//Where is the working directory of each remote host
ssh_user=root 				// Key authentication based on ssh
repl_user=slave				//Database user name
repl_password=magedu		//Database password
ping_interval=1 			//ping interval duration
[server1] 					//Node 2
hostname= 	//Node 2 host address
ssh_port=22 				//ssh port of node 2
candidate_master=1 			//Can you become a master candidate node / master node in the future

This step is complete.

3.2.6 detect four nodes

1) Check whether the ssh mutual trust communication configuration between nodes is ok
We enter the following command on the Manager machine to detect:

[root@manager ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf

If the last line shows [info]All SSH connection tests passed successfully.

2) Check whether the connection configuration parameters of the managed MySQL replication cluster are OK

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf

We found that the detection failed. This may be because there is no account on the slave node. Because of this architecture, any slave node may become the master node, so you also need to create an account.
Therefore, we need to perform the following operations on the master node again:

MariaDB [(none)]> grant replication slave,replication client on *.* to 'slave'@'192.168.%.%' identified by 'keer';
MariaDB [(none)]> flush privileges;

After this operation, we run the detection command again:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Thu Nov 23 09:07:08 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 23 09:07:08 2017 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Thu Nov 23 09:07:08 2017 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
MySQL Replication Health is OK.

It can be seen that our test has been ok.
This step is complete.

3.3 start MHA

We execute the following command on the manager node to start MHA:

[root@manager ~]# nohup masterha_manager -conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[1] 7598

After successful startup, let's check the status of the master node:

[root@manager ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:7598) is running(0:PING_OK), master:

In the above information, "mha (pid:7598) is running(0:PING_OK)" means that the MHA service is running OK. Otherwise, it will be displayed as "mha is stopped(1:NOT_RUNNING)."
If we want to stop MHA, we need to use the stop command:

[root@manager ~]# masterha_stop -conf=/etc/mha_master/mha.cnf

3.4 testing MHA failover

3.4.1 close the mariadb service at the master node to simulate the data crash of the master node

[root@master ~]# killall -9 mysqld mysqld_safe
[root@master ~]# rm -rf /var/lib/mysql/*

3.4.2 viewing logs on the manger node

Let's check the log:

[root@manager ~]# tail -200 /etc/mha_master/manager.log 
Thu Nov 23 09:17:19 2017 - [info] Master failover to completed successfully.

Indicates that the manager detects a node failure, and then automatically performs failover to promote to the primary node.
Note that after the failover is completed, the manager will stop automatically, and masterha is used at this time_ check_ The status command will encounter an error prompt, as shown below:

[root@manager ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha is stopped(2:NOT_RUNNING).

3.5 provide a new slave node to repair the replication cluster

After the original master node fails, you need to prepare a new MySQL node. After restoring the data based on the backup from the master node, configure it as the slave node of the new master. Note that if the newly added node is a new node, its IP address should be configured as the IP of the original master node. Otherwise, you need to modify the corresponding IP address in mha.cnf. Then start the manager again and test its status again.
We will use the newly closed master as the newly added machine to restore the database:
The original slave1 has become the new main machine, so we make a full backup of it, and then send the backup data to the newly added machine:

[root@slave1 ~]# mkdir /backup
[root@slave1 ~]# mysqldump --all-database > /backup/mysql-backup-`date +%F-%T`-all.sql
[root@slave1 ~]# scp /backup/mysql-backup-2017-11-23-09\:57\:09-all.sql root@node2:~

Then perform data recovery on node2 node:

[root@master ~]# mysql < mysql-backup-2017-11-23-09\:57\:09-all.sql

The next step is to configure master-slave. Check the binary log and location of the current master as usual, and then make the following settings:

MariaDB [(none)]> change master to master_host='',  master_user='slave',  master_password='keer', master_log_file='mysql-bin.000006', master_log_pos=925;
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G;
               Slave_IO_State: Waiting for master to send event
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 925
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 529
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

It can be seen that our master-slave has been configured.
This step is complete.

3.6 perform the check operation again after the new node is provided

Let's check the status again:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf

If an error is reported, authorize again (see details) See above ). If there is no problem, start the manager. Note that the log should be recorded this time:

[root@manager ~]# masterha_manager -conf=/etc/mha_master/mha.cnf > /etc/mha_master/manager.log 2>&1 &
[1] 10012

After successful startup, let's check the status of the master node:

[root@manager ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:9561) is running(0:PING_OK), master:

Our service has been successfully continued.
This step ends.

3.7 precautions for new node online, failover and recovery

1) in the production environment, when your master node hangs up, you must make a backup on the slave node, manually promote the master node to the slave node with the backup file, and indicate the location of the log file from which to copy
2) after each automatic conversion, if the replication health test is not ok, it cannot be started all the time. You must manually repair the master node unless you change the configuration file
3) after the master node is promoted to the slave node manually, run the detection command again

[root@manager ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:9561) is running(0:PING_OK), master:

4) the recovery is successful after running again

[root@manager ~]# masterha_manager --conf=/etc/mha_master/mha.cnf

Above. Our experiment has been successfully completed.
If you have any questions or suggestions, you can discuss more ~ (~  ̄▽  ̄) ~

Author: Ke Er acridine


The copyright of this article belongs to the author and the blog park. Reprint is welcome, but this statement must be retained without the consent of the author, and the original connection must be given in an obvious position on the article page, otherwise the right to investigate legal responsibility is reserved.
It's not easy for everyone to write. I hope to respect the fruits of labor~

Tags: MySQL

Posted on Wed, 01 Dec 2021 21:30:52 -0500 by We Must Design