MHA cluster of MySQL highly available articles

1 preparations

1.1 modify host name

vim /etc/hosts # Add corresponding host 192.168.28.128 mha1 192.168.28.131 mha2 192.168.28.132 mha3

1.2 turn off the firewall and modify selinux

# Turn off firewall systemctl stop firewalld
systemctl disable firewalld # Turn off self start # Modify selinux vim /etc/sysconfig/selinux SELINUX=disabled # Set to disabled

1.3 deploy a MySQL Cluster of 1 master and 2 slave

Create master slave for reference MySQL master-slave building

Note that the following parameters are required

server-id=1 # Each node cannot be the same log-bin=/data/mysql3306/logs/mysql-bin relay-log=/data/mysql3306/logs/relay-log skip-name-resolve # Proposal plus non mandatory #read_only = ON # Open from library, close from main library read only relay_log_purge = 0 # Turn off automatic cleanup of relay logs log_slave_updates = 1 # The data updated by the slave database through binlog is written into the slave database binary log, which must be added. Otherwise, the data may be lost after switching

Create an mha management account

# Special attention: mha Do not use special characters in the password of, otherwise the main database cannot be switched
create user mha@'192.168.28.%' identified by 'MHAadmin123'; create user mha@'localhost' identified by 'MHAadmin123'; grant all on *.* to mha@'192.168.28.%'; grant all on *.* to mha@'localhost';

1.4 configuration mutual trust

Execute on the MHA management node (but it is recommended to execute on each host for the convenience of switching management nodes and maintenance between clusters, but pay attention to host security), including the mutual trust from the local machine to the local machine

ssh-keygen ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

Remember to test whether the configuration is successful after the configuration (it must be tested)

ssh [email protected] ssh [email protected] ssh [email protected] ssh root@mha1 ssh root@mha2 ssh root@mha3

2. MHA tool deployment

2.1 installation of MHA related dependency package

yum install perl-DBI perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-Params-Validate perl-DateTime -y yum install perl-ExtUtils-Embed -y yum install cpan -y yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y

Note: it is not recommended to install MySQL database in rpm package mode, otherwise some packages may conflict here

2.2 install MHA management and node node

# All nodes need to be installed rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm #Management nodes need to be installed (other nodes can also be installed) mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

If the above installation package is not installed completely, an error similar to the following will appear. If there is an error, you can adjust the yum source or find a downloaded student to obtain it

[root@mha3 local]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm error: Failed dependencies: perl(Log::Dispatch) is needed by mha4mysql-manager-0.58-0.el7.centos.noarch perl(Log::Dispatch::File) is needed by mha4mysql-manager-0.58-0.el7.centos.noarch perl(Log::Dispatch::Screen) is needed by mha4mysql-manager-0.58-0.el7.centos.noarch perl(Parallel::ForkManager) is needed by mha4mysql-manager-0.58-0.el7.centos.noarch

2.3 configure mha

Create profile path, log file path

mkdir -p /etc/masterha mkdir -p /var/log/masterha/app1

Create mha profile

vim /etc/masterha/app1.conf user=mha [server default] manager_workdir=/var/log/masterha/app1 manager_log=/var/log/masterha/app1/app1.log master_ip_failover_script=/usr/bin/master_ip_failover master_ip_online_change_script=/usr/bin/master_ip_online_change ##mysql user name and password user=mha password=MHAadmin123 ssh_user=root repl_user=repl repl_password=repl ping_interval=3 remote_workdir=/tmp report_script=/usr/bin/send_report
# secondary_check_script can be omitted
# secondary_check_script=/usr/bin/masterha_secondary_check -s mha2 -s mha3 --user=mha --master_host=mha1 --master_ip=192.168.28.128 --master_port=3306 --password=MHAadmin123 shutdown_script="" report_script="" [server1] hostname=192.168.28.128 master_binlog_dir=/data/mysql3306/logs candidate_master=1 [server2] hostname=192.168.28.131 master_binlog_dir=/data/mysql3306/logs candidate_master=1 check_repl_delay=0 [server3] hostname=192.168.28.132 master_binlog_dir=/data/mysql3306/logs no_master=1

Configure two important scripts master_ip_failover , master_ip_online_change

/usr/bin/master_ip_failover

vim /usr/bin/master_ip_failover #!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port ); my $vip = '192.168.28.199/24'; my $if = 'ens33'; my $ssh_start_vip = "/sbin/ip addr add $vip dev $if"; my $ssh_stop_vip = "/sbin/ip addr del $vip dev $if"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, ); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; exit 0; } else { &usage(); exit 1; } } sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } sub stop_vip() { return 0 unless ($ssh_user); `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }

/usr/bin/master_ip_online_change

vim /usr/bin/master_ip_online_change #!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; #my ( # $command, $ssh_user, $orig_master_host, $orig_master_ip, # $orig_master_port, $new_master_host, $new_master_ip, $new_master_port #); my ( $command, $orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user, ); my $vip = '192.168.28.199/24'; my $if = 'ens33'; my $ssh_start_vip = "/sbin/ip addr add $vip dev $if"; my $ssh_stop_vip = "/sbin/ip addr del $vip dev $if"; my $ssh_user = "root"; GetOptions( 'command=s' => \$command, #'ssh_user=s' => \$ssh_user, #'orig_master_host=s' => \$orig_master_host, #'orig_master_ip=s' => \$orig_master_ip, #'orig_master_port=i' => \$orig_master_port, #'new_master_host=s' => \$new_master_host, #'new_master_ip=s' => \$new_master_ip, #'new_master_port=i' => \$new_master_port, 'orig_master_is_new_slave' => \$orig_master_is_new_slave, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, 'orig_master_password=s' => \$orig_master_password, 'orig_master_ssh_user=s' => \$orig_master_ssh_user, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, 'new_master_ssh_user=s' => \$new_master_ssh_user, ); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; exit 0; } else { &usage(); exit 1; } } sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } sub stop_vip() { return 0 unless ($ssh_user); `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --ssh-user=user --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }

2.4 relevant tests

Detect mutual trust

Check whether mutual trust of each node is normal, similar to the previous check. Here is a script to check

[root@mha3 app1]# masterha_check_ssh --conf=/etc/masterha/app1.conf Sun May 24 17:33:08 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 24 17:33:08 2020 - [info] Reading application default configuration from /etc/masterha/app1.conf.. Sun May 24 17:33:08 2020 - [info] Reading server configuration from /etc/masterha/app1.conf.. Sun May 24 17:33:08 2020 - [info] Starting SSH connection tests.. Sun May 24 17:33:12 2020 - [debug] Sun May 24 17:33:08 2020 - [debug] Connecting via SSH from [email protected](192.168.28.131:22) to [email protected](192.168.28.128:22).. Sun May 24 17:33:10 2020 - [debug] ok. Sun May 24 17:33:10 2020 - [debug] Connecting via SSH from [email protected](192.168.28.131:22) to [email protected](192.168.28.132:22).. Sun May 24 17:33:12 2020 - [debug] ok. Sun May 24 17:33:12 2020 - [debug] Sun May 24 17:33:08 2020 - [debug] Connecting via SSH from [email protected](192.168.28.128:22) to [email protected](192.168.28.131:22).. Sun May 24 17:33:09 2020 - [debug] ok. Sun May 24 17:33:09 2020 - [debug] Connecting via SSH from [email protected](192.168.28.128:22) to [email protected](192.168.28.132:22).. Sun May 24 17:33:12 2020 - [debug] ok. Sun May 24 17:33:13 2020 - [debug] Sun May 24 17:33:09 2020 - [debug] Connecting via SSH from [email protected](192.168.28.132:22) to [email protected](192.168.28.128:22).. Sun May 24 17:33:11 2020 - [debug] ok. Sun May 24 17:33:11 2020 - [debug] Connecting via SSH from [email protected](192.168.28.132:22) to [email protected](192.168.28.131:22).. Sun May 24 17:33:13 2020 - [debug] ok. Sun May 24 17:33:13 2020 - [info] All SSH connection tests passed successfully.

Check whether the replication cluster is normal

If you configure according to my previous steps, there will be the following exceptions

masterha_check_repl --conf=/etc/masterha/app1.conf Sun May 24 17:34:02 2020 - [info] Connecting to [email protected](192.168.28.131:22).. Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106. mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options at /usr/bin/apply_diff_relay_logs line 532.

The error information is very clear. The mysqlbinlog command cannot be found. The processing method is simple. Just make a soft connection

ln -s /usr/local/mysql5.7/bin/mysql /usr/bin/ ln -s /usr/local/mysql5.7/bin/mysqlbinlog /usr/bin/

Retest

[root@mha3 app1]# masterha_check_repl --conf=/etc/masterha/app1.conf Sun May 24 17:34:41 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 24 17:34:41 2020 - [info] Reading application default configuration from /etc/masterha/app1.conf.. Sun May 24 17:34:41 2020 - [info] Reading server configuration from /etc/masterha/app1.conf.. Sun May 24 17:34:41 2020 - [info] MHA::MasterMonitor version 0.58. Sun May 24 17:34:42 2020 - [info] GTID failover mode = 0 Sun May 24 17:34:42 2020 - [info] Dead Servers: Sun May 24 17:34:42 2020 - [info] Alive Servers: Sun May 24 17:34:42 2020 - [info] 192.168.28.128(192.168.28.128:3306) Sun May 24 17:34:42 2020 - [info] 192.168.28.131(192.168.28.131:3306) Sun May 24 17:34:42 2020 - [info] 192.168.28.132(192.168.28.132:3306) Sun May 24 17:34:42 2020 - [info] Alive Slaves: Sun May 24 17:34:42 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 17:34:42 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 17:34:42 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 17:34:42 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 17:34:42 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 17:34:42 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 17:34:42 2020 - [info] Current Alive Master: 192.168.28.128(192.168.28.128:3306) Sun May 24 17:34:42 2020 - [info] Checking slave configurations.. Sun May 24 17:34:42 2020 - [info] Checking replication filtering settings.. Sun May 24 17:34:42 2020 - [info] binlog_do_db= , binlog_ignore_db= Sun May 24 17:34:42 2020 - [info] Replication filtering check ok. Sun May 24 17:34:42 2020 - [info] GTID (with auto-pos) is not supported Sun May 24 17:34:42 2020 - [info] Starting SSH connection tests.. Sun May 24 17:34:48 2020 - [info] All SSH connection tests passed successfully. Sun May 24 17:34:48 2020 - [info] Checking MHA Node version.. Sun May 24 17:34:49 2020 - [info] Version check ok. Sun May 24 17:34:49 2020 - [info] Checking SSH publickey authentication settings on the current master.. Sun May 24 17:34:50 2020 - [info] HealthCheck: SSH to 192.168.28.128 is reachable. Sun May 24 17:34:51 2020 - [info] Master MHA Node version is 0.58. Sun May 24 17:34:51 2020 - [info] Checking recovery script configurations on 192.168.28.128(192.168.28.128:3306).. Sun May 24 17:34:51 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql3306/data --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000012 Sun May 24 17:34:51 2020 - [info] Connecting to [email protected](192.168.28.128:22).. Creating /tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mysql3306/data, up to mysql-bin.000012 Sun May 24 17:34:52 2020 - [info] Binlog setting check done. Sun May 24 17:34:52 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sun May 24 17:34:52 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.28.131 --slave_ip=192.168.28.131 --slave_port=3306 --workdir=/tmp --target_version=5.7.25-28-log --manager_version=0.58 --relay_log_info=/data/mysql3306/data/relay-log.info --relay_dir=/data/mysql3306/data/ --slave_pass=xxx Sun May 24 17:34:52 2020 - [info] Connecting to [email protected](192.168.28.131:22).. Checking slave recovery environment settings.. Opening /data/mysql3306/data/relay-log.info ... ok. Relay log found at /data/mysql3306/data, up to relay-log.000003 Temporary relay log file is /data/mysql3306/data/relay-log.000003 Checking if super_read_only is defined and turned on.. not present or turned off, ignoring. Testing mysql connection and privileges.. mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun May 24 17:34:53 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.28.132 --slave_ip=192.168.28.132 --slave_port=3306 --workdir=/tmp --target_version=5.7.25-28-log --manager_version=0.58 --relay_log_info=/data/mysql3306/data/relay-log.info --relay_dir=/data/mysql3306/data/ --slave_pass=xxx Sun May 24 17:34:53 2020 - [info] Connecting to [email protected](192.168.28.132:22).. Checking slave recovery environment settings.. Opening /data/mysql3306/data/relay-log.info ... ok. Relay log found at /data/mysql3306/data, up to relay-log.000003 Temporary relay log file is /data/mysql3306/data/relay-log.000003 Checking if super_read_only is defined and turned on.. not present or turned off, ignoring. Testing mysql connection and privileges.. mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun May 24 17:34:54 2020 - [info] Slaves settings check done. Sun May 24 17:34:54 2020 - [info] 192.168.28.128(192.168.28.128:3306) (current master) +--192.168.28.131(192.168.28.131:3306) +--192.168.28.132(192.168.28.132:3306) Sun May 24 17:34:54 2020 - [info] Checking replication health on 192.168.28.131.. Sun May 24 17:34:54 2020 - [info] ok. Sun May 24 17:34:54 2020 - [info] Checking replication health on 192.168.28.132.. Sun May 24 17:34:54 2020 - [info] ok. Sun May 24 17:34:54 2020 - [info] Checking master_ip_failover_script status: Sun May 24 17:34:54 2020 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.28.128 --orig_master_ip=192.168.28.128 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Checking the Status of the script.. OK Sun May 24 17:34:54 2020 - [info] OK. Sun May 24 17:34:54 2020 - [warning] shutdown_script is not defined. Sun May 24 17:34:54 2020 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.

See that "MySQL Replication Health is OK." means the test passed.

3 MHA test

3.1 start MHA service

The script to start the MHA service is as follows, which can also be written as a script or service

nohup masterha_manager --conf=/etc/masterha/app1.conf < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

After the service is started, the logs are as follows, similar to the cluster detection

Sun May 24 18:31:54 2020 - [info] MHA::MasterMonitor version 0.58. Sun May 24 18:31:55 2020 - [info] GTID failover mode = 0 Sun May 24 18:31:55 2020 - [info] Dead Servers: Sun May 24 18:31:55 2020 - [info] Alive Servers: Sun May 24 18:31:55 2020 - [info] 192.168.28.128(192.168.28.128:3306) Sun May 24 18:31:55 2020 - [info] 192.168.28.131(192.168.28.131:3306) Sun May 24 18:31:55 2020 - [info] 192.168.28.132(192.168.28.132:3306) Sun May 24 18:31:55 2020 - [info] Alive Slaves: Sun May 24 18:31:55 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:31:55 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:31:55 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:31:55 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:31:55 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:31:55 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:31:55 2020 - [info] Current Alive Master: 192.168.28.128(192.168.28.128:3306) Sun May 24 18:31:55 2020 - [info] Checking slave configurations.. Sun May 24 18:31:55 2020 - [info] Checking replication filtering settings.. Sun May 24 18:31:55 2020 - [info] binlog_do_db= , binlog_ignore_db= Sun May 24 18:31:55 2020 - [info] Replication filtering check ok. Sun May 24 18:31:55 2020 - [info] GTID (with auto-pos) is not supported Sun May 24 18:31:55 2020 - [info] Starting SSH connection tests.. Sun May 24 18:32:01 2020 - [info] All SSH connection tests passed successfully. Sun May 24 18:32:01 2020 - [info] Checking MHA Node version.. Sun May 24 18:32:03 2020 - [info] Version check ok. Sun May 24 18:32:03 2020 - [info] Checking SSH publickey authentication settings on the current master.. Sun May 24 18:32:03 2020 - [info] HealthCheck: SSH to 192.168.28.128 is reachable. Sun May 24 18:32:04 2020 - [info] Master MHA Node version is 0.58. Sun May 24 18:32:04 2020 - [info] Checking recovery script configurations on 192.168.28.128(192.168.28.128:3306).. Sun May 24 18:32:04 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql3306/data --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000013 Sun May 24 18:32:04 2020 - [info] Connecting to [email protected](192.168.28.128:22).. Creating /tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mysql3306/data, up to mysql-bin.000013 Sun May 24 18:32:05 2020 - [info] Binlog setting check done. Sun May 24 18:32:05 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sun May 24 18:32:05 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.28.131 --slave_ip=192.168.28.131 --slave_port=3306 --workdir=/tmp --target_version=5.7.25-28-log --manager_version=0.58 --relay_log_info=/data/mysql3306/data/relay-log.info --relay_dir=/data/mysql3306/data/ --slave_pass=xxx Sun May 24 18:32:05 2020 - [info] Connecting to [email protected](192.168.28.131:22).. Checking slave recovery environment settings.. Opening /data/mysql3306/data/relay-log.info ... ok. Relay log found at /data/mysql3306/data, up to relay-log.000005 Temporary relay log file is /data/mysql3306/data/relay-log.000005 Checking if super_read_only is defined and turned on.. not present or turned off, ignoring. Testing mysql connection and privileges.. mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun May 24 18:32:06 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.28.132 --slave_ip=192.168.28.132 --slave_port=3306 --workdir=/tmp --target_version=5.7.25-28-log --manager_version=0.58 --relay_log_info=/data/mysql3306/data/relay-log.info --relay_dir=/data/mysql3306/data/ --slave_pass=xxx Sun May 24 18:32:06 2020 - [info] Connecting to [email protected](192.168.28.132:22).. Checking slave recovery environment settings.. Opening /data/mysql3306/data/relay-log.info ... ok. Relay log found at /data/mysql3306/data, up to relay-log.000005 Temporary relay log file is /data/mysql3306/data/relay-log.000005 Checking if super_read_only is defined and turned on.. not present or turned off, ignoring. Testing mysql connection and privileges. mysql: [Warning] Using a password on the command line interface can be insecure. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sun May 24 18:32:07 2020 - [info] Slaves settings check done. Sun May 24 18:32:07 2020 - [info] 192.168.28.128(192.168.28.128:3306) (current master) +--192.168.28.131(192.168.28.131:3306) +--192.168.28.132(192.168.28.132:3306) Sun May 24 18:32:07 2020 - [info] Checking master_ip_failover_script status: Sun May 24 18:32:07 2020 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.28.128 --orig_master_ip=192.168.28.128 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Checking the Status of the script.. OK Sun May 24 18:32:08 2020 - [info] OK. Sun May 24 18:32:08 2020 - [warning] shutdown_script is not defined. Sun May 24 18:32:08 2020 - [info] Set master ping interval 3 seconds. Sun May 24 18:32:08 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Sun May 24 18:32:08 2020 - [info] Starting ping health check on 192.168.28.128(192.168.28.128:3306).. Sun May 24 18:32:08 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

3.2 test automatic switching

Simulation main database down

Execute shutdown in the main database

mysql> shutdown;

Observation log:

The general process in the log is to detect the unavailability of the primary database (192.168.28.128:3306) - > probe three times in a row (the times can be customized) - > detect the remaining surviving nodes in the cluster -- > select a node as the primary node from the alternative primary nodes -- > drift the VIP to the new primary node (delete the VIP on the original host if the original primary node system is normal) - > copy the binlog of the original primary node -- > New master node determines whether to supplement logs -- > all other nodes copy data from the new master node (forming a new cluster)

Sun May 24 18:35:56 2020 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Sun May 24 18:35:56 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql3306/data --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin Sun May 24 18:35:56 2020 - [info] HealthCheck: SSH to 192.168.28.128 is reachable. Sun May 24 18:35:59 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.28.128' (111)) Sun May 24 18:35:59 2020 - [warning] Connection failed 2 time(s).. Sun May 24 18:36:02 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.28.128' (111)) Sun May 24 18:36:02 2020 - [warning] Connection failed 3 time(s).. Sun May 24 18:36:05 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.28.128' (111)) Sun May 24 18:36:05 2020 - [warning] Connection failed 4 time(s).. Sun May 24 18:36:05 2020 - [warning] Master is not reachable from health checker! Sun May 24 18:36:05 2020 - [warning] Master 192.168.28.128(192.168.28.128:3306) is not reachable! Sun May 24 18:36:05 2020 - [warning] SSH is reachable. Sun May 24 18:36:05 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.conf again, and trying to connect to all servers to check server status.. Sun May 24 18:36:05 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 24 18:36:05 2020 - [info] Reading application default configuration from /etc/masterha/app1.conf.. Sun May 24 18:36:05 2020 - [info] Reading server configuration from /etc/masterha/app1.conf.. Sun May 24 18:36:06 2020 - [info] GTID failover mode = 0 Sun May 24 18:36:06 2020 - [info] Dead Servers: Sun May 24 18:36:06 2020 - [info] 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:06 2020 - [info] Alive Servers: Sun May 24 18:36:06 2020 - [info] 192.168.28.131(192.168.28.131:3306) Sun May 24 18:36:06 2020 - [info] 192.168.28.132(192.168.28.132:3306) Sun May 24 18:36:06 2020 - [info] Alive Slaves: Sun May 24 18:36:06 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:06 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:06 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:36:06 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:06 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:06 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:36:06 2020 - [info] Checking slave configurations.. Sun May 24 18:36:06 2020 - [info] Checking replication filtering settings.. Sun May 24 18:36:06 2020 - [info] Replication filtering check ok. Sun May 24 18:36:06 2020 - [info] Master is down! Sun May 24 18:36:06 2020 - [info] Terminating monitoring script. Sun May 24 18:36:06 2020 - [info] Got exit code 20 (Master dead). Sun May 24 18:36:06 2020 - [info] MHA::MasterFailover version 0.58. Sun May 24 18:36:06 2020 - [info] Starting master failover. Sun May 24 18:36:06 2020 - [info] Sun May 24 18:36:06 2020 - [info] * Phase 1: Configuration Check Phase.. Sun May 24 18:36:06 2020 - [info] Sun May 24 18:36:07 2020 - [info] GTID failover mode = 0 Sun May 24 18:36:07 2020 - [info] Dead Servers: Sun May 24 18:36:07 2020 - [info] 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:07 2020 - [info] Checking master reachability via MySQL(double check)... Sun May 24 18:36:07 2020 - [info] ok. Sun May 24 18:36:07 2020 - [info] Alive Servers: Sun May 24 18:36:07 2020 - [info] 192.168.28.131(192.168.28.131:3306) Sun May 24 18:36:07 2020 - [info] 192.168.28.132(192.168.28.132:3306) Sun May 24 18:36:07 2020 - [info] Alive Slaves: Sun May 24 18:36:07 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:07 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:07 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:36:07 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:07 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:07 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:36:07 2020 - [info] Starting Non-GTID based failover. Sun May 24 18:36:07 2020 - [info] Sun May 24 18:36:07 2020 - [info] ** Phase 1: Configuration Check Phase completed. Sun May 24 18:36:07 2020 - [info] Sun May 24 18:36:07 2020 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun May 24 18:36:07 2020 - [info] Sun May 24 18:36:07 2020 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun May 24 18:36:07 2020 - [info] Sun May 24 18:36:07 2020 - [info] Forcing shutdown so that applications never connect to the current master.. Sun May 24 18:36:07 2020 - [info] Executing master IP deactivation script: Sun May 24 18:36:07 2020 - [info] /usr/bin/master_ip_failover --orig_master_host=192.168.28.128 --orig_master_ip=192.168.28.128 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Disabling the VIP on old master: 192.168.28.128 Sun May 24 18:36:08 2020 - [info] done. Sun May 24 18:36:08 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Sun May 24 18:36:08 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sun May 24 18:36:08 2020 - [info] Sun May 24 18:36:08 2020 - [info] * Phase 3: Master Recovery Phase.. Sun May 24 18:36:08 2020 - [info] Sun May 24 18:36:08 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sun May 24 18:36:08 2020 - [info] Sun May 24 18:36:08 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:154 Sun May 24 18:36:08 2020 - [info] Latest slaves (Slaves that received relay log files to the latest): Sun May 24 18:36:08 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:08 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:08 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:36:08 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:08 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:08 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:36:08 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:154 Sun May 24 18:36:08 2020 - [info] Oldest slaves: Sun May 24 18:36:08 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:08 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:08 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:36:08 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:08 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:08 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:36:08 2020 - [info] Sun May 24 18:36:08 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Sun May 24 18:36:08 2020 - [info] Sun May 24 18:36:09 2020 - [info] Fetching dead master's binary logs.. Sun May 24 18:36:09 2020 - [info] Executing command on the dead master 192.168.28.128(192.168.28.128:3306): save_binary_logs --command=save --start_file=mysql-bin.000013 --start_pos=154 --binlog_dir=/data/mysql3306/data --output_file=/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000013 pos 154 to mysql-bin.000013 EOF into /tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 154.. ok. Dumping effective binlog data from /data/mysql3306/data/mysql-bin.000013 position 154 to tail(177).. ok. Binlog Checksum enabled Concat succeeded. Sun May 24 18:36:11 2020 - [info] scp from [email protected]:/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog succeeded. Sun May 24 18:36:12 2020 - [info] HealthCheck: SSH to 192.168.28.131 is reachable. Sun May 24 18:36:13 2020 - [info] HealthCheck: SSH to 192.168.28.132 is reachable. Sun May 24 18:36:14 2020 - [info] Sun May 24 18:36:14 2020 - [info] * Phase 3.3: Determining New Master Phase.. Sun May 24 18:36:14 2020 - [info] Sun May 24 18:36:14 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sun May 24 18:36:14 2020 - [info] All slaves received relay logs to the same position. No need to resync each other. Sun May 24 18:36:14 2020 - [info] Searching new master from slaves.. Sun May 24 18:36:14 2020 - [info] Candidate masters from the configuration file: Sun May 24 18:36:14 2020 - [info] 192.168.28.131(192.168.28.131:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:14 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:14 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 18:36:14 2020 - [info] Non-candidate masters: Sun May 24 18:36:14 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 18:36:14 2020 - [info] Replicating from 192.168.28.128(192.168.28.128:3306) Sun May 24 18:36:14 2020 - [info] Not candidate for the new Master (no_master is set) Sun May 24 18:36:14 2020 - [info] New master is 192.168.28.131(192.168.28.131:3306) Sun May 24 18:36:14 2020 - [info] Starting master failover.. Sun May 24 18:36:14 2020 - [info] From: 192.168.28.128(192.168.28.128:3306) (current master) +--192.168.28.131(192.168.28.131:3306) +--192.168.28.132(192.168.28.132:3306) To: 192.168.28.131(192.168.28.131:3306) (new master) +--192.168.28.132(192.168.28.132:3306) Sun May 24 18:36:14 2020 - [info] Sun May 24 18:36:14 2020 - [info] * Phase 3.4: New Master Diff Log Generation Phase.. Sun May 24 18:36:14 2020 - [info] Sun May 24 18:36:14 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun May 24 18:36:14 2020 - [info] Sending binlog.. Sun May 24 18:36:15 2020 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog to [email protected]:/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog succeeded. Sun May 24 18:36:15 2020 - [info] Sun May 24 18:36:15 2020 - [info] * Phase 3.5: Master Log Apply Phase.. Sun May 24 18:36:15 2020 - [info] Sun May 24 18:36:15 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sun May 24 18:36:15 2020 - [info] Starting recovery on 192.168.28.131(192.168.28.131:3306).. Sun May 24 18:36:15 2020 - [info] Generating diffs succeeded. Sun May 24 18:36:15 2020 - [info] Waiting until all relay logs are applied. Sun May 24 18:36:15 2020 - [info] done. Sun May 24 18:36:15 2020 - [info] Getting slave status.. Sun May 24 18:36:15 2020 - [info] This slave(192.168.28.131)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:154). No need to recover from Exec_Master_Log_Pos. Sun May 24 18:36:15 2020 - [info] Connecting to the target slave host 192.168.28.131, running recover script.. Sun May 24 18:36:15 2020 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=192.168.28.131 --slave_ip=192.168.28.131 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog --workdir=/tmp --target_version=5.7.25-28-log --timestamp=20200524183606 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx Sun May 24 18:36:16 2020 - [info] MySQL client version is 5.7.25. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog on 192.168.28.131:3306. This may take long time... Applying log files succeeded. Sun May 24 18:36:16 2020 - [info] All relay logs were successfully applied. Sun May 24 18:36:16 2020 - [info] Getting new master's binlog name and position.. Sun May 24 18:36:16 2020 - [info] mysql-bin.000008:154 Sun May 24 18:36:16 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.28.131', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Sun May 24 18:36:16 2020 - [info] Executing master IP activate script: Sun May 24 18:36:16 2020 - [info] /usr/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.28.128 --orig_master_ip=192.168.28.128 --orig_master_port=3306 --new_master_host=192.168.28.131 --new_master_ip=192.168.28.131 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Enabling the VIP - 192.168.28.199/24 on the new master - 192.168.28.131 Sun May 24 18:36:17 2020 - [info] OK. Sun May 24 18:36:17 2020 - [info] Setting read_only=0 on 192.168.28.131(192.168.28.131:3306).. Sun May 24 18:36:17 2020 - [info] ok. Sun May 24 18:36:17 2020 - [info] ** Finished master recovery successfully. Sun May 24 18:36:17 2020 - [info] * Phase 3: Master Recovery Phase completed. Sun May 24 18:36:17 2020 - [info] Sun May 24 18:36:17 2020 - [info] * Phase 4: Slaves Recovery Phase.. Sun May 24 18:36:17 2020 - [info] Sun May 24 18:36:17 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sun May 24 18:36:17 2020 - [info] Sun May 24 18:36:17 2020 - [info] -- Slave diff file generation on host 192.168.28.132(192.168.28.132:3306) started, pid: 48890. Check tmp log /var/log/masterha/app1/192.168.28.132_3306_20200524183606.log if it takes time.. Sun May 24 18:36:18 2020 - [info] Sun May 24 18:36:18 2020 - [info] Sun May 24 18:36:18 2020 - [info] Log messages from 192.168.28.132 ... Sun May 24 18:36:18 2020 - [info] Sun May 24 18:36:17 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun May 24 18:36:18 2020 - [info] End of log messages from 192.168.28.132. Sun May 24 18:36:18 2020 - [info] -- 192.168.28.132(192.168.28.132:3306) has the latest relay log events. Sun May 24 18:36:18 2020 - [info] Generating relay diff files from the latest slave succeeded. Sun May 24 18:36:18 2020 - [info] Sun May 24 18:36:18 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sun May 24 18:36:18 2020 - [info] Sun May 24 18:36:18 2020 - [info] -- Slave recovery on host 192.168.28.132(192.168.28.132:3306) started, pid: 48892. Check tmp log /var/log/masterha/app1/192.168.28.132_3306_20200524183606.log if it takes time.. Sun May 24 18:36:21 2020 - [info] Sun May 24 18:36:21 2020 - [info] Log messages from 192.168.28.132 ... Sun May 24 18:36:21 2020 - [info] Sun May 24 18:36:18 2020 - [info] Sending binlog.. Sun May 24 18:36:19 2020 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog to [email protected]:/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog succeeded. Sun May 24 18:36:19 2020 - [info] Starting recovery on 192.168.28.132(192.168.28.132:3306).. Sun May 24 18:36:19 2020 - [info] Generating diffs succeeded. Sun May 24 18:36:19 2020 - [info] Waiting until all relay logs are applied. Sun May 24 18:36:19 2020 - [info] done. Sun May 24 18:36:19 2020 - [info] Getting slave status.. Sun May 24 18:36:19 2020 - [info] This slave(192.168.28.132)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000013:154). No need to recover from Exec_Master_Log_Pos. Sun May 24 18:36:19 2020 - [info] Connecting to the target slave host 192.168.28.132, running recover script.. Sun May 24 18:36:19 2020 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=192.168.28.132 --slave_ip=192.168.28.132 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog --workdir=/tmp --target_version=5.7.25-28-log --timestamp=20200524183606 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx Sun May 24 18:36:20 2020 - [info] MySQL client version is 5.7.25. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.28.128_3306_20200524183606.binlog on 192.168.28.132:3306. This may take long time... Applying log files succeeded. Sun May 24 18:36:20 2020 - [info] All relay logs were successfully applied. Sun May 24 18:36:20 2020 - [info] Resetting slave 192.168.28.132(192.168.28.132:3306) and starting replication from the new master 192.168.28.131(192.168.28.131:3306).. Sun May 24 18:36:20 2020 - [info] Executed CHANGE MASTER. Sun May 24 18:36:20 2020 - [info] Slave started. Sun May 24 18:36:21 2020 - [info] End of log messages from 192.168.28.132. Sun May 24 18:36:21 2020 - [info] -- Slave recovery on host 192.168.28.132(192.168.28.132:3306) succeeded. Sun May 24 18:36:21 2020 - [info] All new slave servers recovered successfully. Sun May 24 18:36:21 2020 - [info] Sun May 24 18:36:21 2020 - [info] * Phase 5: New master cleanup phase.. Sun May 24 18:36:21 2020 - [info] Sun May 24 18:36:21 2020 - [info] Resetting slave info on the new master.. Sun May 24 18:36:21 2020 - [info] 192.168.28.131: Resetting slave info succeeded. Sun May 24 18:36:21 2020 - [info] Master failover to 192.168.28.131(192.168.28.131:3306) completed successfully. Sun May 24 18:36:21 2020 - [info] ----- Failover Report ----- app1: MySQL Master failover 192.168.28.128(192.168.28.128:3306) to 192.168.28.131(192.168.28.131:3306) succeeded Master 192.168.28.128(192.168.28.128:3306) is down! Check MHA Manager logs at mha3:/var/log/masterha/app1/app1.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 192.168.28.128(192.168.28.128:3306) The latest slave 192.168.28.131(192.168.28.131:3306) has all relay logs for recovery. Selected 192.168.28.131(192.168.28.131:3306) as a new master. 192.168.28.131(192.168.28.131:3306): OK: Applying all logs succeeded. 192.168.28.131(192.168.28.131:3306): OK: Activated master IP address. 192.168.28.132(192.168.28.132:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 192.168.28.132(192.168.28.132:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.28.131(192.168.28.131:3306) 192.168.28.131(192.168.28.131:3306): Resetting slave info succeeded. Master failover to 192.168.28.131(192.168.28.131:3306) completed successfully.

Now the VIP is on the 192.168.28.131 machine

The VIP has been deleted from the original master node

3.3 manual switching test

Restore the original master node and join the cluster to ensure that three nodes of the cluster are online

[root@mha1 masterha]# /usr/local/mysql5.7/bin/mysqld_safe --defaults-file=/data/mysql3306/etc/my.cnf & [root@mha1 masterha]# mysql -uroot -p'123456' --socket=/data/mysql3306/tmp/mysql.sock change master to master_host='192.168.28.131',master_user='repl', master_password='repl',master_log_file='mysql-bin.000008',master_log_pos=154; /*It is recommended to back up the primary database and reconfigure synchronization for recovery of production environment*/

At this time, the cluster status is detected again

[root@mha3 app1]# masterha_check_repl --conf=/etc/masterha/app1.conf

Manually switch main library

In many cases, active master-slave switching is required. At this time, the manual switching script of MHA can be used. For example, the master database can be switched back to 192.168.28.128:3306 (at this time, if the MHA is in the startup state, it must be closed)

masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --orig_master_is_new_slave --new_master_host=192.168.28.128 --new_master_port=3306

The switching process is as follows:

[root@mha3 app1]# masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --orig_master_is_new_slave --new_master_host=192.168.28.128 --new_master_port=3306 Sun May 24 19:10:29 2020 - [info] MHA::MasterRotate version 0.58. Sun May 24 19:10:29 2020 - [info] Starting online master switch.. Sun May 24 19:10:29 2020 - [info] Sun May 24 19:10:29 2020 - [info] * Phase 1: Configuration Check Phase.. Sun May 24 19:10:29 2020 - [info] Sun May 24 19:10:29 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 24 19:10:29 2020 - [info] Reading application default configuration from /etc/masterha/app1.conf.. Sun May 24 19:10:29 2020 - [info] Reading server configuration from /etc/masterha/app1.conf.. Sun May 24 19:10:30 2020 - [info] GTID failover mode = 0 Sun May 24 19:10:30 2020 - [info] Current Alive Master: 192.168.28.131(192.168.28.131:3306) Sun May 24 19:10:30 2020 - [info] Alive Slaves: Sun May 24 19:10:30 2020 - [info] 192.168.28.128(192.168.28.128:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 19:10:30 2020 - [info] Replicating from 192.168.28.131(192.168.28.131:3306) Sun May 24 19:10:30 2020 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 24 19:10:30 2020 - [info] 192.168.28.132(192.168.28.132:3306) Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled Sun May 24 19:10:30 2020 - [info] Replicating from 192.168.28.131(192.168.28.131:3306) Sun May 24 19:10:30 2020 - [info] Not candidate for the new Master (no_master is set) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.28.131(192.168.28.131:3306)? (YES/no): yes Sun May 24 19:10:32 2020 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Sun May 24 19:10:32 2020 - [info] ok. Sun May 24 19:10:32 2020 - [info] Checking MHA is not monitoring or doing failover.. Sun May 24 19:10:32 2020 - [info] Checking replication health on 192.168.28.128.. Sun May 24 19:10:32 2020 - [info] ok. Sun May 24 19:10:32 2020 - [info] Checking replication health on 192.168.28.132.. Sun May 24 19:10:32 2020 - [info] ok. Sun May 24 19:10:32 2020 - [info] 192.168.28.128 can be new master. Sun May 24 19:10:32 2020 - [info] From: 192.168.28.131(192.168.28.131:3306) (current master) +--192.168.28.128(192.168.28.128:3306) +--192.168.28.132(192.168.28.132:3306) To: 192.168.28.128(192.168.28.128:3306) (new master) +--192.168.28.132(192.168.28.132:3306) +--192.168.28.131(192.168.28.131:3306) Starting master switch from 192.168.28.131(192.168.28.131:3306) to 192.168.28.128(192.168.28.128:3306)? (yes/NO): yes Sun May 24 19:10:33 2020 - [info] Checking whether 192.168.28.128(192.168.28.128:3306) is ok for the new master.. Sun May 24 19:10:33 2020 - [info] ok. Sun May 24 19:10:33 2020 - [info] 192.168.28.131(192.168.28.131:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Sun May 24 19:10:33 2020 - [info] 192.168.28.131(192.168.28.131:3306): Resetting slave pointing to the dummy host. Sun May 24 19:10:33 2020 - [info] ** Phase 1: Configuration Check Phase completed. Sun May 24 19:10:33 2020 - [info] Sun May 24 19:10:33 2020 - [info] * Phase 2: Rejecting updates Phase.. Sun May 24 19:10:33 2020 - [info] Sun May 24 19:10:33 2020 - [info] Executing master ip online change script to disable write on the current master: Sun May 24 19:10:33 2020 - [info] /usr/bin/master_ip_online_change --command=stop --orig_master_host=192.168.28.131 --orig_master_ip=192.168.28.131 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=192.168.28.128 --new_master_ip=192.168.28.128 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Disabling the VIP on old master: 192.168.28.131 Sun May 24 19:10:33 2020 - [info] ok. Sun May 24 19:10:33 2020 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Sun May 24 19:10:33 2020 - [info] Executing FLUSH TABLES WITH READ LOCK.. Sun May 24 19:10:33 2020 - [info] ok. Sun May 24 19:10:33 2020 - [info] Orig master binlog:pos is mysql-bin.000008:154. Sun May 24 19:10:33 2020 - [info] Waiting to execute all relay logs on 192.168.28.128(192.168.28.128:3306).. Sun May 24 19:10:33 2020 - [info] master_pos_wait(mysql-bin.000008:154) completed on 192.168.28.128(192.168.28.128:3306). Executed 0 events. Sun May 24 19:10:33 2020 - [info] done. Sun May 24 19:10:33 2020 - [info] Getting new master's binlog name and position.. Sun May 24 19:10:33 2020 - [info] mysql-bin.000014:154 Sun May 24 19:10:33 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.28.128', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000014', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Sun May 24 19:10:33 2020 - [info] Executing master ip online change script to allow write on the new master: Sun May 24 19:10:33 2020 - [info] /usr/bin/master_ip_online_change --command=start --orig_master_host=192.168.28.131 --orig_master_ip=192.168.28.131 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=192.168.28.128 --new_master_ip=192.168.28.128 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx IN SCRIPT TEST====/sbin/ip addr del 192.168.28.199/24 dev ens33==/sbin/ip addr add 192.168.28.199/24 dev ens33=== Enabling the VIP - 192.168.28.199/24 on the new master - 192.168.28.128 Sun May 24 19:10:34 2020 - [info] ok. Sun May 24 19:10:34 2020 - [info] Setting read_only=0 on 192.168.28.128(192.168.28.128:3306).. Sun May 24 19:10:34 2020 - [info] ok. Sun May 24 19:10:34 2020 - [info] Sun May 24 19:10:34 2020 - [info] * Switching slaves in parallel.. Sun May 24 19:10:34 2020 - [info] Sun May 24 19:10:34 2020 - [info] -- Slave switch on host 192.168.28.132(192.168.28.132:3306) started, pid: 49178 Sun May 24 19:10:34 2020 - [info] Sun May 24 19:10:35 2020 - [info] Log messages from 192.168.28.132 ... Sun May 24 19:10:35 2020 - [info] Sun May 24 19:10:34 2020 - [info] Waiting to execute all relay logs on 192.168.28.132(192.168.28.132:3306).. Sun May 24 19:10:34 2020 - [info] master_pos_wait(mysql-bin.000008:154) completed on 192.168.28.132(192.168.28.132:3306). Executed 0 events. Sun May 24 19:10:34 2020 - [info] done. Sun May 24 19:10:34 2020 - [info] Resetting slave 192.168.28.132(192.168.28.132:3306) and starting replication from the new master 192.168.28.128(192.168.28.128:3306).. Sun May 24 19:10:34 2020 - [info] Executed CHANGE MASTER. Sun May 24 19:10:34 2020 - [info] Slave started. Sun May 24 19:10:35 2020 - [info] End of log messages from 192.168.28.132 ... Sun May 24 19:10:35 2020 - [info] Sun May 24 19:10:35 2020 - [info] -- Slave switch on host 192.168.28.132(192.168.28.132:3306) succeeded. Sun May 24 19:10:35 2020 - [info] Unlocking all tables on the orig master: Sun May 24 19:10:35 2020 - [info] Executing UNLOCK TABLES.. Sun May 24 19:10:35 2020 - [info] ok. Sun May 24 19:10:35 2020 - [info] Starting orig master as a new slave.. Sun May 24 19:10:35 2020 - [info] Resetting slave 192.168.28.131(192.168.28.131:3306) and starting replication from the new master 192.168.28.128(192.168.28.128:3306).. Sun May 24 19:10:35 2020 - [info] Executed CHANGE MASTER. Sun May 24 19:10:35 2020 - [info] Slave started. Sun May 24 19:10:35 2020 - [info] All new slave servers switched successfully. Sun May 24 19:10:35 2020 - [info] Sun May 24 19:10:35 2020 - [info] * Phase 5: New master cleanup phase.. Sun May 24 19:10:35 2020 - [info] Sun May 24 19:10:35 2020 - [info] 192.168.28.128: Resetting slave info succeeded. Sun May 24 19:10:35 2020 - [info] Switching master to 192.168.28.128(192.168.28.128:3306) completed successfully.

At this time, you can see that the main database has been switched back to the 192.168.28.128:3306 node.

4. Supplement

Two timing tasks are configured for clearing relay log and server clock synchronization, which are configured on each machine

Clear relay log

Because the MHA cluster recommends to close the relay log, the relay log needs to be cleaned manually. Therefore, a scheduled task can be configured for cleaning

00 01 * * 0 /usr/bin/purge_relay_logs --user=mha --password='MHAadmin123' --host=192.168.28.131' --disable_relay_log_purge >> /var/log/masterha/app1/purge_relay_logs.log 2>&1

Configure clock synchronization

You can configure the clock server of the public network or build it yourself (the production environment needs to have a self built clock server, which can be referred to Clock server setup)

*/15 * * * * /usr/sbin/ntpdate ntp1.aliyun.com; /sbin/hwclock -w

5. Summary

The most difficult point in the construction of MHA is that it often depends on the incomplete package installation and the mismatch between related scripts and versions, which leads to the failure of deployment. Another problem is that the cluster replication check and manual switchover of the main database are normal, but the main database cannot be switched when it goes down abnormally (switchover script problem). In the process of practice, my friends can communicate with me more and learn from each other to improve the skills of pit drainage. Buddy dry cargo shop can also be contacted if I need to install the software and the dependent software. I will get the official account of the "dry group" and join the group of technical exchanges with more small partners.