Database architecture: one master, two slaves
master: 192.168.8.57
slave1: 192.168.8.58
slave2: 192.168.8.59
manager: 192.168.8.60
MHA packages:
mha4mysql-manager-0.58.tar.gz
mha4mysql-node-0.58.tar.gz
1. Modify the master_ip_online_change script
2. Stop the MHA monitoring program
3. Manually stop the MySQL process on the master to simulate a failure
mysqladmin -uroot -pmysql shutdown
4. Perform a manual failover
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=dead --dead_master_host=192.168.8.57 --dead_master_port=3306 --new_master_host=192.168.8.58 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.8.57.
Fri Oct 26 16:18:05 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Oct 26 16:18:05 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Oct 26 16:18:05 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Oct 26 16:18:05 2018 - [info] MHA::MasterFailover version 0.58.
Fri Oct 26 16:18:05 2018 - [info] Starting master failover.
Fri Oct 26 16:18:05 2018 - [info]
Fri Oct 26 16:18:05 2018 - [info] * Phase 1: Configuration Check Phase..
Fri Oct 26 16:18:05 2018 - [info]
Fri Oct 26 16:18:07 2018 - [info] GTID failover mode = 1
Fri Oct 26 16:18:07 2018 - [info] Dead Servers:
Fri Oct 26 16:18:07 2018 - [info]   192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:07 2018 - [info] Checking master reachability via MySQL(double check)...
Fri Oct 26 16:18:07 2018 - [info]  ok.
Fri Oct 26 16:18:07 2018 - [info] Alive Servers:
Fri Oct 26 16:18:07 2018 - [info]   192.168.8.58(192.168.8.58:3306)
Fri Oct 26 16:18:07 2018 - [info]   192.168.8.59(192.168.8.59:3306)
Fri Oct 26 16:18:07 2018 - [info] Alive Slaves:
Fri Oct 26 16:18:07 2018 - [info]   192.168.8.58(192.168.8.58:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:07 2018 - [info]     GTID ON
Fri Oct 26 16:18:07 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:07 2018 - [info]   192.168.8.59(192.168.8.59:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:07 2018 - [info]     GTID ON
Fri Oct 26 16:18:07 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Master 192.168.8.57(192.168.8.57:3306) is dead. Proceed? (yes/NO): yes
Fri Oct 26 16:18:14 2018 - [info] Starting GTID based failover.
Fri Oct 26 16:18:14 2018 - [info]
Fri Oct 26 16:18:14 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Oct 26 16:18:14 2018 - [info]
Fri Oct 26 16:18:14 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Oct 26 16:18:14 2018 - [info]
Fri Oct 26 16:18:14 2018 - [info] HealthCheck: SSH to 192.168.8.57 is reachable.
Fri Oct 26 16:18:15 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Oct 26 16:18:15 2018 - [info] Executing master IP deactivation script:
Fri Oct 26 16:18:15 2018 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.8.57 --orig_master_ip=192.168.8.57 --orig_master_port=3306 --command=stopssh --ssh_user=root
Fri Oct 26 16:18:15 2018 - [info]  done.
Fri Oct 26 16:18:15 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct 26 16:18:15 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Oct 26 16:18:15 2018 - [info]
Fri Oct 26 16:18:15 2018 - [info] * Phase 3: Master Recovery Phase..
Fri Oct 26 16:18:15 2018 - [info]
Fri Oct 26 16:18:15 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Oct 26 16:18:15 2018 - [info]
Fri Oct 26 16:18:15 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000012:359
Fri Oct 26 16:18:15 2018 - [info] Retrieved Gtid Set: a92f70a4-d5ea-11e8-af28-080027c0450d:10
Fri Oct 26 16:18:15 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Oct 26 16:18:15 2018 - [info]   192.168.8.58(192.168.8.58:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:15 2018 - [info]     GTID ON
Fri Oct 26 16:18:15 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:15 2018 - [info]   192.168.8.59(192.168.8.59:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:15 2018 - [info]     GTID ON
Fri Oct 26 16:18:15 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:15 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000012:359
Fri Oct 26 16:18:15 2018 - [info] Retrieved Gtid Set: a92f70a4-d5ea-11e8-af28-080027c0450d:10
Fri Oct 26 16:18:15 2018 - [info] Oldest slaves:
Fri Oct 26 16:18:15 2018 - [info]   192.168.8.58(192.168.8.58:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:15 2018 - [info]     GTID ON
Fri Oct 26 16:18:15 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:15 2018 - [info]   192.168.8.59(192.168.8.59:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Fri Oct 26 16:18:15 2018 - [info]     GTID ON
Fri Oct 26 16:18:15 2018 - [info]     Replicating from 192.168.8.57(192.168.8.57:3306)
Fri Oct 26 16:18:15 2018 - [info]
Fri Oct 26 16:18:15 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Oct 26 16:18:15 2018 - [info]
Fri Oct 26 16:18:15 2018 - [info] 192.168.8.58 can be new master.
Fri Oct 26 16:18:15 2018 - [info] New master is 192.168.8.58(192.168.8.58:3306)
Fri Oct 26 16:18:15 2018 - [info] Starting master failover..
Fri Oct 26 16:18:15 2018 - [info]
From:
192.168.8.57(192.168.8.57:3306) (current master)
 +--192.168.8.58(192.168.8.58:3306)
 +--192.168.8.59(192.168.8.59:3306)
To:
192.168.8.58(192.168.8.58:3306) (new master)
 +--192.168.8.59(192.168.8.59:3306)
Starting master switch from 192.168.8.57(192.168.8.57:3306) to 192.168.8.58(192.168.8.58:3306)? (yes/NO): yes
Fri Oct 26 16:18:22 2018 - [info] New master decided manually is 192.168.8.58(192.168.8.58:3306)
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] Waiting all logs to be applied..
Fri Oct 26 16:18:22 2018 - [info]  done.
Fri Oct 26 16:18:22 2018 - [info] Getting new master's binlog name and position..
Fri Oct 26 16:18:22 2018 - [info]  mysql-bin.000011:565
Fri Oct 26 16:18:22 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.8.58', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Oct 26 16:18:22 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000011, 565, a92f70a4-d5ea-11e8-af28-080027c0450d:1-10, a92f70a4-d5ea-11e8-af28-080027c0450f:1-6
Fri Oct 26 16:18:22 2018 - [info] Executing master IP activate script:
Fri Oct 26 16:18:22 2018 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.8.57 --orig_master_ip=192.168.8.57 --orig_master_port=3306 --new_master_host=192.168.8.58 --new_master_ip=192.168.8.58 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Set read_only=0 on the new master.
Creating app user on the new master..
Undefined subroutine &main::FIXME_xxx_create_user called at /usr/local/bin/master_ip_failover line 94.
Fri Oct 26 16:18:22 2018 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterFailover.pm, ln1612] Failed to activate master IP address for 192.168.8.58(192.168.8.58:3306) with return code 10:0
Fri Oct 26 16:18:22 2018 - [warning] Proceeding.
Fri Oct 26 16:18:22 2018 - [info] ** Finished master recovery successfully.
Fri Oct 26 16:18:22 2018 - [info] * Phase 3: Master Recovery Phase completed.
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] * Phase 4: Slaves Recovery Phase..
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Oct 26 16:18:22 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] -- Slave recovery on host 192.168.8.59(192.168.8.59:3306) started, pid: 5792. Check tmp log /var/log/masterha/app1/192.168.8.59_3306_20181026161805.log if it takes time..
Fri Oct 26 16:18:23 2018 - [info]
Fri Oct 26 16:18:23 2018 - [info] Log messages from 192.168.8.59 ...
Fri Oct 26 16:18:23 2018 - [info]
Fri Oct 26 16:18:22 2018 - [info] Resetting slave 192.168.8.59(192.168.8.59:3306) and starting replication from the new master 192.168.8.58(192.168.8.58:3306)..
Fri Oct 26 16:18:22 2018 - [info] Executed CHANGE MASTER.
Fri Oct 26 16:18:22 2018 - [info] Slave started.
Fri Oct 26 16:18:22 2018 - [info] gtid_wait(a92f70a4-d5ea-11e8-af28-080027c0450d:1-10, a92f70a4-d5ea-11e8-af28-080027c0450f:1-6) completed on 192.168.8.59(192.168.8.59:3306). Executed 0 events.
Fri Oct 26 16:18:23 2018 - [info] End of log messages from 192.168.8.59.
Fri Oct 26 16:18:23 2018 - [info] -- Slave on host 192.168.8.59(192.168.8.59:3306) started.
Fri Oct 26 16:18:23 2018 - [info] All new slave servers recovered successfully.
Fri Oct 26 16:18:23 2018 - [info]
Fri Oct 26 16:18:23 2018 - [info] * Phase 5: New master cleanup phase..
Fri Oct 26 16:18:23 2018 - [info]
Fri Oct 26 16:18:23 2018 - [info] Resetting slave info on the new master..
Fri Oct 26 16:18:23 2018 - [info]  192.168.8.58: Resetting slave info succeeded.
Fri Oct 26 16:18:23 2018 - [info] Master failover to 192.168.8.58(192.168.8.58:3306) completed successfully.
Fri Oct 26 16:18:23 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.8.57(192.168.8.57:3306) to 192.168.8.58(192.168.8.58:3306) succeeded
Master 192.168.8.57(192.168.8.57:3306) is down!
Check MHA Manager logs at manager for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.168.8.57(192.168.8.57:3306)
Selected 192.168.8.58(192.168.8.58:3306) as a new master.
192.168.8.58(192.168.8.58:3306): OK: Applying all logs succeeded.
Failed to activate master IP address for 192.168.8.58(192.168.8.58:3306) with return code 10:0
192.168.8.59(192.168.8.59:3306): OK: Slave started, replicating from 192.168.8.58(192.168.8.58:3306)
192.168.8.58(192.168.8.58:3306): Resetting slave info succeeded.
Master failover to 192.168.8.58(192.168.8.58:3306) completed successfully.
Fri Oct 26 16:18:23 2018 - [info] Sending mail..
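Although the report ends with "completed successfully", the log also contains an [error] entry: the master IP activation step failed with return code 10:0 after master_ip_failover called the undefined subroutine FIXME_xxx_create_user, which looks like an unreplaced placeholder in that script. In a log this long such entries are easy to miss. The following is a minimal Python sketch, not part of MHA, that pulls the [warning] and [error] entries out of a saved log. The function names and the regular expression are illustrative assumptions based only on the line format shown above.

```python
import re
from typing import List, NamedTuple

# Matches lines such as "Fri Oct 26 16:18:22 2018 - [warning] Proceeding."
# The pattern is an assumption derived from the log format shown above.
LOG_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - \[(?P<level>\w+)\](?P<rest>.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    level: str
    message: str


def parse_log(text: str) -> List[Entry]:
    """Return one Entry per timestamped line; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        # [error] lines carry a "[file, lnN]" tag right after the level; drop it.
        message = re.sub(r"^\[[^\]]*\]", "", match.group("rest")).strip()
        entries.append(Entry(match.group("ts"), match.group("level"), message))
    return entries


def problems(entries: List[Entry]) -> List[Entry]:
    """Keep only the entries logged at warning or error level."""
    return [e for e in entries if e.level in ("warning", "error")]


# A few lines copied from the failover log above.
SAMPLE = """\
Fri Oct 26 16:18:15 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Oct 26 16:18:22 2018 - [info] Executing master IP activate script:
Undefined subroutine &main::FIXME_xxx_create_user called at /usr/local/bin/master_ip_failover line 94.
Fri Oct 26 16:18:22 2018 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterFailover.pm, ln1612] Failed to activate master IP address for 192.168.8.58(192.168.8.58:3306) with return code 10:0
Fri Oct 26 16:18:22 2018 - [warning] Proceeding.
Fri Oct 26 16:18:23 2018 - [info] Master failover to 192.168.8.58(192.168.8.58:3306) completed successfully.
"""

if __name__ == "__main__":
    for entry in problems(parse_log(SAMPLE)):
        print(entry.level.upper(), entry.timestamp, entry.message)
```

Lines without a timestamp, such as the Perl "Undefined subroutine" message, are skipped by this sketch, so the surrounding raw log is still needed to find the root cause.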
5. Check the database status
192.168.8.58
mysql> show slave status \G
Empty set (0.00 sec)
mysql> show variables like '%read_only%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | OFF   |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
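This node is now the master: its slave threads are stopped (SHOW SLAVE STATUS returns an empty set) and read-only mode is off.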
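The read-only check can be scripted. Below is a small Python sketch, assuming the output of the mysql client has been captured as text; parse_mysql_table and is_writable are hypothetical helper names, not part of MySQL or MHA.

```python
from typing import Dict


def parse_mysql_table(text: str) -> Dict[str, str]:
    """Parse a two-column ASCII table printed by the mysql client into a dict."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            rows[cells[0]] = cells[1]
    return rows


def is_writable(variables: Dict[str, str]) -> bool:
    """A node accepts writes only if read_only and super_read_only are both OFF."""
    return (
        variables.get("read_only") == "OFF"
        and variables.get("super_read_only") == "OFF"
    )


# Values copied from the session on 192.168.8.58 (the new master).
NEW_MASTER = """\
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | OFF   |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
"""

if __name__ == "__main__":
    print("new master writable:", is_writable(parse_mysql_table(NEW_MASTER)))
```

For a slave the same function should return False, because read_only stays ON there.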
192.168.8.59
mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.8.58
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 565
               Relay_Log_File: slave2-relay-bin.000002
                Relay_Log_Pos: 414
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 565
              Relay_Log_Space: 622
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 58
                  Master_UUID: a92f70a4-d5ea-11e8-af28-080027c0450f
             Master_Info_File: /mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: a92f70a4-d5ea-11e8-af28-080027c0450b:1-4,
a92f70a4-d5ea-11e8-af28-080027c0450d:1-10,
a92f70a4-d5ea-11e8-af28-080027c0450f:1-6
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
mysql> show variables like '%read_only%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | ON    |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
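This node's master is now 192.168.8.58, and its read-only mode is unchanged (read_only is still ON).
6. Check replication status
Current tables on both the master and the slave: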
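The fields that matter in that output are Master_Host, Slave_IO_Running, Slave_SQL_Running and the two error numbers. As a rough sketch, assuming the `\G` output has been saved as text, the check could be automated in Python as follows; parse_vertical and replication_ok are hypothetical helper names, and the continuation-line handling is an assumption based on how the multi-line Executed_Gtid_Set is printed.

```python
import re
from typing import Dict

# A field line looks like "   Master_Host: 192.168.8.58" (name, colon, value).
FIELD = re.compile(r"^\s*([A-Za-z_]+):\s?(.*)$")


def parse_vertical(text: str) -> Dict[str, str]:
    """Parse `SHOW SLAVE STATUS \\G` style output into a dict of strings."""
    fields: Dict[str, str] = {}
    last = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            last = match.group(1)
            fields[last] = match.group(2).strip()
        elif last and fields[last].endswith(",") and line.strip():
            # Multi-line values (GTID sets) continue after a trailing comma.
            fields[last] += line.strip()
    return fields


def replication_ok(status: Dict[str, str], expected_master: str) -> bool:
    """True if the slave points at expected_master and both threads run cleanly."""
    return (
        status.get("Master_Host") == expected_master
        and status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and status.get("Last_IO_Errno") == "0"
        and status.get("Last_SQL_Errno") == "0"
    )


# Abbreviated from the output on 192.168.8.59.
SLAVE2 = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.8.58
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
            Executed_Gtid_Set: a92f70a4-d5ea-11e8-af28-080027c0450b:1-4,
a92f70a4-d5ea-11e8-af28-080027c0450d:1-10,
a92f70a4-d5ea-11e8-af28-080027c0450f:1-6
                Auto_Position: 1
"""

if __name__ == "__main__":
    status = parse_vertical(SLAVE2)
    print("replicating from new master:", replication_ok(status, "192.168.8.58"))
```

Passing the old master's address, 192.168.8.57, as expected_master makes the check fail, which is a quick way to confirm that the slave was actually repointed.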
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t2             |
| t3             |
| t4             |
| t5             |
| t6             |
| t7             |
| t8             |
| t9             |
+----------------+
Create a test table on the master, 192.168.8.58:
mysql> create table t10 (id int(6));
Query OK, 0 rows affected (0.35 sec)
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t10            |
| t2             |
| t3             |
| t4             |
| t5             |
| t6             |
| t7             |
| t8             |
| t9             |
+----------------+
Check on the slave, 192.168.8.59, that the table has been replicated:
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t10            |
| t2             |
| t3             |
| t4             |
| t5             |
| t6             |
| t7             |
| t8             |
| t9             |
+----------------+
The test table t10 has been replicated, so replication is working normally.
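The comparison done by eye here amounts to a set difference between the table lists on the two nodes. A minimal Python sketch follows, assuming the table names have already been collected from each node; compare_tables is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def compare_tables(
    master_tables: Iterable[str], slave_tables: Iterable[str]
) -> Dict[str, List[str]]:
    """Report tables present on one node but not on the other."""
    master, slave = set(master_tables), set(slave_tables)
    return {
        "missing_on_slave": sorted(master - slave),
        "extra_on_slave": sorted(slave - master),
    }


# Table lists taken from the sessions above.
BEFORE = ["t%d" % i for i in range(1, 10)]  # t1 .. t9
MASTER_AFTER = BEFORE + ["t10"]             # after CREATE TABLE t10 on the master

if __name__ == "__main__":
    # If the slave had not yet applied the change, t10 would be reported as missing.
    print(compare_tables(MASTER_AFTER, BEFORE))
    # Once replication has caught up, both lists are empty.
    print(compare_tables(MASTER_AFTER, MASTER_AFTER))
```

Matching table names only show that the DDL was replicated; this sketch does not compare row data.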