When a master server fails, a replicated slave can be "promoted" to take over for it.
If you have several slaves, you need to pick the best slave to promote, and then reconfigure the remaining slaves to replicate from the promoted slave.
Because MySQL Replication is asynchronous, on systems under heavy load, some slaves may not have received every transaction the master committed before it failed. (This can be mitigated in a planned outage.)
So it is necessary before promoting a slave to make sure that it has the most complete transaction history available. Then when changing the other slaves to replicate from the new master, each needs to start replicating from the last transaction that slave already knows about.
First, review the replication status on each slave to see which has received the most data from the now-dead master.
The most up-to-date slave will have the highest Master_Log_File
number. If both slaves were reading from the same file, the one with the highest Read_Master_Log_Pos
is most up-to-date.
Connect to slave_a.example.com
slave_a ~ $ mysql -u root mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Connecting to master Master_Host: master.example.com Master_User: replicator Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 84840 Relay_Log_File: mysql-relay-bin.000004 Relay_Log_Pos: 4 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Connecting Slave_SQL_Running: Yes ......some content not shown
Connect to slave_b.example.com
slave_b ~ $ mysql -u root
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Connecting to master
Master_Host: master.example.com
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 90308
Relay_Log_File: mysql-relay-bin.000004
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
......some content not shown
Here, slave_b is further ahead than slave_a.
Because slave_b is more up-to-date, it will become the new master.
First, make sure this server can't be interrupted accidentally by any efforts to recover the old master, by stopping replication:
mysql> slave stop; Query OK, 0 rows affected (0.02 sec) mysql> reset slave all; Query OK, 0 rows affected (0.06 sec)
Now that slave_b will become a writeable master instead of a read slave, we need to un-set the read_only
configuration.
On slave_b edit /etc/my.cnf
mysql> exit slave_b ~ $ sudoedit /etc/my.cnf
Delete the line read_only = true
/etc/my.cnf[mysqld] read_only = true relay_log = mysql-relay-bin log_slave_updates = 1 log_bin = mysql-bin server_id = 30 ......some content not shown
Now restart MySQL:
slave_b ~ $ sudo service mysqld restart Stopping mysqld: [ OK ] Starting mysqld: [ OK ]
SHOW SLAVE STATUS
shows no replication configuration./etc/my.cnf
does not set read_only
read_only
has value OFF
First, you need to find the last data slave_a received from the master. Begin by finding the binary log file slave_a is currently writing to.
mysql> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 38755 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
Now we can use mysqlbinlog
to read the binary log file and find the last statement this server replicated from the old master.
MySQL's data directory is /var/lib/mysql/
in this course. This is set by the datadir
setting in /etc/my.cnf
mysql> exit Bye slave_a ~ $ sudo mysqlbinlog /var/lib/mysql/mysql-bin.000003 | tail -n20 #130410 20:38:59 server id 10 end_log_pos 82400 Query thread_id=7 exec_time=0 error_code=0 SET TIMESTAMP=1365626339/*!*/; BEGIN /*!*/; # at 82400 #130410 20:38:59 server id 10 end_log_pos 82428 Intvar SET INSERT_ID=298/*!*/; # at 82428 #130410 20:38:59 server id 10 end_log_pos 82581 Query thread_id=7 exec_time=0 error_code=0 SET TIMESTAMP=1365626339/*!*/; INSERT INTO stuff VALUES (NULL, 'sing zone wrap swap', '2013-04-09 03:02:38') /*!*/; # at 82581 #130410 20:38:59 server id 10 end_log_pos 82608 Xid = 608 COMMIT/*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
Now on the new master, slave_b find the corresponding event. The central problem now is that when an event replicates from the master, where it is stored in the slave's binary log is entirely up to the slave. If slaves are rebooted or upgraded at different times, their binary log layout can be completely different.
Start by seeing how many binary log files we need to look through.
slave_b ~ $ cd /var/lib/mysql slave_b /var/lib/mysql $ ls mysql-bin.* mysql-bin.000001 mysql-bin.000003 mysql-bin.index mysql-bin.000002 mysql-bin.000004
Now search these files for the exact statement that was the last to replicate to slave_a. When you find it, you want to note the log location of the event that BEGINs next.
slave_b /var/lib/mysql $ sudo mysqlbinlog mysql-bin.000001 | grep "INSERT INTO stuff VALUES (NULL, 'shin noon yard him', '2013-04-09 07:00:08')" --context 10 slave_b /var/lib/mysql $ sudo mysqlbinlog mysql-bin.000002 | grep "INSERT INTO stuff VALUES (NULL, 'shin noon yard him', '2013-04-09 07:00:08')" --context 10 #130411 1:59:03 server id 10 end_log_pos 18811 Query thread_id=8 exec_time=0 error_code=0 SET TIMESTAMP=1365645543/*!*/; BEGIN /*!*/; # at 18811 #130411 1:59:03 server id 10 end_log_pos 18839 Intvar SET INSERT_ID=296/*!*/; # at 18839 #130411 1:59:03 server id 10 end_log_pos 18991 Query thread_id=8 exec_time=0 error_code=0 SET TIMESTAMP=1365645543/*!*/; INSERT INTO stuff VALUES (NULL, 'shin noon yard him', '2013-04-09 07:00:08') /*!*/; # at 18991 #130411 1:59:03 server id 10 end_log_pos 19018 Xid = 606 COMMIT/*!*/; # at 19018 #130411 1:59:03 server id 10 end_log_pos 19085 Query thread_id=9 exec_time=0 error_code=0 SET TIMESTAMP=1365645543/*!*/; BEGIN /*!*/; # at 19085 slave_b /var/lib/mysql $
In the output above, we found the last statement to replicate to slave_a, in file mysql-bin.000002
. The next statement (that will be news to slave_a) begins at position 19085
Now restart replication on slave_a pulling events after the most recent from slave_b. Be sure you're using the file and position you just collected from slave_b.
slave_a ~ $ mysql -u root mysql> slave stop; Query OK, 0 rows affected (0.03 sec) mysql> CHANGE MASTER TO MASTER_HOST='slave_b.example.com', MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=19085; Query OK, 0 rows affected (0.05 sec) mysql> slave start; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: slave_b.example.com Master_User: replicator Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000004 Read_Master_Log_Pos: 107 Relay_Log_File: mysql-relay-bin.000004 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000004 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 107 Relay_Log_Space: 555 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 30 1 row in set (0.00 sec)
SHOW SLAVE STATUS
shows Master_Host
is slave_b.example.com
SHOW SLAVE STATUS
shows Slave_IO_Running
is Yes
SHOW SLAVE STATUS
shows Slave_SQL_Running
is Yes
SHOW SLAVE STATUS
shows Last_Errno
is 0
No process is foolproof. How can you ensure that slave_a now contains exactly the same data slave_b does?
Because both servers are lightly loaded and the data set is small, we can make a complete backup of our application data (the important
database) and get an MD5 digest of the backup. If both are identical, the MD5 will be identical. If they differ, you could diff
the backups or otherwise work with your application to find and mitigate problems.
On both servers:
slave_a ~ $ mysqldump -u root important --skip-dump-date | md5sum 38f3cbbdb073619c744465b4c48c0463 -
slave_b ~ $ mysqldump -u root important --skip-dump-date | md5sum 38f3cbbdb073619c744465b4c48c0463 -
In this example, the MD5 digests match, so the data must be identical.
Now, insert some new data on the new master, slave_b.
slave_b ~ $ mysql -u root Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 6 Server version: 5.5.29-log MySQL Community Server (GPL) Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> insert into important.stuff set details='Now I am the master'; Query OK, 1 row affected (0.03 sec) mysql>
The row you just inserted should show up on the slave_a server almost immediately.
mysql> select * from important.stuff; +-----+---------------------+---------------------+ | id | details | happened | +-----+---------------------+---------------------+ | 412 | Now I am the master | 2013-04-12 19:29:33 | +-----+---------------------+---------------------+ 1 row in set (0.00 sec) mysql>
important.stuff