Description:
I configured a crash-safe slave with MTS and GTID-based replication, but after an OS crash
replication failed to start.
error log:
---------------------------------
2016-10-26 21:00:23 2699 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql-relay-bin' to avoid this problem.
2016-10-26 21:00:24 2699 [Note] Slave: MTS group recovery relay log info based on Worker-Id 1, group_relay_log_name ./mysql-relay-bin.000011, group_relay_log_pos 2017523 group_master_log_name binlog.000007, group_master_log_pos 2017363
2016-10-26 21:00:24 2699 [ERROR] Error looking for file after ./mysql-relay-bin.000012.
2016-10-26 21:00:24 2699 [ERROR] Failed to initialize the master info structure
2016-10-26 21:00:24 2699 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2016-10-26 21:00:24 2699 [Note] Event Scheduler: Loaded 0 events
2016-10-26 21:00:24 2699 [Note] mysqld: ready for connections.
Version: '5.6.31-77.0-log' socket: '/data/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 77.0, Revision 5c1061c
---------------------------------
“start slave” also failed:
mysql> start slave;
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository
According to the manual, “sync_relay_log=1” is not necessary when using GTIDs and MASTER_AUTO_POSITION:
http://dev.mysql.com/doc/refman/5.6/en/replication-solutions-unexpected-slave-halt.html
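To make the setup concrete, here is a minimal Python sketch that checks a slave's variables against the configuration this report relies on. Only gtid_mode=ON, relay_log_recovery=1 and the use of MTS come from the report; relay_log_info_repository=TABLE and the slave_parallel_workers threshold are assumptions about a typical crash-safe setup, and the helper names are hypothetical.

```python
# Settings compared as strings, the way SHOW GLOBAL VARIABLES reports them.
EXPECTED = {
    "gtid_mode": "ON",                     # stated in this report
    "relay_log_recovery": "ON",            # report sets relay_log_recovery = 1
    "relay_log_info_repository": "TABLE",  # assumption, not stated in the report
}


def normalize(value):
    """Map 1/0 style values to ON/OFF and upper-case everything else."""
    text = str(value).strip().upper()
    return {"1": "ON", "0": "OFF"}.get(text, text)


def check_crash_safe(variables):
    """Return a list of human-readable mismatches; empty means all checks pass."""
    problems = []
    for name, want in EXPECTED.items():
        got = normalize(variables.get(name, "<unset>"))
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    # MTS is only active when at least one worker is configured.
    workers = int(variables.get("slave_parallel_workers", 0))
    if workers < 1:
        problems.append(f"slave_parallel_workers: expected > 0 for MTS, got {workers}")
    return problems
```

The input dictionary could be filled from the output of SHOW GLOBAL VARIABLES on the slave; how it is collected is left open here.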
How to repeat:
STEP1:
Run a script that continuously executes UPDATE statements on the master.
STEP2:
On the physical machine, use “kill -9” to kill the KVM process of the slave.
STEP3:
Start MySQL on the slave.
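The script used for STEP1 is not part of this report. A minimal Python sketch of such a load generator could look like the following; the table test.t1 and its columns id and val are hypothetical, and the statements are only written to a stream so they can be piped into the mysql client on the master.

```python
import itertools
import time


def update_statements(table="test.t1", rows=100):
    """Yield an endless stream of single-row UPDATE statements.

    The table and column names are placeholders, not taken from the report.
    """
    for counter in itertools.count(1):
        row_id = (counter - 1) % rows + 1  # cycle through ids 1..rows
        yield f"UPDATE {table} SET val = {counter} WHERE id = {row_id};"


def write_statements(out, limit=None, delay=0.0):
    """Write statements to `out`, one per line; limit=None runs until killed."""
    stream = update_statements()
    if limit is not None:
        stream = itertools.islice(stream, limit)
    for statement in stream:
        out.write(statement + "\n")
        out.flush()  # keep the pipe to the client fed without buffering
        time.sleep(delay)
```

Calling write_statements(sys.stdout) from a small wrapper and piping the output into the mysql client keeps updates flowing until the slave host is killed in STEP2.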
Suggested fix:
The error occurs when mts_recovery_groups() reads the corrupted relay log files during MTS group recovery.
But with “gtid_mode=ON” and “relay_log_recovery = 1”, the relay log files will be discarded later anyway, so reading the relay log should be skipped at the beginning.
workaround:
The following steps start the slave successfully:
reset slave; start slave IO_THREAD; stop slave IO_THREAD; reset slave; start slave;
Don’t forget to add the line below to my.cnf to skip errors:
slave-skip-errors = all