Solution to MySQL Replication Problems: Master DB crashed

Monday, May 12, 2008

Solution to MySQL Replication Problems: Master DB crashed

Monday, May 12, 2008 ANSOR No comments

It is commons that replication will hang after master database crashed. And the error message you will get when you show slave status is:
Client requested master to start replication from impossible position (server_errno=1236)
This happened because MySQL write to bin-log in batch basis. If every bit of your data is mission critical and any inconsistent between master and salve is not accectable, then you should redo you replication all over again.
So far I do not found any way to prevent this replication failure. However, you may add the follwing line into you my.cnf to reduce the chances it happen:
sync_binlog=1
This means that MySQL will write to bin-log after each update instead of write in batch. As you may expect, this will reduce the performance of the database.
Else, there is another way to resume the replication. The solution is illustrate in this article: Client asks master to start replication from impossible position

I get this question a lot. Why does a slave report that it's trying to replicate from an impossible position? 9 times out of 10 it's because the master crashed and when it came back online a new binlog file was made.
mySQL caches binlog events in the binlog cache, basically events are stored in memory and flushed to disk when the dirty buffer fills up. I believe the variable is called binlog_cache_size.
Here are some steps to recover from this:
Go onto the master execute
SHOW MASTER STATUS
Look at the output and find the log that the slave is pointing to. Look at the File size field.
Next look at the slave output from the slave reporting the issue. Look at Exec_Master_Log_Pos, is that value greater then the File Size on the master if so issue
CHANGE MASTER TO MASTER_LOG_FILE=[NEXT FILE], MASTER_LOG_POS=4;
slave start;
Now if your super sensitive of lost events because a row or two could of been lost from this replication event, do some spot testing for tables written to often. Look for anything that has changed within the outage window, if the data doesn't match the slave then you're master and slave are out of sync, and will require a full clone to get the data back in sync.