
When MySQL semi-sync (after_sync) is enabled, data may be lost after DDL #1038

Closed
zhongliangkang opened this issue Oct 25, 2021 · 3 comments

Comments

@zhongliangkang

When MySQL semi-sync (after_sync) is enabled, data may be lost after the DDL because of the following sequence:

migrator.go

```go
if err := this.initiateServer(); err != nil {
	return err
}
defer this.server.RemoveSocketFile()
if err := this.countTableRows(); err != nil {
	return err
}
if err := this.addDMLEventsListener(); err != nil {
	return err
}
if err := this.applier.ReadMigrationRangeValues(); err != nil {
	return err
}
```

Suppose an insert has been flushed to the binlog and is still waiting for an ACK from the replica when gh-ost runs addDMLEventsListener and starts streaming the binlog. The applier never applies that insert, because its binlog event lies before the position the listener starts from. If the insert then commits only after gh-ost has run ReadMigrationRangeValues, the row was invisible when the range was read, so it also falls outside the migration range and the row copy never copies it. The row is lost.
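The interleaving above can be replayed with a toy model. This is a minimal sketch, not gh-ost code: the `primary` type, its methods, and `migrate` are hypothetical, and the model assumes that an uncommitted row is invisible to ReadMigrationRangeValues and that the listener only receives events written after it starts.

```go
package main

import (
	"fmt"
	"sort"
)

// txn is one single-row INSERT into table t. Under semi-sync after_sync its
// binlog event is written first; the row becomes visible only after commit.
type txn struct {
	id        int
	committed bool
}

// primary is a toy model of the master's binlog (index = binlog position).
type primary struct {
	binlog []*txn
}

// flush writes the event to the binlog without committing it.
func (p *primary) flush(id int) int {
	p.binlog = append(p.binlog, &txn{id: id})
	return len(p.binlog) - 1
}

// ack models the replica ACK: every event up to and including pos commits.
func (p *primary) ack(pos int) {
	for i := 0; i <= pos; i++ {
		p.binlog[i].committed = true
	}
}

// maxVisibleID models ReadMigrationRangeValues: uncommitted rows are invisible.
func (p *primary) maxVisibleID() int {
	maxID := 0
	for _, t := range p.binlog {
		if t.committed && t.id > maxID {
			maxID = t.id
		}
	}
	return maxID
}

// migrate replays the reported interleaving and returns the ids that reach
// the ghost table.
func migrate() []int {
	p := &primary{}
	p.ack(p.flush(1))     // id=1 is committed before gh-ost starts
	pending := p.flush(2) // id=2 is flushed and waits for the ACK

	listenerStart := len(p.binlog) // addDMLEventsListener starts after id=2's event
	maxID := p.maxVisibleID()      // ReadMigrationRangeValues sees only id=1
	p.ack(pending)                 // the ACK arrives; id=2 commits only now

	ghost := map[int]bool{}
	for _, t := range p.binlog { // row copy: committed rows with id <= maxID
		if t.committed && t.id <= maxID {
			ghost[t.id] = true
		}
	}
	for _, t := range p.binlog[listenerStart:] { // applier: newer events only
		ghost[t.id] = true
	}

	ids := make([]int, 0, len(ghost))
	for id := range ghost {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	fmt.Println(migrate()) // prints [1]: id=2 never reaches the ghost table
}
```

Neither path picks up id=2: its event is behind the listener's start position, and its id is above the range read while it was uncommitted.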

How to reproduce:
This bug is hard to reproduce in a simple way. To make it easier, modify the gh-ost code and add a sleep before addDMLEventsListener:

```go
if err := this.countTableRows(); err != nil {
	return err
}
// for test
time.Sleep(30 * time.Second)
if err := this.addDMLEventsListener(); err != nil {
	return err
}
if err := this.applier.ReadMigrationRangeValues(); err != nil {
	return err
}
```

  1. Create a MySQL master/backup pair with semi-sync (after_sync) enabled and set rpl_semi_sync_master_timeout=150000 (the value is in milliseconds, so 150 seconds, longer than the 30-second sleep).
  2. create table t(id int auto_increment, primary key(id));
  3. insert into t set id=1;
  4. Run gh-ost with alter table t engine=innodb. Once the ghc/gho tables have been created and gh-ost is in the sleep above, run stop slave io_thread on the backup, then run insert into t set id=2 on the master. The master flushes the id=2 event to the binlog and waits for an ACK from the backup.
  5. After about 30 seconds, gh-ost continues with addDMLEventsListener and ReadMigrationRangeValues. The id=2 transaction is still waiting for the ACK and has not committed in InnoDB, which is where the data loss happens.
  6. After the DDL completes, the id=2 row is not in table t.

Suggested fix:
After addDMLEventsListener, insert a row into the ghc table and read it back. If the insert succeeds and the row can be read back, every transaction whose binlog event was flushed, but not yet committed, before addDMLEventsListener ran has now committed.

```go
if err := this.addDMLEventsListener(); err != nil {
	return err
}
// After the events listener is added, wait until every transaction whose
// binlog event was flushed before the listener started has committed, and
// only then read the migration range. This is done by inserting a single
// row into the ghc table and reading it back.
if err := this.WaitAllBinlogCommit(); err != nil {
	return err
}
if err := this.applier.ReadMigrationRangeValues(); err != nil {
	return err
}
```
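A toy model can show why this barrier closes the gap. This is a sketch under stated assumptions, not gh-ost code: `primary`, `migrate`, and the `_t_ghc` marker row are hypothetical, and the model assumes binlog_order_commits=ON, so that by the time the marker (written after the pending insert) has committed, every transaction ahead of it in the binlog has committed too.

```go
package main

import (
	"fmt"
	"sort"
)

// event is one single-row INSERT; under after_sync it is in the binlog
// before it commits.
type event struct {
	table     string
	id        int
	committed bool
}

// primary is a toy model assuming transactions commit in binlog order.
type primary struct{ binlog []*event }

func (p *primary) flush(table string, id int) int {
	p.binlog = append(p.binlog, &event{table: table, id: id})
	return len(p.binlog) - 1
}

// ack commits every event up to and including pos, in binlog order.
func (p *primary) ack(pos int) {
	for i := 0; i <= pos; i++ {
		p.binlog[i].committed = true
	}
}

// migrate returns the ids of table t that reach the ghost table.
func migrate(withBarrier bool) []int {
	p := &primary{}
	p.ack(p.flush("t", 1)) // committed before gh-ost starts
	p.flush("t", 2)        // flushed, still waiting for the ACK

	listenerStart := len(p.binlog) // addDMLEventsListener
	if withBarrier {
		// Hypothetical WaitAllBinlogCommit: insert a marker into the ghc
		// table and block until it can be read back. The marker is behind
		// id=2 in the binlog, so it cannot commit before id=2 does.
		marker := p.flush("_t_ghc", 1)
		p.ack(marker) // the wait ends only when this ACK arrives
	}

	maxID := 0 // ReadMigrationRangeValues: highest committed id in t
	for _, e := range p.binlog {
		if e.table == "t" && e.committed && e.id > maxID {
			maxID = e.id
		}
	}
	p.ack(len(p.binlog) - 1) // without the barrier, the ACK arrives only now

	ghost := map[int]bool{}
	for _, e := range p.binlog { // row copy: committed rows with id <= maxID
		if e.table == "t" && e.committed && e.id <= maxID {
			ghost[e.id] = true
		}
	}
	for _, e := range p.binlog[listenerStart:] { // applier: events after start
		if e.table == "t" {
			ghost[e.id] = true
		}
	}
	ids := []int{}
	for id := range ghost {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	fmt.Println(migrate(false)) // [1]: id=2 is lost
	fmt.Println(migrate(true))  // [1 2]: the range read sees id=2
}
```

Without the barrier the range is read while id=2 is still invisible; with it, waiting on the marker forces id=2 to commit first, so the range read includes it and the row copy picks it up.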

@shaohk
Contributor

shaohk commented Nov 15, 2021

#1054 adds a heartbeat between addDMLEventsListener and ReadMigrationRangeValues to fix this bug.

@zhongliangkang
Author

> #1054 Add a heartbeat between the addDMLEventsListener and ReadMigrationRangeValues to fix this bug.

It works! We also fixed it by writing a flag to the ghc table.

@timvaillancourt
Collaborator

👋 thanks for this report!

I will close this as a duplicate of #1039. Also please see my recent comment regarding this bug being resolved (potentially - please confirm!)
