-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
follow mode: set sync round after fast catchup. #5349
Conversation
select { | ||
case s.syncNow <- struct{}{}: | ||
default: | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this there was a deadlock when catchup was wrapping up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It deadlocks because periodic sync isn't running when catchup is setting the sync round.
// update sync round before starting services | ||
if err := node.SetSyncRound(uint64(node.ledger.LastRound())); err != nil { | ||
node.log.Warnf("unable to set sync round while resuming fast catchup: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to poll the status of catchpoint catchup to see when it finishes so that we can set the proper sync round again in Conduit.
For example, if we want to start at round 1010, but our catchpoint is at round 1000, we'll now need to catchpoint catchup, wait for completion, then set the sync round to 1010 and wait for regular catchup.
if sRound > 0 && uint64(cpRound) > sRound { | ||
return MakeCatchpointSyncRoundFailure(catchpoint, sRound) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tzaffi this is the error message you're seeing. We had previously been a little over zealous with sync round checking when running fast catchup.
Eric and I had discussed two approaches:
- Remove guard rails from fast catchup. Since it is an admin command that completely changes the state, it seemed reasonable to trust that the user is considering the impact of their action.
- Allow the user to set a sync round to a point in the past. This would allow Conduit to set the sync round forward or backward prior to a fast catchup and avoid this error message. The problem is that it would also let the user set a sync round that would never be reached.
Between these two options 1 seemed like it would cause the fewest surprises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that 1 is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convinced that the issues I'm encountering are un-related to this PR, and that this work solves the problems in question.
Summary
Automatically set the sync round after running fast catchup on a follower node.
Test Plan
New unit test.