-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[3.4] fix revision loss issue caused by compaction - 17780 #17864
Conversation
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 7173391) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 9ea2349) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
// NOTE: The proc panics and exit code is 2. It's impossible to restart | ||
// that etcd proc because last exit code is 2 and Restart() refuses to | ||
// start new one. Using IsRunning() function is to cleanup status. | ||
require.False(t, clus.procs[targetIdx].IsRunning()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note:
// The proc panics and exit code is 2. It's impossible to restart
// that etcd proc because last exit code is 2 and Restart() refuses to
// start new one. Using IsRunning() function is to cleanup status.
No, it doesn't. Still checking why main/3.5 branch don't show the error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks
Hi @ahrtr @serathius For the log, 3.4 release only saves consistentIndex into db when there is transcation with changes and snapshot. Lines 103 to 108 in 48b0c49
Lines 2469 to 2477 in 48b0c49
That test doesn't trigger snapshot so that compaction change doesn't save consistent index. However, for the main or 3.5 release, we save index in PreCommit hook. So ForceCommit will save index. Lines 44 to 52 in a2911b4
etcd/server/etcdserver/server.go Lines 312 to 321 in a7a8fb8
It looks safe to merge this. |
/hold |
AnalysisSomehow I missed this info. Confirmed that basically your above comment/analysis is valid. To be clearer, the root cause of the error message The first execution was performed by etcdserver itself on bootstrap. It also updated the Line 262 in 48b0c49
The second execution was driven by the WAL record replaying (because the latest consistentIndex wasn't persisted before crashing). Since the s.compactMainRev has already been updated in the first compaction, so this time the condition Lines 250 to 256 in 48b0c49
SolutionIdeally, we shouldn't perform compaction on bootstrap. The compaction is always coming from raft/applying, so in this case it makes more sense only to trigger the compaction via the WAL replaying logic. But since it's harmless to trigger the compaction twice, so let's keep it as it's for now. I think it's safe to let this PR in for now. We don't want any surprise before we get rid of the OnPreCommitUnsafe and LockInsideApply/LockOutsideApply on the main branch, and before it's well understood by more contributors. |
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backport:
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
NOTE:
I didn't see the log in main branch. It seems that etcd replays the WAL after restart.
Run the command
EXPECT_DEBUG=true CPU=4 FAILPOINTS='true' make test-e2e GO_TEST_FLAGS="-run TestReproduce17780 -count=1 -v"
and will see that following log.