-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix state.commit is out of range on restart #11888
Conversation
c98115a
to
b5630ae
Compare
…s and wal snap entries
cc @YoyinZyc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix.
Overall looks great. Some minor changes requested. Could have been caught in our failpoints testing :) |
Just for confirmation, I replicated this on master (without this fix in place):
To replicate the Procfile change should be |
Feedback applied. Thanks for the review!
Where do we test that, by the way? I'd like to review it and see if there is anything that can be done to make it test this case better. |
We've been running failpoints in our CI https://github.com/etcd-io/etcd/blob/master/functional.yaml#L4, but maybe it wasn't enough. lgtm. I will merge after CIs and release them with new 3.3 and 3.4. |
Thanks @gyuho Hold on merging for just a bit. I just noticed that |
Added one more commit to remove orphaned |
hexIndex := strings.TrimSuffix(filepath.Base(filename), ".snap.db") | ||
index, err := strconv.ParseUint(hexIndex, 16, 64) | ||
if err != nil { | ||
return fmt.Errorf("failed to parse index from .snap.db filename '%s': %w", filename, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.snap.db seems to be redundant since it is suffix of filename
Also, we should continue the loop and try to release other files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitted #11899
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't just call it .snap
since there are already different files with that suffix. But the purpose of this PR isn't to rethink the file suffix anyway...
Continuing if we hit an error sounds fine. This is merged now though, so that would need a separate PR.
} | ||
// snapshot files can be orphaned if etcd crashes after writing them but before writing the corresponding | ||
// wal log entries | ||
snapshot, err := ss.LoadNewestAvailable(walSnaps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change created a local snapshot
variable. Now the snapshot
declared on line 337 is always passed empty to validation on line 535.
…stent index and closing database on defer `err` variable shared throughout the NewServer function and used on line 396 to defer decision whether backend should be closed when starting the server failed. `snapshot` variable is first defined 407, redeclared locally on line 496 and later again used on line 625. Creation of local variable is a bug introduced in etcd-io#11888. Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
This is a rebase of #10356 (which fixes #10219), with a couple differences:
.snap
files as the very first step of the snapshot process.snap
files are ignored.snap.db
files when they are obviously out-of-datecc @brk0v, @jingyih