[Help Wanted] Possibility of data loss when server restarts immediately after key-value put with explained conditions #14364
Comments
@ahrtr: The workflow may not be correct. Refer to https://github.com/ahrtr/etcd-issues/blob/master/docs/cncf_storage_tag_etcd.md. Please open a new issue if you can reproduce the issue in your test environment.
@ahrtr what I see from the code is different from that document. Can I provide the code diff and the necessary screenshots? I hope you also went through the steps I gave in the bug description.
@ahrtr Please find the code changes and steps to repro. Please note that these changes are just a sleep and an exit to simulate the condition I explained in the bug description (a sketch of the kind of injected change follows after the steps).
1. Start the etcd server with the changes.
2. Add a key-value. Allow etcdserver to acknowledge and exit immediately (with just the sleep and exit simulating the explanation).
3. Remove the control flag file and restart the etcd server.
4. Check if the key is present. We can see no key-value.
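A minimal sketch of what such an injected hook could look like is below. The flag-file path, function name, and placement are assumptions made for illustration; this is not actual etcd code, only a model of the "sleep then exit" change described above.

```go
// Illustrative sketch only: a hypothetical hook that simulates the crash window
// described in this issue. If a control flag file exists, it pauses long enough
// for the client to receive its OK response and then exits abruptly, before the
// periodic bbolt commit (and, per the scenario described in this issue, before
// the WAL write) happens. The flag path and placement are assumptions.
package main

import (
	"os"
	"time"
)

const crashFlagPath = "/tmp/etcd-crash-flag" // assumed control flag file

func maybeSimulateCrash() {
	if _, err := os.Stat(crashFlagPath); err == nil {
		time.Sleep(2 * time.Second) // let the client see its acknowledgement first
		os.Exit(1)                  // simulate an abrupt crash / power failure
	}
}

func main() {
	// In the repro, this check would be injected after the server acknowledges
	// the PUT; here it is called directly to keep the sketch self-contained.
	maybeSimulateCrash()
}
```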
Note that this is expected behavior by design. If you really want high availability, then you need to set up a cluster with at least 3 members. Due to performance concerns with bboltDB, etcd commits the transaction periodically instead of on each request (a simplified sketch of this batching follows below). So in theory it is possible that the bboltDB commit actually fails because of some system or hardware issue. But it isn't a problem if either of the following conditions is true:
1. The local WAL entries were successfully persisted; in that case etcd replays the WAL entries on startup.
2. There are other healthy members; in that case the leader will sync the missing data to the other members, including the previously problematic one.
In your case you intentionally created a situation in which both conditions are false, so eventually it caused data loss. Please note that it is beyond etcd's capacity to resolve such an extreme catastrophic situation, and I believe it is also beyond the capacity of any single project. You need to think about/resolve this at a higher level of the system architecture, for example with backup & restore.
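To make the batching point concrete, here is a minimal, self-contained sketch of the idea, not etcd's actual backend code: writes land in an in-memory batch that is flushed to disk only periodically, so anything still in the batch when the process dies is recoverable only via WAL replay. All names and the interval are illustrative.

```go
// Simplified model (not etcd's real backend) of why a disk commit can lag the
// client response: puts go into an in-memory pending batch, and the batch is
// flushed only on a periodic tick. A crash before the next tick loses the
// batch, unless the entry can be replayed from the WAL on startup.
package main

import (
	"fmt"
	"sync"
	"time"
)

type batchedBackend struct {
	mu      sync.Mutex
	pending map[string]string // writes not yet committed to disk
}

func (b *batchedBackend) Put(k, v string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending[k] = v // visible to reads immediately, but not durable yet
}

func (b *batchedBackend) commitLoop(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.mu.Lock()
			if len(b.pending) > 0 {
				fmt.Printf("committing %d pending writes to disk\n", len(b.pending))
				b.pending = map[string]string{} // pretend these were fsynced
			}
			b.mu.Unlock()
		case <-stop:
			return
		}
	}
}

func main() {
	b := &batchedBackend{pending: map[string]string{}}
	stop := make(chan struct{})
	go b.commitLoop(100*time.Millisecond, stop)

	b.Put("k", "v")                    // acknowledged to the caller right away
	time.Sleep(250 * time.Millisecond) // a crash before the tick would lose "k" absent WAL replay
	close(stop)
}
```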
I have observed a possibility of data loss, and I would like the community to comment or to correct me if I am wrong.
Before explaining that, I would like to explain the happy path when a user does a PUT <key, value>. I have tried to include only the necessary steps to keep this issue focused, and I have considered a single etcd instance (a condensed code sketch of this hand-off appears right after the block below).
====================================================================================
----------api thread --------------
1. The user calls etcdctl PUT k v.
2. It lands in v3_server.go::put with the message about k,v.
3. The call delegates through a series of function calls and enters v3_server.go::processInternalRaftRequestOnce.
4. It registers for a signal with the wait utility against this key id.
5. The call delegates further through a series of function calls and enters raft/node.go::stepWithWaitOption(..message..).
6. It wraps this message together with a result channel (msgResult) and sends it to the propc channel.
7. After sending, it waits on the msgResult result channel.
----------api thread waiting --------------
8. On seeing a message in the propc channel, raft/node.go::run() wakes up, and a sequence of calls adds the message.Entries to the raftLog.
9. It notifies the msgResult result channel.
----------api thread wakes--------------
10. Upon seeing the notification on the msgResult result channel, the api thread wakes, returns down the stack to v3_server.go::processInternalRaftRequestOnce, and waits for the signal it registered at step #4.
----------api thread waiting --------------
----------api thread wakes--------------
18. The user thread wakes here and sends back the acknowledgement.
----------user sees ok--------------
====================================================================================
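The block above describes a register/propose/wait hand-off between the api thread and the raft goroutine. Below is a condensed, self-contained model of that pattern, assuming simplified names (wait, proposal, propc); it is an illustration of the flow, not etcd's actual code, and it collapses the msgResult wait and the wait-utility signal into a single channel for brevity.

```go
// Condensed model of the api-thread / raft-thread hand-off: the api side
// registers a waiter keyed by request id, sends the proposal to a channel,
// and blocks; the raft side picks it up, appends it to an in-memory log,
// and triggers the waiter. In real etcd several more steps (WAL write,
// commit, apply) sit between appending and triggering the waiter.
package main

import (
	"fmt"
	"sync"
)

type wait struct {
	mu sync.Mutex
	m  map[uint64]chan struct{}
}

func (w *wait) Register(id uint64) <-chan struct{} {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch := make(chan struct{})
	w.m[id] = ch
	return ch
}

func (w *wait) Trigger(id uint64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if ch, ok := w.m[id]; ok {
		close(ch)
		delete(w.m, id)
	}
}

type proposal struct {
	id   uint64
	k, v string
}

func main() {
	w := &wait{m: map[uint64]chan struct{}{}}
	propc := make(chan proposal)

	// Raft-side goroutine: receive proposals, "append" them, then trigger the waiter.
	go func() {
		for p := range propc {
			fmt.Printf("appended entry for %s=%s to the in-memory log\n", p.k, p.v)
			w.Trigger(p.id) // equivalent of notifying the waiting api thread
		}
	}()

	// Api thread: register, propose, wait, then acknowledge the user.
	ch := w.Register(1)
	propc <- proposal{id: 1, k: "k", v: "v"}
	<-ch
	fmt.Println("OK") // what the user sees
}
```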
Here, if the thread at step #13 is pre-empted and rescheduled by the underlying operating system to run only after step #18 has completed, and there is a power failure at the end of step 18 (right after the user has received the response), then the kv is written neither to the WAL nor to the database file.
I think this is not seen today because the window is small: the server has to restart immediately after step 18, and immediately after step 12 the underlying OS must have pre-empted the thread running etcdserver/raft.go::start and placed it at the end of the runnable queue. Given that these multiple conditions must coincide, it appears that we don't normally see data loss.
But from the code it appears to be possible. To simulate it, I added a sleep (and an exit) after step 12 and after step 19. I was able to see ok, but the data is in neither the WAL nor the db.
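For completeness, the post-restart check (step 4 of the repro) can also be done programmatically with the Go client instead of etcdctl. The endpoint and key name below are assumptions matching the steps above; this only shows whether the server still serves the key after the restart.

```go
// Sketch of the post-restart check: read back the key that was acknowledged
// before the simulated crash and report whether it survived. Endpoint and key
// name are assumptions for this example.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	resp, err := cli.Get(ctx, "k")
	if err != nil {
		log.Fatal(err)
	}
	if len(resp.Kvs) == 0 {
		fmt.Println("key \"k\" is missing after restart (the scenario described above)")
	} else {
		fmt.Printf("key survived: %s=%s\n", resp.Kvs[0].Key, resp.Kvs[0].Value)
	}
}
```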
If I am not correct, my apologies; please correct my understanding.