Add method (*EtcdServer) IsRaftLoopBlocked
to support checking whether the raft loop is blocked
#16710
base: main
a398af0 to b90aef1
…r the raft loop is blocked Signed-off-by: Benjamin Wang <wachao@vmware.com>
b90aef1 to fc7902a
cc @serathius
Copied from the design doc comment:
Since a raft loop deadlock will block execution of the next select statement, I can see two approaches:
With the goal of fitting the prober check within the 1s timeout, the 2nd approach looks better. What do you think? @ahrtr
cc @siyuanfoundation ^
This means that you need to remember the timestamp of the last tick and check it in the liveness probe, something like
Note that this PR just provides basic functionality for checking whether the raft loop is blocked. You can check it asynchronously.
It's not necessary; I sent a draft PR, #16713, to demonstrate.
Yeah, assuming "async" here means trying to send to the dummy channel, going ahead with the remaining checks in the prober, and then validating that the earlier send has completed. That seems complicated compared with the counter approach. WDYT?
I think that either way, the interval between checks could be too short to distinguish a slow ready process from a deadlock. I suggest using another, longer ticker to reset the count instead of resetting based on probes as in #16713.
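A rough sketch of that suggestion follows. It assumes the counter is incremented by the raft loop and evaluated at the end of each window; `progressCounter` and its methods are hypothetical, and the actual design in #16713 may differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// progressCounter is a hypothetical helper: the raft loop increments count,
// and a separate, longer ticker closes each window by calling endWindow.
type progressCounter struct {
	count   atomic.Uint64 // loop iterations seen in the current window
	healthy atomic.Bool   // whether the previous window saw any progress
}

// inc would be called by the raft loop once per iteration.
func (p *progressCounter) inc() { p.count.Add(1) }

// endWindow records whether the loop made progress and resets the count.
// In a real server a time.Ticker with a long interval would drive this.
func (p *progressCounter) endWindow() {
	p.healthy.Store(p.count.Swap(0) > 0)
}

// isHealthy is what a probe would read. It never resets anything, so the
// probe frequency does not affect the result.
func (p *progressCounter) isHealthy() bool { return p.healthy.Load() }

func main() {
	var p progressCounter

	p.inc()
	p.inc()
	p.endWindow()
	fmt.Println(p.isHealthy()) // true: the loop advanced during the window

	p.endWindow() // no inc calls since the last window
	fmt.Println(p.isHealthy()) // false: no progress in a full window
}
```

Because only the window ticker resets the count, a burst of probes cannot shorten the observation period. The cost is one more interval that has to be chosen.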
In the second approach, it's determined by the administrator case by case. 'prober interval * failure threshold' should be a sane value based on the administrator's judgement, e.g. whether they are using a network-based volume like EBS or a physically attached SSD.
Adding another ticker may not be optimal. How do you plan to set the new ticker interval? Is it configurable? It may make etcd setup more complicated.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Link to #16007
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.