Task list: support raft learner in etcd. #10537
Comments
Thanks @jpbetz for helping on the task list. @WIZARD-CXY and @jingyih will be working on this. Please let us know if you want to help on some of the tasks. |
LGTM, the task is clear. I can help in phase II and III. |
@WIZARD-CXY I created the feature branch here: https://github.com/jingyih/etcd/tree/learner For our development work, we can create pull requests against jingyih/etcd/learner. In the end, we will merge all the changes from jingyih/etcd/learner into etcd-io/etcd/master. Since we will not squash commits, all the commits in the feature branch will appear in the master branch. Here is an example: #9860 Please let me know if you have any questions. |
Thanks Jingyi! We used the same feature branch approach for the clientv3 balancer improvements last summer. Author attribution of all commits is retained, so it works out really nicely. |
Roger that |
@jingyih Maybe add |
@WIZARD-CXY Thanks for pointing that out :) Added. I changed the wording to be more specific: "Learner rejects leadership transfer." |
@jingyih Maybe "exclude the learner when transferring leadership". I think that is more accurate in this case. |
Sounds good. |
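For readers following along, "exclude the learner when transferring leadership" can be pictured with a minimal sketch. This is an illustration only, not the etcd implementation: the Member type and the transferCandidates function are hypothetical names made up for this example.

```go
package main

// Member is a hypothetical, simplified view of a cluster member.
// It is not the etcd membership type.
type Member struct {
	ID        uint64
	IsLearner bool
}

// transferCandidates returns the IDs of members that may receive
// leadership: every voting member except the current leader.
// Learners are skipped because they do not vote.
func transferCandidates(members []Member, leaderID uint64) []uint64 {
	var ids []uint64
	for _, m := range members {
		if m.IsLearner || m.ID == leaderID {
			continue
		}
		ids = append(ids, m.ID)
	}
	return ids
}
```

With members 1 (leader), 2 (voter), and 3 (learner), only member 2 is returned as a transfer target.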
This feature is currently driven by @jingyih from Google and @WIZARD-CXY from Alibaba. So it is a joint effort from the community :P. If you are interested, we definitely want your help. |
@WIZARD-CXY Thanks for the links! @xiang90 I didn't realize it wasn't just RH people. Will remove my comment, thanks for the info! |
I just read I have some thoughts about this, based on my experience building etcd clusters automatically in https://github.com/purpleidea/mgmt/ If these insights are useful, I am happy to share. I don't have a lot of time to contribute new code for this particular implementation at the moment. Some background:
I am about one week away from releasing a re-write of this code to remove all the cruft caused by my lack of knowledge in golang. Some thoughts:
Thanks for reading. If this is not helpful, please be blunt and tell me, and I won't make more noise. |
This task list was meant to be used for tracking the implementation progress. But I just added some context in the beginning. Sorry for the confusion. |
@jingyih It seems we are at the end of phase III. What else do I need to do? From your task list, three things are left:
|
how about I work on the |
@WIZARD-CXY I am thinking about dropping |
We might want to add:
|
Will do. I will submit a pr collecting the metrics |
Thanks. When you are ready, please create PR against etcd-io/master. Let's stop using the learner feature branch. |
ok |
@jingyih Is it possible for an etcd member to answer requests from clients while it is still "learning"? |
@fabriziopandini Are you testing with etcd binary built from the master branch? The learner feature is not released yet. Learner does accept certain types of requests in the current implementation. I can provide more info if needed. Just want to first make sure if this is related to the issue you are facing. |
@jingyih Thanks for the quick answer. My use case is the following: Joining control-plane nodes is a new kubeadm feature that creates a new etcd member on the joining node and then calls AddMember of the etcd V3 API (currently on etcd 3.1.10). The problem we are experiencing is that etcd / the API server starts to answer before the new etcd member is aligned with the rest of the nodes, giving false answers (after a few seconds everything starts to behave as expected). From my understanding, what we are experiencing is etcd answering while still learning. What I'm looking for is:
Thank you in advance for your help |
Please follow the instructions [1] on how to add a new etcd member to a cluster. The AddMember API should be called before starting the new etcd member; otherwise the existing cluster will not be able to recognize and verify the new member. |
It looks as though the etcd team was planning to implement "auto promote" (see etcd-io#10537). This commit attempts to lay the groundwork for auto-promoting learners to voters by adding an --auto-promote flag to the add member command. One reason for having the ability to automatically promote learners to voters is that it would enable implementing the Raft paper's recommendation to handle reconfiguration changes by first adding new members as non-voters until they have caught up with the leader.
How many learner nodes can an etcd cluster support? Does it make sense to replace watches with learner nodes? Can a learner remain a learner indefinitely? |
Currently in 3.4, the maximum number of learners is 1. This may increase in the future.
Currently a learner does not support the watch API. To scale watch you might want to use: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/grpc_proxy.md#scalable-watch-api
Yes. Currently in 3.4 a learner will not be auto-promoted. |
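To make the one-learner limit concrete, here is a minimal sketch of how such a cap could be checked before accepting a new learner. It is an assumption-laden illustration: Member, maxLearners, errTooManyLearners, and checkAddLearner are hypothetical names for this example, and only the limit of 1 comes from the reply above.

```go
package main

import "errors"

// maxLearners mirrors the limit of one learner stated for 3.4 in the
// reply above. The constant name is hypothetical.
const maxLearners = 1

// errTooManyLearners is a hypothetical error for this sketch.
var errTooManyLearners = errors.New("too many learner members in cluster")

// Member is a hypothetical, simplified view of a cluster member.
type Member struct {
	ID        uint64
	IsLearner bool
}

// checkAddLearner reports whether one more learner may be added to a
// cluster that currently has the given members.
func checkAddLearner(existing []Member) error {
	learners := 0
	for _, m := range existing {
		if m.IsLearner {
			learners++
		}
	}
	if learners >= maxLearners {
		return errTooManyLearners
	}
	return nil
}
```

A cluster of three voters passes the check; once one learner is present, a second learner is rejected until the first is promoted or removed.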
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Background
Open questions
- How to evaluate learner progress
- Leader does not respond to ReadIndex message from learner
Task List
The list is subject to change.
Phase I
- isLearner flag to MemberAdd API request.
- MemberPromote API.
- isLearner field in Status maintenance API response.
Phase II
Multiple developers can work in parallel on these tasks.
- MemberAdd with isLearner flag. Includes both routing the request to raft, and applying the result.
- etcdctl member add --learner.
- MemberPromote.
- etcdctl member promote.
- isLearner field to etcdctl member list.
- Status maintenance API with isLearner field.
- isLearner field to etcdctl endpoint status.
- canPromote() check (before sending promote request to Raft consensus).
- Exclude learner nodes from clientv3 balancer endpoints. -> clientv3 balancer retry when request send to learner fails.
Phase III
Multiple developers can work in parallel on these tasks.
- etcdctl member add --learner, etcdctl endpoint status, etcdctl member list
- Add auto-promote, but put behind a feature flag? (moved to future work)
- Throttle process of a learner catching up from leader. (moved to future work)
- Update clientv3 documentation
Future Work
These items will likely not be included in v3.4 release.
Pull Requests
- Incremental PRs opened against feature branch.
- Aggregated PR of feature branch against master branch.
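The canPromote() check from Phase II, together with the open question on how to evaluate learner progress, could be sketched roughly as below. This is only one possible approach under stated assumptions: it compares the learner's replicated log index against the leader's and requires the learner to be within a fixed fraction of it. The 0.9 threshold, the error values, and the function signature are all assumptions chosen for illustration, not details taken from this issue.

```go
package main

import "errors"

// readyThreshold is an assumed fraction of the leader's log index that a
// learner must have replicated before promotion. Chosen for illustration.
const readyThreshold = 0.9

// Hypothetical errors for this sketch.
var (
	errNotLearner      = errors.New("member is not a learner")
	errLearnerNotReady = errors.New("learner is not yet in sync with leader")
)

// canPromote decides, before a promote request is sent to Raft consensus,
// whether the member may be promoted. leaderMatch and learnerMatch are the
// highest log indexes known to be replicated on the leader and the learner.
func canPromote(isLearner bool, leaderMatch, learnerMatch uint64) error {
	if !isLearner {
		return errNotLearner
	}
	if float64(learnerMatch) < float64(leaderMatch)*readyThreshold {
		return errLearnerNotReady
	}
	return nil
}
```

Under these assumptions, a learner at index 95 with the leader at index 100 may be promoted, while a learner at index 50 is rejected and the caller can retry later.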