Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task list: support raft learner in etcd. #10537

Closed
26 of 33 tasks
jingyih opened this issue Mar 12, 2019 · 29 comments
Closed
26 of 33 tasks

Task list: support raft learner in etcd. #10537

jingyih opened this issue Mar 12, 2019 · 29 comments
Labels

Comments

@jingyih
Copy link
Contributor

jingyih commented Mar 12, 2019

Background

Open questions

Task List
The list is subject to change.

Phase I

  • Create new feature branch - The feature branch will be merged back to master when the development is done. We will not squash commits to keep track of contribution.
  • API changes and regenerate go bindings.
    • add isLearner flag to MemberAdd API request.
    • add MemberPromote API.
    • add isLearner field in Status maintenance API response.

Phase II
Multiple developers can work in parallel on these tasks.

  • etcdserver and clientv3 support for MemberAdd with isLearner flag. Includes both routing the request to raft, and applying the result.
  • Add etcdctl member add --learner.
  • etcdserver and clientv3 implementation for MemberPromote.
  • Add etcdctl member promote.
  • Add isLearner field to etcdctl member list.
  • etcdserver and clientv3 support for Status maintenance API with isLearner field.
  • Add isLearner field to etcdctl endpoint status.
  • Learner reject all client requests except s-reads and status requests.
  • Exclude learner in leadership transfer.
  • Implement canPromote() check (before sending promote request to Raft consensus).
  • Add learner limit to etcdserver.
  • Exclude learner nodes from clientv3 balancer endpoints. -> clientv3 balancer retry when request send to learner fails.

Phase III
Multiple developers can work in parallel on these tasks.

  • Add metrics.
  • Integration test: add learner
  • Integration test: promote learner
  • Integration test: learner only accepts s-reads and status requests.
  • Integration test: cannot transfer leadership to learner
  • Integration test: add learner with large delta (CanPromote=false) requiring snapshot catchup
  • Integration test: clientv3 balancer with learner member
  • Integration test: partitioned learner
  • e2e test: etcdctl member add --learner, etcdctl endpoint status, etcdctl member list
  • Add auto-promote, but put behind a feature flag? (moved to future work)
  • Throttle process of a learner catching up from leader. (moved to future work)
  • Update etcdctl documentation
  • Update operations documentation
  • Update API documentation (auto generated when generating go bindings)
  • Update clientv3 documentation

Future Work
These items will likely not be included in v3.4 release.

  • Add auto-promote, but put behind a feature flag?
  • Throttle process of a learner catching up from leader.

Pull Requests
Incremental PRs opened against feature branch.
Aggregated PR of feature branch against master branch.

@jingyih
Copy link
Contributor Author

jingyih commented Mar 12, 2019

/cc @WIZARD-CXY @xiang90 @gyuho @jpbetz

@jingyih
Copy link
Contributor Author

jingyih commented Mar 12, 2019

Thanks @jpbetz for helping on the task list.

@WIZARD-CXY and @jingyih will be working on this. Please let us know if you want to help on some of the tasks.

@jingyih jingyih changed the title Task list: support raft leaner in etcd. Task list: support raft learner in etcd. Mar 13, 2019
@WIZARD-CXY
Copy link
Contributor

LGTM, the task is clear. I can help in phase II and III.

@jingyih
Copy link
Contributor Author

jingyih commented Mar 14, 2019

@WIZARD-CXY

I created the feature branch here:
https://github.com/jingyih/etcd/tree/learner

For our development work, we can create pull request against jingyih/etcd/learner. In the end, we will merge all the changes from jingyih/etcd/learner to etcd-io/etcd/master. Since we will not squash commits, all the commits in feature branch will appear in master branch. Here is an example: #9860

Please let me know if you have any question.

@jpbetz
Copy link
Contributor

jpbetz commented Mar 14, 2019 via email

@WIZARD-CXY
Copy link
Contributor

Roger that

@WIZARD-CXY
Copy link
Contributor

WIZARD-CXY commented Mar 21, 2019

@jingyih Maybe add modify existing code to work with the learner implementation to phase II.
Because I find that TransferLeadership function may transfer leader to a learner node which is unexpected.

@jingyih
Copy link
Contributor Author

jingyih commented Mar 21, 2019

@WIZARD-CXY Thanks for pointing out:) Added. I changed the wording to be more specific: "Learner reject leadership transfer."

@WIZARD-CXY
Copy link
Contributor

WIZARD-CXY commented Mar 22, 2019

@jingyih maybe exclude the learner when transfer leadership. I think it is more accurate in this case.

@jingyih
Copy link
Contributor Author

jingyih commented Mar 22, 2019

@jingyih maybe exclude the learner when transfer leadership. I think it is more accurate in this case.

Sounds good.

@WIZARD-CXY
Copy link
Contributor

@xiang90
Copy link
Contributor

xiang90 commented Apr 5, 2019

For anyone not working "face to face" internally on etcd, it's not obvious what this is. It was mentioned in the etcd community meeting, but I think everyone who attended was a paid RH employee. Thanks!

This feature is currently driven by @jingyih from Google and @WIZARD-CXY from Alibaba. So it is a joint effort from the community :P. If you are interested, we definitely want your help.

@purpleidea
Copy link
Contributor

@WIZARD-CXY Thanks for the links!

@xiang90 I didn't realize it wasn't just RH people. Will remove my comment, thanks for the info!

@purpleidea
Copy link
Contributor

I just read

https://etcd.readthedocs.io/en/latest/server-learner.html

I have some thoughts about this, based on my experience building etcd clusters automatically in https://github.com/purpleidea/mgmt/ If these insights are useful, I am happy to share. I don't have a lot of time to contribute new code for this particular implementation at the moment.

Some background:

I am about one week away from releasing a re-write of this code to remove all the cruft caused by my lack of knowledge in golang.

Some thoughts:

  • Instead of doing member add as a follower, wouldn't it be better for the new prospective member to connect as a client, and do some sort of "sync" to pull most of the data down, and then re-join as a member? This means existing cluster automation tools don't need to change much, they just need to add a single etcd client sync which is optional if they want the initial log data. It also means we DON'T need to break API for this.

  • Currently there's a difficult race:
    Suppose you have an N member cluster (perhaps N is 1 and hostname is A). You decide to add member B. You run member add API from A, and also member B starts up server. If there is any problem with startup, member A is in a borked state. What can you do to unbork? Raft log is blocked. A way to force remove the pending member would be useful.
    If you did this operation with a "raft learner" the server is already started but not voting, so doing the switchover from non-voting server to voting server might be a more difficult problem to deal with, since it's already part of the cluster specification!

  • I think the best way to model all this is to add MemberAPI operations into etcd.Op so that they can happen in a Txn. If this is something that intrigues anyone, then I can write up a proposal. A related proposal is: The Txn Cmp section should support AND and OR and NOT. #10571 and [feat] Watches/events on the member API should be possible #5277

Thanks for reading. If this is not helpful, please be blunt and tell me, and I won't make more noise.

@jingyih
Copy link
Contributor Author

jingyih commented Apr 5, 2019

This task list was meant to be used for tracking the implementation progress. But I just added some context in the beginning. Sorry for the confusion.

@WIZARD-CXY
Copy link
Contributor

WIZARD-CXY commented May 15, 2019

@jingyih seems we are in the end of phase III, what else need I do? From your task list, we left 3 things

Add metrics.
integration test: partitioned learner
e2e test: etcdctl member add --learner, etcdctl endpoint status, etcdctl member list

@WIZARD-CXY
Copy link
Contributor

how about I work on the add metrics

@jingyih
Copy link
Contributor Author

jingyih commented May 15, 2019

@WIZARD-CXY I am thinking about dropping integration test: partitioned learner and the e2e tests. We already added e2e test for etcdctl member add --learner, I do not feel like we need to add e2e test for etcdctl endpoint status and etcdctl member list. So probably the last one is Add metrics. Maybe first come up with a list of metrics that we want to add?

@jingyih
Copy link
Contributor Author

jingyih commented May 15, 2019

We might want to add:

  • etcd_server_is_learner
    Ref:

    # name: "etcd_server_is_leader"

  • We might also want some debugging metrics to count the number of total/failed/successful learner promotion.

@WIZARD-CXY
Copy link
Contributor

Will do. I will submit a pr collecting the metrics

@jingyih
Copy link
Contributor Author

jingyih commented May 16, 2019

Will do. I will submit a pr collecting the metrics

Thanks. When you are ready, please create PR against etcd-io/master. Let's stop using the learner feature branch.

@WIZARD-CXY
Copy link
Contributor

ok

@fabriziopandini
Copy link

fabriziopandini commented Jun 7, 2019

@jingyih is it possible that an etcd member answer to request from clients while still "learning"
We are experiencing something that might be related in Kubernetes see kubernetes-sigs/kind#588 (comment) ...

@jingyih
Copy link
Contributor Author

jingyih commented Jun 7, 2019

@fabriziopandini Are you testing with etcd binary built from the master branch? The learner feature is not released yet.

Learner does accept certain types of requests in the current implementation. I can provide more info if needed. Just want to first make sure if this is related to the issue you are facing.

@fabriziopandini
Copy link

@jingyih thanks for the quick answer.My use case is the following:
I'm a kubeadm contributor and I'm investigating an issue on kind, where they are agreesively (in parallel) joining control-plane plane nodes to a K8s.

Joining control-plane nodes is a new kubeadm feature that creates a new etcd member on the joining node and then calls AddMember of the etcd V3 API (currently on etcd 3.1.10).

Now, the problem we are experiencing is that etcd / the API server starts to answer before the new etcd member is aligned with the rest of the nodes, giving false answers (after few seconds everything starts to behave as expected). From my understanding, what we are experiencing is etcd answering while still learning.

What I'm looking for is:

  • possibly a confirmation that the above analysis makes sense (or at least that this is a possible condition)
  • any suggestions about how to prevent this; the "learner" API is really promising, but I understand it still WIP and I'm open to any suggestion

Thank you in advance for your help

@jingyih
Copy link
Contributor Author

jingyih commented Jun 10, 2019

@jingyih thanks for the quick answer.My use case is the following:
I'm a kubeadm contributor and I'm investigating an issue on kind, where they are agreesively (in parallel) joining control-plane plane nodes to a K8s.

Joining control-plane nodes is a new kubeadm feature that creates a new etcd member on the joining node and then calls AddMember of the etcd V3 API (currently on etcd 3.1.10).

Now, the problem we are experiencing is that etcd / the API server starts to answer before the new etcd member is aligned with the rest of the nodes, giving false answers (after few seconds everything starts to behave as expected). From my understanding, what we are experiencing is etcd answering while still learning.

What I'm looking for is:

  • possibly a confirmation that the above analysis makes sense (or at least that this is a possible condition)
  • any suggestions about how to prevent this; the "learner" API is really promising, but I understand it still WIP and I'm open to any suggestion

Thank you in advance for your help

Please follow the instruction [1] on how to add a new etcd member to cluster. The AddMember API should be called before starting the new etcd member, otherwise the existing cluster will not be able to recognize and verify the new member.

[1] https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#add-a-new-member

maxenglander added a commit to maxenglander/etcd that referenced this issue Jul 10, 2019
It looks as though the etcd team was planning to implement
"auto promote" (see etcd-io#10537). This commit attempts to lay the
groundwork for auto-promoting learners to voters by adding an
--auto-promote flag to the add member command. One reason
for having the ability to automatically promote learners to
voters is that it would enable implementing the Raft paper's
recommendation handle reconfiguration changes by first adding
new members as non-voters until they have caught up with the
leader.
@arjunsingri
Copy link

How many learner nodes can an etcd cluster support? Does it make sense to replace watches with learner nodes? Can a learner remain a learner indefinitely?

@jingyih
Copy link
Contributor Author

jingyih commented Sep 1, 2019

How many learner nodes can an etcd cluster support?

Currently in 3.4, the maximum number of learner is 1. This may increase in future.

Does it make sense to replace watches with learner nodes?

Currently learner does not support watch API. To scale watch you might want to use: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/grpc_proxy.md#scalable-watch-api

Can a learner remain a learner indefinitely?

Yes. Currently in 3.4 a leaner will not be auto promoted.

@stale
Copy link

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

7 participants