Fix control plane node join logic #745
Conversation
I'm having a little trouble with the `isNodeJoin` function. I think it's grown in scope beyond what the name implies. Could you please add a godoc comment describing what the function is trying to do?
This PR is untested. I am working on getting myself an AWS account to test this. 🤞
I need some help testing this PR. I haven't managed to get myself an AWS account yet :(
Force-pushed from 6d0a5a7 to 385177b
/retest
Well, the code here seems to do the right thing on my first try. Unfortunately, that didn't fully work because of kubernetes/kubeadm#1432: I had to remove the etcd member by hand.
Also, I know this is a bigger question, but I did get to wondering if "machine existence" is the right way to decide whether to init or join. Do you think there's a way to depend on something in the cluster's status?
@sethp-nr Thanks for testing this out for me. 🙏 Yeah, I agree that we should make that decision based on cluster status and not merely on machine existence, e.g. via https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/pkg/apis/awsprovider/v1alpha1/awsclusterproviderstatus_types.go. This would be something the new data model will have. @vincepri can confirm.
Oh, good catch – I see the new status type now. It sounds like whoever is responsible for that first init would need to set it.
And it was my pleasure to test this out – thank you for a quick turnaround!
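For context, a minimal sketch of the status-based decision discussed above might look like the following. The `Initialized` field is an assumption for illustration only; the current `AWSClusterProviderStatus` does not define it.

```go
package main

import "fmt"

// clusterStatus stands in for a provider status object. The
// Initialized field is hypothetical; the current
// AWSClusterProviderStatus type does not define it.
type clusterStatus struct {
	Initialized bool
}

// shouldJoin sketches the status-based decision: the first machine to
// observe an uninitialized cluster runs kubeadm init, and every
// machine after that joins.
func shouldJoin(s clusterStatus) bool {
	return s.Initialized
}

func main() {
	fmt.Println(shouldJoin(clusterStatus{Initialized: false})) // false: this machine should init
	fmt.Println(shouldJoin(clusterStatus{Initialized: true}))  // true: this machine should join
}
```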
@@ -96,37 +96,50 @@ func machinesEqual(m1 *clusterv1.Machine, m2 *clusterv1.Machine) bool {
	return m1.Name == m2.Name && m1.Namespace == m2.Namespace
}

// isNodeJoin determines if a machine, in scope, should join the cluster.
func (a *Actuator) isNodeJoin(scope *actuators.MachineScope, controlPlaneMachines []*clusterv1.Machine) (bool, error) {
	switch set := scope.Machine.ObjectMeta.Labels["set"]; set {
Not directly related to this PR, but instead of switching on the `set` label, this could just check if `scope.Machine.Spec.Versions.ControlPlane != ""`.
The idea is to determine whether or not the machine in scope should join the cluster, and also to handle all "kinds" of machines: controlplane, worker, and any other we might add later. `scope.Machine.Spec.Versions.ControlPlane != ""` will just tell us whether this is a controlplane machine or not.
Yes, I do think that we should be able to potentially special-case other solutions as well, but currently we have two different ways of determining a control-plane machine: within cluster-api upstream (for the purposes of clusterctl) and per-provider. If we can standardize on a single way of deciding whether a machine is controlplane or not, then it is less cognitive overhead for the end user.
Removing the requirement of the `set` label for control plane instances would also reduce user confusion when trying to deploy a MachineSet or MachineDeployment, where it isn't overly clear that they need to include those in the labels for the template.
I agree on not having two ways of doing the same thing. As it is not directly related to this PR, I'll follow up with a separate PR.
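As an illustration of the suggested check, here is a minimal sketch; the types are simplified stand-ins for `clusterv1.Machine`, not the real API types.

```go
package main

import "fmt"

// machine is a simplified stand-in for clusterv1.Machine.
type machine struct {
	Labels map[string]string
	Spec   machineSpec
}

type machineSpec struct {
	Versions machineVersionInfo
}

type machineVersionInfo struct {
	// ControlPlane is non-empty only for control-plane machines.
	ControlPlane string
}

// isControlPlane checks the version field rather than the "set" label,
// so MachineSet/MachineDeployment users need not remember to add the
// label to their templates.
func isControlPlane(m machine) bool {
	return m.Spec.Versions.ControlPlane != ""
}

func main() {
	cp := machine{Spec: machineSpec{Versions: machineVersionInfo{ControlPlane: "v1.14.0"}}}
	worker := machine{}
	fmt.Println(isControlPlane(cp))     // true
	fmt.Println(isControlPlane(worker)) // false
}
```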
		return true, nil
	case "controlplane":
		// Controlplane machines will join the cluster if the cluster has an existing control plane.
		controlplaneExists := false
		var err error
		for _, cm := range controlPlaneMachines {
Should there be some type of locking here to avoid parallel initialization between multiple control plane hosts?
Yes! But the upstream cluster deployer doesn't support multiple controlplane nodes in the namespace:
https://github.com/kubernetes-sigs/cluster-api/blob/master/cmd/clusterctl/clusterdeployer/clusterclient/clusterclient.go#L1045
So there will be no racing to be the first controlplane machine. Once upstream master starts supporting multiple controlplane machines in the cluster, we'll have to make `isNodeJoin` thread safe.
Maybe worth adding comments about this.
I created issue kubernetes-sigs/cluster-api#925 and will link it in the code.
So, multiple control-plane nodes should work today if using `clusterctl`; it will instantiate them serially. We even have a make target for easy testing: `make create-cluster-ha`.
The serial instantiation is more for the benefit of kubeadm, which will not support parallel `init` and controlplane `join` until v1.15.
Correct. I had forgotten about that, thanks for the reminder. I am wondering whether we want something in the cluster spec to synchronize on. Also, implementing that would expand the scope of this fix. Do you think we can address it in a follow-up PR?
add comments about thread safety of `isNodeJoin`
use shared scope instead of creating one for each controlplane
verify
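A minimal sketch of what a thread-safe `isNodeJoin` could look like once parallel control-plane reconciliation is supported; the mutex and the `exists` map are illustrative assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// machine is a simplified stand-in for clusterv1.Machine.
type machine struct{ Name string }

// actuator sketches the thread-safety idea from the thread above:
// guard the init-vs-join decision with a mutex so that two
// control-plane machines reconciled in parallel cannot both decide to
// run kubeadm init. The exists map stands in for an instance lookup.
type actuator struct {
	mu     sync.Mutex
	exists map[string]bool
}

func (a *actuator) isNodeJoin(m machine, controlPlaneMachines []machine) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	// Join if any control-plane machine already exists; otherwise this
	// machine is the first one and should init.
	for _, cm := range controlPlaneMachines {
		if a.exists[cm.Name] {
			return true
		}
	}
	return false
}

func main() {
	a := &actuator{exists: map[string]bool{"cp-0": true}}
	fmt.Println(a.isNodeJoin(machine{Name: "cp-1"}, []machine{{Name: "cp-0"}})) // true: join
}
```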
/assign @vincepri
lgtm!
/lgtm
@chuckha needs `/approve`
/approve
headdesk.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ashish-amarnath, chuckha
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
What this PR does / why we need it:
Fixes the control plane node join logic
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #740
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: