Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix control plane node join logic #745

Merged
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions pkg/cloud/aws/actuators/machine/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,14 @@ go_test(
embed = [":go_default_library"],
deps = [
"//pkg/apis/awsprovider/v1alpha1:go_default_library",
"//pkg/cloud/aws/actuators:go_default_library",
"//pkg/cloud/aws/filter:go_default_library",
"//pkg/cloud/aws/services/awserrors:go_default_library",
"//pkg/cloud/aws/services/ec2/mock_ec2iface:go_default_library",
"//vendor/github.com/aws/aws-sdk-go/aws:go_default_library",
"//vendor/github.com/aws/aws-sdk-go/service/ec2:go_default_library",
"//vendor/github.com/aws/aws-sdk-go/service/ec2/ec2iface:go_default_library",
"//vendor/github.com/golang/mock/gomock:go_default_library",
"//vendor/k8s.io/apimachinery/pkg/apis/meta/v1:go_default_library",
"//vendor/sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1:go_default_library",
"//vendor/sigs.k8s.io/cluster-api/pkg/controller/machine:go_default_library",
Expand Down
36 changes: 25 additions & 11 deletions pkg/cloud/aws/actuators/machine/actuator.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,37 +96,51 @@ func machinesEqual(m1 *clusterv1.Machine, m2 *clusterv1.Machine) bool {
return m1.Name == m2.Name && m1.Namespace == m2.Namespace
}

// isNodeJoin determines if a machine, in scope, should join of the cluster.
func (a *Actuator) isNodeJoin(scope *actuators.MachineScope, controlPlaneMachines []*clusterv1.Machine) (bool, error) {
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
switch set := scope.Machine.ObjectMeta.Labels["set"]; set {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related to this PR, but instead of switching on the set label, this could just check if scope.Machine.Spec.Versions.ControlPlane != ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to determine whether or not this machine in scope should join the cluster or not and also be able to handle all "kinds" of machines- controlplane, worker, and any other we might add later.

scope.Machine.Spec.Versions.ControlPlane != "" will just tell us whether this is a controlplane machine or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do think that we should be able to potentially special case other solutions as well, but currently we have two different ways of determining a control-plane machine within cluster-api upstream (for the purposes of clusterctl) and per-provider. If we can standardize on a single solution for controlplane or not, then it is less cognitive overhead for the end user.

Removing the requirement of the set label for control plane instances would also reduce user confusion when trying to deploy a MachineSet or MachineDeployment, where it isn't overly clear that they need to include those in the labels for the template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on not having 2 ways of doing the same. As it is not directly related to this PR, I'll follow-up with a PR for this.

case "node":
// non control plane machines will always join the cluster.
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
return true, nil
case "controlplane":
// control plane machines will join the cluster if the cluster has an existing control plane.
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
controlplaneExists := false
var err error
for _, cm := range controlPlaneMachines {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some type of locking here to avoid parallel initialization between multiple control plane hosts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! But the upstream cluster deployer doesn't support multiple controlplane nodes in the namespace.
https://github.com/kubernetes-sigs/cluster-api/blob/master/cmd/clusterctl/clusterdeployer/clusterclient/clusterclient.go#L1045
So there will be no racing to be the first controlplane machine.
Once the upstream master start supporting multiple controlplane machines in the cluster, we'll have to make isNodeJoin thread safe.

Maybe worth adding comments about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created this issue kubernetes-sigs/cluster-api#925 and I'll link it in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, multiple control-plane nodes should work today if using clusterctl it will instantiate them serially. We even have a make target for easy testing make create-cluster-ha

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serial instantiation is more for the benefit of kubeadm which will not support parallel init and controlplane join until v1.15.

Copy link
Contributor Author

@ashish-amarnath ashish-amarnath May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I had forgotten about that. Thanks for reminding. I am thinking do we want to have something cluster spec to synchronize. Also, implementing that will be kinda expanding the scope of this fix. Do you think we can address that in a follow-up PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@detiber This PR, this is a bug fix. We can make isNodeJoin thread safe as an enhancement and fix to this issue #763
I don't think it should block this PR, do you disagree?

m, err := actuators.NewMachineScope(actuators.MachineScopeParams{
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
Machine: cm,
Cluster: scope.Cluster,
Client: a.clusterClient,
Logger: a.log,
Machine: cm,
Cluster: scope.Cluster,
Client: a.clusterClient,
Logger: a.log,
AWSClients: scope.AWSClients,
})

if err != nil {
return false, errors.Wrapf(err, "failed to create machine scope for machine %q", cm.Name)
a.log.V(2).Info("failed to create machine scope for machine", "machine-name", cm.Name, "machine-namespace", cm.Namespace)
continue
}

ec2svc := ec2.NewService(m.Scope)

ok, err := ec2svc.MachineExists(m)
controlplaneExists, err = ec2svc.MachineExists(m)
if err != nil {
return false, errors.Wrapf(err, "failed to verify existence of machine %q", m.Name())
return false, errors.Wrapf(err, "failed to verify existence of machine %q in namespace %q", m.Machine.Name, m.Machine.Namespace)
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
}

a.log.V(2).Info("Machine joining control plane", "machine-name", scope.Machine.Name, "machine-namespace", scope.Machine.Name, "should-join-control-plane", ok)
return ok, nil
if !controlplaneExists {
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
ashish-amarnath marked this conversation as resolved.
Show resolved Hide resolved
a.log.V(2).Info("Controlplane machine does not exist", "machine-name", m.Machine.Name, "machine-namespace", m.Machine.Namespace)
continue
} else {
a.log.V(2).Info("Controlplane machine exists", "machine-name", m.Machine.Name, "machine-namespace", m.Machine.Namespace)
break
}
}

return false, nil
a.log.V(2).Info("Machine joining control plane", "machine-name", scope.Machine.Name, "machine-namespace", scope.Machine.Name, "should-join-control-plane", controlplaneExists)
return controlplaneExists, err

default:
return false, errors.Errorf("Unknown value %q for label `set` on machine %q, skipping machine creation", set, scope.Machine.Name)
return false, errors.Errorf("Unknown value %q for label `set` on machine %q", set, scope.Machine.Name)
}
}

Expand Down
Loading