Stronger link between Machine* <-> Cluster #728

Merged

Conversation

vincepri
Member

@vincepri vincepri commented Feb 5, 2019

Signed-off-by: Vince Prignano <vincepri@vmware.com>

What this PR does / why we need it:
This PR allows users to specify which cluster a machine belongs to via the use of labels or annotations. It also solves a major UX issue where users can't create multiple clusters per namespace.

Given that some providers are only using the machine actuator and don't require a cluster, if the cluster name is empty the machine will operate on a nil cluster.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #41

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 5, 2019
@vincepri vincepri force-pushed the stronger-link-machine-cluster branch 2 times, most recently from 2f4f2a6 to cc812ca Compare February 5, 2019 00:07
@vincepri
Member Author

vincepri commented Feb 5, 2019

/assign @roberthbailey @detiber

config/crds/cluster_v1alpha1_machine.yaml (outdated)
@@ -194,9 +194,14 @@ func (r *ReconcileMachine) Reconcile(request reconcile.Request) (reconcile.Resul
}

func (r *ReconcileMachine) getCluster(ctx context.Context, machine *clusterv1.Machine) (*clusterv1.Cluster, error) {
if machine.Labels["cluster"] == "" {
Contributor

The reason this is using a label is still so that people can fake a cluster-id if they don't want to use the cluster object?

Member Author

Someone brought up that labels were preferred during our weekly meeting. This wouldn't make the cluster optional, but users might use a dummy cluster if they don't want to make use of the Cluster type.

Member

The reason labels and/or annotations were mentioned as being preferred is that they are a much weaker link between the objects.

Contributor

Does the "cluster" label become a required field in machineSpec? I am debating whether we should provide flexibility here: if the "cluster" label has not been set, use the namespace to find the cluster.

Member Author

I think we should either be explicit or return no cluster. Returning the first cluster in a namespace (which is the current behavior) might bring unexpected behavior for consumers.
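To make the "be explicit or return no cluster" rule concrete, here is a minimal sketch. It is not the PR's actual implementation: `Cluster` and `Machine` are simplified stand-ins for the clusterv1 types, the cluster list is passed in rather than fetched through a client, and returning an error for a label that names a missing cluster is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster and Machine are simplified stand-ins for the clusterv1 types;
// only the fields needed for the lookup are modeled.
type Cluster struct {
	Name      string
	Namespace string
}

type Machine struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// clusterLabel is the label key used in the diff under review.
const clusterLabel = "cluster"

// ErrClusterNotFound is a hypothetical error for a label that names a
// cluster which does not exist in the machine's namespace.
var ErrClusterNotFound = errors.New("cluster not found")

// getCluster resolves only the cluster a machine explicitly points at.
// An unset label yields (nil, nil) so machine-only providers keep working;
// there is no fallback to "the first cluster in the namespace".
func getCluster(clusters []Cluster, m Machine) (*Cluster, error) {
	name := m.Labels[clusterLabel] // a nil map safely yields ""
	if name == "" {
		return nil, nil
	}
	for i := range clusters {
		if clusters[i].Namespace == m.Namespace && clusters[i].Name == name {
			return &clusters[i], nil
		}
	}
	return nil, fmt.Errorf("machine %q in namespace %q: %w: %q",
		m.Name, m.Namespace, ErrClusterNotFound, name)
}

func main() {
	clusters := []Cluster{
		{Name: "a", Namespace: "default"},
		{Name: "b", Namespace: "default"},
	}
	labeled := Machine{Name: "m1", Namespace: "default",
		Labels: map[string]string{clusterLabel: "b"}}
	unlabeled := Machine{Name: "m2", Namespace: "default"}

	c, err := getCluster(clusters, labeled)
	fmt.Println(c != nil, err)

	c, err = getCluster(clusters, unlabeled)
	fmt.Println(c == nil, err)
}
```

With two clusters in the same namespace, the labeled machine resolves to exactly the cluster it names, while the unlabeled one gets no cluster instead of an arbitrary one.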

@maisem maisem Feb 6, 2019

If we were to use labels as the governing mechanism to select a cluster, I would imagine using something like a LabelSelector in the machine spec. e.g.

/// [MachineSpec]
// MachineSpec defines the desired state of Machine
type MachineSpec struct {
  ...
  ClusterSelector metav1.LabelSelector `json:"clusterSelector"`
}

Contributor

Regarding selectors: I think that's a mechanism to select a set of things and not necessarily a single thing. It would make more sense having a selector in the Cluster type to select machines. But this doesn't help much when going from machine to cluster.

If you're using a single label, you could consider a non-controller owner reference in the metadata. If you wanted to take it even further, machine deployments and machine sets could have this too. This has the added benefit of kubectl delete -n <namespace> cluster <clustername> doing a cascading deletion across the resources.
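As a rough illustration of the owner-reference idea, the sketch below uses local stand-in types: `OwnerReference` mirrors a subset of the fields of `metav1.OwnerReference`, while `Object`, `ownedBy`, and `dependents` are hypothetical helpers, and the API version string is assumed. It only shows which objects would be swept up by a cascading delete of the cluster; it does not model the real garbage collector.

```go
package main

import "fmt"

// OwnerReference is a local stand-in mirroring a subset of the fields of
// metav1.OwnerReference, so the sketch compiles without Kubernetes modules.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
	Controller *bool
}

// Object is a hypothetical, minimal view of an API object.
type Object struct {
	Kind            string
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// ownedBy builds a non-controller owner reference pointing at owner.
func ownedBy(owner Object, apiVersion string) OwnerReference {
	notController := false
	return OwnerReference{
		APIVersion: apiVersion,
		Kind:       owner.Kind,
		Name:       owner.Name,
		UID:        owner.UID,
		Controller: &notController,
	}
}

// dependents lists the objects that reference ownerUID, i.e. the ones a
// cascading delete of the owner would also remove.
func dependents(objs []Object, ownerUID string) []string {
	var names []string
	for _, o := range objs {
		for _, ref := range o.OwnerReferences {
			if ref.UID == ownerUID {
				names = append(names, o.Kind+"/"+o.Name)
				break
			}
		}
	}
	return names
}

func main() {
	cluster := Object{Kind: "Cluster", Name: "prod", UID: "uid-1"}
	// The API version is assumed for illustration.
	ref := ownedBy(cluster, "cluster.k8s.io/v1alpha1")

	objs := []Object{
		{Kind: "Machine", Name: "m1", UID: "uid-2", OwnerReferences: []OwnerReference{ref}},
		{Kind: "MachineSet", Name: "ms1", UID: "uid-3", OwnerReferences: []OwnerReference{ref}},
		{Kind: "Machine", Name: "standalone", UID: "uid-4"},
	}
	fmt.Println(dependents(objs, cluster.UID))
}
```

A machine without an owner reference is left alone, which matches the machine-only use case discussed in this thread.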

Member Author

@maisem I'm not sure how a cluster selector would work in this case. The label selector wouldn't be able to select a cluster by name given that the name is a field.

As @krousey pointed out, label selectors are better suited to querying lists of resources. The problem I see with ownerRef, though, is that it's a larger change and impacts deletion, so I think it should be revisited during the next iteration.

For this cycle I think having a prefixed label in the machine that points to a cluster name should fix the immediate issue and won't bring in many changes to the current logic and providers.

I'll open issues to address these long-term concerns, how does that sound?

Member

This has the added benefit of kubectl delete -n <namespace> cluster <clustername> doing a cascading deletion across the resources.

@krousey In the case of the AWS provider, the cloud-provider objects managed by the Cluster object are unable to be deleted while there are existing Machines, so the cascading deletion would not happen.

Contributor

@detiber Sounds like you want foreground cascading deletion then. https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion

@vincepri This has been an issue for a while. The ownerRef change is something that would need a wider discussion too.

config/crds/cluster_v1alpha1_machinedeployment.yaml (outdated)
config/crds/cluster_v1alpha1_machineset.yaml (outdated)
pkg/controller/machine/controller.go (outdated)
@@ -37,6 +37,9 @@ func TestReconcileRequest(t *testing.T) {
Name: "create",
Namespace: "default",
Finalizers: []string{v1alpha1.MachineFinalizer},
Labels: map[string]string{
Member

I'm wondering if we should use an annotation here rather than a label. That said, it would require modifying the clusterctl logic related to determining if the controlplane machine is "ready".

Member Author

The issue mentions a "stronger" link, and I guess a weak link is stronger than no link. Although I'm wondering if we should make this a field in the MachineSpec instead of a label. What are the pros of annotations instead?

Member

Getting consensus on a field in the MachineSpec might be difficult, but annotations are not as user-visible and are less likely to be modified by users.

Member Author

Got it. I'm thinking, though, that labels might be more useful for queries (for example, listing the machines in a cluster), whereas annotations are mostly extra metadata.

Member

@vincepri according to the k8s docs, labels are meant to be meaningful and relevant to users, but do not directly imply semantics to the core system: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

While one of the uses for annotations is said to be "Directives from the end-user to the implementations to modify behavior or engage non-standard features": https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/

Since the presence (or absence) of the label changes behavior in the underlying implementation, it would seem to fit more with the annotation use case rather than the label use case.

Contributor

kubeadm uses a label to mark master nodes. Since labels are searchable, maybe they are better for identifying information:

You can use either labels or annotations to attach metadata to Kubernetes objects. Labels can be used to select objects and to find collections of objects that satisfy certain conditions. In contrast, annotations are not used to identify and select objects.
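The "labels are searchable" argument can be sketched as a small equality-based filter. This is an illustration only: `Machine` is a simplified stand-in for the real type, `machinesInCluster` is a hypothetical helper, and the prefixed key `cluster.k8s.io/clusterName` is the one that appears in this PR's documentation change rather than the bare `cluster` key in the diff above.

```go
package main

import "fmt"

// Machine is a simplified stand-in for the clusterv1 Machine type.
type Machine struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// clusterNameLabel is the prefixed label key from this PR's docs change.
const clusterNameLabel = "cluster.k8s.io/clusterName"

// machinesInCluster emulates an equality-based label selector: it returns
// the names of machines whose cluster label equals clusterName.
// Annotations are deliberately ignored, since they are not selectable.
func machinesInCluster(machines []Machine, clusterName string) []string {
	var names []string
	for _, m := range machines {
		if v, ok := m.Labels[clusterNameLabel]; ok && v == clusterName {
			names = append(names, m.Name)
		}
	}
	return names
}

func main() {
	machines := []Machine{
		{Name: "m1", Labels: map[string]string{clusterNameLabel: "prod"}},
		{Name: "m2", Labels: map[string]string{clusterNameLabel: "staging"}},
		// The same information stored as an annotation is not matched.
		{Name: "m3", Annotations: map[string]string{clusterNameLabel: "prod"}},
		{Name: "m4"},
	}
	fmt.Println(machinesInCluster(machines, "prod"))
}
```

Against a real API server, the equivalent query would be a label selector such as `kubectl get machines -l cluster.k8s.io/clusterName=prod`; no comparable server-side filter exists for annotations.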

@vincepri vincepri force-pushed the stronger-link-machine-cluster branch 2 times, most recently from f470475 to 5dfe8c1 Compare February 5, 2019 22:06
@@ -194,9 +194,15 @@ func (r *ReconcileMachine) Reconcile(request reconcile.Request) (reconcile.Resul
}

func (r *ReconcileMachine) getCluster(ctx context.Context, machine *clusterv1.Machine) (*clusterv1.Cluster, error) {
	if machine.Labels["cluster"] == "" {
		klog.Warningf("Machine %q in namespace %q has no cluster label, returning nil", machine.Name, machine.Namespace)
		return nil, nil
Contributor

Shouldn't this return an error instead of a nil error?

Member Author

@vincepri vincepri Feb 6, 2019

See #728 (comment). I think it makes sense to allow Machines to function without a cluster, but I'm open to different opinions.

Member

This should probably be just info rather than a warning, see same discussion #644 (comment)

Contributor

@vincepri I am not sure I understand what "allow machines to function without a cluster" means. If it means that the machines should be created/updated/deleted, then we should return and allow the controller to perform the reconciliation. Correct me if I am wrong.

@enxebre
Member

enxebre commented Feb 6, 2019

This is somewhat related to #644.
Something that crossed my mind: for the use cases wanting to leverage only the machine API, it feels weird to force a cluster into the machine actuator interface https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machine/actuator.go#L28 It might make sense to consider removing the cluster from the machine actuator interface; then it'd be up to the actuator implementation to get the cluster label, ownerRef, or whatever we use to link the cluster, and use it.

@detiber
Member

detiber commented Feb 6, 2019

Something that crossed my mind: for the use cases wanting to leverage only the machine API, it feels weird to force a cluster into the machine actuator interface https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machine/actuator.go#L28 It might make sense to consider removing the cluster from the machine actuator interface; then it'd be up to the actuator implementation to get the cluster label, ownerRef, or whatever we use to link the cluster, and use it.

@enxebre I think it would be overly disruptive to change the machine actuator interface at this point, as we are trying to get to v1alpha1, especially since it would require non-trivial changes to all of the current provider implementations. I also don't think it makes sense to require provider implementations to recreate the same logic. I think we should leverage the proposal that @vincepri and @roberthbailey have started for post-v1alpha1 to refactor the types for this: https://docs.google.com/document/d/1pzXtwYWRsOzq5Ftu03O5FcFAlQE26nD3bjYBPenbhjg/edit#

@vincepri vincepri changed the title WIP: Stronger link between Machine* <-> Cluster Stronger link between Machine* <-> Cluster Feb 6, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2019
@vincepri vincepri force-pushed the stronger-link-machine-cluster branch 3 times, most recently from f95bc2b to f412e47 Compare February 6, 2019 20:36
@vincepri
Member Author

vincepri commented Feb 6, 2019

@detiber ptal

@detiber
Member

detiber commented Feb 6, 2019

@vincepri overall, lgtm. Are there any gitbook changes that are needed for this change?

@vincepri
Member Author

vincepri commented Feb 6, 2019

CC @davidewatson, are the changes for gitbook going into a different branch?

@davidewatson
Contributor

davidewatson commented Feb 6, 2019

@vincepri: The documentation source exists in the docs/book/ directory of the master branch. There is currently an incomplete PR describing how to regenerate the documentation and manually push it to the gh-pages branch. I'll work on completing this PR now and maybe we can push any changes for this PR afterward?

Machines can be associated with a Cluster using a custom label
`cluster.k8s.io/clusterName`. Providers using the `Cluster` controller must
provide the label which references the name of a cluster living in the same
namespace.
Member

This reads a bit awkwardly to me, as if the label is used by the Cluster controller.

What about:

Machines can optionally be associated with a Cluster using a custom label... If the label is set, then it must reference the name of a cluster residing in the same namespace.

Member Author

Reworded, thanks!

@vincepri vincepri force-pushed the stronger-link-machine-cluster branch from bf6bc41 to a30b436 Compare February 7, 2019 15:55
@detiber
Member

detiber commented Feb 7, 2019

/hold
/lgtm

lgtm, but adding hold to allow additional feedback prior to the bot merging.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 7, 2019
@vincepri
Member Author

vincepri commented Feb 7, 2019

/test pull-cluster-api-test

@vincepri vincepri force-pushed the stronger-link-machine-cluster branch from a30b436 to e23e476 Compare February 8, 2019 17:46
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2019
@vincepri
Member Author

vincepri commented Feb 8, 2019

@detiber ptal.

@detiber
Member

detiber commented Feb 8, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2019
@vincepri vincepri added this to the v1alpha1 milestone Feb 10, 2019
@detiber
Member

detiber commented Feb 11, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2019
@vincepri
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2019
Signed-off-by: Vince Prignano <vincepri@vmware.com>
@vincepri vincepri force-pushed the stronger-link-machine-cluster branch from e23e476 to 83e4671 Compare February 11, 2019 15:54
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2019
@vincepri
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2019
@vincepri
Member Author

@detiber
Member

detiber commented Feb 11, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit a3f503f into kubernetes-sigs:master Feb 11, 2019
@vincepri vincepri deleted the stronger-link-machine-cluster branch July 26, 2019 18:22
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Establish a stronger link between Machines, MachineSets, and MachineDeployments with their Cluster.