
Subresource status failing to stop infinite loop #2795

Closed
cacois opened this issue Apr 7, 2020 · 7 comments
Labels
triage/support Indicates an issue that is a support question.

cacois commented Apr 7, 2020

Bug Report

What did you do?

As documented here, in order to avoid CR status changes causing an infinite loop of the reconcile() function, we should make the status block a subresource, following steps outlined here.

However, we have found that even though we successfully set the CRD status attribute as a subresource, we still see infinite looping of reconcile() when the status is changed. A simple example to reproduce, using the canonical PodSet operator demo, would use the following type definition:

// +kubebuilder:subresource:status
// +kubebuilder:resource:path=podsets,scope=Namespaced
type PodSet struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   PodSetSpec   `json:"spec,omitempty"`
	Status PodSetStatus `json:"status,omitempty"`
}

which results in a CRD with a subresource, as expected:

...
spec:
  group: app.example.com
  names:
    kind: PodSet
    listKind: PodSetList
    plural: podsets
    singular: podset
  scope: Namespaced
  subresources:
    status: {}
  validation:
...

Now place the following code in the controller's reconcile() function:

// NOTE: the 'Updated' timestamp will change each time reconcile() is called
// if status is not treated correctly as a subresource.
podSet.Status.Updated = time.Now().String()
err = r.client.Status().Update(context.TODO(), podSet)
if err != nil {
	reqLogger.Error(err, "failed to update the podSet status")
	return reconcile.Result{}, err
}

This updates the 'Updated' timestamp on the status object every time reconcile() is called. According to the documentation, this change should not trigger another call to reconcile() because the status object is a subresource, thus avoiding an infinite loop. However, the infinite loop still occurs. Is the documentation wrong, is there a bug, or am I missing something?

What did you expect to see?

When setting the 'status' attribute as a subresource of a CRD, I should be able to update status in the reconcile() function without re-invoking the reconcile() loop.

What did you see instead? Under which circumstances?

When updating the status attribute, even though it's a subresource, I see an infinite loop of the reconcile() function.

Environment

  • operator-sdk version: v0.15.2
  • go version: 1.13.8
  • Kubernetes version information: 1.16.2
  • Kubernetes cluster kind: openshift

  • Are you writing your operator in ansible, helm, or go? go


Additional context
I've created a functional reproduction of the issue in this repo

@joelanford joelanford self-assigned this Apr 13, 2020
@joelanford joelanford added the triage/support Indicates an issue that is a support question. label Apr 13, 2020
@joelanford
Member

This is the way Kubernetes control loops are intended to work. Whenever a change occurs in your CR, it will trigger an event to reconcile your CR again. To solve your problem, you can either use a predicate (e.g. the GenerationChangedPredicate, which drops update events that do not change metadata.generation, such as status updates), or you can make sure you only ever update your CR status Updated field when your reconcile loop actually makes a change to converge the current state to the desired state.

We'll take an action to see if some documentation needs to be updated to make this more clear.

@cacois
Author

cacois commented Apr 13, 2020

@joelanford I'm very confused then. The docs specifically say:

Under normal circumstances, If we were updating our resource every time we execute the reconcile cycle, this would trigger an update event which in turn would trigger a reconcile cycle in an endless loop.

For this reason, Status should be modeled as a subresource as explained here.

Is this not specifically saying that this is not the way the control loop should work when updating a subresource (as opposed to a normal resource)?

I can't see any other way to read this, and I have found the same sentiment documented in a number of places.

@joelanford
Member

joelanford commented Apr 13, 2020

@cacois Yeah that seems misleading at best.

Status updates, even if using status subresources, bump the generation on CRs. So while changing the status as a subresource doesn’t directly change the main resource, it does for all intents and purposes because the Kubernetes control plane will make that change automatically.

I’ll look into getting that doc updated.

EDIT: This was not correct. Thanks @raffaelespazzoli for the clarification.

@cacois
Author

cacois commented Apr 14, 2020

@joelanford Thanks for the explanation! I assume those docs aren't open source, otherwise I'd offer to help. I'll look into the other options you provided.

Thanks again!

@raffaelespazzoli
Contributor

raffaelespazzoli commented Apr 14, 2020

Let me try to clarify. By using status as a subresource, when we update just the status with this syntax: r.Status().Update(context.Background(), instance), the operator still receives an update event.
However, in this update event the generation field is not changed. So if we have a predicate that discards update events where the generation field has not changed, we avoid an infinite loop.

@joelanford
Member

@raffaelespazzoli Thanks for the clarification. I was getting mixed up in my previous comments.

So TL;DR (and to make sure my understanding is now correct) 🙂

  1. Using status subresource prevents resourceVersion from getting bumped in the main resource.
  2. Status changes do not affect generation (even if not using the status subresource)
  3. Updates to status subresources still do cause watchers to see an update event for the main resource.
  4. To reconcile events where only the spec has changed, use the GenerationChangedPredicate

@raffaelespazzoli
Contributor

Point #2 is wrong. generation is increased by the apiserver every time something changes in a resource (but not if the change goes through a subresource).
