Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditions for non-Fatal conditions #2394

Closed
mattmoor opened this issue Nov 2, 2018 · 17 comments
Closed

Conditions for non-Fatal conditions #2394

mattmoor opened this issue Nov 2, 2018 · 17 comments
Assignees
Labels
area/API API objects and controllers kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Well-understood/specified features, ready for coding. kind/spec Discussion of how a feature should be exposed to customers.
Milestone

Comments

@mattmoor
Copy link
Member

mattmoor commented Nov 2, 2018

cc @n3wscott @sixolet @evankanderson @steren @cooperneil

tl;dr This issue proposes the introduction of new non-Fatal sub-conditions, which would NOT result in a change to the "happy condition" on changes. These would be suitable for non-Fatal warnings about the state of the resource.

Feedback welcome.

Background

Today, all subconditions contribute towards overall readiness, per errors.md:

  • True when the reconciliation has succeeded. Once all transition conditions have succeeded, the "happy state" condition should be set to True.

To manage these conditions, we use a shared library (thanks @n3wscott ) that defines the following for managing the "happy" (e.g. Ready) condition based on the collective state of the rest:

	// MarkTrue sets the status of t to true, and then marks the happy condition to
	// true if all other dependents are also true.
	MarkTrue(t ConditionType)

	// MarkUnknown sets the status of t to Unknown and also sets the happy condition
	// to Unknown if no other dependent condition is in an error state.
	MarkUnknown(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkFalse sets the status of t and the happy condition to False.
	MarkFalse(t ConditionType, reason, messageFormat string, messageA ...interface{})

This inherently makes all of our sub-conditions what I'll call "Fatal".

Problems of Today

One of the places this bites us today is the Active condition, where we need to surface inactivity (so the Route can hook in the activator pre-stop until #1997 happens). The unfortunate side-effect of this is that the overall Revision reports Ready=False, which propagates to Configuration and Service:

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable

One can imagine other scenarios where surfacing a warning is preferable to an outright failure state.

Proposal

I propose that we relax the requirement that all sub-conditions (potentially) trigger happy-state changes, and partition the sub-conditions of a type as follows:

  • "Fatal" conditions: If a particular condition is one of these, then it participates in readiness as today.
  • "non-Fatal" conditions (aka "Warnings"): If a particular condition is one of these, then it will not effect readiness at all, and is purely informational.

With this, the Revision status above would change to:

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable
@knative-prow-robot knative-prow-robot added area/API API objects and controllers kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Well-understood/specified features, ready for coding. kind/spec Discussion of how a feature should be exposed to customers. labels Nov 2, 2018
@n3wscott
Copy link
Contributor

n3wscott commented Nov 2, 2018

looks good to me

It might be nice to add something like log level to the condition?

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      role: "informative"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable

default role or no role == main line happy or fatal condition.

@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

@n3wscott My bias is to not, since these are a straight copy from the K8s API Principles.

@sixolet
Copy link
Contributor

sixolet commented Nov 2, 2018

Can we call them "terminal" and "non-terminal"?

I nominate that (conventionally) terminal conditions end in Ready (or possibly Succeeded too), like RoutesReady. I'd like to be able to tell the difference at a glance, if possible.

@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

@sixolet I think this is difficult until we have API versioning (cc @jonjohnsonjr )

@sixolet
Copy link
Contributor

sixolet commented Nov 2, 2018

Oh, as in we already have conditions we want to be terminal named random other things you mean? Yeah, maybe "going forward we do this"?

What other ways might we have of seeing at a glance which conditions are terminal? We can't add a field (the condition schema comes from k8s)

I don't think I want to write code to depend on the names of conditions, I just want my brain to be able to sort them pleasantly.

@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

as in we already have conditions we want to be terminal named random other things you mean?

Exactly.

I don't think I want to write code to depend on the names of conditions

I'm curious if/how the influence of the subcondition on the Ready condition is useful/relevant for the client to know? I think that in the most interesting cases for warnings, a condition turns False, but doesn't turn Ready=False.

I am curious what patterns you are thinking about that might not fit this naive mold?

@n3wscott
Copy link
Contributor

n3wscott commented Nov 2, 2018

The proposal given was suppose to be how conditions worked but I think it just has a bug.

@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

@n3wscott I have a fix :)

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 2, 2018
This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision.  In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic.

Peripherally related to: knative#2394
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 2, 2018
This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision.  In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic.

Peripherally related to: knative#2394
@sixolet
Copy link
Contributor

sixolet commented Nov 2, 2018

The subconditions of Ready are most interesting in showing the user their progress as the components of being Ready get ticked off one-by-one. For that, we wouldn't show warning conditions, we'd just show the warning after we've resolved whatever else has happened.

For example, when deploying you might see:

 ⠷ Creating Revision
 . Migrating Traffic

And then later

✔️ Created Revision
✔️ Migrated Traffic
SUCCESS

Representing the ConfigurationsReady and RoutesReady status conditions of Service, respectively. (or the CLI might dig deeper into the status conditions of Revision, and display a mix of status conditions for the two as stages)

Either way, non-terminal conditions aren't things we check off exactly, so we wouldn't include them.

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 2, 2018
This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision.  In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic.

Peripherally related to: knative#2394
@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

That does beg the question of how we show them :)

knative-prow-robot pushed a commit that referenced this issue Nov 2, 2018
This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision.  In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic.

Peripherally related to: #2394
@evankanderson
Copy link
Member

evankanderson commented Nov 2, 2018

Do non-terminal statuses need to be Conditions? What if Active were another struct alongside Condition, rather than adding additional properties to the Condition?

status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable
    active:
      lastTransitionTime: 2018-11-02T17:32:57Z
      status: "False"

Alternatively, we could have a generic Conditions-like-thing called "States" that provided some sort of lifecycle tracking if desired.

@sixolet
Copy link
Contributor

sixolet commented Nov 2, 2018

I'd like to see non-terminal statuses be Condition-like, though I don't need them to be in Conditions. That way vendors can add warnings about arbitrary vendor-specific things as needed.

mattmoor added a commit to mattmoor/pkg that referenced this issue Nov 2, 2018
This change updates the `ConditionSet` datatype to treat the set of conditions that it is initially supplied with as the set of "terminal" conditions, where as before:
1. If all `True`, results in `Ready=True`
2. If any `False`, results in `Ready=False`
3. Otherwise, `Ready=Unknown`

However, in additional to this initial "terminal" set, other conditions may be set that are "non-terminal" and have no influence over the "happy" condition (e.g. `Ready`).

Related to: knative/serving#2394
@evankanderson
Copy link
Member

Is active a non-terminal status of the sort that would suggest a warning? I think a facility for warnings is probably useful, but I want to make sure that we don't conflate the two. (Given the errors we were seeing where a Revision would go Ready->False when the revision was temporarily Active=False, I'm not sure we'd want to show a warning about Active=False if it's a normal part of successful deployment.)

@mattmoor
Copy link
Member Author

mattmoor commented Nov 2, 2018

I could see some users wanting a warning indication that hitting their service would incur additional latency due to cold starts. (shrug)

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018
@mattmoor
Copy link
Member Author

mattmoor commented Nov 7, 2018

We decided to move ahead with this in the API Working Group today. In addition to the initial proposal, we want to add a sort of "log level" to indicate the severity of subconditions.

@evankanderson added from slack:

I'm positive on adding a severity enum with a small number of values. I'm thinking Error, Warning, Info with Info as the default.

I think this field name is clear, as are the values. I think the one thing I'd quibble about is the "default" aspect. I think we instead want:

  • "terminal" conditions manifest as severity: Error (this should probably be the "default" when this is missing to properly support legacy, which are all "terminal"), and
  • "non-terminal" conditions manifest as severity: Warning (we could support Info here as well with changes to the ConditionSet interface, but I'm inclined to punt on Info and such a breaking refactor as an optional follow-up).

@sixolet
Copy link
Contributor

sixolet commented Nov 7, 2018

I like the severity field; I don't understand how Info is any more breaking than Warning though.

mattmoor added a commit to mattmoor/pkg that referenced this issue Nov 7, 2018
This adds a Severity field to our common Condition type, as outlined [here](knative/serving#2394 (comment)).

The only way Severity is populated in this change is:
 * Terminal conditions (passed at initial construction) always result in `severity: Error`,
 * non-Terminal conditions (added outside initial set) always result in `severity: Info`.

We should think about how we want to update the `Condition{Set,Manager}` interfaces to surface more control (e.g. `severity: Warning`).
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018
knative-prow-robot pushed a commit to knative/pkg that referenced this issue Nov 8, 2018
* Update the ConditionSet to support non-terminal conditions.

This change updates the `ConditionSet` datatype to treat the set of conditions that it is initially supplied with as the set of "terminal" conditions, where as before:
1. If all `True`, results in `Ready=True`
2. If any `False`, results in `Ready=False`
3. Otherwise, `Ready=Unknown`

However, in additional to this initial "terminal" set, other conditions may be set that are "non-terminal" and have no influence over the "happy" condition (e.g. `Ready`).

Related to: knative/serving#2394

* Add severity handling to ConditionSet.

This adds a Severity field to our common Condition type, as outlined [here](knative/serving#2394 (comment)).

The only way Severity is populated in this change is:
 * Terminal conditions (passed at initial construction) always result in `severity: Error`,
 * non-Terminal conditions (added outside initial set) always result in `severity: Info`.

We should think about how we want to update the `Condition{Set,Manager}` interfaces to surface more control (e.g. `severity: Warning`).
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018
mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018
@mattmoor
Copy link
Member Author

mattmoor commented Nov 9, 2018

We've picked up changes from knative/pkg that do this, and Active is about to become non-terminal in #2397

@mattmoor mattmoor closed this as completed Nov 9, 2018
knative-prow-robot pushed a commit that referenced this issue Nov 9, 2018
@mattmoor mattmoor modified the milestone: Serving 0.3 Nov 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API API objects and controllers kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Well-understood/specified features, ready for coding. kind/spec Discussion of how a feature should be exposed to customers.
Projects
None yet
Development

No branches or pull requests

5 participants