Conditions for non-Fatal conditions #2394

mattmoor · 2018-11-02T19:59:01Z

cc @n3wscott @sixolet @evankanderson @steren @cooperneil

tl;dr This issue proposes the introduction of new non-Fatal sub-conditions, which would NOT result in a change to the "happy condition" on changes. These would be suitable for non-Fatal warnings about the state of the resource.

Feedback welcome.

Background

Today, all subconditions contribute towards overall readiness, per errors.md:

True when the reconciliation has succeeded. Once all transition conditions have succeeded, the "happy state" condition should be set to True.

To manage these conditions, we use a shared library (thanks @n3wscott ) that defines the following for managing the "happy" (e.g. Ready) condition based on the collective state of the rest:

	// MarkTrue sets the status of t to true, and then marks the happy condition to
	// true if all other dependents are also true.
	MarkTrue(t ConditionType)

	// MarkUnknown sets the status of t to Unknown and also sets the happy condition
	// to Unknown if no other dependent condition is in an error state.
	MarkUnknown(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkFalse sets the status of t and the happy condition to False.
	MarkFalse(t ConditionType, reason, messageFormat string, messageA ...interface{})

This inherently makes all of our sub-conditions what I'll call "Fatal".

Problems of Today

One of the places this bites us today is the Active condition, where we need to surface inactivity (so the Route can hook in the activator pre-stop until #1997 happens). The unfortunate side-effect of this is that the overall Revision reports Ready=False, which propagates to Configuration and Service:

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable

One can imagine other scenarios where surfacing a warning is preferable to an outright failure state.

Proposal

I propose that we relax the requirement that all sub-conditions (potentially) trigger happy-state changes, and partition the sub-conditions of a type as follows:

"Fatal" conditions: If a particular condition is one of these, then it participates in readiness as today.
"non-Fatal" conditions (aka "Warnings"): If a particular condition is one of these, then it will not effect readiness at all, and is purely informational.

With this, the Revision status above would change to:

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable

The text was updated successfully, but these errors were encountered:

n3wscott · 2018-11-02T20:36:21Z

looks good to me

It might be nice to add something like log level to the condition?

  status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:32:57Z
      message: The target is not receiving traffic.
      reason: NoTraffic
      status: "False"
      role: "informative"
      type: Active
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable

default role or no role == main line happy or fatal condition.

mattmoor · 2018-11-02T20:38:31Z

@n3wscott My bias is to not, since these are a straight copy from the K8s API Principles.

sixolet · 2018-11-02T20:39:26Z

Can we call them "terminal" and "non-terminal"?

I nominate that (conventionally) terminal conditions end in Ready (or possibly Succeeded too), like RoutesReady. I'd like to be able to tell the difference at a glance, if possible.

mattmoor · 2018-11-02T20:41:52Z

@sixolet I think this is difficult until we have API versioning (cc @jonjohnsonjr )

sixolet · 2018-11-02T20:45:49Z

Oh, as in we already have conditions we want to be terminal named random other things you mean? Yeah, maybe "going forward we do this"?

What other ways might we have of seeing at a glance which conditions are terminal? We can't add a field (the condition schema comes from k8s)

I don't think I want to write code to depend on the names of conditions, I just want my brain to be able to sort them pleasantly.

mattmoor · 2018-11-02T20:55:50Z

as in we already have conditions we want to be terminal named random other things you mean?

Exactly.

I don't think I want to write code to depend on the names of conditions

I'm curious if/how the influence of the subcondition on the Ready condition is useful/relevant for the client to know? I think that in the most interesting cases for warnings, a condition turns False, but doesn't turn Ready=False.

I am curious what patterns you are thinking about that might not fit this naive mold?

n3wscott · 2018-11-02T20:56:50Z

The proposal given was suppose to be how conditions worked but I think it just has a bug.

mattmoor · 2018-11-02T21:10:20Z

@n3wscott I have a fix :)

This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision. In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic. Peripherally related to: knative#2394

sixolet · 2018-11-02T21:34:36Z

The subconditions of Ready are most interesting in showing the user their progress as the components of being Ready get ticked off one-by-one. For that, we wouldn't show warning conditions, we'd just show the warning after we've resolved whatever else has happened.

For example, when deploying you might see:

 ⠷ Creating Revision
 . Migrating Traffic

And then later

✔️ Created Revision
✔️ Migrated Traffic
SUCCESS

Representing the ConfigurationsReady and RoutesReady status conditions of Service, respectively. (or the CLI might dig deeper into the status conditions of Revision, and display a mix of status conditions for the two as stages)

Either way, non-terminal conditions aren't things we check off exactly, so we wouldn't include them.

This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision. In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic. Peripherally related to: knative#2394

mattmoor · 2018-11-02T21:37:34Z

That does beg the question of how we show them :)

This adjusts the condition that the activator keys off of to determine whether it can start forwarding requests to the Revision. In the current world, this change is just splitting hairs because these conditions end up being largely the same; however, this is relevant cleanup if we want to shift to a world where `Active=False` does not trigger `Ready=False` because requests to the activator would incorrectly see an `IsReady()` Revision and incorrectly forward traffic. Peripherally related to: #2394

evankanderson · 2018-11-02T22:17:50Z

Do non-terminal statuses need to be Conditions? What if Active were another struct alongside Condition, rather than adding additional properties to the Condition?

status:
    conditions:
    - lastTransitionTime: 2018-11-02T17:11:37Z
      status: "True"
      type: BuildSucceeded
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ContainerHealthy
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: Ready
    - lastTransitionTime: 2018-11-02T17:11:42Z
      status: "True"
      type: ResourcesAvailable
    active:
      lastTransitionTime: 2018-11-02T17:32:57Z
      status: "False"

Alternatively, we could have a generic Conditions-like-thing called "States" that provided some sort of lifecycle tracking if desired.

sixolet · 2018-11-02T22:26:35Z

I'd like to see non-terminal statuses be Condition-like, though I don't need them to be in Conditions. That way vendors can add warnings about arbitrary vendor-specific things as needed.

This change updates the `ConditionSet` datatype to treat the set of conditions that it is initially supplied with as the set of "terminal" conditions, where as before: 1. If all `True`, results in `Ready=True` 2. If any `False`, results in `Ready=False` 3. Otherwise, `Ready=Unknown` However, in additional to this initial "terminal" set, other conditions may be set that are "non-terminal" and have no influence over the "happy" condition (e.g. `Ready`). Related to: knative/serving#2394

evankanderson · 2018-11-02T22:56:41Z

Is active a non-terminal status of the sort that would suggest a warning? I think a facility for warnings is probably useful, but I want to make sure that we don't conflate the two. (Given the errors we were seeing where a Revision would go Ready->False when the revision was temporarily Active=False, I'm not sure we'd want to show a warning about Active=False if it's a normal part of successful deployment.)

mattmoor · 2018-11-02T23:09:47Z

I could see some users wanting a warning indication that hitting their service would incur additional latency due to cold starts. (shrug)

Related to: knative#2394

mattmoor · 2018-11-07T22:49:51Z

We decided to move ahead with this in the API Working Group today. In addition to the initial proposal, we want to add a sort of "log level" to indicate the severity of subconditions.

@evankanderson added from slack:

I'm positive on adding a severity enum with a small number of values. I'm thinking Error, Warning, Info with Info as the default.

I think this field name is clear, as are the values. I think the one thing I'd quibble about is the "default" aspect. I think we instead want:

"terminal" conditions manifest as severity: Error (this should probably be the "default" when this is missing to properly support legacy, which are all "terminal"), and
"non-terminal" conditions manifest as severity: Warning (we could support Info here as well with changes to the ConditionSet interface, but I'm inclined to punt on Info and such a breaking refactor as an optional follow-up).

sixolet · 2018-11-07T23:01:47Z

I like the severity field; I don't understand how Info is any more breaking than Warning though.

This adds a Severity field to our common Condition type, as outlined [here](knative/serving#2394 (comment)). The only way Severity is populated in this change is: * Terminal conditions (passed at initial construction) always result in `severity: Error`, * non-Terminal conditions (added outside initial set) always result in `severity: Info`. We should think about how we want to update the `Condition{Set,Manager}` interfaces to surface more control (e.g. `severity: Warning`).

Related to: knative#2394

* Update the ConditionSet to support non-terminal conditions. This change updates the `ConditionSet` datatype to treat the set of conditions that it is initially supplied with as the set of "terminal" conditions, where as before: 1. If all `True`, results in `Ready=True` 2. If any `False`, results in `Ready=False` 3. Otherwise, `Ready=Unknown` However, in additional to this initial "terminal" set, other conditions may be set that are "non-terminal" and have no influence over the "happy" condition (e.g. `Ready`). Related to: knative/serving#2394 * Add severity handling to ConditionSet. This adds a Severity field to our common Condition type, as outlined [here](knative/serving#2394 (comment)). The only way Severity is populated in this change is: * Terminal conditions (passed at initial construction) always result in `severity: Error`, * non-Terminal conditions (added outside initial set) always result in `severity: Info`. We should think about how we want to update the `Condition{Set,Manager}` interfaces to surface more control (e.g. `severity: Warning`).

Related to: knative#2394

mattmoor · 2018-11-09T20:57:27Z

We've picked up changes from knative/pkg that do this, and Active is about to become non-terminal in #2397

Related to: #2394

knative-prow-robot assigned mattmoor Nov 2, 2018

mattmoor mentioned this issue Nov 2, 2018

Have Activator key off of Active not Ready. #2395

Merged

mattmoor mentioned this issue Nov 2, 2018

Update the ConditionSet to support non-terminal conditions. knative/pkg#161

Merged

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018

Make Active a non-terminal condition.

23a8b70

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018

Make Active a non-terminal condition.

435e5bb

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 3, 2018

Make Active a non-terminal condition.

fe1602b

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018

Make Active a non-terminal condition.

4c8c8d2

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018

Make Active a non-terminal condition.

5d9015b

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018

Make Active a non-terminal condition.

d89e7a5

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 8, 2018

Make Active a non-terminal condition.

0e8c5eb

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018

Make Active a non-terminal condition.

8850f17

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018

Make Active a non-terminal condition.

1b1d9a9

Related to: knative#2394

mattmoor added a commit to mattmoor/serving that referenced this issue Nov 9, 2018

Make Active a non-terminal condition.

5012e7a

Related to: knative#2394

mattmoor closed this as completed Nov 9, 2018

knative-prow-robot pushed a commit that referenced this issue Nov 9, 2018

Make Active a non-terminal condition. (#2397)

6a36566

Related to: #2394

mattmoor modified the milestone: Serving 0.3 Nov 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conditions for non-Fatal conditions #2394

Conditions for non-Fatal conditions #2394

mattmoor commented Nov 2, 2018

n3wscott commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

n3wscott commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

evankanderson commented Nov 2, 2018 •

edited

Loading

sixolet commented Nov 2, 2018

evankanderson commented Nov 2, 2018

mattmoor commented Nov 2, 2018

mattmoor commented Nov 7, 2018

sixolet commented Nov 7, 2018

mattmoor commented Nov 9, 2018

Conditions for non-Fatal conditions #2394

Conditions for non-Fatal conditions #2394

Comments

mattmoor commented Nov 2, 2018

Background

Problems of Today

Proposal

n3wscott commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

n3wscott commented Nov 2, 2018

mattmoor commented Nov 2, 2018

sixolet commented Nov 2, 2018

mattmoor commented Nov 2, 2018

evankanderson commented Nov 2, 2018 • edited Loading

sixolet commented Nov 2, 2018

evankanderson commented Nov 2, 2018

mattmoor commented Nov 2, 2018

mattmoor commented Nov 7, 2018

sixolet commented Nov 7, 2018

mattmoor commented Nov 9, 2018

evankanderson commented Nov 2, 2018 •

edited

Loading