Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance user-facing Condition/ContainerTerminated Reasons with backwards compatibility #7539

Open
JeromeJu opened this issue Dec 28, 2023 · 13 comments
Assignees
Labels
kind/misc Categorizes issue or PR as a miscellaneuous one.

Comments

@JeromeJu
Copy link
Member

JeromeJu commented Dec 28, 2023

We have had use cases as specified in #7223 and #7396 that downstream users and Tekton providers would want to have more granular error reasons. As discussed at the API WG and #7390, the existing CRDs {Step, TaskRun, PipelineRun} rely on knative.dev/pkg/apis#Condition for conveying user-facing reasons.

However these Condition reasons are being relied on for our Pipeline For example, dashboard will rely on the error reason to determine the step status.

Blocked issues:
Surfacing of actual Termination Reason in Step Status
Surface TaskRun failure reason when Step or Pod fails
Make user-facing condition messages more granular as canonical error codes

Current workaround:
Option 1: Add a new provisional field in the CRDs for {Step, TaskRun, PipelineRun} Condition/ContainerTerminatedState status
Option 2: Add it to the condition/containerTerminated message

Future work plan:
With a major version bump, i.e. in v2alpha1, we would be able to change the Condition/ContainerTerminated Reason

co-authored with: @renzodavid9

@JeromeJu JeromeJu added the kind/misc Categorizes issue or PR as a miscellaneuous one. label Dec 28, 2023
@JeromeJu JeromeJu self-assigned this Dec 28, 2023
@JeromeJu
Copy link
Member Author

/assign @renzodavid9

@tekton-robot
Copy link
Collaborator

@JeromeJu: GitHub didn't allow me to assign the following users: renzodavid9.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @renzodavid9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@renzodavid9
Copy link
Contributor

/assign me

@tekton-robot
Copy link
Collaborator

@renzodavid9: GitHub didn't allow me to assign the following users: me.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign me

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@renzodavid9
Copy link
Contributor

/assign

@JeromeJu
Copy link
Member Author

JeromeJu commented Jan 4, 2024

Through offline discussion with @renzodavid9 , we've examined that it might not be a feasible way to create a new field in the pod container status to update the container termination reason since these are all knative rather than Tekton conditions.

Pros Cons Example
Add a new field More straightforward when for the transition to replace the condition Reasons during the next major version bump #7545
Add to the condition status message Easier to maintain Users might not be aware of the change

@JeromeJu
Copy link
Member Author

JeromeJu commented Jan 8, 2024

One other question is that are making changes to PipelineRunStatus Condition Reason and PipelineRunStatus Condition Reason also considered as breaking changes?

It looks like that Dashboard relies on the ConditionStatus on this to determine the completion of a PipelineRun (similarly for TaskRuns). Does this imply that the changes made to PipelineRunStatus/TaskRunStatus Reasons will not be breaking changes for Dashboard and others that consumes such status?

cc @AlanGreene @vdemeester @tektoncd/core-collaborators 🙏

@AlanGreene
Copy link
Member

AlanGreene commented Jan 8, 2024

The Dashboard also uses the reason values in a number of places, so changes to these would be considered a breaking change. See response to similar question on another issue proposing changes to the reason values for steps: #7390 (comment)

Changes for PipelineRun and TaskRun would have similar issues.

Here are just a few examples of the Dashboard's current use of the reason field for PipelineRun and TaskRun:

@JeromeJu
Copy link
Member Author

JeromeJu commented Jan 8, 2024

The Dashboard also uses the reason values in a number of places, so changes to these would be considered a breaking change. Changes for PipelineRun and TaskRun would have similar issues.

Thanks @AlanGreene for confirming the understanding on this 🙏
Would there also be a preference on the 2 options, adding a new field in PipelineRun/TaskRun status for enhancing via a new Reason field or adding the granular reasons into the Condition Status message?

@AlanGreene
Copy link
Member

It depends on whether this information is only intended for user consumption or if it's also intended to be used by clients / automation.

If it's intended for clients I don't think the message is a good option. Today I believe most clients treat it as an opaque string and avoid parsing it. If we change this we would need to define a clear API contract for its structure, content, etc. and guard against unexpected changes.

@renzodavid9
Copy link
Contributor

renzodavid9 commented Jan 8, 2024

Following on #7539 (comment), and according with offline conversation with @JeromeJu , to avoid extending the ContainerStateTerminated from the k8s lib, we can instead modify the StepStatus from Tekton to add the new field. It will look something like this:

apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  name: task1
...
status:
  steps:
  - container: step-first
    imageID: docker-pullable://busybox@sha256:ba76950ac9eaa407512c9d859cea48114eeff8a6f12ebaa5d32ce79d4a017dd8
    name: first
    terminationReason: AnyReason # <- This will be the new field
    terminated:
      containerID: docker://aaa111222
      exitCode: 1
      finishedAt: "2024-01-08T16:27:38Z"
      reason: Error
      startedAt: "2024-01-08T16:27:38Z"

@JeromeJu
Copy link
Member Author

Following on #7539 (comment), and according with offline conversation with @JeromeJu , to avoid extending the ContainerStateTerminated from the k8s lib, we can instead modify the StepStatus from Tekton to add the new field. It will look something like this:

apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  name: task1
...
status:
  steps:
  - container: step-first
    imageID: docker-pullable://busybox@sha256:ba76950ac9eaa407512c9d859cea48114eeff8a6f12ebaa5d32ce79d4a017dd8
    name: first
    terminationReason: AnyReason # <- This will be the new field
    terminated:
      containerID: docker://aaa111222
      exitCode: 1
      finishedAt: "2024-01-08T16:27:38Z"
      reason: Error
      startedAt: "2024-01-08T16:27:38Z"

Thansk @renzodavid9 for the discussion and this!
This looks like the way to go as consensus reached during the API WG 🙏

The next AI for us might be to create a PR RFC(requesting for feedback). i.e.

And we could together have some documentations in the deprecations md and elsewhere to claim our migration plan when moving to v2alpha1.

renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Jan 15, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Jan 15, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
@renzodavid9
Copy link
Contributor

Hello there!

The open PR for this currently has ~19 tests failing (unit + integration). This is mainly due to the new field, status. steps[].terminationReason, is expected to have an specific value for some tests where we check a TaskRun status against a desired one. In offline discussion with @JeromeJu, it sounds the proper way is to update the tests to add the new field with the expected value + any extra work needed, taking into account that we might push more changes in the same PR.

Other alternative discussed:

  • For the current tests failing, during comparison, ignore the status. steps[].terminationReason field, but this might not be ideal, looks better to include the new field in the current tests

Making the observation in case we have comments from members in the community. Thanks a lot! 😄

renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Jan 22, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Jan 22, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Jan 29, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
renzodavid9 added a commit to renzodavid9/tekton-pipeline that referenced this issue Feb 5, 2024
Related with tektoncd#7539 and tektoncd#7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
tekton-robot pushed a commit that referenced this issue Feb 7, 2024
Related with #7539 and #7223

To report specific Steps termination reasons we need to know why its continer finished; we use the termination message to store a new "state" with this information. We are adding a new field to store this information per step.

Co-authored-by: JeromeJu <46675578+JeromeJu@users.noreply.github.com>
Co-authored-by: Chitrang Patel <chitrang@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/misc Categorizes issue or PR as a miscellaneuous one.
Projects
None yet
Development

No branches or pull requests

4 participants