[proposal] Capturing step exit codes in Tasks #2800

afrittoli · 2020-06-10T14:59:55Z

Expected Behavior

It should be possible to use easily use off-the-shelves (OTS) images as steps in Tekton tasks.

Actual Behavior

When using OTS images, a step / task author does not control the error case behaviour of the image. It may be desirable for the pipeline to continue even if there was a failure.

One way to achieve this could be to wrap the original image into a custom image, but that does not scale well if the number of images grows.

An alternative way would be to allow a Task to fail without having the whole pipeline failing. We discussed this a lot in the past. Finally tasks will help a bit, cloud events also provide a way around this, however neither solution provide a solution that feels natural for the problem.

Proposed Solution

We could optionally capture the exit code from steps and expose it as test results.
This could be achieved using entrypoint image, which already wraps around all steps images.
This feature could be enabled by setting a flag in the task spec (I would not consider this decision a runtime one).

The benefits would be:

the exit code becomes available as a result to following tasks and conditions, allowing for branching based on the result, using conditions as they exists today
limited change to the task execution logic:
- once a step fails, no more steps are executed (like before)
- if the exit code is captured, the step failure does not cause a task failure
no change to the pipeline logic: once a task fails, the pipeline fails
the solution could co-exist with any future development in terms of conditionals and workflow capabilities

/kind feature

afrittoli · 2020-06-10T15:00:33Z

/cc @pritidesai @dibyom @vdemeester @bobcatfish @skaegi

bitsofinfo · 2020-06-10T15:02:01Z

Thank you! (for reference) https://tektoncd.slack.com/archives/CK3HBG7CM/p1591741813404000

jlpettersson · 2020-06-10T20:31:21Z

Indeed an interesting use case!

I feel that finally: in combination with a SlackMessage Task or a CloudEvent Task is very related here.

afrittoli · 2020-06-10T20:39:47Z

@jlpettersson you can use finally and/or cloud events to achieve that, but you cannot achieve branching within the pipeline that way. The only way today to react to a failed task today is outside the pipeline. A task in finally could trigger another pipeline or a cloud event could also trigger another pipeline.

jlpettersson · 2020-06-10T21:50:31Z

@jlpettersson you can use finally and/or cloud events to achieve that, but you cannot achieve branching within the pipeline that way. The only way today to react to a failed task today is outside the pipeline. A task in finally could trigger another pipeline or a cloud event could also trigger another pipeline.

That is a good point.

A solution here, could be allowedToFail as I proposed in #2635

pritidesai · 2020-06-11T17:44:48Z

@afrittoli I can not think of any use case at this moment, but would be beneficial to pipeline level exception or onFailure

afrittoli · 2020-06-11T17:54:09Z

The use case that triggered this is being able to use off-the-shelves images and control the exit code, so that the pipeline can be branched based on the outcome of the task.
If would allow implementing something like onFailure without forcing the task do define the logic to be taken on failure, leaving it to the next tasks to decide.

bitsofinfo · 2020-06-12T13:53:13Z

Yes I'm not interested in having to create/maintain another thing outside of tekton for capturing/monitoring for events and then making additional calls against tekton to react to those events.

I just need a simple way to get access to a task's exit code. Right now I'm hacking this with something like the below (only for off the shelf images that I can wrap). I obviously can't do this with an off the shelf image that I am not wrapping in my own script task etc.

        <something that can fail, but should not halt the entire execution>
        LAST_EXIT_CODE=$?
        echo -n $LAST_EXIT_CODE > /tekton/results/last-exit-code

Then I can reference the task's exit code in the results from a subsequent pipeline condition.

This of course makes the failing task still look like it's succeeding because the echo exit code is zero... hacky but it works.

See example here for usage of this pattern: https://github.com/bitsofinfo/cicdstatemgr/tree/master/examples/tekton/pipelines

tekton-robot · 2020-08-14T19:46:13Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot · 2020-08-14T19:46:14Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot · 2020-08-14T19:46:15Z

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vdemeester · 2020-08-17T09:15:58Z

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

tekton-robot · 2020-08-17T09:16:00Z

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

afrittoli · 2020-11-02T10:48:49Z

@skaegi I believe this would partly help with your use case of failing tests that should not fail a pipeline :)

pritidesai · 2020-12-04T19:13:59Z

/assign

I will start writing TEP for @skaegi's use case.

bobcatfish · 2021-01-11T17:07:54Z

I'm wondering how this relates to Add a field to Step that allows it to ignore failed prior Steps *within the same Task* #1559 ? (if it does? they feel like similar features to me?)
@afrittoli is it possible to add more detailed use cases? (maybe you are already doing this in the TEP) - as usual I feel skeptical about new features, particularly when they are changing the meaning of failure - but a few use cases can definitely bring me around pretty easily!

afrittoli · 2021-01-11T20:42:58Z

I'm wondering how this relates to Add a field to Step that allows it to ignore failed prior Steps *within the same Task* #1559 ? (if it does? they feel like similar features to me?)

It seems related, even though #1559 puts control on the step after the one that might fail.
I filed this issue to reply to a specific request, and I missed #1559 when doing so, sorry about that.

@afrittoli is it possible to add more detailed use cases? (maybe you are already doing this in the TEP) - as usual I feel skeptical about new features, particularly when they are changing the meaning of failure - but a few use cases can definitely bring me around pretty easily!

The use case is being able to have a task in your pipeline that generates useful information (often by running some kind of test / information) which should not prevent a successful execution of the pipeline. Exposing the original exit code allows tools to visually highlight the fact that an error code was captured.

The only way to achieve something similar today is to use a script that wraps the execution of the tool and captures the original exit code in a result. The limitations of these approach are:

it requires wrapping off-the-shelve images with a script
there is no standard way to expose the original exit code, which makes it impossible for the tools (cli, dashboard) to build any functionality on top

This change has a very limited impact on the code base (see PoC) and rest of Tekton functionality; personally I feel it would be an easy win which would make a few users happy. It would be released in alpha state, so nothing would prevent us from taking a different approach in the future if we see this can be achieved better as part of a different feature.

bobcatfish · 2021-01-11T20:51:54Z

Exposing the original exit code allows tools to visually highlight the fact that an error code was captured.

I'm guessing these tools would probably want the logs too, right? I'm not sure the exit code alone is enough information.

afrittoli · 2021-01-11T20:52:59Z

This change could be used also to achieve a series of steps like: running tests, post-process and upload logs and test results, all as part of the same task. In cases where a test failure should fail the pipeline, it would require the last step to exit with the exit code that was captured from the test-running step.

This specific use case could probably be implemented by introducing pre/post steps, or finally at task level, or by allowing a pipeline to be executed as a task (i.e. on a single node, with no need for extra I/O) thus leveraging the pipeline workflow capabilities in a task. But this will require more time and effort to design and implement.

afrittoli · 2021-01-11T20:55:26Z

Exposing the original exit code allows tools to visually highlight the fact that an error code was captured.

I'm guessing these tools would probably want the logs too, right? I'm not sure the exit code alone is enough information.

Yeah, sure, but logs would still be available the same way they are today? The exit code should be enough info to make a step or task as "yellow".

bobcatfish · 2021-01-11T21:02:13Z

The exit code should be enough info to make a step or task as "yellow".

Would it be a situation like certain exit codes are yellow and some are red?

I'm also wondering: isn't this information available in the container statuses in a taskrun already, if a system really wanted to use it?

(I might be able to understand better if you could explain the use case in more detail)

skaegi · 2021-01-12T00:10:21Z

For exit codes I think we should stick with just...

exit code == 0 (success)
exit code != 0 (failure)

A warn or yellow state is something that can get layered later by somehow tagging the step or even by searching the logs for [WARN] or similar. It might be nice to have smarts to mark a step as being "yellow" but there still are lots of cases where a step's success is not critical to the success of the task.

Some real world examples where you don't "necessarily" want to fail a build...

Linting
Localization checks
Unit tests for experimental features typically behind a feature flag
"clean" operations where it's possible there's nothing to clean-up
"create" operations where it's possible the entity is already created

--
(Slightly off topic) Gnu Make has this support and used "ignore-errors", that might be a good choice for syntax ... https://www.gnu.org/software/make/manual/html_node/Errors.html

vdemeester · 2021-01-12T17:30:44Z

So I am a bit torn about this feature (wasn't sure to comment here or on the TEP).

I tend to think, as it can definitely be done using the feature set (aka wrapping the binary/command/process run in a step and write in a result), I would advocate towards not adding it. That said, as @afrittoli, this would mean, to display things like a "yellow" task (similar to a yellow build/job in Jenkins), external tooling (dashboard, cli, …) would need to rely on some conventions rather than a "contract" from the API.

bobcatfish · 2021-01-12T21:34:31Z

I agree with @vdemeester - im gonna try to concentrate my feedback on the TEP PR (tektoncd/community#302) - I think there are a few features we're discussing simultaneously here (maybe? e.g. allowing a step to fail, exposing exit codes), I'm hoping we can narrow it down a bit in the TEP.

skaegi · 2021-01-25T17:16:42Z

/assign

tekton-robot · 2021-04-25T17:38:35Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

pritidesai · 2021-04-26T04:37:08Z

/remove-lifecycle stale
Problem statement (TEP-0040) is merged, working on the proposal.

vdemeester · 2021-04-26T08:22:15Z

/lifecycle frozen

bobcatfish · 2021-09-10T16:11:20Z

oooo I think maybe this can be closed now that @pritidesai has implemented onError?

pritidesai · 2021-09-10T16:26:56Z

yes please 🙏 , this can be closed now since we have onError in a step now.
Implemented in #4106 ✅

tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 10, 2020

tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 14, 2020

tekton-robot closed this as completed Aug 14, 2020

tekton-robot reopened this Aug 17, 2020

tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020

vdemeester mentioned this issue Nov 20, 2020

Towards v1 API #3548

Closed

17 tasks

tekton-robot assigned pritidesai Dec 4, 2020

This was referenced Jan 12, 2021

TEP-0040 - ignore step error tektoncd/community#302

Merged

Use the status of step as a parameter in the next step #3220

Closed

tekton-robot assigned skaegi Jan 25, 2021

ghost mentioned this issue Feb 26, 2021

Add a field to Step that allows it to ignore failed prior Steps *within the same Task* #1559

Closed

tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2021

tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2021

tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 26, 2021

pritidesai closed this as completed Sep 10, 2021

lbernick added this to Pipelines V1 Jan 11, 2022

lbernick moved this to Done in Pipelines V1 Jan 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[proposal] Capturing step exit codes in Tasks #2800

[proposal] Capturing step exit codes in Tasks #2800

afrittoli commented Jun 10, 2020

afrittoli commented Jun 10, 2020

bitsofinfo commented Jun 10, 2020 •

edited

Loading

jlpettersson commented Jun 10, 2020

afrittoli commented Jun 10, 2020

jlpettersson commented Jun 10, 2020

pritidesai commented Jun 11, 2020

afrittoli commented Jun 11, 2020

bitsofinfo commented Jun 12, 2020 •

edited

Loading

tekton-robot commented Aug 14, 2020

tekton-robot commented Aug 14, 2020

tekton-robot commented Aug 14, 2020

vdemeester commented Aug 17, 2020

tekton-robot commented Aug 17, 2020

afrittoli commented Nov 2, 2020

pritidesai commented Dec 4, 2020

bobcatfish commented Jan 11, 2021

afrittoli commented Jan 11, 2021

bobcatfish commented Jan 11, 2021

afrittoli commented Jan 11, 2021

afrittoli commented Jan 11, 2021

bobcatfish commented Jan 11, 2021

skaegi commented Jan 12, 2021 •

edited

Loading

vdemeester commented Jan 12, 2021

bobcatfish commented Jan 12, 2021

skaegi commented Jan 25, 2021

tekton-robot commented Apr 25, 2021

pritidesai commented Apr 26, 2021

vdemeester commented Apr 26, 2021

bobcatfish commented Sep 10, 2021

pritidesai commented Sep 10, 2021

[proposal] Capturing step exit codes in Tasks #2800

[proposal] Capturing step exit codes in Tasks #2800

Comments

afrittoli commented Jun 10, 2020

Expected Behavior

Actual Behavior

Proposed Solution

afrittoli commented Jun 10, 2020

bitsofinfo commented Jun 10, 2020 • edited Loading

jlpettersson commented Jun 10, 2020

afrittoli commented Jun 10, 2020

jlpettersson commented Jun 10, 2020

pritidesai commented Jun 11, 2020

afrittoli commented Jun 11, 2020

bitsofinfo commented Jun 12, 2020 • edited Loading

tekton-robot commented Aug 14, 2020

tekton-robot commented Aug 14, 2020

tekton-robot commented Aug 14, 2020

vdemeester commented Aug 17, 2020

tekton-robot commented Aug 17, 2020

afrittoli commented Nov 2, 2020

pritidesai commented Dec 4, 2020

bobcatfish commented Jan 11, 2021

afrittoli commented Jan 11, 2021

bobcatfish commented Jan 11, 2021

afrittoli commented Jan 11, 2021

afrittoli commented Jan 11, 2021

bobcatfish commented Jan 11, 2021

skaegi commented Jan 12, 2021 • edited Loading

vdemeester commented Jan 12, 2021

bobcatfish commented Jan 12, 2021

skaegi commented Jan 25, 2021

tekton-robot commented Apr 25, 2021

pritidesai commented Apr 26, 2021

vdemeester commented Apr 26, 2021

bobcatfish commented Sep 10, 2021

pritidesai commented Sep 10, 2021

bitsofinfo commented Jun 10, 2020 •

edited

Loading

bitsofinfo commented Jun 12, 2020 •

edited

Loading

skaegi commented Jan 12, 2021 •

edited

Loading