[proposal] Capturing step exit codes in Tasks #2800
Comments
Thank you! (for reference) https://tektoncd.slack.com/archives/CK3HBG7CM/p1591741813404000 |
Indeed an interesting use case! I feel that |
@jlpettersson you can use finally and/or cloud events to achieve that, but you cannot achieve branching within the pipeline that way. The only way to react to a failed task today is outside the pipeline. A task in finally could trigger another pipeline or a cloud event could also trigger another pipeline. |
That is a good point. A solution here could be allowedToFail, as I proposed in #2635 |
@afrittoli I cannot think of any use case at this moment, but it would be beneficial to have a pipeline-level exception or onFailure |
The use case that triggered this is being able to use off-the-shelf images and control the exit code, so that the pipeline can be branched based on the outcome of the task. |
Yes I'm not interested in having to create/maintain another thing outside of tekton for capturing/monitoring for events and then making additional calls against tekton to react to those events. I just need a simple way to get access to a task's exit code. Right now I'm hacking this with something like the below (only for off the shelf images that I can wrap). I obviously can't do this with an off the shelf image that I am not wrapping in my own script task etc.
Then I can reference the task's exit code in the results from a subsequent pipeline condition. This of course makes the failing task still look like it's succeeding because the See example here for usage of this pattern: https://github.com/bitsofinfo/cicdstatemgr/tree/master/examples/tekton/pipelines |
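For readers following along, the wrapping pattern described in this comment can be sketched roughly as below. This is not the commenter's original script (that snippet is not preserved here), only a minimal illustration under assumptions: the function name `run_and_capture` and the result file location are hypothetical, and Python is used purely for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture(command, result_path):
    """Run `command`, write its exit code to `result_path`, and return the code.

    `result_path` is a hypothetical location standing in for a task result file.
    """
    # subprocess.run does not raise on a non-zero exit unless check=True is set,
    # so a failing tool simply yields a non-zero returncode here.
    completed = subprocess.run(command)
    result_file = Path(result_path)
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text(str(completed.returncode))
    return completed.returncode


# Hypothetical use inside a wrapper step: record the code, then exit 0 so the
# step itself is reported as successful and later tasks can inspect the result.
#   run_and_capture(["some-tool", "--flag"], "/path/to/results/exit-code")
#   sys.exit(0)
```

The trade-off mentioned in the comment is visible in the sketch: because the wrapper always exits 0, the step looks successful even when the wrapped tool failed.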
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing. |
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
@vdemeester: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@skaegi I believe this would partly help with your use case of failing tests that should not fail a pipeline :) |
/assign I will start writing a TEP for @skaegi's use case. |
|
It seems related, even though #1559 puts control on the step after the one that might fail.
The use case is being able to have a task in your pipeline that generates useful information (often by running some kind of test / information) which should not prevent a successful execution of the pipeline. Exposing the original exit code allows tools to visually highlight the fact that an error code was captured. The only way to achieve something similar today is to use a script that wraps the execution of the tool and captures the original exit code in a result. The limitations of this approach are:
This change has a very limited impact on the code base (see PoC) and rest of Tekton functionality; personally I feel it would be an easy win which would make a few users happy. It would be released in alpha state, so nothing would prevent us from taking a different approach in the future if we see this can be achieved better as part of a different feature. |
I'm guessing these tools would probably want the logs too, right? I'm not sure the exit code alone is enough information. |
This change could also be used to achieve a series of steps like: running tests, post-processing and uploading logs and test results, all as part of the same task. In cases where a test failure should fail the pipeline, it would require the last step to exit with the exit code that was captured from the test-running step. This specific use case could probably be implemented by introducing pre/post steps, or finally at task level, or by allowing a pipeline to be executed as a task (i.e. on a single node, with no need for extra I/O) thus leveraging the pipeline workflow capabilities in a task. But this will require more time and effort to design and implement. |
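The "last step exits with the captured exit code" idea could look roughly like the sketch below. It assumes an earlier step has written the test runner's exit code to a file; the helper name `captured_exit_code`, the file path, and the choice to fall back to 1 when the file is missing or unreadable are all assumptions made for illustration, not part of the proposal.

```python
import sys
from pathlib import Path


def captured_exit_code(result_path, default=1):
    """Return the exit code stored at `result_path`.

    Falls back to `default` (a failure code, by assumption) when the file is
    missing or does not contain an integer, so a lost result is not treated
    as a success.
    """
    try:
        return int(Path(result_path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return default


# Hypothetical final step, run after logs and test results have been uploaded:
#   sys.exit(captured_exit_code("/path/to/results/exit-code"))
```

Failing closed on a missing file is a design choice in this sketch; a task author might prefer a different default.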
Yeah, sure, but logs would still be available the same way they are today? The exit code should be enough info to mark a step or task as "yellow". |
Would it be a situation like certain exit codes are yellow and some are red? I'm also wondering: isn't this information available in the container statuses in a taskrun already, if a system really wanted to use it? (I might be able to understand better if you could explain the use case in more detail) |
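As a rough illustration of the question about container statuses, an external tool could pull per-step exit codes out of a TaskRun object along the following lines. The sketch assumes the TaskRun has already been loaded as a dictionary (for example from JSON) and that each entry under `status.steps` carries a `name` and, once finished, a `terminated.exitCode` field; the function name `step_exit_codes` is hypothetical, and the field layout should be checked against the Tekton version in use.

```python
def step_exit_codes(taskrun):
    """Map step name to exit code for every terminated step in a TaskRun dict.

    Assumes the layout status.steps[].terminated.exitCode; steps that have not
    terminated (still running or waiting) are skipped.
    """
    codes = {}
    for step in taskrun.get("status", {}).get("steps", []):
        terminated = step.get("terminated")
        if terminated is not None and "exitCode" in terminated:
            codes[step.get("name", "")] = terminated["exitCode"]
    return codes
```

Note that this only reports what the container actually exited with, so it does not by itself let a pipeline continue past a failing step.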
For exit codes I think we should stick with just...
A warn or yellow state is something that can get layered later by somehow tagging the step or even by searching the logs for [WARN] or similar. It might be nice to have smarts to mark a step as being "yellow" but there still are lots of cases where a step's success is not critical to the success of the task. Some real world examples where you don't "necessarily" want to fail a build...
-- |
So I am a bit torn about this feature (wasn't sure whether to comment here or on the TEP). I tend to think that, as it can definitely be done using the existing feature set (aka wrapping the binary/command/process run in a step and writing to a result), I would advocate towards not adding it. That said, as @afrittoli said, this would mean that to display things like a "yellow" task (similar to a yellow build/job in Jenkins), external tooling (dashboard, cli, …) would need to rely on some conventions rather than a "contract" from the API. |
I agree with @vdemeester - I'm gonna try to concentrate my feedback on the TEP PR (tektoncd/community#302) - I think there are a few features we're discussing simultaneously here (maybe? e.g. allowing a step to fail, exposing exit codes), I'm hoping we can narrow it down a bit in the TEP. |
/assign |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
/remove-lifecycle stale |
/lifecycle frozen |
oooo I think maybe this can be closed now that @pritidesai has implemented onError? |
yes please 🙏 , this can be closed now since we have |
Expected Behavior
It should be possible to easily use off-the-shelf (OTS) images as steps in Tekton tasks.
Actual Behavior
When using OTS images, a step / task author does not control the error case behaviour of the image. It may be desirable for the pipeline to continue even if there was a failure.
One way to achieve this could be to wrap the original image into a custom image, but that does not scale well if the number of images grows.
An alternative way would be to allow a Task to fail without having the whole pipeline fail. We discussed this a lot in the past. Finally tasks will help a bit, and cloud events also provide a way around this; however, neither provides a solution that feels natural for the problem.
Proposed Solution
We could optionally capture the exit code from steps and expose it as test results.
This could be achieved using the entrypoint image, which already wraps around all step images.
This feature could be enabled by setting a flag in the task spec (I would not consider this decision a runtime one).
The benefits would be:
/kind feature