-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design: Partial Pipeline execution #50
Comments
@BenTheElder, @cjwagner and some other Prow folks indicated that this would be a very desirable feature for them - particularly in a case where your pipeline has 2 phases, one that builds a bunch of stuff and then subsequent phases that use that built stuff, it'd be handy to be able to resume after the point where the stuff is built |
Just to add (or rather try to help clarify) a use case here. This is a very useful feature for long-running pipelines that probably fall outside the strict CI scope. Most pipelines I have in mind are essentially workflow automation pipelines and have external dependencies such as 3rd party systems that need to be up / reachable. When such a pipeline fails at step 7/11, you really don't want to rerun the whole thing. The Jenkins Restart from stage feature is ideal for the pipeline to essentially pick up where it left off. The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined. Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines, significantly widening the scope of problems tekton pipelines can be used to solve. (Plus anyone who relies on this on Jenkins will find it easier to migrate to Tekton). As a final point, I would like to clarify that in my use case, support for restarting with "different PipelineParams" (as mentioned in the description) is not a necessary feature. I am sure people have use cases for that too, but I personally like the approach Jenkins takes here: you can either restart the whole pipeline with different params (new pipeline run), or restart from stage, when it failed, always with the same params (retry failed pipeline run, starting from failed task). Hope this helps. |
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing. |
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing. |
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing. |
@tekton-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
@vdemeester: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This one is on our roadmap: https://github.com/tektoncd/pipeline/blob/master/roadmap.md /lifecycle frozen |
I think where this falls apart is lacking copy-on-write workspaces. Tasks can easily introduce changes to workspaces which makes execution non-idempotent. If instead workspaces were always inputs xor outputs the input to any given stage would still exist when the retry is attempted. |
input/output workspace layering cannot be done efficiently without copy-on-write workspaces. It could be done with an NFS server and overlay filesystems, or there appear to be some COW volumes but I do not know enough about k8s to say if it's usable for this. |
I think there will be use cases where folks want to make these kinds of non-idempotent changes to workspaces, so even with COW I'm not sure we could fully solve this problem? If I'm wrong it would probably help if you could explain with an example. Also: it sounds like COW workspaces would be an interesting feature in general - if you feel motivated it'd be great to have a separate issue to dive into this in detail |
Quick update here: @jerop created a design for #1797 which has some interesting ideas that could be applied to a design for partial execution (design doc). |
TEP-0123 Specifying on-demand-retry in a pipelineTask does not offer solution for this feature. But proposes a feature to allow specifying |
Can Tekton Pipeline provide the functionality to rerun failed tasks in a pipeline? This would be very useful for our scenario, where a complex pipeline fails on the last task, requiring manual intervention to manually rerun the failed task. GitHub Actions has a similar feature. |
The work for this task is to design this feature and present one or more proposals (before implementing).
Expected Behavior
If a pipeline has many tasks and takes a long time to run (e.g. tens of minutes, or even hours), and one Task fails, it might be desirable to be able to pick up execution where the Task failed, with different PipelineParams (e.g. from a different git commit), so you can resume the Pipeline without having to rerun the whole thing.
Some ideas for how to implement this:
It is also worth considering what this could be like via a UI: if one is viewing a Pipeline in a UI, and wants to re-run only a portion of the Pipeline, they probably want the user experience to be as if they were still running the same Pipeline, even if underneath a new Pipeline is created.
Actual Behavior
At the moment, if any Task in a Pipeline fails, your options to rerun the rest of the Pipeline would be:
Additional Info
This originally came up in discussion about #39, in the context of whether or not we'd want to always use the same git commit from a source for all Tasks in a Pipeline, or if we wanted sometimes for a Task to always use HEAD. This would allow a user to change a repo, by updating HEAD, between Task executions.
The feature of partial pipeline execution could be an alternative to this.
The text was updated successfully, but these errors were encountered: