diff --git a/teps/0046-finallytask-execution-post-timeout.md b/teps/0046-finallytask-execution-post-timeout.md
new file mode 100644
index 000000000..9b1e0ebd7
--- /dev/null
+++ b/teps/0046-finallytask-execution-post-timeout.md
@@ -0,0 +1,278 @@
+---
+status: proposed
+title: Finally tasks execution post pipelinerun timeout
+creation-date: '2021-01-26'
+last-updated: '2021-04-02'
+authors:
+- '@souleb'
+---
+
+# TEP-0046: Finally tasks execution post pipelinerun timeout
+---
+
+
+- [# TEP-0046: Finally tasks execution post pipelinerun timeout](#-tep-0046-finally-tasks-execution-post-pipelinerun-timeout)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+- [Proposal](#proposal)
+- [Test Plan](#test-plan)
+- [Alternatives](#alternatives)
+  - [Finally block level timeout flag](#finally-block-level-timeout-flag)
+  - [Pipelinerun timeout is inclusive of the finally tasks timeout](#pipelinerun-timeout-is-inclusive-of-the-finally-tasks-timeout)
+  - [Finally Timeout flag at Pipelinerun Spec](#finally-timeout-flag-at-pipelinerun-spec)
+- [Follow-on work](#follow-on-work)
+
+
+## Summary
+
+
+
+This TEP addresses issue [`#2989`](https://github.com/tektoncd/pipeline/issues/2989).
+
+The proposal is to enable finally tasks to execute when the non-finally tasks have timed out.
+
+## Motivation
+
+
+
+The finally task [`design document`](https://docs.google.com/document/d/1lxpYQHppiWOxsn4arqbwAFDo4T0-LCqpNa6p-TJdHrw/edit#heading=h.w51ed6k2inef) lists the following use cases:
+
+- Cleanup cluster resources after finishing (with success/failure) integration tests (Dogfooding Scenario)
+- Update Pull Request with what happened overall in the pipeline (pipeline level)
+- Report Test Results at the end of the test pipeline (Notifications Scenario)
+
+Unfortunately, if a pipeline's execution reaches the defined timeout value before executing finally tasks, the pipelinerun stops and reports a failed status without executing the finally tasks.
+
+Here is an example pipeline run with a finally task:
+
+```yaml
+apiVersion: tekton.dev/v1beta1
+kind: PipelineRun
+metadata:
+  name: hello-world-pipeline-run-with-timeout
+spec:
+  timeout: "0h0m60s"
+  pipelineSpec:
+    tasks:
+      - name: task1
+        timeout: "0h0m30s"
+        taskSpec:
+          steps:
+            - name: hello
+              image: ubuntu
+              script: |
+                echo "Hello World!"
+                sleep 10
+    finally:
+      - name: task2
+        params:
+          - name: echoStatus
+            value: "$(tasks.task1.status)"
+        taskSpec:
+          params:
+            - name: echoStatus
+          steps:
+            - name: verify-status
+              image: ubuntu
+              script: |
+                if [ $(params.echoStatus) == "Succeeded" ]
+                then
+                  echo " Hello World echoed successfully"
+                fi
+```
+
+The finally task runs after the task completion and both execute normally.
+
+
+| NAME | TASK NAME | STARTED | DURATION | STATUS |
+|----------------------------------------------------------------|------------------|----------------|------------|------------------------|
+| ∙ hello-world-pipeline-run-with-timeout-task2-kxtc6 | task2 | 19 seconds ago | 7 seconds | Succeeded |
+| ∙ hello-world-pipeline-run-with-timeout-task1-bqmzz | task1 | 35 seconds ago | 16 seconds | Succeeded |
+| | | | | |
+
+
+Now if we change the task script in order to have it exceed its timeout (30s), we get the following status report:
+
+| NAME | TASK NAME | STARTED | DURATION | STATUS |
+|-----------------------------------------------------|-----------|----------------|------------|------------------------|
+| ∙ hello-world-pipeline-run-with-timeout-task2-44tsb | task2 | 8 seconds ago | 5 seconds | Succeeded |
+| ∙ hello-world-pipeline-run-with-timeout-task1-wgcq7 | task1 | 38 seconds ago | 30 seconds | Failed(TaskRunTimeout) |
+| | | | | |
+
+
+The finally task still executes after the task failure.
+
+
+Finally, if we reduce the pipelinerun timeout to 10s, our status report shows:
+
+`PipelineRun "hello-world-pipeline-run-with-timeout" failed to finish within "10s" (TaskRun "hello-world-pipeline-run-with-timeout-task1-q7fw4" failed to finish within "30s")`
+
+| NAME | TASK NAME | STARTED | DURATION | STATUS |
+|-----------------------------------------------------|-----------|---------------|------------|------------------------|
+| ∙ hello-world-pipeline-run-with-timeout-task1-q7fw4 | task1 | 2 minutes ago | 30 seconds | Failed(TaskRunTimeout) |
+| | | | | |
+| | | | | |
+
+
+The pipelinerun timeout takes precedence over the task timeout. After 10s the task fails, and the finally task does not get the chance to execute.
+
+
+For this reason, it is currently not possible to rely on finally tasks for any of the aforementioned use cases.
+
+### Goals
+
+
+
+Enable the following use cases when a pipelinerun times out:
+
+- Cleanup cluster resources after finishing (with success/failure) integration tests (Dogfooding Scenario)
+- Update Pull Request with what happened overall in the pipeline (pipeline level)
+- Report Test Results at the end of the test pipeline (Notifications Scenario)
+
+## Proposal
+
+
+
+Enable finally tasks to run when a pipeline times out.
+
+Add a new flag `tasksTimeout`, which defines a timeout for the DAG tasks. The finally tasks timeout is then `timeout - tasksTimeout`, with `timeout >= tasksTimeout` and `timeout` being the existing timeout flag.
+
+When `tasksTimeout` is not defined, `timeout` is used for the tasks timeout (the behavior is unchanged).
+
+```yaml
+spec:
+  tasksTimeout: "1h0m0s"
+  timeout: "2h0m0s"
+  pipelineSpec:
+    tasks:
+      - name: tests
+        taskRef:
+          name: integration-test
+    finally:
+      - name: cleanup-test
+        taskRef:
+          name: cleanup
+```
+
+
+This lets users control runtime behavior and make sure their finally tasks run as intended, by bounding the time allotted to the DAG tasks.
+
+
+The default `default-timeout-minutes`, configurable via configmap, keeps its current behavior.
+
+## Test Plan
+
+
+
+- Unit tests
+- End-to-end tests
+- Examples
+
+
+## Alternatives
+
+
+### Finally block level timeout flag
+
+Enable finally tasks to run when a pipeline times out. This implies a behavioral change, as finally tasks will run no matter what.
+
+Enable pipeline authors to specify a timeout field for finally tasks. In a normal run, that timeout is not needed and finally tasks execute after non-finally tasks. But when the pipeline times out, the finally task execution is bounded by the declared timeout.
+
+```yaml
+spec:
+  tasks:
+    - name: tests
+      taskRef:
+        name: integration-test
+  finally:
+    timeout: "0h0m10s"
+    - name: cleanup-test
+      taskRef:
+        name: cleanup
+```
+
+This solution is not backward compatible as the finally tasks are currently defined as a list field in the pipelineRunSpec type.
+
+### Pipelinerun timeout is inclusive of the finally tasks timeout
+
+We could consider that the pipelinerun timeout is inclusive of the finally tasks timeout. So, during execution, we could stop executing DAG tasks at some point to give enough time for finally tasks to execute before timing out the pipelinerun (DAG tasks timeout = pipelinerun timeout - finally tasks timeout).
+
+This solution was deemed confusing. The user could expect the `timeout` to apply to the DAG tasks entirely. It shortens the DAG tasks' runtime and limits the user's ability to configure it.
+
+
+### Finally Timeout flag at Pipelinerun Spec
+
+We could add a new flag at the pipelineRun level, `finallyTimeout`, similar to the timeout flag. If specified, the pipelineRun timeout (default is one hour) applies to DAG tasks only. The DAG tasks stop executing once they reach the pipelineRun timeout. The finally tasks start executing at that point and run until they reach the timeout specified in `finallyTimeout`.
+
+
+## Follow-on work
+
+We believe that this proposal is an improvement, but that we should go further and offer users a way to control timeouts for pipelines, tasks, finally tasks and perhaps steps in the future.
+
+Another TEP will be created to further detail the following proposal:
+
+A new flag `timeout`, which would be a dictionary of timeouts.
+
+```yaml
+spec:
+  timeout:
+    pipeline: "0h4m0s"
+    tasks: "0h1m0s"
+    finally: "0h3m0s"
+  pipelineSpec:
+    tasks:
+      - name: tests
+        taskRef:
+          name: integration-test
+    finally:
+      - name: cleanup-test
+        taskRef:
+          name: cleanup
+    ...
+```
+
+Having a dictionary will enable adding more timeout logic in the future.
\ No newline at end of file
diff --git a/teps/README.md b/teps/README.md
index 7137deda4..9ffdf03a5 100644
--- a/teps/README.md
+++ b/teps/README.md
@@ -173,6 +173,7 @@ This is the complete list of Tekton teps:
 |[TEP-0042](0042-taskrun-breakpoint-on-failure.md) | taskrun-breakpoint-on-failure | proposed | 2021-03-21 |
 |[TEP-0044](0044-decouple-task-composition-from-scheduling.md) | Decouple Task Composition from Scheduling | proposed | 2021-03-10 |
 |[TEP-0045](0045-whenexpressions-in-finally-tasks.md) | WhenExpressions in Finally Tasks | implementable | 2021-01-28 |
+|[TEP-0046](0046-finallytask-execution-post-timeout.md) | Finally tasks execution post pipelinerun timeout | proposed | 2021-04-02 |
 |[TEP-0047](0047-pipeline-task-display-name.md) | Pipeline Task Display Name | proposed | 2021-02-10 |
 |[TEP-0049](0049-aggregate-status-of-dag-tasks.md) | Aggregate Status of DAG Tasks | proposed | 2021-03-25 |
 |[TEP-0050](0050-ignore-task-failures.md) | Ignore Task Failures | proposed | 2021-02-19 |