-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP-0046: Finally tasks execution post pipelinerun timeout #326
Conversation
Running Finally on pipeline timeout makes sense to me. Only question to my mind is whether we need a variable or status (or something similar) that the Finally task can check to see if a Pipeline timeout occurred. e.g. For reporting purposes. /lgtm |
Yeah I think that is logical next step. |
we could be little more specific with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, I think it's a very good idea.
There are some details to be clarified in terms of backward compatibility and introducing this feature as alpha first.
nitty-gritty. | ||
--> | ||
|
||
Enable finally task to run when a pipeline times out. This implies a behavioral change, as finally tasks will run no matter what. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to put this change of behaviour behind a feature flag, which will be default preserve the current behaviour. I think a possible solution could be as follows:
- when disabled (default), the new API field is not accepted via validation. The timeout is considered as an overall pipeline timeout. If it happens before finally, finally does not run, if it happens during finally, finally is interrupted and fails.
- when enabled, the new API field is accepted via validation. The timeout is not applied to the pipeline excluding finally.
Alternatively, we could have two flags, one which controls the API change and one that controls how timeout is handled, to provide backward compatibility. In this scenario:
- if the API change is enabled (default to disabled), a finally timeout can be specified, but the overall behaviour depends by the other flag
- if the timeout backward compatibility flag is enabled (default to true), the timeout specified today applies to the overall pipeline. If a finally timeout is specified, the main pipeline will timeout after (overall timeout - finally timeout)
This second approach might make sense if we have a single flag to control all alpha API flag. One might want to enable alpha API parts but still maintain the current behaviour for pipeline timeouts.
@bobcatfish @vdemeester ^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, because this is a change in behaviour, we need to to preverve the current behaviour for a given amount of time.
What we are adding here is:
- A new field for timeout in
finally
task (with a default that could be the same default as forTaskRun
) - A change in behaviour so that the
pipelinerun
timeout doesn't apply tofinally
task.
I feel both can be seen as separate, as, we could have the 2nd without the first (just relying on the default taskrun timeout for finally
task). At the same time, the 1st could be implemented independently of the behaviour change. Hence I am more on a flag on the controller to change the behavior, and add the timeout field under a general "alpha" feature flag — which is the 2nd approach @afrittoli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand here. Relying on the taskrun default timeout is in itself a behavioral change. So in based on the @afrittoli proposition 2 flags are still needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@souleb maybe the missing detail is that in TEP-0033 we are discussing adding an "alpha" feature flag that will need to be enabled to use all new alpha features, including specifying a timeout for finally tasks
i agree with @vdemeester and @afrittoli that it there are 2 separate features here and we can approach them differently
@pritidesai: GitHub didn't allow me to assign the following users: chhsia0. Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
ab272bf
to
01a88c8
Compare
In terms of backward compatibility, as the new field would be optional, the only thing I can think of is the change in the behavior. Like After the implementation of this TEP, the finally tasks always run. Do you think of anything else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment on how to do this change in a backward compatible way, but other than that, looks good to me.
nitty-gritty. | ||
--> | ||
|
||
Enable finally task to run when a pipeline times out. This implies a behavioral change, as finally tasks will run no matter what. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, because this is a change in behaviour, we need to to preverve the current behaviour for a given amount of time.
What we are adding here is:
- A new field for timeout in
finally
task (with a default that could be the same default as forTaskRun
) - A change in behaviour so that the
pipelinerun
timeout doesn't apply tofinally
task.
I feel both can be seen as separate, as, we could have the 2nd without the first (just relying on the default taskrun timeout for finally
task). At the same time, the 1st could be implemented independently of the behaviour change. Hence I am more on a flag on the controller to change the behavior, and add the timeout field under a general "alpha" feature flag — which is the 2nd approach @afrittoli.
/assign pritidesai |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the work on this @souleb 🙏🏾
adding a consideration brought up at the api wg meeting today
the possible solutions so far are modifying the meaning of pipelinerun timeout to be dag tasks timeout and it may be confusing that when pipelinerun has reached its timeout, the finally tasks are executed afterwards. an alternative we could consider is that pipelinerun timeout is inclusive of the finally tasks timeout. so, during execution, we could stop executing dag tasks at some point to give enough time for finally tasks to execute before timing out the pipelinerun (dag tasks timeout = pipelinerun timeout - finally tasks timeout). this could also be as confusing, but it may be worthwhile to consider it and add it to the alternatives section
Based on the discussion with @sbwsg, @vdemeester, @afrittoli, @jerop, and @skaegi in the API WG 0n 02/08, propose two separate features:
Option A: This can be a new flag at the pipelineRun level Option B:
Option C: Deprecate I think the simplest would be option C. @gpaul This proposal is the addressing the feature you requested, please provide your feedback, if possible 🙏 |
5c9794a
to
9db32a8
Compare
Like we discussed in the API WG today, it does look like there is inconsistency but its implicit i.e. |
Yeah I am on board with this idea. Both parties (author and runner) can justify controlling timeouts I think, since there's usage-specific context on both sides. |
The last api wg was too late for me 😞 I'd really like to move forward with this TEP. I think that everyone was onboard after the last demo... So what I propose is to go on with the |
I think that's a good idea - it gives us a chance to weigh the pros and cons of the timeouts revamp in isolation without blocking this feature. /approve What do others think? |
agreed, let's focus on this small change for now and can revisit the others later 👍🏾 /approve (needs a non-googler to |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jerop, sbwsg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@souleb please squash the commits (we prefer one commit per PR) and resolve the tep linting issue |
e954b59
to
b5a32de
Compare
|
||
```yaml | ||
spec: | ||
tasksTimeout: "1h0m0s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be deprecated.
spec:
timeout:
pipeline: "0h4m0s"
tasks: "0h1m0s"
finally: "0h3m0s"
The tasks
key in timeout
would replace it, with same behavior. The current timeout
would be replaced as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the current proposal because it's small and it only adds functionality, but I don't like the idea of adding a new field if we plan to deprecate it already. Is there any reason for not using the final syntax right away?
spec:
timeout: "0h4m0s" # Existing field, current behaviour
timeouts:
tasks: "0h1m0s" # New field
Thanks a bunch @souleb for all your hard work. We appreciate your patience 🙏 Here is the list of items I have collected and re-written the proposal: ToDo:
ProposalIntroduce a new section
This new Pipeline TimeoutThe users have an ability to specify the timeout of the entire pipeline. The value specified in the following section will overwrite the default pipeline timeout. The default pipeline timeout is configurable via ConfigMap default-timeout-minutes. This specification is equivalent to the traditional pipeline level timeout specified in the
Tasks TimeoutThe users have an ability to specify the timeout for the
Finally TimeoutThe users have an ability to specify the timeout for the
Combination of TimeoutsThe users have an ability to specify the timeout of the entire pipeline and restrict some portion of it to either Combination 1: Set the timeout for the entire
Combination 2: Set the timeout for the entire
Some of the validations being done as part of the creation of
This new section This kind of proposal allows us to smoothly transition away from the old usage to this new section.
|
05d1be4
to
7ab7aa5
Compare
@pritidesai all done. Thanks! |
thanks @souleb, looks good to me, please fix the linting failure:
|
tasksTimeout defines a timeout for the dag tasks. The finally tasks timeout willbe timeout - tasksTimeout with timeout >= tasksTimeout. When tasksTimeout is not defined, the behavior is unchanged. This will enable users to manage run time behavior and make sure their finally tasks run as intended by scoping the tasks runtime period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the updates. @pritidesai I think all your comments have been addressed?
Just a few nits - but nothing blocking.
/lgtm
<!-- | ||
This section is incredibly important for producing high quality user-focused | ||
documentation such as release notes or a development roadmap. It should be | ||
possible to collect this information before implementation begins in order to | ||
avoid requiring implementors to split their attention between writing release | ||
notes and implementing the feature itself. | ||
A good summary is probably at least a paragraph in length. | ||
Both in this section and below, follow the guidelines of the [documentation | ||
style guide]. In particular, wrap lines to a reasonable length, to make it | ||
easier for reviewers to cite specific portions, and to minimize diff churn on | ||
updates. | ||
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: could you cleanup the comments (can be a follow-up)
|
||
Some of the validations being done as part of the creation of `pipelineRun` CRD: | ||
1. Users can either specify the traditional timeout field `spec.timeout` or this new section `spec.timeouts`. Specifying both fields are restricted. | ||
2. With this new section, the amount of timeouts in `tasks` and `finally` must be less than the pipeline timeout. If both specified, the sum of the `tasks` and the `finally` must match the pipeline timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I'm not fully convinced we need to enforce an exact match.
I understand it might be useful for users to be notified if things don't match up, but I wonder if in future timeouts might be set in different resources, and ensuring they match up might be difficult.
Anyways, something we can discuss on the PR, not blocking here.
```yaml | ||
kind: PipelineRun | ||
spec: | ||
timeouts: | ||
pipeline: "0h4m0s" | ||
tasks: "0h1m0s" | ||
``` | ||
|
||
Combination 2: Set the timeout for the entire `pipeline` and reserve a portion of it for `finally`. | ||
|
||
```yaml | ||
kind: PipelineRun | ||
spec: | ||
timeouts: | ||
pipeline: "0h4m0s" | ||
finally: "0h3m0s" | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the following is valid too?
kind: PipelineRun
spec:
timeouts:
tasks: "0h1m0s"
finally: "0h3m0s"
|
||
### Finally Timeout flag at Pipelinerun Spec | ||
|
||
We could add a new flag at the pipelineRun level `finallyTimeout` similar to the timeout flag. If specified, pipelineRun timeout (default is one hour) applies to dag tasks only. The dag tasks will stop executing once it meets the pipelineRun timeout. The finally tasks starts executing at this point and will be executed until meets the timeout specified in finallyTimeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I think the cons of this approach is that the timeout
pipeline has different meaning depending on whether finallyTimeout
is specified or not, which may be confusing to users.
It as the plus of not requiring us to deprecate the existing field.
Given that the related issue (tektoncd/pipeline#2989) is closed, should this be marked as implemented? |
Proposal to enable finally tasks to execute when a pipelinerun has reached timeout.
Add a new flag
tasksTimeouts
which will define a timeout for the dag tasks. The finally tasks timeout will betimeout - tasksTimeout
withtimeout >= tasksTimeout
andtimeout
being the current timeout flag.When
tasksTimeout
is not defined,timeout
is used for the tasks timeout (the behavior is unchanged).This will enable users to manage run time behavior and make sure their finally tasks run as intended by scoping the tasks runtime period.
/kind tep
cc @jerop @pritidesai