-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP ~ Automatically manage concurrent PipelineRuns #716
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,368 @@ | ||
--- | ||
status: proposed | ||
title: 'Manage PipelineRun concurrency' | ||
creation-date: '2022-05-26' | ||
last-updated: '2022-05-26' | ||
authors: | ||
- '@williamlfish' | ||
- '@vdemeester' | ||
--- | ||
|
||
# TEP-0117: Manage PipelineRun concurrency | ||
|
||
<!-- | ||
**Note:** Please remove comment blocks for sections you've filled in. | ||
When your TEP is complete, all of these comment blocks should be removed. | ||
|
||
To get started with this template: | ||
|
||
- [ ] **Fill out this file as best you can.** | ||
At minimum, you should fill in the "Summary", and "Motivation" sections. | ||
These should be easy if you've preflighted the idea of the TEP with the | ||
appropriate Working Group. | ||
- [ ] **Create a PR for this TEP.** | ||
Assign it to people in the Working Group that are sponsoring this process. | ||
- [ ] **Merge early and iterate.** | ||
Avoid getting hung up on specific details and instead aim to get the goals of | ||
the TEP clarified and merged quickly. The best way to do this is to just | ||
start with the high-level sections and fill out details incrementally in | ||
subsequent PRs. | ||
|
||
Just because a TEP is merged does not mean it is complete or approved. Any TEP | ||
marked as a `proposed` is a working document and subject to change. You can | ||
denote sections that are under active debate as follows: | ||
|
||
``` | ||
<<[UNRESOLVED optional short context or usernames ]>> | ||
Stuff that is being argued. | ||
<<[/UNRESOLVED]>> | ||
``` | ||
|
||
When editing TEPS, aim for tightly-scoped, single-topic PRs to keep discussions | ||
focused. If you disagree with what is already in a document, open a new PR | ||
with suggested changes. | ||
|
||
If there are new details that belong in the TEP, edit the TEP. Once a | ||
feature has become "implemented", major changes should get new TEPs. | ||
|
||
The canonical place for the latest set of instructions (and the likely source | ||
of this file) is [here](/teps/tools/tep-template.md.template). | ||
|
||
--> | ||
|
||
<!-- | ||
This is the title of your TEP. Keep it short, simple, and descriptive. A good | ||
title can help communicate what the TEP is and should be considered as part of | ||
any review. | ||
--> | ||
|
||
<!-- | ||
A table of contents is helpful for quickly jumping to sections of a TEP and for | ||
highlighting any additional information provided beyond the standard TEP | ||
template. | ||
|
||
Ensure the TOC is wrapped with | ||
<code><!-- toc --&rt;<!-- /toc --&rt;</code> | ||
tags, and then generate with `hack/update-toc.sh`. | ||
--> | ||
|
||
<!-- toc --> | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Use Cases](#use-cases) | ||
- [Requirements](#requirements) | ||
- [Proposal](#proposal) | ||
- [Notes and Caveats](#notes-and-caveats) | ||
- [Design Details](#design-details) | ||
- [Design Evaluation](#design-evaluation) | ||
- [Reusability](#reusability) | ||
- [Simplicity](#simplicity) | ||
- [Flexibility](#flexibility) | ||
- [User Experience](#user-experience) | ||
- [Performance](#performance) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
- [Implementation Plan](#implementation-plan) | ||
- [Test Plan](#test-plan) | ||
- [Infrastructure Needed](#infrastructure-needed) | ||
- [Upgrade and Migration Strategy](#upgrade-and-migration-strategy) | ||
- [Implementation Pull Requests](#implementation-pull-requests) | ||
- [References](#references) | ||
<!-- /toc --> | ||
|
||
## Summary | ||
Manage pipeline run concurrency via configurable strategies. | ||
|
||
## Motivation | ||
There are situations where pipelines can overlap, and they should not. For instance, a pr is created and that triggers ci to so a PipelineRun is created, a developer notices a small typo, pushed again, and a new PipelineRun is triggered, the first PipelineRun is no longer relevant, but potentially still running. | ||
Another example is managing available shared resources, lets say tekton triggers an external device and there are a set available amount, or simply resource constrained clusters. | ||
|
||
|
||
2 proposed cancel strategies. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just a formatting note, I think it would be helpful to separate out use cases from proposed strategies-- that makes it a bit easier to separate design from motivation and make sure we have a strategy for addressing each use case; up to you though! |
||
- ZeroOverlap: | ||
- Pipelineruns with the same key do not overlap, so if one is running and a new one is scheduled, the former is canceled. | ||
- Configure cancel status ( with finally, stop, cancel ) | ||
- Queues: | ||
- Queues have a max run ( how many can run at the same time ) | ||
- fifo | ||
- Queue max can be over-ridden, but is always the last pipelinerun created. | ||
|
||
|
||
Strategies can be mixed together BUT their concurrency keys cannot be the same ( cancel eachother out ). | ||
|
||
Mixing strategies would allow for better resource management for teams. For instance, 3 teams are working on a mobile application with only 5 test devices so our max queue size is 5. If team 1 works in really fast bursts, | ||
they might clog the queue with small pr changes. Leaving team 2 & 3 only able to get a build in when lucky. With zeroOverlap, and a queue, a single pr with many quick changes will only ever occupy one slot in the queue. | ||
|
||
### Requirements | ||
- PipelineRuns are canceled based on the strategy defined. | ||
- Simple interface set at the PipelineRun manifest. | ||
- New PipelineRun should wait for the canceled one to finish when applicable. | ||
- All PipelineRuns follow fifo patterns. | ||
|
||
## Proposal | ||
Adding a key on the PipelineRun manifest seems like the most appropriate, and simple solution. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering if it could make sense to have any of this defined on the Pipeline instead. for example, in the case where you want only one CI run at a time:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah this is interesting, like having the idea of overlap closer to the actual thing you don't want overlapped ( for lack of better phrasing lolol 🙈 )? There is something that feels nice and clean about that, but I wonder if in practice it will be hard for users to remember where one config goes vs another? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah! The reason I suggest this is because the Pipeline author knows whether concurrency is appropriate for the pipeline, and this will allow users to not have to set concurrency controls on every pipelinerun for that pipeline. I'm sure we could explore adding this to the PipelineRun later if there's a clear need. What do you mean by it being hard to remember where one config goes vs another? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All I meant on the topic of remembering where configs might go, if the zeroOverlap is on the pipeline, but the queue config is on the PipelineRun, then there are 2 areas that are managing a similar topic ( concurrency ). That being said, I dont think it is a hard stop or anything, just a thought. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could probably add both options to the Pipeline spec for now (like in the alternative syntax example I put below) and then we could always add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So there is a debate of "is concurrency an authoring or a runtime concern ?" 🙃.
I partly disagree on this. It might be true for some usecase, but I am not sure this would be the majority, at all. My take on this is almost the opposite as @lbernick 👼🏼 🙃. I'd rather start by having this on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking about this a bit more today-- what would we do if two PipelineRuns had the same concurrency key but different concurrency strategies? I think the concurrency strategy has to be defined somewhere other than the PipelineRun-- either on the Pipeline or in a configmap or new CRD. I do see your point though about how different users of a given pipeline might want to configure concurrency differently, which makes me lean more towards cluster-level configuration (like in your original hackmd). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aren't we thinking too much about this though ? Cluster-level configuration is a bit tricky. It works for some cases, mainly, for cluster-admins to limit the number of "resources" consumed by tekton, but teams on their own namespaces will probably need to handle those per namespace or "project" or pipeline. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To detail this a bit more:
In a gist, I really do think, while experimenting, while it's in alpha & all, we can not overthink this, and see with "usage" if it really is a problem or not. |
||
|
||
Short example: | ||
```yaml | ||
concurrency: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. potential alternative syntax:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This alternative syntax does not give the ability to use both a queue and zeroOverlap at the same time right? or am i misunderstanding part of it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It doesn't. Imagine there are 5 PipelineRuns running, with max = 5. My understanding would be that if a new PipelineRun was created and the strategy was "queue", it would wait until one finished before starting. If the strategy was "cancel", it would cancel the earliest created PipelineRun and then start. ZeroOverlap is "cancel" with max = 1. I'm not understanding why you might want to use both at the same time; they seem incompatible with each other. Can you describe what you had in mind? |
||
zeroOverlap: | ||
key: "the-key-to-match" | ||
cancelStatus: "Stop" | ||
queue: | ||
key: "a-diff-key-to-match" | ||
maxRun: 5 | ||
``` | ||
Full example from trigger template: | ||
```yaml | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: PipelineRun | ||
metadata: | ||
generateName: tekton-pipelines-pr-ci- | ||
spec: | ||
concurrency: | ||
zeroOverlap: | ||
key: $(tt.params.repo)-pr-$(params.pr-number) | ||
cancelStatus: "Stop" | ||
queue: | ||
key: $(tt.params.repo) | ||
maxRun: 5 | ||
pipelineRef: | ||
name: tekton-pipelines-pr-ci | ||
params: | ||
- name: repo | ||
value: $(tt.params.repo) | ||
- name: pr-number | ||
value: $(tt.params.pr-number) | ||
serviceAccountName: pipeline-sa | ||
workspaces: | ||
- name: ssh-creds | ||
secret: | ||
secretName: ssh-things | ||
``` | ||
In the above example, max 5 pipeline runs can run at the same time for a repo, and only one from a specific pr can overlap. | ||
|
||
Let's imagine an org enables the ability for a higher max run threshold. Updating the pipelinerun manifest to a higher max should be all that is needed. The newly | ||
created max is the now current max for its key, even if its position in the queue does not mean it will be immediately scheduled. So if the max was 2, with 5 pipelineruns created, | ||
2 are running and 3 are pending, a 6th is created with max of 4, now the 2 most recent created pipelineruns that where pending, begin running and 2 are now pending, and the | ||
newest created (with the new max) will still be last in the queue. If the max is dropped again, and more are running then should be, allow the queue to resolve itself by having | ||
pipelineruns that are running finish, but never allowing a pipelinerun to enter a running state until there is space in the queue. | ||
|
||
### Notes and Caveats | ||
|
||
<!-- | ||
(optional) | ||
|
||
Go in to as much detail as necessary here. | ||
- What are the caveats to the proposal? | ||
- What are some important details that didn't come across above? | ||
- What are the core concepts and how do they relate? | ||
--> | ||
|
||
|
||
## Design Details | ||
|
||
<!-- | ||
This section should contain enough information that the specifics of your | ||
change are understandable. This may include API specs (though not always | ||
required) or even code snippets. If there's any ambiguity about HOW your | ||
proposal will be implemented, this is the place to discuss them. | ||
|
||
If it's helpful to include workflow diagrams or any other related images, | ||
add them under "/teps/images/". It's upto the TEP author to choose the name | ||
of the file, but general guidance is to include at least TEP number in the | ||
file name, for example, "/teps/images/NNNN-workflow.jpg". | ||
--> | ||
|
||
|
||
## Design Evaluation | ||
<!-- | ||
How does this proposal affect the api conventions, reusability, simplicity, flexibility | ||
and conformance of Tekton, as described in [design principles](https://github.com/tektoncd/community/blob/master/design-principles.md) | ||
--> | ||
|
||
### Reusability | ||
|
||
<!-- | ||
https://github.com/tektoncd/community/blob/main/design-principles.md#reusability | ||
|
||
- Are there existing features related to the proposed features? Were the existing features reused? | ||
- Is the problem being solved an authoring-time or runtime-concern? Is the proposed feature at the appropriate level | ||
authoring or runtime? | ||
--> | ||
|
||
### Simplicity | ||
|
||
<!-- | ||
https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity | ||
|
||
- How does this proposal affect the user experience? | ||
- What’s the current user experience without the feature and how challenging is it? | ||
- What will be the user experience with the feature? How would it have changed? | ||
- Does this proposal contain the bare minimum change needed to solve for the use cases? | ||
- Are there any implicit behaviors in the proposal? Would users expect these implicit behaviors or would they be | ||
surprising? Are there security implications for these implicit behaviors? | ||
--> | ||
|
||
### Flexibility | ||
|
||
<!-- | ||
https://github.com/tektoncd/community/blob/main/design-principles.md#flexibility | ||
|
||
- Are there dependencies that need to be pulled in for this proposal to work? What support or maintenance would be | ||
required for these dependencies? | ||
- Are we coupling two or more Tekton projects in this proposal (e.g. coupling Pipelines to Chains)? | ||
- Are we coupling Tekton and other projects (e.g. Knative, Sigstore) in this proposal? | ||
- What is the impact of the coupling to operators e.g. maintenance & end-to-end testing? | ||
- Are there opinionated choices being made in this proposal? If so, are they necessary and can users extend it with | ||
their own choices? | ||
--> | ||
|
||
### Conformance | ||
|
||
<!-- | ||
https://github.com/tektoncd/community/blob/main/design-principles.md#conformance | ||
|
||
- Does this proposal require the user to understand how the Tekton API is implemented? | ||
- Does this proposal introduce additional Kubernetes concepts into the API? If so, is this necessary? | ||
- If the API is changing as a result of this proposal, what updates are needed to the | ||
[API spec](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md)? | ||
--> | ||
|
||
### User Experience | ||
|
||
<!-- | ||
(optional) | ||
|
||
Consideration about the user experience. Depending on the area of change, | ||
users may be Task and Pipeline editors, they may trigger TaskRuns and | ||
PipelineRuns or they may be responsible for monitoring the execution of runs, | ||
via CLI, dashboard or a monitoring system. | ||
|
||
Consider including folks that also work on CLI and dashboard. | ||
--> | ||
|
||
### Performance | ||
|
||
<!-- | ||
(optional) | ||
|
||
Consider which use cases are impacted by this change and what are their | ||
performance requirements. | ||
- What impact does this change have on the start-up time and execution time | ||
of TaskRuns and PipelineRuns? | ||
- What impact does it have on the resource footprint of Tekton controllers | ||
as well as TaskRuns and PipelineRuns? | ||
--> | ||
|
||
### Risks and Mitigations | ||
|
||
<!-- | ||
What are the risks of this proposal and how do we mitigate? Think broadly. | ||
For example, consider both security and how this will impact the larger | ||
Tekton ecosystem. Consider including folks that also work outside the WGs | ||
or subproject. | ||
- How will security be reviewed and by whom? | ||
- How will UX be reviewed and by whom? | ||
--> | ||
|
||
### Drawbacks | ||
|
||
<!-- | ||
Why should this TEP _not_ be implemented? | ||
--> | ||
|
||
## Alternatives | ||
|
||
<!-- | ||
What other approaches did you consider and why did you rule them out? These do | ||
not need to be as detailed as the proposal, but should include enough | ||
information to express the idea and why it was not acceptable. | ||
--> | ||
|
||
|
||
## Implementation Plan | ||
|
||
<!-- | ||
What are the implementation phases or milestones? Taking an incremental approach | ||
makes it easier to review and merge the implementation pull request. | ||
--> | ||
|
||
|
||
### Test Plan | ||
|
||
<!-- | ||
Consider the following in developing a test plan for this enhancement: | ||
- Will there be e2e and integration tests, in addition to unit tests? | ||
- How will it be tested in isolation vs with other components? | ||
|
||
No need to outline all the test cases, just the general strategy. Anything | ||
that would count as tricky in the implementation and anything particularly | ||
challenging to test should be called out. | ||
|
||
All code is expected to have adequate tests (eventually with coverage | ||
expectations). | ||
--> | ||
|
||
### Infrastructure Needed | ||
|
||
<!-- | ||
(optional) | ||
|
||
Use this section if you need things from the project or working group. | ||
Examples include a new subproject, repos requested, GitHub details. | ||
Listing these here allows a working group to get the process for these | ||
resources started right away. | ||
--> | ||
|
||
### Upgrade and Migration Strategy | ||
|
||
<!-- | ||
(optional) | ||
|
||
Use this section to detail whether this feature needs an upgrade or | ||
migration strategy. This is especially useful when we modify a | ||
behavior or add a feature that may replace and deprecate a current one. | ||
--> | ||
|
||
### Implementation Pull Requests | ||
|
||
<!-- | ||
Once the TEP is ready to be marked as implemented, list down all the GitHub | ||
merged pull requests. | ||
|
||
Note: This section is exclusively for merged pull requests for this TEP. | ||
It will be a quick reference for those looking for implementation of this TEP. | ||
--> | ||
|
||
## References | ||
|
||
<!-- | ||
(optional) | ||
|
||
Use this section to add links to GitHub issues, other TEPs, design docs in Tekton | ||
shared drive, examples, etc. This is useful to refer back to any other related links | ||
to get more details. | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine there are also some CD related use cases here: maybe you want to build an image and tag as "latest", or deploy some infra to a cluster, and you want to make sure the most recently created pipelinerun is also the one that is applied last.