# TEP ~ Automatically manage concurrent PipelineRuns #716

Status: Closed
**teps/0117-manage-pipelinerun-concurrency.md** (new file, 368 additions)
---
status: proposed
title: 'Manage PipelineRun concurrency'
creation-date: '2022-05-26'
last-updated: '2022-05-26'
authors:
- '@williamlfish'
- '@vdemeester'
---

# TEP-0117: Manage PipelineRun concurrency

<!--
**Note:** Please remove comment blocks for sections you've filled in.
When your TEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary", and "Motivation" sections.
These should be easy if you've preflighted the idea of the TEP with the
appropriate Working Group.
- [ ] **Create a PR for this TEP.**
Assign it to people in the Working Group that are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the TEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.

Just because a TEP is merged does not mean it is complete or approved. Any TEP
marked as a `proposed` is a working document and subject to change. You can
denote sections that are under active debate as follows:

```
<<[UNRESOLVED optional short context or usernames ]>>
Stuff that is being argued.
<<[/UNRESOLVED]>>
```

When editing TEPS, aim for tightly-scoped, single-topic PRs to keep discussions
focused. If you disagree with what is already in a document, open a new PR
with suggested changes.

If there are new details that belong in the TEP, edit the TEP. Once a
feature has become "implemented", major changes should get new TEPs.

The canonical place for the latest set of instructions (and the likely source
of this file) is [here](/teps/tools/tep-template.md.template).

-->

<!--
This is the title of your TEP. Keep it short, simple, and descriptive. A good
title can help communicate what the TEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a TEP and for
highlighting any additional information provided beyond the standard TEP
template.

Ensure the TOC is wrapped with
<code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [Notes and Caveats](#notes-and-caveats)
- [Design Details](#design-details)
- [Design Evaluation](#design-evaluation)
  - [Reusability](#reusability)
  - [Simplicity](#simplicity)
  - [Flexibility](#flexibility)
  - [Conformance](#conformance)
  - [User Experience](#user-experience)
  - [Performance](#performance)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Implementation Plan](#implementation-plan)
  - [Test Plan](#test-plan)
  - [Infrastructure Needed](#infrastructure-needed)
  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
  - [Implementation Pull Requests](#implementation-pull-requests)
- [References](#references)
<!-- /toc -->

## Summary
Manage PipelineRun concurrency via configurable strategies.

## Motivation
There are situations where PipelineRuns can overlap when they should not. For instance, a PR is created and triggers CI, so a PipelineRun is created; a developer then notices a small typo and pushes again, and a new PipelineRun is triggered. The first PipelineRun is no longer relevant, but it is potentially still running.
Another example is managing a limited pool of shared resources: say Tekton drives external devices of which only a fixed number are available, or the cluster itself is resource constrained.
> **Reviewer comment:** I would imagine there are also some CD-related use cases here: maybe you want to build an image and tag it as "latest", or deploy some infra to a cluster, and you want to make sure the most recently created PipelineRun is also the one that is applied last.
Two concurrency strategies are proposed:
> **Reviewer comment:** Just a formatting note: I think it would be helpful to separate out use cases from proposed strategies; that makes it a bit easier to separate design from motivation and to make sure we have a strategy for addressing each use case. Up to you though!
- ZeroOverlap:
  - PipelineRuns with the same key do not overlap: if one is running and a new one is scheduled, the former is canceled.
  - The cancel status is configurable (finally, stop, or cancel).
- Queues:
  - Queues have a max run count (how many PipelineRuns can run at the same time).
  - Queues are FIFO.
  - The queue max can be overridden; the effective max is always the one set on the most recently created PipelineRun.


Strategies can be mixed together, BUT their concurrency keys cannot be the same (they would cancel each other out).

Mixing strategies allows better resource management for teams. For instance, suppose 3 teams are working on a mobile application with only 5 test devices, so the max queue size is 5. If team 1 works in really fast bursts, it might clog the queue with small PR changes, leaving teams 2 and 3 only able to get a build in when lucky. With zeroOverlap and a queue combined, a single PR with many quick changes will only ever occupy one slot in the queue, as sketched below.
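
A minimal sketch of that scenario, assuming the `zeroOverlap`/`queue` syntax proposed below (the names and keys here are purely illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: mobile-app-team1-pr-ci-   # hypothetical PR CI run for team 1
spec:
  concurrency:
    zeroOverlap:
      key: team1-pr-42            # one slot per PR: a re-push cancels the previous run
      cancelStatus: "Stop"
    queue:
      key: mobile-test-devices    # shared queue across all three teams
      maxRun: 5                   # only 5 test devices are available
  pipelineRef:
    name: mobile-app-ci           # hypothetical Pipeline name
```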

## Requirements
- PipelineRuns are canceled based on the strategy defined.
- A simple interface, set in the PipelineRun manifest.
- A new PipelineRun should wait for the canceled one to finish when applicable.
- All PipelineRuns follow FIFO ordering.

## Proposal
Adding a key to the PipelineRun manifest seems like the most appropriate and simplest solution.
> **Reviewer comment:** I'm wondering if it could make sense to have any of this defined on the Pipeline instead; for example, in the case where you want only one CI run at a time:
>
> ```yaml
> kind: Pipeline
> spec:
>   concurrency:
>     zeroOverlap:
>       key: $(params.pull-request-id)
> ```
> **Author reply:** Ah, this is interesting: having the idea of overlap closer to the actual thing you don't want overlapped (for lack of better phrasing 🙈)? There is something that feels nice and clean about that, but I wonder if in practice it will be hard for users to remember where one config goes vs another.

> **Reviewer reply:** Yeah! The reason I suggest this is that the Pipeline author knows whether concurrency is appropriate for the pipeline, and this would allow users to not have to set concurrency controls on every PipelineRun for that pipeline. I'm sure we could explore adding this to the PipelineRun later if there's a clear need. What do you mean by it being hard to remember where one config goes vs another?

> **Author reply:** All I meant on the topic of remembering where configs go: if zeroOverlap is on the Pipeline but the queue config is on the PipelineRun, then there are two places managing a similar concern (concurrency). That said, I don't think it is a hard stop or anything, just a thought.

> **Reviewer reply:** I think we could probably add both options to the Pipeline spec for now (like in the alternative syntax example I put below), and then we could always add concurrency to the PipelineRun later if needed. Do you think the "queue" option is something you would want to specify on the PipelineRun?

> **Reviewer reply:** So there is a debate of "is concurrency an authoring or a runtime concern?" 🙃
>
> > The reason I suggest this is because the Pipeline author knows whether concurrency is appropriate for the pipeline, and this will allow users to not have to set concurrency controls on every pipelinerun for that pipeline.
>
> I partly disagree with this. It might be true for some use cases, but I am not sure it would be the majority at all. If I have a Go Pipeline (build, test, publish an image, or something), as an author I do not need nor want to set concurrency on it, as I don't really have to care about it. But as a consumer of this Pipeline, I may want only one PipelineRun of this Pipeline to run per Go project.
>
> In addition, putting this on the Pipeline spec might reduce Pipeline "share-ability". Taking that Go Pipeline again, and assuming it's available in a catalog (or better, through the hub): if I want to achieve what I described above, I actually need to duplicate the Pipeline or augment it using, for example, Kustomize.
>
> My take on this is almost the opposite of @lbernick's 👼🏼 🙃. I'd rather start by having this on the PipelineRun spec, as it's closer to a "runtime" concern than an authoring one, and then explore adding it to the Pipeline. Not having it on the PipelineRun would also reduce its potential a bit, because then any tool (or user) using embedded specs (like pipelines-as-code) would not benefit from it at all.

> **@lbernick (Aug 24, 2022):** Thinking about this a bit more today: what would we do if two PipelineRuns had the same concurrency key but different concurrency strategies? I think the concurrency strategy has to be defined somewhere other than the PipelineRun, either on the Pipeline or in a ConfigMap or a new CRD. I do see your point, though, about how different users of a given pipeline might want to configure concurrency differently, which makes me lean more towards cluster-level configuration (like in your original hackmd).

> **Reviewer reply:** Aren't we thinking too much about this, though? Who are we targeting with this feature? Do we assume the majority of its usage will come from tools (pipelines-as-code, a team's tooling, …) or from end users? Can this be documented as an actual known limitation? Can we make those keys unique, maybe (compute and append a hash of the referenced pipeline or embedded pipeline definition, …)?
>
> Cluster-level configuration is a bit tricky. It works for some cases, mainly for cluster admins limiting the number of resources consumed by Tekton, but teams in their own namespaces will probably need to handle this per namespace, "project", or pipeline.

> **Reviewer reply:** To detail this a bit more:
>
> - On making things unique: the controller could take the user-provided concurrency key, but when computing it, also hash the pipeline spec the PipelineRun refers to (be it through `pipelineRef` or an embedded spec). That way, two PipelineRuns that refer to different pipelines (or to an updated pipeline) but use the same concurrency key would not conflict. This comes with its own set of challenges/problems, but we could keep the approach in mind while experimenting.
> - The main assumption I am making here, though, is that in most cases users don't author the PipelineRun themselves; tools do it for them (clients like tkn, tools like pipelines-as-code, home-grown tooling, IBM stuff, …). And in the small number of cases where users do author them, it's probably fine to have this as a known limitation/issue.
>
> In a gist, I really do think that while experimenting, while this is in alpha and all, we should not overthink it, and see from usage whether it really is a problem or not.
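
To make the reviewer's key-uniqueness idea concrete, here is a hypothetical illustration (not part of the proposed API; the composite-key format is invented for this sketch):

```yaml
# User-provided key in the PipelineRun spec:
concurrency:
  zeroOverlap:
    key: main-branch-ci
# Effective key the controller could track internally (hypothetical):
#   main-branch-ci/<sha256 of the resolved pipeline spec>
# Two PipelineRuns with the same user key but different pipeline specs would
# then map to different effective keys and would not conflict.
```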


Short example:
```yaml
concurrency:
  zeroOverlap:
    key: "the-key-to-match"
    cancelStatus: "Stop"
  queue:
    key: "a-diff-key-to-match"
    maxRun: 5
```

> **Reviewer comment:** Potential alternative syntax:
>
> ```yaml
> concurrency:
>   key: $(params.foo)
>   strategy: cancel # options here are "queue" and "cancel"
>   max: 1  # we can choose a default value for max depending on the concurrency strategy
> ```

> **Author reply:** This alternative syntax does not give the ability to use both a queue and zeroOverlap at the same time, right? Or am I misunderstanding part of it?

> **Reviewer reply:** It doesn't. Imagine there are 5 PipelineRuns running, with max = 5. My understanding would be that if a new PipelineRun was created and the strategy was "queue", it would wait until one finished before starting. If the strategy was "cancel", it would cancel the earliest created PipelineRun and then start. ZeroOverlap is "cancel" with max = 1.
>
> I'm not understanding why you might want to use both at the same time; they seem incompatible with each other. Can you describe what you had in mind?
Full example from a trigger template:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: tekton-pipelines-pr-ci-
spec:
  concurrency:
    zeroOverlap:
      key: $(tt.params.repo)-pr-$(params.pr-number)
      cancelStatus: "Stop"
    queue:
      key: $(tt.params.repo)
      maxRun: 5
  pipelineRef:
    name: tekton-pipelines-pr-ci
  params:
    - name: repo
      value: $(tt.params.repo)
    - name: pr-number
      value: $(tt.params.pr-number)
  serviceAccountName: pipeline-sa
  workspaces:
    - name: ssh-creds
      secret:
        secretName: ssh-things
```
In the above example, at most 5 PipelineRuns can run at the same time for a given repo, and at most one PipelineRun at a time can run for a specific PR.

Let's imagine an org enables the ability to raise the max run threshold. Updating the PipelineRun manifest to a higher max should be all that is needed: the newly created max becomes the current max for its key, even though the new PipelineRun's position in the queue does not mean it will be immediately scheduled. So if the max was 2 with 5 PipelineRuns created, 2 are running and 3 are pending; when a 6th is created with a max of 4, the 2 most recently created PipelineRuns that were pending begin running, 2 are now pending, and the newest created one (with the new max) will still be last in the queue. If the max is dropped again and more PipelineRuns are running than should be, the queue is allowed to resolve itself by letting the running PipelineRuns finish, while never allowing a PipelineRun to enter a running state until there is space in the queue. A sketch of such an override follows.
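
This is a minimal sketch, assuming the syntax above (values are illustrative): the 6th PipelineRun simply declares a higher `maxRun` for the same queue key.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: tekton-pipelines-pr-ci-
spec:
  concurrency:
    queue:
      key: $(tt.params.repo)   # same key as the PipelineRuns already queued
      maxRun: 4                # raises the effective max for this key from 2 to 4
  pipelineRef:
    name: tekton-pipelines-pr-ci
```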

### Notes and Caveats

<!--
(optional)

Go in to as much detail as necessary here.
- What are the caveats to the proposal?
- What are some important details that didn't come across above?
- What are the core concepts and how do they relate?
-->


## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.

If it's helpful to include workflow diagrams or any other related images,
add them under "/teps/images/". It's up to the TEP author to choose the name
of the file, but general guidance is to include at least TEP number in the
file name, for example, "/teps/images/NNNN-workflow.jpg".
-->


## Design Evaluation
<!--
How does this proposal affect the api conventions, reusability, simplicity, flexibility
and conformance of Tekton, as described in [design principles](https://github.com/tektoncd/community/blob/master/design-principles.md)
-->

### Reusability

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#reusability

- Are there existing features related to the proposed features? Were the existing features reused?
- Is the problem being solved an authoring-time or runtime-concern? Is the proposed feature at the appropriate level
authoring or runtime?
-->

### Simplicity

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity

- How does this proposal affect the user experience?
- What’s the current user experience without the feature and how challenging is it?
- What will be the user experience with the feature? How would it have changed?
- Does this proposal contain the bare minimum change needed to solve for the use cases?
- Are there any implicit behaviors in the proposal? Would users expect these implicit behaviors or would they be
surprising? Are there security implications for these implicit behaviors?
-->

### Flexibility

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#flexibility

- Are there dependencies that need to be pulled in for this proposal to work? What support or maintenance would be
required for these dependencies?
- Are we coupling two or more Tekton projects in this proposal (e.g. coupling Pipelines to Chains)?
- Are we coupling Tekton and other projects (e.g. Knative, Sigstore) in this proposal?
- What is the impact of the coupling to operators e.g. maintenance & end-to-end testing?
- Are there opinionated choices being made in this proposal? If so, are they necessary and can users extend it with
their own choices?
-->

### Conformance

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#conformance

- Does this proposal require the user to understand how the Tekton API is implemented?
- Does this proposal introduce additional Kubernetes concepts into the API? If so, is this necessary?
- If the API is changing as a result of this proposal, what updates are needed to the
[API spec](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md)?
-->

### User Experience

<!--
(optional)

Consideration about the user experience. Depending on the area of change,
users may be Task and Pipeline editors, they may trigger TaskRuns and
PipelineRuns or they may be responsible for monitoring the execution of runs,
via CLI, dashboard or a monitoring system.

Consider including folks that also work on CLI and dashboard.
-->

### Performance

<!--
(optional)

Consider which use cases are impacted by this change and what are their
performance requirements.
- What impact does this change have on the start-up time and execution time
of TaskRuns and PipelineRuns?
- What impact does it have on the resource footprint of Tekton controllers
as well as TaskRuns and PipelineRuns?
-->

### Risks and Mitigations

<!--
What are the risks of this proposal and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Tekton ecosystem. Consider including folks that also work outside the WGs
or subproject.
- How will security be reviewed and by whom?
- How will UX be reviewed and by whom?
-->

### Drawbacks

<!--
Why should this TEP _not_ be implemented?
-->

## Alternatives

<!--
What other approaches did you consider and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->


## Implementation Plan

<!--
What are the implementation phases or milestones? Taking an incremental approach
makes it easier to review and merge the implementation pull request.
-->


### Test Plan

<!--
Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?

No need to outline all the test cases, just the general strategy. Anything
that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
expectations).
-->

### Infrastructure Needed

<!--
(optional)

Use this section if you need things from the project or working group.
Examples include a new subproject, repos requested, GitHub details.
Listing these here allows a working group to get the process for these
resources started right away.
-->

### Upgrade and Migration Strategy

<!--
(optional)

Use this section to detail whether this feature needs an upgrade or
migration strategy. This is especially useful when we modify a
behavior or add a feature that may replace and deprecate a current one.
-->

### Implementation Pull Requests

<!--
Once the TEP is ready to be marked as implemented, list down all the GitHub
merged pull requests.

Note: This section is exclusively for merged pull requests for this TEP.
It will be a quick reference for those looking for implementation of this TEP.
-->

## References

<!--
(optional)

Use this section to add links to GitHub issues, other TEPs, design docs in Tekton
shared drive, examples, etc. This is useful to refer back to any other related links
to get more details.
-->
**teps/README.md** (2 additions)

This is the complete list of Tekton teps:
|[TEP-0112](0112-replace-volumes-with-workspaces.md) | Replace Volumes with Workspaces | proposed | 2022-06-02 |
|[TEP-0114](0114-custom-tasks-beta.md) | Custom Tasks Beta | implementable | 2022-07-12 |
|[TEP-0116](0116-referencing-finally-task-results-in-pipeline-results.md) | Referencing Finally Task Results in Pipeline Results | implementable | 2022-07-15 |
|[TEP-0117](0117-manage-pipelinerun-concurrency.md) | Manage PipelineRun concurrency | proposed | 2022-05-26 |