# TEP ~ Automatically manage concurrent PipelineRuns #716

Status: Closed
**teps/0117-manage-pipelinerun-concurrency.md** (new file, 368 additions)
---
status: proposed
title: 'Manage PipelineRun concurrency'
creation-date: '2022-05-26'
last-updated: '2022-05-26'
authors:
- '@williamlfish'
- '@vdemeester'
---

# TEP-0117: Manage PipelineRun concurrency

<!--
**Note:** Please remove comment blocks for sections you've filled in.
When your TEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary", and "Motivation" sections.
These should be easy if you've preflighted the idea of the TEP with the
appropriate Working Group.
- [ ] **Create a PR for this TEP.**
Assign it to people in the Working Group that are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the TEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.

Just because a TEP is merged does not mean it is complete or approved. Any TEP
marked as a `proposed` is a working document and subject to change. You can
denote sections that are under active debate as follows:

```
<<[UNRESOLVED optional short context or usernames ]>>
Stuff that is being argued.
<<[/UNRESOLVED]>>
```

When editing TEPS, aim for tightly-scoped, single-topic PRs to keep discussions
focused. If you disagree with what is already in a document, open a new PR
with suggested changes.

If there are new details that belong in the TEP, edit the TEP. Once a
feature has become "implemented", major changes should get new TEPs.

The canonical place for the latest set of instructions (and the likely source
of this file) is [here](/teps/tools/tep-template.md.template).

-->

<!--
This is the title of your TEP. Keep it short, simple, and descriptive. A good
title can help communicate what the TEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a TEP and for
highlighting any additional information provided beyond the standard TEP
template.

Ensure the TOC is wrapped with
<code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [Notes and Caveats](#notes-and-caveats)
- [Design Details](#design-details)
- [Design Evaluation](#design-evaluation)
  - [Reusability](#reusability)
  - [Simplicity](#simplicity)
  - [Flexibility](#flexibility)
  - [Conformance](#conformance)
  - [User Experience](#user-experience)
  - [Performance](#performance)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Implementation Plan](#implementation-plan)
  - [Test Plan](#test-plan)
  - [Infrastructure Needed](#infrastructure-needed)
  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
  - [Implementation Pull Requests](#implementation-pull-requests)
- [References](#references)
<!-- /toc -->

## Summary
Manage PipelineRun concurrency via configurable strategies.

## Motivation
There are situations where PipelineRuns can overlap when they should not. For instance, a PR is created and triggers CI, so a PipelineRun is created; a developer then notices a small typo and pushes again, and a new PipelineRun is triggered. The first PipelineRun is no longer relevant, but it is potentially still running.
Another example is managing a limited pool of shared resources: say Tekton drives external devices of which only a fixed number are available, or the cluster itself is resource constrained.
> **Reviewer comment:** I would imagine there are also some CD-related use cases here: maybe you want to build an image and tag it as "latest", or deploy some infra to a cluster, and you want to make sure the most recently created PipelineRun is also the one that is applied last.
Two concurrency strategies are proposed:
> **Reviewer comment:** Just a formatting note: I think it would be helpful to separate out use cases from proposed strategies; that makes it a bit easier to separate design from motivation and to make sure we have a strategy for addressing each use case. Up to you though!
- ZeroOverlap:
  - PipelineRuns with the same key do not overlap: if one is running and a new one is scheduled, the former is canceled.
  - The cancel status is configurable (finally, stop, or cancel).
- Queues:
  - Queues have a max run count (how many PipelineRuns can run at the same time).
  - Queues are FIFO.
  - The queue max can be overridden; the effective max is always the one set on the most recently created PipelineRun.


Strategies can be mixed together, BUT their concurrency keys cannot be the same (they would cancel each other out).

Mixing strategies allows better resource management for teams. For instance, suppose 3 teams are working on a mobile application with only 5 test devices, so the max queue size is 5. If team 1 works in really fast bursts, it might clog the queue with small PR changes, leaving teams 2 and 3 only able to get a build in when lucky. With zeroOverlap and a queue combined, a single PR with many quick changes will only ever occupy one slot in the queue, as sketched below.
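
A minimal sketch of that scenario, assuming the `zeroOverlap`/`queue` syntax proposed below (the names and keys here are purely illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: mobile-app-team1-pr-ci-   # hypothetical PR CI run for team 1
spec:
  concurrency:
    zeroOverlap:
      key: team1-pr-42            # one slot per PR: a re-push cancels the previous run
      cancelStatus: "Stop"
    queue:
      key: mobile-test-devices    # shared queue across all three teams
      maxRun: 5                   # only 5 test devices are available
  pipelineRef:
    name: mobile-app-ci           # hypothetical Pipeline name
```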

## Requirements
- PipelineRuns are canceled based on the strategy defined.
- A simple interface, set in the PipelineRun manifest.
- A new PipelineRun should wait for the canceled one to finish when applicable.
- All PipelineRuns follow FIFO ordering.

## Proposal
Adding a key to the PipelineRun manifest seems like the most appropriate and simplest solution.
> **Reviewer comment:** I'm wondering if it could make sense to have any of this defined on the Pipeline instead; for example, in the case where you want only one CI run at a time:
>
> ```yaml
> kind: Pipeline
> spec:
>   concurrency:
>     zeroOverlap:
>       key: $(params.pull-request-id)
> ```
> **Author reply:** Ah, this is interesting: having the idea of overlap closer to the actual thing you don't want overlapped (for lack of better phrasing 🙈)? There is something that feels nice and clean about that, but I wonder if in practice it will be hard for users to remember where one config goes vs another.

> **Reviewer reply:** Yeah! The reason I suggest this is that the Pipeline author knows whether concurrency is appropriate for the pipeline, and this would allow users to not have to set concurrency controls on every PipelineRun for that pipeline. I'm sure we could explore adding this to the PipelineRun later if there's a clear need. What do you mean by it being hard to remember where one config goes vs another?

> **Author reply:** All I meant on the topic of remembering where configs go: if zeroOverlap is on the Pipeline but the queue config is on the PipelineRun, then there are two places managing a similar concern (concurrency). That said, I don't think it is a hard stop or anything, just a thought.

> **Reviewer reply:** I think we could probably add both options to the Pipeline spec for now (like in the alternative syntax example I put below), and then we could always add concurrency to the PipelineRun later if needed. Do you think the "queue" option is something you would want to specify on the PipelineRun?

> **Reviewer reply:** So there is a debate of "is concurrency an authoring or a runtime concern?" 🙃
>
> > The reason I suggest this is because the Pipeline author knows whether concurrency is appropriate for the pipeline, and this will allow users to not have to set concurrency controls on every pipelinerun for that pipeline.
>
> I partly disagree with this. It might be true for some use cases, but I am not sure it would be the majority at all. If I have a Go Pipeline (build, test, publish an image, or something), as an author I do not need nor want to set concurrency on it, as I don't really have to care about it. But as a consumer of this Pipeline, I may want only one PipelineRun of this Pipeline to run per Go project.
>
> In addition, putting this on the Pipeline spec might reduce Pipeline "share-ability". Taking that Go Pipeline again, and assuming it's available in a catalog (or better, through the hub): if I want to achieve what I described above, I actually need to duplicate the Pipeline or augment it using, for example, Kustomize.
>
> My take on this is almost the opposite of @lbernick's 👼🏼 🙃. I'd rather start by having this on the PipelineRun spec, as it's closer to a "runtime" concern than an authoring one, and then explore adding it to the Pipeline. Not having it on the PipelineRun would also reduce its potential a bit, because then any tool (or user) using embedded specs (like pipelines-as-code) would not benefit from it at all.

> **@lbernick (Aug 24, 2022):** Thinking about this a bit more today: what would we do if two PipelineRuns had the same concurrency key but different concurrency strategies? I think the concurrency strategy has to be defined somewhere other than the PipelineRun, either on the Pipeline or in a ConfigMap or a new CRD. I do see your point, though, about how different users of a given pipeline might want to configure concurrency differently, which makes me lean more towards cluster-level configuration (like in your original hackmd).

> **Reviewer reply:** Aren't we thinking too much about this, though? Who are we targeting with this feature? Do we assume the majority of its usage will come from tools (pipelines-as-code, a team's tooling, …) or from end users? Can this be documented as an actual known limitation? Can we make those keys unique, maybe (compute and append a hash of the referenced pipeline or embedded pipeline definition, …)?
>
> Cluster-level configuration is a bit tricky. It works for some cases, mainly for cluster admins limiting the number of resources consumed by Tekton, but teams in their own namespaces will probably need to handle this per namespace, "project", or pipeline.

> **Reviewer reply:** To detail this a bit more:
>
> - On making things unique: the controller could take the user-provided concurrency key, but when computing it, also hash the pipeline spec the PipelineRun refers to (be it through `pipelineRef` or an embedded spec). That way, two PipelineRuns that refer to different pipelines (or to an updated pipeline) but use the same concurrency key would not conflict. This comes with its own set of challenges/problems, but we could keep the approach in mind while experimenting.
> - The main assumption I am making here, though, is that in most cases users don't author the PipelineRun themselves; tools do it for them (clients like tkn, tools like pipelines-as-code, home-grown tooling, IBM stuff, …). And in the small number of cases where users do author them, it's probably fine to have this as a known limitation/issue.
>
> In a gist, I really do think that while experimenting, while this is in alpha and all, we should not overthink it, and see from usage whether it really is a problem or not.
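
To make the reviewer's key-uniqueness idea concrete, here is a hypothetical illustration (not part of the proposed API; the composite-key format is invented for this sketch):

```yaml
# User-provided key in the PipelineRun spec:
concurrency:
  zeroOverlap:
    key: main-branch-ci
# Effective key the controller could track internally (hypothetical):
#   main-branch-ci/<sha256 of the resolved pipeline spec>
# Two PipelineRuns with the same user key but different pipeline specs would
# then map to different effective keys and would not conflict.
```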


Short example:
```yaml
concurrency:
  zeroOverlap:
    key: "the-key-to-match"
    cancelStatus: "Stop"
  queue:
    key: "a-diff-key-to-match"
    maxRun: 5
```

> **Reviewer comment:** Potential alternative syntax:
>
> ```yaml
> concurrency:
>   key: $(params.foo)
>   strategy: cancel # options here are "queue" and "cancel"
>   max: 1  # we can choose a default value for max depending on the concurrency strategy
> ```

> **Author reply:** This alternative syntax does not give the ability to use both a queue and zeroOverlap at the same time, right? Or am I misunderstanding part of it?

> **Reviewer reply:** It doesn't. Imagine there are 5 PipelineRuns running, with max = 5. My understanding would be that if a new PipelineRun was created and the strategy was "queue", it would wait until one finished before starting. If the strategy was "cancel", it would cancel the earliest created PipelineRun and then start. ZeroOverlap is "cancel" with max = 1.
>
> I'm not understanding why you might want to use both at the same time; they seem incompatible with each other. Can you describe what you had in mind?
Full example from a trigger template:
```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: tekton-pipelines-pr-ci-
spec:
  concurrency:
    zeroOverlap:
      key: $(tt.params.repo)-pr-$(params.pr-number)
      cancelStatus: "Stop"
    queue:
      key: $(tt.params.repo)
      maxRun: 5
  pipelineRef:
    name: tekton-pipelines-pr-ci
  params:
    - name: repo
      value: $(tt.params.repo)
    - name: pr-number
      value: $(tt.params.pr-number)
  serviceAccountName: pipeline-sa
  workspaces:
    - name: ssh-creds
      secret:
        secretName: ssh-things
```
In the above example, at most 5 PipelineRuns can run at the same time for a given repo, and at most one PipelineRun at a time can run for a specific PR.

Let's imagine an org enables the ability to raise the max run threshold. Updating the PipelineRun manifest to a higher max should be all that is needed: the newly created max becomes the current max for its key, even though the new PipelineRun's position in the queue does not mean it will be immediately scheduled. So if the max was 2 with 5 PipelineRuns created, 2 are running and 3 are pending; when a 6th is created with a max of 4, the 2 most recently created PipelineRuns that were pending begin running, 2 are now pending, and the newest created one (with the new max) will still be last in the queue. If the max is dropped again and more PipelineRuns are running than should be, the queue is allowed to resolve itself by letting the running PipelineRuns finish, while never allowing a PipelineRun to enter a running state until there is space in the queue. A sketch of such an override follows.
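
This is a minimal sketch, assuming the syntax above (values are illustrative): the 6th PipelineRun simply declares a higher `maxRun` for the same queue key.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: tekton-pipelines-pr-ci-
spec:
  concurrency:
    queue:
      key: $(tt.params.repo)   # same key as the PipelineRuns already queued
      maxRun: 4                # raises the effective max for this key from 2 to 4
  pipelineRef:
    name: tekton-pipelines-pr-ci
```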

### Notes and Caveats

<!--
(optional)

Go in to as much detail as necessary here.
- What are the caveats to the proposal?
- What are some important details that didn't come across above?
- What are the core concepts and how do they relate?
-->


## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.

If it's helpful to include workflow diagrams or any other related images,
add them under "/teps/images/". It's up to the TEP author to choose the name
of the file, but general guidance is to include at least TEP number in the
file name, for example, "/teps/images/NNNN-workflow.jpg".
-->


## Design Evaluation
<!--
How does this proposal affect the api conventions, reusability, simplicity, flexibility
and conformance of Tekton, as described in [design principles](https://github.com/tektoncd/community/blob/master/design-principles.md)
-->

### Reusability

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#reusability

- Are there existing features related to the proposed features? Were the existing features reused?
- Is the problem being solved an authoring-time or runtime-concern? Is the proposed feature at the appropriate level
authoring or runtime?
-->

### Simplicity

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#simplicity

- How does this proposal affect the user experience?
- What’s the current user experience without the feature and how challenging is it?
- What will be the user experience with the feature? How would it have changed?
- Does this proposal contain the bare minimum change needed to solve for the use cases?
- Are there any implicit behaviors in the proposal? Would users expect these implicit behaviors or would they be
surprising? Are there security implications for these implicit behaviors?
-->

### Flexibility

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#flexibility

- Are there dependencies that need to be pulled in for this proposal to work? What support or maintenance would be
required for these dependencies?
- Are we coupling two or more Tekton projects in this proposal (e.g. coupling Pipelines to Chains)?
- Are we coupling Tekton and other projects (e.g. Knative, Sigstore) in this proposal?
- What is the impact of the coupling to operators e.g. maintenance & end-to-end testing?
- Are there opinionated choices being made in this proposal? If so, are they necessary and can users extend it with
their own choices?
-->

### Conformance

<!--
https://github.com/tektoncd/community/blob/main/design-principles.md#conformance

- Does this proposal require the user to understand how the Tekton API is implemented?
- Does this proposal introduce additional Kubernetes concepts into the API? If so, is this necessary?
- If the API is changing as a result of this proposal, what updates are needed to the
[API spec](https://github.com/tektoncd/pipeline/blob/main/docs/api-spec.md)?
-->

### User Experience

<!--
(optional)

Consideration about the user experience. Depending on the area of change,
users may be Task and Pipeline editors, they may trigger TaskRuns and
PipelineRuns or they may be responsible for monitoring the execution of runs,
via CLI, dashboard or a monitoring system.

Consider including folks that also work on CLI and dashboard.
-->

### Performance

<!--
(optional)

Consider which use cases are impacted by this change and what are their
performance requirements.
- What impact does this change have on the start-up time and execution time
of TaskRuns and PipelineRuns?
- What impact does it have on the resource footprint of Tekton controllers
as well as TaskRuns and PipelineRuns?
-->

### Risks and Mitigations

<!--
What are the risks of this proposal and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Tekton ecosystem. Consider including folks that also work outside the WGs
or subproject.
- How will security be reviewed and by whom?
- How will UX be reviewed and by whom?
-->

### Drawbacks

<!--
Why should this TEP _not_ be implemented?
-->

## Alternatives

<!--
What other approaches did you consider and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->


## Implementation Plan

<!--
What are the implementation phases or milestones? Taking an incremental approach
makes it easier to review and merge the implementation pull request.
-->


### Test Plan

<!--
Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?

No need to outline all the test cases, just the general strategy. Anything
that would count as tricky in the implementation and anything particularly
challenging to test should be called out.

All code is expected to have adequate tests (eventually with coverage
expectations).
-->

### Infrastructure Needed

<!--
(optional)

Use this section if you need things from the project or working group.
Examples include a new subproject, repos requested, GitHub details.
Listing these here allows a working group to get the process for these
resources started right away.
-->

### Upgrade and Migration Strategy

<!--
(optional)

Use this section to detail whether this feature needs an upgrade or
migration strategy. This is especially useful when we modify a
behavior or add a feature that may replace and deprecate a current one.
-->

### Implementation Pull Requests

<!--
Once the TEP is ready to be marked as implemented, list down all the GitHub
merged pull requests.

Note: This section is exclusively for merged pull requests for this TEP.
It will be a quick reference for those looking for implementation of this TEP.
-->

## References

<!--
(optional)

Use this section to add links to GitHub issues, other TEPs, design docs in Tekton
shared drive, examples, etc. This is useful to refer back to any other related links
to get more details.
-->
**teps/README.md** (2 additions)

This is the complete list of Tekton teps:
|[TEP-0112](0112-replace-volumes-with-workspaces.md) | Replace Volumes with Workspaces | proposed | 2022-06-02 |
|[TEP-0114](0114-custom-tasks-beta.md) | Custom Tasks Beta | implementable | 2022-07-12 |
|[TEP-0116](0116-referencing-finally-task-results-in-pipeline-results.md) | Referencing Finally Task Results in Pipeline Results | implementable | 2022-07-15 |
|[TEP-0117](0117-manage-pipelinerun-concurrency.md) | Manage PipelineRun concurrency | proposed | 2022-05-26 |