Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP ~ Automatically manage concurrent PipelineRuns #716

Closed

Conversation

williamlfish
Copy link

TEP outlining a feature that would give developers the option to have the PipelineRuns that potentially overlap an ability to only run the most recent one. Please feel free to let me know if any other information outlining the idea is needed or if I can be helpful in clarifying.

I would also love any input on what to call this? Describing the situation more succinctly would be ideal and am open to any recommendations 😅

@tekton-robot tekton-robot requested review from jerop and kimsterv May 27, 2022 00:32
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 27, 2022
@williamlfish
Copy link
Author

/assign @PuneetPunamiya

@vdemeester
Copy link
Member

I think this has some similarities with https://hackmd.io/GK_1_6DWTvSiVHBL6umqDA (that I have yet to propose as a TEP 😅 )
/assign

@williamlfish
Copy link
Author

@vdemeester I like a lot if what you have in that doc! You mention using annotations/cm, and that feels a lot more flexible. This is also a feature I'm very interested in seeing progress on 😸, so in an effort to continue the conversation, and perhaps meld the TEPs into one, should we continue discussing details in that md file you posted? or is here more appropriate? Thanks!

@vdemeester
Copy link
Member

@vdemeester I like a lot if what you have in that doc! You mention using annotations/cm, and that feels a lot more flexible. This is also a feature I'm very interested in seeing progress on smile_cat, so in an effort to continue the conversation, and perhaps meld the TEPs into one, should we continue discussing details in that md file you posted? or is here more appropriate? Thanks!

Yes we should definitely merge the two into one TEP and collaborate on it. I think we can try to merge the content of this and the hackmd in the hackmd, and once we feel it's kind-of ready, we update this pull-request. How does it sound ?

(feel free to ping me on slack as well 😉 )

@williamlfish
Copy link
Author

@vdemeester sounds good to me! ima mark this as a WIP, and will begin to leave thoughts on the md

@williamlfish williamlfish changed the title TEP ~ Automatically manage concurrent PipelineRuns [WIP ]TEP ~ Automatically manage concurrent PipelineRuns Jun 1, 2022
@tekton-robot tekton-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2022
@jerop
Copy link
Member

jerop commented Jun 6, 2022

/kind tep

@tekton-robot tekton-robot added kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 6, 2022
@jerop jerop mentioned this pull request Jun 10, 2022
@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from puneetpunamiya after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 29, 2022
@williamlfish williamlfish changed the title [WIP ]TEP ~ Automatically manage concurrent PipelineRuns TEP ~ Automatically manage concurrent PipelineRuns Jul 29, 2022
@tekton-robot tekton-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2022
@williamlfish
Copy link
Author

@vdemeester updated! 🎈

Copy link
Member

@lbernick lbernick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @williamlfish this is a great idea!


## Motivation
There are situations where pipelines can overlap, and they should not. For instance, a pr is created and that triggers ci to so a PipelineRun is created, a developer notices a small typo, pushed again, and a new PipelineRun is triggered, the first PipelineRun is no longer relevant, but potentially still running.
Another example is managing available shared resources, lets say tekton triggers an external device and there are a set available amount, or simply resource constrained clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine there are also some CD related use cases here: maybe you want to build an image and tag as "latest", or deploy some infra to a cluster, and you want to make sure the most recently created pipelinerun is also the one that is applied last.

- All PipelineRuns follow fifo patterns.

## Proposal
Adding a key on the PipelineRun manifest seems like the most appropriate, and simple solution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it could make sense to have any of this defined on the Pipeline instead. for example, in the case where you want only one CI run at a time:

kind: Pipeline
spec:
  concurrency:
    zeroOverlap:
      key: $(params.pull-request-id)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah this is interesting, like having the idea of overlap closer to the actual thing you don't want overlapped ( for lack of better phrasing lolol 🙈 )? There is something that feels nice and clean about that, but I wonder if in practice it will be hard for users to remember where one config goes vs another?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! The reason I suggest this is because the Pipeline author knows whether concurrency is appropriate for the pipeline, and this will allow users to not have to set concurrency controls on every pipelinerun for that pipeline. I'm sure we could explore adding this to the PipelineRun later if there's a clear need. What do you mean by it being hard to remember where one config goes vs another?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I meant on the topic of remembering where configs might go, if the zeroOverlap is on the pipeline, but the queue config is on the PipelineRun, then there are 2 areas that are managing a similar topic ( concurrency ). That being said, I dont think it is a hard stop or anything, just a thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could probably add both options to the Pipeline spec for now (like in the alternative syntax example I put below) and then we could always add concurrency to PipelineRun later if needed. Do you think the "queue" option is something you would want to specify on the PipelineRun?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is a debate of "is concurrency an authoring or a runtime concern ?" 🙃.

The reason I suggest this is because the Pipeline author knows whether concurrency is appropriate for the pipeline, and this will allow users to not have to set concurrency controls on every pipelinerun for that pipeline.

I partly disagree on this. It might be true for some usecase, but I am not sure this would be the majority, at all.
If I have a Go Pipeline (build, test, publish an image or something), as an author, I do not need nor want to set concurrency on this, as I don't really have to care about it. But as a consumer of this Pipeline, I may want to have only 1 PipelineRun for this Pipeline to run per Go project.
In addition, putting this on the Pipeline spec might reduce Pipeline "share-ability". Taking that Go Pipeline again, and assuming it's available in a catalog (or better, through the hub) ; if I want to achieve what I described above, I actually need to "duplicate" the Pipeline or augment it using, for example, Kustomize.

My take on this is almost the opposite as @lbernick 👼🏼 🙃. I'd rather start by having this on the PipelineRun spec as it's closer to a "runtime" concern than an authoring one. And then I would explore the in Pipeline. Not having it on PipelineRun also reduce a bit it's "potential", because then any tool (or user) using embedded specs (like pipelines-as-code) will not benefit from it at all.

Copy link
Member

@lbernick lbernick Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more today-- what would we do if two PipelineRuns had the same concurrency key but different concurrency strategies? I think the concurrency strategy has to be defined somewhere other than the PipelineRun-- either on the Pipeline or in a configmap or new CRD. I do see your point though about how different users of a given pipeline might want to configure concurrency differently, which makes me lean more towards cluster-level configuration (like in your original hackmd).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we thinking too much about this though ?
Who are we targeting for this feature ? Do we assume the majority of the usage of this features will be tools (pipeline-as-code, team's tooling, …) or end-users ? Can this be documented as an actual "known limitation" ? Can we make those keys unique maybe (compute and append a hash on the referenced pipeline or embedded pipeline definition, …) ?

Cluster-level configuration is a bit tricky. It works for some cases, mainly, for cluster-admins to limit the number of "resources" consumed by tekton, but teams on their own namespaces will probably need to handle those per namespace or "project" or pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To detail this a bit more:

  • on the "making things unique", the controller could take the user's provided concurrency-key, but when "computing" it, also "hashing" the pipelinespec the pipelinerun refers to (be it through pipelineref or embedded spec). That way, it wouldn't "conflict" if 2 PR refers to different pipeline (or an update on a pipeline) but they same concurrency key. This comes with its own set of challenges/problems but we could keep that approach in mind while experimenting.
  • the "main" assumption I am making here though, is that in most of the cases, the users doesn't create the PipelineRun himself (aka authoring it), tools do it for them (clients like tkn, tools like pipelines-as-code, home-grown tool, ibm stuff, …). And in the small amount of cases where the user author them, it's probably fine to have this known limitation/issue.

In a gist, I really do think, while experimenting, while it's in alpha & all, we can not overthink this, and see with "usage" if it really is a problem or not.


Short example:
```yaml
concurrency:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potential alternative syntax:

concurrency:
   key: $(params.foo)
   strategy: cancel # options here are "queue" and "cancel"
   max: 1  # we can choose a default value for max depending on the concurrency strategy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alternative syntax does not give the ability to use both a queue and zeroOverlap at the same time right? or am i misunderstanding part of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't. Imagine there are 5 PipelineRuns running, with max = 5. My understanding would be that if a new PipelineRun was created and the strategy was "queue", it would wait until one finished before starting. If the strategy was "cancel", it would cancel the earliest created PipelineRun and then start. ZeroOverlap is "cancel" with max = 1.

I'm not understanding why you might want to use both at the same time; they seem incompatible with each other. Can you describe what you had in mind?

Another example is managing available shared resources, lets say tekton triggers an external device and there are a set available amount, or simply resource constrained clusters.


2 proposed cancel strategies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a formatting note, I think it would be helpful to separate out use cases from proposed strategies-- that makes it a bit easier to separate design from motivation and make sure we have a strategy for addressing each use case; up to you though!

@afrittoli
Copy link
Member

/assign @lbernick

@lbernick
Copy link
Member

lbernick commented Aug 9, 2022

@williamlfish and I are now working together in this doc

@vdemeester sorry to create another doc to merge! this doc is going in a bit of a different direction from the hackmd so I think it is useful to have both. would love your input here

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2022
@tekton-robot
Copy link
Contributor

@williamlfish: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lbernick
Copy link
Member

I think this is a pretty meaty feature and would benefit from us agreeing on the goals of this TEP before moving to the proposal. @williamlfish is on board with me opening an initial PR for the problem statement (chatted offline) so I created #791

@pritidesai
Copy link
Member

API WG - @lbernick to check with @williamlfish if we can close this one

@williamlfish
Copy link
Author

@pritidesai @vdemeester @lbernick im okay with this being closed. would love to be tagged in future efforts on this tho 😄

@pritidesai
Copy link
Member

thank you @williamlfish 👍

thank you @lbernick for adding @williamlfish as a TEP author. Please also add @williamlfish as a co-author to the commit and PR description:

Co-authored-by: 

@vdemeester
Copy link
Member

@pritidesai @vdemeester @lbernick im okay with this being closed. would love to be tagged in future efforts on this tho smile

@williamlfish We need to reconcile the content of this with the tep-120. So it's gonna be more that just tagged in future efforts, we'll call you 😬

@lbernick
Copy link
Member

Please also add williamlfish as a co-author to the commit and PR description

Thanks for the reminder @pritidesai! I don't want to amend the commit history now that it's already merged, but I'll update the PR description and will add that next time on the commit message.

@tekton-robot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2022
@williamlfish williamlfish deleted the manage-pipelinerun-conccurency branch November 28, 2022 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: UnAssigned
Development

Successfully merging this pull request may close these issues.

8 participants