From 87541bd03bd49a92d1ad4065d73153a4c4b681dd Mon Sep 17 00:00:00 2001
From: NikeNano
Date: Thu, 8 Oct 2020 08:43:20 +0200
Subject: [PATCH] TEP 13-Adding a limit to pipeline concurrency

This TEP describes how to add pipeline concurrency limits.

Author: Niklas Hansson
Co-authored-by: Jerop Kipruto
---
 teps/0013-limit-pipeline-concurrency.md | 282 ++++++++++++++++++++++++
 1 file changed, 282 insertions(+)
 create mode 100644 teps/0013-limit-pipeline-concurrency.md

diff --git a/teps/0013-limit-pipeline-concurrency.md b/teps/0013-limit-pipeline-concurrency.md
new file mode 100644
index 000000000..f1af2193e
--- /dev/null
+++ b/teps/0013-limit-pipeline-concurrency.md
@@ -0,0 +1,282 @@
---
title: pipeline-concurrency
authors:
  - "@NikeNano"
creation-date: 2020-10-07
last-updated: 2020-11-15
status: proposed
---

# TEP-0013: Limit Pipeline Concurrency

- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Requirements](#requirements)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Performance](#performance)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Upgrade & Migration Strategy](#upgrade--migration-strategy)
- [References](#references)

## Summary

Enable users to define the concurrency of a `Pipeline` to limit how many `Tasks` run simultaneously.

## Motivation

Enable users to limit the number of `Tasks` that can run simultaneously in a `Pipeline`, which could help with:

- Tracking and limiting how many resources a `Pipeline` is consuming, and thus how much it costs.

### Goals

- Limit how many `Tasks` can run concurrently in a `Pipeline`.

### Non-Goals

- Limit the number of concurrent `Pipelines`, as described in [pipeline issue #1305](https://github.com/tektoncd/pipeline/issues/1305).

## Requirements

- Users can specify the maximum number of `Tasks` that can run concurrently in a `Pipeline`.

## Proposal

We propose to extend the Tekton Pipelines ecosystem with a separate service, called `Limit Service`, which will control when `TaskRuns` are allowed to be executed by the controller, while also allowing users to extend the `Limit Service` according to their needs. This is discussed further in [Design Details](#design-details) below.

### User Stories

#### Story 1

A user has a `Pipeline` with 100 independent `Tasks` but does not want all 100 `Tasks` to run at once.

#### Story 2

A user wants to limit the amount of resources used by a `Pipeline` at a given time.

### Risks and Mitigations

What if a user mistakenly sets the maximum number of concurrent `Tasks` to zero or less? Would this mean no `Tasks` run until the `Pipeline` times out? To mitigate this, we will require that the maximum number of concurrent `Tasks` be greater than zero and add validation that throws an error if it is set to zero or less.

### Performance

Given that this allows users to limit the number of concurrent `TaskRuns` in a given `PipelineRun`, the execution time of the `PipelineRun` could increase. However, it also allows users to limit the resources used and save costs.

## Design Details

We propose to extend the logic of the `PipelineRun` controller to create all `TaskRuns` with `spec.status.Pending`, so that an external service, called `Limit Service`, can control when a `TaskRun` is allowed to be considered by the `Task` controller for execution. This requires extending the `Task` controller to only consider `TaskRuns` that don't have `spec.status.Pending`. The `Limit Service` will update `TaskRuns` and remove `spec.status.Pending` when they are considered ready for execution.

The following example aims to describe the proposed solution:

1. A `PipelineRun` is created.
2. The Pipelines controller sees the `PipelineRun` and starts creating `TaskRuns`; each `TaskRun` is created with `spec.status.Pending`, as proposed in [TEP-0015](https://github.com/tektoncd/community/pull/203).
3. The Pipelines controller sees the new `TaskRuns`, but they all have `spec.status.Pending`, so it doesn't do anything with them.
4. The `Limit Service` also sees the `TaskRuns` with `spec.status.Pending`.
5. When the `Limit Service` decides a `TaskRun` can run, it removes `spec.status.Pending` from the `TaskRun`(s).
6. The Pipelines controller now sees that the `TaskRuns` are no longer pending, and it starts executing them.

Separating the decision of whether a `TaskRun` is allowed to run from the `Task` controller allows for extensibility, since custom logic can be added to the `Limit Service`.

As suggested [here](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-647754800), we can add a field - `MaxParallelTasks` - to `PipelineRunSpec`, an integer that represents the maximum number of `Tasks` that can run concurrently in the `Pipeline`:

    type PipelineRunSpec struct {
        PipelineSpec *PipelineSpec `json:"pipelineSpec,omitempty"`
        ...
        // MaxParallelTasks holds the maximum count of parallel TaskRuns
        // +optional
        MaxParallelTasks int `json:"maxParallelTasks,omitempty"`
    }
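For illustration, a `PipelineRun` that caps concurrency at three `Tasks` could then be constructed as in the sketch below. The types and import paths mirror the existing Tekton v1beta1 API, but `MaxParallelTasks` is only a proposed field and the resource names are made up:

```go
package example

import (
	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newLimitedPipelineRun builds a PipelineRun that references an existing
// Pipeline and caps the number of concurrently running TaskRuns at three.
func newLimitedPipelineRun() *v1beta1.PipelineRun {
	return &v1beta1.PipelineRun{
		ObjectMeta: metav1.ObjectMeta{Name: "build-and-test-run"},
		Spec: v1beta1.PipelineRunSpec{
			PipelineRef:      &v1beta1.PipelineRef{Name: "build-and-test"},
			MaxParallelTasks: 3, // proposed field, not part of the current API
		},
	}
}
```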
The `Limit Service` could run as a control loop, checking `TaskRuns` against the `MaxParallelTasks` restriction of the related `Pipeline`. If the count of running `TaskRuns` is less than `MaxParallelTasks`, a `TaskRun` would be updated and `spec.status.Pending` removed. If the count of running `TaskRuns` equals `MaxParallelTasks`, no `TaskRun` would be updated until another `TaskRun` completes.

`MaxParallelTasks` has to be greater than zero. If `MaxParallelTasks` is not specified, there should be no limit to how many `TaskRuns` can run in parallel, and thus `spec.status.Pending` should be removed from all `TaskRuns`.
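To make the loop concrete, here is a minimal sketch of that reconciliation logic. It operates on a deliberately simplified, hypothetical view of `TaskRuns` rather than the real API types, and the `Pending` flag stands in for the proposed `spec.status.Pending` marker:

```go
package limitservice

// taskRunView is a simplified, hypothetical view of a TaskRun used only for
// this sketch; a real Limit Service would work with the Tekton API types.
type taskRunView struct {
	Name    string
	Pending bool // stands in for the proposed spec.status.Pending marker
	Running bool
}

// reconcile releases pending TaskRuns of a single PipelineRun until the number
// of running TaskRuns reaches maxParallelTasks, and returns the names of the
// TaskRuns it released. A maxParallelTasks of zero is treated as "no limit",
// matching the behaviour described above for an unspecified field.
func reconcile(taskRuns []taskRunView, maxParallelTasks int) []string {
	running := 0
	for _, tr := range taskRuns {
		if tr.Running {
			running++
		}
	}

	var released []string
	for _, tr := range taskRuns {
		if maxParallelTasks > 0 && running >= maxParallelTasks {
			break
		}
		if tr.Pending {
			// A real implementation would patch the TaskRun here to remove
			// spec.status.Pending so that the Tekton controller starts it.
			released = append(released, tr.Name)
			running++
		}
	}
	return released
}
```

Such a loop would also have to release `TaskRuns` in an order that respects the `Pipeline` graph, which is the requirement discussed next.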
In order to avoid a deadlock, the ordering of the `Tasks` in a `Pipeline` has to be respected and accounted for by the `Limit Service`.

## Test Plan

e2e and unit tests

## Drawbacks

It could affect the performance of the scheduling by increasing the execution time of `PipelineRuns`.

## Alternatives

1. Limit the number of concurrent `Tasks` by setting the resource limits of each `Task` high enough that there are not enough resources to run more than a certain number of `Tasks` concurrently. However, this is not easily configurable, and it is complicated because users have to work out the relationship between resources and `Tasks`.
2. Utilize a [pod quota per namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-pod-namespace/). However, this would limit all resources in the namespace, not only those of the `PipelineRun` the limitation is meant for.
3. Add logic to the `PipelineRun` controller to check how many `TaskRuns` are running in the `PipelineRun`. This would make the controller logic more complex, but has the advantage that the controller would have all the logic in one place. However, it would give users less flexibility to implement custom logic.

## Upgrade & Migration Strategy

`MaxParallelTasks` in `PipelineRunSpec` will be optional; if it is not set, `spec.status.Pending` will be removed from all `TaskRuns` immediately by the `Limit Service`. An alternative is to not set `spec.status.Pending` at all when `MaxParallelTasks` is not specified.

## References

- Issue: https://github.com/tektoncd/pipeline/issues/2591
- POC: https://github.com/tektoncd/pipeline/pull/3112