
TEP-0130: Pipeline-level service #943

Merged: 1 commit, Mar 7, 2023

Conversation


@lbernick lbernick commented Jan 26, 2023

This commit opens a new TEP proposing better support for services
with the same lifespan as a PipelineRun.

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jan 26, 2023
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 26, 2023
@lbernick lbernick force-pushed the sidecar branch 2 times, most recently from ebf51b9 to 2f9807e Compare January 30, 2023 15:28
@afrittoli

/assign @afrittoli
/assign @dibyom

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2023
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2023
@vdemeester vdemeester left a comment

This TEP seems really weird to me (including the title). It is focused on one single tool/use case among hundreds. "Pipeline-level docker daemon" is just one possible use case of pipeline-level sidecars; I don't really understand why it's the problem statement instead of one of the use cases of something called "pipeline-level sidecars", to be honest.

As of today, if the SA that runs the PipelineRun has enough privileges (i.e. some access to the cluster it runs in), it's implementable as part of the Pipeline itself: one task that deploys something (a docker daemon, …) and waits for it to be ready, and a finally task that terminates that deployment.

The only advantage of adding this to the API (pipeline-level sidecars) is to not require this level of privilege for tasks running as part of the Pipeline (as it would be the controller creating resources).
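The workaround described above (a Task that deploys the daemon and waits for it, plus a finally Task that tears it down) might look roughly like the sketch below. All task names, parameter names, and the result are hypothetical, not existing catalog entries:

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: docker-build-pipeline
spec:
  tasks:
    - name: start-docker-daemon
      # Hypothetical Task: creates a Deployment + Service running dind,
      # waits for readiness, and emits the daemon's address as a result.
      # Its ServiceAccount needs permission to create these resources.
      taskRef:
        name: deploy-docker-daemon
    - name: build
      runAfter: ["start-docker-daemon"]
      params:
        - name: docker-host
          value: "$(tasks.start-docker-daemon.results.docker-host)"
      taskRef:
        name: docker-build
  finally:
    - name: stop-docker-daemon
      # Hypothetical Task: deletes the Deployment + Service created above,
      # running even if the build tasks fail.
      taskRef:
        name: teardown-docker-daemon
```
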

### Reusability

A few sidecar configurations, such as docker daemons and database containers, likely make up the majority of use cases.
A major open question of this proposal is whether we want to introduce a concept of a reusable sidecar.
Member

I am very worried about this concept. If it's "re-usable", it's outside of the lifecycle of a PipelineRun, which means, for me, that it should be outside of the responsibilities of the tektoncd/pipeline controller as well. If one needs a docker daemon to be available across PipelineRuns for a given namespace, that's an infrastructure/namespace management problem, not a tektoncd/pipeline problem to fix.

Member

I think "reusable" in this case meant reuse across tasks in a pipeline, and I'd say it should be in scope for CI systems given the number of use cases it supports.

Member Author

@lbernick lbernick Feb 6, 2023

I've updated the proposal to clarify that reusing sidecars across pipelines is out of scope.

Member

I think "reusable" in this case meant reuse across tasks in a pipeline, and I'd say it should be in scope for CI systems given the number of use cases it supports.

Ah, that makes more sense then 😛


A few sidecar configurations, such as docker daemons and database containers, likely make up the majority of use cases.
A major open question of this proposal is whether we want to introduce a concept of a reusable sidecar.
For example, we could include a docker daemon sidecar in the catalog,
Member

This would be adding another "type" to the catalog, which makes me wonder if these sidecars could be special-purpose tasks.

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2023

lbernick commented Feb 6, 2023

This TEP seems really weird to me (including the title). It is focused on one single tool/use case among hundreds. "Pipeline-level docker daemon" is just one possible use case of pipeline-level sidecars; I don't really understand why it's the problem statement instead of one of the use cases of something called "pipeline-level sidecars", to be honest.

Thanks for the feedback! The reason I chose to do it this way is that "more complex docker build pipelines" is the problem I'm hoping to address, while "pipeline-level sidecars" is a solution to that problem. Docker build pipelines are the most relevant use case I know of (task-level sidecars are a better solution for integration tests, and the boskos use case seems a lot less important than docker builds IMO). If you know of other use cases or feel strongly about reframing it around pipeline-level sidecars, I could make the TEP more general.

As of today, if the SA that runs the PipelineRun has enough privileges (i.e. some access to the cluster it runs in), it's implementable as part of the Pipeline itself: one task that deploys something (a docker daemon, …) and waits for it to be ready, and a finally task that terminates that deployment.

The only advantage of adding this to the API (pipeline-level sidecars) is to not require this level of privilege for tasks running as part of the Pipeline (as it would be the controller creating resources).

I think writing a Task to create and delete a deployment and service is non-trivial. I updated the "user experience" section with some details on what this would look like, but the user experience is complex enough that I think the feature is worth adding.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2023
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2023
@vdemeester

I think writing a Task to create and delete a deployment and service is non-trivial. I updated the "user experience" section with some details on what this would look like, but the user experience is complex enough that I think the feature is worth adding.

I agree it's non-trivial, but on the other hand, it's easily shareable 👼🏼 and thus doesn't need to be written by all users of tektoncd/pipeline.


lbernick commented Feb 8, 2023

I think writing a Task to create and delete a deployment and service is non-trivial. I updated the "user experience" section with some details on what this would look like, but the user experience is complex enough that I think the feature is worth adding.

I agree it's non-trivial, but on the other hand, it's easily shareable 👼🏼 and thus doesn't need to be written by all users of tektoncd/pipeline.

I've added an alternative of "catalog tasks for docker daemons" (which could also be generic catalog tasks for spinning up a deployment + service and tearing them down). As you pointed out, you'd still need to use a service account with the right permissions.

Is this your preferred alternative? Or do you think there's value in the pipeline-level sidecar feature?


dibyom commented Feb 13, 2023

Is this your preferred alternative? Or do you think there's value in the pipeline-level sidecar feature?

I think the task-based approach is worth considering because it's more easily reusable.

@pritidesai

API WG - in review cycle
/assign @vdemeester

@lbernick lbernick changed the title TEP-0130: Pipeline-level docker daemon TEP-0130: Pipeline-level build cache Feb 21, 2023
@vdemeester vdemeester left a comment

In my opinion this is still too specific 🙃 or too generic, or both at the same time. What is implied by "build cache"? This is still very docker-specific, even. Handling a "(Dockerfile) build cache" when using docker is way, way different than doing it with kaniko or buildah (or whatever other tool). "Pipeline-level build cache" is, for me, a use case, not a TEP proposal 😅.

I deeply think we should focus more on writing smart, re-usable Tasks that fulfill those use cases today (as we already have v1) before trying to solve them with tektoncd/pipeline features. In my opinion, we need to exercise and push our current API (v1) to its limits before trying to add too much to it.

- name: docker-tls-certs
```

Here's what the [docker-build catalog task](https://github.com/tektoncd/catalog/blob/81bf7dc5610d5fa17281940a72a6377604105cea/task/docker-build/0.1/docker-build.yaml)
Member

Note: this is what a docker-build Task could look like today as well. What is missing to make this "smarter" today is a way to conditionally start a sidecar (for example, in case params.docker_host is empty or has its default value), and the same for other possible requirements to run the sidecar (like securityContext.privileged, …)

Comment on lines 483 to 498
### Improve our documentation

Instead of building new features, we could improve our documentation to help users better understand how to build a docker daemon
into their PipelineRun. See [user experience](#user-experience) for an example of what this would look like.

### Catalog Tasks for docker daemon

We could add Tasks to the verified catalog (or to a community catalog) for spinning up and tearing down a docker daemon (or more generically, a deployment with a service).
This would prevent users from having to write their own daemon creation/teardown tasks, but they would still have to provide their kube credentials
and use a service account with rolebindings that allow it to create services and deployments.
Member

Yes and yes. For me those are the valid, most important, and "doable today" actions to take. From these, their usage, and the feedback from them, we may want to enhance v1 or define v2 better.


dibyom commented Feb 21, 2023

I still think a Pipeline level sidecar is the right way to go here, but catalog tasks could also work.

Yeah, I'm still leaning towards a task - a challenge I see is that a user might need to specify a "cleanup-docker-daemon" task in addition to a "setup-docker-daemon" task - maybe we could figure out a way to add some magic to simplify that. Or we could just write a custom task for this (though we don't really have a catalog of custom tasks today)
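A generic setup Task of the kind discussed here might boil down to a kubectl-based step such as the following sketch. The Task name, parameter, and manifest details are illustrative only, not an existing catalog entry:

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: setup-docker-daemon   # hypothetical catalog task name
spec:
  params:
    - name: deployment-name
      default: dind
  steps:
    - name: create-daemon
      # Requires a ServiceAccount bound to a Role that can create
      # Deployments and Services in the namespace. Note that docker:dind
      # also needs a privileged securityContext, omitted here.
      image: bitnami/kubectl
      script: |
        kubectl create deployment $(params.deployment-name) \
          --image=docker:dind
        kubectl expose deployment $(params.deployment-name) \
          --port=2376 --name=$(params.deployment-name)
        kubectl rollout status deployment $(params.deployment-name)
```

A matching "cleanup-docker-daemon" Task would run the corresponding `kubectl delete` commands from a finally clause.
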


The docker-build catalog Task only allows building a single image and pushing it to a registry.
The following use cases are not addressed:
- Building multiple images concurrently with the same daemon to save resources and share cached base layers
Member

What use case is driving this proposal? What is the proposed storage for such images?

Member Author

The use cases are listed under the motivation section:

  • User wants to test an image after building, and only push images that pass tests to the artifact repo
  • User wants to save resources by using the same docker daemon/cached layers for multiple images built in the same pipelinerun (e.g. via matrix)

Image layers would be cached by the docker daemon (as it does currently). The user is still responsible for pushing the built image to a repository.

Member

Thanks @lbernick, yup, I understand the two bullet points listed in the TEP as well. My question was more in line with @vdemeester's feedback/comment: should we propose a feature in the pipeline controller for one single technology, the docker daemon?

Member Author

I'm hoping that the reframing of this TEP as "pipeline level service" makes it generic enough. Yes, docker daemon is just one tool, but docker builds are a very common CI/CD use case and I think it's important to consider features that will improve the UX for docker builds. I also listed a few more potential uses for a Pipeline level service, although I think they're less compelling.

@lbernick lbernick force-pushed the sidecar branch 2 times, most recently from 8fcd387 to 9cc067b Compare February 21, 2023 19:19
@lbernick lbernick changed the title TEP-0130: Pipeline-level build cache TEP-0130: Pipeline-level docker daemon Feb 21, 2023
@lbernick

Discussion from Pipelines WG today: experiment and get feedback on the "catalog task" approach before proposing building a new feature.

@dibyom @vdemeester I've rewritten this proposal to list out the alternatives rather than proposing one. I'd still like to merge this TEP rather than closing it because it has a bunch of ideas and syntax examples written out for how we can address this use case that I think are helpful for reference. We can always update the TEP with the results of our experimentation, and mark it as obsolete/rejected if we find that the catalog task approach works.

Chatting offline with @vdemeester about a better name for this TEP.

@lbernick lbernick force-pushed the sidecar branch 2 times, most recently from d8d0e0a to 3299282 Compare February 22, 2023 15:28
@lbernick lbernick changed the title TEP-0130: Pipeline-level docker daemon TEP-0130: Pipeline-level service Feb 22, 2023
@lbernick

@vdemeester and I agreed on "pipeline level service" as a title for this TEP-- it is not docker specific and still makes sense if we choose to use catalog tasks as our solution.

@vdemeester vdemeester left a comment

Note: I am thinking out loud while doing this review 👼🏼.

For this feature, if we look at GitHub Actions: it is not something doable in GitHub Actions — if you compare what's comparable, i.e. a Job in GitHub workflows is a Task; Pipelines are multiple workflows hooked together, and by "design" they run on different VMs, thus different docker daemons, … so you cannot have a Service that runs for the full "pipelinerun" lifecycle. Users of GitHub Actions design around that, and maybe so should users of Tekton 👼🏼.

Pushing an image to a registry and pulling it elsewhere is relatively cheap, especially if using an internal or "close-by" registry (like ghcr for GitHub workflows).

Comment on lines 61 to 63
Right now, the best way for users to create a Pipeline-level service is to have one Task that spins up the service and a Finally Task that
tears it down, but this is non-trivial to write and leads to poor separation of concerns (app developers writing build Pipelines must also handle
infrastructure details of spinning up the service).
Member

Although I can agree with this (i.e. "app developers writing build Pipelines, …"), I also disagree a bit. Either they are in a devops world and they should care, or they are not. If they do not care, they do not necessarily care whether it's shared between several PipelineRuns.

There could also be several ways to actually "create" a PipelineRun that uses these techniques (a task that spins something up and a finally, …) without requiring the developer to care. Using a resolver, or kustomize (or helm, or …) are all valid architectural decisions.

Member Author

I rephrased this a bit, but the main point I'm trying to get across is that you have to build the infrastructure directly into your pipeline, when different people might be responsible for these components. You're right that it depends on the architecture whether the app developer interacts with the k8s cluster or not; what I'm trying to say is that we should make it easy to separate this out for companies where these would be two different roles. Is there a better way of phrasing this that you have in mind?

Comment on lines 107 to 108
- Service is only accessible to the single PipelineRun that uses it
(doing otherwise would violate [SLSA L3 build isolation requirements](https://slsa.dev/spec/v0.1/requirements#isolated))
Member

I am not sure using the same service in several PipelineRuns breaks that SLSA rule. The rule states that "it MUST NOT be possible for one build to persist or influence the build environment of a subsequent build". This can still be achieved while sharing a service; it really depends on the service's "design".
The example with docker build is the --no-cache and --pull flags, which essentially make the build act as if against a clean docker daemon (so no "influence" from other things targeting that same daemon).
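For reference, those isolation flags would appear in a build step roughly as follows; the step layout, daemon address, and param name are illustrative only:

```yaml
steps:
  - name: build
    image: docker
    env:
      - name: DOCKER_HOST
        value: "tcp://shared-dind:2376"   # hypothetical shared daemon
    script: |
      # --pull re-fetches base images and --no-cache ignores cached
      # layers, so a previous build on the shared daemon cannot
      # influence this build's output
      docker build --pull --no-cache -t "$(params.image)" .
```
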

Member Author

Noted-- I made it a goal that we should support SLSA L3 and added a "security" section noting some of the points on docker builds.

Member

Also, one point on SLSA: I feel tektoncd should help with setting up CI systems that are SLSA compliant (L1 -> LX), but tektoncd itself doesn't have to be compliant or enforce anything related to SLSA (as it can be used for something completely different, like data science with kubeflow, …)

A few other CI/CD systems have sidecar-like features.

The closest feature to Pipeline-level sidecar services is the Argo Workflows [daemon containers](https://argoproj.github.io/argo-workflows/walk-through/daemon-containers/) feature.
Daemon containers share the same lifespan as a Template, which can be analogous to a Task or a Pipeline. (Argo Workflows also has a [sidecar feature](https://argoproj.github.io/argo-workflows/walk-through/sidecars/) similar to Tekton's Task sidecars.)
Member

Task/Pipeline or TaskRun/PipelineRun? As it seems to be per execution, I think it's the latter, right?

Member Author

I'm not sure what you mean here-- Argo Workflows Templates are analogous to Pipelines and Tasks, and sidecars are defined in Tasks rather than TaskRuns. Would you prefer "PipelineRun-level services" to "Pipeline-level services"? I think the service would be defined in the Pipeline, so I prefer "Pipeline-level services".

fails due to a misconfigured PipelineRun.

Cons:
- Sidecar configurations will likely be copy-pasted between Pipelines.
Member

It could be made "shareable" somehow (a new type, or a way to refer to a common definition from something, like from bundles, or …)

Member Author

Hm, can you elaborate on what you had in mind? I remember you weren't in favor of standalone/reusable sidecars, so I'm surprised to see the suggestion of a new type. Hoping to keep most of the design discussion to a later PR though.


#### Open questions

- Should Pipeline-level sidecars terminate before or after Finally Tasks?
Member

Most likely after. They are created before any TaskRun, and they are deleted after all TaskRuns.

This commit opens a new TEP proposing better support for services
with the same lifespan as a PipelineRun.
@lbernick

For this feature, if we look at GitHub Actions: it is not something doable in GitHub Actions — if you compare what's comparable, i.e. a Job in GitHub workflows is a Task; Pipelines are multiple workflows hooked together, and by "design" they run on different VMs, thus different docker daemons, … so you cannot have a Service that runs for the full "pipelinerun" lifecycle. Users of GitHub Actions design around that, and maybe so should users of Tekton 👼🏼.

Doing nothing is definitely a valid option here, and I've added it as a listed alternative. Tekton Tasks and Pipelines are designed differently than GitHub Actions, so let's consider the use cases and decide what would make the most sense for Tekton.

Note that this PR isn't actually proposing anything. We still are planning to experiment with Catalog Tasks and Custom Tasks, and can gather user feedback about how important a feature like this actually is (although I think docker build is a very common use case and worth improving!)

Pushing an image to a registry and pulling it elsewhere is relatively cheap, especially if using an internal or "close-by" registry (like ghcr for GitHub workflows).

That's true; however, the other benefits are an improved UX and reduced resource consumption. I've noted this in the "do nothing" alternative.


dibyom commented Mar 1, 2023

The problem statement and list of alternatives are in this PR. The discussion now seems mainly about picking a specific option, so IMO we can merge this PR - I don't think we are selecting the way forward in this PR.
/approve

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2023
@afrittoli afrittoli left a comment

Thanks @lbernick, I like the idea.
When moving into the implementable phase, it may be good to have other non-docker use cases documented as well, but otherwise it looks great.
/approve

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, dibyom, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


dibyom commented Mar 7, 2023

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 7, 2023
@tekton-robot tekton-robot merged commit c919bbb into tektoncd:main Mar 7, 2023
Labels: approved, kind/tep, lgtm, size/XL
6 participants