-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP-0130: Pipeline-level service #943
Conversation
ebf51b9
to
2f9807e
Compare
/assign @afrittoli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TEP seems really weird to me (and the title). It is focused on one single tool/use-case over hundreds. "Pipeline-level docker daemon" is just one possible use cases of pipeline level sidecar, I don't really understand why it's the problem statement instead of part of the uses cases of something called "pipeline level sidecars" to be honest.
As of today, if the SA that runs the PipelineRun has enough privileges (aka some access to the cluster it runs in), it's implementable as part of the Pipeline itself : one task that deploy something (a docker daemon, …) and wait for it to be ready, and a finally task that will terminate that deployment.
The only advantage of adding this as part of the API (pipeline-level sidecars) is to not require this level of privilege for tasks running as part of the Pipeline (as this will be the controller creating resources).
### Reusability | ||
|
||
A few sidecar configurations, such as docker daemons and database containers, likely make up the majority of use cases. | ||
A major open question of this proposal is whether we want to introduce a concept of a reusable sidecar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very worried about this concept. If it's "re-usable", it's outside of the lifecycle of a PipelineRun. Which means, for me, that it should be outside of the responsabilities of the tektoncd/pipeline controller as well. If one need a docker daemon to be available accross PipelineRun for a given namespace, it's an infrastructure / namespace management problem, not a tektoncd/pipeline problem to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think reusable in this case was reuse across tasks in a pipeline and I'd say it should be in scope for CI systems given the number of use cases it supports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the proposal to clarify that reusing sidecars across pipelines is out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think reusable in this case was reuse across tasks in a pipeline and I'd say it should be in scope for CI systems given the number of use cases it supports
Ah, that makes more sense then 😛
|
||
A few sidecar configurations, such as docker daemons and database containers, likely make up the majority of use cases. | ||
A major open question of this proposal is whether we want to introduce a concept of a reusable sidecar. | ||
For example, we could include a docker daemon sidecar in the catalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be adding another "type" to the catalog which makes me wonder if these sidecars could be special purpose tasks
Thanks for the feedback! The reason I chose to do it this way is that "more complex docker build pipelines" is the problem I'm hoping to address, while "pipeline level sidecars" is a solution to that problem. Docker build pipelines are the most relevant use case I know of (task-level sidecars are a better solution for integration tests, and the boskos use case seems a lot less important than docker builds IMO). If you know of other use cases or feel strongly about reframing it around pipeline-level sidecars, maybe I could make the TEP more general?
I think writing a Task to create and delete a deployment and service is non-trivial. I updated the "user experience" section with some details on what this would look like, but the user experience is complex enough that I think the feature is worth adding. |
I agree it's non-trivial, but on the other hand, it's easily shareable 👼🏼 and thus doesn't need to be written by all users of tektoncd/pipeline. |
I've added an alternative of "catalog tasks for docker daemons" (which could also be generic catalog tasks for spinning up a deployment + service and tearing them down). As you pointed out, you'd still need to use a service account with the right permissions. Is this your preferred alternative? Or do you think there's value in the pipeline-level sidecar feature? |
I think the task based approach is worth considering because its more easily reusable |
API WG - in review cycle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion this is still too specific 🙃 or too generic, or both at the same time. What is implied by "build cache" ? This is still very docker task specific even. Handling "(dockerfile) build cache" when using docker
is way way way different that doing this with kaniko
or buildah
(or whatever other tool). "Pipeline level build cache" is, for me, a use case, not a TEP proposal 😅.
I deeply think we should focus more on writing smart, re-usable Task
that fulfil those use-case today (as we already have v1) before trying to solve them by tektoncd/pipeline
features. In my opinion, we need to exercise and push our current API (v1
) to its limit before trying to add too much to it.
- name: docker-tls-certs | ||
``` | ||
|
||
Here's what the [docker-build catalog task](https://github.com/tektoncd/catalog/blob/81bf7dc5610d5fa17281940a72a6377604105cea/task/docker-build/0.1/docker-build.yaml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is what a docker-build
Task
could look like today as well. What is missing to make this "smarter" today, is a way to conditionnally start a sidecar (for example, in case of params.docker_host
is empty or default value) and same for other possible requirements to run the sidecar (like securityContext.privileged
, …)
### Improve our documentation | ||
|
||
Instead of building new features, we could improve our documentation to help users better understand how to build a docker daemon | ||
into their PipelineRun. See [user experience](#user-experience) for an example of what this would look like. | ||
|
||
### Catalog Tasks for docker daemon | ||
|
||
We could add Tasks to the verified catalog (or to a community catalog) for spinning up and tearing down a docker daemon (or more generically, a deployment with a service). | ||
This would prevent users from having to write their own daemon creation/teardown tasks, but they would still have to provide their kube credentials | ||
and use a service account with rolebindings that allow it to create services and deployments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and yes. For me those are the valid, most important and "today doable" action to take. From these, their usage and feedback from it, we may want to enhance v1
or define v2
better.
Yeah, I'm still leaning towards a task - a challenge I see is that a user might need to specify a "cleanup-docker-daemon" task in addition to a "setup-docker-daemon" task - maybe we could do figure out a way to add some magic to simplify that. Or we could just write a custom task for this (though we don't really have a catalog of custom tasks today) |
|
||
The docker-build catalog Task only allows building a single image and pushing it to a registry. | ||
The following use cases are not addressed: | ||
- Building multiple images concurrently with the same daemon to save resources and share cached base layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What use case is driving this proposal? What is the proposed storage for such images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use cases are listed under the motivation section:
- User wants to test an image after building, and only push images that pass tests to the artifact repo
- User wants to save resources by using the same docker daemon/cached layers for multiple images built in the same pipelinerun (e.g. via matrix)
Image layers would be cached by the docker daemon (as it does currently). The user is still responsible for pushing the built image to a repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @lbernick yup I understand the two bullet points listed in the TEP as well. My question was more aligning with @vdemeester's feedback/comment, should we propose a feature in the pipeline controller for one single technology - docker daemon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping that the reframing of this TEP as "pipeline level service" makes it generic enough. Yes, docker daemon is just one tool, but docker builds are a very common CI/CD use case and I think it's important to consider features that will improve the UX for docker builds. I also listed a few more potential uses for a Pipeline level service, although I think they're less compelling.
8fcd387
to
9cc067b
Compare
Discussion from Pipelines WG today: experiment and get feedback on the "catalog task" approach before proposing building a new feature. @dibyom @vdemeester I've rewritten this proposal to list out the alternatives rather than proposing one. I'd still like to merge this TEP rather than closing it because it has a bunch of ideas and syntax examples written out for how we can address this use case that I think are helpful for reference. We can always update the TEP with the results of our experimentation, and mark it as obsolete/rejected if we find that the catalog task approach works. Chatting offline with @vdemeester about a better name for this TEP. |
d8d0e0a
to
3299282
Compare
@vdemeester and I agreed on "pipeline level service" as a title for this TEP-- it is not docker specific and still makes sense if we choose to use catalog tasks as our solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I am thinking outloud while doing this review 👼🏼.
For this feature, if we look at GitHub Actions : it is not something doable in GitHub Actions — if you compare what's comparable, aka a Job
in GitHub workflows is a Task
; Pipelines
are multiple workflows hooked together, and by "design" they run on different VM, thus different docker daemon, … so you cannot have a Service that runs for the full "pipelinerun" lifecycle. Users of GitHub Actions are designing around that, and maybe so should users of Tekton 👼🏼.
Pushing an image to a registry and pulling it elsewhere is relatively cheap. Especially if using a internal or "close-by" registry (like ghcr for github workflows).
teps/0130-pipeline-level-service.md
Outdated
Right now, the best way for users to create a Pipeline-level service is to have one Task that spins up the service and a Finally Task that | ||
tears it down, but this is non-trivial to write and leads to poor separation of concerns (app developers writing build Pipelines must also handle | ||
infrastructure details of spinning up the service). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I can agree with this (aka "app developers writing build Pipelines, …"), I also disagree a bit. Either they are in a devops world and they should care, or they are not. If they do not care, they do not necessarily care that it's shared between several PipelineRun
.
There could also be several ways to actually "create" a PipelineRun, that uses theses techniques (A task that spins something and a finally, …) without requiring the developer to care about. Using a resolver, or kustomize (or helm, or …) are all valid architectural decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased this a bit, but I think the main point I'm trying to get across is that you have to build the infrastructure directly into your pipeline, when different people might be responsible for these components. You're right that it depends on architecture whether the app developer interacts with the k8s cluster or not; what I'm trying to say is more that we should make it easy to separate this out for companies where it would be two different roles. Is there a better way of phrasing this you have in mind?
teps/0130-pipeline-level-service.md
Outdated
- Service is only accessible to the single PipelineRun that uses it | ||
(doing otherwise would violate [SLSA L3 build isolation requirements](https://slsa.dev/spec/v0.1/requirements#isolated)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure using the same service in several PipelineRun breaks that SLSA rules. The rules states that "if MUST NOT be possible for on build to persiste or influence the build environment for a subsequent build". This can still be achieved by sharing a service, it really depends on the service "design".
The example with docker build
are the --no-cache
and --pull
flag that essentially make the build act based of a clean docker daemon (so not "influence" from other things targeting that same daemon).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted-- I made the goal that we should support SLSA L3 and added a "security" section noting some of the points on docker builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also one point on SLSA. I feel tektoncd should help setuping up CI system that are SLSA compliant (L1 -> LX), but tektoncd itself doesn't have be compliant or to enforce anything related to SLSA (as it can be use for something completely different, like data science with kubeflow, …)
A few other CI/CD systems have sidecar-like features. | ||
|
||
The closest feature to Pipeline-level sidecar services is the Argo Workflows [daemon containers](https://argoproj.github.io/argo-workflows/walk-through/daemon-containers/) feature. | ||
Daemon containers share the same lifespan as a Template, which can be analogous to a Task or a Pipeline. (Argo Workflows also has a [sidecar feature](https://argoproj.github.io/argo-workflows/walk-through/sidecars/) similar to Tekton's Task sidecars.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Task/Pipeline or TaskRun/PipelineRun ? As it seems to be per exectuion, I think it's the later right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here-- Argo workflows Templates are analogous to Pipelines and Tasks, and sidecars are defined in Tasks rather than TaskRuns. Would you prefer "PipelineRun-level services" to "Pipeline-level services"? I think the service would be defined in the Pipeline so I prefer "Pipeline-level services".
fails due to a misconfigured PipelineRun. | ||
|
||
Cons: | ||
- Sidecar configurations will likely be copy-pasted between Pipelines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be made "shareable" somehow (a new type, or a way to refer to a common definition from something, like form bundles, or …)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm can you elaborate on what you had in mind? I remember you weren't in favor of standalone/reusable sidecars so I'm surprised to see the suggestion of a new type. Hoping to keep most of the design discussion to a later PR though.
|
||
#### Open questions | ||
|
||
- Should Pipeline-level sidecars terminate before or after Finally Tasks? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely after. They are created before any TaskRun, and they are deleted after any TaskRun.
This commit opens a new TEP proposing better support for services with the same lifespan as a PipelineRun.
Doing nothing is definitely a valid option here, and I've added it as a listed alternative. Tekton Tasks and Pipelines are designed differently than GitHub Actions, so let's consider the use cases and decide what would make the most sense for Tekton. Note that this PR isn't actually proposing anything. We still are planning to experiment with Catalog Tasks and Custom Tasks, and can gather user feedback about how important a feature like this actually is (although I think docker build is a very common use case and worth improving!)
That's true; however the other benefits are an improved UX and reduced resource consumption. I've noted this in the "do nothing" alternative. |
The problem statement and list of alternatives are in this PR. The discussion now seems mainly about picking a specific option. So, IMO we can merge this PR - I don't think we are selecting the way forward in this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lbernick, I like the idea.
When moving into the implementable phase it may be good to have other non-docker use cases documented as well, but otherwise it looks great
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: afrittoli, dibyom, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
This commit opens a new TEP proposing better support for services
with the same lifespan as a PipelineRun.
/kind tep