Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve chicken and egg problem of Tekton config-as-code #859

Closed
bobcatfish opened this issue May 14, 2019 · 19 comments
Closed

Solve chicken and egg problem of Tekton config-as-code #859

bobcatfish opened this issue May 14, 2019 · 19 comments
Labels
area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. maybe-next-milestone For consideration when planning the next milestone priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@bobcatfish
Copy link
Collaborator

bobcatfish commented May 14, 2019

Expected Behavior

It should be possible to define Pipeline and Task definitions and have them committed to a repo which they act on (e.g. this is the intent of the tekton dir in this repo!).

  • Changes to these definitions that are committed to the repo should be picked up and used as soon as they are merged
  • Changes to these definitions in pull requests should be used when testing the pull request, but without affecting any other Runs (e.g. already executing Runs, Runs executing against other Pull Requests)

Actual Behavior

We aren't yet using tekton fully via dogfooding, but once we do we will run into a chicken and egg problem: the PipelineRuns that will be generated by Prow (see #531 and kubernetes/test-infra#11888) will be created assuming that the Pipeline it refers to (and the Tasks it refers to) already exist in the cluster.

Additionally, the PipelineRuns it generates for pull requests will all refer to the same Pipeline, which means that if a pull request changes a Pipeline or Task:

  1. Nothing would be applying the changed Pipeline or Task
  2. If we do add a mechanism to apply the changes before the PipelineRun is created, this change would apply to all PipelineRuns in the same cluster that refer to those Pipelines and Tasks

Additional information

This is being presented in the context of Prow b/c that is what we will be using in the near future for triggering PipelineRuns against this repo, but the solution to this issue should be applicable regardless of what the event triggering system is, whether it's Prow or something completely different (see #315).

@bobcatfish bobcatfish added design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels May 14, 2019
@cjwagner
Copy link

Can you explain the "chicken and egg" part of this problem a little more? AFAICT the Pipeline and Tasks come before the PipelineRun. What part of creating a Pipeline or Task depends on the PipelineRun?

Changes to these definitions that are committed to the repo should be picked up and used as soon as they are merged

This can be achieved with a postsubmit ProwJob that syncs the checked in YAML with what is deployed in the cluster. (Basically kubectl apply -f with a little extra logic to delete resources that are no longer have associated YAML checked in.)

Changes to these definitions in pull requests should be used when testing the pull request, but without affecting any other Runs (e.g. already executing Runs, Runs executing against other Pull Requests)

This is a limitation general to all ProwJobs. Today PRs that modify presubmit job config for the presubmits running on that same PR run without the config changes made in the PR. This is something we've discussed changing, but we haven't designed a solution yet.
Solving this problem will be harder without the ability to embed other CRDs into the PipelineRun spec because we'd need to create and manage PR-specific variants of the Pipeline, Task and PipelineResource CRDs.

@bobcatfish
Copy link
Collaborator Author

(I also want to be clear that whatever solution we come up with here should work when using Prow, but also should work with some event triggering soution completely not Prow as well!)

Can you explain the "chicken and egg" part of this problem a little more?

What I mean by "chicken and egg" is something like, if you want to use Pipeline + Task + PipelineRun to do all the CI,

  • The Pipeline has to exist for the PipelineRun to work
  • For changes to the Pipeline to be reflected, the Pipeline has to be applied by the PipelineRun

I think I'm still not being as clear as I'd like to be so plz push back if that doesn't make sense and I'll keep trying XD

This is a limitation general to all ProwJobs. Today PRs that modify presubmit job config for the presubmits running on that same PR run without the config changes made in the PR. This is something we've discussed changing, but we haven't designed a solution yet.

I think this is pretty much a variant of what I'm trying to describe as the "chicken and egg" problem :D

Solving this problem will be harder without the ability to embed other CRDs into the PipelineRun spec because we'd need to create and manage PR-specific variants of the Pipeline, Task and PipelineResource CRDs.

Right, for the Pipeline and the Task we'd need a way to create versions that are not referenced by other Runs.

I don't think embedding can be the only solution though b/c embedding will make Pipeline and Task sharing and reuse more difficult 🤔

@afrittoli
Copy link
Member

Some thoughts about this.
It is definitely an important issue to solve, but I'm not convinced it's something that can fully be solved on Tekton side. The ability to run pre-merge pipeline/task definitions is a functionality of the CI system that triggers the pipeline.

That said we could provide functionality on Tekton side that help CI system is realizing this.
Something that might help is the ability for a task/pipelineRun to reference a task/pipeline by some kind of URI (instead of name only) e.g. git://repo/path/to/pipeline@gitref. A task/pipeline defined that way would be pulled from git and expanded in the task/pipelineRun definition. That would allow combining the power of embedding with the reuse of pipelines.

What happens when a PR that modifies the task/pipeline definition is merged? The desired behaviour would be that any running tasks/pipelines keep running on the old definition and any new task/pipeline uses the new definition. However if the new definition is applied to the cluster this would affect in-flight tasks/pipelines.

I think there are two ways we can solve this. One is two always embed tasks/pipelines (either explicitly or by URI) and the other is to have the controller creating a copy of the tasks/pipelines before starting a run, and updating the run to use the copy instead of the original. Either way we would have a proliferation of copies of Tasks and Pipelines, and a single Task/Pipeline resource applied to the cluster would only be useful for things like UIs that visualize / help build the task / pipeline.

@vdemeester
Copy link
Member

That said we could provide functionality on Tekton side that help CI system is realizing this.
Something that might help is the ability for a task/pipelineRun to reference a task/pipeline by some kind of URI (instead of name only) e.g. git://repo/path/to/pipeline@gitref. A task/pipeline defined that way would be pulled from git and expanded in the task/pipelineRun definition. That would allow combining the power of embedding with the reuse of pipelines.

Yes 👼 I want to open an issue on tektoncd/catalog or here to talk about integration with pipeline and task definitions.

What happens when a PR that modifies the task/pipeline definition is merged? The desired behaviour would be that any running tasks/pipelines keep running on the old definition and any new task/pipeline uses the new definition. However if the new definition is applied to the cluster this would affect in-flight tasks/pipelines.

I think there are two ways we can solve this. One is two always embed tasks/pipelines (either explicitly or by URI) and the other is to have the controller creating a copy of the tasks/pipelines before starting a run, and updating the run to use the copy instead of the original. Either way we would have a proliferation of copies of Tasks and Pipelines, and a single Task/Pipeline resource applied to the cluster would only be useful for things like UIs that visualize / help build the task / pipeline.

I feel this is the tricky part. We may want to have some sort of revision notion (or version, simpler to grasp) for our Pipeline and Tasks. A _Run would reference a Pipeline or a Task (by default latest revision), but when created, would also know which revision it targets (and thus would still use the previous revision/version of the definition). That would fix the in-flight problem.

For a PR changing a Task or Pipeline definition, we want these definitions to be changed only in the context of the PR. This is trickier to achieve with that revision idea, but maybe Tekton could have a "dry-run" mode or one-shot mode that would create temporary task (with generated name based on the prefix) ?

(I am completely thinking outloud 👼 🙇‍♂️)

@xihaLong
Copy link

"What happens when a PR that modifies the task/pipeline definition is merged?"

Another good question to ask is "How do I run a CI/CD pipeline against my changes to the pipelines of other teams/my entire org?" or in other words

"How do I avoid breaking the world with my pipeline changes?"

The pipeline itself being kept under source control (as recommended practice) may go a long way to allowing for roll-back of a definition. Providing tools to inspect or run pipelines in a lightweight manner from the command prompt may provide the types of insight necessary.

As a core example, I might think of #994 in different terms. "How do I print the expected pipeline easily before the run starts?" Then building on that answer, "How do I print the diff of two sets of pipelines?"
If I can review the diff of two pipelines, this will be interesting information.

Going into the chat to see how others solve this question currently. Depending on feedback, will write up a shared Doc with my own set of suggestions in this area

@bobcatfish
Copy link
Collaborator Author

I've been thinking a cool solution to this problem could be referring to Pipelines (and Tasks) in git repos.

In #1839 we're adding support for storing Tasks (and Pipelines) in OCI image registries, and we've opened the possibility of storing them in git repos.

e.g. OCI image:

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: my-task-run
spec:
  taskRef:
    image:
      name: gcr.io/my/catalog:v1.2.3
      task: my-task

e.g. git:

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: my-pipeline-run
spec:
  taskRef:
    image:
      pullSecret: my-pull-secret
      name: gcr.io/my/catalog:v1.2.3
      pipeline: my-pipeline

If you setup your triggering such that the PipelineRuns created for a project refer to a Pipeline in a git repo, you could have logic that adds the commitish of a PullRequest such that whatever is in the repo gets run.

You could combine this with OCI images in whatever way makes sense to you, e.g. Pipelines are referred to by their location in a git repo (they might be more likely to change) and Tasks are usually referred to by their location in an image registry (assuming they change less).

This potentially would solve the problem for Pipelines and Tasks but not for the triggering logic itself. Could be an improvement tho?

@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 15, 2020
@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vdemeester
Copy link
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot
Copy link
Collaborator

@vdemeester: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020
@bobcatfish
Copy link
Collaborator Author

This is in our roadmap: https://github.com/tektoncd/pipeline/blob/master/roadmap.md

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2020
@bobcatfish bobcatfish added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Aug 24, 2020
@afrittoli
Copy link
Member

This demo from a KubeCon talk is a way to solve the issue:

  • detect if a change alters the pipeline definitions at all
  • if so, create a new namespace on the fly, deploy pipelines in there
  • send a cloud event to the event listener in the new namespace
    Code is in my plumbing fork https://github.com/andreafrittoli/plumbing

@bobcatfish
Copy link
Collaborator Author

I think @coryrc was looking into a similar approach ^^

@coryrc
Copy link

coryrc commented Sep 17, 2020

Thanks for the link! I really need to learn kustomize.

@chmouel
Copy link
Member

chmouel commented Sep 18, 2020

I do have a working POC here https://github.com/chmouel/tekton-asa-code the diagram mentions some openshift feature (ie: the console) but it can do without.

I am in process to convert it to a GitHUB apps instead of having the user using webhooks.

@bobcatfish
Copy link
Collaborator Author

Note we've been discussing a related question about git support in #2298

@jerop jerop added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 20, 2021
@bobcatfish
Copy link
Collaborator Author

Some recent proposals related to this:

@jerop
Copy link
Member

jerop commented Feb 17, 2023

We now have Remote Resolution (TEP-0060) so closing this feature request, feel free to reopen if there's any remaining item I missed

@jerop jerop closed this as completed Feb 17, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Tekton Pipelines Roadmap Feb 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) design This task is about creating and discussing a design help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. maybe-next-milestone For consideration when planning the next milestone priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
Status: Done
Status: Done
Development

No branches or pull requests

9 participants