Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-0086: Larger Results via Sidecar Logs #745

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 65 additions & 16 deletions teps/0086-changing-the-way-result-parameters-are-stored.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,16 @@
status: proposed
title: Changing the way result parameters are stored
creation-date: '2021-09-27'
last-updated: '2022-06-09'
last-updated: '2022-08-01'
authors:
- '@tlawrie'
- '@imjasonh'
- '@bobcatfish'
- '@pritidesai'
- '@tomcli'
- '@ScrapCodes'
- '@chitrangpatel'
- '@jerop'
---

# TEP-0086: Changing the way result parameters are stored
Expand All @@ -21,7 +23,7 @@ authors:
- [Non-Goals](#non-goals)
- [Use Cases (optional)](#use-cases-optional)
- [Requirements](#requirements)
- [Required](#required)
- [Proposal](#proposal)
- [Alternatives](#alternatives)
- [Result Sidecar - Upload results from sidecar](#result-sidecar---upload-results-from-sidecar)
- [Option: Supporting multiple sidecars](#option-supporting-multiple-sidecars)
Expand All @@ -44,7 +46,10 @@ authors:
- [Store results on PVCs](#store-results-on-pvcs)
- [No change. Use workspaces.](#no-change-use-workspaces)
- [Repurpose Artifact Storage API](#repurpose-artifact-storage-api)
- [Using logs emitted by the Task](#using-logs-emitted-by-the-task)
- [Sidecar Logs](#sidecar-logs)
- [Feature Gates for Sidecar Logs](#feature-gates-for-sidecar-logs)
- [Design Details of Sidecar Logs](#design-details-of-sidecar-logs)
- [Caveats of Sidecar Logs](#caveats-of-sidecar-logs)
- [Infrastructure Needed (optional)](#infrastructure-needed-optional)
- [Upgrade & Migration Strategy (optional)](#upgrade--migration-strategy-optional)
- [Implementation Pull request(s)](#implementation-pull-requests)
Expand Down Expand Up @@ -115,6 +120,19 @@ Additionally, this will help projects that wrap/abstract Tekton where users unde
* Define a clear upper limit on the expected maximum size of a result.
* Support an environment where executing pods are not permitted to make network connections within the cluster.

## Proposal

There's no solution that will meet all the [requirements](#requirements) described above. There are several efforts to
build proof of concepts for different solutions described in [alternatives](#alternatives). We propose that we provide
the multiple solutions, all guarded behind a `larger-results` feature flag, so that we can experiment and figure out a
way forward. These gated solutions will be alpha, and can be changed or removed at any time.

These are the proof of concepts we'll provide and their feature flags:

| Solution | Feature Gate | Description |
|-------------------------------|----------------------------------|------------------------------------------------------------------------|
| [Sidecar Logs](#sidecar-logs) | `larger-results`: `sidecar-logs` | Use stdout logs from a dedicated `Sidecar` to return a `Result` object |

## Alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative that might help with some of the sidecar issues mentioned above - OIDC API but using the entrypoint as the client instead of a dedicated sidecar.

  • You can reuse the existing Pod resources to handle the uploading (no additional sidecar overhead).
  • There's no ambiguity when the user container is complete.
  • Avoids concerns about result availability since the upload is done from the user Pod itself.

Copy link
Member Author

@jerop jerop Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the same alternative described in https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md#considerations?

in that case, the OIDC approach involves storing the results externally (HTTP server) which is something we can explore alongside the solutions for external storage - which makes this alternative out of scope for this PR

cc @pritidesai

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, but without the Sidecar. You also don't need to store the results externally - the OIDC API can sit local on the cluster and write to the Results to the CRD similar to what's proposed with the log based approach. The OIDC bits are just to ensure that the upload requests are coming from the expected Pods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @wlynch who will be responsible for managing OIDC? Do we need a separate controller to manage the API? How would the pipeline controller work without access to openID?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the rough plan was to use the built in cluster service account token projection, then use the TokenReview API to verify the token (or worst case verify via OIDC discovery) to authz requests coming from Pods. I think it's reasonable to require this, especially since it's been stable since 1.20 and I think most cloud providers have this enabled by default.

Task output API would be hosted in a container controlled by Tekton itself, probably another Pod in the tekton-pipelines namespace so that it can have separate RBAC permissions from the controller.

cc @tlawrie @mattmoor in case I missed anything.

Copy link
Member Author

@jerop jerop Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wlynch thanks for sharing this alternative - it sounds worthwhile pursing and was wondering if you could add it to the alternatives listed in https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md#alternatives 🙏🏾

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this sounds promising, we should add it to the alternatives and see if we can try out a PoC for this approach


### Result Sidecar - Upload results from sidecar
Expand Down Expand Up @@ -374,21 +392,47 @@ Cons:
- [Docs on setting up storage](https://github.com/tektoncd/pipeline/blob/main/docs/install.md#configuring-pipelineresource-storage)
- [Interface](https://github.com/tektoncd/pipeline/blob/main/pkg/artifacts/artifacts_storage.go#L39-L47)

### Using logs emitted by the Task
### Sidecar Logs

- We are also exploring using **stdout logs from a dedicated sidecar to return a json result object** as a simpler way
to support larger TaskResults, but we need to explore this in a POC as we suspect we may not be able to rely on logs for this.
- The controller would wait for the sidecar to exit and then read the logs based on a particular query and append info to the TaskRun
- Potential to use a CloudEvent object to wrap result object
We propose using **stdout logs from a dedicated `Sidecar` to return a json `Result` object** to support larger
`Results`. The controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and
append info to the `TaskRun`.

Cons:
- No guarantees on when logs will be available (would have to wait for this before considering a TaskRun complete)
- No guarantee you'll be able to access a log before it disappears (e.g. logs will not be available via the k8s API
once a pod is deleted)
- The storage of the result parameter may still be limited by a number of scenarios, including:
- [1.5 MB CRD size](https://github.com/kubernetes/kubernetes/issues/82292)
- The total size of the PipelineRun _if_ the TaskRun content is included, however
[TEP-100 is removing this](https://github.com/tektoncd/community/blob/main/teps/0100-embedded-taskruns-and-runs-status-in-pipelineruns.md)
We propose injecting a dedicated `Sidecar` alongside the `Steps` which will watch the `Results` path of the `Steps`.
This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The `TaskRun`
controller will access the stdout logs of the `Sidecar`, extract the `Result` and its contents during reconciliation.
After the `Steps` have terminated, the `TaskRun` controller will attach the `Results` from the logs of the `Sidecar`
instead of using the termination message (hence bypassing the 4K limit) to update the `Results` fields in the CRD.
This method keeps the rest of the current functionality identical and does not require any external storage mechanism.
For further details, see the [demonstration][demo] of the proof of concept.

#### Feature Gates for Sidecar Logs

This solution will be gated using a `larger-results` feature flag which users can set to `"sidecar-logs"` to enable it.
This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we figure out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 In general, I'd like us to decouple the storing results outside the CRD problem from the results can only be a few KB problem. I think being able to store results above a few KB will unlock quite a few use cases regardless of whether the result is stored in the CRD or not. And storing it in the CRD is a great first step since we don't have to figure out the interfaces for fetching results from external storage either.

That being said, we should also experiment with other ways of getting these larger results from the TaskRun onto the CRD e.g. sending results over HTTPS

the solutions for external storage.

In [TEP-0100][tep-0100] we proposed changes to `PipelineRun` status to reduce the amount of information stored about
the status of `TaskRuns` and `Runs` to improve performance, reduce memory bloat and improve extensibility. Now that
those changes have been implemented, the `PipelineRun` status is set up to handle larger `Results` without
exacerbating the performance and storage issues that were there before. For `ChildReferences` to be populated, the
`embedded-status` must be set to `"minimal"`. Thus, will require that minimal embedded status is enabled during the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems a bit out of place

migration until it becomes the only supported status. This requirement also makes the behavior clearer to users -
auto-adding the minimal status when users have not enabled it makes the user experience opaque and surprising.

#### Design Details of Sidecar Logs

In order to parse the `Results` volume and output the contents to stdout in the `Container`, the `Sidecar` will run
a script that:
- is auto-generated based on the required `Results` - identified from `taskSpec.results` field.
- periodically checks for files in the directory `/tekton/results`.
- when a `Result` is found, it prints it to stdout in a parsable pattern.
- When all the expected `Results` are found, it breaks out of the periodic loop and exits.

#### Caveats of Sidecar Logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also note the RBAC considerations mentioned above


- The storage of the result parameter may still be limited by the maximum allowed [CRD size][crd-size]. Note that the
maximum size is shared among all the `Results` of the `TaskRun` and all its other contents (e.g. `Status`).

### Using embedded storage client to store result files in remote storage

Expand Down Expand Up @@ -452,3 +496,8 @@ It will be a quick reference for those looking for implementation of this TEP.
- [HackMD Result Collector Sidecar Design](https://hackmd.io/a6Kl4oS0SaOyBqBPTirzaQ)
- [TEP-0086 Design Breakout Session Recording](https://drive.google.com/file/d/1lIqyy1RyZMYOrMCC2CLZD8eOf0NrVeDb/view?usp=sharing)
- [TEP-0086 Design Breakout Session Notes](https://hackmd.io/YU_g27vRS2S5DwfBXDGpYA?view)

[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292
[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
[demo]: https://drive.google.com/file/d/14tDHNgpzOZ--5nMsOsTBhxsDgDDM_7iQ/view?t=1h01m41s
[tektoncd/pipeline#3497]: https://github.com/tektoncd/pipeline/issues/3497
2 changes: 1 addition & 1 deletion teps/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ This is the complete list of Tekton teps:
|[TEP-0083](0083-scheduled-and-polling-runs-in-tekton.md) | Scheduled and Polling runs in Tekton | proposed | 2021-09-13 |
|[TEP-0084](0084-endtoend-provenance-collection.md) | end-to-end provenance collection | proposed | 2022-05-12 |
|[TEP-0085](0085-per-namespace-controller-configuration.md) | Per-Namespace Controller Configuration | proposed | 2021-10-14 |
|[TEP-0086](0086-changing-the-way-result-parameters-are-stored.md) | Changing the way result parameters are stored | proposed | 2022-06-09 |
|[TEP-0086](0086-changing-the-way-result-parameters-are-stored.md) | Changing the way result parameters are stored | proposed | 2022-08-01 |
|[TEP-0088](0088-result-summaries.md) | Tekton Results - Record Summaries | proposed | 2021-10-01 |
|[TEP-0089](0089-nonfalsifiable-provenance-support.md) | Non-falsifiable provenance support | implementable | 2022-01-18 |
|[TEP-0090](0090-matrix.md) | Matrix | implemented | 2022-06-30 |
Expand Down