
Larger results using sidecar logs
Prior to this, we extracted results from tasks via termination messages, which are limited to 4 KB per pod. Users with many results had to shrink each result to stay within that combined 4 KB cap.

We now run a dedicated sidecar that has access to the results of all the steps. The sidecar prints each result's name and content to stdout. The TaskRun controller parses the sidecar's logs and updates the results from there instead of from the termination message. We set an upper limit of 1 KB per result, but users can have as many such results as needed.
chitrangpatel committed Oct 28, 2022
1 parent 6bc53ca commit 06bbc1d
Showing 20 changed files with 643 additions and 37 deletions.
16 changes: 9 additions & 7 deletions cmd/entrypoint/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,8 @@ var (
breakpointOnFailure = flag.Bool("breakpoint_on_failure", false, "If specified, expect steps to not skip on failure")
onError = flag.String("on_error", "", "Set to \"continue\" to ignore an error and continue when a container terminates with a non-zero exit code."+
" Set to \"stopAndFail\" to declare a failure with a step error and stop executing the rest of the steps.")
stepMetadataDir = flag.String("step_metadata_dir", "", "If specified, create directory to store the step metadata e.g. /tekton/steps/<step-name>/")
stepMetadataDir = flag.String("step_metadata_dir", "", "If specified, create directory to store the step metadata e.g. /tekton/steps/<step-name>/")
dontSendResultsToTerminationPath = flag.Bool("dont_send_results_to_termination_path", false, "If specified, don't send results to the termination path.")
)

const (
Expand Down Expand Up @@ -142,12 +143,13 @@ func main() {
stdoutPath: *stdoutPath,
stderrPath: *stderrPath,
},
PostWriter: &realPostWriter{},
Results: strings.Split(*results, ","),
Timeout: timeout,
BreakpointOnFailure: *breakpointOnFailure,
OnError: *onError,
StepMetadataDir: *stepMetadataDir,
PostWriter: &realPostWriter{},
Results: strings.Split(*results, ","),
Timeout: timeout,
BreakpointOnFailure: *breakpointOnFailure,
OnError: *onError,
StepMetadataDir: *stepMetadataDir,
DontSendResultsToTerminationPath: *dontSendResultsToTerminationPath,
}

// Copy any creds injected by the controller into the $HOME directory of the current
Expand Down
13 changes: 13 additions & 0 deletions config/enable-log-access-to-controller/clusterrole.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: tekton-pipelines-controller-pod-log-access
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: default
app.kubernetes.io/part-of: tekton-pipelines
rules:
- apiGroups: [""]
# Controller needs to get the logs of the results sidecar created by TaskRuns to extract results.
resources: ["pods/log"]
verbs: ["get"]
16 changes: 16 additions & 0 deletions config/enable-log-access-to-controller/clusterrolebinding.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: tekton-pipelines-controller-pod-log-access
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: default
app.kubernetes.io/part-of: tekton-pipelines
subjects:
- kind: ServiceAccount
name: tekton-pipelines-controller
namespace: tekton-pipelines
roleRef:
kind: ClusterRole
name: tekton-pipelines-controller-pod-log-access
apiGroup: rbac.authorization.k8s.io
52 changes: 52 additions & 0 deletions docs/install.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ This guide explains how to install Tekton Pipelines. It covers the following top
- [Customizing the Pipelines Controller behavior](#customizing-the-pipelines-controller-behavior)
- [Alpha Features](#alpha-features)
- [Beta Features](#beta-features)
- [Enabling larger results using sidecar logs](#enabling-larger-results-using-sidecar-logs)
- [Configuring High Availability](#configuring-high-availability)
- [Configuring tekton pipeline controller performance](#configuring-tekton-pipeline-controller-performance)
- [Creating a custom release of Tekton Pipelines](#creating-a-custom-release-of-tekton-pipelines)
Expand Down Expand Up @@ -420,6 +421,8 @@ features](#alpha-features) to be used.
name, kind, and API version information for each `TaskRun` and `Run` in the `PipelineRun` instead. Set it to "both" to
do both. For more information, see [Configuring usage of `TaskRun` and `Run` embedded statuses](pipelineruns.md#configuring-usage-of-taskrun-and-run-embedded-statuses).

- `enable-sidecar-logs-results`: Set this flag to "true" to extract results from the logs of a dedicated results sidecar instead of from the termination message. While the termination message restricts the combined size of all results to 4 KB per pod, enabling this feature allows up to 1 KB per result, with as many results as required.

For example:

```yaml
Expand Down Expand Up @@ -467,6 +470,55 @@ the `feature-flags` ConfigMap alongside your Tekton Pipelines deployment via

For beta versions of Tekton CRDs, setting `enable-api-fields` to "beta" is the same as setting it to "stable".

## Enabling larger results using sidecar logs

**Note**: The maximum size of a Task's results is limited by the container termination message feature of Kubernetes, as results are passed back to the controller via this mechanism. At present, the limit is 4096 bytes.

To exceed this limit of 4096 bytes, you can enable larger results using sidecar logs. By enabling this feature, you will have a limit of 1024 bytes per result with no restriction on the number of results.

**Note**: To enable this feature, you must grant the Tekton pipelines controller `get` access to all `pods/log`. This means the controller is able to read the logs of any pod in the cluster.

1. Create a cluster role by applying the following spec.

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: tekton-pipelines-controller-pod-log-access
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: default
app.kubernetes.io/part-of: tekton-pipelines
rules:
- apiGroups: [""]
# Controller needs to get the logs of the results sidecar created by TaskRuns to extract results.
resources: ["pods/log"]
verbs: ["get"]
```

2. Create a cluster role binding by applying the following spec.

```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: tekton-pipelines-controller-pod-log-access
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: default
app.kubernetes.io/part-of: tekton-pipelines
subjects:
- kind: ServiceAccount
name: tekton-pipelines-controller
namespace: tekton-pipelines
roleRef:
kind: ClusterRole
name: tekton-pipelines-controller-pod-log-access
apiGroup: rbac.authorization.k8s.io
```

3. Enable the feature flag to use sidecar logs by setting `enable-sidecar-logs-results: "true"` in the [configMap](#customizing-the-pipelines-controller-behavior).
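   As a sketch (assuming the default installation namespace `tekton-pipelines` and the default ConfigMap name `feature-flags`), the flag can be flipped in place with `kubectl patch`:

   ```shell
   # Enable sidecar-logs-based results in the feature-flags ConfigMap
   # (assumes the default "tekton-pipelines" namespace).
   kubectl patch configmap feature-flags \
     -n tekton-pipelines \
     --type merge \
     -p '{"data":{"enable-sidecar-logs-results":"true"}}'
   ```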

## Configuring High Availability

If you want to run Tekton Pipelines so that webhooks are resilient against failures and support
Expand Down
12 changes: 11 additions & 1 deletion docs/tasks.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ weight: 200
- [Specifying `Resources`](#specifying-resources)
- [Specifying `Workspaces`](#specifying-workspaces)
- [Emitting `Results`](#emitting-results)
- [Larger `Results` using sidecar logs](#larger-results-using-sidecar-logs)
- [Specifying `Volumes`](#specifying-volumes)
- [Specifying a `Step` template](#specifying-a-step-template)
- [Specifying `Sidecars`](#specifying-sidecars)
Expand Down Expand Up @@ -835,7 +836,7 @@ This also means that the number of Steps in a Task affects the maximum size of a
as each Step is implemented as a container in the TaskRun's pod.
The more containers we have in our pod, *the smaller the allowed size of each container's
message*, meaning that the **more steps you have in a Task, the smaller the result for each step can be**.
For example, if you have 10 steps, the size of each step's Result will have a maximum of less than 1KB*.
For example, if you have 10 steps, the size of each step's Result will have a maximum of less than 1KB.

If your `Task` writes a large number of small results, you can work around this limitation
by writing each result from a separate `Step` so that each `Step` has its own termination message.
Expand All @@ -847,6 +848,15 @@ available size will be less than 4096 bytes.
As a general rule-of-thumb, if a result needs to be larger than a kilobyte, you should likely use a
[`Workspace`](#specifying-workspaces) to store and pass it between `Tasks` within a `Pipeline`.

#### Larger `Results` using sidecar logs

This is an experimental feature. The [`enable-sidecar-logs-results` feature flag must be set to `"true"`](./install.md#enabling-larger-results-using-sidecar-logs).

Instead of using termination messages to store results, the TaskRun controller injects a sidecar container which monitors the results of all the steps. The sidecar mounts the volume where the results of all the steps are stored. As soon as it finds a new result, it logs it to stdout. The controller parses the sidecar's logs to extract the results (caution: this requires granting the controller access to [kubernetes pods/log](./install.md#enabling-larger-results-using-sidecar-logs)).

**Note**: This feature allows users to store up to 1 KB per result. Because results are no longer limited by the size of the termination message, users can have as many results as they require, where each result can be up to 1 KB in size. If the size of a result exceeds 1 KB, the TaskRun is placed into a failed state with the following message: `Result exceeded the maximum allowed limit of 1024 bytes.`

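To estimate whether a result will fit under the cap, note that base64 output occupies `ceil(N/3) * 4` characters for `N` input bytes, so the ~750-byte reads used in the examples in this repo encode to exactly 1000 characters, just under the 1024-byte limit. A quick sketch with standard POSIX tools:

```shell
# base64 of N bytes is ceil(N/3)*4 characters (ignoring line wraps);
# 750 random bytes -> exactly 1000 characters, under the 1024-byte cap.
size=$(head -c 750 /dev/urandom | base64 | tr -d '\n' | wc -c)
echo "encoded size: ${size} bytes"
# A 770-byte read would encode to ceil(770/3)*4 = 1028 characters,
# which would exceed the limit and fail the TaskRun.
```
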
### Specifying `Volumes`

Specifies one or more [`Volumes`](https://kubernetes.io/docs/concepts/storage/volumes/) that the `Steps` in your
Expand Down
81 changes: 81 additions & 0 deletions examples/v1beta1/pipelineruns/alpha/pipelinerun-large-results.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: large-result
spec:
results:
- name: result1
- name: result2
- name: result3
- name: result4
- name: result5
steps:
- name: step1
image: alpine
script: |
cat /dev/urandom | head -c 750 | base64 | tee $(results.result1.path);
cat /dev/urandom | head -c 750 | base64 | tee $(results.result2.path);
cat /dev/urandom | head -c 750 | base64 | tee $(results.result3.path);
cat /dev/urandom | head -c 750 | base64 | tee $(results.result4.path);
cat /dev/urandom | head -c 750 | base64 | tee $(results.result5.path);
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: concat-text
spec:
params:
- name: param1
- name: param2
- name: param3
results:
- name: concatenated-text
description: concatenate strings
steps:
- name: concat
image: alpine
command: ["/bin/sh", "-c"]
args:
- echo $(params.param1) +++ $(params.param2) +++ $(params.param3)| tee $(results.concatenated-text.path) ;
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
name: concat-text-pipeline
spec:
tasks:
- name: first-task
taskRef:
name: large-result
- name: second-task
taskRef:
name: large-result
- name: third-task
taskRef:
name: large-result
- name: last-task
runAfter:
- first-task
- second-task
- third-task
params:
- name: param1
value: $(tasks.first-task.results.result1)
- name: param2
value: $(tasks.second-task.results.result3)
- name: param3
value: $(tasks.third-task.results.result5)
taskRef:
name: concat-text
results:
- name: sum
description: the concat of all texts
value: $(tasks.last-task.results.concatenated-text)
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: concat-text-pipeline-run
spec:
pipelineRef:
name: concat-text-pipeline
28 changes: 28 additions & 0 deletions examples/v1beta1/taskruns/alpha/large-task-result.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: larger-results-
spec:
taskSpec:
description: |
A task that creates results > termination message limit of 4K per pod!
results:
- name: result1
- name: result2
- name: result3
- name: result4
- name: result5
steps:
- name: step1
image: bash:latest
script: |
#!/usr/bin/env bash
cat /dev/urandom | head -c 750 | base64 | tee /tekton/results/result1 # about 1 KB result
cat /dev/urandom | head -c 750 | base64 | tee /tekton/results/result2 # about 1 KB result
- name: step2
image: bash:latest
script: |
#!/usr/bin/env bash
cat /dev/urandom | head -c 750 | base64 | tee /tekton/results/result3 # about 1 KB result
cat /dev/urandom | head -c 750 | base64 | tee /tekton/results/result4 # about 1 KB result
cat /dev/urandom | head -c 750 | base64 | tee /tekton/results/result5 # about 1 KB result
7 changes: 7 additions & 0 deletions pkg/apis/config/feature_flags.go
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,8 @@ const (
DefaultEmbeddedStatus = FullEmbeddedStatus
// DefaultEnableSpire is the default value for "enable-spire".
DefaultEnableSpire = false
// DefaultSidecarLogsResults is the default value for "enable-sidecar-logs-results".
DefaultSidecarLogsResults = false

disableAffinityAssistantKey = "disable-affinity-assistant"
disableCredsInitKey = "disable-creds-init"
Expand All @@ -76,6 +78,7 @@ const (
sendCloudEventsForRuns = "send-cloudevents-for-runs"
embeddedStatus = "embedded-status"
enableSpire = "enable-spire"
enableSidecarLogsResults = "enable-sidecar-logs-results"
)

// FeatureFlags holds the features configurations
Expand All @@ -93,6 +96,7 @@ type FeatureFlags struct {
AwaitSidecarReadiness bool
EmbeddedStatus string
EnableSpire bool
EnableSidecarLogsResults bool
}

// GetFeatureFlagsConfigName returns the name of the configmap containing all
Expand Down Expand Up @@ -144,6 +148,9 @@ func NewFeatureFlagsFromMap(cfgMap map[string]string) (*FeatureFlags, error) {
if err := setEmbeddedStatus(cfgMap, DefaultEmbeddedStatus, &tc.EmbeddedStatus); err != nil {
return nil, err
}
if err := setFeature(enableSidecarLogsResults, DefaultSidecarLogsResults, &tc.EnableSidecarLogsResults); err != nil {
return nil, err
}

// Given that they are alpha features, Tekton Bundles and Custom Tasks should be switched on if
// enable-api-fields is "alpha". If enable-api-fields is not "alpha" then fall back to the value of
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/pipeline/v1beta1/taskrun_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,8 @@ const (
TaskRunReasonsResultsVerificationFailed TaskRunReason = "TaskRunResultsVerificationFailed"
// AwaitingTaskRunResults is the reason set when waiting upon `TaskRun` results and signatures to verify
AwaitingTaskRunResults TaskRunReason = "AwaitingTaskRunResults"
// TaskRunReasonResultLargerThanAllowedLimit is the reason set when one of the results exceeds its maximum allowed limit
TaskRunReasonResultLargerThanAllowedLimit TaskRunReason = "TaskRunResultLargerThanAllowedLimit"
)

func (t TaskRunReason) String() string {
Expand Down
5 changes: 4 additions & 1 deletion pkg/entrypoint/entrypointer.go
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,9 @@ type Entrypointer struct {
OnError string
// StepMetadataDir is the directory for a step where the step related metadata can be stored
StepMetadataDir string

// DontSendResultsToTerminationPath, when true, skips writing results to the termination path.
DontSendResultsToTerminationPath bool
}

// Waiter encapsulates waiting for files to exist.
Expand Down Expand Up @@ -183,7 +186,7 @@ func (e Entrypointer) Go() error {

// strings.Split(..) with an empty string returns an array that contains one element, an empty string.
// This creates an error when trying to open the result folder as a file.
if len(e.Results) >= 1 && e.Results[0] != "" {
if !e.DontSendResultsToTerminationPath && len(e.Results) >= 1 && e.Results[0] != "" {
if err := e.readResultsFromDisk(pipeline.DefaultResultPath); err != nil {
logger.Fatalf("Error while handling results: %s", err)
}
Expand Down
5 changes: 4 additions & 1 deletion pkg/pod/entrypoint.go
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ var (
// command, we must have fetched the image's ENTRYPOINT before calling this
// method, using entrypoint_lookup.go.
// Additionally, Step timeouts are added as entrypoint flag.
func orderContainers(commonExtraEntrypointArgs []string, steps []corev1.Container, taskSpec *v1beta1.TaskSpec, breakpointConfig *v1beta1.TaskRunDebug, waitForReadyAnnotation bool) ([]corev1.Container, error) {
func orderContainers(commonExtraEntrypointArgs []string, steps []corev1.Container, taskSpec *v1beta1.TaskSpec, breakpointConfig *v1beta1.TaskRunDebug, waitForReadyAnnotation bool, isSidecarLogsResultsEnabled bool) ([]corev1.Container, error) {
if len(steps) == 0 {
return nil, errors.New("No steps specified")
}
Expand All @@ -133,6 +133,9 @@ func orderContainers(commonExtraEntrypointArgs []string, steps []corev1.Containe
"-termination_path", terminationPath,
"-step_metadata_dir", filepath.Join(runDir, idx, "status"),
)
if isSidecarLogsResultsEnabled {
argsForEntrypoint = append(argsForEntrypoint, "-dont_send_results_to_termination_path")
}
argsForEntrypoint = append(argsForEntrypoint, commonExtraEntrypointArgs...)
if taskSpec != nil {
if taskSpec.Steps != nil && len(taskSpec.Steps) >= i+1 {
Expand Down
