From 6c132b937ea225e4fa584a87be64e58cde5ac59b Mon Sep 17 00:00:00 2001
From: Scott
Date: Thu, 24 Oct 2019 11:49:02 -0400
Subject: [PATCH] Document bug with sidecar usage of nop container

Sidecars are stopped by having their Image field swapped out to the `nop`
image. When the nop image starts up in the sidecar container it is
supposed to exit immediately because `nop` doesn't include the sidecar's
command. However, when the `nop` image *does* contain the command that
the sidecar is running, the sidecar container never stops and the Task
eventually times out.

For most sidecars this issue will not manifest - the `nop` container that
Tekton provides out of the box includes only a very limited set of
commands. However, if a Tekton operator overrides the `nop` image when
deploying the Tekton controller (for example, because their organization
requires images configured for Tekton to be built on their org's own base
image) then the `nop` image may offer more commands. That raises the risk
that `nop` can run a sidecar's command and, in turn, the likelihood that
Tasks with sidecars run until they time out.

This issue is a known bug with the way sidecars operate at the moment and
is being tracked in https://github.com/tektoncd/pipeline/issues/1347 but
should be documented clearly.
---
 docs/developers/README.md | 14 ++++++++++++--
 docs/taskruns.md          |  6 ++++++
 docs/tasks.md             |  8 ++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/docs/developers/README.md b/docs/developers/README.md
index 29654e2a964..876935c1a84 100644
--- a/docs/developers/README.md
+++ b/docs/developers/README.md
@@ -183,5 +183,15 @@ begin.
 On completion of all steps in a Task the TaskRun reconciler stops any
 sidecar containers. The `Image` field of any sidecar containers is swapped
 to the nop image. Kubernetes observes the change and relaunches the container
-with updated container image. The nop container image exits. The container
-is considered `Terminated` by Kubernetes and the TaskRun's Pod stops.
+with updated container image. The nop container image exits immediately
+*because it does not provide the command that the sidecar is configured to run*.
+The container is considered `Terminated` by Kubernetes and the TaskRun's Pod
+stops.
+
+There is a known issue with this implementation of sidecar support. When the
+`nop` image does provide the sidecar's command, the sidecar will continue to
+run even after `nop` has been swapped into the sidecar container's image
+field. See https://github.com/tektoncd/pipeline/issues/1347 for the issue
+tracking this bug. Until this issue is resolved, the best way to avoid it is
+to not override the `nop` image when deploying the Tekton controller, or to
+ensure that the overridden `nop` image contains as few commands as possible.
diff --git a/docs/taskruns.md b/docs/taskruns.md
index 8027539224a..30f77eeb4dc 100644
--- a/docs/taskruns.md
+++ b/docs/taskruns.md
@@ -590,6 +590,12 @@ order to terminate the sidecars they will be restarted with a new
 Pod will include the sidecar container with a Retry Count of 1 and with a
 different container image than you might be expecting.
 
+Note: The configured "nop" image must not provide the command that the
+sidecar is expected to run. If it does provide the command then it will
+not exit. This will result in the sidecar continuing to run and the Task
+eventually timing out. https://github.com/tektoncd/pipeline/issues/1347
+is the issue where this bug is being tracked.
+
 ---
 
 Except as otherwise noted, the content of this page is licensed under the
diff --git a/docs/tasks.md b/docs/tasks.md
index b6eeaefe3e8..2cbfc0b056e 100644
--- a/docs/tasks.md
+++ b/docs/tasks.md
@@ -447,6 +447,14 @@ volumes:
 emptyDir: {}
 ```
 
+Note: There is a known bug with Tekton's existing sidecar implementation.
+Tekton uses a specific image, called "nop", to stop sidecars. The "nop" image
+is configurable using a flag of the Tekton controller. If the configured "nop"
+image contains the command that the sidecar was running before the sidecar
+was stopped then the sidecar will actually keep running, causing the TaskRun's
+Pod to remain running, and eventually causing the TaskRun to time out rather
+than exit successfully. Issue https://github.com/tektoncd/pipeline/issues/1347
+has been created to track this bug.
 
 ### Variable Substitution
 