Sidecars doesn't get terminated when the binary is in the nop image #1347
Comments
This is a very cheeky hack, sidecar is currently broken with our nop image so we just use nightly `nop` from upstream CI. `nop` should not change or do anything differently with a different base so we should be safe until tektoncd#1347 gets fixed. Signed-off-by: Chmouel Boudjnah <chmouel@redhat.com>
Related Kubernetes issue for a RFE allowing |
Ah, great work figuring this out @chmouel! I'm wondering - what is the reason for overriding the nop image? |
Oh wait, nevermind, I see that https://github.com/google/ko#overriding-the-default-base-image describes some reasons to do so. |
Our case is a bit different. As a policy (and our CI enforces it), all our images need to use our official distro, so we have to base the |
Does UBI contain a |
There is also the ongoing sidecar KEP which is progressively being implemented in Kubernetes. kubernetes/enhancements#753 (and which Tekton may eventually use "under the hood" to run the sidecars). |
@sbwsg ah, really nice. Yes, we do have a kill binary: but if I understand your comment here #1131 (comment), you want to SIGKILL process 1 in the sidecar container before it gets replaced to So what I am trying to say is that if we come down to the bullet point number Having said that, I don't see any alternative, and if we implement #1131 things would definitely be better. I am still wondering why Kubernetes allows us to change the image name and not the entrypoints,
Ah yeah good point! |
At least in the short term I think we should document this. I will do this today. |
Sidecars are stopped by having their Image field swapped out for the `nop` image. When the `nop` image starts up in the sidecar container, it is supposed to exit immediately because `nop` doesn't include the sidecar's command. However, when the `nop` image *does* contain the command that the sidecar is running, the sidecar container never stops and the Task eventually times out.
For most sidecars this issue will not manifest: the `nop` image that Tekton provides out of the box includes only a very limited set of commands. However, if a Tekton operator overrides the `nop` image when deploying the Tekton controller (for example, because their organization requires images configured for Tekton to be built on the org's own base image), `nop` may offer more commands. That raises the chance that a sidecar's command is runnable in the `nop` image, and with it the likelihood of Tasks with sidecars running until timeout.
This issue is a known bug in the way sidecars currently operate and is tracked in #1347, but it should be documented clearly.
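The failure mode described in this commit message can be modelled in a few lines. This is only a sketch, not Tekton's code: `sidecar_outcome` and the two example binary sets are hypothetical, and the model assumes that a container whose command cannot be found in the image fails to start and therefore counts as stopped.

```python
def sidecar_outcome(command, image_binaries):
    """Predict what happens after a sidecar's image is swapped for nop.

    command        -- the sidecar's original command, kept after the swap
    image_binaries -- absolute paths assumed to exist in the nop image
    """
    if not command:
        # Assumption: with no command, nop's own entrypoint runs and exits.
        return "exits"
    if command[0] in image_binaries:
        # The command resolves inside nop, so the sidecar never stops.
        return "keeps running"
    # The command is missing from nop, so the container cannot start it.
    return "exits"


# Hypothetical image contents: a minimal nop versus one built on a full distro.
MINIMAL_NOP = {"/ko-app/nop"}
DISTRO_NOP = {"/ko-app/nop", "/bin/sh", "/bin/kill"}

sidecar_command = ["/bin/sh", "-c", "sleep 3600"]
print(sidecar_outcome(sidecar_command, MINIMAL_NOP))
print(sidecar_outcome(sidecar_command, DISTRO_NOP))
```

Under these assumptions the same sidecar stops with the minimal image and keeps running with the distro-based one, which matches the behaviour reported in this issue.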
Now that this is documented I'm going to close this issue. |
Expected Behavior
Sidecars get terminated along with the `main` container.
Actual Behavior
This is a follow-up to the discussion we had with @sbwsg on this issue:
#1253 (comment)
Since the sidecar tests were implemented, we have seen some failures on our OpenShift-based CI. The test waits for the sidecar to reach a terminated state and fails while waiting. Here is the test:
https://github.com/chmouel/tektoncd-pipeline/blob/chmouel-ci-test-1809/test/sidecar_test.go#L105-L107
We believe we have only just figured this out. It seems to be caused by the base image we are using, which is a RHEL image called `registry.access.redhat.com/ubi8/ubi:latest`.
With ko, the `nop` image is by default based on `gcr.io/distroless/base:latest` (see https://github.com/google/ko#overriding-the-default-base-image), which has no `/bin/sh`, while the RHEL image does.
Our guess at what is happening:
1. The `main` container runs with a `sidecar` container: https://github.com/chmouel/tektoncd-pipeline/blob/chmouel-ci-test-1809/test/sidecar_test.go#L49-L50
2. The `main` container gets killed.
3. The `tekton` controller sees that there is a sidecar container, replaces the sidecar's image name with the `nop` image, and keeps the same arguments.
4. The `nop` container is able to run those arguments, so it continues running instead of reaching the killed state as it should.
Steps to Reproduce the Problem
/bin/sh
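The sequence guessed under Actual Behavior hinges on the image being swapped while the command and arguments stay in place. A minimal sketch of such a swap on a pod-spec-like dictionary follows. `swap_sidecar_images`, the container names, and the image names are hypothetical and only illustrate the idea; this is not Tekton's implementation.

```python
import copy


def swap_sidecar_images(pod_spec, sidecar_names, nop_image):
    """Return a copy of pod_spec with each sidecar's image set to nop_image.

    Only the image field changes; command and args are left untouched,
    which is why a nop image that contains the command keeps running it.
    """
    patched = copy.deepcopy(pod_spec)
    for container in patched["containers"]:
        if container["name"] in sidecar_names:
            container["image"] = nop_image
    return patched


# Hypothetical pod spec with one main container and one sidecar.
pod_spec = {
    "containers": [
        {"name": "main", "image": "example/main",
         "command": ["/bin/sh"], "args": ["-c", "echo done"]},
        {"name": "sidecar", "image": "example/sidecar",
         "command": ["/bin/sh"], "args": ["-c", "sleep 3600"]},
    ]
}

patched = swap_sidecar_images(pod_spec, {"sidecar"}, "example/nop")
for c in patched["containers"]:
    print(c["name"], c["image"], c["command"], c["args"])
```

After the swap the sidecar still carries `/bin/sh -c "sleep 3600"`, so whether it stops depends entirely on whether the replacement image can run `/bin/sh`.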
Additional Info
We probably want to figure out why rewriting the `Entrypoint` is not possible.
/kind bug
/cc @sbwsg