testSideCar is failing #1253

chmouel · 2019-08-30T07:23:00Z

testSideCar test is failing on our CI based on openshift. I am not really sure if it's an openshift issue or a test issue.

Expected Behavior

Working tests

Actual Behavior

=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
--- FAIL: TestSidecarTaskSupport (59.28s)
    init_test.go:113: Create namespace arendelle-g99jz to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-g99jz"
    --- FAIL: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (58.74s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
        sidecar_test.go:132: Sidecar container did not terminate as expected, Terminated state: &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2019-08-30 08:49:51 +0200 CEST,FinishedAt:2019-08-30 08:50:30 +0200 CEST,ContainerID:cri-o://dc798098f24caa42cbb1d08eb507f921f43a5da74d32377f560209bbbd7e2c38,}
        sidecar_test.go:140: Either the primary or sidecar containers did not terminate
FAIL
-------

events :

33s         Warning   Failed      pod/sidecar-test-task-run-0-pod-993fa8   Error: container create failed: container_linux.go:336: starting container process caused "exec: \"sh\": executable file not found in $PATH"

Steps to Reproduce the Problem

% go test -failfast -v -count=1 -tags=e2e -ldflags '-X github.com/tektoncd/pipeline/test.missingKoFatal=false' ./test -timeout=20m --kubeconfig $KUBECONFIG -run TestSidecarTaskSuppor

Additional Info

% oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.13-201908210601+15d7c4b-dirty", GitCommit:"15d7c4b", GitTreeState:"dirty", BuildDate:"2019-08-21T13:24:47Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+df9cebc", GitCommit:"df9cebc", GitTreeState:"clean", BuildDate:"2019-08-06T18:31:45Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}

The text was updated successfully, but these errors were encountered:

chmouel · 2019-08-30T07:23:27Z

/assign
/kind bug

Sidecar tests were failing on our CI when using the busybox images, switching it to ubuntu image 'fix' it. I tried a few other way to fix it properly (change the paths, change loop to sleep infinity etc...) with no success. Closes tektoncd#1253

Sidecar tests were failing on our CI when using the busybox images, switching it to ubuntu image 'fix' it. I tried a few other way to fix it properly (change the paths, change loop to sleep infinity etc...) with no success. Closes #1253

Sidecar tests were failing on our CI when using the busybox images, switching it to ubuntu image 'fix' it. I tried a few other way to fix it properly (change the paths, change loop to sleep infinity etc...) with no success. Closes tektoncd#1253

bobcatfish · 2019-08-30T14:54:04Z

hm @chmouel maybe we should leave this open to continue to investigate and find the root cause? sounds like we're still not completely sure why this is happening? (probably would be hard for non-open shift folks to investigate tho so I'll leave it up to you!)

chmouel · 2019-09-02T07:43:05Z

@bobcatfish sounds good, ill dig into it a bit more! (i don't think i have the rights to reopen it tho)

chmouel · 2019-09-02T11:46:40Z

So i spent some more time on this and there is definitively something funky going on in there.

If I use the ubuntu image in the sidecar test and run the tests :

% git reset --hard
HEAD is now at 39505213 Add comments about the two scenarios for upgrade tests

%  go test -failfast -v -count=1 -tags=e2e -ldflags '-X github.com/tektoncd/pipeline/test.missingKoFatal=false' ./test -timeout=20m --kubeconfig $KUBECONFIG -run TestSidecarTaskSupport 
=== RUN   TestSidecarTaskSupport
2019-09-02T13:11:07.920+0200    info    logging/config.go:43    Successfully created the logger.        {"knative.dev/jsonconfig": "{\n\t  \"level\": \"info\",\n\t  \"encoding\": \"console\",\n\t  \"outputPaths\": [\"stdout\"],\n\t  \"errorOutputPaths\": [\"stderr\"],\n\t  \"encoderConfig\": {\n\t    \"timeKey\": \"ts\",\n\t    \"messageKey\": \"message\",\n\t    \"levelKey\": \"level\",\n\t    \"nameKey\": \"logger\",\n\t    \"callerKey\": \"caller\",\n\t    \"messageKey\": \"msg\",\n\t    \"stacktraceKey\": \"stacktrace\",\n\t    \"lineEnding\": \"\",\n\t    \"levelEncoder\": \"\",\n\t    \"timeEncoder\": \"iso8601\",\n\t    \"durationEncoder\": \"\",\n\t    \"callerEncoder\": \"\"\n\t  }\n\t}"}
2019-09-02T13:11:07.920+0200    info    logging/config.go:43    Logging level set to info
2019-09-02T13:11:07.921+0200    warn    logging/config.go:45    Fetch GitHub commit ID from kodata failed: "KO_DATA_PATH" does not exist or is empty
=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
=== RUN   TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps
--- PASS: TestSidecarTaskSupport (30.79s)
    init_test.go:113: Create namespace arendelle-4vpzh to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-4vpzh"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (14.71s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps (15.59s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-1"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-1"
PASS
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
ok      github.com/tektoncd/pipeline/test       30.822s
Now using project "arendelle-4vpzh" on server "https://api.chmouelx.devcluster.openshift.com:6443".

test run fine and I see the image used being the ubuntu one :

% kubectl get pods -o yaml |grep -A1 while
      - while [[ true ]] ; do echo "hello from sidecar" ; done
      image: ubuntu

Now let's change it back to busybox as previously commited and run the test :

% git diff
diff --git a/test/sidecar_test.go b/test/sidecar_test.go
index c1b711cf..0a64aa35 100644
--- a/test/sidecar_test.go
+++ b/test/sidecar_test.go
@@ -64,12 +64,12 @@ func TestSidecarTaskSupport(t *testing.T) {
                                tb.TaskSpec(
                                        tb.Step(
                                                primaryContainerName,
-                                               "ubuntu",
+                                               "busybox:1.31.0-musl",
                                                tb.StepCommand(test.stepCommand...),
                                        ),
                                        tb.Sidecar(
                                                sidecarContainerName,
-                                               "ubuntu",
+                                               "busybox:1.31.0-musl",
                                                tb.Command(test.sidecarCommand...),
                                        ),
                                ),
=== RUN   TestSidecarTaskSupport
2019-09-02T13:12:29.401+0200    info    logging/config.go:43    Successfully created the logger.        {"knative.dev/jsonconfig": "{\n\t  \"level\": \"info\",\n\t  \"encoding\": \"console\",\n\t  \"outputPaths\": [\"stdout\"],\n\t  \"errorOutputPaths\": [\"stderr\"],\n\t  \"encoderConfig\": {\n\t    \"timeKey\": \"ts\",\n\t    \"messageKey\": \"message\",\n\t    \"levelKey\": \"level\",\n\t    \"nameKey\": \"logger\",\n\t    \"callerKey\": \"caller\",\n\t    \"messageKey\": \"msg\",\n\t    \"stacktraceKey\": \"stacktrace\",\n\t    \"lineEnding\": \"\",\n\t    \"levelEncoder\": \"\",\n\t    \"timeEncoder\": \"iso8601\",\n\t    \"durationEncoder\": \"\",\n\t    \"callerEncoder\": \"\"\n\t  }\n\t}"}
2019-09-02T13:12:29.401+0200    info    logging/config.go:43    Logging level set to info
2019-09-02T13:12:29.401+0200    warn    logging/config.go:45    Fetch GitHub commit ID from kodata failed: "KO_DATA_PATH" does not exist or is empty
=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
--- FAIL: TestSidecarTaskSupport (50.31s)
    init_test.go:113: Create namespace arendelle-2nsrm to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-2nsrm"
    --- FAIL: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (49.79s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
        sidecar_test.go:132: Sidecar container did not terminate as expected, Terminated state: &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2019-09-02 13:12:40 +0200 CEST,FinishedAt:2019-09-02 13:13:18 +0200 CEST,ContainerID:cri-o://afc749c68a07ff35a9836e306ffa8f75e8c58641503c890cc2deeef04d51e9f3,}
        sidecar_test.go:140: Either the primary or sidecar containers did not terminate
FAIL
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
FAIL    github.com/tektoncd/pipeline/test       50.349s
Now using project "arendelle-2nsrm" on server "https://api.chmouelx.devcluster.openshift.com:6443".

It's failing not being able to launch a shell sh as seen from openshift events :

3m4s        Warning   Failed      pod/sidecar-test-task-run-0-pod-6da97c   Error: container create failed: container_linux.go:336: starting container process caused "exec: \"sh\": executable file not found in $PATH"
3m41s       Normal    Succeeded   taskrun/sidecar-test-task-run-0          All Steps have completed executing

looking at the pod created it weirdly didn't use the busybox image but the nop image which does not have any shell in there :

% kubectl get pods -o yaml |grep -A1 while
      - while [[ true ]] ; do echo "hello from sidecar" ; done
      image: index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc

Trying to compare on minikube, the tests do succeed :

--- PASS: TestSidecarTaskSupport (54.14s)
    init_test.go:113: Create namespace arendelle-q6vdw to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-q6vdw"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (49.07s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps (5.04s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-1"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-1"
PASS
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
ok      github.com/tektoncd/pipeline/test       54.173s
NAME                                 READY   STATUS               RESTARTS   AGE
sidecar-test-task-run-0-pod-64ceab   0/2     ContainerCannotRun   1          55s
sidecar-test-task-run-1-pod-273fbb   0/2     Completed            0          6s

but it still use nop instead of busybox in there (while still successful) :

% kubectl -n arendelle-q6vdw get pod sidecar-test-task-run-0-pod-64ceab -o yaml|grep -A1 while
    - while [[ true ]] ; do echo "hello from sidecar" ; done
    image: index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc

This is quite surprising, being was able to launch a sh command, while the image has definitively not a shell/sh :

% docker run --entrypoint /bin/sh --privileged -it index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: no such file or directory": unknown.

So well, i am more confused than before about what's going on, the questions would be :

why is it that the busybox image get replace by nop one for sidecars (probably a bug somewhere)
how did minikube being able to launch shell command while on the nop image

chmouel · 2019-09-02T16:10:12Z

I think I figured out the issue :

On OpenShift when the sidecar fails it returns an Error instead of ContainerCannotRun, perhaps that's because this is based on kubernetes 1.13.x while minikube is based on 1.15.x
When we have ContainerCannotRun in the test we consider it as okay,

pipeline/test/sidecar_test.go

Line 131 in 3950521

if c.State.Terminated == nil || (c.State.Terminated.Reason != "Completed" && c.State.Terminated.Reason != "ContainerCannotRun") {

,
The image switch to nop is actually documented :

https://github.com/tektoncd/pipeline/blame/master/docs/developers/README.md#L183-L187

I think there is an issue here where it does reset it to nop but doesnt reset the entrypoint :

https://github.com/tektoncd/pipeline/blob/master/pkg/reconciler/taskrun/sidecars/stop.go#L48-L50

so the pod gets restarted with the nop image and the previous entrypoint and then fails (as ContainerCannotRun on k8 or Error on minishift)

I am still trying to figure out why with the ubuntu image the sidecontainer doesnt get replaced?

cc @dicarlo2 @sbwsg

ghost · 2019-09-03T13:42:55Z

Hm, it looks like the ubuntu container isn't actually running at all. It starts and stops in the same second and running kubectl logs on that pod shows the following:

sh: 1: [[: not found

I'm surprised this isn't exiting with an error status, but it at least explains why ubuntu isn't being replaced with the nop image - it exits immediately on its own so the taskrun controller doesn't need to stop the sidecar.

In regards (4) - it's not possible to update a container's Entrypoint or Command fields once a Pod is deployed; only its Image field. This is why "ContainerCannotRun" is a valid exit for the sidecar - the nop image may not contain the binary used by the container's Entrypoint.

So I wonder if the fix might be to consider any "Terminated" status of the sidecar as a successful exit, regardless of Reason. I'll make a PR that puts back the busybox images and updates the test. Then, @chmouel, if you have time to try it out on OpenShift that would be much appreciated!

chmouel · 2019-09-03T13:45:21Z

@sbwsg yes you are right! this is indeed weird, our CI wil run this on whenever this gets merged, thanks,

Fixes #1253 When a sidecar is forcefully exited by the taskrun controller it may error out. This happens because the sidecar is terminated by having its image field rewritten to be tekton's nop container. The nop container is bare bones and probably doesnt contain the entrypoint command configured for the sidecar. When this occurs the container will exit with an error of some kind. On GKE this error's Reason is "ContainerCannotRun" but on MiniShift/OpenShift the error Reason is "Error". The sidecar e2e test previously required a termination Reason of "ContainerCannotRun" or "Completed" but this appears to be too rigid. Instead the test now considers a sidecar that has been terminated for any reason to be a success.

…tom base image

…ustom base image

tekton-robot assigned chmouel Aug 30, 2019

tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2019

chmouel mentioned this issue Aug 30, 2019

Use ubuntu images for sidecar tests #1254

Merged

tekton-robot closed this as completed in #1254 Aug 30, 2019

ghost reopened this Sep 3, 2019

ghost mentioned this issue Sep 3, 2019

Accept any sidecar termination reason #1265

Merged

3 tasks

tekton-robot closed this as completed in #1265 Sep 3, 2019

chmouel mentioned this issue Sep 23, 2019

Sidecars doesn't get terminated when the binary is in the nop image #1347

Closed

bobcatfish mentioned this issue May 20, 2020

Flakey sidecar container test #2656

Closed

bahetiamit added a commit to bahetiamit/pipeline that referenced this issue Oct 9, 2020

Fixes tektoncd#1253. TestSidecarTaskSupport will now work on with cus…

f9779f5

…tom base image

bahetiamit added a commit to bahetiamit/pipeline that referenced this issue Oct 9, 2020

Fixes tektoncd#1253. TestSidecarTaskSupport e2e test case fixed for c…

455724b

…ustom base image

bahetiamit mentioned this issue Oct 9, 2020

TestSidecarTaskSupport e2e test case fix for custom base image #3358

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testSideCar is failing #1253

testSideCar is failing #1253

chmouel commented Aug 30, 2019 •

edited

Loading

chmouel commented Aug 30, 2019 •

edited

Loading

bobcatfish commented Aug 30, 2019

chmouel commented Sep 2, 2019 •

edited

Loading

chmouel commented Sep 2, 2019

chmouel commented Sep 2, 2019

ghost commented Sep 3, 2019

chmouel commented Sep 3, 2019

testSideCar is failing #1253

testSideCar is failing #1253

Comments

chmouel commented Aug 30, 2019 • edited Loading

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Additional Info

chmouel commented Aug 30, 2019 • edited Loading

bobcatfish commented Aug 30, 2019

chmouel commented Sep 2, 2019 • edited Loading

chmouel commented Sep 2, 2019

chmouel commented Sep 2, 2019

ghost commented Sep 3, 2019

chmouel commented Sep 3, 2019

chmouel commented Aug 30, 2019 •

edited

Loading

chmouel commented Aug 30, 2019 •

edited

Loading

chmouel commented Sep 2, 2019 •

edited

Loading