Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testSideCar is failing #1253

Closed
chmouel opened this issue Aug 30, 2019 · 7 comments · Fixed by #1254 or #1265
Closed

testSideCar is failing #1253

chmouel opened this issue Aug 30, 2019 · 7 comments · Fixed by #1254 or #1265
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@chmouel
Copy link
Member

chmouel commented Aug 30, 2019

testSideCar test is failing on our CI based on openshift. I am not really sure if it's an openshift issue or a test issue.

Expected Behavior

Working tests

Actual Behavior

=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
--- FAIL: TestSidecarTaskSupport (59.28s)
    init_test.go:113: Create namespace arendelle-g99jz to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-g99jz"
    --- FAIL: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (58.74s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
        sidecar_test.go:132: Sidecar container did not terminate as expected, Terminated state: &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2019-08-30 08:49:51 +0200 CEST,FinishedAt:2019-08-30 08:50:30 +0200 CEST,ContainerID:cri-o://dc798098f24caa42cbb1d08eb507f921f43a5da74d32377f560209bbbd7e2c38,}
        sidecar_test.go:140: Either the primary or sidecar containers did not terminate
FAIL
-------

events :

33s         Warning   Failed      pod/sidecar-test-task-run-0-pod-993fa8   Error: container create failed: container_linux.go:336: starting container process caused "exec: \"sh\": executable file not found in $PATH"

Steps to Reproduce the Problem

% go test -failfast -v -count=1 -tags=e2e -ldflags '-X github.com/tektoncd/pipeline/test.missingKoFatal=false' ./test -timeout=20m --kubeconfig $KUBECONFIG -run TestSidecarTaskSuppor

Additional Info

% oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.13-201908210601+15d7c4b-dirty", GitCommit:"15d7c4b", GitTreeState:"dirty", BuildDate:"2019-08-21T13:24:47Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+df9cebc", GitCommit:"df9cebc", GitTreeState:"clean", BuildDate:"2019-08-06T18:31:45Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}
@chmouel
Copy link
Member Author

chmouel commented Aug 30, 2019

/assign
/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2019
chmouel added a commit to chmouel/tektoncd-pipeline that referenced this issue Aug 30, 2019
Sidecar tests were failing on our CI when using the busybox images, switching it
to ubuntu image 'fix' it.

I tried a few other way to fix it properly (change the paths, change loop to
sleep infinity etc...) with no success.

Closes tektoncd#1253
tekton-robot pushed a commit that referenced this issue Aug 30, 2019
Sidecar tests were failing on our CI when using the busybox images, switching it
to ubuntu image 'fix' it.

I tried a few other way to fix it properly (change the paths, change loop to
sleep infinity etc...) with no success.

Closes #1253
chmouel added a commit to chmouel/tektoncd-pipeline that referenced this issue Aug 30, 2019
Sidecar tests were failing on our CI when using the busybox images, switching it
to ubuntu image 'fix' it.

I tried a few other way to fix it properly (change the paths, change loop to
sleep infinity etc...) with no success.

Closes tektoncd#1253
@bobcatfish
Copy link
Collaborator

hm @chmouel maybe we should leave this open to continue to investigate and find the root cause? sounds like we're still not completely sure why this is happening? (probably would be hard for non-open shift folks to investigate tho so I'll leave it up to you!)

@chmouel
Copy link
Member Author

chmouel commented Sep 2, 2019

@bobcatfish sounds good, ill dig into it a bit more! (i don't think i have the rights to reopen it tho)

@chmouel
Copy link
Member Author

chmouel commented Sep 2, 2019

So i spent some more time on this and there is definitively something funky going on in there.

If I use the ubuntu image in the sidecar test and run the tests :

% git reset --hard
HEAD is now at 39505213 Add comments about the two scenarios for upgrade tests

%  go test -failfast -v -count=1 -tags=e2e -ldflags '-X github.com/tektoncd/pipeline/test.missingKoFatal=false' ./test -timeout=20m --kubeconfig $KUBECONFIG -run TestSidecarTaskSupport 
=== RUN   TestSidecarTaskSupport
2019-09-02T13:11:07.920+0200    info    logging/config.go:43    Successfully created the logger.        {"knative.dev/jsonconfig": "{\n\t  \"level\": \"info\",\n\t  \"encoding\": \"console\",\n\t  \"outputPaths\": [\"stdout\"],\n\t  \"errorOutputPaths\": [\"stderr\"],\n\t  \"encoderConfig\": {\n\t    \"timeKey\": \"ts\",\n\t    \"messageKey\": \"message\",\n\t    \"levelKey\": \"level\",\n\t    \"nameKey\": \"logger\",\n\t    \"callerKey\": \"caller\",\n\t    \"messageKey\": \"msg\",\n\t    \"stacktraceKey\": \"stacktrace\",\n\t    \"lineEnding\": \"\",\n\t    \"levelEncoder\": \"\",\n\t    \"timeEncoder\": \"iso8601\",\n\t    \"durationEncoder\": \"\",\n\t    \"callerEncoder\": \"\"\n\t  }\n\t}"}
2019-09-02T13:11:07.920+0200    info    logging/config.go:43    Logging level set to info
2019-09-02T13:11:07.921+0200    warn    logging/config.go:45    Fetch GitHub commit ID from kodata failed: "KO_DATA_PATH" does not exist or is empty
=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
=== RUN   TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps
--- PASS: TestSidecarTaskSupport (30.79s)
    init_test.go:113: Create namespace arendelle-4vpzh to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-4vpzh"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (14.71s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps (15.59s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-1"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-1"
PASS
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
ok      github.com/tektoncd/pipeline/test       30.822s
Now using project "arendelle-4vpzh" on server "https://api.chmouelx.devcluster.openshift.com:6443".

test run fine and I see the image used being the ubuntu one :

% kubectl get pods -o yaml |grep -A1 while
      - while [[ true ]] ; do echo "hello from sidecar" ; done
      image: ubuntu

Now let's change it back to busybox as previously commited and run the test :

% git diff
diff --git a/test/sidecar_test.go b/test/sidecar_test.go
index c1b711cf..0a64aa35 100644
--- a/test/sidecar_test.go
+++ b/test/sidecar_test.go
@@ -64,12 +64,12 @@ func TestSidecarTaskSupport(t *testing.T) {
                                tb.TaskSpec(
                                        tb.Step(
                                                primaryContainerName,
-                                               "ubuntu",
+                                               "busybox:1.31.0-musl",
                                                tb.StepCommand(test.stepCommand...),
                                        ),
                                        tb.Sidecar(
                                                sidecarContainerName,
-                                               "ubuntu",
+                                               "busybox:1.31.0-musl",
                                                tb.Command(test.sidecarCommand...),
                                        ),
                                ),
=== RUN   TestSidecarTaskSupport
2019-09-02T13:12:29.401+0200    info    logging/config.go:43    Successfully created the logger.        {"knative.dev/jsonconfig": "{\n\t  \"level\": \"info\",\n\t  \"encoding\": \"console\",\n\t  \"outputPaths\": [\"stdout\"],\n\t  \"errorOutputPaths\": [\"stderr\"],\n\t  \"encoderConfig\": {\n\t    \"timeKey\": \"ts\",\n\t    \"messageKey\": \"message\",\n\t    \"levelKey\": \"level\",\n\t    \"nameKey\": \"logger\",\n\t    \"callerKey\": \"caller\",\n\t    \"messageKey\": \"msg\",\n\t    \"stacktraceKey\": \"stacktrace\",\n\t    \"lineEnding\": \"\",\n\t    \"levelEncoder\": \"\",\n\t    \"timeEncoder\": \"iso8601\",\n\t    \"durationEncoder\": \"\",\n\t    \"callerEncoder\": \"\"\n\t  }\n\t}"}
2019-09-02T13:12:29.401+0200    info    logging/config.go:43    Logging level set to info
2019-09-02T13:12:29.401+0200    warn    logging/config.go:45    Fetch GitHub commit ID from kodata failed: "KO_DATA_PATH" does not exist or is empty
=== RUN   TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete
--- FAIL: TestSidecarTaskSupport (50.31s)
    init_test.go:113: Create namespace arendelle-2nsrm to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-2nsrm"
    --- FAIL: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (49.79s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
        sidecar_test.go:132: Sidecar container did not terminate as expected, Terminated state: &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2019-09-02 13:12:40 +0200 CEST,FinishedAt:2019-09-02 13:13:18 +0200 CEST,ContainerID:cri-o://afc749c68a07ff35a9836e306ffa8f75e8c58641503c890cc2deeef04d51e9f3,}
        sidecar_test.go:140: Either the primary or sidecar containers did not terminate
FAIL
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
FAIL    github.com/tektoncd/pipeline/test       50.349s
Now using project "arendelle-2nsrm" on server "https://api.chmouelx.devcluster.openshift.com:6443".

It's failing not being able to launch a shell sh as seen from openshift events :

3m4s        Warning   Failed      pod/sidecar-test-task-run-0-pod-6da97c   Error: container create failed: container_linux.go:336: starting container process caused "exec: \"sh\": executable file not found in $PATH"
3m41s       Normal    Succeeded   taskrun/sidecar-test-task-run-0          All Steps have completed executing

looking at the pod created it weirdly didn't use the busybox image but the nop image which does not have any shell in there :

% kubectl get pods -o yaml |grep -A1 while
      - while [[ true ]] ; do echo "hello from sidecar" ; done
      image: index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc

Trying to compare on minikube, the tests do succeed :

--- PASS: TestSidecarTaskSupport (54.14s)
    init_test.go:113: Create namespace arendelle-q6vdw to deploy to
    init_test.go:125: Verify SA "default" is created in namespace "arendelle-q6vdw"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete (49.07s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-0"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-0"
    --- PASS: TestSidecarTaskSupport/A_sidecar_that_terminates_early_does_not_cause_problems_running_Steps (5.04s)
        sidecar_test.go:84: Creating Task "sidecar-test-task-1"
        sidecar_test.go:89: Creating TaskRun "sidecar-test-task-run-1"
PASS
Using kubeconfig at `/Users/chmouel/.kube/config.os4` with cluster ``
ok      github.com/tektoncd/pipeline/test       54.173s
NAME                                 READY   STATUS               RESTARTS   AGE
sidecar-test-task-run-0-pod-64ceab   0/2     ContainerCannotRun   1          55s
sidecar-test-task-run-1-pod-273fbb   0/2     Completed            0          6s

but it still use nop instead of busybox in there (while still successful) :

% kubectl -n arendelle-q6vdw get pod sidecar-test-task-run-0-pod-64ceab -o yaml|grep -A1 while
    - while [[ true ]] ; do echo "hello from sidecar" ; done
    image: index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc

This is quite surprising, being was able to launch a sh command, while the image has definitively not a shell/sh :

% docker run --entrypoint /bin/sh --privileged -it index.docker.io/chmouel/nop-8eac7c133edad5df719dc37b36b62482@sha256:35b759c6f75df697f96937543ceb2c13e5f79e9e39345d5aa95ca0ecc08159fc
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: no such file or directory": unknown.

So well, i am more confused than before about what's going on, the questions would be :

  • why is it that the busybox image get replace by nop one for sidecars (probably a bug somewhere)
  • how did minikube being able to launch shell command while on the nop image

@chmouel
Copy link
Member Author

chmouel commented Sep 2, 2019

I think I figured out the issue :

  1. On OpenShift when the sidecar fails it returns an Error instead of ContainerCannotRun, perhaps that's because this is based on kubernetes 1.13.x while minikube is based on 1.15.x

  2. When we have ContainerCannotRun in the test we consider it as okay,

    if c.State.Terminated == nil || (c.State.Terminated.Reason != "Completed" && c.State.Terminated.Reason != "ContainerCannotRun") {
    ,

  3. The image switch to nop is actually documented :

https://github.com/tektoncd/pipeline/blame/master/docs/developers/README.md#L183-L187

  1. I think there is an issue here where it does reset it to nop but doesnt reset the entrypoint :

https://github.com/tektoncd/pipeline/blob/master/pkg/reconciler/taskrun/sidecars/stop.go#L48-L50

so the pod gets restarted with the nop image and the previous entrypoint and then fails (as ContainerCannotRun on k8 or Error on minishift)

  1. I am still trying to figure out why with the ubuntu image the sidecontainer doesnt get replaced?

cc @dicarlo2 @sbwsg

@ghost
Copy link

ghost commented Sep 3, 2019

Hm, it looks like the ubuntu container isn't actually running at all. It starts and stops in the same second and running kubectl logs on that pod shows the following:

sh: 1: [[: not found

I'm surprised this isn't exiting with an error status, but it at least explains why ubuntu isn't being replaced with the nop image - it exits immediately on its own so the taskrun controller doesn't need to stop the sidecar.

In regards (4) - it's not possible to update a container's Entrypoint or Command fields once a Pod is deployed; only its Image field. This is why "ContainerCannotRun" is a valid exit for the sidecar - the nop image may not contain the binary used by the container's Entrypoint.

So I wonder if the fix might be to consider any "Terminated" status of the sidecar as a successful exit, regardless of Reason. I'll make a PR that puts back the busybox images and updates the test. Then, @chmouel, if you have time to try it out on OpenShift that would be much appreciated!

@ghost ghost reopened this Sep 3, 2019
@chmouel
Copy link
Member Author

chmouel commented Sep 3, 2019

@sbwsg yes you are right! this is indeed weird, our CI wil run this on whenever this gets merged, thanks,

@ghost ghost mentioned this issue Sep 3, 2019
3 tasks
tekton-robot pushed a commit that referenced this issue Sep 3, 2019
Fixes #1253

When a sidecar is forcefully exited by the taskrun controller it may
error out. This happens because the sidecar is terminated by having its
image field rewritten to be tekton's nop container. The nop container is
bare bones and probably doesnt contain the entrypoint command configured
for the sidecar. When this occurs the container will exit with an error
of some kind. On GKE this error's Reason is "ContainerCannotRun" but on
MiniShift/OpenShift the error Reason is "Error".

The sidecar e2e test previously required a termination Reason of
"ContainerCannotRun" or "Completed" but this appears to be too rigid.
Instead the test now considers a sidecar that has been terminated for
any reason to be a success.
bahetiamit added a commit to bahetiamit/pipeline that referenced this issue Oct 9, 2020
bahetiamit added a commit to bahetiamit/pipeline that referenced this issue Oct 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
3 participants