TaskRun reports as successful when the pod was evicted #6145

Closed
drewbailey opened this issue Feb 10, 2023 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

@drewbailey

Expected Behavior

When a pod/container is evicted, the TaskRun should fail and include a reason/message associated with the eviction.

Actual Behavior

The TaskRun reports success, or shows an exit code/message from a container (e.g. 137, reason Failed) instead of the eviction.

Steps to Reproduce the Problem

  1. Run a pipeline with an emptyDir workspace that has a size limit:
     - emptyDir:
         sizeLimit: 10Gi
       name: workspace-user-repo
  2. Have the Task execute code that exceeds the limit.

Sometimes the TaskRun fails correctly, stating that there was an eviction:

          message: 'Usage of EmptyDir volume "ws-hdl48" exceeds the limit "10Gi". '
          reason: Failed
          status: "False"

Other times it does not. In these cases, the container itself contains the eviction error. This may be a race between the containers in the pod finishing and the eviction taking place.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"8f94681cd294aa8cfd3407b8191f6c70214973a4", GitTreeState:"clean", BuildDate:"2023-01-18T15:51:24Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.16-eks-ffeb93d", GitCommit:"52e500d139bdef42fbc4540c357f0565c7867a81", GitTreeState:"clean", BuildDate:"2022-11-29T18:41:42Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.29.1
Pipeline version: v0.40.2

TaskRun status

status:
  completionTime: "2023-02-10T17:19:38Z"
  conditions:
  - lastTransitionTime: "2023-02-10T17:19:38Z"
    message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  podName: c4382b4f-af66-478e-b603-0b9b5a2f9127-linux-amd64-user-code-pod
  sidecars:
  - container: sidecar-ssh-key-setup-sidecar
    name: ssh-key-setup-sidecar
    terminated:
      containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
      exitCode: 137
      finishedAt: "2023-02-10T17:20:08Z"
      reason: Error
      startedAt: "2023-02-10T17:18:10Z"
  startTime: "2023-02-10T17:18:02Z"
  steps:
  - container: step-place-tools
    imageID: 
    name: place-tools
    terminated:
      containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-user-repo
    imageID: 
    name: linux-amd64-user-code-extract-workspace-user-repo
    terminated:
      containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
      exitCode: 0
      finishedAt: "2023-02-10T17:18:17Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-linux-amd64-user-code-extract-workspace-root-cache
    name: linux-amd64-user-code-extract-workspace-root-cache
    terminated:
      containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
      exitCode: 0
      finishedAt: "2023-02-10T17:18:45Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:17Z"
  - container: step-user-code
    name: user-code
    terminated:
      containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
      exitCode: 0
      finishedAt: "2023-02-10T17:18:49Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:46Z"
  - container: step-linux-amd64-user-code-persist-workspace-root-cache
    name: linux-amd64-user-code-persist-workspace-root-cache
    terminated:
      containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:18:49Z"
  - container: step-linux-amd64-user-code-persist-workspace-user-repo
    name: linux-amd64-user-code-persist-workspace-user-repo
    terminated:
      containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
      exitCode: 0
      finishedAt: "2023-02-10T17:19:35Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:35Z"
  - container: step-linux-amd64-user-code-extract-workspace-private
    name: linux-amd64-user-code-extract-workspace-private
    terminated:
      containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
      exitCode: 0
      finishedAt: "2023-02-10T17:19:36Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-post-user-code
    name: post-user-code
    terminated:
      containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:36Z"
  - container: step-linux-amd64-user-code-persist-workspace-private
    name: linux-amd64-user-code-persist-workspace-private
    terminated:
      containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
      exitCode: 0
      finishedAt: "2023-02-10T17:19:37Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:37Z"
  - container: step-finally-exit
    name: finally-exit
    terminated:
      containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
      exitCode: 0
      finishedAt: "2023-02-10T17:19:38Z"
      reason: Completed
      startedAt: "2023-02-10T17:19:38Z"

Pod status & container statuses

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:06Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:17Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-02-10T17:18:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
    lastState: {}
    name: sidecar-ssh-key-setup-sidecar
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://336d7a36b4598f653f94b16840879980ed2d2ff74ba50e1c3983171433d16eb3
        exitCode: 137
        finishedAt: "2023-02-10T17:20:08Z"
        reason: Error
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
    lastState: {}
    name: step-finally-exit
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://135240c44b4059d60cf3b376c35f959ad4bb9b854d5902cc7d2b59b5d9f6df88
        exitCode: 0
        finishedAt: "2023-02-10T17:19:38Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:38.142Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:10Z"
  - containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5e8a1a38767009a8b05efd46635f29bc96edd5698ccea5dd93579d132632c50a
        exitCode: 0
        finishedAt: "2023-02-10T17:19:36Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.131Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://3dd259a891d7a5639aecb1ae5d98c8f878fa5ed1945cf05ffbd1cf9813ccd7f5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:45Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.687Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
    lastState: {}
    name: step-linux-amd64-user-code-extract-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://642849d43d7a5ac81d33952c6a8ecd1184cf82d74fda0ee240c53cecc5447d04
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.346Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-private
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://84aa4feb2bef2e6e4fe239f987a2c69fa91dd0039ded8fcb8f85057d6c4d0965
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:37.820Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-root-cache
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://5887ed9ad53114bd729ae07a1e6127438f8ec2fb157e123c8d65a5f60fd9f682
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:49.447Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
    lastState: {}
    name: step-linux-amd64-user-code-persist-workspace-user-repo
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://b769e06ea058d600cc729d1cc096a31c11ac0cddaa12960099911ff99a9b21bb
        exitCode: 0
        finishedAt: "2023-02-10T17:19:35Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:35.792Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  - containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
    lastState: {}
    name: step-place-tools
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://084b76ebd5dd2966941e141ab83f454d6a142c12a89a685cf4fd47baf3a6c33b
        exitCode: 0
        finishedAt: "2023-02-10T17:18:17Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:17.015Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:07Z"
  - containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
    lastState: {}
    name: step-post-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://ef423030366d101e17833e0b6fb81db300704002bb714f5d99282626466f3b70
        exitCode: 0
        finishedAt: "2023-02-10T17:19:37Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:19:36.478Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:09Z"
  - containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
    lastState: {}
    name: step-user-code
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: docker://8bc929f05506f2f5ca702ff113eea0d1b84f4db2016c4b2bcb5e843ce5fb6e7d
        exitCode: 0
        finishedAt: "2023-02-10T17:18:49Z"
        message: '[{"key":"StartedAt","value":"2023-02-10T17:18:46.095Z","type":3}]'
        reason: Completed
        startedAt: "2023-02-10T17:18:08Z"
  hostIP: 172.18.56.28
  initContainerStatuses:
  - containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
    lastState: {}
    name: prepare
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://afcf639aac8badfbeb0bf04e64f1b904db2ab9d68b9f5f0bce59672885d6f8a5
        exitCode: 0
        finishedAt: "2023-02-10T17:18:04Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:04Z"
  - containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
    lastState: {}
    name: place-scripts
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://97783ec41d3e64e053d4615f8665b662b0d48610e8322188dfcb80371af5ec7a
        exitCode: 0
        finishedAt: "2023-02-10T17:18:05Z"
        reason: Completed
        startedAt: "2023-02-10T17:18:05Z"
  message: 'Usage of EmptyDir volume "ws-49mb5" exceeds the limit "10Gi". '
  phase: Failed
  qosClass: Burstable
  reason: Evicted
  startTime: "2023-02-10T17:18:02Z"

@drewbailey drewbailey added the kind/bug Categorizes issue or PR as related to a bug. label Feb 10, 2023
@drewbailey

drewbailey commented Feb 10, 2023

It seems like #5646 might handle this error case. Previously, DidTaskRunFail would check all the ContainerStatuses even when pod.Status.Phase == corev1.PodFailed. Looking at the container statuses here, it seems like #5646 would now cover this case, where the eviction happens after the containers exit.
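The ordering described above (pod-level failure first, then per-container exit codes) can be sketched in Go. This is not Tekton's DidTaskRunFail or the change from #5646; the types are hypothetical stand-ins for the few corev1 fields involved. Only containers named `step-*` are checked, matching the naming in the dumps; treating sidecars as ignored is an assumption, but it would explain why the sidecar's 137 did not fail the run:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for the corev1 fields involved; these
// are not the real k8s.io/api/core/v1 types.
type PodPhase string

const (
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

type ContainerStatus struct {
	Name     string
	ExitCode int32 // exit code of the terminated container
}

type PodStatus struct {
	Phase             PodPhase
	Reason            string // e.g. "Evicted"
	Message           string
	ContainerStatuses []ContainerStatus
}

// taskRunFailed lets a pod-level failure win over per-step results, so an
// eviction that lands after every step exited 0 still fails the TaskRun and
// carries the pod's reason and message.
func taskRunFailed(st PodStatus) (failed bool, reason, message string) {
	if st.Phase == PodFailed {
		r := st.Reason
		if r == "" {
			r = "Failed"
		}
		return true, r, st.Message
	}
	for _, cs := range st.ContainerStatuses {
		// Assumption: only step containers count; sidecars are ignored.
		if strings.HasPrefix(cs.Name, "step-") && cs.ExitCode != 0 {
			return true, "Failed", fmt.Sprintf("%q exited with code %d", cs.Name, cs.ExitCode)
		}
	}
	return false, "", ""
}

func main() {
	// Shape of the pod in this issue: steps exited 0, the sidecar was killed
	// (137), and the pod phase is Failed with reason Evicted.
	evicted := PodStatus{
		Phase:   PodFailed,
		Reason:  "Evicted",
		Message: `Usage of EmptyDir volume "ws-49mb5" exceeds the limit "10Gi". `,
		ContainerStatuses: []ContainerStatus{
			{Name: "sidecar-ssh-key-setup-sidecar", ExitCode: 137},
			{Name: "step-user-code", ExitCode: 0},
			{Name: "step-finally-exit", ExitCode: 0},
		},
	}
	failed, reason, msg := taskRunFailed(evicted)
	fmt.Println(failed, reason, msg)
}
```

A regression test for this case would build the same shape: every step terminated with exit code 0, and the pod itself in phase Failed with reason Evicted.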

@afrittoli

@drewbailey #5646 was merged and included in release v0.45.x. Would you be able to verify whether this issue can be closed? Alternatively, would you be interested in designing a test for this case?

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 1, 2023
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 1, 2023
@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot

@tekton-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
