Step freezes when container image can't be pulled (ImagePullBackOff) #3555

fernandrone · 2024-03-25T21:01:59Z

Component

server

Describe the bug

On a Kubernetes backend, if any container that is part of a step fails to pull an image and gets stuck in an ImagePullBackOff error, the step will just keep running indefinitely, with no feedback for the user.

I think the expected behavior here would be something along these lines:

Woodpecker to try to pull the image for a while
If it fails (after a timeout), it displays an error informing the user that it timed out or failed to pull the specific image
It fails the step
It terminates the pod on the cluster

I'd assume that similar errors can happen if other issues cause a Pod to be in a pending state (for example, there are no nodes available in the cluster). Maybe a similar "timeout" strategy could be implemented to deal with all these similar scenarios?
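As a sketch of how the "no nodes available" case might be detected: Kubernetes reports an unschedulable pod through a `PodScheduled` condition with status `False` and reason `Unschedulable` while the pod is in the `Pending` phase. The types below are simplified, hypothetical stand-ins for the real pod status fields, `stuckPending` is an invented helper name, and the sample message is made up for illustration.

```go
package main

import "fmt"

// condition and pod are simplified, hypothetical stand-ins for the
// corresponding Kubernetes pod status fields.
type condition struct {
	Type, Status, Reason, Message string
}

type pod struct {
	Phase      string
	Conditions []condition
}

// stuckPending reports whether a Pending pod has been marked unschedulable,
// and returns a message that could be shown to the user.
func stuckPending(p pod) (string, bool) {
	if p.Phase != "Pending" {
		return "", false
	}
	for _, c := range p.Conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return fmt.Sprintf("pod is unschedulable: %s", c.Message), true
		}
	}
	return "", false
}

func main() {
	// Made-up sample data: a pod the scheduler could not place.
	p := pod{
		Phase: "Pending",
		Conditions: []condition{
			{Type: "PodScheduled", Status: "False", Reason: "Unschedulable", Message: "0/3 nodes are available"},
		},
	}
	if msg, stuck := stuckPending(p); stuck {
		fmt.Println(msg)
	}
}
```

A check like this could feed the same timeout logic as the image pull case, so that both scenarios end with a failed step and a visible reason.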

Note: canceling the pipeline terminates the pipeline and terminates the pod, but marks the pipeline as successful, which is another issue.

System Info

{"source":"https://github.com/woodpecker-ci/woodpecker","version":"2.3.0"}

Additional context

Here's a sample I did to showcase the issue (it's running in an internal Woodpecker cluster based on Woodpecker 2.3 so I can't share an open link).

I built a pipeline that references an image that does not exist, `image: broken-image-ref`.

Here's the result. It stays stuck on the broken step indefinitely (or at least possibly until the pipeline timeout; I didn't wait that long), without logging anything.

If I go look at this pod in my cluster, I can see that it is stuck with the ImagePullBackOff error:

...
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               4m55s                  default-scheduler        Successfully assigned woodpecker-pipelines/wp-01hsvnwbdgge7msffe0qn6zz68 to < redacated >
  Normal   SuccessfulAttachVolume  4m45s                  attachdetach-controller  AttachVolume.Attach succeeded for volume  < redacated >
  Warning  Failed                  3m25s (x6 over 4m43s)  kubelet                  Error: ImagePullBackOff
  Normal   Pulling                 3m10s (x4 over 4m44s)  kubelet                  Pulling image "broken-image-ref"
  Warning  Failed                  3m10s (x4 over 4m43s)  kubelet                  Failed to pull image "broken-image-ref": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/broken-image-ref:latest": failed to resolve reference "docker.io/library/broken-image-ref:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed                  3m10s (x4 over 4m43s)  kubelet                  Error: ErrImagePull
  Normal   BackOff                 2m58s (x7 over 4m43s)  kubelet                  Back-off pulling image "broken-image-ref"

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]


close: #3555
Put the same logic from `waitStep` and call the function `isImagePullBackOffState` in the `tailStep` function.
Co-authored-by: elias.souza <elias.souza@quintoandar.com.br>
Co-authored-by: Anbraten <6918444+anbraten@users.noreply.github.com>
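For readers unfamiliar with how this state is exposed, here is a minimal sketch of the kind of check a function like `isImagePullBackOffState` might perform. It is not taken from #3580. The types are simplified, hypothetical mirrors of the Kubernetes pod status shape (container statuses with an optional waiting state and a reason), and `hasImagePullBackOff` is an invented name.

```go
package main

import "fmt"

// Simplified, hypothetical mirrors of the Kubernetes pod status shape.
type waitingState struct{ Reason string }

type containerState struct{ Waiting *waitingState }

type containerStatus struct {
	Name  string
	State containerState
}

type podStatus struct{ ContainerStatuses []containerStatus }

// hasImagePullBackOff reports whether any container in the pod is waiting
// with the reason "ImagePullBackOff".
func hasImagePullBackOff(st podStatus) bool {
	for _, cs := range st.ContainerStatuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "ImagePullBackOff" {
			return true
		}
	}
	return false
}

func main() {
	st := podStatus{ContainerStatuses: []containerStatus{
		{Name: "step", State: containerState{Waiting: &waitingState{Reason: "ImagePullBackOff"}}},
	}}
	if hasImagePullBackOff(st) {
		fmt.Println("image pull is backing off; the step should fail")
	}
}
```

Per the commit message, the fix applies this kind of check in both `waitStep` and `tailStep`, presumably so that neither code path blocks on a pod that will never start.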

fernandrone added the bug Something isn't working label Mar 25, 2024

qwerty287 added the backend/kubernetes label Mar 26, 2024

qwerty287 added this to the 2.x.x milestone Mar 26, 2024

eliasscosta mentioned this issue Apr 1, 2024

Handle ImagePullBackOff pod status #3580

Merged

anbraten closed this as completed in #3580 Apr 15, 2024

qwerty287 modified the milestones: 2.x.x, 2.5.0 Apr 16, 2024
