On a Kubernetes backend, if any container that is part of a step fails to pull an image and gets stuck in an ImagePullBackOff error, the step will just keep running indefinitely, with no feedback for the user.
I think the expected behavior here would be something along these lines:
1. Woodpecker tries to pull the image for a while.
2. If the pull fails (after a timeout), it displays an error informing the user that it timed out or failed to pull the specific image.
3. It fails the step.
4. It terminates the pod on the cluster.
I'd assume that similar errors can happen if other issues cause a Pod to be in a pending state (for example, there are no nodes available in the cluster). Maybe a similar "timeout" strategy could be implemented to deal with all these similar scenarios?
Note: canceling the pipeline terminates the pipeline and terminates the pod, but marks the pipeline as successful, which is another issue.
Here's a sample I did to showcase the issue (it's running in an internal Woodpecker cluster based on Woodpecker 2.3 so I can't share an open link).
I built a pipeline that references an image that does not exist: `image: broken-image-ref`.
Here's the result. It just stays stuck on the broken step indefinitely (or at least possibly until the pipeline timeout; I didn't wait that long) without logging anything.
If I go look at this pod in my cluster, I can see that it is stuck with the ImagePullBackOff error:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m55s default-scheduler Successfully assigned woodpecker-pipelines/wp-01hsvnwbdgge7msffe0qn6zz68 to < redacted >
Normal SuccessfulAttachVolume 4m45s attachdetach-controller AttachVolume.Attach succeeded for volume < redacted >
Warning Failed 3m25s (x6 over 4m43s) kubelet Error: ImagePullBackOff
Normal Pulling 3m10s (x4 over 4m44s) kubelet Pulling image "broken-image-ref"
Warning Failed 3m10s (x4 over 4m43s) kubelet Failed to pull image "broken-image-ref": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/broken-image-ref:latest": failed to resolve reference "docker.io/library/broken-image-ref:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning Failed 3m10s (x4 over 4m43s) kubelet Error: ErrImagePull
Normal BackOff 2m58s (x7 over 4m43s) kubelet Back-off pulling image "broken-image-ref"
close: #3555
Apply the same logic used in `waitStep` to the `tailStep` function by calling
`isImagePullBackOffState` there as well.
---------
Co-authored-by: elias.souza <elias.souza@quintoandar.com.br>
Co-authored-by: Anbraten <6918444+anbraten@users.noreply.github.com>
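For readers unfamiliar with the check mentioned in the commit message, a minimal sketch of what a function like `isImagePullBackOffState` might do is shown below. The types here are simplified, hypothetical stand-ins for the Kubernetes pod status types so that the example is self-contained; the actual Woodpecker implementation and its signature may differ.

```go
package main

// Simplified stand-ins (assumptions for this sketch) for the pod status
// types that a Kubernetes client would normally provide.
type containerStateWaiting struct {
	Reason string // e.g. "ImagePullBackOff"
}

type containerState struct {
	Waiting *containerStateWaiting // nil when the container is not waiting
}

type containerStatus struct {
	Name  string
	State containerState
}

type pod struct {
	ContainerStatuses []containerStatus
}

// isImagePullBackOffState reports whether any container of the pod is
// waiting with the reason "ImagePullBackOff".
func isImagePullBackOffState(p *pod) bool {
	for _, cs := range p.ContainerStatuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "ImagePullBackOff" {
			return true
		}
	}
	return false
}
```

Calling such a check from both the wait path and the log-tailing path means a step can be failed as soon as the back-off state is observed, rather than blocking while waiting for logs that will never arrive.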
Component
server