Pod StartError seems to be ignored #4011
Comments
Fix operator.go#1073:

```go
	log.Infof("Processing ready daemon pod: %v", pod.ObjectMeta.SelfLink)
}
// Inspect init container statuses as well as main container statuses; if any
// terminated with a nonzero exit code, derive the failed phase and message
// from the pod.
for _, s := range append(pod.Status.InitContainerStatuses, pod.Status.ContainerStatuses...) {
	t := s.State.Terminated
	if t != nil && t.ExitCode > 0 {
		newPhase, message = inferFailedReason(pod)
	}
}
```
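Pulled out of the surrounding controller code, the proposed check amounts to something like the standalone sketch below. This is a minimal illustration only: the helper name podStartFailure and the fake pod are hypothetical, and the snippet assumes nothing beyond the standard k8s.io/api/core/v1 types rather than the actual Argo operator code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podStartFailure is a hypothetical helper mirroring the proposed fix: walk
// both init and main container statuses and flag the first container that
// terminated with a nonzero exit code.
func podStartFailure(pod *corev1.Pod) (bool, string) {
	for _, s := range append(pod.Status.InitContainerStatuses, pod.Status.ContainerStatuses...) {
		if t := s.State.Terminated; t != nil && t.ExitCode > 0 {
			return true, fmt.Sprintf("container %q terminated: %s (exit code %d)", s.Name, t.Reason, t.ExitCode)
		}
	}
	return false, ""
}

func main() {
	// Minimal fake pod with one failed init container, for illustration only.
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			InitContainerStatuses: []corev1.ContainerStatus{{
				Name: "init",
				State: corev1.ContainerState{
					Terminated: &corev1.ContainerStateTerminated{ExitCode: 1, Reason: "Error"},
				},
			}},
		},
	}
	fmt.Println(podStartFailure(pod))
}
```

As in the proposed fix, init container statuses are checked alongside the main container statuses, so a failure in either place is surfaced.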
This has not been seen in the wild. It may be a rare K3s-only issue, for example. The fix might create new bugs.
I'm running into this exact issue on a production EKS cluster. Sometimes, a pod within a workflow will fail with a StartError.
When this happens, the workflow in Argo remains stuck in Running. Granted, this error shouldn't happen in the first place, and it seems related to a specific AMI or kernel version (see aws/karpenter-provider-aws#7510), but when it does happen it's problematic that there's no way to handle it in Argo.
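For reference, a container that never starts typically surfaces in the pod status as a terminated state whose reason is StartError. The sketch below shows roughly what that looks like through the k8s.io/api/core/v1 types; the exit code and message are assumptions modelled on the linked Karpenter issue, not values captured from a real cluster.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Illustrative only: roughly how a StartError shows up under
	// pod.Status.ContainerStatuses. Exit code and message are assumed.
	status := corev1.ContainerStatus{
		Name: "main",
		State: corev1.ContainerState{
			Terminated: &corev1.ContainerStateTerminated{
				Reason:   "StartError",
				ExitCode: 128,
				Message:  "failed to create containerd task: context deadline exceeded",
			},
		},
	}
	t := status.State.Terminated
	fmt.Printf("%s: %s (exit %d): %s\n", status.Name, t.Reason, t.ExitCode, t.Message)
}
```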
Summary
Pod failed to start - the workflow should have errored, but it remained Running.
Diagnostics
What version of Argo Workflows are you running?
master
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.