STS pod stuck pending until deleted #3420
Comments
Can you show the pod metadata? It might be stuck deleting.
Here's the metadata from another pod that's currently stuck Pending (minor redactions for envs):
Node:
Some additional context: Karpenter consolidation is enabled in this cluster, and this pod was running at some point (before getting stuck Pending). There seems to be a weird interplay between this STS having a higher priority class than other services, Karpenter consolidation, and this service's podAntiAffinity, which forbids replicas from running on the same node. In this stuck state there are only 2 nodes the STS is allowed to run on (via node selector) but a target of 3 replicas, so during consolidation Karpenter got rid of one of the essential nodes (probably the pod priority trying to evict other stuff on a node that Karpenter chose for deprovisioning).
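To make that interplay concrete, here's a rough sketch of the constraints described above as they'd sit in the StatefulSet's pod template. All names and values are placeholders, not the actual (redacted) spec:

```yaml
# Hypothetical excerpt of the StatefulSet pod template; the label keys,
# priority class name, and node selector values below are made up.
spec:
  priorityClassName: high-priority             # higher than other services
  nodeSelector:
    workload-pool: sts-nodes                    # only 2 matching nodes existed
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-sts                       # placeholder app label
          topologyKey: kubernetes.io/hostname   # at most one replica per node
```

With 3 replicas, hard anti-affinity on the hostname, and only 2 eligible nodes, at least one replica is guaranteed to be unschedulable until another eligible node shows up.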
Looking at pod & node ages, this theory seems plausible: I've seen these STS pods sometimes be delayed in initializing because the EBS CSI driver struggles to keep up with node rotations (a ~4-5 minute delay, i.e. longer than it takes a new node to become Ready). There's probably a Kubernetes-specific bug report to be filed here for it not clearing `nominatedNodeName`.
@tzneal is deeper on this than me.
Thanks for giving this all a read! I'll keep trying to come up with a more concrete reproduction if y'all think it'd be helpful.
FWIW, it seems like the invalid `nominatedNodeName` is the culprit here.
Karpenter is ignoring the pod since it has a nominated node name, so we expect it to schedule. We trust the scheduler to be correct, and in this case it's not. If you can upgrade to v1.23, that should resolve it. I'll look into it a bit on the Karpenter side.
Would it be reasonable to have consolidation ignore the presence of `nominatedNodeName`? While unrelated to the OP issue, we've also seen Karpenter consolidate & then have to re-provision nodes JIT for pods stuck in CrashLoopBackoff. It feels reasonable to have "to be scheduled" pods considered in (i.e. preventing) consolidation; that would solve the OP issue & the node thrashing due to CrashLoopBackoff. Should I test whether Kubernetes can get itself unstuck by manually adding a node (but not manually clearing `nominatedNodeName`)?
I looked at a few options for solving this, but none of them are great and open up possibilities for other problems (either under-provisioning or over-provisioning). Just ignoring nominated node name when provisioning can cause over-provisioning during pod evictions. We don't treat CrashLoopBackoff pods any differently that I'm aware of. If you run into issues with this, file another issue and include Karpenter logs. |
Thanks for all the time spent looking at this. I'll close this since we've narrowed down the issue to be a Kubernetes bug (& without a great way for Karpenter to work around it). |
Version
Karpenter Version: v0.22.1
Kubernetes Version: v1.22.16-eks-ffeb93d
Expected Behavior
Karpenter should provision a new node so that the pod is able to be scheduled.
Actual Behavior
Karpenter is ignoring the pod altogether. There are no `karpenter` events associated with the pod, and restarting Karpenter doesn't help; the pod must be deleted.
Steps to Reproduce the Problem
I haven't been able to narrow down an exact reproduction yet, but it only seems to affect STS pods, and we've hit it a half dozen or so times over the past 2-3 weeks. I'm hoping the logs provided here point to the smoking gun.
The only thing that stood out to me is that the pod had `nominatedNodeName: ip-10-0-45-138.ap-northeast-3.compute.internal` in its status, but that node no longer existed. Along with issue #1051, that seemed like the most likely cause.
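For context on where that field lives, here's a minimal, illustrative sketch of the relevant slice of a Pending pod's status; it is not the actual (redacted) output from this cluster:

```yaml
status:
  phase: Pending
  # Set by the scheduler when it nominates a node for the pod (e.g. after
  # preemption). In this failure mode it keeps pointing at a node that no
  # longer exists and is not cleared.
  nominatedNodeName: ip-10-0-45-138.ap-northeast-3.compute.internal
  conditions:
    - type: PodScheduled
      status: "False"
      reason: Unschedulable
```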
Resource Specs and Logs
The stuck pod was 27h old at the time this was grabbed:
Pod events:
Pod conditions / status:
Long form `status`:
Community Note