Inconsistent behavior on deleting a running pod in DAG #3097
Not able to reproduce on the master branch.
@sarabala1979 I can see a similar thing happening with argo v2.9.0-rc2. I submit the same workflow. If I run
If I run `kubectl delete pod --grace-period=0 , I get
If I run
But sometimes (once out of many trials), it does detect that the pod was deleted, but only after a long time.
The workflow's duration was 44 seconds, but the pod's duration was 13 minutes (which is probably the time it took to mark the pod as deleted).
"pod termination" is a red-herring - this was changed to "pod deleted" 7th May (see #2855). As with The cause of "pod deleted" is someone deleted pod with The outcome is different if you use
The workflow controller is not informed of forced deletion. We can't really help you if you use forced delete: by using it, you are saying you don't mind your system being put into a bad state. I believe this issue is
Can you help me understand this?
I do not think that should be the case. Adding
If the k8s state is updated properly, shouldn't the workflow controller (which uses k8s to track its state) detect this?
Also, this still doesn't explain why the task is marked as "Succeeded" when
All I'm really saying is that, from the workflow controller logs, we never see the pod being deleted if you use force. I'm not sure why this is and will speak to my team later.
@jessesuen and @alexmt: it looks like the pod informer does not get notified if the pod is deleted using
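Whether the controller "hears" about a deletion depends on how its pod informer delivers delete events. In client-go, a delete handler can receive either the deleted object itself or, when the DELETE watch event was missed (for example while disconnected from the API server) and the deletion is only noticed on a later relist, a `cache.DeletedFinalStateUnknown` tombstone that wraps the last known object. A handler that only type-asserts to a pod silently drops tombstones. The sketch below is stdlib-only and makes loud assumptions: `Pod` and `DeletedFinalStateUnknown` are hypothetical local stand-ins for `corev1.Pod` and the client-go type, and `onPodDelete` is a hypothetical handler, not Argo's code and not a diagnosis of this issue.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for corev1.Pod.
type Pod struct {
	Namespace string
	Name      string
}

// DeletedFinalStateUnknown is a hypothetical local copy of the shape of
// client-go's cache.DeletedFinalStateUnknown: delivered to a delete handler
// when the DELETE event was missed and the deletion was found on a relist.
// Obj holds the last known (possibly stale) state of the object.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// podFromDeleteEvent unwraps a delete notification that may carry either the
// pod itself or a tombstone wrapping the pod.
func podFromDeleteEvent(obj interface{}) (*Pod, bool) {
	switch o := obj.(type) {
	case *Pod:
		return o, o != nil
	case DeletedFinalStateUnknown:
		// Tombstone: the real object is nested inside it.
		p, ok := o.Obj.(*Pod)
		return p, ok && p != nil
	default:
		return nil, false
	}
}

// onPodDelete is a hypothetical delete handler that reports what it saw.
func onPodDelete(obj interface{}) string {
	pod, ok := podFromDeleteEvent(obj)
	if !ok {
		return "ignored: unexpected object"
	}
	return fmt.Sprintf("pod %s/%s deleted", pod.Namespace, pod.Name)
}

func main() {
	p := &Pod{Namespace: "argo", Name: "dag-task-1"}
	fmt.Println(onPodDelete(p))
	fmt.Println(onPodDelete(DeletedFinalStateUnknown{Key: "argo/dag-task-1", Obj: p}))
	fmt.Println(onPodDelete("not a pod"))
}
```

In a real controller, the same type switch would sit inside the `DeleteFunc` of a `cache.ResourceEventHandlerFuncs` registered on the pod informer, so both direct deletes and relist-detected deletes reach the same code path.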
Delete without force:
Bug found:
Deleting a pod is not "insignificant".
What happened:
When a running pod of a DAG from a workflow is deleted, the pod fails with either
What you expected to happen:
Expected the pod to fail with `phase: Error`.
How to reproduce it (as minimally and precisely as possible):
Sometimes the pod status is set to
and sometimes to `Error`.
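The report expects a running pod that gets deleted to always end in `phase: Error`, yet observes either `Failed` or `Error`. One way to make the outcome deterministic is a single rule applied whenever the backing pod is found to be gone. The sketch below assumes hypothetical `Node`/`NodePhase` types and a hypothetical `markPodDeleted` function; only the phase names and the "pod deleted" message come from this thread, and it is not Argo's implementation.

```go
package main

import "fmt"

// NodePhase uses the phase names that appear in this thread.
type NodePhase string

const (
	Running   NodePhase = "Running"
	Succeeded NodePhase = "Succeeded"
	Failed    NodePhase = "Failed"
	Error     NodePhase = "Error"
)

// Node is a hypothetical, minimal record of a DAG task's state.
type Node struct {
	Name    string
	Phase   NodePhase
	Message string
}

// markPodDeleted applies the rule the reporter expects: if the pod backing a
// node disappears before the node reached a terminal phase, the node ends in
// Error with the message "pod deleted".
func markPodDeleted(n Node) Node {
	switch n.Phase {
	case Succeeded, Failed, Error:
		// Already terminal: a later deletion must not rewrite the result.
		return n
	default:
		n.Phase = Error
		n.Message = "pod deleted"
		return n
	}
}

func main() {
	fmt.Printf("%+v\n", markPodDeleted(Node{Name: "dag-task-1", Phase: Running}))
	fmt.Printf("%+v\n", markPodDeleted(Node{Name: "dag-task-2", Phase: Succeeded}))
}
```

The terminal-phase guard matters because a pod removed after its task has finished (for example by garbage collection) should not turn a `Succeeded` or `Failed` result into `Error`.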
Anything else we need to know?:
Other debugging information (if applicable):
phase: Error
phase: Failed
`argo get -o yaml` for `phase: Failed`:
phase: Failed
`argo get -o yaml` for `phase: Error`:
phase: Error
