Operation cannot be fulfilled on resourcequotas "gke-resource-quotas": the object has been modified; please apply your changes to the latest version and try again #3217
Comments
Argo Workflows never changes resource quotas directly, so unless the resource quotas change somehow as a side effect of creating pods, we are not sure what is happening here. |
It's a Kubernetes bug: kubernetes/kubernetes#67761. But it seems fairly straightforward to work around in Argo: just retry if this particular failure condition is hit. @Ziyang2go I think you said you had a hack fix, can you attach it? |
@bryanlarsen are you able to provide more information about the fix please? E.g. some code? |
|
Must match on error message: |
@bryanlarsen The workaround we used is to requeue the workflow when we hit a pod creation error, rather than fail the workflow. That works for us because we have predefined workflow templates, the only error we have seen when creating pods is this one, and it resolves itself on retry. We are on Argo version 2.4.3, and here is the patch that we made:
|
@Ziyang2go @bryanlarsen if I create a development build of the workflow controller - could you please test it? Thank you. |
I'm getting the exact same error when trying to use MinIO as an artifact repository. @alexec I'd be happy to test any such solution |
@iMoses I have pushed |
I'm still getting the same error with the patched image:
|
thank you @iMoses - can you try with https://github.com/argoproj/argo/blob/master/docs/fields.md#fields-14 |
I've created a new image for testing if you would like to try it: |
I've created another test image: argoproj/workflow-controller:fix-3791. Can you please try it out to confirm it fixes your problem? |
Available for testing in v2.11.0-rc1. |
Checklist:
What happened:
Argo workflows failed with the error "Operation cannot be fulfilled on resourcequotas "gke-resource-quotas": the object has been modified; please apply your changes to the latest version and try again"
What you expected to happen:
Workflows complete.
How to reproduce it (as minimally and precisely as possible):
Try and run two workflows simultaneously, each with a parallelism of > 300 or so.
Sample workflows attached.
Anything else we need to know?:
This seems to be a Kubernetes bug, but a workaround in Argo would be much more under our control as we're running in GKE.
kubernetes/kubernetes#67761
Environment:
Other debugging information (if applicable):
time="2020-06-11T18:14:20Z" level=info msg="node &NodeStatus{ID:loops-param-result-b-1634545795,Name:loops-param-result-b[1].sleep(322:2),DisplayName:sleep(322:2),Type:Pod,TemplateName:sleep-n-sec,TemplateRef:nil,Phase:Error,BoundaryID:loops-param-result-b,Message:,StartedAt:2020-06-11 18:14:20.364568768 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:&Inputs{Parameters:[]Parameter{Parameter{Name:seconds,Default:nil,Value:*2,ValueFrom:nil,GlobalName:,},},Artifacts:[]Artifact{},},Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:local/loops-param-result-b,ResourcesDuration:ResourcesDuration{},HostNodeName:,} message: Operation cannot be fulfilled on resourcequotas \"gke-resource-quotas\": the object has been modified; please apply your changes to the latest version and try again" namespace=argo workflow=loops-param-result-b
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.