"gke-resource-quotas" error updating resources #2346

Closed
mattmoor opened this issue Aug 9, 2020 · 9 comments · Fixed by knative/networking#79
Labels
bug Something isn't working lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@mattmoor
Member

mattmoor commented Aug 9, 2020

I have been intermittently seeing flakes with:

Operation cannot be fulfilled on resourcequotas "gke-resource-quotas": the object has been modified; please apply your changes to the latest version and try again

an example

I think @vaikas mentioned something like this in eventing as well.

@mattmoor
Member Author

another example

@chizhg
Member

chizhg commented Aug 12, 2020

It seems to be a Kubernetes bug - argoproj/argo-workflows#3217?

@mattmoor
Member Author

Here's the kubernetes bug: kubernetes/kubernetes#67761

mattmoor added a commit to mattmoor/networking that referenced this issue Aug 12, 2020
@mattmoor
Member Author

mattmoor commented Aug 12, 2020

Well, given that the issue is still open and kingress conformance creates pods, svcs, and ingresses, all of which go through the resourcequota, I figured I'd port Ville's hack to kingress conformance: knative/networking#79

@mattmoor
Member Author

cc @tcnghia for context

knative-prow-robot pushed a commit to knative/networking that referenced this issue Aug 12, 2020
mattmoor added a commit to mattmoor/networking that referenced this issue Aug 12, 2020
knative-prow-robot pushed a commit to knative/networking that referenced this issue Aug 12, 2020
* Retry Creates on resourcequota conflicts. (#79)

This follows @vaikas PR here: knative/eventing#3215

The kubernetes issue is tracked here: kubernetes/kubernetes#67761

Fixes: knative/test-infra#2346

* Include Get in the UpdateRetry. (#82)

This adjusts the update retry around the ingress update in #79 to include the Get.

The original change was to guard against issues updating gke-resource-quotas, but there is a low incidence of conflicts simply updating the kingress itself.

Here's an example from net-contour:

```
=== CONT  TestIngressConformance/5/update
    update.go:88: Error updating Ingress: Operation cannot be fulfilled on ingresses.networking.internal.knative.dev "ingress-conformance-5-update-eghinekn": the object has been modified; please apply your changes to the latest version and try again
```

However, to resolve this, we actually have to refetch the kingress shell we've stuck the desired IngressSpec into; otherwise it will just retry until it has exhausted its attempts, because the resourceVersion we're sending back never changes (and that is what optimistic concurrency keys off of).
@mattmoor mattmoor reopened this Sep 4, 2020
@mattmoor
Member Author

mattmoor commented Sep 4, 2020

Awesome, so this is back, but in a form that our previous workaround no longer handles:

visibility.go:332: Error creating Pod: Operation cannot be fulfilled on resourcequotas "gke-resource-quotas": StorageError: invalid object, Code: 4, Key: /registry/resourcequotas/serving-tests/gke-resource-quotas, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 7aaedbdf-caa8-41e7-94cb-f8c053038e86, UID in object meta:

@mattmoor
Member Author

mattmoor commented Sep 4, 2020

@mattmoor
Member Author

mattmoor commented Sep 9, 2020

Looks like: kubernetes/kubernetes#82130

mattmoor added a commit to mattmoor/pkg that referenced this issue Sep 9, 2020
This expands the workaround that vaikas initially added to address knative/test-infra#2346. A few new error types have emerged since, and this retries those as well.
knative-prow-robot pushed a commit to knative/pkg that referenced this issue Sep 9, 2020
This expands the workaround that vaikas initially added to address knative/test-infra#2346. A few new error types have emerged since, and this retries those as well.
@github-actions

github-actions bot commented Dec 8, 2020

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 8, 2020
@github-actions github-actions bot closed this as completed Jan 7, 2021