fix: race condition for ingress resource status #1931
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
A race condition was found when using a
LoadBalancer
type service for the Kong proxy which in some deployments could trigger a permanent failure to resolveIngress
resource statuses. This patch ensures that we don't expect theLoadBalancer
to be immediately ready when the controller manager first starts, and waits to start the status update goroutine until it's resolved. Informational logging is also provided to help indicate if the controller manager is "stuck" because theLoadBalancer
is not being provisioned, instead of the previous behavior where this would just fail silently.Which issue this PR fixes
Fixes #1925
Special notes for your reviewer:
I could only trigger this issue on GKE because the
LoadBalancer
provisioning can be quite slow there, this fix has been tested and verified in that environment manually. I'm looking into why our GKE tests missed this problem, but that's more about protecting against future regressions so I don't think that should hold up the patch given the light touch and laser focus of this patch, and the manual testing and verification.It is my opinion that we might want to consider an alternative status update implementation for future releases, but for this PR I simply stuck with a light touch and a fix for the immediate problem at hand. I previously reported on the potential issues with our current implementation in #1492 and that may be considered a follow up.
PR Readiness Checklist:
CHANGELOG.md
release notes have been updated