🐛 Do not update MS status when unable to get workload cluster or machine node #10436
Closed
+10
−3
I think this doesn't address:
Specifically:
Thanks @sbueringer.
This PR does not update any fields of MS.Status when the controller is unable to get the workload cluster or a machine Node, whether because of ErrClusterLocked or any other error. ErrClusterLocked should clear on a subsequent reconcile, and a failure to get the machine Node (e.g. a network issue, or a temporarily unavailable apiserver) should also recover soon after reconciling again. Are there other "unhappy paths" you meant?
Since this PR returns an error at line 891, it won't update ms.Status at line 904 or newStatus.ObservedGeneration at line 924. This is as expected.
Yup, apiserver down for a longer period of time / indefinitely
In case the apiserver goes down for a longer period of time or indefinitely, the KCP controller should catch it and handle it correctly.
KCP won't always be able to recover (if that's what you meant)
Hi @fabriziopandini, as discussed with @sbueringer in the current thread, IMHO this PR provides a simple patch for the MS status update issue described in #10195, and there is no need to introduce the suggested handling code (provided by @sbueringer, and IMHO a little complex).
BTW I was on vacation last week, so sorry for the late reply. Thanks!
I will look into it as soon as possible, but we already invested a lot of time in trying to find an acceptable way forward.
I tried to express above that it's not acceptable for the MS status to permanently stop being updated when a workload cluster stays unreachable.
As I described:
2) In case the apiserver goes down for a longer period of time or indefinitely, the MS status won't be updated, but the KCP controller should catch it and handle it correctly. In this case updating the MS status won't help.
3) Even if we want to update the MS status in this special case, it would not be good to set the MS ready replicas to 0; instead, IMHO, a new status field UnknownReplicas should be added (but this changes the current API definition and is not a trivial change).
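The distinction drawn in point 3 can be sketched as follows. This is purely illustrative: `UnknownReplicas` is the field proposed in this comment and is not part of the current MachineSet API, and `NodeState`, `ReplicaCounts`, and `countReplicas` are hypothetical names invented for the example. The sketch only shows how "could not be determined" might be counted separately from "not ready" instead of being folded into a ready count of 0.

```go
package main

import "fmt"

// NodeState is a hypothetical three-valued result of a node lookup.
type NodeState int

const (
	NodeReady    NodeState = iota
	NodeNotReady
	NodeUnknown // the workload cluster could not be queried
)

// ReplicaCounts is illustrative only. UnknownReplicas is the field proposed
// in the comment above; it does not exist in the current API.
type ReplicaCounts struct {
	Replicas        int32
	ReadyReplicas   int32
	UnknownReplicas int32
}

// countReplicas tallies ready and unknown replicas separately, so that an
// unreachable workload cluster is not reported as zero ready replicas.
func countReplicas(states []NodeState) ReplicaCounts {
	counts := ReplicaCounts{Replicas: int32(len(states))}
	for _, s := range states {
		switch s {
		case NodeReady:
			counts.ReadyReplicas++
		case NodeUnknown:
			counts.UnknownReplicas++
		}
	}
	return counts
}

func main() {
	counts := countReplicas([]NodeState{NodeReady, NodeUnknown, NodeUnknown})
	fmt.Printf("%+v\n", counts)
}
```

As the comment notes, adding such a field would be an API change, which is why it is presented here only as a sketch of the idea rather than a drop-in patch.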
Sure, updating the MS status won't contribute to KCP trying to remediate the problem, but it will actually keep the status up to date. The status is much more than just the replica fields.