Bug 1955300: tighten operator availability conditions #2721
f8151de to 365dee5
Tests passed, but unfortunately must-gather failed.
/retest |
@kikisdeliveryservice: This pull request references Bugzilla bug 1955300, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The clusteroperator version object is updated to the new version when Progressing=False. However, syncAvailableStatus is using the incoming version in its status before syncProgressing officially updates the clusteroperator object. As a result, Available is reported at the incoming version before the rollout has finished.
The most common sync error that we see is RequiredPoolsFailed, which does not mean that the operator itself is impaired. Let's only set Available = False when operand syncs fail.
365dee5 to ecd414c
/test e2e-aws |
@kikisdeliveryservice: This pull request references Bugzilla bug 1955300, which is valid. 3 validation(s) were run on this bug
/retest |
/retest-required |
/skip |
/test e2e-agnostic-upgrade |
/bugzilla refresh The requirements for Bugzilla bugs have changed, recalculating validity. |
@openshift-merge-robot: This pull request references Bugzilla bug 1955300, which is invalid:
Comment In response to this:
Overall LGTM. Curious: does this put us on par with the OpenShift operator availability definition? Or is there more than just the requiredPools Available = False scenario?
One more question below:
/bugzilla refresh |
@kikisdeliveryservice: This pull request references Bugzilla bug 1955300, which is valid. 3 validation(s) were run on this bug
Would like to get this in to get some CI runs to further iterate. Can I please get an LGTM, as it's a high-priority BZ?
will try to review it by tomorrow unless Jerry gets to it before that. |
We can do any follow-up work on messaging in a subsequent PR, as discussed in #2721 (comment). Considering the priority, let's get it merged, as this would help Kirsten monitor further logs in CI and periodic runs.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kikisdeliveryservice, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@kikisdeliveryservice: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
@kikisdeliveryservice: An error was encountered updating to the MODIFIED state for bug 1955300 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details. Full error message.
code 32000: Subcomponet is mandatory for the component 'Machine Config Operator' in the product 'OpenShift Container Platform'.
Please contact an administrator to resolve this issue, then request a bug refresh with In response to this:
/bugzilla refresh |
@sdodson: An error was encountered updating to the MODIFIED state for bug 1955300 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details. Full error message.
code 32000: Subcomponet is mandatory for the component 'Machine Config Operator' in the product 'OpenShift Container Platform'.
Please contact an administrator to resolve this issue, then request a bug refresh with In response to this:
/bugzilla refresh |
@sdodson: All pull requests linked via external trackers have merged: Bugzilla bug 1955300 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.9 |
@kikisdeliveryservice: new pull request created: #2946 In response to this:
The MCO Available status should only be set to False when we believe that the operator or its operands have problems or aren't working correctly. An operator can be degraded (such as in the case of a degraded master pool), but that does not mean that the operator itself is unavailable and not functioning properly.
We are currently setting Available = False whenever Degraded = True, and by far the most common cause is a problem in syncRequiredPools. Pool issues do not mean that the operator itself has a problem (if it did, it would have failed the other syncs first). Instead, let syncAvailable() look at the results of the syncs itself and only set Available = False when the syncing of operands, etc. fails (which occurs much less often). When those other syncs fail, we have good reason to believe something is fundamentally wrong with the MCO, which justifies setting Available = False.
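To make the intended rule concrete, here is a minimal sketch of the decision, not the actual MCO code. It assumes the sync loop records which step failed; the `syncError` type, the `availableAfterSync` helper, the task names, and the sentinel errors are all hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical example errors, used only to exercise the sketch.
var (
	errPoolDegraded  = errors.New("pool master is degraded")
	errDaemonRollout = errors.New("daemon rollout failed")
)

// syncError is a hypothetical record of which sync step failed.
type syncError struct {
	task string // name of the failed sync step, e.g. "RequiredPools"
	err  error
}

// availableAfterSync reports whether Available should stay True and,
// if not, a reason string built from the failed task name.
// Assumption: a failure in the required-pools step reflects pool state,
// not operator health, so it never flips Available to False.
func availableAfterSync(se *syncError) (bool, string) {
	if se == nil || se.err == nil {
		return true, ""
	}
	if se.task == "RequiredPools" {
		return true, ""
	}
	// Any other failed step is treated as an operand sync failure.
	return false, se.task + "Failed"
}

func main() {
	cases := []*syncError{
		nil,
		{task: "RequiredPools", err: errPoolDegraded},
		{task: "MachineConfigDaemon", err: errDaemonRollout},
	}
	for _, c := range cases {
		ok, reason := availableAfterSync(c)
		fmt.Printf("available=%t reason=%q\n", ok, reason)
	}
}
```

Under this sketch, Degraded can still be reported for the pool failure through its own condition; only the Available decision ignores it.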
Also fix the version in the Available message. We are reporting the incoming version as Available before that version is fully rolled out to the MCO. Instead, we should look at the clusteroperator version and use that.
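The version fix can be sketched roughly as follows. This is an illustration under assumptions, not the change in this PR: `operandVersion` is a simplified stand-in for the name/version pairs in the clusteroperator status, and `reportedVersion`, `availableMessage`, the message wording, and the version numbers are hypothetical.

```go
package main

import "fmt"

// operandVersion is a simplified stand-in for a name/version pair
// recorded in the clusteroperator status (hypothetical type).
type operandVersion struct {
	Name    string
	Version string
}

// reportedVersion returns the "operator" version already recorded in the
// clusteroperator status, or "" if none has been recorded yet.
func reportedVersion(versions []operandVersion) string {
	for _, v := range versions {
		if v.Name == "operator" {
			return v.Version
		}
	}
	return ""
}

// availableMessage builds the Available message from the recorded status
// version only. The incoming version is deliberately not a parameter, so
// it cannot leak into the message before the rollout completes.
func availableMessage(statusVersions []operandVersion) string {
	v := reportedVersion(statusVersions)
	if v == "" {
		return "Cluster version not yet reported"
	}
	return fmt.Sprintf("Cluster has deployed %s", v)
}

func main() {
	// Made-up versions: status still records 4.8.0 while 4.9.0 rolls out.
	status := []operandVersion{{Name: "operator", Version: "4.8.0"}}
	fmt.Println(availableMessage(status))
}
```

Keeping the incoming version out of the function signature is one way to guarantee the message only changes once the status version itself has been updated.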