Controlplane upgrades prep #1947
709d215 to 7ec67d4
e2e-gcp-op failure is an infra flake (hitting limits)
7ec67d4 to ba51ce8
The baseline cluster tests aren't really covering any of the new code here; the upgrade and Also man...we really need to solve the problem of having controller pod logs go away when we upgrade and they get rescheduled.
Prep for further work.
Will be used by future work to do more work on control plane nodes.
ba51ce8 to 07589f8
OK cool, seeing the expected logs in e2e-gcp-op; this one should be good to go! Dunno about metal-ipi.
/approve @openshift/openshift-team-mco-maintainers ptal
Metal-ipi job is dead. Let's try these 2 again. /test e2e-gcp-op
Gah I meant /test e2e-gcp-upgrade
/retest
Today the MCO arbitrarily chooses nodes to update from the set of candidates. For the control plane, we want to update etcd followers first, deferring the update of the leader until last. In the future, we might want to do something more intelligent for workers too. Factor out some logic in the node controller for this, including a stub for finding the current etcd leader, though it's all still a no-op.
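To make the intended ordering concrete, here is a minimal sketch of "update etcd followers first, leader last" with a no-op leader stub. It is not the code from this PR: the `Node` type, `LeaderFinder`, `stubLeader`, `orderControlPlaneCandidates`, and `pickUpdateTargets` are all hypothetical names invented for illustration, and the real node controller works on Kubernetes node objects rather than this simplified struct.

```go
package main

// Node is a simplified, hypothetical stand-in for a Kubernetes node object.
type Node struct {
	Name string
}

// LeaderFinder reports the name of the node hosting the current etcd leader.
// An empty string means the leader is unknown. (Hypothetical signature.)
type LeaderFinder func() (string, error)

// stubLeader never identifies a leader, so ordering stays a no-op,
// similar in spirit to the stub described in the commit message.
func stubLeader() (string, error) { return "", nil }

// orderControlPlaneCandidates returns a new slice in which the leader's node,
// if it is known and present, is moved to the end. The relative order of all
// other candidates is preserved.
func orderControlPlaneCandidates(candidates []Node, findLeader LeaderFinder) ([]Node, error) {
	leader, err := findLeader()
	if err != nil {
		return nil, err
	}
	ordered := make([]Node, 0, len(candidates))
	var deferred []Node
	for _, n := range candidates {
		if leader != "" && n.Name == leader {
			deferred = append(deferred, n)
			continue
		}
		ordered = append(ordered, n)
	}
	return append(ordered, deferred...), nil
}

// pickUpdateTargets takes at most capacity nodes from the front of the
// ordered list, so the whole control plane is never selected at once
// unless capacity allows it.
func pickUpdateTargets(ordered []Node, capacity int) []Node {
	if capacity <= 0 {
		return nil
	}
	if capacity > len(ordered) {
		capacity = len(ordered)
	}
	return ordered[:capacity]
}
```

With `stubLeader` the candidate order is unchanged; swapping in a finder that returns a real node name pushes that node to the back, and a capacity of 1 then updates followers one at a time before the leader.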
07589f8 to cf12209
Ooh hey, I had a logic error there causing us to try to update all controlplane nodes at once...hooray for CI tests.
OK e2e-gcp-upgrade is unbroken. Hmm interesting, some of the Prometheus test failures in e2e-gcp-op seem to be unique.
/test e2e-aws
This one should be good to go, can I get a lgtm?
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, runcom, sinnykumari
/retest Please review the full test history for this PR and help us cut down flakes.
Can't affect rhel7 nodes really
@cgwalters: /override requires a failed status context to operate on.
Only the following contexts were expected:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/override ci/prow/e2e-aws-scaleup-rhel7
@cgwalters: Overrode contexts on behalf of cgwalters: ci/prow/e2e-aws-scaleup-rhel7 In response to this:
This is prep for fixing the upgrade problems, and I'm confident now we're not breaking anything new here.
@cgwalters: Overrode contexts on behalf of cgwalters: ci/prow/e2e-gcp-upgrade In response to this:
Catching up on all the issues behind this PR. I don't like adding new etcd-specific logic to the MCO. It feels like this moves us backwards from separating out etcd concerns into its own operator. Can we do this in a more component-agnostic way? Something like, a node label or annotation that specifies upgrade priority classes maybe?
Similar requirement came up here: #662 (comment)
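For discussion, the component-agnostic idea above (a node annotation carrying an upgrade priority) could look roughly like the sketch below. Everything here is an assumption made for illustration: the annotation key `example.com/upgrade-priority`, the simplified `Node` struct, and the "lower value updates first" convention are hypothetical and do not exist in the MCO.

```go
package main

import (
	"sort"
	"strconv"
)

// priorityAnnotation is a hypothetical annotation key, not a real MCO or
// Kubernetes annotation.
const priorityAnnotation = "example.com/upgrade-priority"

// Node is a simplified, hypothetical stand-in for a Kubernetes node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// upgradePriority reads the annotation as an integer. Missing or malformed
// values fall back to 0, so unannotated nodes keep default behavior.
func upgradePriority(n Node) int {
	v, ok := n.Annotations[priorityAnnotation]
	if !ok {
		return 0
	}
	p, err := strconv.Atoi(v)
	if err != nil {
		return 0
	}
	return p
}

// sortByUpgradePriority returns a copy of nodes ordered so that lower
// priority values are updated first. The sort is stable, so nodes with
// equal priority keep their original relative order.
func sortByUpgradePriority(nodes []Node) []Node {
	out := make([]Node, len(nodes))
	copy(out, nodes)
	sort.SliceStable(out, func(i, j int) bool {
		return upgradePriority(out[i]) < upgradePriority(out[j])
	})
	return out
}
```

Under this assumed scheme, another operator (for example, one managing etcd) could set a higher value on the node it wants updated last, and the MCO would only need to understand the generic annotation rather than anything etcd-specific.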
/retest Please review the full test history for this PR and help us cut down flakes.
Let's debate in #1897?
Prep work for #1946 - basically stubbing out infrastructure. See patches for details.