
Controlplane upgrades prep #1947

Merged

Conversation

cgwalters
Member

Prep work for #1946 - basically stubbing out infrastructure. See patches for details.

@openshift-ci-robot added the approved label Jul 24, 2020
@cgwalters force-pushed the controlplane-upgrades-prep branch from 709d215 to 7ec67d4 on July 24, 2020 15:08
@ashcrow
Member

ashcrow commented Jul 24, 2020

e2e-gcp-op failure is an infra flake (hitting limits)

@cgwalters force-pushed the controlplane-upgrades-prep branch from 7ec67d4 to ba51ce8 on July 24, 2020 17:36
@cgwalters
Member Author

The baseline cluster tests aren't really covering any of the new code here; the upgrade and e2e-gcp-op tests are most useful.

Also man...we really need to solve the problem of having controller pod logs go away when we upgrade and they get rescheduled.

Will be used by future work on control plane nodes.
@cgwalters force-pushed the controlplane-upgrades-prep branch from ba51ce8 to 07589f8 on July 24, 2020 18:47
@cgwalters
Member Author

OK cool, seeing the expected logs in e2e-gcp-op; this one should be good to go!
e2e-gcp-upgrade is quite red across the board; hopefully a lot of that will be fixed by #1946. We'll see.

Dunno about metal-ipi.

@runcom
Member

runcom commented Jul 27, 2020

/approve

@openshift/openshift-team-mco-maintainers ptal

@kikisdeliveryservice
Contributor

Metal-ipi job is dead. Let's try these 2 again.

/test e2e-gcp-op
/test e2e-aws-scaleup-rhel7

@kikisdeliveryservice
Contributor

Gah, I meant

/test e2e-gcp-upgrade

@sinnykumari
Contributor

/retest

Today the MCO arbitrarily chooses nodes to update from the
set of candidates.  For the control plane, we want to update
etcd followers first, deferring the update of the leader until
last.

In the future, we might want to do something more intelligent
for workers too.

Factor out some logic in the node controller for this, including
a stub for finding the current etcd leader, though it's all still
a no-op.
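
As a rough illustration of that ordering, a minimal Go sketch might look like the following (the names `isEtcdLeader` and `sortControlPlaneCandidates` are hypothetical, not the actual MCO API, and leader detection is left as a no-op stub just as in this PR):

```go
package node

import (
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// isEtcdLeader reports whether the given node currently hosts the etcd
// leader. It is a no-op stub here, mirroring this PR; a real implementation
// might query the etcd cluster's member status.
func isEtcdLeader(node *corev1.Node) bool {
	return false
}

// sortControlPlaneCandidates orders update candidates so etcd followers
// come first and the current leader (if detected) is updated last.
func sortControlPlaneCandidates(candidates []*corev1.Node) []*corev1.Node {
	sorted := make([]*corev1.Node, len(candidates))
	copy(sorted, candidates)
	sort.SliceStable(sorted, func(i, j int) bool {
		// Followers sort before the leader, deferring its update to the end.
		return !isEtcdLeader(sorted[i]) && isEtcdLeader(sorted[j])
	})
	return sorted
}
```

A stable sort keeps the MCO's existing candidate order among followers, so only the leader's position changes.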
@cgwalters force-pushed the controlplane-upgrades-prep branch from 07589f8 to cf12209 on July 28, 2020 20:07
@cgwalters
Member Author

cgwalters commented Jul 28, 2020

Ooh hey, I had a logic error there causing us to try to update all control plane nodes at once...hooray for CI tests.

@cgwalters
Member Author

OK e2e-gcp-upgrade is unbroken.

Hmm interesting, some of the Prometheus test failures in e2e-gcp-op seem to be unique.

@cgwalters
Member Author

/test e2e-aws
/test e2e-gcp-op

@cgwalters
Member Author

This one should be good to go; can I get an lgtm?

@sinnykumari
Contributor

/lgtm

@openshift-ci-robot added the lgtm label Jul 29, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,runcom,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Member Author

Can't affect rhel7 nodes really
/override e2e-aws-scaleup-rhel7

@openshift-ci-robot
Contributor

@cgwalters: /override requires a failed status context to operate on.
The following unknown contexts were given:

  • e2e-aws-scaleup-rhel7

Only the following contexts were expected:

  • ci/prow/e2e-aws
  • ci/prow/e2e-aws-scaleup-rhel7
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-upgrade
  • ci/prow/e2e-metal-ipi
  • ci/prow/e2e-ovn-step-registry
  • ci/prow/images
  • ci/prow/okd-e2e-aws
  • ci/prow/okd-images
  • ci/prow/unit
  • ci/prow/verify
  • tide

In response to this:

> Can't affect rhel7 nodes really
> /override e2e-aws-scaleup-rhel7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Member Author

/override ci/prow/e2e-aws-scaleup-rhel7

@openshift-ci-robot
Contributor

@cgwalters: Overrode contexts on behalf of cgwalters: ci/prow/e2e-aws-scaleup-rhel7

In response to this:

> /override ci/prow/e2e-aws-scaleup-rhel7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Member Author

This is prep for fixing the upgrade problems, and I'm confident now we're not breaking anything new here.
/override ci/prow/e2e-gcp-upgrade

@openshift-ci-robot
Contributor

@cgwalters: Overrode contexts on behalf of cgwalters: ci/prow/e2e-gcp-upgrade

In response to this:

> This is prep for fixing the upgrade problems, and I'm confident now we're not breaking anything new here.
> /override ci/prow/e2e-gcp-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ericavonb
Contributor

Catching up on all the issues behind this PR. I don't like adding new etcd-specific logic to the MCO. It feels like this moves us backwards from separating out etcd concerns into its own operator. Can we do this in a more component-agnostic way?

Something like a node label or annotation that specifies upgrade priority classes, maybe?
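
For illustration only, such an annotation could be read with a small helper like the sketch below; the annotation key and `upgradePriority` are hypothetical, not an existing OpenShift API:

```go
package node

import (
	"strconv"

	corev1 "k8s.io/api/core/v1"
)

// upgradePriorityAnnotation is a hypothetical key a component (e.g. the
// etcd operator) could set on its critical nodes.
const upgradePriorityAnnotation = "machineconfiguration.openshift.io/upgrade-priority"

// upgradePriority returns the node's declared upgrade priority. Nodes
// without the annotation (or with an unparsable value) default to 0 and
// are updated first; higher values defer the update.
func upgradePriority(node *corev1.Node) int {
	p, err := strconv.Atoi(node.Annotations[upgradePriorityAnnotation])
	if err != nil {
		return 0
	}
	return p
}
```

The MCO would then just sort candidates by priority, and any component could defer its critical node without the MCO carrying component-specific logic.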

@ericavonb
Contributor

Similar requirement came up here: #662 (comment)

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@openshift-merge-robot merged commit cd92558 into openshift:master Jul 30, 2020
@cgwalters
Member Author

> Catching up on all the issues behind this PR. I don't like adding new etcd-specific logic to the MCO. It feels like this moves us backwards from separating out etcd concerns into its own operator. Can we do this in a more component-agnostic way?

Let's debate in #1897?
