
Upgrade nerc-ocp-test cluster to 4.17 #912

Open
1 of 5 tasks
dystewart opened this issue Jan 30, 2025 · 10 comments
Labels
openshift This issue pertains to NERC OpenShift

Comments

@dystewart

dystewart commented Jan 30, 2025

Motivation

We need to upgrade this cluster from its current version (4.15.28) to 4.17.

Completion Criteria

Description

  • Determine upgrade path
  • Upgrade nerc-ocp-test
  • Test upgraded OCP version (including with class workloads)
  • Determine prod upgrade maintenance time
  • Upgrade prod cluster

Completion dates

Required - 03/2025

@dystewart dystewart added the openshift This issue pertains to NERC OpenShift label Jan 30, 2025
@dystewart dystewart self-assigned this Jan 30, 2025
@dystewart

Here is the upgrade path:
[Image: upgrade path]
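The diagram itself is not reproduced here. As a rough sketch of the stepping rule, assume that each update moves one minor version at a time, which matches the first step taken later in this thread (4.15 -> 4.16). Under that assumption, the hypothetical helper below lists the minor versions the cluster has to pass through. It ignores z-streams, since choosing the exact z-stream for each hop is left to the OpenShift update graph.

```python
def minor_hops(current: str, target: str) -> list[str]:
    """List the minor versions to update through, one minor at a time.

    Hypothetical helper. Assumes a plain minor-by-minor update within one
    major version; only major.minor is compared, and the z-stream for each
    hop is left to the update graph.
    """
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("major version changes are not handled in this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    print(minor_hops("4.15.28", "4.17"))
```

For this cluster, the helper yields two hops: 4.16 first, then 4.17.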

@dystewart

OCP v4.16 brings a micro-architecture change in its underlying RHEL 9.2 host OS (see rhel-micro-architecture-update-requirements).
So we need to check that our hosts support the x86-64-v2 instruction set. We can do so with:

$ oc debug node/master0-example.com 
$ chroot /host
$ /lib64/ld-linux-x86-64.so.2 --help |grep "Subdirectories of glibc-hwcaps directories" -A5

We need to see output like:

  x86-64-v4
  x86-64-v3 (supported, searched) 
  x86-64-v2 (supported, searched) 

Confirmed that x86-64-v2 instruction set is supported on our hosts in nerc-ocp-test.
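Checking this by eye on every node is error-prone. The minimal sketch below parses the captured `ld-linux-x86-64.so.2 --help` text and returns the hwcaps levels marked "supported". It assumes the line format shown in the output above. The function name is hypothetical, and collecting the output from each node (for example via the `oc debug` steps above) is left out.

```python
import re

# Matches lines such as "  x86-64-v3 (supported, searched)".
# Levels printed without a parenthesised status (e.g. "x86-64-v4") are not matched.
_HWCAP_LINE = re.compile(r"\s*(x86-64-v\d)\s+\(([^)]*)\)")


def supported_hwcaps(ld_help: str) -> set[str]:
    """Return the glibc-hwcaps levels flagged as 'supported' in ld.so --help output."""
    levels = set()
    for line in ld_help.splitlines():
        match = _HWCAP_LINE.match(line)
        if match and "supported" in match.group(2).split(", "):
            levels.add(match.group(1))
    return levels


def node_ready_for_416(ld_help: str) -> bool:
    """True if the x86-64-v2 level required by the RHEL 9.2-based hosts is supported."""
    return "x86-64-v2" in supported_hwcaps(ld_help)
```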

@dystewart

We also have some Kubernetes API removals to consider in OCP 4.16 (Kubernetes 1.29).

The removed API is flowcontrol.apiserver.k8s.io/v1beta2, and the affected resources are FlowSchema and PriorityLevelConfiguration.

No APIRemovedInNextReleaseInUse alerts are firing, and $ oc get apirequestcounts returns zero instances of flowcontrol.apiserver.k8s.io/v1beta2.

@dystewart

Because of the removed APIs, this upgrade requires an administrator acknowledgment (admin ack) before it can proceed.
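As far as I recall from Red Hat's 4.16 update docs, the admin ack is a merge patch that sets a key in the `admin-acks` ConfigMap in the `openshift-config` namespace. The key looks like `ack-4.15-kube-1.29-api-removals-in-4.16`. Treat that key format as an assumption, and confirm the exact key with `oc -n openshift-config get cm admin-acks -o yaml` before applying anything. The sketch below only builds the patch and the `oc` argument list. It does not run anything, and the helper names are hypothetical.

```python
import json


def admin_ack_patch(from_minor: str, kube: str, to_minor: str) -> str:
    """Build the JSON merge patch for the admin-acks ConfigMap.

    Assumed key format: ack-<from>-kube-<kube>-api-removals-in-<to>.
    """
    key = f"ack-{from_minor}-kube-{kube}-api-removals-in-{to_minor}"
    return json.dumps({"data": {key: "true"}})


def admin_ack_command(from_minor: str, kube: str, to_minor: str) -> list[str]:
    """Argument list for `oc patch`; run it yourself after checking the key."""
    return ["oc", "-n", "openshift-config", "patch", "cm", "admin-acks",
            "--patch", admin_ack_patch(from_minor, kube, to_minor), "--type=merge"]


if __name__ == "__main__":
    print(" ".join(admin_ack_command("4.15", "1.29", "4.16")))
```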

@dystewart

The console cluster operator is in a degraded state, so I will have to address that before the upgrade is viable.
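Before each upgrade step it helps to list every degraded cluster operator, not just the console. The sketch below parses the text output of `oc get clusteroperators`. It assumes the default columns (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE) and assumes that only MESSAGE can contain spaces. The function name is hypothetical.

```python
def degraded_operators(table: str) -> list[str]:
    """Return names of cluster operators whose DEGRADED column is 'True'.

    The DEGRADED position is taken from the header row; this relies on the
    columns before it (NAME, VERSION, ...) being single tokens.
    """
    lines = table.strip().splitlines()
    degraded_idx = lines[0].split().index("DEGRADED")
    names = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > degraded_idx and fields[degraded_idx] == "True":
            names.append(fields[0])
    return names
```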

@dystewart

Blocked by #930

@dystewart dystewart added the blocked Include reason issue is blocked in the description label Feb 10, 2025
@dystewart

Adding the 'managementState: Managed' field back to the console cluster object fixed the operator state.

$ oc edit consoles.operator.openshift.io cluster -o yaml
...
spec:
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
...

This appears to be caused by our custom OCP console config. From the Red Hat solution:

The solution has been validated as part of [Case #03456495](https://access.redhat.com/bounce/?externalURL=https%3A%2F%2Fgss--c.vf.force.com%2Fapex%2FCase_View%3Fid%3D5006R00001rc9XAQAY%26sfdc.override%3D1). In this case, the customer made some customizations to the console, such as adding a custom logo, and observed the Console operator being degraded with the Unknown state error.

So we need a PR that patches this field into our config, since it must always be present.
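The thread does not show the exact patch file in nerc-ocp-config, so the sketch below only illustrates the idea with JSON merge patch semantics (RFC 7386). Applying `{"spec": {"managementState": "Managed"}}` on top of a custom console config re-adds the field and leaves the other customizations alone. Both `json_merge_patch` and `CONSOLE_PATCH` are hypothetical names, and the sample spec only uses the fields shown in the YAML above.

```python
import copy
from typing import Any

# Hypothetical patch that pins the field the console operator needs.
CONSOLE_PATCH = {"spec": {"managementState": "Managed"}}


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7386 JSON merge patch and return a new object.

    Objects merge recursively, null (None) deletes a key, and any other
    value replaces the target value. The inputs are not modified.
    """
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    custom = {"spec": {"logLevel": "Normal", "operatorLogLevel": "Normal"}}
    print(json_merge_patch(custom, CONSOLE_PATCH))
```

Because the patch is applied every time the config is synced, `managementState` cannot silently drop off the resource again.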

dystewart added a commit to dystewart/nerc-ocp-config that referenced this issue Feb 10, 2025
Closes issue: nerc-project/operations#912
The console object "cluster" was giving an error due to a missing managementState. Since we have a custom console config, we need to add a patch which
includes this line with the value Managed, so the managementState does not fall off the console resource in the future.
dystewart added a commit to dystewart/nerc-ocp-config that referenced this issue Feb 11, 2025
@dystewart dystewart removed the blocked Include reason issue is blocked in the description label Feb 11, 2025
dystewart added a commit to dystewart/nerc-ocp-config that referenced this issue Feb 11, 2025
dystewart added a commit to OCP-on-NERC/nerc-ocp-config that referenced this issue Feb 11, 2025
dystewart added a commit to OCP-on-NERC/nerc-ocp-config that referenced this issue Feb 11, 2025
@dystewart

First step in the upgrade process is underway! (4.15 -> 4.16)

@dystewart

Note: I had to delete the StorageClass ocs-external-storagecluster-ceph-rbd when switching the ODF operator to the 4.16 stable channel.

Here's the next step in the upgrade process:

[Image: next upgrade step]

Development

No branches or pull requests

1 participant