
USHIFT-168: MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group) #880

Closed
jiridanek opened this issue Aug 23, 2022 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@jiridanek (Member)

What happened:

  • MicroShift expects a volume group named rhel to be available; a default CentOS Stream install creates a volume group named cs
  • the router-default-6795657dbc-nnqmv pod in the openshift-ingress namespace fails to start, for reasons that were (to me) unexplained at first

What you expected to happen:

🦄

How to reproduce it (as minimally and precisely as possible):

First, install CRI-O (covered at https://microshift.io/docs/getting-started/):

sudo dnf module enable -y cri-o:1.21
sudo dnf install -y cri-o cri-tools
sudo systemctl enable crio --now
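
To sanity-check the container runtime before moving on, something like the following should work (a sketch; crictl ships with cri-tools):

# verify the CRI-O service is up and the socket answers
sudo systemctl is-active crio
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version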

Then enable the OpenStack repositories so that the openvswitch2.16 package required by microshift-networking is available (this is not mentioned in the RHEL 8 instructions, or really anywhere):

sudo dnf config-manager --set-enabled powertools
sudo dnf install -y epel-release centos-release-openstack-yoga
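
To confirm the package is now resolvable from the newly enabled repositories, a quick check:

dnf repolist --enabled | grep -i -e openstack -e powertools
dnf info openvswitch2.16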

Now I can install the MicroShift RPMs:

sudo dnf install -y \
    ./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
    ./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-networking-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
    ./packaging/rpm/_rpmbuild/RPMS/noarch/microshift-selinux-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.noarch.rpm

Following the install instructions further:

sudo firewall-cmd --zone=trusted --add-source=10.42.0.0/16 --permanent
sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5353/udp --permanent
sudo firewall-cmd --reload
sudo systemctl enable microshift --now
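
To double-check that the firewall rules took effect and the service came up (a sketch, using the zones from above):

sudo firewall-cmd --zone=trusted --list-sources
sudo firewall-cmd --zone=public --list-ports
sudo systemctl is-active microshift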

Copy the pull secret (https://github.com/openshift/microshift/blob/main/docs/devenv_rhel8.md)

Do the other things (download the oc client and set up the kubeconfig):

curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
sudo tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl

mkdir ~/.kube
sudo cat /var/lib/microshift/resources/kubeadmin/kubeconfig > ~/.kube/config
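
A quick check that the copied kubeconfig is usable:

# confirm the cluster answers with the new kubeconfig
/usr/local/bin/oc get nodes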

And I end up with all pods up, with the exception of topolvm-node-26t6g (and, as I noticed later, router-default-6795657dbc-nnqmv):

/usr/local/bin/oc get pods --all-namespaces
NAMESPACE                  NAME                                  READY   STATUS             RESTARTS        AGE
openshift-dns              dns-default-5tsdh                     2/2     Running            0               15m
openshift-dns              node-resolver-cxb2v                   1/1     Running            0               15m
openshift-ingress          router-default-6795657dbc-nnqmv       0/1     Running            2 (80s ago)     15m
openshift-ovn-kubernetes   ovnkube-master-zcbsl                  4/4     Running            0               15m
openshift-ovn-kubernetes   ovnkube-node-h22g5                    1/1     Running            0               15m
openshift-service-ca       service-ca-76649665b5-thmh8           1/1     Running            0               15m
openshift-storage          topolvm-controller-8479455f95-pvqls   4/4     Running            0               15m
openshift-storage          topolvm-node-26t6g                    2/4     CrashLoopBackOff   10 (2m6s ago)   14m

and these errors in the log of topolvm-node-26t6g:

/usr/local/bin/oc logs topolvm-node-26t6g -n openshift-storage
Defaulted container "lvmd" out of: lvmd, topolvm-node, csi-registrar, liveness-probe, file-checker (init)
2022-08-23T09:18:49.459028Z topolvm-node-26t6g lvmd info: "configuration file loaded: " device_classes="[0xc00059c9b0]" file_name="/etc/topolvm/lvmd.yaml" socket_name="/run/lvmd/lvmd.sock"
2022-08-23T09:18:49.498429Z topolvm-node-26t6g lvmd error: "Volume group not found:" volume_group="rhel"
Error: not found
not found

It looks like my volume group is named cs, not rhel:

vgdisplay 
  --- Volume group ---
  VG Name               cs
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <237.46 GiB
  PE Size               4.00 MiB
  Total PE              60789
  Alloc PE / Size       60789 / <237.46 GiB
  Free  PE / Size       0 / 0   
  VG UUID               RQ1RHa-33vc-XEFs-lY9c-LZj5-YeAz-W91cfz

So I renamed the VG, following https://forums.centos.org/viewtopic.php?t=62406.
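
For reference, a minimal sketch of what that rename involves on a default CentOS Stream 8 LVM layout (assumes the stock cs-root/cs-swap names; verify each file before rebooting, since a mistake here can leave the system unbootable):

sudo vgrename cs rhel
# point fstab and the kernel command line at the new VG name
sudo sed -i 's|/dev/mapper/cs-|/dev/mapper/rhel-|g' /etc/fstab
sudo sed -i 's|rd.lvm.lv=cs/|rd.lvm.lv=rhel/|g; s|/dev/mapper/cs-|/dev/mapper/rhel-|g' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # on EFI systems the path is /boot/efi/EFI/centos/grub.cfg
sudo dracut -f                                # rebuild the initramfs so early boot finds the renamed VG
sudo reboot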

And then I noticed the CrashLoopBackOff in the router-default-6795657dbc-nnqmv pod:

/usr/local/bin/oc logs router-default-6795657dbc-nnqmv -n openshift-ingress
[-]has-synced failed: Router not synced
W0823 09:26:50.017716       1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.017746       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:26:50.191462       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:26:50.357748       1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.357872       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host


[...]


I0823 09:27:25.191607       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:26.192461       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:27.192082       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:28.192125       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:29.192098       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:30.190609       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced

Anything else we need to know?:

Environment:

  • Microshift version (use microshift version): MicroShift Version: 4.10.0-0.microshift-2022-08-08-151458-61-g2d7df45a Base OCP Version: 4.10.18
  • Hardware configuration:
  • OS (e.g: cat /etc/os-release):

NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

  • Kernel (e.g. uname -a): Linux localhost.jiridanek.net 4.18.0-373.el8.x86_64 #1 SMP Tue Mar 22 15:11:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

Relevant Logs

/usr/local/bin/oc describe pod router-default-6795657dbc-nnqmv -n openshift-ingress
Name:                 router-default-6795657dbc-nnqmv
Namespace:            openshift-ingress
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 localhost.jiridanek.github.beta.tailscale.net/10.40.2.205
Start Time:           Tue, 23 Aug 2022 11:03:29 +0200
Labels:               ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
                      pod-template-hash=6795657dbc
Annotations:          openshift.io/scc: hostnetwork
                      target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
                      unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status:               Running
IP:                   10.40.2.205
IPs:
  IP:           10.40.2.205
Controlled By:  ReplicaSet/router-default-6795657dbc
Containers:
  router:
    Container ID:  cri-o://4eb7dd3852f97c2178545721b365d4783f1d5d13c4e61b960b590cd8e9982215
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
    Ports:         80/TCP, 443/TCP, 1936/TCP
    Host Ports:    80/TCP, 443/TCP, 1936/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Message:      reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:37.910504       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:34:38.191606       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:39.192003       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:40.192476       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:41.192337       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:42.192030       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:34:45.886374       1 reflector.go:324] github.com/openshift/router/pkg/router/template/service_lookup.go:33: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:45.886502       1 reflector.go:138] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host

      Exit Code:    137
      Started:      Tue, 23 Aug 2022 11:32:41 +0200
      Finished:     Tue, 23 Aug 2022 11:34:52 +0200
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
    Environment:
      STATS_PORT:                                1936
      ROUTER_SERVICE_NAMESPACE:                  openshift-ingress
      DEFAULT_CERTIFICATE_DIR:                   /etc/pki/tls/private
      DEFAULT_DESTINATION_CA_PATH:               /var/run/configmaps/service-ca/service-ca.crt
      ROUTER_CIPHERS:                            TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
      ROUTER_DISABLE_HTTP2:                      true
      ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK:  false
      ROUTER_METRICS_TLS_CERT_FILE:              /etc/pki/tls/private/tls.crt
      ROUTER_METRICS_TLS_KEY_FILE:               /etc/pki/tls/private/tls.key
      ROUTER_METRICS_TYPE:                       haproxy
      ROUTER_SERVICE_NAME:                       default
      ROUTER_SET_FORWARDED_HEADERS:              append
      ROUTER_THREADS:                            4
      SSL_MIN_VERSION:                           TLSv1.2
    Mounts:
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/configmaps/service-ca from service-ca-bundle (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5w76x (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
  service-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      service-ca-bundle
    Optional:  false
  kube-api-access-5w76x:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  33m                    default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         33m                    default-scheduler  Successfully assigned openshift-ingress/router-default-6795657dbc-nnqmv to localhost.jiridanek.github.beta.tailscale.net
  Warning  FailedMount       28m (x2 over 30m)      kubelet            Unable to attach or mount volumes: unmounted volumes=[default-certificate service-ca-bundle], unattached volumes=[default-certificate service-ca-bundle kube-api-access-5w76x]: timed out waiting for the condition
  Warning  FailedMount       26m (x11 over 33m)     kubelet            MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
  Warning  FailedMount       26m (x11 over 33m)     kubelet            MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
  Warning  FailedMount       26m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[service-ca-bundle default-certificate], unattached volumes=[service-ca-bundle kube-api-access-5w76x default-certificate]: timed out waiting for the condition
  Warning  FailedMount       24m (x5 over 24m)      kubelet            MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
  Warning  FailedMount       24m (x5 over 24m)      kubelet            MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
  Normal   Pulling           24m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b"
  Normal   Pulled            24m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b" in 4.824691275s
  Normal   Created           24m                    kubelet            Created container router
  Normal   Started           24m                    kubelet            Started container router
  Warning  DNSConfigForming  24m (x5 over 24m)      kubelet            Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openshift-ingress.svc.cluster.local svc.cluster.local cluster.local meerkat-justice.ts.net jiridanek.github.beta.tailscale.net brq.redhat.com
  Warning  Unhealthy         24m (x3 over 24m)      kubelet            Startup probe failed: HTTP probe failed with statuscode: 500
  Warning  ProbeError        4m24s (x935 over 24m)  kubelet            Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed
@ggiguash (Contributor) commented Aug 23, 2022

The router pod fails to start because of a missing mandatory firewall setting:

sudo firewall-cmd --permanent --zone=trusted --add-source=169.254.169.1

See https://github.com/openshift/microshift/blob/main/docs/howto_firewall.md#firewalld
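
Note that --permanent rules only take effect after a reload, so the full sequence is:

sudo firewall-cmd --permanent --zone=trusted --add-source=169.254.169.1
sudo firewall-cmd --reload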

@ggiguash (Contributor) commented Aug 23, 2022

For the LVM name issue, "rhel" is the hardcoded name at present. We have a USHIFT-168 ticket to allow storage configuration via /etc/microshift/config.yaml.

@jiridanek (Member, Author) commented Aug 23, 2022

Thanks, adding the firewall rule resolved it:

# /usr/local/bin/oc get pods --all-namespaces
NAMESPACE                  NAME                                  READY   STATUS    RESTARTS   AGE
openshift-dns              dns-default-5tsdh                     2/2     Running   2          61m
openshift-dns              node-resolver-cxb2v                   1/1     Running   1          61m
openshift-ingress          router-default-6795657dbc-6tqhb       1/1     Running   2          11m
openshift-ovn-kubernetes   ovnkube-master-zcbsl                  4/4     Running   4          61m
openshift-ovn-kubernetes   ovnkube-node-h22g5                    1/1     Running   1          61m
openshift-service-ca       service-ca-76649665b5-thmh8           1/1     Running   1          61m
openshift-storage          topolvm-controller-8479455f95-pvqls   4/4     Running   4          61m
openshift-storage          topolvm-node-26t6g                    4/4     Running   21         60m

I was following (outdated) docs for this step, at https://microshift.io/docs/getting-started/#deploying-microshift

I am looking forward to USHIFT-168 being resolved. Ideally, I'd hope that MicroShift is able to autodetect the default RHEL/Fedora/CentOS Stream settings and "just work", without me having to edit YAML files as part of a "getting started" quick install.
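
For illustration, one way such autodetection could work (a hypothetical sketch, not anything MicroShift does today: pick whatever volume group LVM reports and feed that to the storage configuration):

# hypothetical: detect the first volume group present on the host
vg=$(sudo vgs --noheadings -o vg_name | awk 'NR==1 {print $1}')
echo "lvmd would be configured with volume-group: ${vg}"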

@jiridanek changed the title from "[BUG] Microshift cannot be installed on CentOS Stream 8" to "[BUG] Microshift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)" on Aug 23, 2022
@ggiguash (Contributor)

We are currently focused on RHEL support, but we certainly plan to add Fedora/CentOS support for the community.

@ggiguash (Contributor)

/kind feature

@openshift-ci bot added the kind/feature label on Aug 23, 2022
@ggiguash (Contributor)

/retitle [Enhancement] MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)

@openshift-ci bot changed the title from "[BUG] Microshift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)" to "[Enhancement] MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)" on Aug 23, 2022
@ggiguash (Contributor) commented Aug 27, 2022

/retitle USHIFT-168: MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)

@openshift-ci bot changed the title from "[Enhancement] MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)" to "USHIFT-168: MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group)" on Aug 27, 2022
@dmc5179 commented Aug 29, 2022

This issue also happens on systems that have been converted to RHEL from CentOS. I converted a CentOS 7 system to RHEL using convert2rhel and then used leapp to upgrade from RHEL 7 to RHEL 8. I hit this exact same error.

@ggiguash (Contributor) commented Oct 24, 2022

/close
This problem was addressed by USHIFT-168.
The configuration feature is documented at https://github.com/openshift/microshift/blob/main/docs/default_csi_plugin.md#configuring-odf-lvm
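
For anyone landing here, a minimal sketch of such a configuration, assuming the lvmd device-class format and the /etc/microshift/lvmd.yaml path described in the linked doc (adjust volume-group to whatever vgs reports on your host):

sudo tee /etc/microshift/lvmd.yaml >/dev/null <<'EOF'
device-classes:
  - name: default
    volume-group: cs   # the VG that actually exists on this host
    default: true
    spare-gb: 10
EOF
sudo systemctl restart microshift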

@openshift-ci bot commented Oct 24, 2022

@ggiguash: Closing this issue.

In response to this:

/close
This problem was addressed by USHIFT-168.
The configuration feature is documented at https://github.com/openshift/microshift/blob/main/docs/default_csi_plugin.md#configuring-odf-lvm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci bot closed this as completed on Oct 24, 2022