Complete cluster malfunction after attempting to promote node to control plane #10587

Closed
craigcabrey opened this issue Jul 27, 2024 · 2 comments

Comments

@craigcabrey

I seem to have some bad state in my etcd db that I can't track down. It manifests as the control plane nodes continuously crash looping on an "integer divide by zero" panic in the scheduler. I can't get the cluster back into a healthy state.

Environmental Info:
K3s Version: v1.29.5+k3s1

root@ms01-node-5:/etc/systemd/system# k3s -v
k3s version v1.29.5+k3s1 (4e53a323)
go version go1.21.9

Node(s) CPU architecture, OS, and Version:

Linux ms01-node-5 6.8.11-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024 x86_64 GNU/Linux

Cluster Configuration:

10 nodes, all running CoreOS

$ cat /etc/rancher/k3s/config.yaml
server: https://[fdb5:12c1:f8cb:0:b2e2:3565:cff3:6cf6]:6443
embedded-registry: true
secrets-encryption: true
disable:
  - servicelb
  - traefik

cluster-cidr: fdb5:12c1:f8cb:dead:beaf::/96,10.42.0.0/16
service-cidr: fdb5:12c1:f8cb:dead:c0de::/108,10.43.0.0/16

flannel-backend: wireguard-native
flannel-ipv6-masq: true
flannel-iface: internal

node-ip: fdb5:12c1:f8cb:0:f67:d816:2aea:e002,192.168.7.35

kubelet-arg:
  - "node-ip=::"

kube-controller-manager-arg:
  - node-cidr-mask-size-ipv6=108

# https://docs.k3s.io/cli/server#listeners
tls-san: k8s.internal.lan
write-kubeconfig-mode: "0644"

Describe the bug:

Steps To Reproduce:

  • Installed K3s:

Expected behavior:
The control plane should not crash loop.

Actual behavior:
Control plane continuously crash loops

Additional context / logs:

Jul 27 23:39:57 venus-node-3 k3s[231025]: I0727 23:39:57.063808  231025 leaderelection.go:260] successfully acquired lease kube-system/kube-scheduler
Jul 27 23:39:57 venus-node-3 k3s[231025]: E0727 23:39:57.064948  231025 runtime.go:79] Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)
Jul 27 23:39:57 venus-node-3 k3s[231025]: goroutine 171292 [running]:
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x56df280?, 0xa49e700})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/runtime/runtime.go:75 +0x85
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/runtime/runtime.go:49 +0x6b
Jul 27 23:39:57 venus-node-3 k3s[231025]: panic({0x56df280?, 0xa49e700?})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /usr/local/go/src/runtime/panic.go:914 +0x21f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, {0x701faa0, 0xc01f57be00}, 0xc037850a90?, 0xc02cd0fb00)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:503 +0x9f1
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, {0x701faa0, 0xc01f57be00}, 0x0?, 0xc02cd0fb00)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:400 +0x33f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, 0x2?, {0x701faa0, 0xc01f57be00}, 0xc004df7220, {0xc1a17f3343ddf3fc, 0x13c3e27219, 0xa6cc5e0}, ...)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:150 +0xf3
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0xc01fc78900, {0x6fb82f0, 0xc0384255e0})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:112 +0x5cd
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:259 +0x22
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc040ed1e80?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:226 +0x33
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc048b451d0?, {0x6f62c00, 0xc048b682d0}, 0x1, 0xc047337f20)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:227 +0xaf
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc1a17f3343a3ffd1?, 0x0, 0x0, 0x10?, 0xc1c9bb?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:204 +0x7f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x6fb82f0, 0xc0384255e0}, 0xc048b58850, 0x2b55702?, 0xc01942ef68?, 0x80?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:259 +0x93
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x6fb82f0?, 0xc0384255e0?}, 0xc0482fbc20?, 0xc01942efb8?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:170 +0x25
Jul 27 23:39:57 venus-node-3 k3s[231025]: created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 171357
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/scheduler.go:414 +0xf6
Jul 27 23:39:57 venus-node-3 k3s[231025]: panic: runtime error: integer divide by zero [recovered]
Jul 27 23:39:57 venus-node-3 k3s[231025]:         panic: runtime error: integer divide by zero
Jul 27 23:39:57 venus-node-3 k3s[231025]: goroutine 171292 [running]:
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/runtime/runtime.go:56 +0xcd
Jul 27 23:39:57 venus-node-3 k3s[231025]: panic({0x56df280?, 0xa49e700?})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /usr/local/go/src/runtime/panic.go:914 +0x21f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, {0x701faa0, 0xc01f57be00}, 0xc037850a90?, 0xc02cd0fb00)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:503 +0x9f1
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, {0x701faa0, 0xc01f57be00}, 0x0?, 0xc02cd0fb00)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:400 +0x33f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0xc01fc78900, {0x6fb82f0, 0xc031d15360}, 0x2?, {0x701faa0, 0xc01f57be00}, 0xc004df7220, {0xc1a17f3343ddf3fc, 0x13c3e27219, 0xa6cc5e0}, ...)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:150 +0xf3
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0xc01fc78900, {0x6fb82f0, 0xc0384255e0})
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/schedule_one.go:112 +0x5cd
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:259 +0x22
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc040ed1e80?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:226 +0x33
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc048b451d0?, {0x6f62c00, 0xc048b682d0}, 0x1, 0xc047337f20)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:227 +0xaf
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc1a17f3343a3ffd1?, 0x0, 0x0, 0x10?, 0xc1c9bb?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:204 +0x7f
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x6fb82f0, 0xc0384255e0}, 0xc048b58850, 0x2b55702?, 0xc01942ef68?, 0x80?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:259 +0x93
Jul 27 23:39:57 venus-node-3 k3s[231025]: k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x6fb82f0?, 0xc0384255e0?}, 0xc0482fbc20?, 0xc01942efb8?)
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.29.5-k3s1/pkg/util/wait/backoff.go:170 +0x25
Jul 27 23:39:57 venus-node-3 k3s[231025]: created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 171357
Jul 27 23:39:57 venus-node-3 k3s[231025]:         /go/pkg/mod/github.com/k3s-io/kubernetes@v1.29.5-k3s1/pkg/scheduler/scheduler.go:414 +0xf6
Jul 27 23:39:57 venus-node-3 systemd[1]: k3s.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 27 23:39:57 venus-node-3 systemd[1]: k3s.service: Failed with result 'exit-code'.
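
For readers unfamiliar with this kind of Go panic: the trace above ends in `findNodesThatFitPod` with "integer divide by zero". One common way that arises in Go is a division or modulo whose divisor is a count that turns out to be zero at runtime. The sketch below is only an illustration under that assumption; `nextStartIndex`, `unguardedNextStartIndex`, and `panicMessage` are hypothetical helpers, not the scheduler's actual code.

```go
package main

import "fmt"

// unguardedNextStartIndex is a hypothetical helper (not scheduler code).
// It advances a round-robin cursor and panics when numNodes is 0,
// because Go raises a runtime error on integer modulo by zero.
func unguardedNextStartIndex(current, processed, numNodes int) int {
	return (current + processed) % numNodes
}

// nextStartIndex is the guarded variant: with no candidate nodes it
// returns 0 instead of dividing by zero.
func nextStartIndex(current, processed, numNodes int) int {
	if numNodes == 0 {
		return 0
	}
	return (current + processed) % numNodes
}

// panicMessage runs f and returns the recovered panic message,
// or "" if f completed without panicking.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	// Unguarded call with zero nodes: the panic is recovered and printed.
	fmt.Println(panicMessage(func() { unguardedNextStartIndex(3, 2, 0) }))
	// Guarded calls: zero nodes, then a normal wrap-around.
	fmt.Println(nextStartIndex(3, 2, 0))
	fmt.Println(nextStartIndex(3, 2, 4))
}
```

In the logs above the panic is not recovered, so the whole k3s process exits and systemd restarts it, which would explain the crash loop.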
@James4Ever0

@craigcabrey
Author

Yeah, thanks. I stumbled across it and can report back that moving from 1.29.5 -> 1.29.6 has allowed me to put the pieces back together. I still have one node that is refusing to rejoin, but that is probably a separate issue.

@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Jul 28, 2024