Some kube-system services stuck in Pending state #6

Closed
darvelo opened this issue Sep 17, 2023 · 4 comments

darvelo commented Sep 17, 2023

Any idea why kube-dns, hubble, and others would be stuck in Pending state?

kubectl -n kube-system get pods gives:

NAME                                                       READY   STATUS    RESTARTS   AGE
anetd-cf8jz                                                1/1     Running   0          13m
anetd-q5vzr                                                1/1     Running   0          13m
anetd-rzk7g                                                1/1     Running   0          13m
antrea-controller-horizontal-autoscaler-7b69d9bfd7-f82m6   0/1     Pending   0          13m
event-exporter-gke-7bf6c99dcb-grmz7                        0/2     Pending   0          13m
filestore-node-4vd54                                       3/3     Running   0          13m
filestore-node-86dbn                                       3/3     Running   0          13m
filestore-node-dssdr                                       3/3     Running   0          13m
fluentbit-gke-f9hh9                                        2/2     Running   0          13m
fluentbit-gke-m2hqb                                        2/2     Running   0          13m
fluentbit-gke-wscl5                                        2/2     Running   0          13m
gke-metadata-server-2q8q5                                  1/1     Running   0          13m
gke-metadata-server-5xgg5                                  1/1     Running   0          13m
gke-metadata-server-hmz6s                                  1/1     Running   0          13m
hubble-generate-certs-init-64mnp                           0/1     Pending   0          13m
hubble-relay-677f85b964-v2cxd                              0/2     Pending   0          14m
konnectivity-agent-autoscaler-5d9dbcc6d8-swvst             0/1     Pending   0          14m
konnectivity-agent-fb695849d-6ks95                         0/1     Pending   0          13m
konnectivity-agent-fb695849d-hdq7q                         0/1     Pending   0          14m
konnectivity-agent-fb695849d-qvck9                         0/1     Pending   0          13m
kube-dns-7f58849488-rngxv                                  0/3     Pending   0          13m
kube-dns-7f58849488-rtb7g                                  0/3     Pending   0          14m
kube-dns-autoscaler-84b8db4dc7-4qpmx                       0/1     Pending   0          13m
l7-default-backend-d86c96845-6mhrm                         0/1     Pending   0          14m
metrics-server-v0.5.2-8569bc4cf9-rt26w                     0/2     Pending   0          14m
netd-74jz8                                                 1/1     Running   0          13m
netd-ckswg                                                 1/1     Running   0          13m
netd-k6pzk                                                 1/1     Running   0          13m
pdcsi-node-csvx5                                           2/2     Running   0          13m
pdcsi-node-n46x7                                           2/2     Running   0          13m
pdcsi-node-xvqkx                                           2/2     Running   0          13m

Running kubectl -n kube-system describe pod on hubble-generate-certs-init-64mnp, hubble-relay-677f85b964-v2cxd, and the kube-dns pods returns:

Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   16m (x2 over 16m)   default-scheduler   no nodes available to schedule pods
  Normal   NotTriggerScaleUp  16m                 cluster-autoscaler  pod didn't trigger scale-up:
  Warning  FailedScheduling   16m                 default-scheduler   0/1 nodes are available: 1 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   NotTriggerScaleUp  95s (x84 over 15m)  cluster-autoscaler  pod didn't trigger scale-up: 1 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}
  Warning  FailedScheduling   9s (x3 over 11m)    default-scheduler   0/3 nodes are available: 3 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
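
A quick way to double-check which taint is blocking scheduling is to read it straight off the nodes (a minimal check with plain kubectl; <node-name> is a placeholder):

# List the taint keys on every node -- the scheduler events above point at
# node.cilium.io/agent-not-ready, so it should show up here.
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# Full key/value/effect detail for a single node:
kubectl describe node <node-name> | grep -A5 Taints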

kubectl exec -it -n kube-system deployment/hubble-relay -c hubble-cli -- hubble gives:

Error from server (BadRequest): pod hubble-relay-677f85b964-v2cxd does not have a host assigned

My config vars are:

dataplane_v2_enabled = true
enable_dpv2_hubble   = true
machine_type       = "e2-standard-2"
preemptible        = false
disk_size_gb       = 40
initial_node_count = 3
min_nodes          = 3
max_nodes          = 6

Strange, because kubectl get nodes shows:

NAME                                               STATUS   ROLES    AGE   VERSION
gke-cluster-nodepool-d5a1f7ad-cf52   Ready    <none>   26m   v1.27.3-gke.100
gke-cluster-nodepool-d5a1f7ad-gm5c   Ready    <none>   26m   v1.27.3-gke.100
gke-cluster-nodepool-d5a1f7ad-pwhp   Ready    <none>   26m   v1.27.3-gke.100

So it seems like the nodes are up and running in my zonal cluster.


darvelo commented Sep 17, 2023

Commenting out the Cilium taint in terraform.tfvars seems to have fixed the issue. All the pods and hubble-ui appear to be running well now.

Maybe it's not needed since Dataplane V2 comes with Cilium, or maybe GCP changed the taint key. I couldn't find much documentation on this beyond https://docs.cilium.io/en/stable/installation/taints/, which recommends NoExecute over NoSchedule, but I had already removed the taint before I could test whether NoExecute would work.
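
On nodes that were created while the taint was still in the config, the taint can also be cleared by hand rather than waiting on another Terraform run; this is just standard kubectl taint-removal syntax (the trailing dash means "remove the taint with this key"):

# Clear the leftover startup taint from every node in the cluster:
kubectl taint nodes --all node.cilium.io/agent-not-ready-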

@Neutrollized (Owner) commented:

Oh yes, my bad: in my recent additions of the GKE DPV2 observability tools, I added those settings to the sample terraform.tfvars file and forgot I had a taint in there. dataplane_v2_enabled = true and the taint example are mutually exclusive: the taint is needed only if you're going to install open-source Cilium yourself. Enabling DPV2 has the GKE cluster come with a stripped-down, downstream version of Cilium pre-installed. I'll fix this in the next update. Thanks!
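
If you want to confirm a cluster is already running the managed Dataplane V2 Cilium (and therefore doesn't need the open-source install or its taint), something like this should do it -- CLUSTER_NAME/ZONE are placeholders, and the label selector assumes GKE's usual k8s-app=cilium labelling on the anetd pods:

# ADVANCED_DATAPATH here means Dataplane V2 is enabled:
gcloud container clusters describe CLUSTER_NAME --zone ZONE \
  --format='value(networkConfig.datapathProvider)'

# The managed Cilium agent runs as the anetd DaemonSet:
kubectl -n kube-system get pods -l k8s-app=cilium -o wide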

@Neutrollized (Owner) commented:

I pushed the updates in v0.14.1. Thanks for letting me know! (also updated the proxy subnet purpose setting to reflect the new name)
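
For reference, the "new name" is presumably the REGIONAL_MANAGED_PROXY subnet purpose, which replaced the older INTERNAL_HTTPS_LOAD_BALANCER value for proxy-only subnets. A rough equivalent when creating one by hand (the subnet name, region, network, and range below are placeholders):

# Proxy-only subnet for regional load balancing, using the current purpose value:
gcloud compute networks subnets create proxy-only-subnet \
  --purpose=REGIONAL_MANAGED_PROXY \
  --role=ACTIVE \
  --region=us-central1 \
  --network=my-network \
  --range=10.129.0.0/23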


darvelo commented Sep 18, 2023

Thanks @Neutrollized! 👍🏽
