WIP: Components downstream of the pod worker should use it for state #115342

smarterclayton · 2023-01-26T18:33:53Z

This is a continuation of #113145 to ensure that components react to "actual" pods (pods that can be running on the node) instead of the "desired" pods for the node (any pod scheduled to the node). A significant realization over the last few years is that pods can have a significant lifecycle after they are force deleted: the control plane is unaware of those pods, but nodes must ensure cleanup and accurately account for the resources used by those still-running pods. Our first step was to make the pod worker the source of truth for the runtime state of pods and to have components like the volume manager check the pod worker state machine for each pod before taking action (pods that are terminating may still have running containers, pods that are terminated have no running containers, and so on). We then ensured that the kubelet status manager correctly reported the transition of pods from non-terminal to terminal phase after the pod reached specific lifecycle phases.
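As a rough illustration of the "desired" versus "actual" distinction, the sketch below models a downstream component asking a pod-worker-like object whether a pod may still have running containers. All type and function names here are hypothetical and simplified; this is not the kubelet's real API, and treating unknown UIDs as "not running" is an assumption of the sketch.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of the pod worker state machine.
type podState int

const (
	syncing     podState = iota // containers may be starting or running
	terminating                 // shutdown requested; containers may still be running
	terminated                  // no running containers remain
)

// podWorkers is a hypothetical stand-in for the kubelet pod worker, keyed by pod UID.
type podWorkers struct {
	states map[string]podState
}

// couldHaveRunningContainers reports whether a pod may still be using node
// resources. Treating unknown UIDs as "not running" is an assumption of this sketch.
func (w *podWorkers) couldHaveRunningContainers(uid string) bool {
	state, known := w.states[uid]
	return known && state != terminated
}

func main() {
	// Only pod-a is still desired; pod-b was force deleted but is still shutting down.
	desired := map[string]bool{"pod-a": true}
	workers := &podWorkers{states: map[string]podState{
		"pod-a": syncing,
		"pod-b": terminating,
		"pod-c": terminated,
	}}
	for _, uid := range []string{"pod-a", "pod-b", "pod-c"} {
		fmt.Printf("%s desired=%t mayHaveRunningContainers=%t\n",
			uid, desired[uid], workers.couldHaveRunningContainers(uid))
	}
}
```

In this sketch pod-b is the interesting case: it is no longer desired, yet a component such as a volume manager would still need to wait before cleaning up after it.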

Most recently, we addressed numerous issues in static pod shutdown to ensure static pods are shut down gracefully, since multiple ecosystem distributions depend on static pods for critical lifecycle behavior. A static pod, when updated, changes UID, but two static pods with the same fullname cannot be running at the same time in the Kubelet. This means that components must continue to be aware of the previous version of an updated static pod until it reaches final termination, and do so consistently.
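The static pod constraint can be pictured with a small sketch: a tracker that lets only one UID per fullname run at a time and admits the updated pod only after the old UID has fully terminated. The `staticPodTracker` type, its methods, and the example fullname and UIDs are hypothetical and exist only for illustration; the real kubelet bookkeeping is more involved.

```go
package main

import "fmt"

// staticPodTracker is a hypothetical helper that records which UID currently
// holds each static pod fullname.
type staticPodTracker struct {
	running map[string]string // fullname -> UID allowed to run
}

// allowStart returns true if uid may run under fullName. It returns false
// while a different UID with the same fullname has not yet terminated.
func (t *staticPodTracker) allowStart(fullName, uid string) bool {
	current, held := t.running[fullName]
	if held {
		return current == uid
	}
	t.running[fullName] = uid
	return true
}

// markTerminated releases fullName once the holding UID has reached final termination.
func (t *staticPodTracker) markTerminated(fullName, uid string) {
	if t.running[fullName] == uid {
		delete(t.running, fullName)
	}
}

func main() {
	tracker := &staticPodTracker{running: map[string]string{}}
	const name = "etcd_kube-system" // example fullname, chosen for illustration

	fmt.Println(tracker.allowStart(name, "uid-old")) // first version starts
	fmt.Println(tracker.allowStart(name, "uid-new")) // update must wait
	tracker.markTerminated(name, "uid-old")
	fmt.Println(tracker.allowStart(name, "uid-new")) // update may now start
}
```

Until `markTerminated` is called for the old UID, anything that reports on or cleans up after the pod still has to know about that old UID, which is the consistency requirement described above.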

#113145 fixes an issue that needs state, but #114994 identified the general class of problem represented by downstreams using "pod manager" (the "desired state of pods") vs "pod workers" (the "actual state of pods"). This PR cleans up the pod manager interface, separates various use cases, and then enables the status manager to consult the pod worker, rather than the pod manager, as the source of truth for pods.
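One way to picture the separation is a consumer that depends on two narrow, read-only interfaces, one for desired state and one for actual state. The sketch below is only an assumption about the shape of such a split: `desiredPods`, `actualPods`, `statusReporter`, and the map-backed implementations are hypothetical names, not the interfaces this PR introduces.

```go
package main

import "fmt"

// pod is a hypothetical minimal pod record.
type pod struct{ UID, Name string }

// desiredPods is a read-only view of the pods the node has been asked to run.
type desiredPods interface {
	GetPodByUID(uid string) (pod, bool)
}

// actualPods is a read-only view of runtime state, as a pod worker might expose it.
type actualPods interface {
	CouldHaveRunningContainers(uid string) bool
}

// statusReporter is a hypothetical consumer that consults both views.
type statusReporter struct {
	desired desiredPods
	actual  actualPods
}

// canReportTerminal is true only once the runtime view says nothing is running.
func (r statusReporter) canReportTerminal(uid string) bool {
	return !r.actual.CouldHaveRunningContainers(uid)
}

// isOrphaned is true for pods that are no longer desired but may still be running.
func (r statusReporter) isOrphaned(uid string) bool {
	_, wanted := r.desired.GetPodByUID(uid)
	return !wanted && r.actual.CouldHaveRunningContainers(uid)
}

// Map-backed implementations used only for this example.
type mapDesired map[string]pod

func (m mapDesired) GetPodByUID(uid string) (pod, bool) { p, ok := m[uid]; return p, ok }

type mapActual map[string]bool

func (m mapActual) CouldHaveRunningContainers(uid string) bool { return m[uid] }

func main() {
	r := statusReporter{
		desired: mapDesired{"uid-1": {UID: "uid-1", Name: "web"}},
		actual:  mapActual{"uid-1": true, "uid-2": true},
	}
	fmt.Println(r.isOrphaned("uid-1"), r.isOrphaned("uid-2"))
	fmt.Println(r.canReportTerminal("uid-2"), r.canReportTerminal("uid-3"))
}
```

Keeping mutation out of these interfaces means only the component that owns the state can change it, while downstream readers can be handed either view explicitly.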

TODO:

Need e2e test demonstrating the problem

/kind bug

k8s-ci-robot · 2023-01-26T18:33:55Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rphillips · 2023-01-26T22:58:40Z

pkg/kubelet/status/status_manager_test.go

+type mutablePodManager interface {
+	AddPod(*v1.Pod)
+	UpdatePod(*v1.Pod)
+	DeletePod(*v1.Pod)


Other areas of the Kubelet use RemovePod terminology... Probably makes sense here.

I see 7 mentions of RemovePod, and slightly more for pod manager's DeletePod. Kubelet config loop uses DeletePods to mean a "pod has been requested for deletion", and "RemovePods" to mean "pod is gone".

Agree Remove should be used, I'll add a new commit.

smarterclayton · 2023-01-30T15:31:39Z

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/115342/pull-kubernetes-node-e2e-containerd/1618678637907677184/ is suspicious

smarterclayton · 2023-01-30T15:35:35Z

/retest

cici37 · 2023-01-31T21:05:24Z

/triage accepted

We can drop this patch after the following two PRs merge (or their equivalent):

* kubernetes#115342
* kubernetes#113145

UPSTREAM: <carry>: kubelet: fix readiness probes with pod termination

k8s-ci-robot requested review from caesarxuchao, cheftako and a team January 26, 2023 18:35

smarterclayton mentioned this pull request Jan 26, 2023

kubelet: Force deleted pods can fail to move out of terminating #113145

Merged

5 tasks

rphillips reviewed Jan 26, 2023


haircommander mentioned this pull request Jan 9, 2025

crio-<containerID>.scope succeeded cri-o/cri-o#8904

Open

rphillips Jan 26, 2023

smarterclayton Apr 14, 2023
