From 20a8885bc6a5c815ec9965c66a6b202de7ad3686 Mon Sep 17 00:00:00 2001
From: Elana Hashman
Date: Thu, 12 Aug 2021 13:58:20 -0700
Subject: [PATCH 1/4] Update swap KEP for 1.23 beta

Fill out remaining beta PRR questions, add test plans
---
 keps/prod-readiness/sig-node/2400.yaml |  2 +
 keps/sig-node/2400-node-swap/README.md | 59 +++++++++++++++++++++++++-
 keps/sig-node/2400-node-swap/kep.yaml  |  4 +-
 3 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/keps/prod-readiness/sig-node/2400.yaml b/keps/prod-readiness/sig-node/2400.yaml
index 1eb33a410c5..4741875582b 100644
--- a/keps/prod-readiness/sig-node/2400.yaml
+++ b/keps/prod-readiness/sig-node/2400.yaml
@@ -1,3 +1,5 @@
 kep-number: 2400
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"
diff --git a/keps/sig-node/2400-node-swap/README.md b/keps/sig-node/2400-node-swap/README.md
index 7fa2b6aeea0..2d7570c4775 100644
--- a/keps/sig-node/2400-node-swap/README.md
+++ b/keps/sig-node/2400-node-swap/README.md
@@ -401,8 +401,12 @@ For alpha:
   and further development efforts.
 - Focus should be on supported user stories as listed above.

-Once this data is available, additional test plans should be added for the next
-phase of graduation.
+For beta:
+
+- Add e2e tests that exercise all available swap configurations via the CRI.
+- Add e2e tests that verify pod-level control of swap utilization.
+- Add e2e tests that verify swap performance with pods using a tmpfs.
+- Verify new system-reserved settings for swap memory.

 ### Graduation Criteria

@@ -587,6 +591,19 @@ Try to be as paranoid as possible - e.g., what if some components will restart
 mid-rollout?
 -->

+If a new node with swap memory fails to come online, it will not impact any
+running components.
+
+It is possible that if a cluster administrator adds swap memory to an already
+running node, and then performs an in-place upgrade, the new kubelet could fail
+to start unless the configuration was modified to tolerate swap. However, we
+would expect that if a cluster admin is adding swap to the node, they will also
+update the kubelet's configuration to not fail with swap present.
+
+Generally, it is considered best practice to add a swap memory partition at
+node image/boot time and not provision it dynamically after a kubelet is
+already running and reporting Ready on a node.
+
 ###### What specific metrics should inform a rollback?

+Workload churn or performance degradations on nodes. The metrics will be
+application/use-case specific, but we can provide some suggestions.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

+N/A because swap support lacks a runtime upgrade/downgrade path; kubelet must
+be restarted with or without swap support.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

+No.
+
 ### Monitoring Requirements

+KubeletConfiguration has set `failSwapOn: false`.
+
+The Prometheus `node_exporter` will also export stats on swap memory
+utilization.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

+TBD. We will determine a set of metrics as part of beta graduation. We will
+need more data; there is not a single metric or set of metrics that can be used
+to generally quantify node performance.
+
 - [ ] Metrics
   - Metric name:
   - [Optional] Aggregation method:
@@ -647,6 +681,8 @@ high level (needs more precise definitions) those may be things like:
   - 99,9% of /health requests per day finish with 200 code
 -->

+N/A
+
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

+N/A
+
 ### Dependencies

+
+Individual nodes with swap memory enabled may experience performance
+degradations under load. This could potentially cause a cascading failure on
+nodes without swap: if nodes with swap fail Ready checks, workloads may be
+rescheduled en masse.
+
+Thus, cluster administrators should be careful while enabling swap. To minimize
+disruption, you may want to taint nodes with swap available to protect against
+this problem. Taints will ensure that workloads which tolerate swap will not
+spill onto nodes without swap under load.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?

+It is suggested that if nodes with swap memory enabled cause performance or
+stability degradations, those nodes are cordoned, drained, and replaced with
+nodes that do not use swap memory.
+
 ## Implementation History

 - **2015-04-24:** Discussed in [#7294](https://github.com/kubernetes/kubernetes/issues/7294).
diff --git a/keps/sig-node/2400-node-swap/kep.yaml b/keps/sig-node/2400-node-swap/kep.yaml
index 1ecf7efa74f..322f413ecbd 100644
--- a/keps/sig-node/2400-node-swap/kep.yaml
+++ b/keps/sig-node/2400-node-swap/kep.yaml
@@ -20,12 +20,12 @@ prr-approvers:
   - "@deads2k"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.23"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:

From 18a9bee462c438918b594c773a881ea91cb729ff Mon Sep 17 00:00:00 2001
From: Elana Hashman
Date: Wed, 8 Sep 2021 11:22:32 -0700
Subject: [PATCH 2/4] Address PRR feedback

---
 keps/sig-node/2400-node-swap/README.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/keps/sig-node/2400-node-swap/README.md b/keps/sig-node/2400-node-swap/README.md
index 2d7570c4775..fd70b211ca3 100644
--- a/keps/sig-node/2400-node-swap/README.md
+++ b/keps/sig-node/2400-node-swap/README.md
@@ -420,8 +420,6 @@ For beta:

 #### Beta

-_(Tentative.)_
-
 - Add support for controlling swap consumption at the pod level [via cgroups].
 - Handle usage of swap during container restart boundaries for writes to tmpfs
   (which may require pod cgroup change beyond what container runtime will do at
@@ -441,6 +439,8 @@ _(Tentative.)_

 #### GA

+_(Tentative.)_
+
 - Test a wide variety of scenarios that may be affected by swap support.
 - Remove feature flag.

@@ -612,7 +612,8 @@ that might indicate a serious problem?
 -->

 Workload churn or performance degradations on nodes. The metrics will be
-application/use-case specific, but we can provide some suggestions.
+application/use-case specific, but we can provide some suggestions, based on
+the stability metrics identified earlier.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -658,9 +659,12 @@ utilization.
 Pick one more of these and delete the rest.
 -->

-TBD. We will determine a set of metrics as part of beta graduation. We will
-need more data; there is not a single metric or set of metrics that can be used
-to generally quantify node performance.
+TBD. We will determine a set of metrics as a requirement for beta graduation.
+We will need more production data; there is not a single metric or set of
+metrics that can be used to generally quantify node performance.
+
+This section to be updated before the feature can be marked as graduated, and
+to be worked on during 1.23 development.

 - [ ] Metrics
   - Metric name:

From 8277301aae07f6bab8208ae495a63078af7e1057 Mon Sep 17 00:00:00 2001
From: Elana Hashman
Date: Wed, 8 Sep 2021 12:08:15 -0700
Subject: [PATCH 3/4] Add test plan note for eviction manager/MemoryPressure

---
 keps/sig-node/2400-node-swap/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/keps/sig-node/2400-node-swap/README.md b/keps/sig-node/2400-node-swap/README.md
index fd70b211ca3..794b0e0e1b3 100644
--- a/keps/sig-node/2400-node-swap/README.md
+++ b/keps/sig-node/2400-node-swap/README.md
@@ -407,6 +407,8 @@ For beta:
 - Add e2e tests that verify pod-level control of swap utilization.
 - Add e2e tests that verify swap performance with pods using a tmpfs.
 - Verify new system-reserved settings for swap memory.
+- Verify MemoryPressure behaviour with swap enabled and document any changes
+  for configuring eviction.

 ### Graduation Criteria

From 908266a9c45cafcb7ffdc5b81c60228f7c8fb59a Mon Sep 17 00:00:00 2001
From: Elana Hashman
Date: Wed, 8 Sep 2021 13:29:28 -0700
Subject: [PATCH 4/4] Add swap memory to Kubelet stats API

---
 keps/sig-node/2400-node-swap/README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/keps/sig-node/2400-node-swap/README.md b/keps/sig-node/2400-node-swap/README.md
index 794b0e0e1b3..c3dc4faac1e 100644
--- a/keps/sig-node/2400-node-swap/README.md
+++ b/keps/sig-node/2400-node-swap/README.md
@@ -430,6 +430,7 @@ For beta:
   detects on the host.
 - Consider introducing new configuration modes for swap, such as a node-wide
   swap limit for workloads.
+- Add swap memory to the Kubelet stats API.
 - Determine a set of metrics for node QoS in order to evaluate the performance
   of nodes with and without swap enabled.
 - Better understand relationship of swap with memory QoS in cgroup v2
@@ -668,6 +669,8 @@ metrics that can be used to generally quantify node performance.
 This section to be updated before the feature can be marked as graduated, and
 to be worked on during 1.23 development.

+We will also add swap memory utilization to the Kubelet stats API, to provide a means of monitoring this beyond cAdvisor Prometheus stats.
+
 - [ ] Metrics
   - Metric name:
   - [Optional] Aggregation method:
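
Reviewer note on the swap utilization figure that the last patch proposes to expose: on Linux it can be derived from the `SwapTotal` and `SwapFree` fields of `/proc/meminfo`. The Go sketch below is illustrative only and assumes that approach. `SwapStats` and `ParseSwapStats` are hypothetical names, not part of the kubelet, cAdvisor, or the stats API described in this KEP.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// SwapStats is a hypothetical container for swap figures in KiB,
// mirroring the SwapTotal and SwapFree fields of /proc/meminfo.
type SwapStats struct {
	TotalKiB uint64
	FreeKiB  uint64
}

// UsedKiB returns the amount of swap currently in use.
func (s SwapStats) UsedKiB() uint64 {
	if s.FreeKiB > s.TotalKiB {
		return 0
	}
	return s.TotalKiB - s.FreeKiB
}

// ParseSwapStats extracts SwapTotal and SwapFree from meminfo-formatted text.
// It returns an error if either field is absent or not a valid number.
func ParseSwapStats(meminfo string) (SwapStats, error) {
	var stats SwapStats
	var haveTotal, haveFree bool

	scanner := bufio.NewScanner(strings.NewReader(meminfo))
	for scanner.Scan() {
		// A meminfo line looks like "SwapTotal:  2097148 kB".
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "SwapTotal:":
			v, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return SwapStats{}, fmt.Errorf("parsing SwapTotal: %w", err)
			}
			stats.TotalKiB, haveTotal = v, true
		case "SwapFree:":
			v, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return SwapStats{}, fmt.Errorf("parsing SwapFree: %w", err)
			}
			stats.FreeKiB, haveFree = v, true
		}
	}
	if err := scanner.Err(); err != nil {
		return SwapStats{}, err
	}
	if !haveTotal || !haveFree {
		return SwapStats{}, fmt.Errorf("meminfo is missing SwapTotal or SwapFree")
	}
	return stats, nil
}

func main() {
	// Sample input; a real caller would read /proc/meminfo instead.
	sample := "MemTotal: 16384000 kB\nSwapTotal: 2097148 kB\nSwapFree: 1048574 kB\n"
	stats, err := ParseSwapStats(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("swap used: %d KiB of %d KiB\n", stats.UsedKiB(), stats.TotalKiB)
}
```

Parsing from a string rather than opening the file directly keeps the function testable on hosts without swap; the node-level numbers produced this way would still need to be reconciled with whatever the stats API ends up reporting.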