Core: reduce memory footprint #982

benashz · 2024-12-09T23:52:22Z

Previously, VSO was caching and registering full object watchers on K8s Secrets. While watching K8s Secrets is necessary for automated remediation in the case where a destination secret is deleted from the cluster, doing so can result in OOM conditions for the operator, since each Secret's data contributes to the operator's total memory.

This change does the following:

disables the caching K8s Secrets in the manager's client.
only watches for Secret metadata changes.

Remaining tasks:

Add integration regression tests - see Add OOM integration test #984

Reproduction steps:
Run:

kind create cluster --name oom
helm install --wait --namespace vault-secrets-operator-system --create-namespace vault-secrets-operator hashicorp/vault-secrets-operator
kubectl create namespace oom-secrets
dd if=/dev/urandom of=1m.random bs=1m count=1
# should be enough to trigger the OOM killer
for n in {0..90}; do kubectl create --namespace oom-secrets secret generic sec-$n --from-file /tmp/1m.random ; done

Wait for the manager to be OOM killed

Test:

make docker-build load-docker-image KIND_CLUSTER_NAME=oom
helm upgrade --set controller.manager.image.tag=0.0.0-dev --namespace vault-secrets-operator-system --create-namespace vault-secrets-operator hashicorp/vault-secrets-operator

Wait for VSO to come up, it should not be OOM killed

The previous client cache locking scheme was not thread safe, and allocated more locks than are typically needed. This change replaces that approach by using the KeyMutex provided by the k8s utils package. Locks are now a pooled resource. Other fixes - update invalid Bitnami Helm chart repo for postgres. We should phasing out its use. - Bump TF Helm to latest version

Previously, VSO was caching and registering full object watchers on K8s Secrets. While watching K8s Secrets is necessary for automated remediation in the case where a destination secret is deleted from the cluster, doing so can result in OOM conditions for the operator, since each Secret's data contributes to the operator's total memory. This change does the following: - disables the caching K8s Secrets in the manager's client. - only watches for Secret metadata changes.

controllers/hcpvaultsecretsapp_controller.go

main.go

tvoran · 2024-12-10T07:02:19Z

The doc string for WatchesMetadata makes it sound like we should use metav1.PartialObjectMetadata{} instead of corev1.Secret{} for Get's where possible. I'm wondering if we need to do that for functions like getSecretExists()?

benashz · 2024-12-10T14:54:15Z

The doc string for WatchesMetadata makes it sound like we should use metav1.PartialObjectMetadata{} instead of corev1.Secret{} for Get's where possible. I'm wondering if we need to do that for functions like getSecretExists()?

Yeah, that might make those calls a bit lighter weight. We can probably address that in follow-on work.

benashz added 4 commits November 25, 2024 16:04

demo: update the postgres chart URL

40b5ec3

lint bats tests

209d8a3

benashz requested a review from a team as a code owner December 9, 2024 23:52

benashz linked an issue Dec 9, 2024 that may be closed by this pull request

VSO is getting OOMKilled on OpenShift cluster #973

Closed

benashz modified the milestones: v0.9.0, v0.9.1 Dec 10, 2024

benashz self-assigned this Dec 10, 2024

benashz requested a review from tvoran December 10, 2024 00:09

tvoran approved these changes Dec 10, 2024

View reviewed changes

controllers/hcpvaultsecretsapp_controller.go Show resolved Hide resolved

main.go Show resolved Hide resolved

Document why we are using WatchesMetadata() for Secrets

9f2f611

benashz linked an issue Dec 10, 2024 that may be closed by this pull request

Vault Secrets Operator memory usage way too high #969

Closed

Base automatically changed from VAULT-32632/client-cache-use-k8s-utils-keymutex to main December 10, 2024 22:25

Merge branch 'main' into VAULT-32632/fix-secrets-memory-usage

279c93b

benashz enabled auto-merge (squash) December 10, 2024 22:29

benashz merged commit 56e29b3 into main Dec 10, 2024
43 checks passed

benashz deleted the VAULT-32632/fix-secrets-memory-usage branch December 10, 2024 22:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Core: reduce memory footprint #982

Core: reduce memory footprint #982

benashz commented Dec 9, 2024 •

edited

Loading

tvoran commented Dec 10, 2024

benashz commented Dec 10, 2024

Core: reduce memory footprint #982

Core: reduce memory footprint #982

Conversation

benashz commented Dec 9, 2024 • edited Loading

tvoran commented Dec 10, 2024

benashz commented Dec 10, 2024

benashz commented Dec 9, 2024 •

edited

Loading