Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: reduce memory footprint #982

Merged
merged 6 commits into from
Dec 10, 2024
Merged

Conversation

benashz
Copy link
Collaborator

@benashz benashz commented Dec 9, 2024

Previously, VSO was caching and registering full object watchers on K8s Secrets. While watching K8s Secrets is necessary for automated remediation in the case where a destination secret is deleted from the cluster, doing so can result in OOM conditions for the operator, since each Secret's data contributes to the operator's total memory.

This change does the following:

  • disables the caching K8s Secrets in the manager's client.
  • only watches for Secret metadata changes.

Remaining tasks:

Reproduction steps:
Run:

kind create cluster --name oom
helm install --wait --namespace vault-secrets-operator-system --create-namespace vault-secrets-operator hashicorp/vault-secrets-operator
kubectl create namespace oom-secrets
dd if=/dev/urandom of=1m.random bs=1m count=1
# should be enough to trigger the OOM killer
for n in {0..90}; do kubectl create --namespace oom-secrets secret generic sec-$n --from-file /tmp/1m.random ; done

Wait for the manager to be OOM killed

Test:

make docker-build load-docker-image KIND_CLUSTER_NAME=oom
helm upgrade --set controller.manager.image.tag=0.0.0-dev --namespace vault-secrets-operator-system --create-namespace vault-secrets-operator hashicorp/vault-secrets-operator

Wait for VSO to come up, it should not be OOM killed

The previous client cache locking scheme was not thread safe, and
allocated more locks than are typically needed. This change replaces
that approach by using the KeyMutex provided by the k8s utils package.
Locks are now a pooled resource.

Other fixes
- update invalid Bitnami Helm chart repo for postgres. We should phasing
  out its use.
- Bump TF Helm to latest version
Previously, VSO was caching and registering full object watchers on K8s Secrets. While watching K8s Secrets is necessary for automated remediation in the case where a destination secret is deleted from the cluster, doing so can result in OOM conditions for the operator, since each Secret's data contributes to the operator's total memory.

This change does the following:
- disables the caching K8s Secrets in the manager's client.
- only watches for Secret metadata changes.
@benashz benashz requested a review from a team as a code owner December 9, 2024 23:52
@benashz benashz linked an issue Dec 9, 2024 that may be closed by this pull request
@benashz benashz modified the milestones: v0.9.0, v0.9.1 Dec 10, 2024
@benashz benashz self-assigned this Dec 10, 2024
@benashz benashz requested a review from tvoran December 10, 2024 00:09
controllers/hcpvaultsecretsapp_controller.go Show resolved Hide resolved
main.go Show resolved Hide resolved
@tvoran
Copy link
Member

tvoran commented Dec 10, 2024

The doc string for WatchesMetadata makes it sound like we should use metav1.PartialObjectMetadata{} instead of corev1.Secret{} for Get's where possible. I'm wondering if we need to do that for functions like getSecretExists()?

@benashz
Copy link
Collaborator Author

benashz commented Dec 10, 2024

The doc string for WatchesMetadata makes it sound like we should use metav1.PartialObjectMetadata{} instead of corev1.Secret{} for Get's where possible. I'm wondering if we need to do that for functions like getSecretExists()?

Yeah, that might make those calls a bit lighter weight. We can probably address that in follow-on work.

@benashz benashz linked an issue Dec 10, 2024 that may be closed by this pull request
Base automatically changed from VAULT-32632/client-cache-use-k8s-utils-keymutex to main December 10, 2024 22:25
@benashz benashz enabled auto-merge (squash) December 10, 2024 22:29
@benashz benashz merged commit 56e29b3 into main Dec 10, 2024
43 checks passed
@benashz benashz deleted the VAULT-32632/fix-secrets-memory-usage branch December 10, 2024 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VSO is getting OOMKilled on OpenShift cluster Vault Secrets Operator memory usage way too high
2 participants