Installing argocd causes unbounded etcd memory usage #3556
Labels
bug
Something isn't working
component:core
Syncing, diffing, cluster state cache
more-information-needed
Further information is requested
type:scalability
Issues related to scalability and performance related issues
Describe the bug
ArgoCD causes unbounded increases in etcd memory usage
To Reproduce
Install ArgoCD and monitor etcd memory usage. We observe that memory increases steadily until the etcd processes are OOM-killed.
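For anyone trying to reproduce this, a rough way to tell steady growth from normal fluctuation is to fit a line through the sampled memory values and look at the slope. This is only a minimal sketch: `growth_rate` is a hypothetical helper, and the samples are assumed to be `(timestamp_seconds, bytes)` pairs collected by whatever monitoring is already in place.

```python
def growth_rate(samples):
    """Least-squares slope of (timestamp_seconds, value) samples.

    Returns the growth in value units per second. Hypothetical helper,
    not part of ArgoCD or etcd.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Covariance of time and value, divided by the variance of time.
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den
```

A slope that stays clearly positive over many hours, across compactions and snapshots, is consistent with the behaviour described here; a slope near zero would suggest ordinary fluctuation.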
Expected behavior
ArgoCD doesn't cause etcd memory to increase consistently
Screenshots
We restarted etcd on a cluster, as this is the only way we have found so far to relieve the problem and reduce memory usage. The metrics graphs below were taken after an etcd restart and show the issue.
ArgoCD on this cluster is configured with an app-of-apps containing two applications. The first app contains a single SealedSecret. The second app contains a ConfigMap, Deployment, Service, SealedSecret and a custom resource called TLSRoute, which creates a Route and injects the required certificates into it.
After we restarted etcd, this is the memory usage we see over approximately 20-21 hours:
We believe the memory usage may be caused by an unbounded increase in the number of watches in etcd. Graph for the same cluster over the same period, showing the number of watches in etcd:
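To compare watch counts against memory without a dashboard, the two values can be read from the text that etcd serves on its `/metrics` endpoint. The sketch below is a minimal parser under stated assumptions: `parse_metrics` is a hypothetical helper, the metric names `etcd_debugging_mvcc_watcher_total` and `process_resident_memory_bytes` are what we would expect etcd to expose but may differ between etcd versions, and sample lines are assumed to carry no trailing timestamp.

```python
def parse_metrics(text, names):
    """Sum the samples of each requested metric in Prometheus text format.

    Hypothetical helper. Assumes each sample line is "<name> <value>" or
    "<name>{labels} <value>" with no trailing timestamp.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        head, _, value = line.rpartition(" ")
        # Drop any label set to recover the bare metric name.
        name = head.split("{", 1)[0].strip()
        if name in names:
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals
```

Feeding it the saved output of the metrics endpoint at regular intervals gives paired watcher and memory samples that can be checked for the same upward trend.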
We investigated changing the snapshot-count of the etcd cluster to resolve this, but reducing it from the default of 100000 to 10000 and then to 10 made no difference (the graphs above are with snapshot-count set to 10).
We have found this etcd bug and wonder whether it could be related, and whether anything ArgoCD does with watches could exacerbate it: etcd-io/etcd#9416 (comment)
I would also note that we see the increase in etcd memory usage and number of watches even if we install ArgoCD and do NOT configure any applications in it.
Version
We have observed this issue with argocd 1.5.1, 1.5.2 and 1.5.3.
ArgoCD is installed on an OKD 3.11 cluster.