This repository has been archived by the owner on May 6, 2020. It is now read-only.

Grafana reports Fluentd memory leak #85

Closed
jayjun opened this issue Feb 14, 2017 · 10 comments

Comments

@jayjun

jayjun commented Feb 14, 2017

  • Deis v2.11.0
  • Kubernetes v1.5.2

Just a week in production with one Deis app, and memory usage has grown exponentially from 100 MB to around 380 MB. Interestingly, only one pod leaks.

[Grafana chart: fluentd memory usage]

Very low-volume site, and certainly no runaway logs from my app. However, nsqd does log like a madman.

Created an issue so others may post similar findings.

@stuszynski

Hi,

We are encountering a similar issue.
[Datadog metric explorer chart: fluentd memory usage]

Environment:

  • Deis v2.10.0
  • Kubernetes v1.5.1
  • Remote syslog export outside of the cluster

We decided to set a memory limit for fluentd, but in practice that just makes fluentd get OOM-killed more frequently. Unfortunately, that introduced another issue: after each OOM kill of the fluentd container there was a very high CPU/IO spike (~50k IOPS) on the node.
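
For anyone wanting to try the same, a minimal sketch of applying such a limit with kubectl (the 256Mi value is only an example; the deis namespace and deis-logger-fluentd DaemonSet names are what a stock Deis Workflow install uses):

# Sketch: add a memory limit to the fluentd DaemonSet (256Mi is illustrative).
kubectl --namespace=deis patch daemonset deis-logger-fluentd --patch \
  '{"spec":{"template":{"spec":{"containers":[{"name":"deis-logger-fluentd","resources":{"limits":{"memory":"256Mi"}}}]}}}}'
# Note: on Kubernetes 1.5 DaemonSets do not roll automatically, so the existing
# fluentd pods have to be deleted before they come back with the new limit.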

We didn't spot any suspicious activity around the fluentd containers or the log pipeline, but we suspect that memory usage grows much faster when fluentd is logging non-ASCII characters.

Our only workaround so far is a graceful restart of the fluentd Docker containers each night.
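
As a rough sketch, that nightly restart can be a cron entry that deletes the fluentd pods and lets the DaemonSet recreate them (the app=deis-logger-fluentd label selector is an assumption; check your pod labels first):

# Assumed label selector -- verify with:
#   kubectl --namespace=deis get pods --show-labels | grep fluentd
# Run every night at 03:00; the DaemonSet recreates the deleted pods immediately.
0 3 * * * kubectl --namespace=deis delete pods -l app=deis-logger-fluentd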

@jchauncey
Member

I've been googling around to see if there are any open issues against fluentd or the plugins we have installed that might be causing this problem, but I am not having any luck.

@jchauncey
Member

@jayjun is the fluentd pod that is having the issues on the same node that is also hosting nsq?

@jayjun
Author

jayjun commented Feb 15, 2017

@jchauncey Nope, nsqd is on my other node. Here are all the containers on the same node as the errant fluentd:

k8s_myapp-web.dc7b656d_myapp-web-3789995061-cm7lv
k8s_deis-builder.666551bb_deis-builder-805890417-50stb
k8s_deis-controller.8c80e6bd_deis-controller-3898609164-1z205
k8s_deis-logger-fluentd.85506d36_deis-logger-fluentd-cdblz
k8s_deis-logger-redis.fb946a63_deis-logger-redis-304849759-c8bkp
k8s_deis-logger.6b520480_deis-logger-176328999-hvzd2
k8s_deis-monitor-influxdb.43a7b400_deis-monitor-influxdb-2729657543-7vv4m
k8s_deis-monitor-telegraf.a4b3af_deis-monitor-telegraf-5hvvn
k8s_deis-registry-token-refresher.a5a2d75f_deis-registry-token-refresher-3889501108-dk1l3
k8s_deis-workflow-manager.b65ee28a_deis-workflow-manager-2528409207-x38fm
k8s_fluentd-cloud-logging.fe89ba10_fluentd-cloud-logging-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_kube-proxy.9b4e199d_kube-proxy-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_tiller.8f4765f0_tiller-deploy-3299276078-rvd9k
k8s_POD.d8dbe16c_myapp-web-3789995061-cm7lv
k8s_POD.d8dbe16c_deis-builder-805890417-50stb
k8s_POD.d8dbe16c_deis-controller-3898609164-1z205
k8s_POD.d8dbe16c_deis-logger-176328999-hvzd2
k8s_POD.d8dbe16c_deis-logger-fluentd-cdblz
k8s_POD.d8dbe16c_deis-logger-redis-304849759-c8bkp
k8s_POD.d8dbe16c_deis-monitor-influxdb-2729657543-7vv4m
k8s_POD.d8dbe16c_deis-monitor-telegraf-5hvvn
k8s_POD.d8dbe16c_deis-registry-token-refresher-3889501108-dk1l3
k8s_POD.d8dbe16c_deis-workflow-manager-2528409207-x38fm
k8s_POD.d8dbe16c_fluentd-cloud-logging-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_POD.d8dbe16c_kube-proxy-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_POD.d8dbe16c_tiller-deploy-3299276078-rvd9k
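
For reference, roughly the same node-to-pod mapping can be pulled with kubectl instead of docker ps (the node name below is taken from the container names above):

# Show every pod with the node it is scheduled on, then filter to this node.
kubectl get pods --all-namespaces -o wide | grep gke-cluster-1-default-pool-0d4b1875-gf74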

@jayjun
Author

jayjun commented Feb 22, 2017

Some updates. FYI, no one has touched this cluster in the last 7 days.

Fluentd

[Grafana chart: memory usage for all Fluentd pods]

The leak now grows linearly for all Fluentd pods, not just one. deis-logger-fluentd-cdblz leaked 1 GB in 7 days!

My app was deployed on Feb 13, which is when the slopes changed. So it is somehow related to deployments.

Others

I can't spot any correlation with other pods, except for Workflow Manager. Not sure if they're related.

[Grafana chart: Workflow Manager memory usage]

Its memory usage has also, oddly, flatlined for the past 2 days.


I've gracefully restarted deis-logger-fluentd-cdblz today to see what happens.

@jayjun
Author

jayjun commented Feb 22, 2017

@mboersma There's a memory fix in Fluentd v0.14.13.

@mboersma
Member

There's a memory fix in Fluentd v0.14.13.

Excellent, certainly worth a try. I'll go revise #87.
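
Once the bump lands, a quick way to confirm which fluentd version a pod is actually running (a sketch, assuming the fluentd binary is on the image's PATH; pod name taken from earlier in this thread):

# Print the fluentd version inside the running logger pod.
kubectl --namespace=deis exec deis-logger-fluentd-cdblz -- fluentd --version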

@mattk42

mattk42 commented Mar 6, 2017

FYI, I have been running a pod with the fluentd upgrade for a bit. It has made a huge difference.
[screenshot, 2017-03-06: memory usage after the fluentd upgrade]

@jchauncey
Member

That's awesome, @mattk42.

@bacongobbler
Member

I think this can be closed now.
