This repository has been archived by the owner on May 6, 2020. It is now read-only.

Grafana reports Fluentd memory leak #85

Closed
jayjun opened this issue Feb 14, 2017 · 10 comments

Comments

@jayjun

jayjun commented Feb 14, 2017

  • Deis v2.11.0
  • Kubernetes v1.5.2

Just a week in production with one Deis app, and memory usage has grown exponentially from 100 MB to around 380 MB. Interestingly, only one pod leaks.

[Grafana chart: fluentd memory usage]

Very low-volume site, and certainly no runaway logs from my app. However, nsqd does log like a madman.

Created an issue so others may post similar findings.

@stuszynski

Hi,

We are encountering a similar issue.
[Datadog metric explorer chart: fluentd memory usage]

Environment:

  • Deis v2.10.0
  • Kubernetes v1.5.1
  • Remote syslog export outside of the cluster

We decided to set a memory limit for fluentd, but in practice that just makes fluentd get OOM-killed more frequently. Unfortunately, that introduced another issue: after each OOM kill of the fluentd container there was a very high CPU/IO spike (~50k IOPS) on the node.
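
For anyone wanting to try the same, a minimal sketch of applying such a limit with kubectl (the 256Mi value is only an example; the deis namespace and deis-logger-fluentd DaemonSet names are what a stock Deis Workflow install uses):

# Sketch: add a memory limit to the fluentd DaemonSet (256Mi is illustrative).
kubectl --namespace=deis patch daemonset deis-logger-fluentd --patch \
  '{"spec":{"template":{"spec":{"containers":[{"name":"deis-logger-fluentd","resources":{"limits":{"memory":"256Mi"}}}]}}}}'
# Note: on Kubernetes 1.5 DaemonSets do not roll automatically, so the existing
# fluentd pods have to be deleted before they come back with the new limit.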

We didn't spot any suspicious activity around the fluentd containers or the log pipeline, but we suspect that memory usage grows much faster when fluentd is logging non-ASCII characters.

Our only workaround so far is a graceful restart of the fluentd Docker containers each night.
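
As a rough sketch, that nightly restart can be a cron entry that deletes the fluentd pods and lets the DaemonSet recreate them (the app=deis-logger-fluentd label selector is an assumption; check your pod labels first):

# Assumed label selector -- verify with:
#   kubectl --namespace=deis get pods --show-labels | grep fluentd
# Run every night at 03:00; the DaemonSet recreates the deleted pods immediately.
0 3 * * * kubectl --namespace=deis delete pods -l app=deis-logger-fluentd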

@jchauncey
Member

I've been googling around to see if there are any open issues against fluentd or the plugins we have installed that might be causing this problem, but I am not having any luck.

@jchauncey
Member

@jayjun is the fluentd pod that is having the issues on the same node that is also hosting nsq?

@jayjun
Author

jayjun commented Feb 15, 2017

@jchauncey Nope, nsqd is on my other node. Here are all the containers on the same node as the errant fluentd:

k8s_myapp-web.dc7b656d_myapp-web-3789995061-cm7lv
k8s_deis-builder.666551bb_deis-builder-805890417-50stb
k8s_deis-controller.8c80e6bd_deis-controller-3898609164-1z205
k8s_deis-logger-fluentd.85506d36_deis-logger-fluentd-cdblz
k8s_deis-logger-redis.fb946a63_deis-logger-redis-304849759-c8bkp
k8s_deis-logger.6b520480_deis-logger-176328999-hvzd2
k8s_deis-monitor-influxdb.43a7b400_deis-monitor-influxdb-2729657543-7vv4m
k8s_deis-monitor-telegraf.a4b3af_deis-monitor-telegraf-5hvvn
k8s_deis-registry-token-refresher.a5a2d75f_deis-registry-token-refresher-3889501108-dk1l3
k8s_deis-workflow-manager.b65ee28a_deis-workflow-manager-2528409207-x38fm
k8s_fluentd-cloud-logging.fe89ba10_fluentd-cloud-logging-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_kube-proxy.9b4e199d_kube-proxy-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_tiller.8f4765f0_tiller-deploy-3299276078-rvd9k
k8s_POD.d8dbe16c_myapp-web-3789995061-cm7lv
k8s_POD.d8dbe16c_deis-builder-805890417-50stb
k8s_POD.d8dbe16c_deis-controller-3898609164-1z205
k8s_POD.d8dbe16c_deis-logger-176328999-hvzd2
k8s_POD.d8dbe16c_deis-logger-fluentd-cdblz
k8s_POD.d8dbe16c_deis-logger-redis-304849759-c8bkp
k8s_POD.d8dbe16c_deis-monitor-influxdb-2729657543-7vv4m
k8s_POD.d8dbe16c_deis-monitor-telegraf-5hvvn
k8s_POD.d8dbe16c_deis-registry-token-refresher-3889501108-dk1l3
k8s_POD.d8dbe16c_deis-workflow-manager-2528409207-x38fm
k8s_POD.d8dbe16c_fluentd-cloud-logging-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_POD.d8dbe16c_kube-proxy-gke-cluster-1-default-pool-0d4b1875-gf74
k8s_POD.d8dbe16c_tiller-deploy-3299276078-rvd9k
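
For reference, roughly the same node-to-pod mapping can be pulled with kubectl instead of docker ps (the node name below is taken from the container names above):

# Show every pod with the node it is scheduled on, then filter to this node.
kubectl get pods --all-namespaces -o wide | grep gke-cluster-1-default-pool-0d4b1875-gf74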

@jayjun
Author

jayjun commented Feb 22, 2017

Some updates. FYI, no one has touched this cluster in the last 7 days.

Fluentd

[Grafana chart: memory usage for all Fluentd pods]

The leak now grows linearly for all Fluentd pods, not just one. deis-logger-fluentd-cdblz leaked 1 GB in 7 days!

My app was deployed on Feb 13, which is when the slopes changed. So it is somehow related to deployments.

Others

I can't spot any correlation with other pods, except for Workflow Manager. Not sure if they're related.

[Grafana chart: Workflow Manager memory usage]

Its memory usage has also, oddly, flatlined for the past 2 days.


I've gracefully restarted deis-logger-fluentd-cdblz today to see what happens.

@jayjun
Author

jayjun commented Feb 22, 2017

@mboersma There's a memory fix in Fluentd v0.14.13.

@mboersma
Member

There's a memory fix in Fluentd v0.14.13.

Excellent, certainly worth a try. I'll go revise #87.
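
Once the bump lands, a quick way to confirm which fluentd version a pod is actually running (a sketch, assuming the fluentd binary is on the image's PATH; pod name taken from earlier in this thread):

# Print the fluentd version inside the running logger pod.
kubectl --namespace=deis exec deis-logger-fluentd-cdblz -- fluentd --version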

@mattk42

mattk42 commented Mar 6, 2017

FYI, I have been running a pod with the fluentd upgrade for a bit. It has made a huge difference.
[screenshot, 2017-03-06: memory usage after the fluentd upgrade]

@jchauncey
Member

That's awesome, @mattk42.

@bacongobbler
Member

I think this can be closed now.
