[solved] Memory usage on in_tail (WAL) #3073
@loburm can you please try the following new RC image:
Hi Eduardo,
is there a way you can remove some filters to identify the possible root cause of the leak?
So far I have run three more tests:
As we can see, the problem should be somewhere in the following config:
I'm going to run a few more tests to figure out which plugin is causing the problem here.
As a side note, I was testing with 1.7.0-rc9.
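As a rough illustration only, a tail-plus-filter setup of the kind being narrowed down here might look like the sketch below; the paths, tag, parser, and filter are hypothetical placeholders, not the reporter's actual configuration:

```
[INPUT]
    # Tail container log files and remember offsets in an on-disk database
    Name              tail
    Tag               kube_containers.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/run/fluent-bit/containers.db
    Mem_Buf_Limit     5MB

[FILTER]
    # Enrich records with Kubernetes metadata (one of the filters that could be
    # removed while isolating the leak)
    Name              kubernetes
    Match             kube_containers.*
```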
Last result: I have been running different combinations:
In all cases we see a similar pattern, so I assume the memory leak is in the tail plugin. @edsiper should I try some other options to narrow the problem down further?
Thanks for the repro. Some initial thoughts are that it might be related to tag_regex. Any possibility of running tail without it, @loburm? Appreciate your help.
Result from the latest test: as visible from the graph, the regression was introduced just before 1.6.0. I have used the simplest possible configuration to exclude problems with the output plugin:
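A minimal configuration of that shape, pairing the tail input directly with the null output so no output plugin can contribute to the memory growth, might look roughly like this (paths are hypothetical, and tag_regex is left out per the earlier suggestion):

```
[SERVICE]
    # Flush often so buffered chunks are not mistaken for growing memory
    Flush    1

[INPUT]
    Name     tail
    Path     /var/log/containers/*.log
    # On-disk offset database; the DB/Db.sync settings discussed in this thread attach here
    DB       /var/run/fluent-bit/tail.db

[OUTPUT]
    # Discard every record, isolating the input side from any output plugin behavior
    Name     null
    Match    *
```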
I have tried a binary search approach and managed to find the change after which we see the issue: a79bc25. Additional investigation has shown that reverting the line:
to:
resolves the issue for all Fluent Bit versions. I have cherry-picked, built, and run tests for 1.6.0, 1.7.0 and 1.7.2.
@edsiper what do you think, can we revert that line to fix the issue?
@edsiper My pods are getting into CrashLoopBackOff after some time, and when I removed Db.sync and all DB-related config it worked fine. I suspect this might be due to the same issue.
a0d00qf@m-c02dj3hfmd6n fluent-bit % kubectl logs -f fluent-bit-lv7h4 -n logging
[2021/03/14 12:00:48] [engine] caught signal (SIGSEGV)
I am testing that specific issue.
NOTE: this is not a memory leak. It's the extra memory map used by the WAL feature. I will take two actions:
Wow, but that's a lot (4MB per input plugin). Thanks for the investigation.
everything has a price: 4MB for less I/O to disk :)
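For anyone tuning this trade-off, below is a hedged sketch of the DB-related in_tail options touched on in this thread, assuming (as the Db.sync discussion above suggests) that the WAL in question is the journal of the tail position database. Option availability and defaults vary by release, and DB.journal_mode in particular should be verified against the documentation for your version:

```
[INPUT]
    Name               tail
    Path               /var/log/containers/*.log
    # Position database whose journal backs the extra per-input memory map
    DB                 /var/run/fluent-bit/tail.db
    # The Db.sync option mentioned above: Off/Normal/Full trades durability for fewer syncs
    DB.Sync            Normal
    # Assumed available in newer releases only: switch the journal away from WAL
    # if the ~4MB per-input memory map is a concern
    DB.journal_mode    delete
```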
Bug Report
Describe the bug
I have managed to reproduce the issue in both 1.6.10 and 1.7.0-rc5. It also seems to affect 1.5.7, but at least there I don't see the huge growth at the beginning that eats an additional 5-6 MB. On 1.3.11 I don't see such a big problem (the larger amount of RAM is explained by the initial load that the cluster was experiencing).
Screenshots
Your Environment
I have removed a few input blocks that didn't get any logs during the test, but I can provide the full config if needed. Our metrics show that the overall load during that period of time was near 400 bytes/s. It was mainly coming from three sources: node-problem-detector, kubelet and kube_containers_kube-system.