
Increase Fluentd Buffer Queue Size #1877

Merged 1 commit on Mar 5, 2019
Conversation

christianberg (Collaborator)

This increases the number of chunks that can be queued to be sent to
S3. The [documentation][1] claims that this number is unlimited when
not set, but the default was in fact [recently set][2] to `1`, which
causes a backlog of chunks to build up when there is a larger number
of log files.

[1]: https://docs.fluentd.org/v1.0/articles/buffer-section#buffering-parameters
[2]: fluent/fluentd#2173

Signed-off-by: Christian Berg <berg.christian@gmail.com>
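
The diff itself isn't shown in this conversation, but based on the description the change presumably touches the `<buffer>` section of the Fluentd S3 output. A minimal sketch of what such a setting looks like (the bucket settings, path, chunk keys and timekey are placeholders, not taken from this PR):

```
<match **>
  @type s3
  # s3_bucket, s3_region, credentials, etc. omitted for brevity
  <buffer tag,time>
    @type file
    path /var/log/fluentd-buffers/s3.buffer   # placeholder path
    timekey 300                               # placeholder time slicing
    queued_chunks_limit_size 100              # raised from the effective default of 1
  </buffer>
</match>
```
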
christianberg force-pushed the increase-fluentd-queue branch from 3c59f2c to cd297c1 on March 5, 2019 08:06
@christianberg (Collaborator, Author)

👍

@mikkeloscar (Contributor)

👍

@hjacobs (Contributor) commented Mar 5, 2019

How do we know that 100 is a good number?

@christianberg (Collaborator, Author)

I admit that 100 is somewhat arbitrary.

After looking into the Fluentd source code and running a few tests (which I'm documenting in more detail and will link here when done), I believe it is safe to set this relatively high. Both the buffered chunks and the queued chunks are stored on disk, and the only difference between the two (that I can see) is that the former have a `b` in the filename, while the latter have a `q`.

This means that no additional resources are consumed by putting more chunks into the queue. But chunks are only moved into the queue at regular intervals (~ every 5 seconds with our time-slicing settings, if I followed the calculations correctly), and if the queue is too small (i.e. the default length of 1), the S3 upload thread is starved and a backlog builds up.

In my tests, 100 was sufficient to keep the upload thread from being starved, but a lower number would probably also be fine. When the log throughput gets high enough for the queue to fill up with 100 chunks, other factors are actually more limiting (specifically the CPU consumption of Fluentd) and would need to be addressed first. With the current settings, 100,000 events/minute can be handled.
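
As an illustration of the on-disk layout described above, a file buffer directory might look roughly like this (the path and chunk IDs are made up); staged chunks carry a `b` and enqueued chunks a `q` in the filename:

```
$ ls /var/log/fluentd-buffers/s3.buffer/
buffer.b5838f6e0276d2a2956914713a7d3e17.log         # staged: still accepting events
buffer.b5838f6e0276d2a2956914713a7d3e17.log.meta
buffer.q5838f6a6a1dcf508f2b52f7f8b8f1d6.log         # queued: waiting for the S3 upload thread
buffer.q5838f6a6a1dcf508f2b52f7f8b8f1d6.log.meta
```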

@aermakov-zalando (Contributor)

👍
