[wrong config stack size] Crashes with "double free or corruption (!prev)" and other messages #1454
Comments
I spent this morning investigating this issue, but failed to reproduce the crash. I've used Fluent Bit v1.2.1 built on Debian 10. With that, I was able to confirm
The reported symptom implies that there is a memory bug in
@bortok Can you always reproduce the crash using the attached log file? If you enabled
Here are the logs with Trace_Output set to On. First, when launched with empty DB - crashes right away.
Next, if launched again, w/o deleting the DB - nothing happens, since the file is not growing
Finally, after deleting DB file - crashes again.
@bortok Thank you. After tracing the log, it seems that the crash is occurring during Fluent Bit

    /* Get upstream connection */
    u_conn = flb_upstream_conn_get(ctx->u);
    if (!u_conn) {
        FLB_OUTPUT_RETURN(FLB_RETRY);
    }
    /* Convert format */
    pack = elasticsearch_format(data, bytes, tag, tag_len, &bytes_out, ctx);
    if (!pack) {
        flb_upstream_conn_release(u_conn);
        FLB_OUTPUT_RETURN(FLB_ERROR);
    }

Looking from another angle, crash02.pcap.txt implies that a TCP connection
To investigate the root cause, we need to track down the code path that gets
So: can you give me a tracing log? You can get one by following these steps:
And please also give me a system/build configuration report. You can get one by:
@fujimotos before re-compiling, I tried running fluent-bit with ElasticSearch service being down. And then also what you asked.
see crash03.pcap.txt
see crash04.pcap.txt
System/build config report
@fujimotos I think I was able to nail down the exact condition for the crash. It is in DNS resolution of the ES Host parameter. If an IP address is used, fluent-bit works fine. Also, if there is an entry in /etc/hosts for the hostname (FQDN), it works fine. Only when an actual DNS query needs to be made to resolve the host does it crash.
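The distinction drawn above, between a Host given as an IP literal and one given as a name, can be checked in isolation. This is a minimal sketch assuming a POSIX system. `is_numeric_host` is a hypothetical helper, not part of Fluent Bit. It calls getaddrinfo() with AI_NUMERICHOST, which parses address literals only and never performs a name lookup.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not a Fluent Bit function).
 * Returns 1 if `host` is a numeric IPv4/IPv6 literal, 0 otherwise.
 * AI_NUMERICHOST makes getaddrinfo() fail instead of querying a
 * resolver, so this call itself never triggers a DNS lookup.
 */
int is_numeric_host(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 literals */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* literals only, no name lookup */

    ret = getaddrinfo(host, NULL, &hints, &res);
    if (ret != 0) {
        /* Not a literal (or another error): treat as "not numeric". */
        return 0;
    }

    freeaddrinfo(res);
    return 1;
}
```

A host for which this returns 0 may still resolve through /etc/hosts without a DNS query. The helper only separates address literals from names and does not predict which names reach the DNS resolver.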
The root cause of the problem is the 8 KB stack size defined for coroutines. If you remove the Coro_Stack_Size option, the problem goes away. The problem is in the getaddrinfo() call: when retrieving DNS records it uses alloca(3) to allocate memory, and alloca(3) takes memory from the stack instead of the heap, so the more records there are, the more stack it needs.
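To illustrate the stack-versus-heap point above, here is a rough sketch. Both functions are hypothetical and are not taken from Fluent Bit or from any libc. The 2048-byte reserve used in the usage note is an arbitrary assumption. `fits_on_stack` does the budget arithmetic for a fixed-size stack. `sum_with_heap_scratch` shows the heap alternative, where a failed allocation can be reported to the caller, which alloca(3) cannot do.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical guard: would a scratch buffer of `needed` bytes fit in a
 * stack of `stack_size` bytes while leaving `reserve` bytes for ordinary
 * call frames? Returns 1 if it fits, 0 otherwise.
 */
int fits_on_stack(size_t needed, size_t stack_size, size_t reserve)
{
    if (reserve >= stack_size) {
        return 0;
    }
    return needed <= stack_size - reserve;
}

/*
 * Heap-based scratch buffer: copies `len` bytes and sums them.
 * Returns -1 if the allocation fails, so the caller can handle it.
 * Stack usage stays constant no matter how large `len` is.
 */
long sum_with_heap_scratch(const unsigned char *data, size_t len)
{
    unsigned char *scratch;
    long sum = 0;
    size_t i;

    scratch = malloc(len > 0 ? len : 1);
    if (!scratch) {
        return -1;
    }
    if (len > 0) {
        memcpy(scratch, data, len);
    }
    for (i = 0; i < len; i++) {
        sum += scratch[i];
    }
    free(scratch);
    return sum;
}
```

With the 8 KB stack from this report and the assumed 2 KB reserve, `fits_on_stack(16384, 8192, 2048)` returns 0: a 16 KB scratch request cannot fit, whatever the code does with it.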
@edsiper removing Coro_Stack_Size does solve the problem with DNS resolution. I need to look back at why Coro_Stack_Size was required in the first place. Thank you!
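For reference, the change discussed above might look like the following classic-format fragment. This is only a sketch. The attached fluent-bit-crash.conf.txt is not reproduced in this thread, so the `Match`, `Host` and `Port` values are placeholders, and the commented-out value of 8192 is assumed from the "8kb" mentioned above.

```
[SERVICE]
    # Previously set to a small value (assumed 8192 here); with the line
    # removed, the built-in default coroutine stack size applies.
    # Coro_Stack_Size  8192

[OUTPUT]
    Name   es
    Match  *
    # Placeholder hostname that requires a real DNS query to resolve.
    Host   es.example.internal
    Port   9200
```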
@bortok I am closing this ticket as a config issue. If you find that the default stack size is a problem, please let me know.
@edsiper Thank you!
Bug Report
Describe the bug
Fluent-bit crashes within a few seconds with messages like these:
or
To Reproduce
See attached files (remove .txt where applicable):
fluent-bit-crash.conf.txt
parsers.conf.txt
crash02.pcap.txt
Your Environment
Additional context
Can't tail Zeek dhcp.log. If only stdout is in output, it doesn't crash, so it seems the ES output plugin is the problem.