Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wrong config stack size] Crashes with "double free or corruption (!prev)" and other messages #1454

Closed
bortok opened this issue Jul 22, 2019 · 9 comments

Comments

@bortok
Copy link

bortok commented Jul 22, 2019

Bug Report

Describe the bug
Fluent-bit crashes within a few seconds with the messages like these:

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
free(): invalid next size (normal)
Aborted

or

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
double free or corruption (!prev)
Aborted

To Reproduce
See attached files (remove .txt where applicable):

Your Environment

  • Version used: v1.2.1
  • Configuration: fluent-bit-crash.conf
  • Environment name and version (e.g. Kubernetes? What version?): Zeek/BRO version 2.6.2
  • Server type and version: Linux clouddc-srv-03 4.19.0-5-amd64 Fix include paths #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux
  • Operating System and version: Debian 10
  • Filters and plugins: parsers.conf

Additional context
Can't tail Zeek dhcp.log. If only stdout is in output, it doesn't crash, so it seems ES output plugin is the problem.

@fujimotos
Copy link
Member

I spent this morning to investigate this issue, but failed to reproduce the crash

I've used Fluent Bit v1.2.1 built on Debian 10. With that, I was able to confirm
that Fluent Bit can parse & transfer the data in dhcp.log to Elasticsearch.

The reported symptom implies that there is a memory bug in out_es,
but I cannot point out any suspicious part.

@bortok Can you always reproduce the crash using the attached log file?

If you enabled Trace_Output in fluent-bit.conf, which line do you see
last? (I suspect that the line after that causes the crash)

@bortok
Copy link
Author

bortok commented Jul 29, 2019

Here are the logs with Trace_Output set to On.

First, when launched with empty DB - crashes right away.

sudo fluent-bit -c fluent-bit-crash.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
double free or corruption (!prev)
Aborted
[2019/07/29 12:45:03] [debug] [storage] [cio stream] new stream registered: tail.0
[2019/07/29 12:45:03] [ info] [storage] initializing...
[2019/07/29 12:45:03] [ info] [storage] in-memory
[2019/07/29 12:45:03] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/29 12:45:03] [ info] [engine] started (pid=37817)
[2019/07/29 12:45:03] [debug] [engine] coroutine stack size: 8192 bytes (8.0K)
[2019/07/29 12:45:03] [debug] [in_tail] inotify watch fd=20
[2019/07/29 12:45:03] [debug] [in_tail] scanning path /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/29 12:45:03] [debug] [in_tail] add to scan queue /usr/local/etc/fluent-bit/test/dhcp.log, offset=0
[2019/07/29 12:45:03] [debug] [out_es] host=clouddc-elastic-01.dcon.sclab port=9200 uri=/_bulk index=fluent-bit type=_doc
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 3 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 1 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 8 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 1 conditions and 1 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 7 rules
[2019/07/29 12:45:03] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:45:03] [debug] [router] match rule tail.0:stdout.0
[2019/07/29 12:45:03] [debug] [router] match rule tail.0:es.1
[2019/07/29 12:45:03] [ info] [sp] stream processor started
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Rule RENAME ts TO @timestamp : No keys matching ts found, not applying rule
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 17 elements, output map size 19 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 19 elements, output map size 20 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Rule RENAME client_fqdn TO host_name : Existing key client_fqdn found, not applying rule
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 20 elements, output map size 20 elements
[2019/07/29 12:45:03] [debug] [filter_modify] : Match for condition KEY_VALUE_EQUALS -
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 20 elements, output map size 19 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 19 elements, output map size 19 elements
[2019/07/29 12:45:03] [debug] [filter_modify] Input map size 19 elements, output map size 11 elements
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 11, will be 10, nested map size will be 2
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 10, will be 7, nested map size will be 4
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : No match found for server_
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : Outer map size is 7, will be 7, nested map size will be 1
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : No match found for zeek_
[2019/07/29 12:45:03] [debug] [filter_nest] Nest : No match found for dhcp_
[2019/07/29 12:45:03] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log read=491 lines=9
[2019/07/29 12:45:03] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log promote to TAIL_EVENT

Next, if launched again, w/o deleting the DB - nothing happens, since the file is not growing

sudo fluent-bit -c fluent-bit-crash.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data

^C[engine] caught signal (SIGINT)
[2019/07/29 12:45:07] [debug] [task] created task=0x56320d103b10 id=0 OK
[2019/07/29 12:45:34] [debug] [storage] [cio stream] new stream registered: tail.0
[2019/07/29 12:45:34] [ info] [storage] initializing...
[2019/07/29 12:45:34] [ info] [storage] in-memory
[2019/07/29 12:45:34] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/29 12:45:34] [ info] [engine] started (pid=37820)
[2019/07/29 12:45:34] [debug] [engine] coroutine stack size: 8192 bytes (8.0K)
[2019/07/29 12:45:34] [debug] [in_tail] inotify watch fd=20
[2019/07/29 12:45:34] [debug] [in_tail] scanning path /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/29 12:45:34] [debug] [in_tail] add to scan queue /usr/local/etc/fluent-bit/test/dhcp.log, offset=491
[2019/07/29 12:45:34] [debug] [out_es] host=clouddc-elastic-01.dcon.sclab port=9200 uri=/_bulk index=fluent-bit type=_doc
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 3 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 1 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 8 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 1 conditions and 1 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 7 rules
[2019/07/29 12:45:34] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:45:34] [debug] [router] match rule tail.0:stdout.0
[2019/07/29 12:45:34] [debug] [router] match rule tail.0:es.1
[2019/07/29 12:45:34] [ info] [sp] stream processor started
[2019/07/29 12:45:34] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log promote to TAIL_EVENT
[2019/07/29 12:45:56] [ info] [input] pausing tail.0

Finally, after deleting DB file - crashes again.

rm /usr/local/etc/fluent-bit/test/tail.db
sudo fluent-bit -c fluent-bit-crash.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
free(): invalid next size (normal)
Aborted
[2019/07/29 12:51:38] [debug] [in_tail] inotify watch fd=20
[2019/07/29 12:51:38] [debug] [in_tail] scanning path /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/29 12:51:38] [debug] [in_tail] add to scan queue /usr/local/etc/fluent-bit/test/dhcp.log, offset=0
[2019/07/29 12:51:38] [debug] [out_es] host=clouddc-elastic-01.dcon.sclab port=9200 uri=/_bulk index=fluent-bit type=_doc
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 3 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 1 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 8 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 1 conditions and 1 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 7 rules
[2019/07/29 12:51:38] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/29 12:51:38] [debug] [router] match rule tail.0:stdout.0
[2019/07/29 12:51:38] [debug] [router] match rule tail.0:es.1
[2019/07/29 12:51:38] [ info] [sp] stream processor started
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Rule RENAME ts TO @timestamp : No keys matching ts found, not applying rule
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 17 elements, output map size 19 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 19 elements, output map size 20 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Rule RENAME client_fqdn TO host_name : Existing key client_fqdn found, not applying rule
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 20 elements, output map size 20 elements
[2019/07/29 12:51:38] [debug] [filter_modify] : Match for condition KEY_VALUE_EQUALS -
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 20 elements, output map size 19 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 19 elements, output map size 19 elements
[2019/07/29 12:51:38] [debug] [filter_modify] Input map size 19 elements, output map size 11 elements
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 11, will be 10, nested map size will be 2
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 10, will be 7, nested map size will be 4
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : No match found for server_
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : Outer map size is 7, will be 7, nested map size will be 1
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : No match found for zeek_
[2019/07/29 12:51:38] [debug] [filter_nest] Nest : No match found for dhcp_
[2019/07/29 12:51:38] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log read=491 lines=9
[2019/07/29 12:51:38] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log promote to TAIL_EVENT
[2019/07/29 12:51:42] [debug] [task] created task=0x559d65a87b10 id=0 OK

@fujimotos
Copy link
Member

@bortok Thank you.

After tracing the log, it seems that the crush is occurring during Fluent Bit
trying to connect to Elasticsearch. Specifically, somewhere between line
556-566 in out_es/es.c
.

/* Get upstream connection */
u_conn = flb_upstream_conn_get(ctx->u);
if (!u_conn) {
    FLB_OUTPUT_RETURN(FLB_RETRY);
}

/* Convert format */
pack = elasticsearch_format(data, bytes, tag, tag_len, &bytes_out, ctx);
if (!pack) {
    flb_upstream_conn_release(u_conn);
    FLB_OUTPUT_RETURN(FLB_ERROR);
}

Looking from another angle, crash02.pcap.txt implies that a TCP connection
is created successfully, but the connection gets closed immediately after that.

06:55:53.001741 IP 172.20.100.132.38416 > 172.20.100.41.9200: tcp 0  # SYN
06:55:53.001973 IP 172.20.100.41.9200 > 172.20.100.132.38416: tcp 0  # ACK+SYN
06:55:53.002015 IP 172.20.100.132.38416 > 172.20.100.41.9200: tcp 0  # ACK
06:55:53.002612 IP 172.20.100.132.38416 > 172.20.100.41.9200: tcp 0  # FIN
06:55:53.002819 IP 172.20.100.41.9200 > 172.20.100.132.38416: tcp 0  # FIN+ACK
06:55:53.002855 IP 172.20.100.132.38416 > 172.20.100.41.9200: tcp 0  # ACK

To investigate the root cause, we need to track down the code path that gets
executed here.

So: can you give me a tracing log? You can get one foliowing these steps:

$ cmake .. -D FLB_TRACE=On
$ make
$ ./bin/fluent-bit -c fluent-bit-crash.conf.txt

And please also give me a system/build configuration report. You can get one by:

$ ./bin/fluent-bit --sos

@bortok
Copy link
Author

bortok commented Jul 30, 2019

@fujimotos before re-compiling, I tried running fluent-bit with ElasticSearch service being down. And then also what you asked.

  1. ElasticSearch service not running:
sudo fluent-bit -c /usr/local/etc/fluent-bit/fluent-bit-crash.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
free(): invalid next size (normal)
Aborted
[2019/07/30 08:58:53] [debug] [storage] [cio stream] new stream registered: tail.0
[2019/07/30 08:58:53] [ info] [storage] initializing...
[2019/07/30 08:58:53] [ info] [storage] in-memory
[2019/07/30 08:58:53] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/30 08:58:53] [ info] [engine] started (pid=49083)
[2019/07/30 08:58:53] [debug] [engine] coroutine stack size: 8192 bytes (8.0K)
[2019/07/30 08:58:53] [debug] [in_tail] inotify watch fd=20
[2019/07/30 08:58:53] [debug] [in_tail] scanning path /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/30 08:58:53] [debug] [in_tail] add to scan queue /usr/local/etc/fluent-bit/test/dhcp.log, offset=0
[2019/07/30 08:58:53] [debug] [out_es] host=clouddc-elastic-01.dcon.sclab port=9200 uri=/_bulk index=fluent-bit type=_doc
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 3 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 1 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 8 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 1 conditions and 1 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 7 rules
[2019/07/30 08:58:53] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/30 08:58:53] [debug] [router] match rule tail.0:stdout.0
[2019/07/30 08:58:53] [debug] [router] match rule tail.0:es.1
[2019/07/30 08:58:53] [ info] [sp] stream processor started
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Rule RENAME ts TO @timestamp : No keys matching ts found, not applying rule
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 17 elements, output map size 19 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 19 elements, output map size 20 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Rule RENAME client_fqdn TO host_name : Existing key client_fqdn found, not applying rule
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 20 elements, output map size 20 elements
[2019/07/30 08:58:53] [debug] [filter_modify] : Match for condition KEY_VALUE_EQUALS -
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 20 elements, output map size 19 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 19 elements, output map size 19 elements
[2019/07/30 08:58:53] [debug] [filter_modify] Input map size 19 elements, output map size 11 elements
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 11, will be 10, nested map size will be 2
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 10, will be 7, nested map size will be 4
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : No match found for server_
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : Outer map size is 7, will be 7, nested map size will be 1
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : No match found for zeek_
[2019/07/30 08:58:53] [debug] [filter_nest] Nest : No match found for dhcp_
[2019/07/30 08:58:53] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log read=491 lines=9
[2019/07/30 08:58:53] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log promote to TAIL_EVENT
[2019/07/30 08:58:57] [debug] [task] created task=0x55c16f593b50 id=0 OK

see crash03.pcap.txt

  1. After re-compiling with -D FLB_TRACE=On, with ElasticSearch up and running
./bin/fluent-bit -c /usr/local/etc/fluent-bit/fluent-bit-crash.conf
Fluent Bit v1.2.1
Copyright (C) Treasure Data

[0] bro.dhcp: [1563832192.571684000, {"ecs"=>{"version"=>"1.1.0"}, "agent"=>{"type"=>"fluent-bit"}, "event"=>{"module"=>"bro", "dataset"=>"bro.dhcp"}, "observer"=>{"type"=>"bro"}, "log"=>{"original"=>"1563832192.571684	CEyTse12G1SVCUye94	10.36.237.199	-	-	clouddc-kibana-01	-	-	-	-	-	-	-	REQUEST	0.000000"}, "client"=>{"ip"=>"10.36.237.199", "mac"=>"-", "fqdn"=>"-", "domain"=>"-"}, "host"=>{"hostname"=>"clouddc-kibana-01"}}]
double free or corruption (!prev)
Aborted
[2019/07/30 09:17:08] [debug] [storage] [cio stream] new stream registered: tail.0
[2019/07/30 09:17:08] [ info] [storage] initializing...
[2019/07/30 09:17:08] [ info] [storage] in-memory
[2019/07/30 09:17:08] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/30 09:17:08] [ info] [engine] started (pid=51894)
[2019/07/30 09:17:08] [debug] [engine] coroutine stack size: 8192 bytes (8.0K)
[2019/07/30 09:17:08] [debug] [in_tail] inotify watch fd=20
[2019/07/30 09:17:08] [debug] [in_tail] scanning path /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/30 09:17:08] [debug] [in_tail] add to scan queue /usr/local/etc/fluent-bit/test/dhcp.log, offset=0
[2019/07/30 09:17:08] [trace] [in_tail] path: /usr/local/etc/fluent-bit/test/dhcp.log
[2019/07/30 09:17:08] [debug] [out_es] host=clouddc-elastic-01.dcon.sclab port=9200 uri=/_bulk index=fluent-bit type=_doc
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 3 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 1 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 8 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 1 conditions and 1 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 7 rules
[2019/07/30 09:17:08] [debug] [filter_modify] Initialized modify filter with 0 conditions and 2 rules
[2019/07/30 09:17:08] [trace] [router] input=tail.0 tag=bro.dhcp
[2019/07/30 09:17:08] [debug] [router] match rule tail.0:stdout.0
[2019/07/30 09:17:08] [debug] [router] match rule tail.0:es.1
[2019/07/30 09:17:08] [ info] [sp] stream processor started
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 1 elements, output map size 3 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Rule RENAME ts TO @timestamp : No keys matching ts found, not applying rule
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 17 elements, output map size 19 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 19 elements, output map size 20 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Rule RENAME client_fqdn TO host_name : Existing key client_fqdn found, not applying rule
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 20 elements, output map size 20 elements
[2019/07/30 09:17:08] [debug] [filter_modify] : Match for condition KEY_VALUE_EQUALS -
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 20 elements, output map size 19 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 19 elements, output map size 19 elements
[2019/07/30 09:17:08] [debug] [filter_modify] Input map size 19 elements, output map size 11 elements
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 11, will be 11, nested map size will be 1
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 11, will be 10, nested map size will be 2
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 10, will be 10, nested map size will be 1
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 10, will be 7, nested map size will be 4
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : No match found for server_
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : Outer map size is 7, will be 7, nested map size will be 1
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : No match found for zeek_
[2019/07/30 09:17:08] [debug] [filter_nest] Nest : No match found for dhcp_
[2019/07/30 09:17:08] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log read=491 lines=9
[2019/07/30 09:17:08] [debug] [in_tail] file=/usr/local/etc/fluent-bit/test/dhcp.log promote to TAIL_EVENT
[2019/07/30 09:17:12] [trace] [task 0x561e1ce8fb50] created (id=0)
[2019/07/30 09:17:12] [debug] [task] created task=0x561e1ce8fb50 id=0 OK
[2019/07/30 09:17:12] [trace] [thread 0x561e1cd6a970] created (custom data at 0x561e1cd6a990, size=64
[2019/07/30 09:17:12] [trace] [thread 0x561e1cd6a900] created (custom data at 0x561e1cd6a920, size=64
[2019/07/30 09:17:12] [trace] [upstream] connection in process

see crash04.pcap.txt

System/build config report

./bin/fluent-bit --sos
Fluent Bit v1.2.1
Copyright (C) Treasure Data

[2019/07/30 09:20:34] [ info] [storage] initializing...
[2019/07/30 09:20:34] [ info] [storage] in-memory
[2019/07/30 09:20:34] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/07/30 09:20:34] [ info] [engine] started (pid=51916)

Fluent Bit Enterprise - SOS Report
==================================
The following report aims to be used by Fluent Bit and Fluentd Enterprise
Customers of Treasure Data. For more details visit:

    https://fluentd.treasuredata.com


[Fluent Bit]
    Edition		Community Edition
    Version		1.2.1
    Built Flags		 FLB_HAVE_PARSER FLB_HAVE_STREAM_PROCESSOR JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB FLB_HAVE_TRACE FLB_HAVE_FLUSH_LIBCO FLB_HAVE_FORK FLB_HAVE_TIMESPEC_GET FLB_HAVE_GMTOFF FLB_HAVE_UNIX_SOCKET FLB_HAVE_PROXY_GO FLB_HAVE_SYSTEM_STRPTIME FLB_HAVE_LIBBACKTRACE FLB_HAVE_REGEX FLB_HAVE_LUAJIT FLB_HAVE_C_TLS FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY

[Operating System]
    Name		Linux
    Release		4.19.0-5-amd64
    Version		#1 SMP Debian 4.19.37-5 (2019-06-19)

[Hardware]
    Architecture	x86_64
    Processors		24

[Built Plugins]
    Inputs              cpu mem kmsg proc disk netif tail dummy head health storage_backlog stream_processor serial stdin syslog exec tcp mqtt lib forward random
    Filters             stdout throttle grep kubernetes parser nest modify lua record_modifier
    Outputs             azure bigquery counter es exit file forward http influxdb kafka-rest nats null plot splunk stackdriver stdout tcp td lib flowcounter gelf

[SERVER] Runtime configuration
    Flush		5.000000
    Daemon		Off
    Log_Level		Info

@bortok
Copy link
Author

bortok commented Jul 30, 2019

@fujimotos I think I was able to nail down the exact condition for the crash. It is in DNS resolution of the ES Host parameter. If an IP address is used, fluent-bit work fine. Also, if there an entry in /etc/hosts for the hostname (FQDN) it also works fine. Only when an actual DNS query needs to be made to resolve the host, it crashes.

@edsiper
Copy link
Member

edsiper commented Jul 30, 2019

The root cause of the problem is the defined stack size of 8kb for co-routines. If you remove the option Coro_Stack_Size the problem goes away.

The problem is on getaddrinfo() system call, when retrieving DNS records it uses alloca(3) system call to perform memory allocation, so if there are many records we need more memory, so the problem is that alloca(3) takes memory from the stack instead of the heap.

@bortok
Copy link
Author

bortok commented Jul 30, 2019

@edsiper removing Coro_Stack_Size does solve the problem with DNS resolution. I need to look back why Coro_Stack_Size was required in first place. Thank you!

@edsiper edsiper changed the title Crashes with "double free or corruption (!prev)" and other messages [wrong config stack size] Crashes with "double free or corruption (!prev)" and other messages Jul 30, 2019
@edsiper
Copy link
Member

edsiper commented Jul 30, 2019

@bortok I am closing this ticket as a config issue. If you found that default stack size is a problem please let me know.

@edsiper edsiper closed this as completed Jul 30, 2019
@fujimotos
Copy link
Member

@edsiper Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants