in_exec with out_exec in one config creates file descriptor leak #3627
It seems that it is reproducible with a simpler in_exec example. It may also be reproducible on the out_exec side.
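For context, a minimal in_exec-only configuration of the kind being discussed might look like the sketch below (the tag, interval, and script path are placeholders, not taken from this thread):

```
<source>
  @type exec
  tag test.exec
  run_interval 1s
  # placeholder command; any script that prints one JSON object per run works
  command /path/to/some_script.sh
  <parse>
    @type json
  </parse>
</source>
<match test.exec>
  @type stdout
</match>
```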
I fixed get_memory_metrics.sh in my example, which did not work; the printf part was missing. I am going to try your example, but I was not able to reproduce this when having only in_exec or out_exec in the config; both had to be present.
Yeah, I can confirm that your example causes this too. So it's not the combination of both that causes this. EDIT: wrong, it was too soon to judge.
I am sorry, it was too soon to judge; it has stabilized at around 174 FDs. So my first thought still applies: I observe an infinite rise of FDs only with the combination of both in_exec and out_exec.
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days.
Bump.
I checked it but can't reproduce.

fluent.conf:

```
<system>
process_name fluentd_monitor
</system>
<source>
@type exec
tag hardware.memory
run_interval 10s
command /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/get_memory_metrics.sh
<parse>
@type json
types mem_available:float,swap_total:float,swap_free:float
</parse>
</source>
<filter hardware.memory>
@type record_transformer
enable_ruby
<record>
time_local ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
</record>
</filter>
<match hardware.memory>
@type exec
command /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/psql_memory.sh
<format>
@type csv
force_quotes false
delimiter |
fields time_local,mem_available,swap_total,swap_free
</format>
<buffer time>
@type file
path /home/aho/Projects/Fluentd/fluentd/issues/3627-exec-memleak/hardware.memory.buff
timekey 50s
timekey_wait 0s
flush_mode lazy
total_limit_size 200MB
retry_type periodic # exponential backoff is broken https://github.com/fluent/fluentd/issues/3609
retry_max_times 0 # need to go to secondary as soon as possible https://github.com/fluent/fluentd/issues/3610
</buffer>
</match>
```

get_memory_metrics.sh:

```sh
#!/bin/bash
# Collect memory metrics (converted to GiB) from /proc/meminfo in one awk pass
mem_info=$(awk '{ \
  if (/MemAvailable:/) {mem_available=$2}; \
  if (/SwapTotal:/) {swap_total=$2}; \
  if (/SwapFree:/) {swap_free=$2}; \
  if (mem_available && swap_total && swap_free) \
    {printf "%.3f %.3f %.3f", mem_available/1024/1024, swap_total/1024/1024, swap_free/1024/1024; exit} \
}' /proc/meminfo)
# Split the three values into separate shell variables
read -r mem_available swap_total swap_free <<< "${mem_info}"
# Emit a single JSON record for in_exec's json parser
printf '{"mem_available":"%s","swap_total":"%s","swap_free":"%s"}' \
  "$mem_available" \
  "$swap_total" \
  "$swap_free"
```

psql_memory.sh:
Environment:
Results:
Count of fd is always
It might depend on installed gems (low possibility, though); please try updating them.
I don't see any difference in the script you posted. There is a bash shebang in my version of the script as well. The only difference I see is that you are missing the secondary section; could you try it with that? The {{ home }} was meant to be rewritten according to your environment. I will try again later and post if I still observe the problem.
Also, did you try
I also tried
I tried it every 1 sec for 2 hours.
I still see the problem even with version 1.15.1. Roughly after 10 minutes there are 90 FDs in my case, after 20 minutes 160, after 30 minutes 233, and so on...
I use Vagrant in this example, but I have seen this on a rented KVM cloud as well.
Thanks, it reproduced on Rocky Linux 8:
Analyzed the data with fluent-plugin-sigdump: the objects aren't decreased even after a major GC is triggered.
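For anyone who wants to repeat that analysis: assuming fluent-plugin-sigdump keeps the sigdump gem's defaults, sending SIGCONT to the worker writes a dump with GC stats and per-class object counts to /tmp/sigdump-<pid>.log. A sketch (the process_name comes from the <system> section above):

```sh
# Find the fluentd worker by the process_name set in <system>
PID=$(pgrep -f fluentd_monitor | head -n1)
# SIGCONT is sigdump's default trigger signal
kill -CONT "$PID"
# The dump includes GC statistics and object counts per class
less "/tmp/sigdump-${PID}.log"
```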
Pipes for child processes aren't closed explicitly after exiting child processes. Fix #3627 Signed-off-by: Takuro Ashie <ashie@clear-code.com>
With #3844 the file descriptor leak seems to be fixed, but the following objects are still linearly incremented:
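To check whether the pipe descriptors specifically stop accumulating after that patch, counting only pipe-type FDs is a quick test (a sketch, reusing the same process_name assumption):

```sh
PID=$(pgrep -f fluentd_monitor | head -n1)
# Pipes created for exec children show up as "pipe:[inode]" symlink targets
ls -l "/proc/${PID}/fd" | grep -c pipe
```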
Although the file descriptor leak is resolved, the problem described above is not yet resolved.
Hi, any luck on this one?
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days.
I think this is still not solved completely...
Describe the bug
Hello, I think I have found the core issue behind my problems with the fluentd configuration; could someone have a look at this, please?
I have set everything according to https://docs.fluentd.org/installation/before-install.
Using the configuration and the two scripts below, I notice that if I keep them running for a while:
Then I can see that the number of file descriptors rises infinitely, eventually resulting in memory problems and a "too many open files" error once the fd limit (ulimit -n) is reached. This only occurs if both in_exec and out_exec are used; if I do either of the following swaps, the number of file descriptors stabilizes and the problems disappear.
The file descriptor count stabilizes if I swap the match section for this:
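(The replacement section itself was not captured in this transcript; a hypothetical stand-in would be any non-exec output, for example:)

```
<match hardware.memory>
  @type stdout
</match>
```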
or if I swap the source and filter sections for this:
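(Also not captured here; a hypothetical stand-in could be the built-in dummy input emitting records of the same shape:)

```
<source>
  @type dummy
  tag hardware.memory
  dummy {"time_local":"2022-01-01T00:00:00+0000","mem_available":"1.0","swap_total":"2.0","swap_free":"2.0"}
</source>
```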
To Reproduce
Install fluentd:
gem install fluentd --version 1.14.5 --no-doc
Create a folder "fluentd" in your home folder.
Exchange {{ home }} and {{ user }} for your real values in every file below.
Put fluentd.conf (with the configuration below), get_memory_metrics.sh, and psql_memory.sh into {{ home }}/fluentd/.
Run chmod +x on get_memory_metrics.sh and psql_memory.sh.
Run: {{ home }}/bin/fluentd 1.14.5 -c {{ home }}/fluentd/fluentd.conf -vv
Now start checking the file descriptors in /proc/<fluentd PID>/ with: find fd | wc -l
Keep checking it for some time; you'll see the count of file descriptors never stops rising (a monitoring sketch follows below).
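A minimal sketch of such a monitoring loop (assuming the process_name fluentd_monitor from the configuration; adjust the interval to taste):

```sh
# Print a timestamped FD count every 10 seconds
PID=$(pgrep -f fluentd_monitor | head -n1)
while true; do
  printf '%s %s\n' "$(date +%T)" "$(ls "/proc/${PID}/fd" | wc -l)"
  sleep 10
done
```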
Hopefully I did not miss anything.
Expected behavior
The number of file descriptors does not rise infinitely and does not cause memory problems.
Your Environment
Your Configuration
Your Error Log
No errors in fluentd.log, even with -vv.
Additional context
get_memory_metrics.sh
psql_memory.sh