Thread leak in API with unclosed sockets #5148
Comments
Hello @julianbrost does the error still occur? Best PS: "really bored" because that patch hasn't been approved by the core devs (yet). |
FWIW #5278 fixes I2_LEAK_DEBUG which might be useful for diagnosing this problem. |
@gunnarbeutner @Al2Klimov See #5408 It's a deadlock with the lock in ApiListener.
with
And made the private in the hpp file.
|
Looking closer at the stacktrace I think #5419 would be your fix. |
Another workaround until the release is to limit concurrency to fewer cores. I use machines that have 24 cores, so for testing I sometimes limit an instance to 2 or 4 cores so that I don't disturb other processes as much when starting or reloading all the time (which deletes and reinserts lots of stuff in the database). A reload without an override uses all cores for a few seconds; with concurrency at 2 a reload takes about as long but only peaks at 200%, so no worries, that's only 2 cores. 👍 In init.conf add; check with
|
It is not reproducible on macOS and RHEL7, but Debian 9 hits it.
|
Next steps: Figure out why it fails on Debian but not on RHEL.
|
hi,
with this command, I can kill the instance and have a crash file (three files in here):
|
hi, today we scanned our network with Saint again and all Icinga2 instances crashed:
so, also with 2.8.1-1 Stretch release .. no change. |
We still see this thread leak issue with the following stack trace: Is there any fix or workaround for this? |
A possible test would be to get a different boost version (and compile it) and recompile Icinga 2 against it. Just for dev purposes, to get a clear view where to look for this error. CentOS 7 for example uses 1.54 which doesn't have such. Could also be a Debian specific patch which causes trouble here. Workarounds don't exist, that's an issue with shared pointer logic here. |
I have 1.54 boost on my Ubuntu 14.04.5: ~# dpkg -l | grep boost |
As @N-o-X tested this last week, I'm also not able to reproduce this on Debian Stretch 9.4 using current master.
There may be some changes with git master vs 2.8.x, as we've changed from boost::shared/intrusive_ptr to std::shared/intrusive_ptr. I've checked out v2.8.2 and have built that one. Same behaviour, round about 24-26 threads. |
@dnsmichi do you see this issue when installing icinga on mac? |
macOS is not a supported platform, just a development environment. Look above, tests have been run on various Linux distributions and none of them gives an indication of how to reliably reproduce the error and actually fix it. |
I upgraded from Debian 8 to 9 and since then I get load warnings, and icingaweb2 feels sluggish.
|
@slalomsk8er what exactly are you using to query the REST API? Anything special on your setup which helps to isolate the problem? |
@dnsmichi There is icingaweb2 and a custom dashboard that uses a proxy script (attached). It gets called with or without a filter argument.
all names/identifiers are replaced by ***** |
The script condition |
I guess it isn't intentional but from my understanding the GC will clean up after exit. |
I'm closing this in favour of #6361. |
Hello @dnsmichi Is this issue fixed with latest Icinga build? Could you please point me to the version which I can pick and test in my test environment. Many Thanks, |
I've triggered snapshot builds a few minutes ago, you should see the current date in your package manager's update. It would help to know which distribution and version you're using. |
When doing lots of requests to the API using the command check_http -H localhost -p 5665 -S -e '401 Unauthorized' -N, the icinga2 process leaks threads over time until it reaches the process limit for its user and then either segfaults or aborts (I have seen both so far).
Expected Behavior
Icinga should handle the requests and not leak threads.
Current Behavior
Icinga leaks threads over time. The leaked threads look like this in gdb (for Icinga 2.6.3):
In most cases (4 out of 5 so far), this led to a segfault at some point: icinga2[16628]: segfault at ffffffffffffffff ip ffffffffffffffff sp 00002acd7791a2a8 error 15
In one case, Icinga aborted with the following log messages:
icinga2.log in case of SIGABRT (Icinga 2.6.2)
report.1491439251.043945
I suspect both behaviors are only a symptom of using up the ulimit for allowed processes.
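To check the suspicion about the process limit, one could compare the thread count of the running process with its "Max processes" limit. The following is only a rough sketch, assuming Linux with /proc mounted; the helper names thread_count, max_processes and report_thread_usage are made up for illustration. Note that this limit is accounted per user, so other processes running as the same user also count against it and the comparison is only an approximation.

```shell
#!/bin/sh
# Sketch: compare a process's thread count with its "Max processes" soft limit.
# Assumes Linux with /proc mounted; all helper names here are hypothetical.

# Print the number of threads of PID $1 ("Threads:" field of /proc/PID/status).
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Print the soft "Max processes" limit of PID $1 (a number or "unlimited").
max_processes() {
    awk '/^Max processes/ { print $3 }' "/proc/$1/limits"
}

# Print one summary line for every PID given as an argument.
report_thread_usage() {
    for pid in "$@"; do
        echo "pid=$pid threads=$(thread_count "$pid") limit=$(max_processes "$pid")"
    done
}
```

With the functions loaded into a shell, something like report_thread_usage $(pidof icinga2) could be run repeatedly while the request loop is active to see whether the thread count approaches the limit.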
Steps to Reproduce (for bugs)
apt install icinga2
icinga2 api setup
systemctl restart icinga2
icinga2 threads: for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l (16 in my case)
while :; do /usr/lib/nagios/plugins/check_http -H localhost -p 5665 -S -e '401 Unauthorized' -N > /dev/null; done (let this run for some time, see below)
icinga2 threads: for pid in $(pidof icinga2); do ps -T -p $pid | grep -F icinga2; done | wc -l (now for example 70 in my case)
Some remarks: I'm not entirely sure what's the best way to reproduce the issue. For the numbers above, I used a fresh Icinga2 instance on a new VM with the Debian standard config (i.e. monitoring localhost with a few checks) and running the loop sending the requests 6 times in parallel for about an hour. There it does not leak threads very fast but it does so over time.
On the other hand, with an instance that handles quite some checks, including many passive ones, and regularly receiving real API requests, running one loop doing requests is enough to get a thread leaked every few seconds.
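The "6 times in parallel for about an hour" setup could be scripted roughly as below. This is a sketch under assumptions, not the exact procedure used for the numbers above: run_loops is a hypothetical helper, and it relies on date +%s being available (true for GNU, BSD and busybox date).

```shell
#!/bin/sh
# Sketch: run a command in several parallel loops for a fixed amount of time.
# run_loops is a hypothetical helper name, not part of Icinga or the plugins.

# run_loops WORKERS SECONDS CMD [ARGS...]
# Starts WORKERS background loops that each repeat CMD until SECONDS have passed.
run_loops() {
    workers=$1
    seconds=$2
    shift 2
    end=$(( $(date +%s) + seconds ))
    pids=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        (
            # Repeat the command until the deadline; discard its output.
            while [ "$(date +%s)" -lt "$end" ]; do
                "$@" > /dev/null 2>&1
            done
        ) &
        pids="$pids $!"
        i=$(( i + 1 ))
    done
    # Wait for every worker loop; the exit status of the loops is ignored.
    for p in $pids; do
        wait "$p" || :
    done
}
```

For the scenario described here, that would be run_loops 6 3600 /usr/lib/nagios/plugins/check_http -H localhost -p 5665 -S -e '401 Unauthorized' -N, with the thread count taken before and after using the ps pipeline from the steps.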
Context
I tried to debug another issue and forgot that the loop was still running, which made Icinga segfault. That's not good ;)
Your Environment
(icinga2 --version): r2.6.2-1, steps to reproduce were also tested on r2.6.3-1
(icinga2 feature list): api checker mainlog notification for the 2.6.3 instance I used for reproducing and api checker command ido-pgsql livestatus mainlog notification statusdata syslog on the 2.6.2 with which I noticed the issue
(icinga2 daemon -C): no errors