Hang in TlsStream::Handshake #5007
Comments
Were you able to generate a full gdb backtrace? And as such, once the first breakpoint is hit, call
From further gdb debugging: when it hangs, it hits m_CV.wait(lock); once (https://github.com/Icinga/icinga2/blob/v2.6.2/lib/base/tlsstream.cpp#L290) and never reaches https://github.com/Icinga/icinga2/blob/v2.6.2/lib/base/tlsstream.cpp#L292. This time our environment had been working fine for the past 24h; to trigger the problem, I just did a single service icinga2 reload. Please find the full gdb backtrace attached (I took it right after a hanging call; from my understanding, Thread #11 is the one you are looking for).
More gdb debugging: when it hangs, execution does not even enter the void TlsStream::OnEvent(int revents) method. I also tried changing the EventEngine constant from "poll" to "epoll" and vice versa, with no luck.
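The situation described in the two comments above (a thread blocked in a condition-variable wait that is never signalled because the event handler never runs) can be illustrated in isolation. The following is a minimal sketch, not Icinga 2's code: `HandshakeState`, `WaitForHandshake` and `MarkHandshakeDone` are hypothetical names, it uses `std::condition_variable` as an assumption about the primitive involved, and the timeout is only one possible way to make such a wait observable rather than the fix that was eventually applied.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state a handshake waits on
// (not the actual icinga::TlsStream members).
struct HandshakeState {
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
};

// Blocks until the handshake is flagged as done or the timeout expires.
// Returns true if the handshake completed, false if the wait timed out.
bool WaitForHandshake(HandshakeState& state, std::chrono::milliseconds timeout)
{
    assert(timeout.count() >= 0);

    std::unique_lock<std::mutex> lock(state.mutex);
    // An untimed cv.wait(lock) would block forever if nobody ever calls
    // MarkHandshakeDone(); wait_for() with a predicate returns the
    // predicate's value once the timeout has elapsed.
    return state.cv.wait_for(lock, timeout, [&state] { return state.done; });
}

// Would be called by whatever processes the socket events once the
// handshake has finished (the role an event callback plays).
void MarkHandshakeDone(HandshakeState& state)
{
    {
        std::lock_guard<std::mutex> lock(state.mutex);
        state.done = true;
    }
    state.cv.notify_all();
}
```

If the event callback is never invoked, `WaitForHandshake` returns false after the timeout instead of hanging, which would at least allow the caller to log the stalled handshake and close the connection.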
I also monitored with tcpdump while debugging and saw no outgoing data from Icinga2 to the console client when it hangs (no outgoing data from https://github.com/Icinga/icinga2/blob/v2.6.2/lib/remote/apilistener.cpp:L353 to ... hang).
A restart kills the entire process, while a reload spawns a child process which then takes over.
Could you try #5416? It fixed my problem.
It looks like we are experiencing the same issue with Icinga2 r2.7.1-1. It's a setup with two masters and around 1500 agents. Here is a short analysis of the core dump captured with gcore:
Perhaps this helps you find the root cause. If you need other details, I would be happy to provide them.
Now we are on 2.8.0 but experiencing the same issue.
Again many threads (1972) and nearly all are stuck in TlsStream::Handshake. One example:
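For tallying how many threads of a dump sit in a given function, a small helper along these lines could be run over saved gdb output. This is only a sketch under assumptions: `CountThreadsInSymbol` is a hypothetical name, and it assumes the text comes from gdb's `thread apply all bt`, where each thread section starts with a line beginning with `Thread ` followed by its frames.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts the threads in a saved gdb backtrace whose stack mentions `symbol`.
// Assumes each thread section starts with a "Thread N (...)" header line.
std::size_t CountThreadsInSymbol(const std::string& backtrace,
                                 const std::string& symbol)
{
    std::istringstream input(backtrace);
    std::string line;
    std::size_t count = 0;
    bool inThread = false;  // true once the first thread header was seen
    bool matched = false;   // true if the current thread was already counted

    while (std::getline(input, line)) {
        if (line.compare(0, 7, "Thread ") == 0) {
            // New thread section: reset the per-thread match flag.
            inThread = true;
            matched = false;
        } else if (inThread && !matched &&
                   line.find(symbol) != std::string::npos) {
            // Count each thread once, even if the symbol appears in
            // several of its frames.
            ++count;
            matched = true;
        }
    }

    return count;
}
```

Calling it with the dump text and `"TlsStream::Handshake"` would give the number of threads with that function somewhere on their stack.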
I think this is a duplicate of #5204.
This was analysed and fixed in 2.9.
Expected behaviour:
I can connect/query the API without problem
Actual behaviour:
The TLS handshake randomly hangs with no error in the logs (in fact, no log entries at all except the start of the thread), making the API unavailable to satellites, dashing, and any other client.
Installation description:
Debian 8, Icinga2 2.6.2-1, icinga2-ido-pgsql 2.6.2-1, icingaweb2 2.4.1-1 (running on the same machine: influxdb & grafana)
Found the 1667 objects:
Type : Count
ApiListener : 1
ApiUser : 2
CheckCommand : 207
CheckerComponent : 1
Comment : 11
Endpoint : 38
ExternalCommandListener : 1
FileLogger : 2
Host : 56
HostGroup : 9
IcingaApplication : 1
IdoPgsqlConnection : 1
InfluxdbWriter : 1
Notification : 720
NotificationCommand : 4
NotificationComponent : 1
PerfdataWriter : 1
Service : 559
ServiceGroup : 3
TimePeriod : 3
User : 5
UserGroup : 1
Zone : 39
root@icinga:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.6.2-1)
Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Application information:
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
System information:
Platform: Debian GNU/Linux
Platform version: 8 (jessie)
Kernel: Linux
Kernel version: 4.4.35-2-pve
Architecture: x86_64
Build information:
Compiler: GNU 4.9.2
Build host: smithers
What I did to try and narrow the problem to TlsStream::Handshake
Notes
The hangs seem to occur more and more often as I do multiple reloads of Icinga2 due to configuration changes.
If I do a restart, the hangs disappear immediately and then randomly come back after a while.
I am able to debug and reproduce this against our instance using gdb, so if I can provide more information, do not hesitate to ask.
Related issue
My issue seems to be related to #3902: judging by its description, the problem is much the same.