Hang in TlsStream::Handshake #5007

Closed
willemda opened this issue Feb 15, 2017 · 9 comments
Labels
area/api REST API area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working

Comments

@willemda

willemda commented Feb 15, 2017

Expected behaviour:
I can connect/query the API without problem

Actual behaviour:
The TLS handshake randomly hangs with no error in the logs (in fact, nothing is logged at all except the start of the thread), making the API unavailable to satellites, dashing, and any other client.

Installation description:
Debian 8, Icinga2 2.6.2-1, icinga2-ido-pgsql 2.6.2-1, icingaweb2 2.4.1-1 (running on the same machine: influxdb & grafana)

Found the 1667 objects:
Type : Count
ApiListener : 1
ApiUser : 2
CheckCommand : 207
CheckerComponent : 1
Comment : 11
Endpoint : 38
ExternalCommandListener : 1
FileLogger : 2
Host : 56
HostGroup : 9
IcingaApplication : 1
IdoPgsqlConnection : 1
InfluxdbWriter : 1
Notification : 720
NotificationCommand : 4
NotificationComponent : 1
PerfdataWriter : 1
Service : 559
ServiceGroup : 3
TimePeriod : 3
User : 5
UserGroup : 1
Zone : 39

root@icinga:~# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.6.2-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

System information:
Platform: Debian GNU/Linux
Platform version: 8 (jessie)
Kernel: Linux
Kernel version: 4.4.35-2-pve
Architecture: x86_64

Build information:
Compiler: GNU 4.9.2
Build host: smithers

What I did to try to narrow the problem down to TlsStream::Handshake

Notes

The hangs seem to occur more and more often as I reload Icinga2 repeatedly after configuration changes.
If I do a restart, the hangs disappear immediately, then randomly come back after a while.

I can reproduce and debug this on our instance with gdb, so do not hesitate to ask if I can provide more information.

Related issue

My issue seems to be related to #3902; its description matches this problem closely.

@dnsmichi
Contributor

Were you able to generate a full gdb backtrace? Once the first breakpoint is hit, call s to step into the underlying Handshake() function, then keep stepping to see exactly where things go wrong.

@dnsmichi dnsmichi added area/api REST API area/distributed Distributed monitoring (master, satellites, clients) needs feedback We'll only proceed once we hear from you again labels Feb 15, 2017
@willemda
Author

From further gdb debugging: when it hangs, it hits m_CV.wait(lock); once (https://github.com/Icinga/icinga2/blob/v2.6.2/lib/base/tlsstream.cpp#L290) and never reaches https://github.com/Icinga/icinga2/blob/v2.6.2/lib/base/tlsstream.cpp#L292

This time our environment had been working fine for the past 24 hours; to trigger the problem, I just did a single service icinga2 reload.

Please find the full gdb backtrace attached. I took it right after a hanging call, and from my understanding Thread #11 should be the one you are looking for.

gdb_bt.txt
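
To make the symptom concrete, here is a minimal model of the wait described above. It is a sketch, not Icinga's code: it uses `std::condition_variable` rather than Boost, the `HandshakeState` type and the `run_with_event` helper are hypothetical names, and the timeout is an addition for illustration only (the original loop waits without one, and this is not the fix that later shipped). The point is that the waiting thread only wakes if another thread sets one of the flags and notifies; if that event is never delivered, an untimed wait blocks forever, which matches a breakpoint on the wait being hit once and the following line never being reached.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the flags the handshake loop waits on.
struct HandshakeState {
    std::mutex mutex;
    std::condition_variable cv;
    bool handshake_ok = false;
    bool error_occurred = false;
    bool eof = false;

    // In this model, the event thread calls this when the handshake finishes.
    void Complete() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            handshake_ok = true;
        }
        cv.notify_all();
    }

    // Same predicate as the original loop, but bounded by a timeout.
    // Returns true if a flag was set, false if the timeout expired first.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex);
        return cv.wait_for(lock, timeout, [this] {
            return handshake_ok || error_occurred || eof;
        });
    }
};

// Runs one wait; the event thread only exists when deliver_event is true.
bool run_with_event(bool deliver_event) {
    HandshakeState state;
    std::thread event_thread;
    if (deliver_event)
        event_thread = std::thread([&state] { state.Complete(); });

    bool finished = state.WaitFor(std::chrono::milliseconds(200));

    if (event_thread.joinable())
        event_thread.join();
    return finished;
}
```

Calling `run_with_event(true)` returns once the event thread has notified; `run_with_event(false)` models the hang and only returns because of the added timeout.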

@willemda
Author

willemda commented Feb 16, 2017

More gdb debugging: when it hangs, execution never even enters the void TlsStream::OnEvent(int revents) method.
I don't know much about how the internals are supposed to work at this stage, but I can dig deeper if you tell me how to ;)

I also tried changing the EventEngine constant from "poll" to "epoll" and vice versa, with no luck.
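
For context on why a missing OnEvent call is surprising: the listing later in this thread shows Handshake() requesting POLLOUT before it waits, and an idle connected socket is normally writable straight away. The sketch below checks only that general POSIX behaviour with a local socket pair; `poll_writable_once` is a hypothetical helper, and nothing here reproduces Icinga's event engine. If the hang really means no event is ever dispatched, one possible reading is that the descriptor is not being polled at all, but that is an assumption, not something this thread confirms.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Polls a freshly created, idle local socket for POLLOUT.
// Returns the reported revents, 0 on timeout, or -1 on error.
int poll_writable_once(int timeout_ms) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    pollfd pfd{};
    pfd.fd = fds[0];
    pfd.events = POLLOUT;

    int rc = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;  // timed out: no event was reported
    return pfd.revents;
}
```

On a healthy system the call returns immediately with POLLOUT set, so a handshake that asks for POLLOUT would be expected to receive at least one event.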

@willemda
Author

willemda commented Feb 16, 2017

I also monitored with tcpdump while debugging and saw no outgoing data from Icinga2 to the console client when it hangs (nothing is sent between https://github.com/Icinga/icinga2/blob/v2.6.2/lib/remote/apilistener.cpp#L353 and the hang).

edit:
I can't understand how a restart resolves my problem while a reload creates it. Is a reload handled differently from a restart, apart from configuration validation?

@dnsmichi
Contributor

dnsmichi commented Jun 8, 2017

A restart kills the entire process, while a reload spawns a child process which then takes over.

@Stefar77
Contributor

Stefar77 commented Jul 14, 2017

Could you try #5416? It fixed my problem.
edit: Better yet, try #5419 for even more lock-related fixes in the API.

dnsmichi pushed a commit that referenced this issue Sep 18, 2017
This was split from #5416 and #5419.

More patches from #5419 are pending.

refs #5419
refs #5418
refs #5416

refs #5408
refs #5148
refs #5007
refs #4968
refs #4910
@djboris9

djboris9 commented Oct 19, 2017

It looks like we are experiencing the same issue with Icinga2 r2.7.1-1. It's a setup with two masters and around 1500 agents.

Here is a short analysis of the core dump captured with gcore:

> icinga2 --version
  icinga2 - The Icinga 2 network monitoring daemon (version: r2.7.1-1)
  
  Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
  License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Application information:
    Installation root: /usr
    Sysconf directory: /etc
    Run directory: /run
    Local state directory: /var
    Package data directory: /usr/share/icinga2
    State path: /var/lib/icinga2/icinga2.state
    Modified attributes path: /var/lib/icinga2/modified-attributes.conf
    Objects path: /var/cache/icinga2/icinga2.debug
    Vars path: /var/cache/icinga2/icinga2.vars
    PID path: /run/icinga2/icinga2.pid
  
  System information:
    Platform: Red Hat Enterprise Linux Server
    Platform version: 7.4 (Maipo)
    Kernel: Linux
    Kernel version: 3.10.0-693.1.1.el7.x86_64
    Architecture: x86_64
  
  Build information:
    Compiler: GNU 4.8.5
    Build host: unknown

> info sharedlibraries
  From                To                  Syms Read   Shared Object Library
  0x00002b42a91fa7e0  0x00002b42a92009c3  Yes (*)     /lib64/libboost_thread-mt.so.1.53.0
  0x00002b42a94072f0  0x00002b42a9407e63  Yes (*)     /lib64/libboost_system-mt.so.1.53.0
  0x00002b42a9631de0  0x00002b42a96692bf  Yes (*)     /lib64/libboost_program_options-mt.so.1.53.0
  0x00002b42a98c7530  0x00002b42a995822d  Yes (*)     /lib64/libboost_regex-mt.so.1.53.0
  0x00002b42a9bc9b10  0x00002b42a9cb185e  Yes         /usr/lib64/icinga2/libbase.so.2.7.1
  0x00002b42a9f31530  0x00002b42a9f84030  Yes         /usr/lib64/icinga2/libconfig.so.2.7.1
  0x00002b42aa1cc600  0x00002b42aa24a2b8  Yes         /usr/lib64/icinga2/libcli.so.2.7.1
  0x00002b42aa4af4a0  0x00002b42aa57b685  Yes         /usr/lib64/icinga2/libremote.so.2.7.1
  0x00002b42aa7c2e60  0x00002b42aa7c395e  Yes (*)     /lib64/libdl.so.2
  0x00002b42aa9e0bb0  0x00002b42aaa1c58d  Yes (*)     /lib64/libssl.so.10
  0x00002b42aaca4f00  0x00002b42aaddcbd7  Yes (*)     /lib64/libcrypto.so.10
  0x00002b42ab09a1f0  0x00002b42ab09f10c  Yes         /usr/lib64/icinga2/libyajl.so.2
  0x00002b42ab2a2620  0x00002b42ab2a2989  Yes         /usr/lib64/icinga2/libmmatch.so.2.7.1
  0x00002b42ab4a4680  0x00002b42ab4a47c2  Yes         /usr/lib64/icinga2/libsocketpair.so.2.7.1
  0x00002b42ab6a6870  0x00002b42ab6a6d19  Yes         /usr/lib64/icinga2/libexecvpe.so.2.7.1
  0x00002b42ab8b5660  0x00002b42ab8cef10  Yes (*)     /lib64/libedit.so.0
  0x00002b42abaf1e40  0x00002b42abafdbb8  Yes (*)     /lib64/libtinfo.so.5
  0x00002b42abd6a510  0x00002b42abdd15ba  Yes (*)     /lib64/libstdc++.so.6
  0x00002b42ac01c370  0x00002b42ac087276  Yes (*)     /lib64/libm.so.6
  0x00002b42ac31baf0  0x00002b42ac32b2a5  Yes (*)     /lib64/libgcc_s.so.1
  0x00002b42ac534900  0x00002b42ac53fce1  Yes (*)     /lib64/libpthread.so.0
  0x00002b42ac76a480  0x00002b42ac8b0bcf  Yes (*)     /lib64/libc.so.6
  0x00002b42acb10250  0x00002b42acb1304c  Yes (*)     /lib64/librt.so.1
  0x00002b42acd681a0  0x00002b42ace1a958  Yes (*)     /lib64/libicuuc.so.50
  0x00002b42ad127610  0x00002b42ad22cac4  Yes (*)     /lib64/libicui18n.so.50
  0x00002b42ad48d570  0x00002b42ad48d658  Yes (*)     /lib64/libicudata.so.50
  0x00002b42a8fcbb10  0x00002b42a8fe6710  Yes (*)     /lib64/ld-linux-x86-64.so.2
  0x00002b42aea6d650  0x00002b42aea9fa1a  Yes (*)     /lib64/libgssapi_krb5.so.2
  0x00002b42aecd3a10  0x00002b42aed3ae8a  Yes (*)     /lib64/libkrb5.so.3
  0x00002b42aef97570  0x00002b42aef98143  Yes (*)     /lib64/libcom_err.so.2
  0x00002b42af19e8c0  0x00002b42af1bcc0f  Yes (*)     /lib64/libk5crypto.so.3
  0x00002b42af3cf170  0x00002b42af3db6f8  Yes (*)     /lib64/libz.so.1
  0x00002b42af5e6890  0x00002b42af5ed42b  Yes (*)     /lib64/libkrb5support.so.0
  0x00002b42af7f25b0  0x00002b42af7f31cc  Yes (*)     /lib64/libkeyutils.so.1
  0x00002b42af9f89d0  0x00002b42afa077e1  Yes (*)     /lib64/libresolv.so.2
  0x00002b42afc15ac0  0x00002b42afc2b8c6  Yes (*)     /lib64/libselinux.so.1
  0x00002b42afe375f0  0x00002b42afe7d5b0  Yes (*)     /lib64/libpcre.so.1
  0x00002b42b0ca01d0  0x00002b42b0ca73e1  Yes (*)     /lib64/libnss_files.so.2
  0x00002b42b0f39760  0x00002b42b10b5c31  Yes         /usr/lib64/icinga2/libicinga.so.2.7.1
  0x00002b42b1354430  0x00002b42b136a9c7  Yes         /usr/lib64/icinga2/libmethods.so.2.7.1
  0x00002b42b15894f0  0x00002b42b15a5a6e  Yes         /usr/lib64/icinga2/libchecker.so.2.7.1
  0x00002b42b17d5a80  0x00002b42b1820ac6  Yes         /usr/lib64/icinga2/libcompat.so.2.7.1
  0x00002b42b1a56020  0x00002b42b1a7acf6  Yes         /usr/lib64/icinga2/libdb_ido_pgsql.so.2.7.1
  0x00002b42b1c967a0  0x00002b42b1caf0d8  Yes (*)     /lib64/libpq.so.5
  0x00002b42b1edcb90  0x00002b42b1f5ffbf  Yes         /usr/lib64/icinga2/libdb_ido.so.2.7.1
  0x00002b42b21a0af0  0x00002b42b21d5014  Yes (*)     /lib64/libldap_r-2.4.so.2
  0x00002b42b23ef6d0  0x00002b42b23f6a22  Yes (*)     /lib64/liblber-2.4.so.2
  0x00002b42b25ffb60  0x00002b42b2610fc3  Yes (*)     /lib64/libsasl2.so.3
  0x00002b42b2823ec0  0x00002b42b284eb2f  Yes (*)     /lib64/libssl3.so
  0x00002b42b2a6d310  0x00002b42b2a81ce7  Yes (*)     /lib64/libsmime3.so
  0x00002b42b2ca4740  0x00002b42b2d77604  Yes (*)     /lib64/libnss3.so
  0x00002b42b2fc1390  0x00002b42b2fcfd45  Yes (*)     /lib64/libnssutil3.so
  0x00002b42b31e2f10  0x00002b42b31e3c78  Yes (*)     /lib64/libplds4.so
  0x00002b42b33e7510  0x00002b42b33e8b76  Yes (*)     /lib64/libplc4.so
  0x00002b42b35f7ca0  0x00002b42b3617cbf  Yes (*)     /lib64/libnspr4.so
  0x00002b42b3829e50  0x00002b42b382eaac  Yes (*)     /lib64/libcrypt.so.1
  0x00002b42b3a60ba0  0x00002b42b3a61309  Yes (*)     /lib64/libfreebl3.so
  0x00002b42cce14fd0  0x00002b42cce27dfd  Yes         /usr/lib64/icinga2/libnotification.so.2.7.1

> Threads:
  Id   Target Id         Frame 
  1208 Thread 0x2b42a900a500 (LWP 62925) 0x00002b42ac80a1ad in nanosleep () from /lib64/libc.so.6
  1207 Thread 0x2b4454a1b700 (LWP 101768) 0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1206 Thread 0x2b4454016700 (LWP 101759) 0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1205 Thread 0x2b4447db5700 (LWP 101743) 0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

> Every thread which is skipped (see Id) is at the following point:
  #### Thread 0x############ (LWP ######) 0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  1177 Thread 0x2b447048b700 (LWP 99785) 0x00002b42ac53d9ed in connect () from /lib64/libpthread.so.0

  1130 Thread 0x2b44475b1700 (LWP 96399) 0x00002b42ac53d9ed in connect () from /lib64/libpthread.so.0

  763  Thread 0x2b4425ca5700 (LWP 79293) 0x00002b42ac895d05 in __memcpy_ssse3_back () from /lib64/libc.so.6

  235  Thread 0x2b42cc803700 (LWP 65939) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  234  Thread 0x2b42cc602700 (LWP 65937) 0x00002b42ac838a3d in poll () from /lib64/libc.so.6
  233  Thread 0x2b42cc401700 (LWP 65936) 0x00002b42ac838a3d in poll () from /lib64/libc.so.6
  232  Thread 0x2b42cc200700 (LWP 65935) 0x00002b42ac838a3d in poll () from /lib64/libc.so.6
  231  Thread 0x2b42b3e63700 (LWP 65934) 0x00002b42ac838a3d in poll () from /lib64/libc.so.6

  60   Thread 0x2b442dae4700 (LWP 65211) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  54   Thread 0x2b442accd700 (LWP 65189) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  20   Thread 0x2b4370200700 (LWP 64047) 0x00002b42aa4c4786 in icinga::intrusive_ptr_release(icinga::Object*) [clone .local.3743] (object=0x2b42d0002d40)
    at /usr/src/debug/icinga2-2.7.1/lib/base/object.hpp:195

  18   Thread 0x2b4358a04700 (LWP 63957) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  17   Thread 0x2b4358602700 (LWP 63955) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  16   Thread 0x2b42cfe4a700 (LWP 63953) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  15   Thread 0x2b42cfa48700 (LWP 63950) 0x00002b42ac53d42d in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x2b42cf646700 (LWP 63948) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  13   Thread 0x2b42cf043700 (LWP 63946) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  12   Thread 0x2b42cec41700 (LWP 63943) 0x00002b42ac843923 in epoll_wait () from /lib64/libc.so.6
  11   Thread 0x2b42ce83f700 (LWP 63941) 0x00002b42ac53d42d in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x2b42cda38700 (LWP 63934) 0x00002b42ac53d98d in accept () from /lib64/libpthread.so.0
  9    Thread 0x2b42cd435700 (LWP 63933) 0x00002b42ac838a3d in poll () from /lib64/libc.so.6
  8    Thread 0x2b42cd234700 (LWP 63932) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  6    Thread 0x2b42b0c9d700 (LWP 62931) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x2b42b0a9c700 (LWP 62930) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

  3    Thread 0x2b42b069a700 (LWP 62928) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x2b42b0499700 (LWP 62927) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x2b42b0298700 (LWP 62926) 0x00002b42ac53acf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> So there are around 1000 threads stuck at the same point

> Backtrace of one random thread:
[Switching to thread 1000 (Thread 0x2b446e87d700 (LWP 90209))]
#0  0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#0  0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b42a9c547db in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) [clone .local.2439] (this=this@entry=0x2b42c4041a90, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:73
#2  0x00002b42a9c2b58d in icinga::TlsStream::Handshake (this=0x2b42c4041980) at /usr/src/debug/icinga2-2.7.1/lib/base/tlsstream.cpp:290
#3  0x00002b42aa52e21e in icinga::ApiListener::NewClientHandlerInternal (this=this@entry=0x2b42d8001910, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:366
#4  0x00002b42aa52fa03 in icinga::ApiListener::NewClientHandler (this=this@entry=0x2b42d8001910, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:324
#5  0x00002b42aa52fe6b in icinga::ApiListener::AddConnection (this=0x2b42d8001910, endpoint=...) at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:304
#6  0x00002b42a91fc27a in thread_proxy () from /lib64/libboost_thread-mt.so.1.53.0
#7  0x00002b42ac536e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00002b42ac84334d in clone () from /lib64/libc.so.6

> Switched to frame 2 and did a listing:
#2  0x00002b42a9c2b58d in icinga::TlsStream::Handshake (this=0x2b42c4041980) at /usr/src/debug/icinga2-2.7.1/lib/base/tlsstream.cpp:290
290			m_CV.wait(lock);
285	
286		m_CurrentAction = TlsActionHandshake;
287		ChangeEvents(POLLOUT);
288	
289		while (!m_HandshakeOK && !m_ErrorOccurred && !m_Eof)
290			m_CV.wait(lock);
291	
292		if (m_Eof)
293			BOOST_THROW_EXCEPTION(std::runtime_error("Socket was closed during TLS handshake."));
294	

> info locals
lock = {m = 0x2b42c4041a68, is_locked = false}
__PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"
__PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"

> Backtrace of thread 1100
[Switching to thread 1100 (Thread 0x2b44816fb700 (LWP 93882))]
#0  0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#0  0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b42a9c547db in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) [clone .local.2439] (this=this@entry=0x2b434808ce60, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:73
#2  0x00002b42a9c2b58d in icinga::TlsStream::Handshake (this=0x2b434808cd50) at /usr/src/debug/icinga2-2.7.1/lib/base/tlsstream.cpp:290
#3  0x00002b42aa52e21e in icinga::ApiListener::NewClientHandlerInternal (this=0x2b42d8001910, client=..., hostname=..., role=icinga::RoleServer)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:366
#4  0x00002b42aa52fa03 in icinga::ApiListener::NewClientHandler (this=<optimized out>, client=..., hostname=..., role=<optimized out>)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:324
#5  0x00002b42a91fc27a in thread_proxy () from /lib64/libboost_thread-mt.so.1.53.0
#6  0x00002b42ac536e25 in start_thread () from /lib64/libpthread.so.0
#7  0x00002b42ac84334d in clone () from /lib64/libc.so.6

> The lock here also has is_locked = false; I don't know whether this is relevant
$2 = {m = 0x2b434808ce38, is_locked = false}

> bt full of thread 1000
#0  0x00002b42ac53a945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00002b42a9c547db in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) [clone .local.2439] (this=this@entry=0x2b42c4041a90, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:73
        guard = {m = 0x0}
        check_for_interruption = {thread_info = 0x2b4334db63b0, m = 0x2b42c4041a90, set = true}
        res = <optimized out>
#2  0x00002b42a9c2b58d in icinga::TlsStream::Handshake (this=0x2b42c4041980) at /usr/src/debug/icinga2-2.7.1/lib/base/tlsstream.cpp:290
        lock = {m = 0x2b42c4041a68, is_locked = false}
        __PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"
        __PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"
#3  0x00002b42aa52e21e in icinga::ApiListener::NewClientHandlerInternal (this=this@entry=0x2b42d8001910, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:366
        currentContextFrame = {<No data fields>}
        conninfo = {m_Data = <incomplete type>}
        tlsStream = {px = 0x2b42c4041980}
        cert = {px = 0x0, pn = {pi_ = 0x0}}
        identity = {m_Data = <incomplete type>}
        ctype = <optimized out>
        endpoint = <optimized out>
        verify_ok = <optimized out>
#4  0x00002b42aa52fa03 in icinga::ApiListener::NewClientHandler (this=this@entry=0x2b42d8001910, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:324
No locals.
#5  0x00002b42aa52fe6b in icinga::ApiListener::AddConnection (this=0x2b42d8001910, endpoint=...) at /usr/src/debug/icinga2-2.7.1/lib/remote/apilistener.cpp:304
        host = {m_Data = <incomplete type>}
        port = {m_Data = <incomplete type>}
        client = {px = 0x2b42c4041d90}
#6  0x00002b42a91fc27a in thread_proxy () from /lib64/libboost_thread-mt.so.1.53.0
No symbol table info available.
#7  0x00002b42ac536e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00002b42ac84334d in clone () from /lib64/libc.so.6
No symbol table info available.

Perhaps this helps you to find the root cause. If you need other details I would be happy to provide these.
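
The per-function thread tally in the analysis above (roughly 1000 threads in pthread_cond_wait) can also be produced mechanically from saved `info threads` text. The helper below is a rough sketch: `tally_frames` is a hypothetical name, and it assumes each thread row contains "Thread 0x" followed by " in <function> (", as in the rows quoted above. Rows where gdb prints a different layout are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Counts threads per innermost function from saved `info threads` text.
std::map<std::string, int> tally_frames(const std::string& info_threads) {
    std::map<std::string, int> counts;
    std::istringstream input(info_threads);
    std::string line;

    while (std::getline(input, line)) {
        // Only thread rows carry both markers; headers and notes do not.
        std::size_t in_pos = line.find(" in ");
        if (line.find("Thread 0x") == std::string::npos ||
            in_pos == std::string::npos)
            continue;

        // The function name runs from after " in " to the argument list.
        std::size_t start = in_pos + 4;
        std::size_t end = line.find(" (", start);
        if (end == std::string::npos)
            end = line.size();

        ++counts[line.substr(start, end - start)];
    }
    return counts;
}
```

Sorting the resulting map by count makes it easy to see at a glance that almost all threads share the same innermost frame.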

@djboris9

We are now on 2.8.0 but still experiencing the same issue.

icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.0-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Red Hat Enterprise Linux Server
  Platform version: 7.4 (Maipo)
  Kernel: Linux
  Kernel version: 3.10.0-693.1.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

Again many threads (1972) and nearly all are stuck in TlsStream::Handshake. One example:

[Switching to thread 1500 (Thread 0x7f0b6ce7c700 (LWP 72270))]
#0  0x00007f0cb10ae945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt full
#0  0x00007f0cb10ae945 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f0cb39692eb in boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) [clone .local.2559] (this=this@entry=0x7f0bb8300430, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:73
        guard = {m = 0x0}
        check_for_interruption = {thread_info = 0x7f0c442fdac0, m = 0x7f0bb8300430, set = true}
        res = <optimized out>
#2  0x00007f0cb396fa6d in icinga::TlsStream::Handshake (this=0x7f0bb8300320) at /usr/src/debug/icinga2-2.8.0/lib/base/tlsstream.cpp:290
        lock = {m = 0x7f0bb8300408, is_locked = false}
        __PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"
        __PRETTY_FUNCTION__ = "void icinga::TlsStream::Handshake()"
#3  0x00007f0cb30e37fe in icinga::ApiListener::NewClientHandlerInternal (this=this@entry=0x7f0c80000c20, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.8.0/lib/remote/apilistener.cpp:451
        currentContextFrame = {<No data fields>}
        conninfo = {m_Data = <incomplete type>}
        tlsStream = {px = 0x7f0bb8300320}
        cert = {px = 0x7f0c10241f10, pn = {pi_ = 0x7f0c3403b470}}
        identity = {m_Data = <incomplete type>}
        ctype = <optimized out>
        endpoint = <optimized out>
        verify_ok = <optimized out>
#4  0x00007f0cb30e5023 in icinga::ApiListener::NewClientHandler (this=this@entry=0x7f0c80000c20, client=..., hostname=..., role=role@entry=icinga::RoleClient)
    at /usr/src/debug/icinga2-2.8.0/lib/remote/apilistener.cpp:409
No locals.
#5  0x00007f0cb30e548b in icinga::ApiListener::AddConnection (this=0x7f0c80000c20, endpoint=...) at /usr/src/debug/icinga2-2.8.0/lib/remote/apilistener.cpp:389
        host = {m_Data = <incomplete type>}
        port = {m_Data = <incomplete type>}
        client = {px = 0x7f0bb83f3c40}
#6  0x00007f0cb43dd27a in thread_proxy () from /lib64/libboost_thread-mt.so.1.53.0
No symbol table info available.
#7  0x00007f0cb10aae25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00007f0cb0dd834d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) frame 2
#2  0x00007f0cb396fa6d in icinga::TlsStream::Handshake (this=0x7f0bb8300320) at /usr/src/debug/icinga2-2.8.0/lib/base/tlsstream.cpp:290
290                     m_CV.wait(lock);
(gdb) list
285
286             m_CurrentAction = TlsActionHandshake;
287             ChangeEvents(POLLOUT);
288
289             while (!m_HandshakeOK && !m_ErrorOccurred && !m_Eof)
290                     m_CV.wait(lock);
291
292             if (m_Eof)
293                     BOOST_THROW_EXCEPTION(std::runtime_error("Socket was closed during TLS handshake."));
294
(gdb)

I think this is a duplicate of #5204.
If #5419 is Boost-related, could this be the same issue, just in a different place?

@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again labels Jan 8, 2018
@dnsmichi
Contributor

This was analysed and fixed in 2.9.


4 participants