Plugins crash when run from icinga2-2.8.3 #6257

Closed
beheerderdag opened this issue Apr 25, 2018 · 25 comments · Fixed by #6260
Labels
area/checks Check execution and results blocker Blocks a release or needs immediate attention bug Something isn't working

@beheerderdag

We are using Oracle Linux 6.9. After upgrading icinga2 today, ps crashes when it is run as the icinga user.

abrt report

abrt_version:   2.0.8
cgroup:         
cmdline:        /bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'
event_log:      
executable:     /bin/ps
hostname:       silo2
kernel:         3.8.13-118.11.2.el6uek.x86_64
last_occurrence: 1524644388
machineid:      sosreport_uploader-dmidecode=3e09daeef311ed180ecdce08b9798954e1b07b24b7a91ae57195bf48c0f82fa9
pid:            2219
pkg_arch:       x86_64
pkg_epoch:      0
pkg_fingerprint: 72F9 7B74 EC55 1F03
pkg_name:       procps
pkg_release:    45.0.1.el6_9.1
pkg_vendor:     Oracle America
pkg_version:    3.2.8
pwd:            /
time:           Wed 25 Apr 2018 09:15:18 AM CEST
uid:            498
username:       icinga

sosreport.tar.xz: Binary file, 1256500 bytes

core_backtrace:
:{   "signal": 11
:,   "executable": "/bin/ps"
:,   "stacktrace":
:      [ {   "crash_thread": true
:        ,   "frames":
:              [ {   "address": 4206748
:                ,   "build_id": "2ab2498a96e7cfc4942207da4da8376443d1d7ba"
:                ,   "build_id_offset": 12444
:                ,   "file_name": "/bin/ps"
:                }
:              , {   "address": 4203318
:                ,   "build_id": "2ab2498a96e7cfc4942207da4da8376443d1d7ba"
:                ,   "build_id_offset": 9014
:                ,   "file_name": "/bin/ps"
:                } ]
:        } ]
:}

dso_list:
:/lib64/ld-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689
:/lib64/libproc-3.2.8.so procps-3.2.8-45.0.1.el6_9.1.x86_64 (Oracle America) 1499866591
:/bin/ps procps-3.2.8-45.0.1.el6_9.1.x86_64 (Oracle America) 1499866591
:/lib64/libc-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689
:/lib64/libselinux.so.1 libselinux-2.0.94-7.el6.x86_64 (Oracle America) 1475744335
:/lib64/libdl-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689

environ:
:TERM=screen
:PATH=/sbin:/usr/sbin:/bin:/usr/bin
:PWD=/
:LANG=en_US.UTF-8
:SHLVL=1
:LC_NUMERIC=C
:LC_ALL=C

limits:
:Limit                     Soft Limit           Hard Limit           Units     
:Max cpu time              unlimited            unlimited            seconds   
:Max file size             unlimited            unlimited            bytes     
:Max data size             unlimited            unlimited            bytes     
:Max stack size            262144               unlimited            bytes     
:Max core file size        0                    unlimited            bytes     
:Max resident set          unlimited            unlimited            bytes     
:Max processes             16384                16384                processes 
:Max open files            16384                16384                files     
:Max locked memory         65536                65536                bytes     
:Max address space         unlimited            unlimited            bytes     
:Max file locks            unlimited            unlimited            locks     
:Max pending signals       63680                63680                signals   
:Max msgqueue size         819200               819200               bytes     
:Max nice priority         0                    0                    
:Max realtime priority     0                    0                    
:Max realtime timeout      unlimited            unlimited            us        

maps:
:00400000-00414000 r-xp 00000000 fc:00 1839                               /bin/ps
:00614000-00615000 rw-p 00014000 fc:00 1839                               /bin/ps
:00615000-00635000 rw-p 00000000 00:00 0 
:00cbb000-00cdc000 rw-p 00000000 00:00 0                                  [heap]
:7f877f7bc000-7f877f7be000 r-xp 00000000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f7be000-7f877f9be000 ---p 00002000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9be000-7f877f9bf000 r--p 00002000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9bf000-7f877f9c0000 rw-p 00003000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9c0000-7f877fb4a000 r-xp 00000000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fb4a000-7f877fd4a000 ---p 0018a000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd4a000-7f877fd4e000 r--p 0018a000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd4e000-7f877fd50000 rw-p 0018e000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd50000-7f877fd54000 rw-p 00000000 00:00 0 
:7f877fd54000-7f877fd62000 r-xp 00000000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877fd62000-7f877ff62000 ---p 0000e000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877ff62000-7f877ff63000 rw-p 0000e000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877ff63000-7f877ff77000 rw-p 00000000 00:00 0 
:7f877ff77000-7f877ff94000 r-xp 00000000 fc:00 18559                      /lib64/libselinux.so.1
:7f877ff94000-7f8780193000 ---p 0001d000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780193000-7f8780194000 r--p 0001c000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780194000-7f8780195000 rw-p 0001d000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780195000-7f8780196000 rw-p 00000000 00:00 0 
:7f8780196000-7f87801b6000 r-xp 00000000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803a3000-7f87803a7000 rw-p 00000000 00:00 0 
:7f87803b5000-7f87803b6000 rw-p 00000000 00:00 0 
:7f87803b6000-7f87803b7000 r--p 00020000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803b7000-7f87803b8000 rw-p 00021000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803b8000-7f87803b9000 rw-p 00000000 00:00 0 
:7ffcc5fd9000-7ffcc5ffa000 rw-p 00000000 00:00 0                          [stack]
:7ffcc5ffd000-7ffcc5fff000 r-xp 00000000 00:00 0                          [vdso]
:ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

open_fds:
:0:/dev/null
:pos:        0
:flags:        0100002
:1:pipe:[188294541]
:pos:        0
:flags:        01
:2:pipe:[188294542]
:pos:        0
:flags:        01

var_log_messages:
:Apr 25 09:15:18 silo2 kernel: ps[2219]: segfault at 7ffcc5f77ef8 ip 000000000040309c sp 00007ffcc5f77f00 error 6 in ps[400000+14000]
:Apr 25 09:15:18 silo2 abrt[2220]: Saved core dump of pid 2219 (/bin/ps) to /var/spool/abrt/ccpp-2018-04-25-09:15:18-2219 (503808 bytes)
:Apr 25 09:15:22 silo2 kernel: ps[2468]: segfault at 7ffdff0e49e8 ip 000000000040309c sp 00007ffdff0e49f0 error 6 in ps[400000+14000]
:Apr 25 09:15:22 silo2 abrt[2469]: Not saving repeating crash in '/bin/ps'
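The kernel lines above report "error 6". On x86 that number is the page-fault error code: bit 0 says whether the page was present (a protection fault) or not, bit 1 whether the access was a write, and bit 2 whether it came from user mode. A small decoder, as a sketch (the macro, struct and function names are our own, not a kernel API):

```c
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code printed as "error N"
 * in the kernel's segfault line. Names are illustrative only. */
#define PF_PROT  0x1  /* set: protection violation; clear: page not present */
#define PF_WRITE 0x2  /* set: the faulting access was a write */
#define PF_USER  0x4  /* set: the fault happened in user mode */

struct page_fault {
    bool page_present;
    bool write;
    bool user_mode;
};

/* Split a page-fault error code into its three low bits. */
struct page_fault decode_page_fault(unsigned long code)
{
    struct page_fault pf = {
        .page_present = (code & PF_PROT) != 0,
        .write        = (code & PF_WRITE) != 0,
        .user_mode    = (code & PF_USER) != 0,
    };
    return pf;
}

/* Print a one-line summary of a decoded error code. */
void describe_page_fault(unsigned long code)
{
    struct page_fault pf = decode_page_fault(code);
    printf("%s-mode %s of a %s page\n",
           pf.user_mode ? "user" : "kernel",
           pf.write ? "write" : "read",
           pf.page_present ? "present" : "non-present");
}
```

describe_page_fault(6) prints "user-mode write of a non-present page". Together with the faulting address 7ffcc5f77ef8 lying below the start of the [stack] mapping (7ffcc5fd9000) in the maps dump, this is consistent with, though not proof of, ps running out of stack.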

@olegy89

olegy89 commented Apr 25, 2018

It seems we have this problem too.
This command in a Perl script:
$msg_count = `$path_to_sudo $path_to_exim -bpc`;

It returns error code 11.
The same command run directly from a shell as the icinga user returns 0.

After downgrading to 2.8.2-1, everything works as before.

@Crunsher
Contributor

@olegy89 Also on Oracle?

@Crunsher Crunsher added the area/checks Check execution and results label Apr 25, 2018
@olegy89

olegy89 commented Apr 25, 2018

@Crunsher CentOS 6 and CentOS 7

@dnsmichi
Contributor

I'm not able to reproduce this on CentOS 7. Can you share your exact Host, Service and CheckCommand object definitions?

This works fine:

object Host "c" {
  check_command = "c"
  check_interval = 5s
  retry_interval = 5s
}

object CheckCommand "c" {
  command = [ "/bin/ps", "-eo", "s uid pid ppid vsz rss pcpu etime comm args" ]
}

@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Apr 25, 2018
@Crunsher Crunsher changed the title from "ps crashes in icinga2-2.8.3" to "Plugins crash when run from icinga2-2.8.3" Apr 25, 2018
@netzwerkgoettin
Contributor

Hi,

in our case it's the mailq command that fails. It never failed with earlier icinga2 versions. These checks have been running for months, the nagios user has the required permissions, and so on.

The mail queue on the host is empty:

nagios$ mailq
nagios$

Running the check plugin manually as the nagios user works:

nagios$ '/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2'
OK: exim mailq (0) is below threshold (2/5)|unsent=0;2;5;0

icingaweb2 reports CRITICAL

CRITICAL: Error code 0 returned from /usr/bin/mailq

icinga2 debug log on client

[2018-04-25 09:36:34 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2': PID 3365
[2018-04-25 09:36:34 +0200] notice/Process: PID 3365 ('/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2') terminated with exit code 2

syslog on client

Apr 25 09:36:34 lnv-2065 kernel: [  974.883826] mailq[3366]: segfault at 7fff49eca968 ip 0000559f3db94463 sp 00007fff49eca810 error 6 in exim4[559f3db80000+f3000]

zones.d/director-global/service_apply.conf

apply Service "mailq" {
    check_command = "mailq"
    max_check_attempts = "5"
    check_period = "always"
    check_interval = 1m
    retry_interval = 1m
    check_timeout = 10s
    enable_notifications = false
    enable_active_checks = true
    enable_passive_checks = true
    enable_event_handler = true
    enable_perfdata = true
    volatile = false

    assign where "Linux Agent via Icinga 2 Core" in host.templates
    command_endpoint = host_name
    vars.mailq_critical = "5"
    vars.mailq_servertype = "exim"
    vars.mailq_warning = "2"

    import DirectorOverrideTemplate
}

It worked before and has been failing since installing icinga2-2.8.3-1.
We are using Ubuntu 16.04 LTS / Ubuntu 14.04 LTS.

Cheers,
Marianne

@olegy89

olegy89 commented Apr 25, 2018

We noticed the problem only with the external command 'sudo exim -bpc' and the 'check_ipmi_sensor' plugin.
'ps' works fine. However, 'eximq' does not fail on every host, despite the same versions of Icinga and Exim.

object CheckCommand "eximq" {
  import "ipv4-or-ipv6"
  command = [  PluginDir + "/base/" + "check_eximq" ]
  arguments = {
    "--critical" = "$critical$"
    "--warning" = "$warning$"
  }
  timeout = "60"
}
$ cat ./check_eximq
#!/usr/bin/env perl
$msg_count = `sudo exim -bpc`;
print $?;
exit;
$ sudo -u icinga ./check_eximq
0

Result displayed in Icinga Web (Plugin Output):
11
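Both the "error code 11" reported earlier and the 11 printed here come from Perl's $?, which (per perlvar) holds the 16-bit wait status of the backtick child: the exit value is $? >> 8, the terminating signal is $? & 127, and $? & 128 flags a core dump. A value of 11 therefore means the child never returned an exit code but was killed by signal 11, which is SIGSEGV on Linux. A C sketch of the same decoding (struct and function names are ours, for illustration):

```c
#include <signal.h>
#include <stdio.h>

/* Fields of a traditional Unix wait status, laid out the way
 * Perl documents $?: exit value in the high byte, signal number
 * in the low 7 bits, core-dump flag in bit 7. */
struct child_status {
    int exit_code; /* $? >> 8  */
    int signal;    /* $? & 127 */
    int core;      /* $? & 128, normalised to 0 or 1 */
};

struct child_status decode_child_status(int status)
{
    struct child_status cs = {
        .exit_code = status >> 8,
        .signal    = status & 127,
        .core      = (status & 128) != 0,
    };
    return cs;
}

/* Print whether the child exited normally or was killed by a signal. */
void explain_child_status(int status)
{
    struct child_status cs = decode_child_status(status);
    if (cs.signal != 0)
        printf("killed by signal %d%s%s\n", cs.signal,
               cs.signal == SIGSEGV ? " (SIGSEGV)" : "",
               cs.core ? ", core dumped" : "");
    else
        printf("exited with code %d\n", cs.exit_code);
}
```

On Linux, explain_child_status(11) prints "killed by signal 11 (SIGSEGV)", matching the segfaults in the kernel logs posted above.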

@Crunsher
Contributor

@olegy89 Could you run uname -srvmo on the machine? The problem might be kernel-specific.

@olegy89

olegy89 commented Apr 25, 2018

@Crunsher
Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 GNU/Linux
Linux 3.10.0-714.10.2.lve1.5.12.el7.x86_64 #1 SMP Fri Feb 2 00:27:48 EST 2018 x86_64 GNU/Linux
Linux 3.10.0-693.21.1.vz7.46.3 #1 SMP Mon Apr 2 18:21:35 MSK 2018 x86_64 GNU/Linux

@netzwerkgoettin
Contributor

@Crunsher affected examples:
Linux 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 GNU/Linux
Linux 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 GNU/Linux

@Crunsher
Contributor

Thanks! So it has nothing to do with the kernel. (sigh)

@Crunsher
Contributor

I am able to reproduce this using @sysadmama's config example.

@beheerderdag
Author

@dnsmichi
We are using check_procs:

apply Service "procs" {
  import "generic-service"

  check_command = "procs"

  assign where host.name == NodeName
}

@Crunsher
Contributor

The faulty commit is bf95937.
Related tickets: #6119 #6215

@dnsmichi
Contributor

We've isolated the problem and are preparing 2.8.4 which reverts the regression.

@dnsmichi dnsmichi added bug Something isn't working blocker Blocks a release or needs immediate attention and removed needs feedback We'll only proceed once we hear from you again labels Apr 25, 2018
@dnsmichi dnsmichi added this to the 2.8.4 milestone Apr 25, 2018
Crunsher added a commit that referenced this issue Apr 25, 2018
dnsmichi pushed a commit that referenced this issue Apr 25, 2018
@dnsmichi
Contributor

Backported to support/2.8

@dnsmichi
Contributor

Release is in progress: https://github.com/Icinga/icinga2/blob/master/RELEASE.md

Btw - https://twitter.com/wrf42/status/989118952444919808

@dnsmichi
Contributor

Thanks for the reports everyone 💪

2.8.4 is published to our package repos.

[root@608c145dffda /]# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.4-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 4.9.87-linuxkit-aufs
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

@migodev

migodev commented Apr 25, 2018

I can confirm the patch works for check_ipmi_sensor. Thanks!

@leahoswald
Contributor

Yup, the patch is working here too. Thanks! 🎉

@linuxmail

linuxmail commented Apr 25, 2018

Hi,

just for information: I had the same issue with my nagios-plugins-ceph checks (check_ceph_*) and check_ipmi. It took me a few hours to track it down, but going back to 2.8.2-1.stretch solved the problem.

@Crunsher
Contributor

@linuxmail 2.8.4 fixes this.

@dnsmichi
Contributor

Yesterday evening I read up on the faulty patch. For technical reference: it can be assumed that the patch changed how the default stack size is set and later handled. This resulted in a stack size that was too low, so specific applications/plugins started from this process/thread space would crash.

We've seen something similar with the stack guard patches in the RHEL kernel, where setting the stack size also failed and made applications crash. That experience, together with the fact that the only change we could locate was in application.cpp, justifies the immediate revert for production. Future patches in this area will be reviewed over the long term and, unless properly proven with test protocols, will likely not be merged.

Cheers,
Michael
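Resource limits are inherited across fork() and exec(), so a plugin started by the daemon runs with whatever stack limit the daemon set for itself. The limits section of the abrt report at the top of this issue shows a soft "Max stack size" of 262144 bytes (256 KiB) for the crashed ps. A C sketch for checking the soft limit a process actually received (the helper and macro names are ours, not Icinga 2 code):

```c
#include <stdio.h>
#include <sys/resource.h>

/* 256 KiB: the soft stack limit seen in the abrt "limits" dump (262144 bytes). */
#define SUSPECT_STACK_LIMIT ((rlim_t)256 * 1024)

/* Return the current soft RLIMIT_STACK in bytes (possibly RLIM_INFINITY).
 * Prints an error and returns 0 if getrlimit fails. */
rlim_t current_stack_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit(RLIMIT_STACK)");
        return 0;
    }
    return rl.rlim_cur;
}

/* Print the soft limit and flag values at or below 256 KiB. */
void report_stack_limit(void)
{
    rlim_t soft = current_stack_soft_limit();
    if (soft == RLIM_INFINITY)
        printf("stack soft limit: unlimited\n");
    else
        printf("stack soft limit: %llu bytes%s\n", (unsigned long long)soft,
               soft <= SUSPECT_STACK_LIMIT ? " (256 KiB or less)" : "");
}
```

Running a tiny program that calls report_stack_limit() once as a check under the daemon and once from an interactive shell (where bash's ulimit -s shows the same limit in KiB) would show whether the daemon lowered the limit its plugins inherit.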

@tclh123
Contributor

tclh123 commented Apr 28, 2018

Hello @dnsmichi
Sorry for this regression. I believe it happened because bf95937 fixed the rlimit stack-resetting feature, which made the default rlimit value of 256 * 1024 (hardcoded here: https://github.com/Icinga/icinga2/blob/v2.8.4/lib/base/application.cpp#L1503) actually take effect. That value is too low for some specific check commands.
I think we should just fix the logic here (https://github.com/Icinga/icinga2/blob/v2.8.4/lib/base/application.cpp#L249): if the user didn't set the RLimitStack config, we simply don't reset the rlimit value.
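The proposed rule, only touching the stack limit when RLimitStack was explicitly configured, can be sketched in C as below. This is not the code in application.cpp; struct stack_config and apply_stack_limit are hypothetical stand-ins for however the daemon tracks whether the constant was set.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical view of the configuration: whether the user set
 * RLimitStack explicitly, and to what value in bytes. */
struct stack_config {
    bool   user_set;
    rlim_t bytes;
};

/* Leave the inherited RLIMIT_STACK alone unless RLimitStack was configured.
 * Returns 0 on success (or when nothing was changed), -1 on failure. */
int apply_stack_limit(const struct stack_config *cfg)
{
    if (!cfg->user_set)
        return 0; /* no hardcoded 256 KiB default: keep the inherited limit */

    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit(RLIMIT_STACK)");
        return -1;
    }

    rl.rlim_cur = cfg->bytes;
    /* The soft limit may never exceed the hard limit, so clamp it. */
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit(RLIMIT_STACK)");
        return -1;
    }
    return 0;
}
```

With user_set false the function never calls setrlimit, so plugins keep the stack limit of the environment that started the daemon, which is the behaviour users had before bf95937.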

@Crunsher
Contributor

@tclh123 Feel free to open another PR. We would like to have this fixed, but our hands are full with 2.9.0.

@dnsmichi
Contributor

Such a PR must include a test protocol that covers all the edge cases with and without the patch, and it requires long-term testing. As this issue shows, changes in this area can break things in ways that are easy to miss.
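A test protocol like the one requested here needs to run the same plugin under different stack limits and record how it ends. A minimal C harness for that, as a sketch: run_with_stack_limit and report are hypothetical helpers, not part of Icinga 2, and the child only changes its soft limit, which cannot exceed the hard limit.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with the given soft RLIMIT_STACK and
 * return its raw wait status, or -1 if fork/waitpid fails. The hard
 * limit is left untouched, so soft_bytes must not exceed it. */
int run_with_stack_limit(rlim_t soft_bytes, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) { /* child: adjust the limit, then exec the plugin */
        struct rlimit rl;
        if (getrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("getrlimit");
            _exit(126);
        }
        rl.rlim_cur = soft_bytes;
        if (setrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("setrlimit");
            _exit(126);
        }
        execvp(argv[0], argv);
        perror("execvp"); /* only reached if exec failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return status;
}

/* Print how the child ended, using the POSIX wait-status macros. */
void report(const char *label, int status)
{
    if (status < 0)
        printf("%s: could not run\n", label);
    else if (WIFSIGNALED(status))
        printf("%s: killed by signal %d\n", label, WTERMSIG(status));
    else if (WIFEXITED(status))
        printf("%s: exit code %d\n", label, WEXITSTATUS(status));
}
```

For example, running a plugin once with (rlim_t)256 * 1024 and once with the soft limit of a normal login shell, then comparing the two report() lines, would show whether that plugin depends on the larger stack.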


9 participants