Unable to start icinga2 with kernel-3.10.0-514.21.2 RHEL7 #5367

Closed
alexlud opened this issue Jun 20, 2017 · 33 comments
Labels
area/cli Command line helpers blocker Blocks a release or needs immediate attention bug Something isn't working core/crash Shouldn't happen, requires attention
Milestone

Comments

@alexlud

alexlud commented Jun 20, 2017

General Notes

This seems to be an upstream Kernel regression in RHEL 7 only.

Please read the published advisory and follow our Twitter channel, where we keep posting updates on the matter.

https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/

Original Description

Hello,
I've applied the latest kernel update on my Icinga2 box. After booting the new kernel, icinga2 no longer starts.
Running the previous kernel version is my current workaround.

Log:
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: execvp: Argument list too long
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: Could not fetch RunAsUser variable. Error ''. Exiting.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service: control process exited, code=exited status=6
Jun 20 08:15:05 icinga.example.com systemd[1]: Failed to start Icinga host/service/network monitoring system.
Jun 20 08:15:05 icinga.example.com systemd[1]: Unit icinga2.service entered failed state.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service failed.

Icinga2 version is 2.6.3
RHEL7.3 with all updates
kernel-3.10.0-514.21.2.el7.x86_64

@pefmeister

Hi there,

I have the same problem. This happened after upgrading to the newest RHEL kernel / glibc. The following (quick and dirty) fix at least let me start Icinga again.

Change the last line of /usr/sbin/icinga2 to look like this:

exec $ICINGA2_BIN --no-stack-rlimit "$@"
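
The edit can also be scripted. This is a minimal sketch, assuming the wrapper's last line originally reads `exec $ICINGA2_BIN "$@"` (the thread does not show the original line). `add_no_stack_rlimit` is a hypothetical helper name, and it is safer to try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: insert --no-stack-rlimit into the final "exec" line
# of a wrapper script. Assumes that line reads: exec $ICINGA2_BIN "$@"
add_no_stack_rlimit() {
    wrapper="$1"
    # Leave the file alone if the flag is already present.
    if grep -q -- '--no-stack-rlimit' "$wrapper"; then
        return 0
    fi
    # The '$' address selects the last line only; a .bak backup is kept.
    sed -i.bak '$ s/^exec \$ICINGA2_BIN /exec $ICINGA2_BIN --no-stack-rlimit /' "$wrapper"
}
```

Called as `add_no_stack_rlimit /path/to/copy-of-icinga2`, it leaves the file untouched when the last line does not match the assumed form.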

@pefmeister

When running strace, two systems with different patch levels behave differently:

System with 3.10.0-514.21.1.el7.x86_64:

setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 25 vars */]) = 0
brk(0)                                  = 0x243b000

System with 3.10.0-514.21.2.el7.x86_64:

setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 21 vars */]) = -1 E2BIG (Argument list too long)

There seems to be a major change in behavior between the two kernels. Any idea how to change that?
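
Besides the execve result, the two traces differ in what getrlimit(RLIMIT_STACK) reports before the limit is lowered. To compare hosts without strace, the shell's own limits can be printed. This is a minimal sketch, with `show_stack_limits` as a hypothetical helper name; values are in the units that `ulimit -s` reports (KiB in common shells).

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard stack limits of this shell.
# "unlimited" corresponds to RLIM64_INFINITY in the strace output.
show_stack_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -S -s)" "$(ulimit -H -s)"
}

show_stack_limits
```

A soft limit of 8192 with an unlimited hard limit would match the first trace. Note that the limits of a service started by systemd may differ from those of an interactive shell.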

@pefmeister

I suppose this is related to https://rhn.redhat.com/errata/RHSA-2017-1484.html. Is this something that must be fixed within Icinga?

@gunnarbeutner
Contributor

Looks like their security fix inadvertently breaks legitimate uses of setrlimit(RLIMIT_STACK, ...).

@dnsmichi dnsmichi added core/crash Shouldn't happen, requires attention blocker Blocks a release or needs immediate attention labels Jun 20, 2017
@dnsmichi dnsmichi added this to the 2.7.0 milestone Jun 20, 2017
@dnsmichi
Contributor

Thanks for the report, we'll look into that and are therefore postponing today's v2.7 release.

@dnsmichi
Contributor

The fix for CVE-2017-1000364 seems to have been applied in Debian too.

https://security-tracker.debian.org/tracker/CVE-2017-1000364

@mcktr
Member

mcktr commented Jun 20, 2017

@dnsmichi I applied the patches this morning to our Debian 8 'jessie' system and Icinga 2 is still starting after a reboot.

root@[HOSTNAME]:~# uname -a
Linux [HOSTNAME] 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64 GNU/Linux

Is there anything else I can send you to help with this problem?

@dnsmichi
Contributor

@mcktr thanks a lot, it's good to know that Debian does not seem to be affected.

We're currently investigating the RHEL kernel update by diffing the -1 and -2 source RPMs.

@dnsmichi
Contributor

stack_guard_gap = 256UL>>PAGE_SHIFT

expendable_stack_area()

Setting 4.5 MB stack size works, 4 MB does not.
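
To find where the threshold lies on a given kernel, the limit can be swept in a loop. This is a minimal sketch, assuming `/bin/true` exists and that a failing exec is the symptom of interest. `probe_stack_limit` is a hypothetical helper, and the list of sizes is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: try to exec /bin/true under a given soft stack
# limit (in KiB). Runs in a subshell so the caller's limit is unchanged.
# A failure can also mean that the ulimit call itself was refused.
probe_stack_limit() {
    ( ulimit -S -s "$1" && /bin/true ) 2>/dev/null
}

# 4096 and 4608 KiB are the 4 MB and 4.5 MB cases mentioned above.
for kb in 256 1024 4096 4608 8192; do
    if probe_stack_limit "$kb"; then
        echo "$kb KiB: exec ok"
    else
        echo "$kb KiB: exec FAILED"
    fi
done
```

On an unaffected kernel every size is expected to report `exec ok`.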

@dnsmichi
Contributor

We lower the stack size so that spawned threads do not reserve too much memory. An older version just attempted to set ulimit -u inside the init script, which failed on Debian Jessie in #1006.

Options:

  • remove RLIMIT_STACK and make it a systemd/init script option again
  • increase rlimit to a hardcoded size

@Crunsher
Contributor

Alright, for now you can use @pefmeister's workaround. We'll publish a blog post detailing the issues in the coming days, and 2.7 will come with a long-term solution.

@lazyfrosch
Contributor

We reported a bug to RedHat mentioning the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1463241

The bug is currently private (I guess that is the default for kernel bugs).

You can reproduce the problem more simply:

$ ulimit -s 1024
$ /bin/true
bash: /bin/true: Argument list too long

$ ulimit -s 4096
$ /bin/true
bash: /bin/true: Argument list too long

@lazyfrosch
Contributor

RHEL 6 seems to be fine:

[root@rhel6-test ~]# uname -a
Linux rhel6-test.localdomain 2.6.32-696.3.2.el6.x86_64 #1 SMP Wed Jun 7 11:51:39 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel6-test ~]# bash -c "ulimit -s 256; /bin/true; echo 'Works.'"
Works.

@dnsmichi
Contributor

The workaround for systemd also requires the prepare-dirs script to be patched.

diff --git a/etc/initsystem/prepare-dirs b/etc/initsystem/prepare-dirs
index 6c4a08869..5677a787a 100644
--- a/etc/initsystem/prepare-dirs
+++ b/etc/initsystem/prepare-dirs
@@ -13,13 +13,13 @@ else
 fi


-ICINGA2_USER=`$DAEMON variable get --current RunAsUser`
+ICINGA2_USER=`$DAEMON variable get --current RunAsUser --no-stack-rlimit`
 if [ $? != 0 ]; then
         echo "Could not fetch RunAsUser variable. Error '$ICINGA2_USER'. Exiting."
         exit 6
 fi

-ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup`
+ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup --no-stack-rlimit`
 if [ $? != 0 ]; then
         echo "Could not fetch RunAsGroup variable. Error '$ICINGA2_GROUP'. Exiting."
         exit 6

@dnsmichi dnsmichi added the area/cli Command line helpers label Jun 20, 2017
@dnsmichi
Contributor

CentOS 7 is currently rolling the kernel update onto the mirrors. The main mirror has it available.

[root@icinga2 ~]# uname -a ; ulimit -s 1024 && /bin/true && echo "works"
Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
-bash: /bin/true: Argument list too long
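
A script that needs to decide whether to apply the workaround could compare the running kernel against the release shown above. This is a minimal sketch; `is_affected_kernel` and `AFFECTED_RELEASE` are hypothetical names, and matching a single release string is a simplification, since other builds may behave the same way.

```shell
#!/bin/sh
# Kernel release this thread reports as broken (taken from the output above).
AFFECTED_RELEASE="3.10.0-514.21.2.el7.x86_64"

# Hypothetical helper: succeed if the given release (default: the running
# kernel) is the one reported as affected.
is_affected_kernel() {
    release="${1:-$(uname -r)}"
    [ "$release" = "$AFFECTED_RELEASE" ]
}

if is_affected_kernel; then
    echo "kernel $(uname -r) matches the release reported as affected"
fi
```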

@lazyfrosch
Contributor

It looks like there are related problems with the kernel update on Debian jessie as well:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865311

@dnsmichi
Contributor

Reported CentOS bug: https://bugs.centos.org/view.php?id=13453

@dnsmichi
Contributor

https://lkml.org/lkml/2017/6/19/1515

@lazyfrosch
Contributor

We received a test-build from RedHat that works fine in my test environment.

dnsmichi pushed a commit that referenced this issue Jun 23, 2017
dnsmichi pushed a commit that referenced this issue Jun 23, 2017
Make rlimits configurable by adding three variables: RLimitFiles, RLimitProcesses and RLimitStack

refs #5367
@dnsmichi
Contributor

Our advisory has been updated with everything that happened.

https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/

Please open a support case with RedHat to ask for an accelerated fix or a test RPM. This raises awareness and makes it more likely that they release a fix soon.

The configuration options have been added for v2.7. I would leave this issue open until RedHat/CentOS have released a new kernel update.

@alexlud
Author

alexlud commented Jun 29, 2017

New kernel from RH is available.
kernel-3.10.0-514.26.1

No issues so far.

@pefmeister

I can confirm this, too. Seems to be working with the new kernel. Let's go 2.7!

@dnsmichi
Contributor

Thanks for your tests. We'll wait until everything is publicly resolved.

https://bugzilla.redhat.com/show_bug.cgi?id=1463241 is not clear about its state, and CentOS still has the old kernel version.

It is also highly likely that Debian was affected as they recently changed their patch set.
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.30-2%2Bdeb9u2
https://lists.debian.org/debian-security-announce/2017/msg00160.html

There might be more patches or regressions coming in, see e.g. torvalds/linux@98da7d0

Let's wait and see when the kernel problems calm down; then we may start a release cycle for 2.7 again.

@lazyfrosch
Contributor

A knowledge base entry has been published, saying that a solution is in progress.

@lazyfrosch
Contributor

Also related: https://access.redhat.com/solutions/3098341

@Copis

Copis commented Jun 30, 2017

Hi,

CentOS has released the new kernel update 3.10.0-514.26.1, and I can confirm that the icinga2 process starts fine.

@dnsmichi
Contributor

Catching up after vacation - the CentOS bug tracker item (https://bugs.centos.org/view.php?id=13453) is resolved and RedHat has published multiple Kernel versions too. Tested that inside the Vagrant box, works fine.

[root@icinga2 ~]# uname -a && icinga2 daemon -C
Linux icinga2 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
information/cli: Icinga application loader (version: v2.6.3-399-gc7d71b0)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: icinga2
warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
information/ConfigItem: Instantiated 4 ApiUsers.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 3 Zones.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 1 Endpoint.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 28 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 177 CheckCommands.
information/ConfigItem: Instantiated 1 Downtime.
information/ConfigItem: Instantiated 4 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 157 Hosts.
information/ConfigItem: Instantiated 318 Comments.
information/ConfigItem: Instantiated 1 User.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ConfigItem: Instantiated 161 Services.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

We'll discuss the 2.7 release once everyone involved has returned from holidays, probably next week or so.

Closing here, thanks to everyone involved 👍
