Icinga2 sends notification during reload (despite downtime) and does not log it #6846
Comments
ref/NC/597231 |
@widhalmt Any chance that the internal ticket contains any useful information to reproduce this issue or something else? Let me know if I can do anything to help with this problem. |
@ekeih Sorry. All we have so far is that the setup in question hit the same problem. I'm still evaluating, but I have already escalated the ticket internally, so someone else may well come back to you for details. |
Thanks for the info! 🙂 |
Whiteboxtested v2.10.2 with the following result:
(Thanks to @Crunsher for consultation.) TL;DR: The code seems clean. CC @lippserd @ekeih Could you please provide debug logs from around the time the notification was sent? And: Are you sure that Icinga 2 calls the notification script? |
We see this behavior in our setup too. |
@SimonHoenscheid Do you have some debug logs and information about your setup? |
Currently not. The last time we used the debug log it grew extremely fast, so I am not able to enable it for several days in production.
Yes, I am 100% sure that it is called by Icinga2. |
Found this issue because we had the same problem in the past.
[2019-01-25 18:40:41 +0100] information/WorkQueue: #34 (JsonRpcConnection, #23) items: 3, rate: 2495.82/s (149749/min 780477/5min 2344675/15min);
[2019-01-25 18:40:51 +0100] information/WorkQueue: #34 (JsonRpcConnection, #23) items: 1, rate: 2505.15/s (150309/min 780335/5min 2344155/15min);
[2019-01-25 18:41:01 +0100] information/WorkQueue: #34 (JsonRpcConnection, #23) items: 4, rate: 2514.7/s (150882/min 780287/5min 2347299/15min);
[2019-01-25 18:41:02 +0100] information/Application: Reload requested, letting new process take over.
[2019-01-25 18:41:02 +0100] information/ApiListener: 'api' stopped.
[2019-01-25 18:41:02 +0100] information/CheckerComponent: 'checker' stopped.
[2019-01-25 18:41:03 +0100] information/ExternalCommandListener: 'command' stopped.
[2019-01-25 18:42:01 +0100] information/FileLogger: 'main-log' started.
[2019-01-25 18:42:10 +0100] information/WorkQueue: #10 (DaemonCommand::Run) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2019-01-25 18:42:13 +0100] information/ApiListener: 'api' started.
[2019-01-25 18:42:13 +0100] information/ApiListener: Copying 3 zone configuration files for zone 'xx.yy.zz.de' to '/var/lib/icinga2/api/zones/xx.yy.zz.de'.
[2019-01-25 18:42:13 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/xx.yy.zz.de' (0 Bytes). Received timestamp '2019-01-25 18:42:13 +0100' (1548438133.171498), Current timestamp '2019-01-25 11:50:50 +0100' (1548413450.769627).
[2019-01-25 18:42:13 +0100] information/ApiListener: Copying 12 zone configuration files for zone 'director-global' to '/var/lib/icinga2/api/zones/director-global'.
The history in Icingaweb2 is empty for that timestamp ... ... The SMS gateway shows that Icinga triggered the notification. Systemd logs something like this:
Jan 25 18:41:02 localhost.localdomain systemd[1]: icinga2.service: Supervising process 32212 which is not our child. We'll most likely not notice when it exits.
We're running a single master with several zones, so I think it's not a cluster problem ... |
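To check whether a notification recorded by an external system (such as the SMS gateway) falls into a reload, the reload windows can be pulled out of icinga2.log. The following is only a rough sketch in Python, not anything shipped with Icinga 2. It assumes the log line layout shown in the excerpt above and treats the span from "Reload requested" to the next "'api' started." as the window. The names `parse_line`, `reload_windows`, `in_reload_window` and `SAMPLE_LOG` are made up for this illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt: "[<timestamp> <utc offset>] <severity>/<facility>: <message>"
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] (\S+): (.*)$"
)


def parse_line(line):
    """Return (timestamp, facility, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S %z")
    return timestamp, match.group(2), match.group(3)


def reload_windows(lines):
    """Collect (start, end) pairs: reload requested until the API listener is back."""
    windows = []
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, facility, message = parsed
        if facility == "information/Application" and message.startswith("Reload requested"):
            start = timestamp
        elif (
            start is not None
            and facility == "information/ApiListener"
            and message == "'api' started."
        ):
            windows.append((start, timestamp))
            start = None
    return windows


def in_reload_window(when, windows):
    """True if the timezone-aware datetime `when` lies inside any reload window."""
    return any(begin <= when <= end for begin, end in windows)


# Lines copied from the log excerpt above.
SAMPLE_LOG = [
    "[2019-01-25 18:41:02 +0100] information/Application: Reload requested, letting new process take over.",
    "[2019-01-25 18:41:02 +0100] information/ApiListener: 'api' stopped.",
    "[2019-01-25 18:42:01 +0100] information/FileLogger: 'main-log' started.",
    "[2019-01-25 18:42:13 +0100] information/ApiListener: 'api' started.",
]
```

For the sample lines this yields a single window from 18:41:02 to 18:42:13. Timestamps from the notification handler or the gateway can then be passed to `in_reload_window` to see whether they fall into it.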
Another idea ... Is it possible that the newly spawned Icinga2 process sends this notification while starting up? Because ACKs sent via the API are shown in Icingaweb2: The notification script automatically sends an ACK to the API for the triggered service:
So the API accepts and processes connections and events, but logs nothing to /var/log/icinga2/icinga2.log. The reload on the 25th at ~18:40 was triggered by Director: As I understand the state of Icinga2 is written to So ... Which Icinga2 core receives the API call during a reload in that case? The old or the new one? Why is an API call persistent, but nothing is logged or written to the IDO? Update: After digging through the notificationcomponent code as @Al2Klimov described, I found that the following is never logged to /var/log/icinga2/icinga2.log in our environments:
With a
only the startup of the component is logged:
A reload with systemd or director also brings up the NotificationComponent message in the logfile.
A "NotificationComponent ... stopped" message, which as I understand it should be logged during shutdown, was not logged in that case either. I assume something goes wrong during shutdown, which leads to a trigger that sends notifications out. |
During a reload the daemon starts a copy of itself and then shuts down. There is a small period of time during which either of the two processes could receive the request. At the moment log messages about API requests may be lost, but #6827 would fix that. |
I can sit down next week and put everything together. Regarding debug logs, I need to have a look at the impact this will have on our system. |
@ekeih You could auto-truncate the debug log with a cron script so that you do not run out of disk space. |
Same here. If any additional logs or information are required, please feel free to ask. Thanks |
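A minimal sketch of such a truncation script, in Python, meant to be called from cron. The function name `truncate_if_larger` and the size limit are made up for this example, and the debug log path has to be passed in explicitly (commonly /var/log/icinga2/debug.log, but check the path configured for the debuglog feature on your system). It assumes the daemon writes the file in append mode, so that emptying it in place does not leave a sparse file.

```python
import os


def truncate_if_larger(path, max_bytes):
    """Empty `path` in place if it is larger than `max_bytes`.

    Returns True if the file was truncated, False otherwise
    (including when the file does not exist).
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False
    if size <= max_bytes:
        return False
    # Truncating in place keeps the same inode, so a process that has the
    # file open keeps writing to it; no reload of the daemon is needed.
    os.truncate(path, 0)
    return True
```

Note that everything logged before the truncation is lost, so the limit should be generous enough that the log still covers at least the most recent reload when the problem shows up again.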
@MaBauMeBad I think the main reason that there is no progress is that nobody provided debug logs. I currently do not have the resources to do so. |
Hi all, Cheers, |
Please test this with 2.11 RC - https://icinga.com/2019/07/25/icinga-2-11-release-candidate/ |
I consider this fixed with 2.11 from the many patches we've implemented in this region. |
This is similar to #6057. It is even the exact same Icinga2 setup 😉
Expected Behavior
Current Behavior
We have two masters (master01 and master02) which are responsible for the notifications in our cluster.
This should not happen because the service is still in downtime (and also the active checks are disabled). I guess it only happened for one contact because the other notification objects seem to be "scheduled" on master01.
Icinga2 does not log this notification, but our notification handler does. The notification is also not visible in Icingaweb2.
Possible Solution
I guess that it is again some issue with the startup order or a race condition. But I have no specific idea.
Steps to Reproduce (for bugs)
It seems to happen at random times. I do not even know how often. Currently I only have this one example and no idea how to reproduce it.
Context
We have 2 masters and 6 satellites (4 zones, 2x2 and 2x1 satellites) and deploy our configuration via Puppet on master01 at xx:22. (In this case at 20:22.) Usually Icinga2 selects master02 as the IDO-DB master.
Your Environment
icinga2 --version: r2.10.2-1
icinga2 feature list: api checker command ido-mysql mainlog notification
icinga2 daemon -C:
Let me know if you think that a detailed configuration of the zones and endpoints would be helpful or if I can do something to help.