Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notifications are sent when reloading Icinga 2 even though they're deactivated via modified attributes #4696

Closed
icinga-migration opened this issue Sep 28, 2016 · 21 comments
Assignees
Labels
area/notifications Notification events blocker Blocks a release or needs immediate attention bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12817

Created by pstiffel on 2016-09-28 08:40:48 +00:00

Assignee: pstiffel
Status: Feedback
Target Version: (none)
Last Update: 2016-12-12 06:47:52 +00:00 (in Redmine)

Icinga Version: 2.5.4-1
Backport?: Not yet backported
Include in Changelog: 1

Unfortunately, even after updating to 2.5.4, Icinga2 still sends out notifications for services which have notifications disabled, when icinga2 is reloaded due to configuration changes.
And it is not deterministic which of the services get notified.

Example.
At the moment, we have 3 services which are in a critical state and which have notifications disabled. After a reload (we added some hosts and services), mail, sms and ticketsystem-notifications were send out for only one of the services.
The next reload due to configuration changes produced mail and sms notifications for all 3 services, but a third notification (email to a ticketsystem) was only notified for 2 services.

We have an HA-cluster-setup running on two nodes (Debian Jessie with packages from debmon.org)

Attachments

  • icinga.zip pstiffel - 2016-11-25 14:51:21 +00:00 - Logfiles
@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-09-30 13:08:34 +00:00

  • Status changed from New to Feedback
  • Assigned to set to pstiffel

Do you happen to have any debug logs from those reloads which could help identify and reproduce the issue you're having?

@icinga-migration
Copy link
Author

Updated by pstiffel on 2016-09-30 15:13:09 +00:00

The problem is still happening, so I guess, I can switch on the debuglog and try to reproduce the error.
But it will take until after the long weekend...

@icinga-migration
Copy link
Author

Updated by pstiffel on 2016-10-07 11:31:41 +00:00

What is the preferred way to post the debug log? It is a production machine, so I don't want to include the debug.log here. It's too much work to anonymize the log...

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-18 15:32:56 +00:00

I'd prefer a specific host/service and notification example extracted from it (that works well with replacing sensitive names too). That flow is important to see how they pass or not pass the specified state/type filters, or if the notification is for example forced if it shouldn't be.

Other details which are helpful - specific configuration objects from "icinga2 object list --type ..." for those objects (Host, Service, Notification, User).

@icinga-migration
Copy link
Author

Updated by pstiffel on 2016-11-18 15:47:58 +00:00

Ok, I see what I can do.

@icinga-migration
Copy link
Author

Updated by pstiffel on 2016-11-25 14:52:39 +00:00

  • File added icinga.zip

Managed to declutter and anonymize some debug logfiles and the service definitions.

Let me know if you need more or if I removed important lines...

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-12-07 21:41:33 +00:00

The configuration files all have "enable_notifications = true" set. So does the debug log tell - there are reminder notifications in place which will check against such parameters, in addition to state/type filters. Everything looks normal from this point of view.

Can you tell me how you did disable notifications for these services?

@icinga-migration
Copy link
Author

Updated by pstiffel on 2016-12-12 06:47:52 +00:00

The services failed and we turned the notifications off via the classic UI.

@icinga-migration icinga-migration added needs feedback We'll only proceed once we hear from you again bug Something isn't working area/notifications Notification events labels Jan 17, 2017
@edpstiffel
Copy link

Just a quick feedback: the same thing happens with V2.6

@dnsmichi
Copy link
Contributor

Might be a race condition between loading the state file and the modified attributes file.

Can you verify that both HA instance have modified-attributes.conf set to the same values (enable_notifications = false).

@edpstiffel
Copy link

edpstiffel commented Feb 23, 2017

Could this be related to #4946 ?

We also checked what you suggested: On both nodes, the modified-attributes.conf says that the notification is disabled.

@dnsmichi
Copy link
Contributor

Hm, this log entry looks suspicious.

60189 [2016-10-07 11:25:05 +0200] notice/ApiListener: Received update for object: {"modified_attributes":{"enable_notifications":true},"name":"switch2.eurodata.de!snmp-Netzteil 1","original_attributes":["enable_notificatio      ns"],"type":"Service","version":1475832291.7656579018}

Can you try the following:

  • Remove the modified-attributes.conf entries for both nodes
  • Disable the notifications inside your configuration objects (master zones.d)
  • Restart icinga2 on both nodes

I would believe that issue is gone then.

The issue is somehow that one node thinks that notifications are enabled, and enforces that over the cluster. I'm not sure yet how to reliably reproduce the issue.

@edpstiffel
Copy link

Hmm, this issue is not related to a dedicated host.
For testing, we disabled notifications on three new different hosts, and now and then, icinga sends out notifications when the config is reloaded.
But same thing: there are three notification objects connected to each host, and the notifications do not occur for all objects and it's unpredictable which objects send out mail.

@dnsmichi dnsmichi removed the needs feedback We'll only proceed once we hear from you again label Mar 24, 2017
@saikrishnagaddipati
Copy link
Contributor

+1
Having the same issue.

@robbsie
Copy link

robbsie commented Apr 14, 2017

Same here, also with HA - seems to be a timing problem when reloading.

The notifications are obviously sent before the files from /var/lib/icinga2/api or something else are loaded, this is where the downtimes and ACKs from the API/Icingaweb are stored afaik.

@dnsmichi dnsmichi added the blocker Blocks a release or needs immediate attention label Apr 18, 2017
@dnsmichi
Copy link
Contributor

There are two problems with are possibly related at the event of triggering re-notifications.

  • modified attributes are not applied
  • downtimes are not active

We're currently investigating on a possible race condition between activating all objects (including the NotificationComponent feature) while not having applied modifications and runtime state.

One thing you can do to help nail down the issue:

  • enhance your notification script and temporarly add the value of the runtime macros $host.enable_notifications$ resp. $service.enable_notifications$
  • optionally add $host.version$ and $service.version$ (that one holds the modification timestamp. If it is 0, the object wasn't modified at all at the time the notification triggered.

Reproducing the issue

  • Ensure to have a config compile time > 10sec, best is a couple of minutes. slow it down at all cost.
  • Edit modified-attributes.conf and modify enable_notifications=false (or have it in place already via runtime API request)
curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/objects/hosts' -d '{ "filter": "match(\"4696*\", host.name)", "attrs": { "enable_notifications": false } }'
curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/objects/services' -d '{ "filter": "match(\"4696*\", host.name)", "attrs": { "enable_notifications": false } }'
  • Have hosts and services which use the random check command, retry and check interval is 1s
  • Have notification objects which use a notification interval < 10s
  • use the notification script to dump the runtime state of the objects involved (host, service, notification)
  • log the notifications in the script, do not send them to anyone. attach them here.

Test Config

cat 4696-logger

#!/bin/bash

echo `date +%s` $1 >> /tmp/i2.4096.notifications.log
cat 4696-disable-notifications

#!/bin/bash

curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/objects/hosts' -d '{ "filter": "match(\"4696*\", host.name)", "attrs": { "enable_notifications": false } }'
curl -k -s -u root:icinga -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/objects/services' -d '{ "filter": "match(\"4696*\", host.name)", "attrs": { "enable_notifications": false } }'
cat 4696.conf

const countHosts = 3;
const countServices = 300;


for (i in range(countHosts)) {
  object Host "4696-test-" + i use (i) {
    check_command = "random"
    check_interval = 1s
    retry_interval = 1s
  }
}

for (j in range(countServices)) {
  apply Service "4696-test-" + j use (j) {
    check_command = "random"
    check_interval = 1s
    retry_interval = 1s

    assign where match("4696*", host.name)
  }
}

/* comamnds */
object NotificationCommand "4696-log-command" {
  command = [ "./4696-logger" ]

  arguments = {
    "log" = {
      skip_key = true
      value = "$logger_data$"
    }
  }

  vars.logger_data = {{
    //dumps all objects and relevant info as json string
    var data = ""
    var host_name = macro("$host.name$")
    var service_name = macro("$service.name$")
    var notification_name = macro("$notification.name$")

    data += "Timestamp: " + DateTime(get_time()).to_string() + "\n"

    h = get_host(host_name)
    data += "Host: " + h.name

    if (service_name.len() > 0) {
      s = get_service(host_name, service_name)
      data += " Service: " + s.name + " Notifications: " + string(s.enable_notifications) + " Version: " + string(s.version)
    } else {
      data += " Notifications: " + string(h.enable_notifications) + " Version: " + string(h.version)
    }

    n = get_object(Notification, notification_name)
    //data += " Last notification: " + DateTime(n.last_problem_notification).to_string() + "\n"

    return data
  }}
}

/* notifications */
object User "4696-user" { }

template Notification "4696-log" {
  command = "4696-log-command"
  interval = 5s
  users = [ "4696-user" ]
}

apply Notification "4696-log-host" to Host {
  import "4696-log"

  assign where match("4696*", host.name)
}

apply Notification "4696-log-service" to Service {
  import "4696-log"

  assign where match("4696*", host.name)
}

Attached to this issue:
4696.zip

Tests

Test case 1

Config with 3 hosts and 300 services each. Run once and disable notifications via API. Send a kill command to icinga2, don't do foreground + ctrl+C.

Verify inside modified-attributes.conf that everything is disabled.

var obj = get_object("Service", "4696-test-2!4696-test-115")
if (obj) {
        obj.modify_attribute("enable_notifications", false)
        obj.version = 1493374359.446715
}

var obj = get_object("Service", "4696-test-0!4696-test-115")
if (obj) {
        obj.modify_attribute("enable_notifications", false)
        obj.version = 1493374359.448456
}

See that the log says its sending notifications.

mbmif /usr/local/icinga2/etc/icinga2/tests (master *) # icinga2 daemon
[2017-04-28 12:14:38 +0200] information/cli: Icinga application loader (version: v2.6.3-196-g26c0a6f0c; debug)
[2017-04-28 12:14:38 +0200] information/cli: Loading configuration file(s).
[2017-04-28 12:14:39 +0200] information/ConfigItem: Committing config item(s).
[2017-04-28 12:14:39 +0200] information/ApiListener: My API identity: mbmif.int.netways.de
[2017-04-28 12:14:40 +0200] warning/ApplyRule: Apply rule 'satellite-host' (in /usr/local/icinga2/etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2017-04-28 12:14:40 +0200] warning/ApplyRule: Apply rule 'testdowntime' (in /usr/local/icinga2/etc/icinga2/tests/daniel.conf: 1:0-1:48) for type 'ScheduledDowntime' does not match anywhere!
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 3 Zones.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 4 ApiUsers.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 208 CheckCommands.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1828 Comments.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1834 Services.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 2 Users.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 17 Hosts.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 3 NotificationCommands.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 918 Notifications.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 CheckResultReader.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 StatusDataWriter.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Instantiated 1 RedisWriter.
[2017-04-28 12:14:40 +0200] information/ScriptGlobal: Dumping variables to file '/usr/local/icinga2/var/cache/icinga2/icinga2.vars'
[2017-04-28 12:14:40 +0200] information/ConfigObject: Restoring program state from file '/usr/local/icinga2/var/lib/icinga2/icinga2.state'
[2017-04-28 12:14:40 +0200] information/ConfigObject: Restored 4841 objects. Loaded 0 new objects without state.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Triggering Start signal for config items
[2017-04-28 12:14:40 +0200] information/ApiListener: 'api' started.
[2017-04-28 12:14:40 +0200] information/ApiListener: Copying 3 zone configuration files for zone 'global-templates' to '/usr/local/icinga2/var/lib/icinga2/api/zones/global-templates'.
[2017-04-28 12:14:40 +0200] information/ApiListener: Adding new listener on port '5665'
[2017-04-28 12:14:40 +0200] information/CheckResultReader: 'reader' started.
[2017-04-28 12:14:40 +0200] information/CheckerComponent: 'checker' started.
[2017-04-28 12:14:40 +0200] information/ExternalCommandListener: 'command' started.
[2017-04-28 12:14:40 +0200] information/DbConnection: 'ido-mysql' started.
[2017-04-28 12:14:40 +0200] information/NotificationComponent: 'notification' started.
[2017-04-28 12:14:40 +0200] information/RedisWriter: 'redis' started.
[2017-04-28 12:14:40 +0200] information/RedisWriter: Trying to connect to redis server
[2017-04-28 12:14:40 +0200] information/StatusDataWriter: 'status' started.
[2017-04-28 12:14:40 +0200] information/ConfigItem: Activated all objects.
[2017-04-28 12:14:40 +0200] information/DbConnection: Resuming IDO connection: ido-mysql
[2017-04-28 12:14:40 +0200] information/IdoMysqlConnection: 'ido-mysql' resumed.
[2017-04-28 12:14:40 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.2')
[2017-04-28 12:14:41 +0200] information/Checkable: Notifications are disabled for service '4696-test-1'.

..........

[2017-04-28 12:14:41 +0200] information/Checkable: Notifications are disabled for service '4696-test-0!4696-test-84'.
[2017-04-28 12:14:41 +0200] information/Checkable: Notifications are disabled for service '4696-test-0!4696-test-87'.
[2017-04-28 12:14:41 +0200] information/Checkable: Checking for configured notifications for object '4696-test-0!4696-test-115'
[2017-04-28 12:14:41 +0200] information/Notification: Sending 'Problem' notification '4696-test-0!4696-test-115!4696-log-service for user '4696-user'
[2017-04-28 12:14:41 +0200] information/Checkable: Notifications are disabled for service '4696-test-0!4696-test-101'.
[2017-04-28 12:14:41 +0200] information/Checkable: Notifications are disabled for service '4696-test-0!4696-test-88'.
[2017-04-28 12:14:41 +0200] information/Checkable: Checking for configured notifications for object '4696-test-0!4696-test-126'
[2017-04-28 12:14:41 +0200] information/Checkable: Checking for configured notifications for object '4696-test-0!4696-test-119'
[2017-04-28 12:14:41 +0200] information/Notification: Sending 'Problem' notification '4696-test-0!4696-test-126!4696-log-service for user '4696-user'
[2017-04-28 12:14:41 +0200] information/Notification: Sending 'Problem' notification '4696-test-0!4696-test-119!4696-log-service for user '4696-user'

Test Case 2

Increase the host and service count. Modify countHosts to 300 or similar to really stress the system.

Will continue with debugging and investigating on a fix.

@gunnarbeutner
Copy link
Contributor

FYI, the current patch doesn't address the ::IsActive issue yet.

@dnsmichi
Copy link
Contributor

dnsmichi commented May 3, 2017

Patch resolves the issue with modified attributes only. The issue could still happen with downtimes not activated on notification request.
I'm using the test case described above without the local branch logging I made in another attempt to fix it.

@dnsmichi dnsmichi added this to the 2.7.0 milestone May 3, 2017
@gunnarbeutner
Copy link
Contributor

I'm moving the second part of the patch to a separate ticket (#5224) because we're not going to target 2.6.4 for this one just yet.

@gunnarbeutner gunnarbeutner added this to the 2.6.4 milestone May 4, 2017
@gunnarbeutner gunnarbeutner removed this from the 2.7.0 milestone May 4, 2017
@gunnarbeutner gunnarbeutner changed the title [dev.icinga.com #12817] Notifications are send out althought disabled when reloading Icinga2 Notifications are sent when reloading Icinga 2 even though they're deactivated via modified attributes May 4, 2017
@saikrishnagaddipati
Copy link
Contributor

Can someone please confirm if the fix for the issue "modified attributes are not applied" will be released in 2.6.4 or 2.7.0 and also is there any bugfix version being released for this issue?

@gunnarbeutner
Copy link
Contributor

At this point in time this ticket's milestone is set to 2.6.4, i.e. it'll be released as part of the next minor release. There's no ETA for that release yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/notifications Notification events blocker Blocks a release or needs immediate attention bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants