icinga2.8 - Notifications are sent even in downtime #6231

Closed
anan80 opened this issue Apr 17, 2018 · 7 comments · Fixed by #6270
Labels
area/notifications (Notification events), bug (Something isn't working)

@anan80

anan80 commented Apr 17, 2018

We run an Icinga 2.8 setup with two masters (master1 and master2), satellites, and clients. We still receive alert notifications during a downtime: the downtime is set through the web UI and is shown there, but Icinga still sends notification mails. I also observed that the number of servers in the /var/lib/icinga2/api/packages/_api//conf.d/downtimes directory does not match between master1 and master2.

icinga: 2.8.0-1
icingaweb2: 2.4.1

On master 1, accept_config is set to false in api.conf; on master 2 it is set to true.

@dnsmichi
Contributor

I would appreciate it if you would take the time to fill in the issue template and provide steps to reproduce the problem.

@dnsmichi added the labels needs feedback (We'll only proceed once we hear from you again) and area/notifications (Notification events) on Apr 17, 2018
@unix0r

unix0r commented Apr 19, 2018

I'm also seeing this issue.

There is a downtime for a service of a host (created via icingaweb2):

object Downtime "PLS-SERVER-xxxxx" ignore_on_error {
	author = "admin"
	comment = "node was removed"
	config_owner = ""
	duration = 0.000000
	end_time = 1839587992.000000
	entry_time = 1523968793.733191
	fixed = true
	host_name = "pls-goeteborg1_10.46.8.141_10.46.8.140"
	scheduled_by = ""
	service_name = "Kiosk_LastSeen"
	start_time = 1523968792.000000
	triggered_by = ""
	version = 1523968793.733220
	zone = "pls-goeteborg1"
}

But there are still notifications coming from this service:

***** Service Monitoring on PLS-SERVER *****

Last Order at Order Point on 10.46.8.140 is CRITICAL!

Info: 59 days 3 hours ago

When: 2018-04-19 14:45:06 +0200
Service: Kiosk_LastSeen
Host: pls-goeteborg1_10.46.8.141_10.46.8.140
IPv4: 10.46.8.141

System Information:

root@PLS-SERVER:/etc/icinga2# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.2-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

System information:
Platform: Debian GNU/Linux
Platform version: 9 (stretch)
Kernel: Linux
Kernel version: 4.9.0-4-amd64
Architecture: x86_64

Build information:
Compiler: GNU 6.3.0
Build host: 022328c363ac

root@PLS-SERVER:/etc/icinga2# icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf influxdb livestatus opentsdb statusdata syslog
Enabled features: api checker command graphite ido-pgsql mainlog notification perfdata

root@PLS-SERVER:/etc/icinga2# icinga2 daemon -C
information/cli: Icinga application loader (version: r2.8.2-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: master
warning/ApplyRule: Apply rule 'ping6' (in /etc/icinga2/conf.d/services.conf: 28:1-28:21) for type 'Service' does not match anywhere!
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 10 Zones.
information/ConfigItem: Instantiated 8 Endpoints.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 2 ApiUsers.
information/ConfigItem: Instantiated 358 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 236 CheckCommands.
information/ConfigItem: Instantiated 139 Downtimes.
information/ConfigItem: Instantiated 8 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 1 EventCommand.
information/ConfigItem: Instantiated 510 Hosts.
information/ConfigItem: Instantiated 2 UserGroups.
information/ConfigItem: Instantiated 2 Users.
information/ConfigItem: Instantiated 4 TimePeriods.
information/ConfigItem: Instantiated 2991 Services.
information/ConfigItem: Instantiated 16 ServiceGroups.
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 1 ExternalCommandListener.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 PerfdataWriter.
information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

@dnsmichi
Contributor

Please extract the runtime state for this host/service and downtime via REST API endpoints /v1/objects/services and /v1/objects/downtimes.
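
For readers following along, here is a minimal sketch of how such a query could be prepared in Python. It only builds the request and does not send it. The base URL `https://localhost:5665` and the helper name `build_query` are assumptions for illustration; sending the request additionally needs the credentials of an ApiUser and the master's CA certificate, which are not shown in this thread.

```python
import json
import urllib.request

# Assumption: the API listens on the default port 5665 on the master itself.
API_BASE = "https://localhost:5665"


def build_query(object_type, filter_expr, filter_vars):
    """Build (but do not send) a query against /v1/objects/<object_type>.

    The filter is sent in a JSON body via POST with the
    X-HTTP-Method-Override: GET header, so that names containing
    special characters do not have to be URL-encoded.
    """
    body = json.dumps({"filter": filter_expr, "filter_vars": filter_vars}).encode()
    return urllib.request.Request(
        f"{API_BASE}/v1/objects/{object_type}",
        data=body,
        method="POST",
        headers={"Accept": "application/json", "X-HTTP-Method-Override": "GET"},
    )


# Host and service names taken from the downtime object posted above.
names = {"h": "pls-goeteborg1_10.46.8.141_10.46.8.140", "s": "Kiosk_LastSeen"}

service_req = build_query(
    "services", "host.name == h && service.name == s", names
)
downtime_req = build_query(
    "downtimes", "downtime.host_name == h && downtime.service_name == s", names
)
```

Passing either request to `urllib.request.urlopen` (with authentication and TLS configured) should return a JSON document with a `results` array, in the same shape as the dumps posted later in this thread.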

@unix0r

unix0r commented Apr 19, 2018

Downtime:

    {
        "attrs": {
            "__name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7",
            "active": true,
            "author": "admin",
            "comment": "node was removed",
            "config_owner": "",
            "duration": 0.0,
            "end_time": 1839587992.0,
            "entry_time": 1523968793.733191,
            "fixed": true,
            "ha_mode": 0.0,
            "host_name": "pls-goeteborg1_10.46.8.141_10.46.8.140",
            "legacy_id": 70.0,
            "name": "PLS-SERVER-1523968793-7",
            "original_attributes": null,
            "package": "_api",
            "paused": false,
            "scheduled_by": "",
            "service_name": "Kiosk_LastSeen",
            "source_location": {
                "first_column": 0.0,
                "first_line": 1.0,
                "last_column": 56.0,
                "last_line": 1.0,
                "path": "/var/lib/icinga2/api/packages/_api/PLS-SERVER-1513681196-1/conf.d/downtimes/pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7.conf"
            },
            "start_time": 1523968792.0,
            "templates": [
                "PLS-SERVER-1523968793-7"
            ],
            "trigger_time": 1523968793.737246,
            "triggered_by": "",
            "triggers": [],
            "type": "Downtime",
            "version": 1523968793.73322,
            "was_cancelled": false,
            "zone": "pls-goeteborg1"
        },
        "joins": {},
        "meta": {},
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen!PLS-SERVER-1523968793-7",
        "type": "Downtime"
    }

Service:

    {
        "attrs": {
            "__name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen",
            "acknowledgement": 0.0,
            "acknowledgement_expiry": 0.0,
            "action_url": "",
            "active": true,
            "check_attempt": 1.0,
            "check_command": "check_kiosk_lastseen",
            "check_interval": 600.0,
            "check_period": "",
            "check_timeout": null,
            "command_endpoint": "",
            "display_name": "Last Order at Order Point",
            "downtime_depth": 1.0,
            "enable_active_checks": true,
            "enable_event_handler": true,
            "enable_flapping": false,
            "enable_notifications": true,
            "enable_passive_checks": true,
            "enable_perfdata": true,
            "event_command": "",
            "flapping": false,
            "flapping_current": 0.0,
            "flapping_last_change": 0.0,
            "flapping_threshold": 0.0,
            "flapping_threshold_high": 30.0,
            "flapping_threshold_low": 25.0,
            "force_next_check": false,
            "force_next_notification": false,
            "groups": [
                "Kiosk_LastSeen"
            ],
            "ha_mode": 0.0,
            "host_name": "pls-goeteborg1_10.46.8.141_10.46.8.140",
            "icon_image": "",
            "icon_image_alt": "",
            "last_check": 1524150835.323797,
            "last_check_result": {
                "active": true,
                "check_source": "pls-goeteborg1",
                "command": [
                    "/usr/lib/nagios/plugins/check_lastseen",
                    "-c",
                    "2880",
                    "-v",
                    "2018-02-19T10:08:37.27438+01:00",
                    "-w",
                    "1440"
                ],
                "execution_end": 1524150835.323666,
                "execution_start": 1524150835.302616,
                "exit_status": 2.0,
                "output": "59 days 6 hours ago",
                "performance_data": [
                    "duration=85325;1440;2880"
                ],
                "schedule_end": 1524150835.323797,
                "schedule_start": 1524150835.302077,
                "state": 2.0,
                "type": "CheckResult",
                "vars_after": {
                    "attempt": 1.0,
                    "reachable": true,
                    "state": 2.0,
                    "state_type": 1.0
                },
                "vars_before": {
                    "attempt": 1.0,
                    "reachable": true,
                    "state": 2.0,
                    "state_type": 1.0
                }
            },
            "last_hard_state": 2.0,
            "last_hard_state_change": 1519204509.405459,
            "last_reachable": true,
            "last_state": 2.0,
            "last_state_change": 1519204509.405459,
            "last_state_critical": 1524150835.461949,
            "last_state_ok": 1519117653.056663,
            "last_state_type": 1.0,
            "last_state_unknown": 0.0,
            "last_state_unreachable": 0.0,
            "last_state_warning": 1519203909.451425,
            "max_check_attempts": 5.0,
            "name": "Kiosk_LastSeen",
            "next_check": 1524151430.251961,
            "notes": "",
            "notes_url": "",
            "original_attributes": null,
            "package": "_etc",
            "paused": false,
            "retry_interval": 60.0,
            "severity": 129.0,
            "source_location": {
                "first_column": 1.0,
                "first_line": 236.0,
                "last_column": 30.0,
                "last_line": 236.0,
                "path": "/etc/icinga2/zones.d/global-templates/P_services.conf"
            },
            "state": 2.0,
            "state_type": 1.0,
            "templates": [
                "Kiosk_LastSeen",
                "kiosk-service-urgent",
                "kiosk-service",
                "generic-service"
            ],
            "type": "Service",
            "vars": {
                "notification": {
                    "mail": {
                        "users": [
                            "v_user"
                        ]
                    }
                }
            },
            "version": 0.0,
            "volatile": false,
            "zone": "pls-goeteborg1"
        },
        "joins": {},
        "meta": {},
        "name": "pls-goeteborg1_10.46.8.141_10.46.8.140!Kiosk_LastSeen",
        "type": "Service"
    }
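
As a quick sanity check on the two dumps above, the notification mail's "When:" timestamp can be compared against the downtime window. This is only a sketch: the helper `downtime_covers` is a hypothetical name, and it assumes a fixed downtime, where the window is simply `start_time` to `end_time`.

```python
from datetime import datetime, timedelta, timezone


def downtime_covers(downtime_attrs, when):
    """Return True if `when` lies inside [start_time, end_time] of a fixed downtime."""
    ts = when.timestamp()
    return downtime_attrs["start_time"] <= ts <= downtime_attrs["end_time"]


# Values copied from the Downtime object above.
downtime = {"fixed": True, "start_time": 1523968792.0, "end_time": 1839587992.0}

# "When:" line of the notification mail: 2018-04-19 14:45:06 +0200
sent = datetime(2018, 4, 19, 14, 45, 6, tzinfo=timezone(timedelta(hours=2)))

print(downtime_covers(downtime, sent))  # True
```

The mail was therefore sent while the downtime was in effect, which matches `"downtime_depth": 1.0` in the service dump.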

@dnsmichi
Contributor

Hm, looks ok to me. Can you share the zones.conf on that master, and check which node is sending the notification email? I would suspect that it happens on the secondary master which maybe doesn't have the downtime applied.

@unix0r

unix0r commented Apr 27, 2018

We only have one master and only the master is sending mails.
Notifications and Downtimes are also applied via icingaweb2 of the master.

object Endpoint NodeName {
}

object Zone ZoneName {
	endpoints = [ NodeName ]
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}
object Endpoint "pls-goeteborg1"{
	host = "10.10.27.2"
}
object Zone "pls-goeteborg1"{
	endpoints = ["pls-goeteborg1"]
	parent = ZoneName
}

@dnsmichi
Contributor

So your problem is most likely different from what @anan80 described.

One thing I would also check via REST API - the notification objects and their current state. E.g. the last_notification timestamp, etc.
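
A rough sketch of that check, assuming the JSON response of /v1/objects/notifications has already been fetched and parsed into a dict. The function name `last_notifications` is hypothetical, and the sample response shown in the usage note is made up purely to show the expected shape; it is not data from this installation.

```python
from datetime import datetime, timezone


def last_notifications(response, host_name, service_name):
    """Map notification object name -> last_notification as an ISO 8601 UTC string.

    A last_notification of 0 (never sent) or a missing value is reported as None.
    """
    out = {}
    for obj in response.get("results", []):
        attrs = obj["attrs"]
        if attrs.get("host_name") != host_name:
            continue
        if attrs.get("service_name") != service_name:
            continue
        ts = attrs.get("last_notification") or 0
        out[obj["name"]] = (
            datetime.fromtimestamp(ts, tz=timezone.utc).isoformat() if ts else None
        )
    return out
```

For example, calling it with a hand-written response such as `{"results": [{"name": "h!s!mail", "attrs": {"host_name": "h", "service_name": "s", "last_notification": 0}}]}` returns `{"h!s!mail": None}`. A timestamp that falls after the downtime's `start_time` would confirm that the notification was sent while the downtime was active.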

@dnsmichi added this to the 2.9.0 milestone on May 4, 2018
@dnsmichi added the label bug (Something isn't working) and removed the label needs feedback (We'll only proceed once we hear from you again) on May 4, 2018
dnsmichi pushed a commit that referenced this issue May 4, 2018
This patch ensures that specific configuration types
are pre-activated and post-activated. In general,
logging is activated first, followed by common configuration
objects such as hosts/services, downtimes, etc.
All features are activated last, which ensures that
notifications are only sent once downtimes have been applied.
Starting checks too early is prevented in a similar way.
The ApiListener feature is activated first so that cluster
connections can be established early.

fixes #6057
fixes #6231
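
The ordering idea in the commit message can be illustrated with a small sketch. This is not the actual Icinga 2 implementation (which is written in C++); the priority numbers and the reading that ApiListener comes first among the features are assumptions made for illustration. The type names are those printed by `icinga2 daemon -C` earlier in this thread.

```python
# Hypothetical priorities: lower numbers are activated earlier.
ACTIVATION_PRIORITY = {
    "FileLogger": 0,              # logging first
    "Host": 10,                   # common configuration objects
    "Service": 10,
    "Downtime": 20,
    "ApiListener": 30,            # assumed: first among the features
    "CheckerComponent": 40,       # features last
    "NotificationComponent": 40,
}

# Assumption: unknown types are treated like common configuration objects.
DEFAULT_PRIORITY = 10


def activation_order(type_names):
    """Return type names sorted by activation priority (stable for ties)."""
    return sorted(type_names, key=lambda t: ACTIVATION_PRIORITY.get(t, DEFAULT_PRIORITY))


order = activation_order(
    ["NotificationComponent", "Downtime", "FileLogger", "ApiListener", "Service"]
)
print(order)
```

With such an ordering, `Downtime` objects are always active before `NotificationComponent` starts, so a notification cannot be sent during startup for an object whose downtime has not been loaded yet.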

3 participants