Time of day based alert routing/notification #876
Comments
So, @brian-brazil (@fabxc ?), could you provide any design guidance on how to implement said feature, as I think I'd like to suppress all 'severity: warning' alerts overnight rather than putting loads of time-based rule duplication into my Prometheus alert rules. I really don't want to do that as warnings are still valid & worth warning about if I go looking for current Alerts active - I just don't want to be woken up for them. They also shouldn't seem to resolve every evening & start again in the morning. After being burnt wasting effort on #709 I want some suggestion up front of what might be accepted from the maintainers. |
Indeed, it would be nice to see whether it is within the scope of AM. It'd be great to have this feature. |
I would like to be able to have time of day, or day of week influence which receiver an alert is sent to. i.e. - during daytime/business hours, alerts might go to a slack channel, vs during night/weekends, same alerts might go to pagerduty, or on e-mail for the current on-call person. From the Alertmanager perspective, it could be nice to use existing label matching to control routing to different receivers based on datetime. i.e. -
Something similar for time of day? It's a little trickier, and in routing it would be nice to keep things simple, e.g. match_re on a label like 'time_window: business_hours'. But I'm not sure how to get that meaningful label in there from the Alertmanager perspective without some sort of relabeling within Alertmanager itself, and Prometheus passing along an alert date/time. I'm a Prometheus/Alertmanager newbie, so apologies in advance if I'm missing something obvious here. An approach that generates meaningful date/time labels on the alerts means those labels could also be used in inhibition, as per @tyrken's request to inhibit warnings for some or all alerts overnight. One of my main drivers for this is to not introduce time-based rule duplication in all my Prometheus alerts, as that feels cumbersome. |
The ability to subdue events based on the day and/or time would be really useful. I don't know what the best approach is here, but for Sensu they follow a re-usable pattern tied to the handler (receiver): subdue-attributes |
+1 : Just here to say it would be a lovely feature to be able to sleep well during the night and receive some alerts only during business hours/days. |
Adding https://golang.org/pkg/time/#Weekday for future reference. This could be implemented as a pipeline step that filters based on a defined day/time range. All times would be done in UTC. |
This is an important missing feature! Either enable alerts during some time ranges, or allow recurring silence rules. Is there any workaround to silence staging/test/QA alerts during the night, but still receive them during the day? |
@danielmotaleite I have a solution for this based on inhibition rules that doesn't require any change to AlertManager. I'll post the blog post address here once it is out. |
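An inhibition-based workaround of this kind might look roughly like the sketch below. It is not taken from the blog post mentioned above; it only assumes that a helper alert fires during quiet hours and that an Alertmanager inhibit rule uses it to suppress warnings. The alert name `QuietHours`, the group name, and the 22:00 to 06:00 UTC window are all hypothetical.

```yaml
# Prometheus rule file (sketch): a helper alert that fires only overnight.
groups:
  - name: quiet-hours              # hypothetical group name
    rules:
      - alert: QuietHours          # hypothetical helper alert
        # hour() returns the current hour in UTC (0-23),
        # so this is true from 22:00 to 05:59 UTC.
        expr: hour() >= 22 or hour() < 6
---
# Alertmanager config fragment (sketch): while QuietHours is firing,
# notifications for alerts labelled severity=warning are inhibited.
inhibit_rules:
  - source_match:
      alertname: QuietHours
    target_match:
      severity: warning
```

With this shape, the warning alerts stay active in Prometheus and remain visible in Alertmanager; only their notifications are suppressed. The helper alert itself would still need to be routed to a receiver that notifies nobody.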
Alertmanager definitely needs a way to set silence hours in the config file, with label targeting like in inhibit_rules. |
@simonpasquier - waiting for that blog link! |
@simonpasquier I am thirsty for this. 👍 |
Hello
Of course it's GMT-based, so it takes into account neither summer/winter time nor bank holidays. Hope it helps others. You should just have to replace |
https://gist.github.com/roidelapluie/8c67e9c8fb18b310a4a90cb92a23056b Our solution, with GMT and days off. Then you do:
That takes holidays in consideration. |
PS: about daily_saving_time_belgium: yes it works. |
I've written a blog post on how I solved my use case - link |
@Tom-Fawcett this is so great! |
@roidelapluie Have you encountered this? If there's no good workaround, I plan to try the approach @Tom-Fawcett wrote up. It seems like it would avoid that particular issue. |
Yes we have switched to inhibition now!! so much easier!! :) |
I was considering the development of a calendar exporter. It would produce simple on/off status based on calendar rules. It would be easier to handle specific cases (multiple time zone, reception rules, non-gregorian calendar) and any number of integrations could be considered. IMHO it would be an elegant solution but at the cost of database space for dummy metrics. |
The main problem is that with such a thing, if it is down, Prometheus will fire many alerts. Maybe we could write a binary/script that generates files suitable for alerting rules instead, because that would be more reliable. |
Good point. I guess the same code able to generate metrics would be able to generate such a recording rule (in simple cases). Or, it could send the corresponding inhibition requests. In my line of work (exchange market access for financial institutions), we have a lot of checks related to calendar, across multiple timezones. So it wouldn't be limited to alert inhibition, we also expect events to occur within a specific time frame. |
@roidelapluie @michael-doubez just like with any other exporter, you should have redundant instances running in different zones, so if one fails, you still get data from the other. The idea of an exporter outputting date- and time-based rules is actually not bad, but developing one with enough features may be tricky! :) |
I am thirsty for this too. 👍 |
Timezones are not supported in go on Windows. |
Alertmanager uses Golang 1.14, but 1.15 has an option to embed the timezone data - does that work on Windows? See https://golang.org/doc/go1.15#time/tzdata |
Not really, it was out of date when it was added and hasn't been updated since. Even if it were promptly updated, you could still easily be talking about a year for an update to propagate out given Go and AM release cycles, which is far too long, ignoring all the other problems with embedding data such as being forced to upgrade. |
Yeah proper timezone support will have to wait until Go parses the OS provided timezone files on Windows. There's an open issue for this in Go, so hopefully there's progress soon. We can always add the feature relatively easily as soon as support is added. |
But isn't shipping this without any TZ support still better than no support whatsoever? |
If we go out today it will be out of sync for Europe in about 6 months |
@roidelapluie is a change in timezones for the whole of Europe scheduled next April? |
It seems like they moved it to 2022, but yes, it should be the end of DST here. |
There are countries where you often get zero notice of a change, and more generally timezone changes happen more frequently than you'd think. Canada is in the middle of one, for example (the relevant law hadn't passed yet, but they were planning on it last I looked). |
Pull request here: #2393 |
Glad to see there's a PR open for this feature, although it seems it's only for muting alerts between specific time periods. Are there any updates with regard to allowing different alerting routes depending on the day of the week and/or time? Something like what was described in this comment would be great. |
Hi @hartfordfive, this was initially discussed in the design draft, but changing routing between time periods was determined to be too problematic because a lot of important behaviour is tied to routes. For example, when time intervals change, should a flurry of new alerts fire to the newly active route? Should a flurry of resolved alerts be sent to the old one? For this reason, routing remains static and muting is applied to routes in the above design. However, you should be able to achieve most of the same outcomes by muting routes and using the |
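One way the "static routes plus muting" idea could approximate time-based routing is sketched below. It assumes an Alertmanager release that includes #2393, and it pairs two sibling routes with complementary mute intervals and `continue: true` on the first. All receiver and interval names are hypothetical, and times are interpreted as UTC.

```yaml
route:
  receiver: default                  # hypothetical receiver names throughout
  routes:
    - receiver: slack-daytime        # notifies only during business hours
      matchers:
        - severity="warning"
      mute_time_intervals:
        - outside-business-hours
      continue: true                 # keep evaluating the next sibling route
    - receiver: pagerduty-night      # notifies only outside business hours
      matchers:
        - severity="warning"
      mute_time_intervals:
        - business-hours

mute_time_intervals:
  - name: business-hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '17:00'
  - name: outside-business-hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '00:00'
            end_time: '09:00'
          - start_time: '17:00'
            end_time: '24:00'
      - weekdays: ['saturday', 'sunday']

receivers:
  - name: default
  - name: slack-daytime
  - name: pagerduty-night
```

Both routes always match the same alerts, so the routing tree itself never changes; only which route is muted varies over time.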
Hi, I was wondering if anyone can help me. I followed this post and, if I understood the design document correctly, I set the below in alertmanager.yml
I started with this to test; however, every time I restart Alertmanager, I get the below error: msg="Loading configuration file failed" file=/etc/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n line 1: field mute_time_intervals not found in type config.plain Am I missing something? |
Hi @justin27c, |
hours 0-6 silence:
|
Implemented in #2393 |
Hi all.
But with that config Alertmanager won't start.
Can anyone help me? |
@bimmerkiev - can you paste the whole log file to check which is line 44? |
@justin27c sure. fixed. |
Can you change the below?
|
Already tried that - no luck. The same error |
If I try this:
I'm receiving:
If like this:
The result:
|
I have the below config and working fine:
|
Hi @bimmerkiev, you were almost there with this version:
The problem is that the receiver was indented, so Alertmanager was getting confused, because it should be at the same level as mute_time_intervals, group_wait, etc. This config should be ok:
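As an illustration of the indentation point (not the config from this comment), here is a minimal sketch in which `receiver`, `group_wait`, `matchers`, and `mute_time_intervals` all sit at the same level inside a child route. It assumes an Alertmanager release that includes #2393; the receiver and interval names are hypothetical.

```yaml
route:
  receiver: default                # hypothetical receiver names
  group_wait: 30s
  routes:
    - receiver: team-warnings      # same indentation as the keys below it
      group_wait: 30s
      matchers:
        - severity="warning"
      mute_time_intervals:         # names must match entries defined below
        - night

mute_time_intervals:               # top-level definitions of the intervals
  - name: night
    time_intervals:
      - times:
          - start_time: '00:00'    # interpreted as UTC
            end_time: '06:00'

receivers:
  - name: default
  - name: team-warnings
```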
|
I really appreciate your help. |
Thank you all for your ideas, this is the expression I came up with in order to trigger the alert only during working hours on week days: |
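A minimal sketch of an expression of this kind (not the commenter's original) is shown below, written as a Prometheus alerting rule. The metric `my_error_rate`, the threshold, and the 08:00 to 18:00 window are hypothetical; `hour()` and `day_of_week()` both work in UTC, and `day_of_week()` returns 0 for Sunday.

```yaml
groups:
  - name: business-hours-alerts            # hypothetical group name
    rules:
      - alert: HighErrorRateBusinessHours  # hypothetical alert name
        expr: |
          my_error_rate > 0.05
          and on() (hour() >= 8 < 18)
          and on() (day_of_week() >= 1 <= 5)
        for: 10m
        labels:
          severity: warning
```

The `and on()` clauses act as a gate: they return nothing outside the window, so the whole expression is empty and the alert cannot fire. The trade-off raised earlier in the thread still applies: an alert written this way appears to resolve every evening and fire again in the morning.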
We've had numerous requests for routing alerts based on the time of day/week. This issue is to track those.