Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent alerts from inhibiting themselves #1017

Merged
merged 5 commits into from
Nov 7, 2017

Conversation

alin-amana
Copy link
Contributor

This resolves #535.

Simply checks for the fingerprint of the potentially inhibiting alert not being equal to the inhibited alert.

@brian-brazil
Copy link
Contributor

We've already discussed this elsewhere and decided that this is not worth making an exception for. The desired effect can be achieved by an appropriate label layout.

@alin-amana
Copy link
Contributor Author

Not trying to be obnoxious, but I don't understand why #666 is still open in this case. I.e. if this is the intended behavior and it is documented as such.

@brian-brazil
Copy link
Contributor

It's still open pending documentation to clarify that this is the behaviour.

@alin-amana
Copy link
Contributor Author

Well, AFAICT https://prometheus.io/docs/alerting/configuration/#<inhibit_rule> already has this note.

@alin-amana
Copy link
Contributor Author

Since there seem to be no takers, I'm closing this. :o)

@alin-amana alin-amana closed this Oct 4, 2017
@alin-amana alin-amana deleted the dont_inhibit_self branch October 4, 2017 08:11
@beorn7
Copy link
Member

beorn7 commented Oct 4, 2017

We've already discussed this elsewhere and decided that this is not worth making an exception for. The desired effect can be achieved by an appropriate label layout.

IIRC the outcome was that the desired effect cannot be achieved by an appropriate label layout.

The stated reason to not prevent alerts from inhibiting itself was that the implementation would be too tricky. From the user's perspective, it does never make sense for an alert to inhibit itself.

I would at least like to have one of the Alertmanager people check if this attempt pulls it off. @stuartnelson3 @brancz @fabxc

@brian-brazil
Copy link
Contributor

brian-brazil commented Oct 4, 2017

IIRC the outcome was that the desired effect cannot be achieved by an appropriate label layout.

I don't see that outcome, merely that one specific way of doing it doesn't work. Have you tried something along the lines of severity=page and severity=page-nodedown?

@beorn7
Copy link
Member

beorn7 commented Oct 4, 2017

Our intention is to not bloat our severity levels.

@brian-brazil
Copy link
Contributor

The proposal to change the inhibition bloats the alertmanger config with knowledge of alertnames, and I expect will be more fragile overall to keep in sync. The expected way to use inhibition is via severity labels and equivalents, I'm generally against changing things merely because a user doesn't like the recommended way of doing things.

@alin-amana
Copy link
Contributor Author

For context, I realized after I submitted this PR that it is not a complete fix.

E.g. if you have 2 ALERT metrics that match source_match/source_match_re they will not inhibit themselves, but they will inhibit each other. It would be slightly more involved (but not impossible) to prevent this from occurring.

@beorn7
Copy link
Member

beorn7 commented Oct 4, 2017

E.g. if you have 2 ALERT metrics that match source_match/source_match_re they will not inhibit themselves, but they will inhibit each other.

Isn't that the intended behavior? Could you give a full example (with the full inhibition rule) so that I can understand what's the problem with that?

@brian-brazil
Copy link
Contributor

Off the top of my head, say you had NodeDownReasonFoo and NodeDownReasonBar which you wanted either to inhibit all other alerts from a machine. Even if self-inhibition was stopped, if both were firing for a node then both would get inhibited and you'd get no notifications.

My gut feeling is that this isn't solvable in general, though I haven't thought through it fully.

@beorn7
Copy link
Member

beorn7 commented Oct 4, 2017

How about this take: Any alert that is matched by the source matchers of a given inhibition rules is automatically prevented from matching the target side of the same inhibition rule.

@brian-brazil
Copy link
Contributor

At a first pass that'd seem to do the right thing, I'm still not convinced the setups it enables are a good idea though.

@fabxc
Copy link
Contributor

fabxc commented Oct 4, 2017

An alert inhibiting so far has been unexpected and unintuitive behavior 100% of the time so far. I see little reason to keep it around just for the sake of it. If there's an intuitive and correct solution, I'd be in favor of it.

@alin-amana
Copy link
Contributor Author

alin-amana commented Nov 2, 2017

I've taken @beorn7's suggestion and came up with a one-line change that prevents alerts that match both the source and target filters from inhibiting themselves.

You can still set up multiple inhibition rules (such as A inhibits B, B inhibits A) that will essentially end up inhibiting all alerts, but you can't do it with a single '<inhibit_rule>' anymore. Progress. :o)

@alin-amana alin-amana reopened this Nov 2, 2017
@stuartnelson3
Copy link
Contributor

Seems like a sound solution to me, but I've not been involved in the discussion so far, so I leave it to the others to decide. The circular inhibition issue already exists, so 🤷‍♀️ from me.

@beorn7
Copy link
Member

beorn7 commented Nov 2, 2017

I'll have a look ASAP…

@alin-amana
Copy link
Contributor Author

ASAP

I do not think it means what you think it means. Sorry, I just had to take that. Low hanging fruit and all... :o)

Any chance of looking at it though? I've added a comment to get CircleCI to retry so it's now a 2 line change, but only one is code.

@beorn7
Copy link
Member

beorn7 commented Nov 6, 2017

ASAP

I do not think it means what you think it means. Sorry, I just had to take that. Low hanging fruit and all... :o)

I know exactly what it means, sadly…

@beorn7
Copy link
Member

beorn7 commented Nov 6, 2017

I think this semantics makes sense. 👍 from my side.

@beorn7
Copy link
Member

beorn7 commented Nov 7, 2017

@stuartnelson3 I'm not sure if your last comment here including that emoji (purple angel?) means approval or not. If you agree with this change, please simply merge.

@stuartnelson3
Copy link
Contributor

Sorry for the confusion, it was supposed to be the shrugging emoji. I'll go ahead and merge, thanks for reviewing this.

@stuartnelson3 stuartnelson3 merged commit dc3c78e into prometheus:master Nov 7, 2017
@alin-amana
Copy link
Contributor Author

Whee! Thanks!

hh pushed a commit to ii/alertmanager that referenced this pull request Sep 5, 2018
Signed-off-by: Dan Fredell <Dan.Fredell@gmail.com>
Copy link

@kelein kelein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to inhibit all other alerts if target is down?
6 participants