types: fix alert merging #1611

Merged
merged 1 commit into from
Nov 9, 2018

Conversation

simonpasquier
Member

@simonpasquier simonpasquier commented Nov 7, 2018

Alert merging assumed that EndsAt would always be empty for firing alerts. This is no longer true starting with Prometheus v2.4.0: EndsAt is set to a multiple of the evaluation interval or the resend interval (whichever is larger). See prometheus/prometheus#4550

This might be the issue reported in #1581, and it has been raised on the prometheus-users mailing list too.

I've added more tests because even though the code change is small, it is quite tricky to get it right (and I'm not sure I did!).

cc @brian-brazil @stuartnelson3 @mxinden

@mxinden
Member

mxinden commented Nov 8, 2018

Just a high-level question. I need to give this further thought.

Alert merging assumed that EndsAt would always be empty for firing alerts.

How can the EndsAt ever be empty with:

alertmanager/api/v1/api.go

Lines 448 to 451 in 625604d

    if alert.EndsAt.IsZero() {
        alert.Timeout = true
        alert.EndsAt = now.Add(resolveTimeout)
    }

Wouldn't it rather be:

Alert merging assumed that Alert.Timeout would always be true for firing alerts.

@mxinden
Member

mxinden commented Nov 8, 2018

I don't see #1553 (v0.16.0) happening in the short term. Do you think we should cut a patch release for the previous v0.15.x series?

@simonpasquier
Member Author

Alert merging assumed that EndsAt would always be empty for firing alerts.

I meant to say: alert merging assumed that firing alerts received from Prometheus would always have empty EndsAt and thus Timeout set to true.

Although the problem is more acute for setups with short group_interval values (which we don't really encourage), I think it would deserve a 0.15.x release.

    Res: &Alert{
        Alert: model.Alert{
            StartsAt: now.Add(-2 * time.Minute),
            EndsAt: now,
Member

I am confused by this. If I have two alerts, where one is firing and one is not, I expect the merged alert to still be firing, right? Especially as neither of them has the Timeout flag set.

Contributor

Because B is received after A, its EndsAt value "wins" because A isn't resolved (is what I'm reading from the code and comments).

Member Author

Because B is received after A

Exactly, an alert that is effectively resolved wins over an older alert that was firing previously.

Member

Right, got it. Thanks for clarifying @stuartnelson3 & @simonpasquier!

@mxinden
Member

mxinden commented Nov 9, 2018

Should we merge this into the release-0.15 branch first and then merge release-0.15 into master? That way we don't rewrite history via a cherry-pick. What do you think, @stuartnelson3 @simonpasquier?

@stuartnelson3
Contributor

Should we merge this into the release-0.15 branch first and then merge release-0.15 into master? That way we don't rewrite history via a cherry-pick. What do you think, @stuartnelson3 @simonpasquier?

Sounds fine to me

@mxinden mxinden changed the base branch from master to release-0.15 November 9, 2018 13:55
@mxinden mxinden changed the base branch from release-0.15 to master November 9, 2018 13:56
@mxinden
Member

mxinden commented Nov 9, 2018

@simonpasquier would you mind rebasing your one commit on top of the release-0.15 branch?

Alert merging assumed that EndsAt would always be empty for firing
alerts. This is no longer true starting with Prometheus v2.4.0: EndsAt
is set to a multiple of the evaluation interval or resend interval
(whichever is the largest). This change updates the merging logic to
support both cases.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
@simonpasquier simonpasquier changed the base branch from master to release-0.15 November 9, 2018 14:10
@simonpasquier
Member Author

PR rebased on top of release-0.15. I had an additional acceptance test in test/with_api_v2/acceptance/send_test.go that wasn't relevant anymore on the release branch since the V2 API doesn't exist. I'll add it back after this PR is merged back into master.

@mxinden mxinden merged commit a5db882 into prometheus:release-0.15 Nov 9, 2018
@mxinden
Member

mxinden commented Nov 9, 2018

Will follow up with a CHANGELOG pull request in a bit.

@simonpasquier simonpasquier deleted the fix-merge-alerts branch November 9, 2018 14:58
@mxinden
Member

mxinden commented Nov 15, 2018

I had an additional acceptance test in test/with_api_v2/acceptance/send_test.go that wasn't relevant anymore on the release branch since the V2 API doesn't exist. I'll add it back after this PR is merged back into master.

@simonpasquier would you mind following up with a PR for this test?
