-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return no error when deleting expired silence #2817
Return no error when deleting expired silence #2817
Conversation
…ilence Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
91d6855
to
d389091
Compare
Note: a PR (#2815) was posted shortly before this one addressing a related but distinct issue: return a 404 instead of a 500 for non-existent silences. The 2 PRs address different issues arising from delete silence returning 500 on any error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense, I would suggest to add a comment in the code to make it clear that we want this method to be indempotent.
Note that this is not guaranteed to work indefinitvely because expired silences get garbage collected at some point.
cc @simonpasquier @gotjosh for a second look.
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
@simonpasquier @gotjosh, sorry to interrupt, but would you be able to take some time to take a look at this PR soon? |
silence/silence.go
Outdated
@@ -622,7 +622,8 @@ func (s *Silences) expire(id string) error { | |||
|
|||
switch getState(sil, now) { | |||
case types.SilenceStateExpired: | |||
return errors.Errorf("silence %s already expired", id) | |||
// returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd | |
// Returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd. |
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
silence/silence.go
Outdated
@@ -622,7 +622,8 @@ func (s *Silences) expire(id string) error { | |||
|
|||
switch getState(sil, now) { | |||
case types.SilenceStateExpired: | |||
return errors.Errorf("silence %s already expired", id) | |||
// Returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd | |
// Returning nil ensures idempotent behaviour, at least in the short term before the silence is gc'd. |
Sorry for the nits, our styleguide for comments is capitalized sentences with full stop. Once correct I'd merge this pull request. |
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
No problem. Apologies for missing the full stop in your previous recommended change. |
@@ -622,7 +622,8 @@ func (s *Silences) expire(id string) error { | |||
|
|||
switch getState(sil, now) { | |||
case types.SilenceStateExpired: | |||
return errors.Errorf("silence %s already expired", id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for coming in late - I feel that this kind of comment about the semantics of the function is best put as part of the function description.
How do you feel about changing the top-level comment from:
// Expire the silence with the given ID immediately.
to
// Expire the silence with the given ID immediately. It is idempotent, nil is returned if the silence already expired before it is GC'd.
// If the silence is not found an error is returned.
and removing the comment on the body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll make the change.
I'm a bit surprised that the CI passes here - if we don't have a test at an API level, we should create them. I'd love to see two tests:
This affects both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed with all the comments from @roidelapluie and @gotjosh
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
Signed-off-by: Soon-Ping Phang <soonping@amazon.com>
I've updated the function comment, and added tests for DeleteSilence and PostSilences. Found during testing that PostSilences actually probably won't change in behaviour, as expire() is only called if the existing silence is unexpired. i.e. The test passed even without my change to the behaviour of expire(). If you want me to remove that test, let me know. alertmanager/silence/silence.go Lines 559 to 564 in c91717b
|
@simonpasquier @gotjosh Please let me know if you think I need to add to or change the tests I've added, and if anything else is needed. Thank you |
@simonpasquier @gotjosh Please let me know if you think I need to add to or change the tests I've added, and if anything else is needed. Thank you |
Sorry - I've been on PTO for the past few weeks. Will take a look at my earliest convenience. |
LGTM |
thanks! |
Delete silence currently returns HTTP 500 if called on an expired silence. This makes it non-idempotent, as the following call sequence can result:
After this sequence, the user is left uncertain whether there's a problem in AlertManager/Prometheus. Even fetching the silence to verify its state won't confirm whether the 500 was indicative of a separate underlying problem that could manifest in other ways later.
Prometheus exposes the delete_series API which does exhibit idempotent behaviour -- calling delete_series on an already deleted series returns the same status as the initial delete_series call: HTTP 200.
This PR changes Silences.expire(id) to return nil for expired silences, which impacts APIs as follows:
alertmanager/silence/silence.go
Lines 561 to 562 in c91717b