Refactor nflog configuration options to make it similar to Silences. #3220
The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA, and they both hold some sort of internal state that needs to be cleaned up on an interval. To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences. Signed-off-by: gotjosh <josue.abreu@gmail.com>
675ee0c to d17244d
13aaeeb to 5e9a323
type Log struct {
	clock clock.Clock
Most, if not all, of the diff in the tests has to do with the fact that we now mock the clock instead of injecting a `now` function. Technically, this could be done in a separate PR, but this PR seems simple enough to follow, so I decided to include it here.
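To illustrate the mocked-clock approach described above, here is a minimal sketch. It does not use the real `clock.Clock` type: the `Clock` interface, `fakeClock`, `store`, and `gc` below are hypothetical names written only for this example, and the garbage-collection logic is an assumption about what a time-dependent cleanup might look like.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Clock is a hypothetical, minimal stand-in for the clock.Clock
// interface mentioned in the PR; only Now is modelled here.
type Clock interface {
	Now() time.Time
}

// realClock delegates to the standard library and would be used in production.
type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }

// fakeClock is a test double whose time only moves when Add is called.
type fakeClock struct {
	mu  sync.Mutex
	now time.Time
}

func (c *fakeClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.now
}

func (c *fakeClock) Add(d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.now = c.now.Add(d)
}

// entry and store are illustrative names, not the nflog types.
type entry struct{ expiresAt time.Time }

type store struct {
	clock   Clock
	entries map[string]entry
}

// gc removes expired entries and reports how many were dropped.
func (s *store) gc() int {
	now := s.clock.Now()
	n := 0
	for k, e := range s.entries {
		if !e.expiresAt.After(now) {
			delete(s.entries, k)
			n++
		}
	}
	return n
}

// demo runs gc before and after a simulated two-hour jump and
// returns the number of entries removed by each run.
func demo() (before, after int) {
	fc := &fakeClock{now: time.Unix(0, 0)}
	s := &store{clock: fc, entries: map[string]entry{
		"a": {expiresAt: fc.Now().Add(time.Hour)},
		"b": {expiresAt: fc.Now().Add(3 * time.Hour)},
	}}
	before = s.gc()
	fc.Add(2 * time.Hour)
	after = s.gc()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before, after)
}
```

Because the test controls the clock, it can assert on expiry behaviour without sleeping, and the struct under test no longer needs a separate option just to override `now`.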
func WithNow(f func() time.Time) Option {
	return func(l *Log) error {
		l.now = f
		return nil
	}
}
All of this plumbing, and the `now func() time.Time` field on the manager struct, was only there to help with tests. It is all replaced by using `clock.Clock`.
// If not nil, the last argument is an override for what to do as part of the maintenance - for advanced usage.
func (l *Log) Maintenance(interval time.Duration, snapf string, stopc <-chan struct{}, override MaintenanceFunc) {
	if interval == 0 || stopc == nil {
		level.Error(l.logger).Log("msg", "interval or stop signal are missing - not running maintenance")
This is new. It annoyed me that the equivalent function in silences returns when maintenance is misconfigured but fails silently. I don't think we should do this. Ideally we would return an error, but I settled for a log line to keep the diff sane.
Works for me. I'd suggest that we check the validity of the maintenance interval in cmd/alertmanager/main.go in a later PR. As of today, a negative maintenance interval triggers a panic...
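As a rough sketch of the "return an error" alternative discussed in this thread: the helper names `validateMaintenance` and `runMaintenance` below are hypothetical and are not part of Alertmanager. The sketch relies on the documented behaviour that `time.NewTicker` panics for a non-positive duration, which matches the panic mentioned above, so the check rejects zero and negative intervals before a ticker is created.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// validateMaintenance is a hypothetical helper that rejects the
// misconfigurations discussed in the review.
func validateMaintenance(interval time.Duration, stopc <-chan struct{}) error {
	if interval <= 0 {
		return fmt.Errorf("maintenance interval must be positive, got %s", interval)
	}
	if stopc == nil {
		return errors.New("stop channel must not be nil")
	}
	return nil
}

// runMaintenance calls do on every tick until stopc is closed.
// It returns an error instead of failing silently when misconfigured.
func runMaintenance(interval time.Duration, stopc <-chan struct{}, do func()) error {
	if err := validateMaintenance(interval, stopc); err != nil {
		return err
	}
	t := time.NewTicker(interval) // safe: interval is known to be > 0
	defer t.Stop()
	for {
		select {
		case <-stopc:
			return nil
		case <-t.C:
			do()
		}
	}
}

func main() {
	stopc := make(chan struct{})
	fmt.Println(runMaintenance(-time.Second, stopc, func() {}))
	fmt.Println(runMaintenance(time.Second, nil, func() {}))
}
```

The same check could run once at startup on the flag value, which is roughly what the suggestion about cmd/alertmanager/main.go amounts to.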
@@ -377,6 +381,7 @@ func (s *Silences) Maintenance(interval time.Duration, snapf string, stopc <-cha
			return size, err
		}
		if size, err = s.Snapshot(f); err != nil {
			f.Close()
This seemed like a bug: we only close the file on the second return, but we also return here without closing the file descriptor. This case should be very rare, but it is possible nonetheless.
👍
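For readers following along, the leak fixed above has this general shape. The sketch below is a simplified illustration, not the Alertmanager code: `writeSnapshot` and `demo` are hypothetical names, and the real function opens its file differently.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// writeSnapshot is a hypothetical helper that writes a snapshot to path.
// The file is closed on every return path, including the early error return.
func writeSnapshot(path string, snapshot func(io.Writer) (int64, error)) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	size, err := snapshot(f)
	if err != nil {
		f.Close() // release the descriptor on the error path too
		return size, err
	}
	// Close can itself fail (for example on a full disk), so report it.
	return size, f.Close()
}

// demo writes one successful and one failing snapshot into a temp directory.
func demo() (size int64, okErr, failErr error) {
	dir, err := os.MkdirTemp("", "snap")
	if err != nil {
		return 0, err, nil
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.snap")

	size, okErr = writeSnapshot(path, func(w io.Writer) (int64, error) {
		n, err := io.WriteString(w, "hello")
		return int64(n), err
	})
	_, failErr = writeSnapshot(path, func(io.Writer) (int64, error) {
		return 0, errors.New("encode failed")
	})
	return size, okErr, failErr
}

func main() {
	fmt.Println(demo())
}
```

A plain `defer f.Close()` would also avoid the leak, at the cost of discarding the error from `Close` on the success path.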
nflog/nflog.go
if o.SnapshotFile != "" {
	if r, err := os.Open(o.SnapshotFile); err != nil {
		if !os.IsNotExist(err) {
Should we maybe log at info-level that a previous log to load up didn't exist, so it'll create one? Or, maybe the inverse path, something like "Loading a previous snapshot..."
At a minimum, we should log it at debug level, since there are sort of two independent paths here. If for some reason a file can't be accessed, things could end up in a weird state, and there is no evidence trail left behind that it happened.
A debug log sounds good to me. Alertmanager would hit this code path when it starts from scratch.
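One possible shape for the two code paths discussed here, sketched under assumptions: `loadSnapshot`, `noopLoad`, and `discardf` are hypothetical names, and the `debugf` callback stands in for whatever debug-level logger the component actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
)

// loadSnapshot is a hypothetical helper. It reports loaded=false with a nil
// error when the file does not exist (a fresh start), and leaves a debug
// trail for both paths.
func loadSnapshot(path string, load func(io.Reader) error, debugf func(string, ...any)) (bool, error) {
	r, err := os.Open(path)
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			debugf("snapshot file %q does not exist, starting with empty state", path)
			return false, nil
		}
		// Any other failure (permissions, I/O) is surfaced to the caller.
		return false, err
	}
	defer r.Close()

	debugf("loading snapshot from %q", path)
	if err := load(r); err != nil {
		return false, err
	}
	return true, nil
}

// noopLoad and discardf are trivial placeholders for the example.
func noopLoad(io.Reader) error { return nil }

func discardf(string, ...any) {}

func main() {
	loaded, err := loadSnapshot("definitely-missing.snap", noopLoad, log.Printf)
	fmt.Println(loaded, err)
}
```

Treating "file not found" as a normal start while still logging it keeps the first-run path quiet but traceable.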
	nflog.WithMaintenance(*maintenanceInterval, stopc, wg.Done, nil),
	nflog.WithMetrics(prometheus.DefaultRegisterer),
	nflog.WithLogger(log.With(logger, "component", "nflog")),
notificationLogOpts := nflog.Options{
This is a nice change 👍
SnapshotReader io.Reader
SnapshotFile   string
Let's add a doc-comment to each, saying only one of these fields should be set. That way, callers can see this in their editors directly - just makes it a little more convenient to use.
The top-level comment reads:
// A snapshot file or reader from which the initial state is loaded.
// None or only one of them must be set.
In my editor it shows as (screenshot not preserved), which indicates:
// None or only one of them must be set.
Are you thinking of something different? Happy to do it, but I'm unsure what this would mean for a different editor 🤔
lgtm
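For context on the "none or only one" rule, here is a small sketch of how an options struct can document and enforce it. Only the two field names and the doc comment come from the diff above; the `validate` method, its error message, and `exampleOptions` are hypothetical and written just for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// Options mirrors the two fields quoted in the review; the rest of the
// real struct is omitted.
type Options struct {
	// A snapshot file or reader from which the initial state is loaded.
	// None or only one of them must be set.
	SnapshotReader io.Reader
	SnapshotFile   string
}

// validate is a hypothetical check that enforces the doc comment.
func (o Options) validate() error {
	if o.SnapshotReader != nil && o.SnapshotFile != "" {
		return errors.New("only one of SnapshotReader and SnapshotFile must be set")
	}
	return nil
}

// exampleOptions returns the three interesting combinations.
func exampleOptions() (both, fileOnly, neither Options) {
	both = Options{SnapshotReader: strings.NewReader(""), SnapshotFile: "nflog.snap"}
	fileOnly = Options{SnapshotFile: "nflog.snap"}
	return both, fileOnly, Options{}
}

func main() {
	both, fileOnly, neither := exampleOptions()
	fmt.Println(both.validate())
	fmt.Println(fileOnly.validate())
	fmt.Println(neither.validate())
}
```

A plain struct makes the mutually exclusive fields visible side by side, which is harder to see with a list of functional options.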
Co-authored-by: Simon Pasquier <pasquier.simon@gmail.com> Signed-off-by: gotjosh <josue.abreu@gmail.com>
…rometheus#3220) * Refactor nflog configuration options to make it similar to Silences. The Notification Log is a similar component to Silences. They're the only two things that are shared between nodes when running in HA, and they both hold some sort of internal state that needs to be cleaned up on an interval. To simplify the code and make it a bit more understandable (among other benefits such as improved testability), I've refactored the notification log configuration and `run` to be similar to the silences.