improved error when victorops fails #1207
I would prefer to leave the error message as is, and add the alert names to the debug level logging above.
@stuartnelson3 done
notify/notify.go
@@ -600,7 +600,11 @@ func (r RetryStage) Exec(ctx context.Context, l log.Logger, alerts ...*types.Ale
case <-tick.C:
if retry, err := r.integration.Notify(ctx, alerts...); err != nil {
numFailedNotifications.WithLabelValues(r.integration.name).Inc()
level.Debug(l).Log("msg", "Notify attempt failed", "attempt", i, "integration", r.integration.name, "err", err)
var alertnames []string
one final nit: could you change this to
alertnames := make([]string, 0, len(alerts))
That makes sure we only allocate memory once.
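A minimal sketch of what the suggested preallocation does, kept outside Alertmanager so it runs on its own. The `Alert` type and `collectNames` helper are hypothetical stand-ins, not the code from this PR; the only part taken from the review is the `make([]string, 0, len(alerts))` call.

```go
package main

import "fmt"

// Alert is a hypothetical stand-in for Alertmanager's *types.Alert;
// only the alert name is modeled here.
type Alert struct {
	Name string
}

// collectNames gathers alert names into a slice whose capacity is
// reserved up front, so append never has to grow the backing array.
func collectNames(alerts []Alert) []string {
	names := make([]string, 0, len(alerts))
	for _, a := range alerts {
		names = append(names, a.Name)
	}
	return names
}

func main() {
	alerts := []Alert{{Name: "HighLatency"}, {Name: "DiskFull"}, {Name: "InstanceDown"}}
	names := collectNames(alerts)
	// len and cap are equal because the capacity was sized to the input.
	fmt.Println(names, len(names), cap(names))
}
```

With `var alertnames []string` the slice starts at zero capacity and may be reallocated several times as it grows; sizing it to `len(alerts)` avoids that.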
sure! :)
@stuartnelson3 done
yeah, that didn't seem to work:
joining the strings works as expected.
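The output that did not work is not preserved in this thread. As a rough illustration of the joining approach only, here is a sketch that turns a slice of names into a single string suitable for one key/value log field. The `joinNames` helper and the comma separator are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// joinNames renders alert names as one comma-separated string,
// which is convenient as a single key/value log field.
// The helper name and separator are illustrative choices.
func joinNames(names []string) string {
	return strings.Join(names, ",")
}

func main() {
	names := []string{"HighLatency", "DiskFull"}
	fmt.Println("alerts=" + joinNames(names))
}
```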
Actually, now that I'm thinking about this, would a failure really be the fault of a particular alert? I think it would be your configuration of a particular receiver, rather than the alert.
yeah, but I have dozens of different receivers on 2 different accounts. I tried to access the receiver from the alert but wasn't able to... can I do that? Anyway, using the alertnames I was able to find the issue, I think. Another question: alerts are "bucketed" to be sent to victorops, right? It seems like one of the accounts' api_key was revoked (still looking into that), and some alerts from the other api_key were not routed to victorops either. Is it possible that a failing alert in a bucket aborts the entire bucket (thus not even trying to send the other alerts)?
actually seems like one of the accounts was removed... so that explains the 404
The receiver (and its name) is available in the
I'm not sure what you mean by failing alert, but I'm assuming that if the connection info to VictorOps is correct, it will receive the alerts successfully.
(sorry)
the problem was the following:
so my question is: is it possible that one bad config (sales) affects alerts from another config (engineering)? as far as the stage thing goes, I'll try to do that, thanks! :)
The two receivers should be totally separate and with separate API keys. Since you were sending lots of failing alerts for sales (and retrying), potentially you were being rate-limited based on your alertmanager IP address, which caused the engineering alerts to get blocked. Unfortunately, I don't know. Typically there's rate limiting per API token, but they might also have a more general IP rate limiting in place. This would be a question for VictorOps.
notify/notify.go
@@ -238,7 +238,7 @@ func createStage(rc *config.Receiver, tmpl *template.Template, wait func() time.
var s MultiStage
s = append(s, NewWaitStage(wait))
s = append(s, NewDedupStage(notificationLog, recv))
s = append(s, NewRetryStage(i))
s = append(s, NewRetryStage(i, recv))
could you update this to only pass in the GroupName? No reason to pass around more data than is necessary.
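A sketch of the shape this suggestion points at, under simplifying assumptions: the `Receiver` and `RetryStage` types below are hypothetical, trimmed-down stand-ins (the integration is reduced to a plain string), and `describeFailure` is an invented helper that only shows where the group name would end up in a log message.

```go
package main

import "fmt"

// Receiver is a hypothetical, trimmed-down stand-in for the receiver
// metadata available in createStage; the real type may differ.
type Receiver struct {
	GroupName   string
	Integration string
}

// RetryStage keeps only the group name it needs for log messages.
type RetryStage struct {
	integration string
	groupName   string
}

// NewRetryStage takes the group name as a plain string rather than
// the whole Receiver, so the stage does not depend on unrelated fields.
func NewRetryStage(integration, groupName string) *RetryStage {
	return &RetryStage{integration: integration, groupName: groupName}
}

// describeFailure formats the context a failed notify attempt could log.
func (r *RetryStage) describeFailure(attempt int) string {
	return fmt.Sprintf("receiver=%s integration=%s attempt=%d", r.groupName, r.integration, attempt)
}

func main() {
	recv := Receiver{GroupName: "sales", Integration: "victorops"}
	stage := NewRetryStage(recv.Integration, recv.GroupName)
	fmt.Println(stage.describeFailure(1))
}
```

Passing a string keeps the constructor's dependency surface small: the stage cannot accidentally start relying on other receiver fields.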
done
Thanks! Sorry about all the back and forth :)
no problem, thanks!
Build and publish ARM32v7, ARM64v8 and ppc64le docker images. Signed-off-by: Johannes Würbach <johannes.wuerbach@googlemail.com>
I am getting errors like this:
It's hard to find out which alerts are erroring; I thought that improving the error by adding the alert names might help...