Add mesh metrics #1225
This change adds 2 new metrics for the mesh:
* alertmanager_peer_connection, state of the connection between the Alertmanager instance and a peer.
* alertmanager_peer_terminations_total, total number of terminated connections.
It also moves the gathering of the alertmanager_peer_position metric outside of the meshWait() function so that the metric is computed accurately even when no alerting group fires.
cmd/alertmanager/main.go
connDesc: prometheus.NewDesc(
	"alertmanager_peer_connection",
	"State of the connection between the Alertmanager instance and a peer.",
	[]string{"peer", "nick"},
Are both of these labels needed?
My reasoning is that the nickname makes it easier to identify the peer:
alertmanager_peer_connection{instance="prometheus-0",job="alertmanager",nick="prometheus-1",peer="02:42:ac:11:00:03"} 1
alertmanager_peer_connection{instance="prometheus-1",job="alertmanager",nick="prometheus-0",peer="02:42:ac:11:00:02"} 1
But we can't get rid of peer altogether, since nick's uniqueness isn't enforced by the mesh library.
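The uniqueness concern can be illustrated with a small sketch. A time series is identified by its label values, so two peers that share a nickname would collapse into one series if the nickname were the only label. The `peer` struct, `distinctSeries`, and the duplicate nickname `am` are hypothetical and only serve the illustration.

```go
package main

import "fmt"

// peer is a hypothetical, simplified view of a mesh peer.
type peer struct {
	name string // unique peer name, a MAC-like address in the examples above
	nick string // human-friendly nickname, not guaranteed to be unique
}

// distinctSeries counts how many distinct label values label() produces,
// which is the number of series a single-label metric would expose.
func distinctSeries(peers []peer, label func(peer) string) int {
	seen := make(map[string]struct{})
	for _, p := range peers {
		seen[label(p)] = struct{}{}
	}
	return len(seen)
}

func main() {
	peers := []peer{
		{name: "02:42:ac:11:00:02", nick: "am"},
		{name: "02:42:ac:11:00:03", nick: "am"}, // same nickname by accident
	}
	byNick := distinctSeries(peers, func(p peer) string { return p.nick })
	byName := distinctSeries(peers, func(p peer) string { return p.name })
	fmt.Println(byNick, byName) // prints "1 2"
}
```

With the nickname alone, the two peers are indistinguishable; keyed by the peer name, each keeps its own series.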
Both of these belong to the connected AM, so I imagine only one is necessary (comment race).
I would vote for only using peer. If users want to make a peer easier to identify, they can set a custom peer-id.
Fine with me. I'll update the PR accordingly.
Looks good, just looking for clarification on a question (see comments)
* Add smartmon.py python port of the smartmon.sh bash script Signed-off-by: Arthur Skowronek <ags@digineo.de>
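This change adds 2 new metrics for the mesh:
* alertmanager_peer_connection, state of the connection between the
  Alertmanager instance and a peer.
* alertmanager_peer_terminations_total, total number of terminated
  connection.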
It also moves the gathering of the alertmanager_peer_position metric
outside of the meshWait() function so that the metric is computed
accurately even when no alerting group fires.
Fixes #1219.
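The point about computing the position outside of meshWait() can be sketched as follows. The sketch assumes that the position is the index of the local peer in the sorted list of peer names; that definition, as well as the names `position`, `positionGauge`, and `listPeers`, are assumptions made for illustration and are not taken from the PR. The gauge re-evaluates the position every time it is read, so its value does not depend on a notification having been sent first.

```go
package main

import (
	"fmt"
	"sort"
)

// position returns the index of self in the sorted list of peer names.
// If self is not in the list, it returns len(peers).
func position(self string, peers []string) int {
	sorted := append([]string(nil), peers...) // copy, so the caller's slice is untouched
	sort.Strings(sorted)
	for i, name := range sorted {
		if name == self {
			return i
		}
	}
	return len(sorted)
}

// positionGauge returns a function that recomputes the position on every
// call, mimicking a gauge that is evaluated at scrape time.
func positionGauge(self string, listPeers func() []string) func() float64 {
	return func() float64 {
		return float64(position(self, listPeers()))
	}
}

func main() {
	peers := []string{"02:42:ac:11:00:03", "02:42:ac:11:00:02"}
	gauge := positionGauge("02:42:ac:11:00:03", func() []string { return peers })

	fmt.Println(gauge()) // local peer sorts second among two peers

	// A new peer joins; the next read reflects it without any alert firing.
	peers = append(peers, "02:42:ac:11:00:01")
	fmt.Println(gauge())
}
```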