Add mesh metrics #1225

Merged (2 commits) on Feb 6, 2018

simonpasquier
Member

This change adds 2 new metrics for the mesh:

  • alertmanager_peer_connection, state of the connection between the
    Alertmanager instance and a peer.
  • alertmanager_peer_terminations_total, total number of terminated
    connections.

It also moves the gathering of the alertmanager_peer_position metric
outside of the meshWait() function so that the metric is computed
accurately even when no alerting group fires.

Fixes #1219.
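
To illustrate the last point of the description, here is a minimal sketch of reading the peer position on demand instead of as a side effect of a wait function. It assumes the position is the index of the instance's own name among the lexically sorted peer names. `peerPosition` and `positionGauge` are hypothetical names invented for this example; the sketch uses only the standard library and is not the code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// peerPosition returns the index of self among the sorted peer names.
// Assumption: position is defined by the lexical order of peer names.
// If self is not in the list, the number of peers is returned.
func peerPosition(self string, peers []string) int {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), peers...)
	sort.Strings(sorted)
	for i, p := range sorted {
		if p == self {
			return i
		}
	}
	return len(sorted)
}

// positionGauge (hypothetical) computes its value whenever it is read,
// for example at scrape time, so it does not depend on any alert group
// having called a wait function beforehand.
type positionGauge struct {
	self  string
	peers func() []string // returns the current peer names
}

// Value recomputes the position from the current peer list.
func (g positionGauge) Value() float64 {
	return float64(peerPosition(g.self, g.peers()))
}

func main() {
	current := []string{"02:42:ac:11:00:03", "02:42:ac:11:00:02"}
	g := positionGauge{
		self:  "02:42:ac:11:00:03",
		peers: func() []string { return current },
	}
	fmt.Println("alertmanager_peer_position", g.Value())
}
```

Because the value is derived from the current peer list each time it is read, it stays accurate even if no alert group ever fires.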

connDesc: prometheus.NewDesc(
"alertmanager_peer_connection",
"State of the connection between the Alertmanager instance and a peer.",
[]string{"peer", "nick"},
Contributor

Are both of these labels needed?

Member Author

My reasoning is that the nickname makes it easier to identify the peer:

alertmanager_peer_connection{instance="prometheus-0",job="alertmanager",nick="prometheus-1",peer="02:42:ac:11:00:03"}  1
alertmanager_peer_connection{instance="prometheus-1",job="alertmanager",nick="prometheus-0",peer="02:42:ac:11:00:02"}  1

But we can't get rid of peer altogether since nick's uniqueness isn't enforced by the mesh library.
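
The uniqueness concern can be sketched as follows: if two peers share a nickname, a series keyed on nick alone collapses into a single label set, while keying on peer keeps them apart. The `peer` struct, `seriesKeys`, `byNick`, `byPeer`, and the sample values are hypothetical and only model label sets as strings; this is not the mesh library's API or the Prometheus client.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical stand-in for a mesh peer: a unique ID plus a
// nickname whose uniqueness is not guaranteed.
type peer struct {
	id   string
	nick string
}

// byNick and byPeer render the label part of a series for one peer.
func byNick(p peer) string { return fmt.Sprintf("nick=%q", p.nick) }
func byPeer(p peer) string { return fmt.Sprintf("peer=%q", p.id) }

// seriesKeys returns the distinct series identifiers produced when each
// peer is labelled with the given function. Duplicates collapse.
func seriesKeys(peers []peer, label func(peer) string) []string {
	seen := map[string]bool{}
	for _, p := range peers {
		seen["alertmanager_peer_connection{"+label(p)+"}"] = true
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Two distinct peers that happen to share a nickname (made-up data).
	peers := []peer{
		{id: "02:42:ac:11:00:02", nick: "prometheus"},
		{id: "02:42:ac:11:00:03", nick: "prometheus"},
	}
	fmt.Println("distinct series by nick:", len(seriesKeys(peers, byNick)))
	fmt.Println("distinct series by peer:", len(seriesKeys(peers, byPeer)))
}
```

With the made-up data above, the nick label cannot tell the two peers apart, whereas the peer label yields one series per peer.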

Contributor

@stuartnelson3 stuartnelson3 Feb 6, 2018

Both of these labels belong to the connected AM, so I imagine only one is necessary (comment race).

Contributor

I would vote for only using peer. If users want to make a peer easier to identify, they can set a custom peer-id.

Member Author

Fine with me. I'll update the PR accordingly.

Contributor

@stuartnelson3 stuartnelson3 left a comment

Looks good, just looking for clarification on a question (see comments).

@stuartnelson3 stuartnelson3 merged commit 17bd637 into prometheus:master Feb 6, 2018
@simonpasquier simonpasquier deleted the add-peer-metrics branch February 6, 2018 13:47
hh pushed a commit to ii/alertmanager that referenced this pull request Apr 2, 2019
* Add smartmon.py python port of the smartmon.sh bash script

Signed-off-by: Arthur Skowronek <ags@digineo.de>
3 participants