leases: Add metrics to etcd leases #9778
Given #9764, I'm not sure whether these should live in the …
We will decide whether to keep …
Codecov Report
Coverage Diff:

| | master | #9778 |
|---|---|---|
| Coverage | ? | 69.54% |
| Files | ? | 376 |
| Lines | ? | 35215 |
| Branches | ? | 0 |
| Hits | ? | 24489 |
| Misses | ? | 8962 |
| Partials | ? | 1764 |
@gyuho I am not a fan of having many metrics. I hope normal users only need to pay attention to a small set of well-thought-out metrics. Other metrics are for debugging.
@xiang90 Agreed. I will go through all the metrics and select a few important ones (e.g., DB size in use).
lease/metrics.go
Namespace: "etcd_debugging",
Subsystem: "lease",
Name:      "renewed_total",
Help:      "The total number of renewed leases.",
Lease renewal is not a cluster-wide operation; at least for now, it only goes through the leader node. The help text needs some clarification on this.
I'll update this.
lgtm
lease/metrics.go
@@ -0,0 +1,59 @@
// Copyright 2015 The etcd Authors
s/2015/2018/?
I'll update this.
lease/metrics.go
Name:    "total_ttl",
Help:    "Bucketed histogram of lease TTLs.",
// 1 second -> 1 month
Buckets: prometheus.ExponentialBuckets(1, 2, 24),
pow(2,23) * 1 == 8388608 seconds == 97 days? Do we really need such a high upper bound?
Leases can be up to 9E9 seconds, so setting this fairly high seemed reasonable, but I'm happy to decrease it if you'd prefer!
Then, can you update the comment instead? 1 second -> 3 months?
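The arithmetic in this exchange can be checked with a small stdlib-only Go sketch. `exponentialBuckets` is a hypothetical re-implementation of the bucket layout that `prometheus.ExponentialBuckets(start, factor, count)` produces (upper bounds start, start*factor, ..., start*factor^(count-1)), and `bucketsToCover` is a hypothetical helper that counts how many doubling buckets would be needed to reach the 9E9-second maximum lease TTL mentioned above.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds start*factor^i for
// i = 0..count-1, mirroring the layout of prometheus.ExponentialBuckets.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// bucketsToCover returns how many buckets are needed before the last
// upper bound is at least limit.
func bucketsToCover(start, factor, limit float64) int {
	count := 1
	for b := start; b < limit; b *= factor {
		count++
	}
	return count
}

func main() {
	b := exponentialBuckets(1, 2, 24)
	last := b[len(b)-1]
	fmt.Printf("last bucket: %.0f s = %.1f days\n", last, last/86400)
	fmt.Println("buckets to cover 9e9 s:", bucketsToCover(1, 2, 9e9))
}
```

With 24 buckets the top bound is 8388608 s, about 97.1 days, which fits the "1 second -> 3 months" wording. Covering the full 9E9-second range with the same doubling scheme would take 35 buckets; TTLs above the top bound simply land in the histogram's implicit +Inf bucket.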
Updated from 6e06d19 to d9ab580
lease/metrics.go
prometheus.HistogramOpts{
	Namespace: "etcd_debugging",
	Subsystem: "lease",
	Name:      "total_ttl",
Also rename to `ttl_total` to be consistent with the other metrics.
Ok!
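The rename matters because the Prometheus client joins `Namespace`, `Subsystem`, and `Name` with underscores to form the exported metric name, so `ttl_total` is exposed as `etcd_debugging_lease_ttl_total`. The `fqName` function below is a hypothetical stdlib re-implementation of that joining rule (the client library provides it as `prometheus.BuildFQName`). `renewed_total` and `ttl_total` come from this review; `granted_total` and `revoked_total` are assumed names, chosen by analogy, for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty namespace and subsystem with name using "_",
// following the same rule as prometheus.BuildFQName. An empty name yields "".
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(append(parts, name), "_")
}

func main() {
	// granted_total and revoked_total are assumed names for illustration.
	for _, n := range []string{"granted_total", "revoked_total", "renewed_total", "ttl_total"} {
		fmt.Println(fqName("etcd_debugging", "lease", n))
	}
}
```

The shared `etcd_debugging` prefix also marks these as debugging metrics, in line with the earlier comment that most metrics beyond a small core set are for debugging.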
This patch adds four metrics to the `lease` package for easier debugging.
Updated from d9ab580 to 0369298
lgtm thanks @idiamond-stripe
This patch adds four metrics (leases granted, leases revoked, leases renewed, and the average TTL of leases) to the `lease` package for easier debugging and better visibility into outstanding leases.

There is currently no way to monitor the number of leases granted or revoked within etcd. The number of expired leases is already visible via `server_lease_expired_total`, but neither the number initially granted nor the number being renewed is. This makes it hard to determine how many leases are currently held and why leases are not being cleaned up. This patch makes behavior such as that described in #9395 easier to debug and understand, because it makes clear that leases are never being revoked, rather than being renewed forever.
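To make the diagnosis concrete, here is a stdlib-only sketch of where the four metrics would be updated. It rests on stated assumptions: `lessor`, `Grant`, `Revoke`, and `Renew` are hypothetical, simplified stand-ins rather than etcd's actual lease package; plain counters guarded by a mutex replace Prometheus metrics; and the TTL metric is modeled as a bucketed histogram with the same 24 doubling buckets as the `HistogramOpts` under review.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// lessor is a hypothetical, simplified lease manager instrumented with the
// four metrics from this PR; plain counters stand in for Prometheus metrics.
type lessor struct {
	mu        sync.Mutex
	leases    map[int64]int64 // lease ID -> TTL in seconds
	granted   uint64
	revoked   uint64
	renewed   uint64
	ttlBounds []float64 // histogram upper bounds: 1, 2, 4, ..., 2^23 seconds
	ttlCounts []uint64  // per-bucket counts; the extra last slot is +Inf
}

func newLessor() *lessor {
	bounds := make([]float64, 24)
	for i, b := 0, 1.0; i < len(bounds); i, b = i+1, b*2 {
		bounds[i] = b
	}
	return &lessor{
		leases:    make(map[int64]int64),
		ttlBounds: bounds,
		ttlCounts: make([]uint64, len(bounds)+1),
	}
}

// Grant stores a lease, bumps the granted counter, and records its TTL in
// the first bucket whose upper bound is >= ttl.
func (l *lessor) Grant(id, ttl int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.leases[id] = ttl
	l.granted++
	l.ttlCounts[sort.SearchFloat64s(l.ttlBounds, float64(ttl))]++
}

// Revoke removes a lease and bumps the revoked counter if it existed.
func (l *lessor) Revoke(id int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, ok := l.leases[id]; ok {
		delete(l.leases, id)
		l.revoked++
	}
}

// Renew bumps the renewed counter for a known lease.
func (l *lessor) Renew(id int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, ok := l.leases[id]; ok {
		l.renewed++
	}
}

func main() {
	l := newLessor()
	for id := int64(1); id <= 5; id++ {
		l.Grant(id, 60)
	}
	l.Revoke(1)
	// Leases 2-5 are neither revoked nor renewed: granted outpaces revoked
	// while renewed stays at zero, so leases are piling up.
	fmt.Printf("granted=%d revoked=%d renewed=%d\n", l.granted, l.revoked, l.renewed)
}
```

In this toy run the counters read granted=5, revoked=1, renewed=0. A steadily growing renewed count would instead point to clients keeping leases alive forever, which is the distinction the description draws.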