Collect a new metric: kubelet.evictions #5076
kubelet/metadata.csv
@@ -55,3 +55,4 @@ kubernetes.kubelet.volume.stats.inodes_free,gauge,,inode,,The number of free ino
kubernetes.kubelet.volume.stats.inodes_used,gauge,,inode,,The number of used inodes in the volume,-1,kubelet,k8s.vol.inodes_used
kubernetes.ephemeral_storage.limits,gauge,,byte,,Ephemeral storage limit of the container (requires kubernetes v1.8+),0,kubelet,k8s.eph_storage.limits
kubernetes.ephemeral_storage.requests,gauge,,byte,,Ephemeral storage request of the container (requires kubernetes v1.8+),0,kubelet,k8s.eph_storage.requests
kubernetes.kubelet.evictions,count,,,,The number of PODs that have been evicted from the kubelet (ALPHA in kubernetes v1.16),0,kubelet,k8s.evict
I'm wondering whether it would be better to use a gauge, like for kubernetes.containers.restarts.
From my perspective, it's hard to use as a gauge. In the end, I want to see eviction events, so I'm going to apply some diff over it.
A kubelet restart would create a negative diff, and that's a bit misleading.
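To make the concern concrete, here is a minimal sketch of what happens when a cumulative total is reported as a gauge and then diffed, compared with a reset-tolerant delta. The sample values and both helper functions are hypothetical, and treating a drop as a restart from zero is only one common convention, not necessarily what the Agent does.

```python
def naive_diff(samples):
    """Difference between consecutive raw samples, as a diff over a gauge shows."""
    return [b - a for a, b in zip(samples, samples[1:])]


def monotonic_deltas(samples):
    """Per-interval increments of a cumulative counter, tolerating resets.

    When a sample is lower than the previous one, the counter is assumed to
    have restarted from zero, so the new value itself is taken as the increment.
    """
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


# Hypothetical cumulative eviction totals scraped from one kubelet;
# the kubelet restarts between the 4th and 5th scrape.
samples = [0, 2, 2, 5, 0, 1]

print(naive_diff(samples))        # [2, 0, 3, -5, 1]
print(monotonic_deltas(samples))  # [2, 0, 3, 0, 1]
```

The naive diff shows -5 at the restart, which reads as "negative evictions"; the reset-tolerant version never goes below zero.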
+100, an eviction is an event and we should count them by incrementing a metric, which is the exact definition of a count/rate. It makes it easy to do a bar graph with the number of evictions bucketed by the graph interval, or to display a timeseries with the rate of evictions. Using the count metadata also has some nice side effects in the UI, like switching space aggregation to sum automatically rather than avg.
The way to do that with a gauge is awkward because you have to use diff and rollups in time to print the same things. Imho, the container restart metric should likely be a count as well.
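As a rough illustration of the bucketing and space-aggregation point, the sketch below sums count points into fixed intervals and then compares sum with avg across two nodes. The data, the node names, and the `bucket_counts` helper are all made up for the example; this is not how the backend is implemented.

```python
from collections import defaultdict


def bucket_counts(points, interval):
    """Sum count points into fixed-width time buckets keyed by bucket start."""
    buckets = defaultdict(int)
    for timestamp, value in points:
        buckets[timestamp - timestamp % interval] += value
    return dict(sorted(buckets.items()))


# Hypothetical per-node count points: (unix timestamp, evictions in that flush).
node_a = [(0, 1), (15, 0), (30, 2), (60, 0), (75, 1)]
node_b = [(0, 0), (15, 3), (30, 0), (60, 0), (75, 0)]

per_minute_a = bucket_counts(node_a, 60)  # {0: 3, 60: 1}
per_minute_b = bucket_counts(node_b, 60)  # {0: 3, 60: 0}

# Space aggregation across nodes: sum gives cluster-wide evictions per bucket,
# while avg reports evictions per node.
total = {t: per_minute_a[t] + per_minute_b[t] for t in per_minute_a}
average = {t: total[t] / 2 for t in total}

print(total)    # {0: 6, 60: 1}
print(average)  # {0: 3.0, 60: 0.5}
```

With a count, each bar is simply the sum of the points in its bucket, and summing across nodes answers "how many evictions happened in the cluster", which is usually the question being asked.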
Indeed, the initial code was generating a gauge.
I've now modified the code to make it produce a real counter.
Extract of the output of kubectl exec -ti -n datadog datadog-fh8ql -- agent check kubelet --check-rate:
{
  "metric": "kubernetes.kubelet.evictions",
  "points": [
    [
      1574935200,
      0
    ]
  ],
  "tags": [
    "eviction_signal:allocatableMemory.available"
  ],
  "host": "kind-worker3-lenaic-kind",
  "type": "count",
  "interval": 0,
  "source_type_name": "System"
},
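For readers wondering where the eviction_signal tag comes from, here is a hedged sketch of reading per-signal totals from Prometheus text output. It assumes the kubelet exposes a counter named kubelet_evictions with an eviction_signal label; both names are assumptions inferred from the tag above, the sample scrape is invented, and `parse_evictions` is a hypothetical helper rather than the check's actual code.

```python
import re

# Matches lines such as: kubelet_evictions{eviction_signal="memory.available"} 3
# The metric and label names are assumptions based on the tag shown above.
LINE_RE = re.compile(r'^kubelet_evictions\{eviction_signal="([^"]+)"\}\s+(\S+)$')


def parse_evictions(text):
    """Return {eviction_signal: cumulative_total} from Prometheus text output."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            totals[match.group(1)] = float(match.group(2))
    return totals


# Hypothetical scrape; real kubelet output contains many other metrics.
sample = """\
# HELP kubelet_evictions Cumulative number of pod evictions by eviction signal
# TYPE kubelet_evictions counter
kubelet_evictions{eviction_signal="allocatableMemory.available"} 0
kubelet_evictions{eviction_signal="memory.available"} 2
"""

totals = parse_evictions(sample)
tags = ["eviction_signal:" + signal for signal in sorted(totals)]
print(totals)
print(tags)
```

The parsed values are cumulative totals; turning them into the per-interval count shown in the check output still requires computing the increase between two scrapes.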
This metric provides the number of pods evicted from a node, broken down by eviction signal. It was introduced in Kubernetes 1.16 by kubernetes/kubernetes#81377.
Change direction Co-Authored-By: Haïssam Kaj <hkaj@users.noreply.github.com>
👍
What does this PR do?
Collect a new metric: kubelet.evictions
This metric provides the number of pods evicted from a node, broken down by eviction signal.
It was introduced in Kubernetes 1.16 by kubernetes/kubernetes#81377.
Motivation
Additional Notes
Review checklist (to be filled by reviewers)
changelog/ and integration/ labels attached