Collect a new metric: kubelet.evictions #5076

Merged 5 commits on Nov 29, 2019

L3n41c
Member

@L3n41c L3n41c commented Nov 25, 2019

What does this PR do?

Collect a new metric: kubelet.evictions

This metric reports the number of pods that have been evicted from a node, broken down by eviction signal.

This metric was introduced in Kubernetes 1.16 by kubernetes/kubernetes#81377.
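
To make "by eviction signal" concrete, here is a minimal sketch of reading per-signal eviction totals from Prometheus-style text. The sample payload, the upstream metric name `kubelet_evictions`, the `eviction_signal` label, and the helper `parse_evictions` are assumptions for illustration. They are not taken from this PR's code.

```python
import re

# Hypothetical sample of kubelet metrics output. The metric and label names
# are assumptions based on the upstream change, not copied from this PR.
SAMPLE = """\
# TYPE kubelet_evictions counter
kubelet_evictions{eviction_signal="memory.available"} 3
kubelet_evictions{eviction_signal="nodefs.available"} 1
"""

# Matches one sample line and captures the signal name and the value.
LINE_RE = re.compile(r'^kubelet_evictions\{eviction_signal="([^"]+)"\}\s+(\S+)$')


def parse_evictions(text):
    """Return {eviction_signal: cumulative count} from exposition text."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            signal, value = match.groups()
            totals[signal] = float(value)
    return totals


evictions = parse_evictions(SAMPLE)
```

Each signal would then map to one tag value such as `eviction_signal:memory.available`, which matches the tag format shown in the check output later in this conversation.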

Motivation

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@therve changed the title from "[kubelet] Collect a new metric: kubelet.evictions" to "Collect a new metric: kubelet.evictions" on Nov 27, 2019
therve previously approved these changes Nov 27, 2019
@@ -55,3 +55,4 @@ kubernetes.kubelet.volume.stats.inodes_free,gauge,,inode,,The number of free ino
kubernetes.kubelet.volume.stats.inodes_used,gauge,,inode,,The number of used inodes in the volume,-1,kubelet,k8s.vol.inodes_used
kubernetes.ephemeral_storage.limits,gauge,,byte,,Ephemeral storage limit of the container (requires kubernetes v1.8+),0,kubelet,k8s.eph_storage.limits
kubernetes.ephemeral_storage.requests,gauge,,byte,,Ephemeral storage request of the container (requires kubernetes v1.8+),0,kubelet,k8s.eph_storage.requests
kubernetes.kubelet.evictions,count,,,,The number of PODs that have been evicted from the kubelet (ALPHA in kubernetes v1.16),0,kubelet,k8s.evict
Contributor

I'm wondering if it would be better to use a gauge, like for kubernetes.containers.restarts.

Contributor

From my perspective, it's hard to use as a gauge. In the end, I want to see eviction events, so I'm going to apply a diff over it.
A kubelet restart would create a negative diff, and that's a bit misleading.

Member

@LeoCavaille LeoCavaille Nov 27, 2019

+100. An eviction is an event, and we should count events by incrementing a metric, which is the exact definition of a count/rate. It makes it easy to do a bar graph with the number of evictions bucketed by the graph interval, or to display a timeseries with the rate of evictions. Using the count metadata also has some nice side effects in the UI, like switching space aggregation to sum automatically rather than avg.
Doing that with a gauge is awkward because you have to use diff and rollups in time to display the same things. Imho, the container restart metric should likely be a count as well.
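
The gauge-versus-count argument in this thread can be sketched as follows. This is an illustration only: the sample values and the helpers `gauge_diffs` and `count_deltas` are hypothetical, and the reset handling is a simplified assumption about how a monotonic count behaves, not the code from this PR.

```python
def gauge_diffs(samples):
    """Naive diff over a gauge: every change is reported, including drops."""
    return [cur - prev for prev, cur in zip(samples, samples[1:])]


def count_deltas(samples):
    """Count-style deltas: a drop is treated as a counter reset and reported as 0."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        # A lower value means the source restarted from zero, so this
        # interval is skipped instead of producing a negative number.
        deltas.append(cur - prev if cur >= prev else 0)
    return deltas


# Hypothetical cumulative eviction totals scraped over time.
# The drop from 5 to 0 stands for a kubelet restart.
samples = [2, 3, 5, 0, 1]

naive = gauge_diffs(samples)     # [1, 2, -5, 1]
counted = count_deltas(samples)  # [1, 2, 0, 1]
```

With the gauge, the restart shows up as a misleading -5. With count semantics, each interval carries only the evictions seen in it, so the values can be summed per graph bucket.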

Member Author

@L3n41c L3n41c Nov 28, 2019

Indeed, the initial code was generating a gauge.
I've now modified the code to make it produce a real counter.

Extract of the output of kubectl exec -ti -n datadog datadog-fh8ql -- agent check kubelet --check-rate:

    {
      "metric": "kubernetes.kubelet.evictions",
      "points": [
        [
          1574935200,
          0
        ]
      ],
      "tags": [
        "eviction_signal:allocatableMemory.available"
      ],
      "host": "kind-worker3-lenaic-kind",
      "type": "count",
      "interval": 0,
      "source_type_name": "System"
    },

@L3n41c L3n41c force-pushed the lenaic/kubelet_evictions branch from d75ae62 to 8bdd0a4 Compare November 28, 2019 09:34
Change direction

Co-Authored-By: Haïssam Kaj <hkaj@users.noreply.github.com>
Contributor

@clamoriniere clamoriniere left a comment

👍

@L3n41c L3n41c merged commit b413b86 into master Nov 29, 2019
@L3n41c L3n41c deleted the lenaic/kubelet_evictions branch November 29, 2019 11:35
6 participants