Fix kubernetes.memory.limits on kind clusters #11914

L3n41c · 2022-05-02T14:04:26Z

What does this PR do?

When the cadvisor endpoint of the kubelet exposes the container_spec_memory_limit_bytes metric twice for the same container, do not sum the two values.

Motivation

On kind clusters, the kubernetes.memory.limits metric reported by the agent is currently twice the real pod memory limit.
The kubernetes.cpu.limits, kubernetes.cpu.requests and kubernetes.memory.requests metrics are all correct.

The root cause is that the cadvisor endpoint of the kind kubelet reports the memory limit twice for a given container.

For example, with the following manifest (from DataDog/datadog-agent#10508):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-prepared
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-prepared
  template:
    metadata:
      labels:
        app: nginx-prepared
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx-prepared
        resources:
          limits:
            cpu: 200m
            memory: 20Mi
          requests:
            cpu: 100m
            memory: 10Mi

On kind, the kubelet cadvisor endpoint returns:

root@datadog-agent-linux-k9c4p:/# curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -k -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$DD_KUBERNETES_KUBELET_HOST:10250/metrics/cadvisor | grep memory_limit | grep nginx | grep -v pause
container_spec_memory_limit_bytes{container="",id="/kubelet/kubepods/burstable/podd8cfdf76-9f01-4cab-8e8f-d1f6beda67f0",image="",name="",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/docker/6e9d93d4629c147c9f17e64c531efc2ebe78e94954ba11407bf38552670a2ac5/kubelet/kubepods/burstable/podd8cfdf76-9f01-4cab-8e8f-d1f6beda67f0/60ccc8b7bcfdcb5bda15b455aac0505b18a363b3ad4e918dcd113ecaae9ebafd",image="docker.io/library/nginx:1.7.9",name="60ccc8b7bcfdcb5bda15b455aac0505b18a363b3ad4e918dcd113ecaae9ebafd",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/kubelet/kubepods/burstable/podd8cfdf76-9f01-4cab-8e8f-d1f6beda67f0/60ccc8b7bcfdcb5bda15b455aac0505b18a363b3ad4e918dcd113ecaae9ebafd",image="docker.io/library/nginx:1.7.9",name="60ccc8b7bcfdcb5bda15b455aac0505b18a363b3ad4e918dcd113ecaae9ebafd",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07

Whereas, for the same pod definition, the cadvisor endpoint of a GKE kubelet returns:

 lenaic.huard:~$ kubectl exec pod/datadog-agent-linux-qh7gp agent -c agent -- bash -c 'curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -k -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$DD_KUBERNETES_KUBELET_HOST:10250/metrics/cadvisor' | grep memory_limit | grep nginx | grep -v pause
container_spec_memory_limit_bytes{container="",id="/kubepods/burstable/pod5a8ed807-7a47-4bf2-9cbe-d624ba05fd88",image="",name="",namespace="nginx-preloader-sample",pod="nginx-prepared-6697ccc84-8wc6x"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/kubepods/burstable/pod5a8ed807-7a47-4bf2-9cbe-d624ba05fd88/ad131b04a5d33d4396bb455f2b66d0ff5a13c0b855770b2c396d28c4ab959a34",image="nginx@sha256:e3456c851a152494c3e4ff5fcc26f240206abac0c9d794affb40e0714846c451",name="k8s_nginx-prepared_nginx-prepared-6697ccc84-8wc6x_nginx-preloader-sample_5a8ed807-7a47-4bf2-9cbe-d624ba05fd88_0",namespace="nginx-preloader-sample",pod="nginx-prepared-6697ccc84-8wc6x"} 2.097152e+07
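One way to compare the two outputs is to count how many container-level series each endpoint exposes per (namespace, pod, container). The sketch below is only an illustration, not code from the agent: the function name, the regex-based label parsing (which ignores escaped quotes) and the shortened `id` values are all assumptions made for readability.

```python
import re
from collections import Counter

# Matches one label pair such as pod="nginx-prepared-6f85667d88-msx5q".
# Simplification: escaped quotes inside label values are not handled.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

METRIC_PREFIX = "container_spec_memory_limit_bytes{"


def count_series_per_container(exposition_text):
    """Count memory-limit series per (namespace, pod, container)."""
    counts = Counter()
    for line in exposition_text.splitlines():
        if not line.startswith(METRIC_PREFIX):
            continue
        labels = dict(LABEL_RE.findall(line))
        # Pod-level cgroups carry an empty container label; skip them.
        if not labels.get("container"):
            continue
        key = (labels["namespace"], labels["pod"], labels["container"])
        counts[key] += 1
    return counts


# Illustrative inputs: same shape as the outputs above, with hypothetical short ids.
KIND_OUTPUT = '''
container_spec_memory_limit_bytes{container="",id="/kubelet/kubepods/burstable/pod-uid",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/docker/node-id/kubelet/kubepods/burstable/pod-uid/ctr-id",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/kubelet/kubepods/burstable/pod-uid/ctr-id",namespace="nginx-preloader-sample",pod="nginx-prepared-6f85667d88-msx5q"} 2.097152e+07
'''

GKE_OUTPUT = '''
container_spec_memory_limit_bytes{container="",id="/kubepods/burstable/pod-uid",namespace="nginx-preloader-sample",pod="nginx-prepared-6697ccc84-8wc6x"} 2.097152e+07
container_spec_memory_limit_bytes{container="nginx-prepared",id="/kubepods/burstable/pod-uid/ctr-id",namespace="nginx-preloader-sample",pod="nginx-prepared-6697ccc84-8wc6x"} 2.097152e+07
'''

if __name__ == "__main__":
    print(count_series_per_container(KIND_OUTPUT))
    print(count_series_per_container(GKE_OUTPUT))
```

Any key with a count above 1 marks a container whose limit is exposed more than once by the endpoint.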

As shown above, on kind, the same value is reported twice for the same (namespace, pod, container) triplet.
Those two identical values are currently summed at

integrations-core/kubelet/datadog_checks/kubelet/prometheus.py, line 393 in 5858c86:

    samples = self._sum_values_by_context(metric, self._get_entity_id_if_container_metric)

This sum results in the agent sending twice the expected value.
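The doubling can be reproduced with a small standalone sketch. It does not use the check's real classes: `sum_by_context`, `max_by_context` and the sample tuples are hypothetical stand-ins, and taking the maximum per context is just one possible way to avoid the double count, not necessarily what this PR implements.

```python
from collections import defaultdict

# 20Mi in bytes, i.e. the 2.097152e+07 value shown in the cadvisor output.
LIMIT_BYTES = 20 * 1024 * 1024

# Hypothetical (labels, value) pairs standing in for the two container-level
# series that kind reports; the id values are placeholders.
KIND_SAMPLES = [
    ({"namespace": "nginx-preloader-sample",
      "pod": "nginx-prepared-6f85667d88-msx5q",
      "container": "nginx-prepared",
      "id": "cgroup-path-a"}, LIMIT_BYTES),
    ({"namespace": "nginx-preloader-sample",
      "pod": "nginx-prepared-6f85667d88-msx5q",
      "container": "nginx-prepared",
      "id": "cgroup-path-b"}, LIMIT_BYTES),
]


def context_key(labels):
    """Identify a container by its (namespace, pod, container) triplet."""
    return (labels["namespace"], labels["pod"], labels["container"])


def sum_by_context(samples):
    """Mimic the summing behaviour described above: add up values per context."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[context_key(labels)] += value
    return dict(totals)


def max_by_context(samples):
    """Assumed alternative: keep a single value per context instead of summing."""
    result = {}
    for labels, value in samples:
        key = context_key(labels)
        result[key] = max(value, result.get(key, value))
    return result


if __name__ == "__main__":
    print(sum_by_context(KIND_SAMPLES))  # one entry, twice the configured limit
    print(max_by_context(KIND_SAMPLES))  # one entry, equal to the configured limit
```

Summing stays appropriate for metrics where several series of one container are genuinely additive; a limit is a per-container setting, so duplicates of it should collapse to one value.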

Additional Notes

Fixes DataDog/datadog-agent#10508 ("The kubernetes.memory.limits is incorrect when K8s cluster is in KIND(Kubernetes IN Docker) environment")

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
PR title must be written as a CHANGELOG entry
File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have changelog/ and integration/ labels attached

codecov · 2022-05-02T14:08:50Z

Codecov Report

Merging #11914 (30484c8) into master (1a0d5a7) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Flag	Coverage Δ
kubelet	90.62% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Fix kubernetes.memory.limits on kind clusters

30484c8

L3n41c added the integration/kubelet label May 2, 2022

L3n41c requested review from a team as code owners May 2, 2022 14:04

L3n41c added the changelog/Fixed label May 2, 2022

vboulineau approved these changes May 12, 2022

L3n41c merged commit 2b05261 into master May 12, 2022

L3n41c deleted the lenaic/fix_double_mem_lim_kind branch May 12, 2022 07:44

ewoodthomas mentioned this pull request Aug 23, 2024

Switched processUsageTotal to use latest metric and enabled TestKindS… DataDog/datadog-agent#28720

Merged
