kubeletstats lack labels #29880

Closed
bon77 opened this issue Dec 14, 2023 · 7 comments
Labels
bug (Something isn't working) · needs triage (New item requiring triage) · receiver/kubeletstats

Comments

@bon77

bon77 commented Dec 14, 2023

Describe the bug
I collect the kubeletstats metrics following the documentation.
I can see them in the logs and find them in Grafana (Explore), but they do not have any labels attached to them.
For the node-related metrics, I would expect the label node_name to be attached.

Steps to reproduce
Install the collector helm chart with the values below:
(I use kind, but I don't think that's relevant.)

helm install otel --namespace otel -f collector-values.yaml open-telemetry/opentelemetry-collector

What did you expect to see?
I expect to see a label containing the k8s.node.name, so that I can match the metrics in dashboards etc.

2023-12-14T05:21:42.217Z    info    ResourceMetrics #0
Resource SchemaURL:
Resource attributes:
     -> k8s.node.name: Str(unkind-control-plane)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/kubeletstatsreceiver 0.91.0
Metric #0
Descriptor:
     -> Name: k8s.node.cpu.time
     -> Description: Node CPU time
     -> Unit: s
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> node_name: Str(unkind-control-plane)  # the label I expect to see
StartTimestamp: 2023-12-14 01:30:48 +0000 UTC
Timestamp: 2023-12-14 05:21:42.012037899 +0000 UTC
Value: 3047.665864
Metric #1
Descriptor:
     -> Name: k8s.node.cpu.utilization
     -> Description: Node CPU utilization
[..]

What did you see instead?

2023-12-14T05:21:42.217Z    info    ResourceMetrics #0
Resource SchemaURL:
Resource attributes:
     -> k8s.node.name: Str(unkind-control-plane)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/kubeletstatsreceiver 0.91.0
Metric #0
Descriptor:
     -> Name: k8s.node.cpu.time
     -> Description: Node CPU time
     -> Unit: s
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
[..]

What version did you use?
v0.91.0
Helm chart version: opentelemetry-collector-0.77.0

What config did you use?
collector-values.yaml:

nameOverride: ""
fullnameOverride: ""
mode: "deployment"
namespaceOverride: "otel"
presets:
  logsCollection:
    enabled: false
    includeCollectorLogs: false
    storeCheckpoints: false
    maxRecombineLogSize: 102400
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: false
    extractAllPodAnnotations: false
  kubeletMetrics:
    enabled: false
  kubernetesEvents:
    enabled: true
  clusterMetrics:
    enabled: true
configMap:
  create: true
config:
  exporters:
    debug/basic:
      verbosity: basic
    debug/normal:
      verbosity: normal
    debug/detailed:
      verbosity: detailed
  extensions:
    health_check: {}
    memory_ballast: {}
  processors:
    batch: {}
    memory_limiter: null
    k8sattributes: null
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:14250
        thrift_http:
          endpoint: ${env:MY_POD_IP}:14268
        thrift_compact:
          endpoint: ${env:MY_POD_IP}:6831
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
    hostmetrics:
      collection_interval: 20s
      root_path: /hostfs
    kubeletstats:
      auth_type: 'serviceAccount'
      collection_interval: 20s
      endpoint: '${env:K8S_NODE_NAME}:10250'
      insecure_skip_verify: true
      metric_groups:
        - node
        - pod
        - container
    k8s_cluster:
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets:
                  - ${env:MY_POD_IP}:8888
    zipkin:
      endpoint: ${env:MY_POD_IP}:9411
  service:
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:8888
    extensions:
      - health_check
      - memory_ballast
    pipelines:
      metrics:
        exporters:
          - debug/detailed
        processors:
          - memory_limiter
          - batch
          - k8sattributes
        receivers:
          - otlp
          - prometheus
          - k8s_cluster
          - hostmetrics
          - kubeletstats
image:
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  tag: ""
  digest: ""
imagePullSecrets: []
command:
  name: otelcol-contrib
  extraArgs: []
serviceAccount:
  create: true
  annotations: {}
  name: ""
clusterRole:
  create: true
  annotations: {}
  name: ""
  rules:
    - apiGroups:
      - ''
      resources:
      - 'pods'
      - 'nodes'
      - 'nodes/stats'
      verbs:
      - 'get'
      - 'list'
      - 'watch'
  clusterRoleBinding:
    annotations: {}
    name: ""
podSecurityContext: {}
securityContext: {}
nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []
priorityClassName: ""
extraEnvs:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
extraEnvsFrom: []
extraVolumes: []
extraVolumeMounts: []
ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    hostPort: 4317
    protocol: TCP
    appProtocol: grpc
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    hostPort: 4318
    protocol: TCP
  jaeger-compact:
    enabled: true
    containerPort: 6831
    servicePort: 6831
    hostPort: 6831
    protocol: UDP
  jaeger-thrift:
    enabled: true
    containerPort: 14268
    servicePort: 14268
    hostPort: 14268
    protocol: TCP
  jaeger-grpc:
    enabled: true
    containerPort: 14250
    servicePort: 14250
    hostPort: 14250
    protocol: TCP
  zipkin:
    enabled: true
    containerPort: 9411
    servicePort: 9411
    hostPort: 9411
    protocol: TCP
  metrics:
    enabled: false
    containerPort: 8888
    servicePort: 8888
    protocol: TCP
resources:
  limits:
    cpu: 250m
    memory: 512Mi
podAnnotations: {}
podLabels: {}
additionalLabels: {}
hostNetwork: false
dnsPolicy: ""
dnsConfig: {}
replicaCount: 1
revisionHistoryLimit: 10
annotations: {}
extraContainers: []
initContainers: []
lifecycleHooks: {}
livenessProbe:
  httpGet:
    port: 13133
    path: /
readinessProbe:
  httpGet:
    port: 13133
    path: /
service:
  type: ClusterIP
  annotations: {}
ingress:
  enabled: false
  additionalIngresses: []
podMonitor:
  enabled: false
  metricsEndpoints:
    - port: metrics
  extraLabels: {}
serviceMonitor:
  enabled: false
  metricsEndpoints:
    - port: metrics
  extraLabels: {}
podDisruptionBudget:
  enabled: false
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  behavior: {}
  targetCPUUtilizationPercentage: 80
rollout:
  rollingUpdate: {}
  strategy: RollingUpdate
prometheusRule:
  enabled: false
  groups: []
  defaultRules:
    enabled: false
  extraLabels: {}
statefulset:
  volumeClaimTemplates: []
  podManagementPolicy: "Parallel"
networkPolicy:
  enabled: false
  annotations: {}
  allowIngressFrom: []
  extraIngressRules: []
  egressRules: []
useGOMEMLIMIT: true

Environment
OS: " Ubuntu 22.04.3 LTS"
ARCH: "x86_64"
Helm: "version.BuildInfo{Version:"v3.13.2", GitCommit:"2a2fb3b98829f1e0be6fb18af2f6599e0f4e8243", GitTreeState:"clean", GoVersion:"go1.20.10"}"

Additional context
What I want to achieve is:

Send these metrics to Mimir and use Grafana to display them.
For that to work, I need to get some relation between the node and the metrics.

bon77 added the bug (Something isn't working) label on Dec 14, 2023
dmitryax transferred this issue from open-telemetry/opentelemetry-collector on Dec 14, 2023

Pinging code owners for receiver/kubeletstats: @dmitryax @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

Hello @bon77, the kubeletstats receiver attaches k8s.node.name to metrics as a resource attribute, not as an attribute on each datapoint. This follows OpenTelemetry semantic conventions.

From the data you've included:

2023-12-14T05:21:42.217Z    info    ResourceMetrics #0
Resource SchemaURL:
Resource attributes:
     -> k8s.node.name: Str(unkind-control-plane) # This is the value you're looking for
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/kubeletstatsreceiver 0.91.0

From my searching, it looks like you want node_name as an attribute on each data point because Mimir requires it. Is that correct?

Can you share which exporter you plan to use to get data into your backend? Depending on your configuration, you can either use the transform processor, or in some cases the exporter itself will have options available.
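
For example (a sketch, not something from your current config): if you sent metrics with the prometheusremotewrite exporter instead of OTLP, that exporter has a resource_to_telemetry_conversion option that copies every resource attribute, including k8s.node.name, onto each data point. The endpoint below is a placeholder:

exporters:
  prometheusremotewrite:
    # placeholder endpoint, adjust for your backend
    endpoint: http://mimir-distributor.mimir:8080/api/v1/push
    resource_to_telemetry_conversion:
      # copy resource attributes such as k8s.node.name onto every data point
      enabled: true

Note that Prometheus-style backends normalize the attribute names, so the dots typically become underscores (k8s_node_name).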

@bon77
Author

bon77 commented Dec 15, 2023

Hi @crobert-1, thanks for your response.

I am using the OTLP HTTP exporter (otlphttp) to connect to Mimir. I do not see anything I could set there.

Here is the configuration (diff against the previous config) I use:

diff --git a/collector-values.yaml b/collector-values.yaml
index 29f75b0..e558c8d 100644
--- a/collector-values.yaml
+++ b/collector-values.yaml
@@ -92,6 +92,10 @@ config:
       verbosity: normal
     debug/detailed:
       verbosity: detailed
+    otlphttp/mimir:
+      endpoint: http://mimir-distributor.mimir:8080/otlp
+      headers:
+        X-Scope-OrgID: BANANA
   extensions:
     # The health_check extension is mandatory for this chart.
     # Without the health_check extension the collector will fail the readiness and liveliness probes.
@@ -152,6 +156,7 @@ config:
       metrics:
         exporters:
           - debug/detailed
+          - otlphttp/mimir
         processors:
           - memory_limiter
           - batch

Thanks for pointing out the transform processor. That might work, but I can't seem to get the syntax right.
I tried:

    transform:
      metric_statements:
        - context: datapoint
          statements:
            - set(attributes["bbh_test"], "transformer")
            - set(attributes["node_name"], "${resource.k8s.node.name}")

My test attribute gets set, but I cannot figure out how to address the resource attribute k8s.node.name.
I also tried a few variations, like set(attributes["node_name"], attributes["resource.k8s.node.name"]), with no luck.

I also tried the attributes processor, but only my two static test labels get added.

diff --git a/collector-values.yaml b/collector-values.yaml
index e558c8d..f5c19e3 100644
--- a/collector-values.yaml
+++ b/collector-values.yaml
@@ -107,6 +107,17 @@ config:
     # If set to null, will be overridden with values based on k8s resource limits
     memory_limiter: null
     k8sattributes: null
+    attributes/add_bon_labels:
+      actions:
+        - key: bbh_first
+          value: START
+          action: upsert
+        - key: bbh_k8s_node_name
+          from_attribute: k8s.node.name
+          action: upsert
+        - key: bbh_last
+          value: ENDEGELAENDE
+          action: upsert
   receivers:
     jaeger:
       protocols:
@@ -161,6 +172,7 @@ config:
           - memory_limiter
           - batch
           - k8sattributes
+          - attributes/add_bon_labels
         receivers:
           - otlp
           - prometheus

Some additional information about my lab setup:
For Mimir, I am using the Helm chart with default values in this lab environment.

helm -n mimir install mimir grafana/mimir-distributed

I also run a default Grafana instance in this lab for easier access:

helm -n grafana install grafana grafana/grafana

@bon77
Author

bon77 commented Dec 15, 2023

I think I found a work-around for my problem.
It's not nice, but it seems to work for my case. I'd rather not use environment variables, though.

work-around in processors:

  processors:
    batch: {}
    # If set to null, will be overridden with values based on k8s resource limits
    memory_limiter: null
    k8sattributes: null
    attributes/add_bon_labels:
      actions:
        - key: k8s_node_name
          value: ${K8S_NODE_NAME}
          action: upsert

This depends on the K8S_NODE_NAME environment variable in extraEnvs (I already had that because of the kubeletstats endpoint):

extraEnvs:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

If anybody has a nicer solution, please share.

@crobert-1
Member

You'd access the node name like this in the transform processor, using the datapoint context: resource.attributes["k8s.node.name"]. Here's a reference.
Here's what your config would look like:

transform:
  metric_statements:
    - context: datapoint
      statements:
        - set(attributes["node_name"], resource.attributes["k8s.node.name"])
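
Then add transform to the metrics pipeline alongside your other processors, e.g. (a sketch based on the pipeline in the values you posted):

service:
  pipelines:
    metrics:
      processors:
        - memory_limiter
        - batch
        - k8sattributes
        - transform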

@crobert-1
Member

@bon77 It looks like this is working now, so I'm going to close this issue. Please let us know if I missed anything, or if you have any more questions!

@bon77
Author

bon77 commented Dec 18, 2023

Yes, that works for me. Thank you very much @crobert-1 !
