
[kubeletstatsreceiver] had a breaking resource label change on container metrics starting in v0.52.0 #10842

Closed
jvoravong opened this issue Jun 8, 2022 · 6 comments · Fixed by #10848
Labels
bug (Something isn't working) · comp:kubernetes (Kubernetes-related components) · comp: receiver (Receiver) · priority:p2 (Medium)

Comments

@jvoravong
Contributor

jvoravong commented Jun 8, 2022

Describe the bug
After PR #9744 was merged, metrics for containers running in Kubernetes pods had a resource label change. This is a breaking change; several default dashboards at my company stopped functioning properly because of it.

The issue: in v0.51.0 and earlier, the label k8s.container.name was used on container metrics. Starting in v0.52.0, the label container.name is used on container metrics instead of k8s.container.name.

It is not trivial for users of the kubeletstats receiver to migrate monitoring content (alerts, dashboards, etc.) without advance notice; a temporary collector-side workaround is sketched below.
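
As an interim mitigation (a sketch, not an endorsed migration path), the rename can be undone in the collector with the resource processor before metrics are exported; the processor name resource/restore_k8s_container_name is just an illustrative choice:

processors:
  resource/restore_k8s_container_name:
    attributes:
    # Copy the new attribute back to the name that existing dashboards/alerts expect.
    - action: insert
      key: k8s.container.name
      from_attribute: container.name
    # Optionally drop the runtime-style attribute so both labels are not emitted.
    - action: delete
      key: container.name

Adding this processor to the metrics pipeline would keep existing monitoring content working until a proper fix or migration plan is available.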

PR: [receiver/kubeletstats] Migrate kubeletstatsreceiver to the new Metrics Builder #9744

kubeletstatsreceiver v0.51.0:

conventions.AttributeK8SContainerName: sContainer.Name,

kubeletstatsreceiver v0.52.0:

(the equivalent assignment now goes through the generated metrics builder and sets the container.name resource attribute instead; see PR #9744)

Steps to reproduce
1. Deploy the kubeletstats receiver v0.51.0 to a Kubernetes cluster and record some container metrics with their label values.
2. Deploy the kubeletstats receiver v0.52.0 to a Kubernetes cluster and record some container metrics with their label values.
3. Compare the metric labels (a minimal config sketch for this comparison follows below).
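
A minimal config sketch for this comparison (auth type, endpoint variable, and collection interval mirror the full agent config further below; the logging exporter prints the resource labels shown in the samples that follow):

receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 10s
    endpoint: ${K8S_NODE_IP}:10250
    metric_groups:
    - container

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers:
      - kubeletstats
      exporters:
      - logging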

Example of label difference:
kubeletstatsreceiver v0.51.0 Sample container metric:
StartTimestamp: 2022-05-27 16:51:14 +0000 UTC
Timestamp: 2022-06-08 20:09:57.352340611 +0000 UTC
Value: 0
ResourceMetrics #4
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource labels:
-> k8s.pod.uid: STRING(----)
-> k8s.pod.name: STRING(dns-controller--)
-> k8s.namespace.name: STRING(kube-system)
-> k8s.container.name: STRING(dns-controller)
-> container.id: STRING()
-> cloud.provider: STRING(aws)
-> cloud.platform: STRING(aws_ec2)
-> cloud.region: STRING(us-west-2)
-> cloud.account.id: STRING()
-> cloud.availability_zone: STRING(us-west-2a)
-> host.id: STRING(i-)
-> host.image.id: STRING(ami-)
-> host.type: STRING(m3.xlarge)
-> host.name: STRING(ip----.us-west-2.compute.internal)
-> os.type: STRING(linux)
-> k8s.node.name: STRING(ip----.us-west-2.compute.internal)
-> k8s.cluster.name: STRING(********-aws)

kubeletstatsreceiver v0.52.0 sample container metric:
StartTimestamp: 2022-05-27 16:52:46 +0000 UTC
Timestamp: 2022-06-08 20:18:30.183591147 +0000 UTC
Value: 5373952
ResourceMetrics #21
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource labels:
-> k8s.pod.uid: STRING(----)
-> k8s.pod.name: STRING(ebs-csi-node-)
-> k8s.namespace.name: STRING(kube-system)
-> container.name: STRING(ebs-plugin)
-> container.id: STRING()
-> cloud.provider: STRING(aws)
-> cloud.platform: STRING(aws_ec2)
-> cloud.region: STRING(us-west-2)
-> cloud.account.id: STRING()
-> cloud.availability_zone: STRING(us-west-2a)
-> host.id: STRING(i-)
-> host.image.id: STRING(ami-)
-> host.type: STRING(m3.xlarge)
-> host.name: STRING(ip----30.us-west-2.compute.internal)
-> os.type: STRING(linux)
-> k8s.node.name: STRING(ip----.us-west-2.compute.internal)
-> k8s.cluster.name: STRING(********-aws)

What did you expect to see?
Any of the following:

  • Continued use of the k8s.container.name label
  • A migration plan for moving from the k8s.container.name label to the container.name label
  • A feature gate that toggles between the k8s.container.name label and the container.name label

What did you see instead?
All container resource metrics now use the container.name label.

What version did you use?
Version: (v0.51.0, v0.52.0)

What config did you use?
Config: (e.g. the yaml config file)

# Example k8s agent configuration (Splunk Otel Collector Distribution)
# bash: kubectl get configmap splunk-otel-collector-otel-agent -o yaml
apiVersion: v1
data:
  relay: |
    exporters:
      logging:
        loglevel: debug
      signalfx:
        access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
        api_url: https://api.us0.signalfx.com
        correlation: null
        ingest_url: https://ingest.us0.signalfx.com
        sync_host_metadata: true
    extensions:
      health_check: null
      memory_ballast:
        size_mib: ${SPLUNK_BALLAST_SIZE_MIB}
      zpages: null
    processors:
      batch: null
      filter/logs:
        logs:
          exclude:
            match_type: strict
            resource_attributes:
            - key: splunk.com/exclude
              value: "true"
      groupbyattrs/logs:
        keys:
        - com.splunk.source
        - com.splunk.sourcetype
        - container.id
        - fluent.tag
        - istio_service_name
        - k8s.container.name
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
      k8sattributes:
        extract:
          annotations:
          - from: pod
            key: splunk.com/sourcetype
          - from: namespace
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: pod
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: namespace
            key: splunk.com/index
            tag_name: com.splunk.index
          - from: pod
            key: splunk.com/index
            tag_name: com.splunk.index
          labels:
          - key: app
          metadata:
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - container.id
          - container.image.name
          - container.image.tag
        filter:
          node_from_env_var: K8S_NODE_NAME
        pod_association:
        - from: resource_attribute
          name: k8s.pod.uid
        - from: resource_attribute
          name: k8s.pod.ip
        - from: resource_attribute
          name: ip
        - from: connection
        - from: resource_attribute
          name: host.name
      memory_limiter:
        check_interval: 2s
        limit_mib: ${SPLUNK_MEMORY_LIMIT_MIB}
      resource:
        attributes:
        - action: insert
          key: k8s.node.name
          value: ${K8S_NODE_NAME}
        - action: upsert
          key: k8s.cluster.name
          value: jvoravong-aws
      resource/add_agent_k8s:
        attributes:
        - action: insert
          key: k8s.pod.name
          value: ${K8S_POD_NAME}
        - action: insert
          key: k8s.pod.uid
          value: ${K8S_POD_UID}
        - action: insert
          key: k8s.namespace.name
          value: ${K8S_NAMESPACE}
      resource/add_environment:
        attributes:
        - action: insert
          key: deployment.environment
          value: jvoravong-aws
      resource/logs:
        attributes:
        - action: upsert
          from_attribute: k8s.pod.annotations.splunk.com/sourcetype
          key: com.splunk.sourcetype
        - action: delete
          key: k8s.pod.annotations.splunk.com/sourcetype
        - action: delete
          key: splunk.com/exclude
      resourcedetection:
        detectors:
        - env
        - ec2
        - system
        override: true
        timeout: 10s
    receivers:
      hostmetrics:
        collection_interval: 10s
        scrapers:
          cpu: null
          disk: null
          filesystem: null
          load: null
          memory: null
          network: null
          paging: null
          processes: null
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 10s
        endpoint: ${K8S_NODE_IP}:10250
        extra_metadata_labels:
        - container.id
        metric_groups:
        - container
        - pod
        - node
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      prometheus/agent:
        config:
          scrape_configs:
          - job_name: otel-agent
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${K8S_POD_IP}:8889
      signalfx:
        endpoint: 0.0.0.0:9943
    service:
      extensions:
      - health_check
      - k8s_observer
      - memory_ballast
      - zpages
      pipelines:
        metrics:
          exporters:
          - logging
          processors:
          - memory_limiter
          - batch
          - resourcedetection
          - resource
          receivers:
          - hostmetrics
          - kubeletstats
          - otlp
          - signalfx
        metrics/agent:
          exporters:
          - signalfx
          processors:
          - memory_limiter
          - batch
          - resource/add_agent_k8s
          - resourcedetection
          - resource
          receivers:
          - prometheus/agent
      telemetry:
        metrics:
          address: 0.0.0.0:8889
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: splunk-otel-collector
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-06-08T20:09:14Z"
  labels:
    app: splunk-otel-collector
    app.kubernetes.io/instance: splunk-otel-collector
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: splunk-otel-collector
    app.kubernetes.io/version: 0.51.0
    chart: splunk-otel-collector-0.51.0
    helm.sh/chart: splunk-otel-collector-0.51.0
    heritage: Helm
    release: splunk-otel-collector
  name: splunk-otel-collector-otel-agent
  namespace: monitoring
  resourceVersion: "********"
  uid: ********-********-********-********-********

Environment
Kubernetes, AWS

@jvoravong added the bug (Something isn't working) label on Jun 8, 2022
@jvoravong changed the title from "[kubeletstatsreceiver] had breaking resource label change on container metrics starting in v0.52.0" to "[kubeletstatsreceiver] had a breaking resource label change on container metrics starting in v0.52.0" on Jun 8, 2022
@TylerHelmuth
Member

@dmitryax this is fixed by updating the resource attribute in metadata.yaml, right? Which is the correct semantic convention, k8s.container.name or container.name? I see both options in the semantic conventions.

@TylerHelmuth added the priority:p2 (Medium), comp: receiver (Receiver), and comp:kubernetes (Kubernetes-related components) labels on Jun 8, 2022
@dmitryax
Member

dmitryax commented Jun 8, 2022

k8s.container.name must take its value from the Kubernetes pod spec; that name is unique only within the pod definition. container.name is defined by the container runtime engine, and its value differs depending on which runtime is used.

This change shouldn't have been part of the migration to the metrics builder.
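
To illustrate the distinction: k8s.container.name comes from the containers[].name field of the pod spec, while container.name is whatever name the container runtime assigns to the running container. The pod name, namespace, and image below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: dns-controller-6d8b7c   # placeholder pod name
  namespace: kube-system
spec:
  containers:
  - name: dns-controller        # this is what k8s.container.name should carry
    image: example.registry/dns-controller:1.0   # placeholder image

The runtime-assigned name (what container.name is defined to carry) is generated by the container runtime (docker, containerd, etc.) and generally differs from the pod-spec name above.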

@dmitryax
Member

dmitryax commented Jun 8, 2022

I don't think container.name is a good attribute for a Kubernetes receiver. We should bring back the k8s-specific k8s.container.name.

@dmitryax
Member

dmitryax commented Jun 8, 2022

If we currently use the same value for container.name as was used for k8s.container.name, it's incorrect.

@TylerHelmuth
Member

> If we currently use the same value for container.name as was used for k8s.container.name, it's incorrect.

We do. So the resource attribute should be updated in metadata.yaml. @jvoravong is that something you can do?
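
For reference, the fix amounts to renaming the resource attribute in the receiver's metadata.yaml from container.name back to k8s.container.name, roughly along these lines (an abridged sketch; the field layout and description text are assumptions, and the actual change is whatever lands in #10848):

# receiver/kubeletstatsreceiver/metadata.yaml (abridged sketch)
resource_attributes:
  k8s.container.name:   # was container.name in v0.52.0
    description: The name of the container as declared in the pod spec
    type: string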

@dmitryax
Member

dmitryax commented Jun 9, 2022

I submitted a fix: #10848
