Unspecified pod metrics landing in CloudWatch #847

Closed
jbeemster opened this issue Jan 4, 2022 · 2 comments
@jbeemster

Describe the bug
We have been limiting the number of custom metrics sent by OTEL in our EKS cluster to reduce cost. In doing so, I appear to have run into a case where a metric_name_selector also catches other metrics that share it as a prefix.

Though this could also be a misconfiguration on my end!

Steps to reproduce
Leveraging the following config chunk for OTEL:

          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization

Four metrics land in CloudWatch:

  • pod_cpu_utilization
  • pod_cpu_utilization_over_pod_limit
  • pod_memory_utilization
  • pod_memory_utilization_over_pod_limit

What did you expect to see?
I expected to see only the two metrics specified, not the extra custom metrics.

What did you see instead?
Occasionally the _over_pod_limit metrics showed up despite not being present in the configuration.

Environment
Running on an EKS cluster in AWS (eu-west-2), on EKS version 1.19 with v0.13.0 of the AWS OTEL Collector project.


Full configmap below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "aws-otel-collector.fullname" . }}
  labels:
    {{- include "aws-otel-collector.labels" . | nindent 4 }}
data:
  otel-agent-config: |
    extensions:
      health_check:

    receivers:
      awscontainerinsightreceiver:

    processors:
      batch/metrics:
        timeout: 60s

    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization

          # service metrics
          - dimensions: [[Service, Namespace, ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods

    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]

      extensions: [health_check]
@sethAmazon
Member

sethAmazon commented Jan 4, 2022

I think this has to do with the selector being a regex.

[Screenshot: regex tester showing the pattern hello matching the string helloWorld]

As you can see, the regex hello is a match for the string helloWorld, so a selector like pod_cpu_utilization will also match pod_cpu_utilization_over_pod_limit. You could try using a regex that is more specific.
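For example, a minimal sketch of the same pod-metrics declaration with each selector anchored (assuming the awsemf exporter treats metric_name_selectors as unanchored regular expressions, which the match above suggests):

          # pod metrics (selectors anchored so only the exact names match)
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - "^pod_cpu_utilization$"
              - "^pod_memory_utilization$"

With ^ and $ in place, pod_cpu_utilization should no longer match pod_cpu_utilization_over_pod_limit.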

@jbeemster
Author

Hi @sethAmazon, that makes a ton of sense! I had not realized it was all regex - will close this.
