Unspecified pod metrics landing in CloudWatch #847

Closed
jbeemster opened this issue Jan 4, 2022 · 2 comments
@jbeemster

Describe the bug
We have been limiting the number of custom metrics sent by OTEL in our EKS cluster to reduce cost. In doing so, I appear to have run into a case where a metric_name_selector also catches other metrics that share it as a prefix.

Though this could also be a misconfiguration on my end!

Steps to reproduce
Leveraging the following config chunk for OTEL:

          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization

Four metrics land in CloudWatch:

  • pod_cpu_utilization
  • pod_cpu_utilization_over_pod_limit
  • pod_memory_utilization
  • pod_memory_utilization_over_pod_limit

What did you expect to see?
I expected to see only the two metrics specified, not the extra custom metrics.

What did you see instead?
Occasionally the _over_pod_limit metrics showed up despite not being present in the configuration.

Environment
Running on an EKS cluster in AWS (eu-west-2), on EKS version 1.19 with v0.13.0 of the AWS OTEL Collector project.


Full configmap below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "aws-otel-collector.fullname" . }}
  labels:
    {{- include "aws-otel-collector.labels" . | nindent 4 }}
data:
  otel-agent-config: |
    extensions:
      health_check:

    receivers:
      awscontainerinsightreceiver:

    processors:
      batch/metrics:
        timeout: 60s

    exporters:
      awsemf:
        namespace: ContainerInsights
        log_group_name: '/aws/containerinsights/{ClusterName}/performance'
        log_stream_name: '{NodeName}'
        resource_to_telemetry_conversion:
          enabled: true
        dimension_rollup_option: NoDimensionRollup
        parse_json_encoded_attr_values: [Sources, kubernetes]
        metric_declarations:
          # pod metrics
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - pod_cpu_utilization
              - pod_memory_utilization

          # service metrics
          - dimensions: [[Service, Namespace, ClusterName]]
            metric_name_selectors:
              - service_number_of_running_pods

    service:
      pipelines:
        metrics:
          receivers: [awscontainerinsightreceiver]
          processors: [batch/metrics]
          exporters: [awsemf]

      extensions: [health_check]
@sethAmazon
Member

sethAmazon commented Jan 4, 2022

I think this has to do with the selector being a regex.

[Screenshot: regex tester showing the pattern hello matching the string helloWorld]

As you can see, the regex hello is a match for the string helloWorld, so a selector like pod_cpu_utilization will also match pod_cpu_utilization_over_pod_limit. You could try using a regex that is more specific.
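For example, a minimal sketch of the same pod-metrics declaration with each selector anchored (assuming the awsemf exporter treats metric_name_selectors as unanchored regular expressions, which the match above suggests):

          # pod metrics (selectors anchored so only the exact names match)
          - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName]]
            metric_name_selectors:
              - "^pod_cpu_utilization$"
              - "^pod_memory_utilization$"

With ^ and $ in place, pod_cpu_utilization should no longer match pod_cpu_utilization_over_pod_limit.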

@jbeemster
Author

Hi @sethAmazon, that makes a ton of sense! I had not realized it was all regex - will close this.
