[exporter/datadogexporter] EC2MetadataError: failed to make EC2Metadata request #22807

Closed
inigohu opened this issue May 26, 2023 · 7 comments · Fixed by #30341
Labels
bug Something isn't working exporter/datadog Datadog components never stale Issues marked with this label will be never staled and automatically removed priority:p3 Lowest

inigohu commented May 26, 2023

Component(s)

exporter/datadog

What happened?

Description

I get this warning message when the collector starts:

WARN: failed to get session token, falling back to IMDSv1: 404 Not Found: Not Found
status code: 404, request id:
caused by: EC2MetadataError: failed to make EC2Metadata request Not Found status code: 404, request id:

I'm not running on AWS so I don't understand why that warning is raised.

Steps to Reproduce

I can't reproduce it locally

Collector version

v0.78.0

Environment information

Environment

OS: Google Cloud Platform (GKE autopilot 1.24.11-gke.1000)

OpenTelemetry Collector configuration

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch:
    exporters:
      googlecloud:
        project: "my-project"
      datadog:
        api:
          key: <DD_API_KEY>
          site: datadoghq.eu
        metrics:
          sums:
            cumulative_monotonic_mode: to_delta
          histograms:
            mode: distributions
            send_aggregation_metrics: true
          resource_attributes_as_tags: true
        host_metadata:
          enabled: false
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [googlecloud]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [datadog]

Log output

info    service/telemetry.go:104    Setting up own telemetry...       
info    service/telemetry.go:127    Serving Prometheus metrics    {"address": ":8888", "level": "Basic"}

WARN: failed to get session token, falling back to IMDSv1: 404 Not Found: Not Found
status code: 404, request id:
caused by: EC2MetadataError: failed to make EC2Metadata request Not Found status code: 404, request id:

info    provider/provider.go:30    Resolved source    {"kind": "exporter", "data_type": "metrics", "name": "datadog", "provider": "system", "source": {"Kind":"host","Identifier":"open-telemetry-9898f74fc-6l5sd"}}
...

Additional context

I am not 100% sure if this warning comes from datadog exporter, but my suspicions point to it. If not, feel free to close it.

@inigohu inigohu added bug Something isn't working needs triage New item requiring triage labels May 26, 2023
@github-actions github-actions bot added the exporter/datadog Datadog components label May 26, 2023
@github-actions

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

cforce commented Sep 8, 2023

same issue here

kevinnoel-be commented Oct 2, 2023

Not sure if it's exactly the same issue, but we see the same error when running the collector on a non-EKS cluster (GKE). We do not use the datadog exporter at all in this particular collector deployment (seen on at least v0.76 and v0.84). Here is the simplified config we use:

    receivers:
      k8sobjects:
        auth_type: serviceAccount
        objects: # ...

    processors:
      resourcedetection:
        detectors:
          - env
          - gcp
          - eks
          - ec2
          - azure
          - system
        timeout: 2s
        override: false
        system:
          resource_attributes:
            host.id:
              enabled: false

      batch: # ...
      memory_limiter: # ...
      # ...

    extensions: # ...

    exporters:
      logging: # ...
      otlp: # ...

    service:
      telemetry: # ...
      extensions: # ...

      pipelines:
        logs:
          receivers:
            - k8sobjects
          processors:
            - resourcedetection
            - memory_limiter
            - batch
            # ...
          exporters:
            - logging
            # ...

I ran the collector with debug logs, and it appears (this is my assumption) that the warning is triggered in the resourcedetection processor. Additionally, it breaks the "promise" that collector telemetry logs are in JSON format 😅:

Collector debug logs
...
{"level":"info","ts":1696239560.521837,"caller":"internal/resourcedetection.go:125","msg":"began detecting resource information","kind":"processor","name":"resourcedetection","pipeline":"logs"}
{"level":"info","ts":1696239560.5745764,"caller":"gcp/gcp.go:67","msg":"Fallible detector failed. This attribute will not be available.","kind":"processor","name":"resourcedetection","pipeline":"logs","key":"host.name","error":"metadata: GCE metadata \"instance/name\" not defined"}
{"level":"debug","ts":1696239560.5763094,"caller":"eks/detector.go:70","msg":"Unable to identify EKS environment","kind":"processor","name":"resourcedetection","pipeline":"logs","error":"isEks() error retrieving auth configmap: failed to retrieve ConfigMap kube-system/aws-auth: configmaps \"aws-auth\" is forbidden: User \"system:serviceaccount:xxxx:xxxx\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-system\""}
{"level":"warn","ts":1696239560.5763583,"caller":"internal/resourcedetection.go:130","msg":"failed to detect resource","kind":"processor","name":"resourcedetection","pipeline":"logs","error":"isEks() error retrieving auth configmap: failed to retrieve ConfigMap kube-system/aws-auth: configmaps \"aws-auth\" is forbidden: User \"system:serviceaccount:xxxx:xxxx\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-system\""}
 2023/10/02 09:39:20 WARN: failed to get session token, falling back to IMDSv1: 404 Not Found: Not Found
 	status code: 404, request id: 
 caused by: EC2MetadataError: failed to make EC2Metadata request
 Not Found
 
 	status code: 404, request id: 
{"level":"debug","ts":1696239560.5777507,"caller":"ec2/ec2.go:62","msg":"EC2 metadata unavailable","kind":"processor","name":"resourcedetection","pipeline":"logs","error":"EC2MetadataError: failed to make EC2Metadata request\nNot Found\n\n\tstatus code: 404, request id: "}
{"level":"debug","ts":1696239560.578286,"caller":"azure/azure.go:47","msg":"Azure detector metadata retrieval failed","kind":"processor","name":"resourcedetection","pipeline":"logs","error":"Azure IMDS replied with status code: 404 Not Found"}
{"level":"info","ts":1696239560.578477,"caller":"internal/resourcedetection.go:139","msg":"detected resource information","kind":"processor","name":"resourcedetection","pipeline":"logs","resource":{"cloud.account.id":"xxxx","cloud.platform":"gcp_kubernetes_engine","cloud.provider":"gcp","cloud.region":"xxxx","host.id":"xxxx","host.name":"xxxx","k8s.cluster.name":"xxxx","os.type":"linux"}}
...
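
A possible workaround for a GKE-only deployment like the one above is to stop listing the detectors for other clouds, so the processor never probes their metadata endpoints. The sketch below is an assumption, not something confirmed in this thread: it reuses only the detector names and options from the config above, and adds a `service::telemetry::logs` section (elided in the original config, so the values here are guesses) to check the startup logs afterwards. It would only affect the resourcedetection processor, not the datadog exporter case from the original report.

```yaml
processors:
  resourcedetection:
    # Assumption: this collector only ever runs on GKE, so the eks, ec2
    # and azure detectors are dropped from the list used above.
    detectors:
      - env
      - gcp
      - system
    timeout: 2s
    override: false

service:
  telemetry:
    logs:
      # Assumed values; the original config elides this section.
      level: debug
      encoding: json
```

If the warning still shows up with this config, it is likely coming from another component that queries EC2 metadata on its own.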

github-actions bot commented Dec 4, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 4, 2023
@mx-psi mx-psi added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Dec 4, 2023
matej-g commented Dec 18, 2023

This seems to be coming from the AWS SDK in https://github.com/aws/aws-sdk-go/blob/394d04f7e36b85532cede3eb815a6a23413b2eaa/aws/ec2metadata/token_provider.go#L68. This part of the code does not respect the logging setting (for the AWS client, logging should be off by default). In some environments (we have also noticed this with GKE Autopilot users) that call can return a 404 or 403 response, and the warning is then printed to stdout.

Besides trying to fix this upstream, we could override the logger with a custom AWS logger that simply discards all logs (and thus honors the "logging off" level that should be the default). As with other providers, I think the EC2 detector should still fail silently (or with a debug log) when we cannot obtain the metadata (e.g. because we are not running on EC2).

matej-g commented Jan 9, 2024

This will be resolved by merging #30341, since it has already been fixed upstream.

@mx-psi mx-psi linked a pull request Jan 9, 2024 that will close this issue