Collector not working when k8sattributes in use #35879

Closed
clintonb opened this issue Oct 18, 2024 · 8 comments · Fixed by #36385
Labels: bug (Something isn't working), processor/k8sattributes (k8s Attributes processor)

Comments

@clintonb

Component(s)

processor/k8sattributes

What happened?

Description

I am trying to add k8sattributes to a gateway collector, but the collector and health check are not functioning. The collector appears to start, but refuses connections on the receiving ports. The health check endpoint returns a 503 with {"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}.

Steps to Reproduce

  1. Build an image with the configuration below.
  2. Run it.

Expected Result

  1. Collector responds to health checks (curl -v http://localhost:13133).
  2. Received traces include k8s attributes.

Actual Result

The collector never becomes healthy, and does not accept any signals.

Collector version

0.111.0

Environment information

Environment

OS: macOS 15.0.1 (Docker) and GKE Autopilot

OpenTelemetry Collector configuration

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
        include_metadata: true

processors:
  k8sattributes:
    auth_type: "serviceAccount"
    extract:
      metadata:
#        - k8s.namespace.name
#        - k8s.deployment.name
#        - k8s.statefulset.name
#        - k8s.daemonset.name
#        - k8s.cronjob.name
#        - k8s.job.name
#        - k8s.node.name
        - k8s.pod.name
#        - k8s.pod.uid
#        - k8s.pod.start_time
    passthrough: false
#    pod_association:
#      - sources:
#          - from: resource_attribute
#            name: k8s.pod.ip
#      - sources:
#          - from: resource_attribute
#            name: k8s.pod.uid
#      - sources:
#          - from: connection

exporters:
  # NOTE: Add this to the list of pipeline exporters to see the collector's debug logs
  debug:
    verbosity: detailed
service:
  extensions: [ health_check ]
  # https://opentelemetry.io/docs/collector/configuration/#telemetry
  telemetry:
    # This controls log verbosity of the collector itself.
    logs:
      encoding: json
      level: "debug"
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes ]
      exporters: [ debug ]

Log output

otel-collector-1  | {"level":"info","ts":1729289848.7740626,"caller":"service@v0.111.0/service.go:136","msg":"Setting up own telemetry..."}
otel-collector-1  | {"level":"info","ts":1729289848.7742362,"caller":"telemetry/metrics.go:70","msg":"Serving metrics","address":"localhost:8888","metrics level":"Normal"}
otel-collector-1  | {"level":"info","ts":1729289848.7743702,"caller":"builders/builders.go:26","msg":"Development component. May change in the future.","kind":"exporter","data_type":"traces","name":"debug"}
otel-collector-1  | {"level":"debug","ts":1729289848.774658,"caller":"builders/builders.go:24","msg":"Beta component. May change in the future.","kind":"processor","name":"k8sattributes","pipeline":"traces"}
otel-collector-1  | {"level":"debug","ts":1729289848.774745,"caller":"builders/builders.go:24","msg":"Stable component.","kind":"receiver","name":"otlp","data_type":"traces"}
otel-collector-1  | {"level":"debug","ts":1729289848.7748547,"caller":"builders/extension.go:48","msg":"Beta component. May change in the future.","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729289848.7753859,"caller":"service@v0.111.0/service.go:208","msg":"Starting otelcol-contrib...","Version":"0.111.0","NumCPU":16}
otel-collector-1  | {"level":"info","ts":1729289848.775412,"caller":"extensions/extensions.go:39","msg":"Starting extensions..."}
otel-collector-1  | {"level":"info","ts":1729289848.775442,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729289848.7754776,"caller":"healthcheckextension@v0.111.0/healthcheckextension.go:33","msg":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
otel-collector-1  | {"level":"warn","ts":1729289848.7757652,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
otel-collector-1  | {"level":"info","ts":1729289848.7758508,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}


Additional context

I run `curl -v http://localhost:13133` to check if the collector is healthy.
clintonb added the bug and needs triage labels on Oct 18, 2024
@github-actions (bot)

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the processor/k8sattributes label on Oct 18, 2024
@vkamlesh

Can you provide more details on how you configured the OTel Collector? Additionally, why did you comment out the pod_association block in the configuration? I believe that without pod_association, the k8sattributes processor will not function correctly.

@clintonb (Author)

@vkamlesh I've posted the smallest configuration that replicates the issue. The logs are everything I have; k8sattributes doesn't seem to log anything, even at the debug level.

The collector is broken even if I restore all of the commented-out code. This happens both when I run locally and on Kubernetes.

I don't think anything in my Dockerfile should affect this processor, but here it is for completeness:

# Adapted from:
#  - https://www.honeycomb.io/blog/rescue-struggling-pods-from-scratch
#  - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/cmd/otelcontribcol/Dockerfile
FROM otel/opentelemetry-collector-contrib:0.111.0 AS binary
FROM alpine:latest

ARG USER_UID=10001
USER ${USER_UID}

COPY --from=binary /otelcol-contrib /

EXPOSE 4317 4318 55680 55679

COPY config.yaml /etc/otelcol/config.yaml

ENV LOG_LEVEL=info

ARG COMMIT_SHA=""
ENV COMMIT_SHA=${COMMIT_SHA}

# Remove the entrypoint so we can execute other commands for hooks and other purposes.
ENTRYPOINT []
CMD ["/otelcol-contrib", "--config", "/etc/otelcol/config.yaml"]

@vkamlesh

For k8sattributes, I think you need to un-comment the pod_association section. For example:

      k8sattributes/logs: # Extracting Kubernetes attributes from resource metadata.
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.deployment.name
            - k8s.statefulset.name
            - k8s.daemonset.name
            - k8s.cronjob.name
            - k8s.job.name
            - k8s.node.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.pod.start_time
            - k8s.cluster.uid
            - k8s.container.name
            - container.image.name
            - container.image.tag
        filter:
          node_from_env_var: K8S_NODE_NAME
        passthrough: false
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: resource_attribute
                name: container.id
          - sources:
              - from: connection

@clintonb (Author)

@vkamlesh I tried that and it doesn't work.

config.yaml

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"
        include_metadata: true

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.cluster.uid
        - k8s.container.name
        - container.image.name
        - container.image.tag
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: resource_attribute
            name: container.id
      - sources:
          - from: connection

exporters:
  # NOTE: Add this to the list of pipeline exporters to see the collector's debug logs
  debug:
    verbosity: detailed
service:
  extensions: [ health_check ]
  # https://opentelemetry.io/docs/collector/configuration/#telemetry
  telemetry:
    # This controls log verbosity of the collector itself.
    logs:
      encoding: json
      level: "debug"
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ k8sattributes ]
      exporters: [ debug ]

collector logs

otel-collector-1  | {"level":"info","ts":1729696350.0262184,"caller":"service@v0.111.0/service.go:136","msg":"Setting up own telemetry..."}
otel-collector-1  | {"level":"info","ts":1729696350.0269117,"caller":"telemetry/metrics.go:70","msg":"Serving metrics","address":"localhost:8888","metrics level":"Normal"}
otel-collector-1  | {"level":"info","ts":1729696350.0271108,"caller":"builders/builders.go:26","msg":"Development component. May change in the future.","kind":"exporter","data_type":"traces","name":"debug"}
otel-collector-1  | {"level":"debug","ts":1729696350.0287833,"caller":"builders/builders.go:24","msg":"Beta component. May change in the future.","kind":"processor","name":"k8sattributes","pipeline":"traces"}
otel-collector-1  | {"level":"debug","ts":1729696350.0288186,"caller":"builders/builders.go:24","msg":"Stable component.","kind":"receiver","name":"otlp","data_type":"traces"}
otel-collector-1  | {"level":"debug","ts":1729696350.028841,"caller":"builders/extension.go:48","msg":"Beta component. May change in the future.","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729696350.029432,"caller":"service@v0.111.0/service.go:208","msg":"Starting otelcol-contrib...","Version":"0.111.0","NumCPU":16}
otel-collector-1  | {"level":"info","ts":1729696350.0294414,"caller":"extensions/extensions.go:39","msg":"Starting extensions..."}
otel-collector-1  | {"level":"info","ts":1729696350.0296128,"caller":"extensions/extensions.go:42","msg":"Extension is starting...","kind":"extension","name":"health_check"}
otel-collector-1  | {"level":"info","ts":1729696350.029626,"caller":"healthcheckextension@v0.111.0/healthcheckextension.go:33","msg":"Starting health_check extension","kind":"extension","name":"health_check","config":{"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
otel-collector-1  | {"level":"warn","ts":1729696350.031119,"caller":"internal@v0.111.0/warning.go:40","msg":"Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.","kind":"extension","name":"health_check","documentation":"https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
otel-collector-1  | {"level":"info","ts":1729696350.0315917,"caller":"extensions/extensions.go:59","msg":"Extension started.","kind":"extension","name":"health_check"}

Health check response

curl -v http://localhost:13133
* Host localhost:13133 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:13133...
* Connected to localhost (::1) port 13133
> GET / HTTP/1.1
> Host: localhost:13133
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json
< Date: Wed, 23 Oct 2024 15:14:10 GMT
< Content-Length: 78
<
* Connection #0 to host localhost left intact
{"status":"Server not available","upSince":"0001-01-01T00:00:00Z","uptime":""}%

atoulme removed the needs triage label on Oct 29, 2024
@ChrsMark (Member)

I don't think that's an issue with the processor/k8sattributes component. It looks like an issue with the otlp receiver and/or the health_check extension. I would speculate that the issue lies in how these are configured, specifically the endpoint part.

If you are running the Collector on K8s, I would advise taking a look at https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector and either using the Helm chart directly or checking how these components are configured by default.

@bacherfl (Contributor)

I tried to reproduce this and got the same behaviour when there is an error during the initialization of the kube client:

componentstatus.ReportStatus(host, componentstatus.NewFatalErrorEvent(err))

In this case the error is passed through to the componentstatus.ReportStatus() function, but it does not end up in the logs, which makes this scenario hard to troubleshoot.
Therefore I think logging the error with the processor's logger, in addition to passing it to ReportStatus(), would make sense, as it would make any errors during the kube client initialization easier to spot.
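For illustration, here is a minimal sketch of that suggestion in Go (hypothetical code, not the actual change that later landed for this issue; the helper name reportKubeClientError and its placement inside the processor package are assumptions):

package k8sattributesprocessor // hypothetical placement, for illustration only

import (
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/component/componentstatus"
	"go.uber.org/zap"
)

// reportKubeClientError (hypothetical helper) logs the kube client
// initialization error with the processor's own logger and also reports
// it as a fatal status event, instead of only doing the latter.
func reportKubeClientError(logger *zap.Logger, host component.Host, err error) {
	// Without this log line the error never reaches the collector's log
	// output, which is what made this scenario hard to troubleshoot.
	logger.Error("failed to initialize kube client", zap.Error(err))
	// Existing behaviour quoted above: report a fatal status to the host.
	componentstatus.ReportStatus(host, componentstatus.NewFatalErrorEvent(err))
}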

@ChrsMark (Member)

> I tried to reproduce this and got the same behaviour when there is an error during the initialization of the kube client:
>
> componentstatus.ReportStatus(host, componentstatus.NewFatalErrorEvent(err))
>
> In this case the error is passed through to the componentstatus.ReportStatus() function, but it does not end up in the logs, which makes this scenario hard to troubleshoot. Therefore I think logging the error with the processor's logger, in addition to passing it to ReportStatus(), would make sense, as it would make any errors during the kube client initialization easier to spot.

That'd make sense!

ZenoCC-Peng pushed a commit to ZenoCC-Peng/opentelemetry-collector-contrib that referenced this issue Dec 6, 2024
…nitialisation (open-telemetry#36385)

#### Description

This PR adds more log output to the k8s attributes processor to log any
errors that are encountered during the kube client initialisation, to
make troubleshooting and identifying this issue easier.

#### Link to tracking issue
Fixes open-telemetry#35879

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
…nitialisation (open-telemetry#36385)

AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this issue Jan 13, 2025
…nitialisation (open-telemetry#36385)

chengchuanpeng pushed a commit to chengchuanpeng/opentelemetry-collector-contrib that referenced this issue Jan 26, 2025
…nitialisation (open-telemetry#36385)
