Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed Tracing Not Working in v3.5.5+ #16935

Closed
4 tasks done
ugur99 opened this issue Nov 13, 2023 · 6 comments
Closed
4 tasks done

Distributed Tracing Not Working in v3.5.5+ #16935

ugur99 opened this issue Nov 13, 2023 · 6 comments

Comments

@ugur99
Copy link

ugur99 commented Nov 13, 2023

Bug report criteria

What happened?

Hi team, we're trying to enable distributed tracing for etcd, and we've configured it with the required flags according to the documentation. However, it does not seem to work properly with the v3.5.5+ (same config works fine with v3.5.4 without any problems); not sure if its a bug/regression or misdocumentation problem. Do you have any ideas or suggestions on this? Thanks in advance!

What did you expect to happen?

Distributed tracing should work for later versions too; it seems that it does not work with the required config flags that put in documentation.

How can we reproduce it (as minimally and precisely as possible)?

You can check both etcd versions with the same configuration. To providing such a playground you can provision a cluster by kind (or kubespray) and install required apps like tempo, opentelemetry operator. After than provision a collector that uses localNetwork on controlplane nodes and observe the different behaviours between v3.5.4 and v3.5.5+

collector config -->

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: collector
spec:
  mode: daemonset
  nodeSelector: 
    node-role.kubernetes.io/controlplane: ""
  hostNetwork: true
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
          http:
            endpoint: "0.0.0.0:4318"
    exporters:
      logging:
      otlphttp:
        endpoint: http://my-tempo.tempo:4318
    processors:
      batch:
    extensions:
      health_check: {}
    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging, otlphttp]

Anything else we need to know?

No response

Etcd version (please run commands below)

works with 3.5.4 but does not work with 3.5.5+

Etcd configuration (command line flags or environment variables)

    - --experimental-enable-distributed-tracing=true
    - --experimental-distributed-tracing-address=0.0.0.0:4317
    - --experimental-distributed-tracing-service-name=etcd
    - --experimental-distributed-tracing-instance-id=testtrace

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

No response

@ugur99 ugur99 changed the title Distributed Tracing Does Not Work for v3.5.5+ Distributed Monitoring Not Working in v3.5.5+ Nov 13, 2023
@ugur99 ugur99 changed the title Distributed Monitoring Not Working in v3.5.5+ Distributed Tracing Not Working in v3.5.5+ Nov 13, 2023
@serathius
Copy link
Member

cc @dashpole

@jmhbnz
Copy link
Member

jmhbnz commented Nov 14, 2023

Hey @ugur99 thanks for raising this. I'm wondering if what you are seeing is related to the following which was introduced between 3.5.4 and 3.5.5:

It looks like for main / 3.6 we also have a configurable sample rate option: https://github.com/etcd-io/etcd/pull/13248/files#diff-538c79cd00ec18cb43b5dddd5f36b979d9d050cf478a241304493284739d31bfR353

However that sample rate option doesn't appear to be documented yet? Refer: https://etcd.io/docs/v3.6/op-guide/configuration/#experimental-distributed-tracing

We might need to do a backport of the ExperimentalDistributedTracingSamplingRatePerMillion configuration option to resolve this for 3.5 stable release branch.

@jmhbnz
Copy link
Member

jmhbnz commented Nov 14, 2023

@serathius can you please add an area/observability label. We don't really have any way of grouping tracing/loging/metrics related issues currently.

@ugur99
Copy link
Author

ugur99 commented Nov 14, 2023

thank you both @serathius @jmhbnz for the quick replies! It might be a good idea as well to document current status in v3.5 documentation as well.

@jmhbnz
Copy link
Member

jmhbnz commented Nov 16, 2023

Have started working on a fix #16951. Need to setup a tracing test environment to validate before marking ready for review.

Edit: Pull request is now ready for review.

@jmhbnz
Copy link
Member

jmhbnz commented Nov 27, 2023

Hey @ugur99 - A quick update, the code fix has been merged in #16951 and our documentation updated in etcd-io/website#753. The fix will be available for users very shortly in our upcoming v3.5.11 stable release.

I'll close this issue now as the work is completed, many thanks for reporting this bug 🙏🏻

@jmhbnz jmhbnz closed this as completed Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants