
tracer: Jaeger tracer agent report fail #110632

Closed
maxnilz opened this issue Sep 14, 2023 · 3 comments · Fixed by #111342
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

maxnilz commented Sep 14, 2023

Describe the problem

Hi, I was trying to set up a local Jaeger tracer for my local single-node instance. During this process, I got an error saying "data does not fit within one UDP packet", raised inside the Jaeger exporter.

To Reproduce

  1. Build CockroachDB from source (commit 0cde11b): ./dev doctor && ./dev build
  2. Start CockroachDB in single-node mode: ./_bazel/bin/pkg/cmd/cockroach/cockroach_/cockroach start-single-node --insecure --listen-addr=localhost:36257 --sql-addr=localhost:26257
  3. Start Jaeger via docker-compose:
  # docker-compose.yaml
  services:
    jaeger:
      image: jaegertracing/all-in-one
      container_name: jaeger
      ports:
        - "16685:16685"
        - "16686:16686"
        - "14250:14250"
        - "14268:14268"
        - "14269:14269"
        - "6831:6831/udp"   # Jaeger agent, Thrift compact over UDP
      environment:
        - COLLECTOR_ZIPKIN_HTTP_PORT=9411
        - COLLECTOR_OTLP_ENABLED=true
  4. Point CockroachDB at the Jaeger agent: SET CLUSTER SETTING trace.jaeger.agent='localhost:6831';
  5. Check CockroachDB's stderr log with tail -f ./cockroach-data/logs/cockroach-stderr.log; it contains the following output:
I230914 08:32:30.841918 1 util/log/file_sync_buffer.go:238 ⋮ [config]   line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid [chan@]file:line redactionmark \[tags\] [counter] msg
I230914 08:32:30.841647 1 util/log/flags.go:213  [-] 1  stderr capture started
2023/09/14 16:32:32 data does not fit within one UDP packet; size 65018, max 65000, spans 41
2023/09/14 16:32:33 data does not fit within one UDP packet; size 65005, max 65000, spans 45
2023/09/14 16:32:34 multiple errors during transform: data does not fit within one UDP packet; size 65009, max 65000, spans 44, data does not fit within one UDP packet; size 65012, max 65000, spans 55
2023/09/14 16:32:35 data does not fit within one UDP packet; size 65020, max 65000, spans 58
2023/09/14 16:32:45 data does not fit within one UDP packet; size 65018, max 65000, spans 71
2023/09/14 16:33:00 data does not fit within one UDP packet; size 65008, max 65000, spans 83
2023/09/14 16:33:32 data does not fit within one UDP packet; size 65006, max 65000, spans 46
2023/09/14 16:33:33 data does not fit within one UDP packet; size 65003, max 65000, spans 53
2023/09/14 16:36:07 data does not fit within one UDP packet; size 65015, max 65000, spans 52
2023/09/14 16:38:08 data does not fit within one UDP packet; size 65001, max 65000, spans 74
2023/09/14 16:40:11 data does not fit within one UDP packet; size 65002, max 65000, spans 78


After a little digging, I found a bug in the OpenTelemetry Jaeger exporter, go.opentelemetry.io/otel/exporters/jaeger@v1.0.0-RC3: its maxPacketSize check fails because it does not take into account the emitBatchOverhead, i.e. the extra bytes added by the emitBatch envelope around the spans.
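The log supports this reading: every reported size is only 1 to 20 bytes over the 65000-byte limit, which looks like an envelope-size miscalculation rather than individual oversized spans (jaeger-client-go reserves 70 bytes for emitBatchOverhead; assuming the Go exporter uses a similar value, these overruns are well inside that margin). A minimal stdlib-only Go sketch for extracting the overruns from the stderr log; `overruns` is a hypothetical helper, and the regular expression assumes the exact error wording shown above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// overrunRe matches the exporter's error text, e.g.
// "data does not fit within one UDP packet; size 65018, max 65000, spans 41".
// Assumption: the wording matches the log lines in this report exactly.
var overrunRe = regexp.MustCompile(`size (\d+), max (\d+)`)

// overruns returns, for every error found in line, how many bytes the
// serialized batch exceeded the packet limit by. A single line can carry
// several errors (see the "multiple errors during transform" entry).
func overruns(line string) []int {
	var out []int
	for _, m := range overrunRe.FindAllStringSubmatch(line, -1) {
		// \d+ guarantees digits, so the conversion errors can be ignored here.
		size, _ := strconv.Atoi(m[1])
		limit, _ := strconv.Atoi(m[2])
		out = append(out, size-limit)
	}
	return out
}

func main() {
	// A few of the stderr lines from the report.
	logLines := []string{
		"2023/09/14 16:32:32 data does not fit within one UDP packet; size 65018, max 65000, spans 41",
		"2023/09/14 16:32:34 multiple errors during transform: data does not fit within one UDP packet; size 65009, max 65000, spans 44, data does not fit within one UDP packet; size 65012, max 65000, spans 55",
		"2023/09/14 16:32:35 data does not fit within one UDP packet; size 65020, max 65000, spans 58",
	}
	worst := 0
	for _, l := range logLines {
		for _, o := range overruns(l) {
			if o > worst {
				worst = o
			}
		}
	}
	// prints "largest overrun: 20 bytes over the limit"
	fmt.Printf("largest overrun: %d bytes over the limit\n", worst)
}
```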

It turns out the Jaeger exporter has already been removed from OpenTelemetry.

However, since Jaeger now supports OpenTelemetry's OTLP protocol natively, the alternative recommended by both Jaeger and OpenTelemetry is one of the OTLP exporters:

  • go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  • or go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc

I then switched to OTLP, and it works:

  1. Start an OpenTelemetry Collector alongside Jaeger with the following config:
  # docker-compose.yaml
  services:
    otel-collector:
      image: otel/opentelemetry-collector-contrib
      container_name: otel-collector
      volumes:
        - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
      ports:
        - "1888:1888" # pprof extension
        - "8888:8888" # Prometheus metrics exposed by the collector
        - "8889:8889" # Prometheus exporter metrics
        - "13133:13133" # health_check extension
        - "4317:4317" # OTLP gRPC receiver
        - "4318:4318" # OTLP http receiver
        - "55679:55679" # zpages extension

    jaeger:
      image: jaegertracing/all-in-one
      container_name: jaeger
      ports:
        - "16685:16685"
        - "16686:16686"
        - "14250:14250"
        - "14268:14268"
        - "14269:14269"
        - "6831:6831/udp"
      environment:
        - COLLECTOR_ZIPKIN_HTTP_PORT=9411
        - COLLECTOR_OTLP_ENABLED=true
# otel-collector-config.yaml
receivers:
  otlp: # the OTLP receiver the app is sending traces to
    protocols:
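      grpc:
        endpoint: 0.0.0.0:4317 # listen on all interfaces so the host-side cockroach process can reach the container
      http:
        endpoint: 0.0.0.0:4318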

processors:
  batch:

exporters:
  otlp/jaeger: # Jaeger supports OTLP directly
    endpoint: http://jaeger:4317
    tls: 
      insecure: true

service:
  pipelines:
    traces/dev:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
  2. Unset the Jaeger agent: SET CLUSTER SETTING trace.jaeger.agent='';
  3. Point CockroachDB at the OTLP collector: SET CLUSTER SETTING trace.opentelemetry.collector='localhost:4317';

I think the fix is either to remove the Jaeger integration entirely, or, for backward compatibility, to fork the OpenTelemetry Jaeger exporter (already end-of-life) and fix the bug there.
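For the fork-and-fix option, the core of the fix would be to reserve the envelope overhead when deciding how many spans go into one datagram. The sketch below is a hypothetical illustration of that technique, not the exporter's actual code: `packSpans` is an invented helper, the 65000-byte limit is taken from the log messages above, 70 bytes is the emitBatchOverhead value used by jaeger-client-go, and it assumes the datagram size is simply the sum of span sizes, process block and envelope (the real Thrift encoding is only approximately additive).

```go
package main

import "fmt"

// Assumed constants for illustration; neither is read from the real exporter.
const (
	maxPacketSize     = 65000 // limit reported in the error messages
	emitBatchOverhead = 70    // envelope bytes, as in jaeger-client-go
)

// packSpans greedily groups serialized span sizes (in bytes) into batches
// whose total, plus the process block and the emitBatch envelope, stays
// within maxPacketSize. A span that cannot fit even on its own is returned
// in dropped instead of producing an oversized datagram.
func packSpans(spanSizes []int, processSize int) (batches [][]int, dropped []int) {
	budget := maxPacketSize - emitBatchOverhead - processSize
	var cur []int
	used := 0
	for _, s := range spanSizes {
		if s > budget {
			dropped = append(dropped, s)
			continue
		}
		if used+s > budget {
			// Current batch is full: flush it and start a new one.
			batches = append(batches, cur)
			cur, used = nil, 0
		}
		cur = append(cur, s)
		used += s
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, dropped
}

func main() {
	// Hypothetical span sizes, in bytes.
	spans := []int{30000, 30000, 4950, 10, 70000}
	batches, dropped := packSpans(spans, 0)
	fmt.Println(batches, dropped) // prints [[30000 30000] [4950 10]] [70000]
}
```

With the overhead ignored, the first three spans (64950 bytes) would pass a naive `<= 65000` check, yet once the 70-byte envelope is added the datagram would be 65020 bytes, the same kind of near miss seen in the log.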

Jira issue: CRDB-31535
Epic: CRDB-28893

maxnilz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 14, 2023
blathers-crl bot commented Sep 14, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @dhartunian (found keywords: Prometheus,metrics)
  • @cockroachdb/cdc (found keywords: export)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area, and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot added A-cdc Change Data Capture O-community Originated from the community X-blathers-triaged blathers was able to find an owner T-cdc labels Sep 14, 2023
blathers-crl bot commented Sep 14, 2023

cc @cockroachdb/cdc

jayshrivastava added A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team and removed A-cdc Change Data Capture T-cdc labels Sep 14, 2023
kvoli added T-observability-inf and removed A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Sep 14, 2023
kvoli (Collaborator) commented Sep 14, 2023

Thanks for the detailed write-up @maxnilz! I'm going to tag our observability infrastructure team on this for triage cc @cockroachdb/obs-inf-prs.
