Logs do not pass through probabilistic sampling processor #36119

Open
aduncan314 opened this issue Oct 31, 2024 · 6 comments
Labels
bug (Something isn't working), processor/probabilisticsampler (Probabilistic Sampler processor)

Comments

@aduncan314

Component(s)

processor/probabilisticsampler

What happened?

Description

No matter how much I simplify the configuration, I cannot get any logs to pass through the probabilistic sampler, even when sampling_percentage is set to 100.

Even if I'm making a mistake configuring the attributes, I would expect a percentage of 100 to pass every log.

Steps to Reproduce

Run the otel docker image (I tried with latest, 0.103.1, and 0.102.1) using this script with the simplified config shown below

#!/bin/zsh

CONFIG_PATH=./config.yaml

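# Abort early if the config file is missing; otherwise docker would mount a nonexistent path.
if [[ ! -f $CONFIG_PATH ]]; then
  echo "No file found at $CONFIG_PATH" >&2
  exit 1
fi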

docker run \
  -v $CONFIG_PATH:/etc/otelcol-contrib/config.yaml \
  -p 127.0.0.1:4317:4317 \
  -p 127.0.0.1:55679:55679 \
  otel/opentelemetry-collector-contrib:latest \
  2>&1 | tee collector-output.txt

and send logs or traces using telemetrygen, e.g.

telemetrygen logs --otlp-insecure --logs 10

Our real config is obviously more complex, but I kept cutting it down until it was a minimal config in order to test this. I tried setting different values for from_attribute with no change in behavior.

Expected Result

Some logs should be displayed by the debug exporter depending on the sampling_percentage and attribute_source/from_attribute. When sampling_percentage is set to 100, I expect all logs to pass through even if the sampling attribute is constant across all logs.

Actual Result

Sampling works as expected when sending traces with telemetrygen and logs are displayed when the sampler is not in the pipeline.

When the sampler is in the logs pipeline, 0 logs are displayed, even when sampling_percentage is set to 100.

Collector version

0.103.1, latest (0.112.0, I think), and 0.102.1

Environment information

Environment

OS: Mac

running docker images using Rancher.

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  probabilistic_sampler:
    sampling_percentage: 100
#    hash_seed: 22
#    attribute_source: record
#    from_attribute: "Timestamp"

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [debug]

Log output

2024-10-31T18:22:40.765Z	info	service@v0.112.0/service.go:135	Setting up own telemetry...
2024-10-31T18:22:40.765Z	info	telemetry/metrics.go:70	Serving metrics	{"address": "localhost:8888", "metrics level": "Normal"}
2024-10-31T18:22:40.765Z	info	builders/builders.go:26	Development component. May change in the future.	{"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-10-31T18:22:40.765Z	info	builders/builders.go:26	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-31T18:22:40.766Z	info	service@v0.112.0/service.go:207	Starting otelcol-contrib...	{"Version": "0.112.0", "NumCPU": 2}
2024-10-31T18:22:40.766Z	info	extensions/extensions.go:39	Starting extensions...
2024-10-31T18:22:40.766Z	warn	internal@v0.112.0/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.	{"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-10-31T18:22:40.766Z	info	otlpreceiver@v0.112.0/otlp.go:112	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2024-10-31T18:22:40.766Z	info	service@v0.112.0/service.go:230	Everything is ready. Begin running and processing data.

Additional context

No response

aduncan314 added the bug and needs triage labels on Oct 31, 2024
github-actions bot added the processor/probabilisticsampler label on Oct 31, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@aduncan314
Author

Logs pass through fine when the processor is not in the pipeline.

The docker networking is very unlikely to be the issue and Servbay does not cover our needs, but thank you for the ideas.

@atoulme
Contributor

atoulme commented Nov 9, 2024

We have a test that checks this very scenario. I am a bit at a loss to help here. Someone needs to spend some time reproducing.

@atoulme
Contributor

atoulme commented Nov 9, 2024

Here is what is happening: none of the logs you send have a traceID, so they get discarded.
That's because by default the probabilisticsampler is using the traceID field.

It works a bit differently if you try this:

telemetrygen logs --logs 10 --otlp-insecure --telemetry-attributes logID=\"2\"
telemetrygen logs --logs 10 --otlp-insecure --telemetry-attributes logID=\"abcdefrg\"

And you run with this config:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  probabilistic_sampler:
    sampling_percentage: 50
    attribute_source: record # possible values: one of record or traceID
    from_attribute: logID # value is required if the source is not traceID

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [debug]

Then it works as advertised.
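The two commands above each send ten logs that share a single logID value. Assuming the sampler hashes that attribute deterministically, each batch would be kept or dropped as a whole. A minimal sketch for sending one log per distinct logID follows; the helper name `send_logs_with_ids` and the `RUN` switch are hypothetical, and only telemetrygen flags that already appear in this thread are used.

```shell
#!/bin/sh
# Hypothetical helper: one telemetrygen call per distinct logID value.
# By default it only prints the commands (dry run); set RUN=1 to execute them.
send_logs_with_ids() {
  for id in "$@"; do
    if [ "${RUN:-0}" = "1" ]; then
      # Passes logID="<id>" (with literal quotes), as in the commands above.
      telemetrygen logs --logs 1 --otlp-insecure --telemetry-attributes "logID=\"$id\""
    else
      printf 'telemetrygen logs --logs 1 --otlp-insecure --telemetry-attributes logID=\\"%s\\"\n' "$id"
    fi
  done
}
```

For example, `RUN=1 send_logs_with_ids 1 2 3 4 5 6 7 8` would send eight logs with different logID values, so with sampling_percentage: 50 roughly half of them should reach the debug exporter (the exact split depends on the hash).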

atoulme removed the needs triage label on Nov 9, 2024
@aduncan314
Author

@atoulme, thank you.

Here is what is happening: none of the logs you send have a traceID, so they get discarded.
That's because by default the probabilisticsampler is using the traceID field.

This makes sense and I was also trying to use another attribute. I'm sure I was doing something wrong there and I'll figure it out, but I was confused since I couldn't confirm that the processor was working at all.

I expected logs to come through when the percentage is set to 100, but I suppose that could be my misunderstanding of how an empty traceID is handled. I assumed the processor would still handle it by hashing an empty string or something like that. It sounds like the record is actually discarded at the very beginning.
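The behavior described in this thread can be modeled with a toy shell function. This is a sketch under stated assumptions, not the processor's real algorithm: `decide` is a hypothetical name, `cksum` stands in for the processor's own hash, and the `missing_randomness` label mirrors the policy label seen in the Collector's internal metrics.

```shell
#!/bin/sh
# Toy model only: decide <sampling_percentage> <randomness-source-value>
decide() {
  pct=$1
  src=$2
  if [ -z "$src" ]; then
    # No traceID (or configured attribute) means there is nothing to hash,
    # so the record is rejected before the percentage is consulted.
    echo "drop policy=missing_randomness"
    return 0
  fi
  # Stand-in hash: map the CRC of the value to a bucket in 0..99.
  bucket=$(( $(printf '%s' "$src" | cksum | cut -d ' ' -f 1) % 100 ))
  if [ "$bucket" -lt "$pct" ]; then
    echo "keep bucket=$bucket"
  else
    echo "drop bucket=$bucket"
  fi
}
```

In this model, `decide 100 ""` drops the record even at 100 percent, while `decide 100 abc` always keeps it, which matches the symptom reported in the issue.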

@jpkrohling
Member

I think the processor currently has a UX problem. With a similar configuration, I see the following metric in the Collector's internal metrics:

otelcol_processor_probabilistic_sampler_count_logs_sampled{policy="missing_randomness",sampled="false",service_instance_id="d671077a-484f-4c2d-9e07-cb0e60c24a1b",service_name="otelcol-contrib",service_version="0.113.0"} 12285

You can verify this is the case by using a hash of the body as the "from_attribute":

processors:
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 10
    attribute_source: record
    from_attribute: body.hash
  transform:
    log_statements:
      - context: log
        statements:
          - set(attributes["body.hash"], FNV(body))

Note, however, that this particular example is bad practice: the idea of the probabilistic sampler is to store samples of every event related to a specific business transaction, so that you get a good representation of what your systems are doing in production. The example above will semi-randomly discard logs based on the whole record.
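To check the sampler counter mentioned above on a running Collector, the metric line can be reduced to its policy, outcome, and count. A minimal sketch: the function name `sampler_counts` is hypothetical, and it only assumes the Prometheus text format of the metric line quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text exposition on stdin and print
# "<policy> <sampled> <count>" for each logs-sampled counter line.
sampler_counts() {
  grep '^otelcol_processor_probabilistic_sampler_count_logs_sampled' |
    sed -E 's/.*policy="([^"]*)".*sampled="([^"]*)".*[}] *([^ ]+)$/\1 \2 \3/'
}
```

Usage, assuming the internal metrics endpoint from the startup log (localhost:8888) is reachable from where you run it: `curl -s http://localhost:8888/metrics | sampler_counts`. With the docker script from the issue, port 8888 is not published, so this would have to run inside the container or with the port exposed. A line such as `missing_randomness false 12285` indicates records dropped for lack of a traceID or sampling attribute.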
