[processor/filter] Add metrics for dropped telemetry #13169
Comments
Pinging code owners: @boostchicken @pmm-sumo. See Adding Labels via Comments if you do not have permissions to add labels yourself.
We are moving towards using the OpenTelemetry SDK for reporting Collector metrics. I'm not sure we want to add more metrics reported by OpenCensus at this point. Maybe we can wait for adoption of the OTel SDK before tackling this issue, but it's not clear how long that will take.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Bumping, can this be reconsidered?
@dmitryax In lieu of this, I am curious if you have any ideas how we can accurately determine success rate. We calculate this by doing after we add the filter processor we need the following but it does not seem possible
I think adding some metrics for the processor when it drops a span is a good idea. We (Honeycomb) include similar metrics for Refinery and they are very useful.
I'm interested in working on this 🙂
Thanks @MacroPower! Assigned
…#29081)

**Description:**
Adds telemetry for metrics, logs, and spans that were intentionally dropped via a `filterprocessor`. Specifically, the following metrics are added:

- `otelcol_processor_filter_datapoints_filtered`
- `otelcol_processor_filter_logs_filtered`
- `otelcol_processor_filter_spans_filtered`

Please let me know any feedback/thoughts on the naming or anything else!

**Link to tracking Issue:** #13169

**Testing:**
I've used batchprocessor as an example for a couple of tests, Filter*ProcessorTelemetryWithOC. I kept the wrapping code so that OTEL versions can be easily added when that is ready in contrib.

Overall the tests are not super comprehensive and I could improve them if needed, but as-is they were helpful for debugging.

<details>
<summary><i>Additionally, here's some stuff you can use for manually testing.</i></summary>

There might be a better way to do this, but I just used hostmetrics, filelog, and [this article from honeycomb](https://www.honeycomb.io/blog/test-span-opentelemetry-collector) with otlp/http.

Note, this should be run from the root of the contrib repo.

Add/overwrite `local/config.yaml`, `local/span.json`, and run:

```bash
mkdir -p local

cat >local/config.yaml <<EOL
receivers:
  hostmetrics:
    collection_interval: 30s
    initial_delay: 1s
    scrapers:
      load:
  filelog:
    include:
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"foo"}' >> /tmp/otel-test.log
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"bar"}' >> /tmp/otel-test.log
      ## echo '{"timestamp":"2023-12-18 12:00:00","msg":"baz"}' >> /tmp/otel-test.log
      - /tmp/otel-test.log
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: "%Y-%m-%d %H:%M:%S"
  otlp:
    protocols:
      ## curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @local/span.json
      http:

processors:
  filter/test:
    metrics:
      metric:
        # Should drop 2 of the 3 metrics, 5m average remains
        - 'name=="system.cpu.load_average.1m"'
        - 'name=="system.cpu.load_average.15m"'
    logs:
      log_record:
        # Should filter out "bar" and "baz"
        - 'IsMatch(body, ".*ba.*")'
    traces:
      span:
        # Should drop 1 of the 2 spans
        - 'name == "foobar"'

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  extensions: []
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [filter/test]
      exporters: [debug]
    logs:
      receivers: [filelog]
      processors: [filter/test]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [filter/test]
      exporters: [debug]
  telemetry:
    logs:
      level: debug
    metrics:
      level: detailed
      address: 0.0.0.0:8888
EOL

cat >local/span.json <<EOL
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": { "stringValue": "test-with-curl" }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": { "name": "manual-test" },
          "spans": [
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "3c191d03fa8be065",
              "name": "spanitron",
              "kind": 2,
              "droppedAttributesCount": 0,
              "events": [],
              "droppedEventsCount": 0,
              "status": { "code": 1 }
            },
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "2f357b34d32f77b4",
              "name": "foobar",
              "kind": 2,
              "droppedAttributesCount": 0,
              "events": [],
              "droppedEventsCount": 0,
              "status": { "code": 1 }
            }
          ]
        }
      ]
    }
  ]
}
EOL

make run
```

Send some data to the receivers:

```bash
# Write some logs
echo '{"timestamp":"2023-12-18 12:00:00","msg":"foo"}' >> /tmp/otel-test.log
echo '{"timestamp":"2023-12-18 12:00:00","msg":"bar"}' >> /tmp/otel-test.log
echo '{"timestamp":"2023-12-18 12:00:00","msg":"baz"}' >> /tmp/otel-test.log

# Write some spans
curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @local/span.json
```

Check the results:

```console
$ curl http://localhost:8888/metrics | grep filtered
# HELP otelcol_processor_filter_datapoints_filtered Number of metric data points dropped by the filter processor
# TYPE otelcol_processor_filter_datapoints_filtered counter
otelcol_processor_filter_datapoints_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 2
# HELP otelcol_processor_filter_logs_filtered Number of logs dropped by the filter processor
# TYPE otelcol_processor_filter_logs_filtered counter
otelcol_processor_filter_logs_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 2
# HELP otelcol_processor_filter_spans_filtered Number of spans dropped by the filter processor
# TYPE otelcol_processor_filter_spans_filtered counter
otelcol_processor_filter_spans_filtered{filter="filter/test",service_instance_id="a99d9078-548b-425f-8466-3e9e2e9bf3b1",service_name="otelcontribcol",service_version="0.91.0-dev"} 1
```

</details>

**Documentation:**
I do not believe we document telemetry exposed by components, but I could add this if needed.

---------

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
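
The manual `curl ... | grep filtered` check from the commit message can also be scripted. Below is a minimal Python sketch, not part of the PR, that sums the `*_filtered` counters per filter instance from Prometheus text output. The helper name `filtered_counts` and the trimmed `SAMPLE` text are illustrative assumptions; the metric names and the `filter` label are taken from the output shown in the commit message.

```python
import re

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
# One label pair inside the braces, e.g. filter="filter/test".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Trimmed copy of the output in the commit message (labels shortened for brevity).
SAMPLE = """\
# TYPE otelcol_processor_filter_datapoints_filtered counter
otelcol_processor_filter_datapoints_filtered{filter="filter/test",service_name="otelcontribcol"} 2
# TYPE otelcol_processor_filter_logs_filtered counter
otelcol_processor_filter_logs_filtered{filter="filter/test",service_name="otelcontribcol"} 2
# TYPE otelcol_processor_filter_spans_filtered counter
otelcol_processor_filter_spans_filtered{filter="filter/test",service_name="otelcontribcol"} 1
"""


def filtered_counts(metrics_text):
    """Sum every *_filtered counter, keyed by (metric name, filter label)."""
    totals = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").endswith("_filtered"):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        key = (match.group("name"), labels.get("filter", ""))
        totals[key] = totals.get(key, 0.0) + float(match.group("value"))
    return totals


if __name__ == "__main__":
    for (name, filter_id), count in sorted(filtered_counts(SAMPLE).items()):
        print(f"{filter_id} {name} {count:g}")
```

To run it against a live Collector, the text of `http://localhost:8888/metrics` (the address configured in the manual test) would be passed in place of `SAMPLE`.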
Is your feature request related to a problem? Please describe.
We have a dashboard that shows the success rate of data flowing through a collector pipeline. We use four metrics to determine success rate:
We have a filter processor configured for telemetry we don't care about. When telemetry gets dropped from a filter processor, we have no metrics to identify that they were dropped. Since they were dropped, our success rate calculation is lower than expected.
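
To illustrate the arithmetic behind this problem, here is a minimal Python sketch. The four metrics used on the dashboard are not listed in this issue, so the inputs `received`, `exported`, and `filtered` are hypothetical counts chosen for illustration, not the actual Collector metric names.

```python
def success_rate(received, exported, filtered=0):
    """Fraction of telemetry expected at the exporter that actually got there.

    `filtered` is the count intentionally dropped by a filter processor;
    subtracting it keeps intentional drops from looking like failures.
    All three inputs are hypothetical counts (assumption, not Collector API).
    """
    expected = received - filtered
    if expected <= 0:
        return 1.0  # nothing was expected to be exported, so nothing failed
    return exported / expected


# 100 items received, 20 intentionally filtered, the remaining 80 exported.
naive = success_rate(received=100, exported=80)                   # ignores the filter
adjusted = success_rate(received=100, exported=80, filtered=20)   # accounts for it
print(naive, adjusted)
```

Without a count of filtered items, only the first calculation is possible, which is why the reported success rate drops once a filter processor is added.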
Describe the solution you'd like
As a developer, it would be nice to know how many traces/spans/logs are getting dropped by a filter processor.
Describe alternatives you've considered
N/A
Additional context
N/A