[processor/tailsamplingprocessor] Improve accuracy of count_traces_sampled metric #26724
Merged: jpkrohling merged 5 commits into open-telemetry:main from jmsnll:fix/tailsampling-metric-incorrect-count on Sep 27, 2023
…h policy decision - previously metric counts for each policy were based on the final decision and not on the outcome of each specific policy
- captures the global count of traces sampled by at least one policy - trying to capture the count in the existing `statCountTracesSampled` would have resulted in either an empty policy key or have a placeholder value (causing potential collision with user named policies from the configuration)
jpkrohling approved these changes on Sep 20, 2023
This broke main
@jmsnll @jpkrohling would you be able to do a quick fix? Or should I revert the PR and we can address this more calmly?
Closed
@mx-psi why
Actually a fix has been submitted at #27246
Indeed :) that should fix it, thanks for taking a look in any case!
jmsnll added a commit to jmsnll/opentelemetry-collector-contrib that referenced this pull request on Nov 12, 2023

…ampled` metric (open-telemetry#26724)

**Description:**
Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same.

```
count_traces_sampled{policy="policy-a", sampled="false"} 50
count_traces_sampled{policy="policy-a", sampled="true"} 50
count_traces_sampled{policy="policy-b", sampled="false"} 50
count_traces_sampled{policy="policy-b", sampled="true"} 50
```

This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling.

With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces.

```
count_traces_sampled{policy="policy-a", sampled="false"} 50
count_traces_sampled{policy="policy-a", sampled="true"} 50
count_traces_sampled{policy="policy-b", sampled="false"} 24
count_traces_sampled{policy="policy-b", sampled="true"} 76
global_count_traces_sampled{sampled="false"} 24
global_count_traces_sampled{sampled="true"} 76
```

Reusing the `count_traces_sampled` policy metric for this purpose would have either meant:
1. Leaving the policy field empty, which didn't feel very explicit.
2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation.

**Link to tracking Issue:** Fixes open-telemetry#25882

**Testing:**
Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies).

No unit tests were added, open to suggestions for adding tests for metrics but I couldn't find any examples of it being done elsewhere.

**Documentation:** No documentation changes.
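The difference between the two counting strategies described in this commit message can be sketched in Go. This is an illustration only, not the processor's code: the `policy` and `counts` types and both function names are hypothetical, and the final decision is simplified to "sampled if at least one policy samples the trace". The two toy policies are chosen so that per-policy counting reproduces the second set of figures quoted above.

```go
package main

import "fmt"

// counts holds how many traces were sampled and not sampled.
type counts struct{ sampled, notSampled int }

func (c *counts) record(sampled bool) {
	if sampled {
		c.sampled++
	} else {
		c.notSampled++
	}
}

// policy is a hypothetical stand-in for a tail-sampling policy:
// a name plus a function deciding whether a trace is sampled.
type policy struct {
	name     string
	evaluate func(traceID int) bool
}

// countByFinalDecision mirrors the old behaviour: every policy's counter
// is incremented with the final decision for the trace.
func countByFinalDecision(policies []policy, traces int) map[string]*counts {
	perPolicy := map[string]*counts{}
	for _, p := range policies {
		perPolicy[p.name] = &counts{}
	}
	for id := 0; id < traces; id++ {
		final := false
		for _, p := range policies {
			final = final || p.evaluate(id)
		}
		for _, p := range policies {
			perPolicy[p.name].record(final)
		}
	}
	return perPolicy
}

// countByPolicyDecision mirrors the fixed behaviour: each policy's counter
// reflects that policy's own decision, and a separate global counter
// tracks traces sampled by at least one policy.
func countByPolicyDecision(policies []policy, traces int) (map[string]*counts, counts) {
	perPolicy := map[string]*counts{}
	for _, p := range policies {
		perPolicy[p.name] = &counts{}
	}
	var global counts
	for id := 0; id < traces; id++ {
		final := false
		for _, p := range policies {
			decision := p.evaluate(id)
			perPolicy[p.name].record(decision)
			final = final || decision
		}
		global.record(final)
	}
	return perPolicy, global
}

// examplePolicies returns two deterministic toy policies (an assumption
// made for this sketch): policy-a samples 50 of 100 traces, policy-b 76.
func examplePolicies() []policy {
	return []policy{
		{name: "policy-a", evaluate: func(id int) bool { return id < 50 }},
		{name: "policy-b", evaluate: func(id int) bool { return id < 76 }},
	}
}

func main() {
	perPolicy, global := countByPolicyDecision(examplePolicies(), 100)
	for _, name := range []string{"policy-a", "policy-b"} {
		c := perPolicy[name]
		fmt.Printf("count_traces_sampled{policy=%q, sampled=\"false\"} %d\n", name, c.notSampled)
		fmt.Printf("count_traces_sampled{policy=%q, sampled=\"true\"} %d\n", name, c.sampled)
	}
	fmt.Printf("global_count_traces_sampled{sampled=\"false\"} %d\n", global.notSampled)
	fmt.Printf("global_count_traces_sampled{sampled=\"true\"} %d\n", global.sampled)
}
```

With `countByFinalDecision`, both policies end up with identical counters because they all mirror the final decision, which is the symptom the fix addresses.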
Description:
Previously, the calculation of `count_traces_sampled` was incorrect, resulting in all per-policy metric counts being the same.

This issue stemmed from the metrics being incremented based on the final decision of trace sampling, rather than being attributed to the policies causing the sampling.

With the fix implemented, the total sampling count can no longer be inferred from the metrics alone. To address this, a new metric, `global_count_traces_sampled`, was introduced to accurately capture the global count of sampled traces.

Reusing the `count_traces_sampled` policy metric for this purpose would have either meant:
1. Leaving the policy field empty, which didn't feel very explicit.
2. Setting a global placeholder policy name (such as `final-decision`), which could then potentially collide with existing user-configured policy names without validation.

Link to tracking Issue: Fixes #25882
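The collision risk behind the placeholder-name option can be illustrated with a small Go sketch. The `registry` type and its key format are assumptions made for this example and do not reflect how the collector stores metrics; the point is only that a series identified by metric name plus `policy` label cannot tell a user policy named `final-decision` apart from a global placeholder of the same name.

```go
package main

import "fmt"

// registry is a hypothetical metric store keyed by metric name and an
// optional policy label.
type registry map[string]int

// add increments the series identified by metric and policy.
func (r registry) add(metric, policy string, n int) {
	key := metric
	if policy != "" {
		key = fmt.Sprintf("%s{policy=%q}", metric, policy)
	}
	r[key] += n
}

func main() {
	// Placeholder approach: a user policy that happens to be called
	// "final-decision" and the global total land in the same series.
	shared := registry{}
	shared.add("count_traces_sampled", "final-decision", 10) // user policy
	shared.add("count_traces_sampled", "final-decision", 76) // global total
	fmt.Println("series with placeholder label:", len(shared))

	// Separate metric: the global total gets its own series.
	separate := registry{}
	separate.add("count_traces_sampled", "final-decision", 10)
	separate.add("global_count_traces_sampled", "", 76)
	fmt.Println("series with separate metric:", len(separate))
}
```

Using a distinct metric name keeps the two values apart without having to validate user-configured policy names.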
Testing:
Tested with various combinations of `probabilistic`, `latency`, `status_code` and attribute policies (including combinations of `and` and `composite` policies).

No unit tests were added; I'm open to suggestions for adding tests for metrics, but I couldn't find any examples of it being done elsewhere.
Documentation: No documentation changes.