Warning
OTEPs have been moved to the Specification repository. This repository has been preserved for reference purposes. Please otherwise refer to the Specification.
Use the W3C trace context to convey consistent parent sampling probability.
The parent sampling probability is the probability associated with
the start of a trace context that was used to determine whether the
W3C sampled
flag is set, which determines whether child contexts
will be sampled by a ParentBased
Sampler. It is useful to know the
parent sampling probability associated with a context in order to
build span-to-metrics pipelines when the built-in ParentBased
Sampler is used. Further motivation for supporting span-to-metrics
pipelines is presented in OTEP
170.
A consistent trace sampling decision is one that can be carried out at
any node in a trace, which supports collecting partial traces.
OpenTelemetry specifies a built-in TraceIDRatioBased
Sampler that
aims to accomplish this goal but was left incomplete (see a
TODO
in the v1.0 Trace specification).
We propose a Sampler option to propagate the necessary information
alongside the W3C sampled flag
using tracestate
with an ot
vendor tag, which will require
(separately) specifying how the OpenTelemetry project uses
tracestate
itself.
Two pieces of information are needed to convey consistent parent sampling probability:
- p-value representing the parent sampling probability.
- r-value representing the "randomness" as the source of consistent sampling decisions.
This proposal uses 6 bits of information to propagate each of these and does not depend on built-in TraceID randomness, which is not sufficiently specified for probability sampling at this time. This proposal closely follows research by Otmar Ertl.
The concept of adjusted count is introduced in OTEP 170. Briefly, adjusted count is defined in terms of the sampling probability, where:
Sampling probability | Adjusted count | Notes |
---|---|---|
probability != 0 |
adjusted_count = 1/probability |
For spans selected with non-zero probability, adjusted count is the inverse of their sampling probability. |
probability == 0 |
adjusted_count = 0 |
For spans that were not selected by a probability sampler, adjusted count is zero. |
The term is used to convey the representivity of an item that was (or was not) selected by a probability sampler. Items that are not selected by a probability sampler are logically assigned zero adjusted count, such that if they are recorded for any other reason they do not introduce bias in the estimated count of the total span population.
To limit the cost of this extension and for statistical reasons documented below, we propose to limit parent sampling probability to powers of two. This limits the available parent sampling probabilities to 1/2, 1/4, 1/8, and so on. We can compactly encode these probabilities as small integer values using the base-2 logarithm of the adjusted count.
Using six bits of information we can convey known sampling rates as small as 2**-62. The value 63 is reserved to mean sampling with probability 0, which conveys an adjusted count of 0 for the associated context.
When propagated, the "p-value" as it is known will be interpreted as shown in the following table. The p-value for known sampling probabilities is the negative base-2 logarithm of the probability:
p-value | Parent Probability |
---|---|
0 | 1 |
1 | 1/2 |
2 | 1/4 |
... | ... |
N | 2**-N |
... | ... |
61 | 2**-61 |
62 | 2**-62 |
63 | 0 |
As specified in OTEP 170 for the Trace data
model,
parent sampling probability can be stored in exported Span data to
enable span-to-metrics pipelines to be built. Because tracestate
is
already encoded in the OpenTelemetry Span, this proposal is requires
no changes to the Span protocol. Accepting this proposal means the
p-value can be derived from tracestate
when the parent sampling
probability is known.
An unknown value for p
cannot be propagated using tracestate
explicitly, simply omitting p
conveys an unknown parent sampling
probability.
With parent sampling probabilities limited to powers of two, the amount of randomness needed per trace context is limited. A consistent sampling decision is accomplished by propagating a specific random variable known as the r-value.
To develop an intuition for r-values, consider a scenario where every
bit of the TraceID
is generated by a uniform random bit generator
(i.e., every bit is 0 or 1 with equal probability). An 128-bit
TraceID
can therefore be treated as a 128-bit unsigned integer,
which can be mapped into a fraction with range [0, 1) by dividing by
2**128, a form known as the TraceID-ratio. Now, probability sampling
could be achieved by comparing the TraceID-ratio with the sampling
probability, setting the sampled
flag when TraceID-ratio is less
than the sampling probability.
It is easy to see that with sampling probability 1, all TraceIDs will
be accepted because TraceID ratios are exclusively less than 1.
Sampling with probability 50% will select TraceID ratios less than
0.5, which maps to all TraceIDs less than 2**127 or, equivalently, all
TraceIDs where the most significant bit is zero. By the same logic,
sampling with probability 25% means accepting TraceIDs where the most
significant two bits are zero. In general, with exact probability
2**-S
is equivalent to selecting TraceIDs with S
leading zeros in
this example scenario.
The r-value specified here directly describes the number of leading zeros in a random 62-bit string, specified in a way that does not require TraceID values to be constructed with random bits in specific positions or with hard requirements on their uniformity. In mathematical terms, the r-value is described by a truncated geometric distribution, listed below:
r value |
Probability of r value |
Implied sampling probabilities |
---|---|---|
0 | 1/2 | 1 |
1 | 1/4 | 1/2 and above |
2 | 1/8 | 1/4 and above |
3 | 1/16 | 1/8 and above |
... | ... | ... |
0 <= r <= 61 | 1/(2**(-r-1)) | 2**(-r) and above |
... | ... | ... |
59 | 2**-60 | 2**-59 and above |
60 | 2**-61 | 2**-60 and above |
61 | 2**-62 | 2**-61 and above |
62 | 2**-62 | 2**-62 and above |
Such a random variable r
can be generated using efficient
instructions on modern computer architectures, for example we may
compute the number of leading zeros using hardware support:
import (
"math/rand"
"math/bits"
)
func nextRValueLeading() int {
x := uint64(rand.Int63()) // 63 least-significant bits are random
y := x << 1 | 0x3 // 62 most-significant bits are random
return bits.LeadingZeros64(y)
}
Or we may compute the number of trailing zeros instead, for example (not using special instructions):
import (
"math/rand"
)
func nextRValueTrailing() int {
x := uint64(rand.Int63())
for r := 0; r < 62; r++ {
if x & 0x1 == 0x1 {
return r
}
x = x >> 1
}
return 62
}
More examples for calculating r-values are shown in here. For example, the value 3 means there were three leading zeros and corresponds with being sampled at probabilities 1-in-1 through 1-in-8 but not at probabilities 1-in-16 and smaller.
The consistent sampling r-value (r
) and the parent sampling
probability p-value (p
) will be propagated using two bytes of base16
content for each of the two fields, as follows:
tracestate: ot=p:PP;r:RR
where PP
are two bytes of base16 p-value and RR
are two bytes of
base16 r-value. These values are omitted when they are unknown.
This proposal should be taken as a recommendation and will be modified
to match whatever format OpenTelemtry specifies for its
tracestate
.
The choice of base16 encoding is therefore just a recommendation,
chosen because traceparent
uses base16 encoding.
The following tracestate
value is accompanied by sampled=true
:
tracestate: ot=r:0a;p:03
and translates to
base16(p-value) = 03 // 1-in-8 parent sampling probability
base16(r-value) = 0a // qualifies for 1-in-1024 or greater probability consistent sampling
A ParentBased
Sampler will include ot=r:0a;p:03
in the stored
TraceState
field, allowing consumers to count it as with an adjusted
count of 8 spans. The sampled=true
flag remains set.
A TraceIDRatioBased
Sampler configured with probability 2**-10 or
greater will enable sampled=true
and convey a new parent sampling
probability via tracestate: ot=r:0a;p:0a
.
A TraceIDRatioBased
Sampler configured with probability 2**-11 or
smaller will set sampled=false
and remove p
from the tracestate,
setting tracestate: ot=r:0a
.
The reasoning behind restricting the set of sampling rates is that it:
- Lowers the cost of propagating parent sampling probability
- Limits the number of random bits required
- Avoids floating-point to integer rounding errors
- Makes math involving partial traces tractable.
An algorithm for making statistical inference from partially-sampled traces has been published that explains how to work with a limited number of power-of-2 sampling rates.
The Sampler MUST be configured with a power-of-two probability
expressed as 2**-s
with s being an integer in the range [0, 62]
except for the special case of zero probability (in which case p=63
is used).
If the context is a new root, the initial tracestate
must be created
with randomness value r
, as described above, in the range [0, 62].
If the context is not a new root, output a new tracestate
with the
same r
value as the parent context.
In both cases, set the sampled bit if the outgoing p
is less than or
equal to the outgoing r
(i.e., p <= r
).
When sampled, in both cases, the context's p-value p
is set to the
value of s
in the range [0, 62]. If the sampling probability is
zero (the special case where s
is undefined), use p=63
the
specified value for zero probability.
If the context is not a new root and the incoming context's r-value
is not set, the implementation SHOULD notify the user of an error
condition and follow the incoming context's sampled
flag.
The ParentBased
sampler is unmodified by this proposal. It honors
the W3C sampled
flag and copies the incoming tracestate
keys to
the child context. If the incoming context has known parent sampling
probability, so does the Span.
The span's parent sampling probability is known when both p
and r
are defined in the ot
sub-key of tracestate
. When r
or p
are
not defined, the span's parent sampling probability is unknown.
The AlwaysOn
Sampler behaves the same as TraceIDRatioBased
with
100% sampling probability (i.e., p=1
).
The AlwaysOff
Sampler behaves the same as TraceIDRatioBased
with
zero probability (i.e., p=63
).
The behavior of these tables can be verified by hand using a smaller
example. The following table shows how these equations work where
r
, p
, and s
are limited to 3 bits instead of 6 bits.
Values of p
are interpreted as follows:
p value |
Adjusted count |
---|---|
0 | 1 |
1 | 2 |
2 | 4 |
3 | 8 |
4 | 16 |
5 | 32 |
6 | 64 |
7 | 0 |
Note there are only seven known non-zero values for the adjusted count
(p
) ranging from 1 to 64. Thus there are seven defined values of r
and s
. The following table shows r
and the corresponding
selection probability, along with the calculated adjusted count for
each s
:
r value |
probability of r |
s=0 |
s=1 |
s=2 |
s=3 |
s=4 |
s=5 |
s=6 |
---|---|---|---|---|---|---|---|---|
0 | 1/2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1/4 | 1 | 2 | 0 | 0 | 0 | 0 | 0 |
2 | 1/8 | 1 | 2 | 4 | 0 | 0 | 0 | 0 |
3 | 1/16 | 1 | 2 | 4 | 8 | 0 | 0 | 0 |
4 | 1/32 | 1 | 2 | 4 | 8 | 16 | 0 | 0 |
5 | 1/64 | 1 | 2 | 4 | 8 | 16 | 32 | 0 |
6 | 1/64 | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
Notice that the sum of r
probability times adjusted count in each of
the s=*
columns equals 1. For example, in the s=4
column we have
0*1/2 + 0*1/4 + 0*1/8 + 0*1/16 + 16*1/32 + 16*1/64 + 16*1/64 = 1/2 + 1/4 + 1/4 = 1
. In the s=2
column we have 0*1/2 + 0*1/4 + 4*1/8 + 4*1/16 + 4*1/32 + 4*1/64 + 4*1/64 = 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1
.
We conclude that when r
is chosen with the given probabilities,
any choice of s
produces one expected span.
The following table summarizes how the three Sampler cases behave with
respect to the incoming and outgoing values for p
, r
, and
sampled
:
Sampler | Incoming r |
Incoming p |
Incoming sampled |
Outgoing r |
Outgoing p |
Outgoing sampled |
---|---|---|---|---|---|---|
Parent | unused | expected | respected | checked and passed through | checked and passed through | checked and passed through |
TraceIDRatio(Non-Root) | used | unused | ignored | checked and passed through | set to s |
set to p <= r |
TraceIDRatio(Root) | n/a | n/a | n/a | random variable | set to s |
set to p <= r |
There are several cases where the resulting span's parent sampling probability is unknown:
Sampler | Unknown condition |
---|---|
Parent | no incoming p |
TraceIDRatio(Non-Root) | no incoming r |
TraceIDRatio(Root) | none |
The inputs are recognized as out-of-range as follows:
Range invariate | Remedy |
---|---|
p < 0 |
drop p from tracestate |
p > 63 |
drop p from tracestate |
r < 0 |
drop r and p from tracestate |
r > 62 |
drop r and p from tracestate |
There are cases where the combination of p
and r
and sampled
are
inconsistent with each other. The sampled
flag is equivalent to the
expression p <= r
. When the invariant sampled <=> p <= r
is
violated, the ParentBased
sampler MUST correct the propagated values
as discussed below.
The violation is always addressed by honoring the sampled
flag and
correcting p
to either 63 (for zero adjusted count) or unset (for
unknown parent sampling probability).
If sampled
is false and the invariant is violated, drop p
from the
outgoing context to convey unknown parent sampling probability.
The case where sampled
is true with p=63
indicating 0% probability
may by regarded as a special case to allow zero adjusted count
sampling, which permits non-probabilistic sampling to take place in
the presence of probability sampling. Set p
to 63.
If sampled
is true with p<63
(but p>r
), drop p
from the
outgoing context to convey unknown parent sampling probability.
This proposal has been prototyped in the OTel-Go
SDK. No
changes in the OTel-Go Tracing SDK's Sampler
or tracestate
APIs
were needed.
This proposal changes the logic of the TraceIDRatioBased
sampler,
currently part of the OpenTelemetry specification, in a way that makes
the name no longer meaningful. The proposed sampler may be named
ConsistentSampler
and the existing TraceIDRatioBased
sampler can
be deprecated.
Many SDKs already implement the TraceIDRatioBased
sampler and it has
been used for probability sampling at trace roots with arbitrary
(i.e., not power-of-two) probabilities. Because of this, we may keep
the current (under-specified) TraceIDRatioBased
sampler and rename
it ProbabilitySampler
to convey that it does behave in a specified
way with respect to the bits of the TraceID.
It would be possible, if TraceID were specified to have at least 62 uniform random bits, to compute the randomness value described above as the number of leading zeros among those 62 random bits.
However, this would require modifying the W3C traceparent specification, therefore we do not propose to use bits of the TraceID.
See W3C trace context issue 467.
It would be possible to make a consistent sampling decision by hashing the TraceID, but we feel such an approach is not sufficient for making unbiased sampling decisions. It is seen as a relatively difficult task to define and specify a good enough hashing function, much less to have it implemented in multiple languages.
Hashing is also computationally expensive. This proposal uses extra data to avoid the computational cost of hashing TraceIDs.
Restricting parent sampling probabilities to powers of two does not limit tail
Samplers from using arbitrary probabilities. The companion OTEP
170 has discussed
the use of a sampler.adjusted_count
attribute that would not be
limited to power-of-two values. Discussion about how to represent the
effective adjusted count for tail-sampled Spans belongs in OTEP
170, not this OTEP.
Restricting parent sampling probabilities to powers of two does not limit Samplers from using arbitrary effective probabilities over a period of time. For example, a typical trace sampling rate of 5% (i.e., 1 in 20) can be accomplished by choosing 1/16 sampling 60% of the time and 1/32 sampling 40% of the time:
1/16 * 0.6 + 1/32 * 0.4 = 0.05
Consistent trace sampling requires the r
value to be propagated even
when the span itself is not sampled. It is not necessary, however, to
propagate the p
value when the context is not sampled, since
ParentBased
samplers will not change the decision. Although one
use-case was docmented in Google's early Dapper system (known as
"inflationary sampling", see
OTEP 170), the same effect can
be achieved using a consistent sampling decision in this framework.
In order for consistent trace sampling decisions to be made, the r
value MUST be set at the root of the trace. This behavior could be
opt-in or opt-out. If opt-in, users would have to enable the setting
of r
and the setting and propagating of p
in the tracestate. If
opt-out, users would have to disable these features to turn them off.
The cost and convenience of Sampling features depend on this choice.
This author's recommendation is that these behaviors be opt-in at
first in order to demonstrate their usefulness. If it proves
successful, an on-by-default approach could be proposed using a
modified W3C trace context traceparent
, as this would allow p-values
to be propagated cheaply.
See W3C issue trace context issue
463 which is about
propagating sampling probability in the traceparent
header, which
makes it cheap enough to have on-by-default.