Randomness flag bit #467
Comments
It would be absolutely fabulous if TraceContext began supporting some amount of guaranteed randomness. This is also a good time to do it, as many organizations that produce these IDs are still in the process of adopting this header. Referencing a related issue, OTel is considering using the remaining bits as a timestamp: open-telemetry/opentelemetry-specification#1947. But as I mentioned in that issue, how likely is it that we will want ~128 bits of randomness in the future? I would suggest that we don't assume we can easily bump up the amount of randomness in a later version of the spec.
I think it is worth mentioning that in order to use this flag on 64-bit systems as described in #349, essentially the whole trace ID would be required to be random. The spec requires systems like this to use the right-most bytes for their short trace ID, which are the same bits you are proposing to use. This requirement is currently non-normative, so it may be OK: https://www.w3.org/TR/trace-context/#handling-trace-id-for-compliant-platforms-with-shorter-internal-identifiers The original requirement for 63 bits was for a specific sampling requirement to be able to represent probabilities as low as
@dyladan There was never a requirement to support sampling rates as low as 2^-63. It is rather a result of the number of bits used to encode the number of leading zeros (NLZ) of a uniform random number. If all sampling rates are powers of 1/2, it is sufficient for consistent sampling to propagate only the NLZ. If 5 bits are used to encode the NLZ, the minimum supported sampling rate would be 2^-31, which might not be sufficiently small. Therefore, we proposed using 6 bits to encode the NLZ, which results in a minimum sampling rate of 2^-63. Hence, we could also consider having, for example, just 48 random bits supporting sampling rates >= 2^-47.

@bogdandrutu, @jmacd Probably a minor issue, but depending on the implementation and the random number generator used, generating a certain number of high-quality random bits could be more costly than generating the NLZ directly as in the alternative proposal (#463), which needs only 2 random bits on average and potentially allows the bits of one 64-bit random value to be used for multiple traces. Maybe also interesting in this context is a quality comparison of fast common pseudo-random number generators: https://github.com/lemire/testingRNG#visual-summary.
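To make the NLZ argument above concrete, here is a minimal Python sketch. It is not code from any proposal in this thread; the function names (`min_rate_exponent`, `draw_nlz`, `nlz_of`, `is_sampled`) are hypothetical and chosen only for illustration. It shows why b bits for the NLZ give a minimum rate of 2^-(2^b - 1), how the NLZ can be drawn directly with about 2 random bits on average, and why propagating only the NLZ is enough for consistent power-of-two sampling.

```python
import random


def min_rate_exponent(nlz_bits):
    """Largest NLZ representable in nlz_bits bits.

    The smallest supported sampling rate is then 2 ** -min_rate_exponent(nlz_bits).
    """
    return (1 << nlz_bits) - 1


def draw_nlz(rng, max_nlz=63):
    """Draw the NLZ directly: flip bits until the first 1, capped at max_nlz.

    The expected number of bits consumed is about 2.
    """
    nlz = 0
    while nlz < max_nlz and rng.getrandbits(1) == 0:
        nlz += 1
    return nlz


def nlz_of(value, width):
    """Number of leading zeros of value seen as a width-bit unsigned integer."""
    return width - value.bit_length()


def is_sampled(nlz, exponent):
    """Decision for sampling rate 2 ** -exponent: keep when nlz >= exponent.

    Every participant that sees the same nlz reaches the same decision,
    and a trace kept at a low rate is also kept at every higher rate.
    """
    return nlz >= exponent


if __name__ == "__main__":
    print(min_rate_exponent(5), min_rate_exponent(6))
    rng = random.Random(467)
    nlz = draw_nlz(rng)
    print(nlz, [is_sampled(nlz, k) for k in range(4)])
```

Either source of the NLZ works with the same decision function: it can be drawn directly as in `draw_nlz`, or derived from already-random ID bits as in `nlz_of`.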
This is correct, but the only systems I know of that use (or used) 64-bit trace IDs are Zipkin and Jaeger, and both use a "fully random" trace ID. That means they would be able to set the flag in either case if they want to make use of it in their environments.
This is based on my experience (systems I designed, wrote, or maintained). I have seen systems like Google's use sampling probabilities as low as 2^-20, so they need at least 20 random bits, and I have seen other algorithms that may require more. If the concern is that we are asking for "too" many random bits, I am happy to accept other proposals, such as 31 or 47.
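As a rough illustration of why a rate of 1 in 2^k needs at least k random bits, the following Python sketch keeps a trace only when its k rightmost trace ID bits are all zero. The function name `sampled_at` and the choice of the rightmost bits are assumptions made for this example, not something specified in this thread.

```python
def sampled_at(trace_id_hex, k):
    """Keep the trace with probability 2 ** -k, assuming the k rightmost bits
    of the trace ID are uniformly random (an assumption of this sketch)."""
    low_bits = int(trace_id_hex, 16) & ((1 << k) - 1)
    return low_bits == 0


if __name__ == "__main__":
    # The 20 rightmost bits are zero here, so the trace is kept at rate 2^-20.
    print(sampled_at("4bf92f3577b34da6a3ce929d0e100000", 20))
    # Here they are not, so the trace is dropped.
    print(sampled_at("4bf92f3577b34da6a3ce929d0e0e4736", 20))
```

If fewer than k of those bits are actually random, the effective rate is no longer 2^-k, which is the reason the number of guaranteed random bits matters.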
Adding new bytes will cause a version update and will be backwards incompatible, so I think it is a huge disadvantage compared to the proposed solution, which does not require a new version of the header.
Nothing stops us from adding another bit in the future that indicates whether the first 65 bits (or however many we may need) are random, with the same behavior as the proposed bit. Personally I see no reason to have this and cannot find any use case for it. It would also definitely make the "64-bit systems" unable to work with this proposal; see https://github.com/w3c/trace-context/blob/main/spec/60-trace-id-format.md#interoperating-with-existing-systems-which-use-shorter-identifiers
During the WG meeting today it was brought up that some tracing systems use non-random trace ids. @SergeyKanzhelev mentioned Apache SkyWalking as one of those systems so I took a look at their javascript agent to see what they're doing. Their model is slightly different because each component in the trace has a "segment" and each segment may have multiple spans. In their documentation they have: Their header is composed of multiple fields concatenated with
Note that this does not address #463, because it makes the operator choose between using externally generated IDs (sometimes needed for legacy or compatibility reasons) and having randomness in the context.
Based on the OpenTelemetry work related to sampling traces using a consistent head sampling rate, @jmacd @bogdandrutu and @oertl would like to make a proposal:
In OpenTelemetry we need to know whether (and how many) bits of the TraceId are randomly generated. In open-telemetry/oteps#168 we propose an "r" value that encodes how many bits of the trace-id are random, but this can easily be encoded in the trace-flags instead, and other libraries and implementations can then use this information for "logs" sampling or any other context-specific sampling.
The proposal is to use 1 bit from the TraceFlags that encodes the following information:
Why 63 and not 64?
The main motivation for this is because:
The specification could guarantee 64 bits, but that would be overkill and unnecessary.
Related/Simplification: This will resolve/help with a large part of #463
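A minimal Python sketch of how a consumer might use such a flag, under stated assumptions: the bit position `RANDOM_FLAG = 0x02` is hypothetical (this issue does not say which bit would be used), and the flag is read here as "the rightmost 63 bits of the trace-id are random", which is how the replies in this thread interpret it. The header layout (`version-traceid-parentid-flags`) is the one defined by W3C Trace Context.

```python
SAMPLED_FLAG = 0x01  # defined by W3C Trace Context
RANDOM_FLAG = 0x02   # hypothetical bit position, assumed for this sketch
RANDOM_BITS = 63     # number of random bits discussed in this issue


def parse_traceparent(header):
    """Split a traceparent header into (trace_id_hex, parent_id_hex, flags)."""
    version, trace_id, parent_id, flags = header.strip().split("-")
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("malformed traceparent: " + header)
    return trace_id, parent_id, int(flags, 16)


def random_part(header):
    """Return the rightmost 63 bits of the trace-id as an int when the
    (assumed) randomness flag is set, otherwise None."""
    trace_id, _, flags = parse_traceparent(header)
    if not flags & RANDOM_FLAG:
        return None
    return int(trace_id, 16) & ((1 << RANDOM_BITS) - 1)


if __name__ == "__main__":
    header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03"
    print(hex(random_part(header)))
```

A caller that gets `None` back knows it cannot rely on the trace-id for sampling decisions and has to fall back to its own source of randomness.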