[OTEL-2348] Improve DDSketch to Sketch conversion #464
What does this PR do?

This PR:

1. Changes the rounding algorithm used by the DDSketch toSketch conversion (pkg/quantile/ddsketch.go)
2. Updates the test files in pkg/otlp/metrics/testdata to reflect the new algorithm's output
3. Fixes the quantile interpolation for negative bins in Sketch.Quantile (pkg/quantile/sparse.go)
4. Removes a + 0.5 bias from the bin remapping in createDDSketchWithSketchMapping (pkg/quantile/ddsketch.go)
5. Replaces the tolerated relative error bound calculation in the conversion tests (pkg/quantile/ddsketch_test.go)

The changes themselves are pretty small, but somewhat technical, so if this is too much to review, I can split this PR into smaller ones.
Motivation for changes 1 and 2
I recently noticed that some tests were failing on main, but only when running locally on my M3 MacBook. Here are the test logs (after renaming the test cases to more accurate names):

These tests compute the percentiles of a distribution before and after the DDSketch to Sketch conversion, and check that the relative error introduced is below some bound.
After investigating, this turned out to be because the rounding algorithm currently used to turn float-valued DDSketches into integer-valued Sketches is somewhat numerically unstable. The algorithm rounds bin counts down to an integer, but keeps track of the accumulated rounding error, and adds 1 extra unit to the current bin once it reaches 1.0.
Unfortunately, because of floating point errors, this accumulated error often ends up reaching 0.9999... in cases where it should be 1.0, which delays the addition of the extra unit until the next bin with a non-integer count, which can end up being far away from the logical range the extra unit should have been assigned to. Depending on the distribution, this transfer of a single unit between bins can end up changing the result of the quantile calculations.
The test already skipped the comparison for certain quantiles, presumably because of this issue. However, because of differences in floating-point calculations between x86 and ARM CPUs, this issue is triggered in different cases on the different architectures, hence the platform-specific failures.
In the first case, P99 shifted from a very small negative value to exactly zero: a small absolute error, but the relative error is mathematically undefined there and defaults to 1. In the second case, P99 shifted from an accurate ~0.0025 to ~8.0, which is close to the P100 (i.e. the max) of the (very long-tailed) distribution, inducing a 300,000% relative error.
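For context, the check the tests make is presumably of the following shape. Both the helper and its fallback value are assumptions for illustration, not the actual test code:

```go
package main

import (
	"fmt"
	"math"
)

// relativeError sketches the comparison described above: the error of
// actual relative to expected. When the reference value is 0 the quotient
// is mathematically undefined, and defaulting to 1 reproduces the
// behaviour seen in the first failing case. Hypothetical helper only.
func relativeError(expected, actual float64) float64 {
	if expected == 0 {
		if actual == 0 {
			return 0
		}
		return 1 // undefined, so count it as a full relative error
	}
	return math.Abs(actual-expected) / math.Abs(expected)
}

func main() {
	fmt.Println(relativeError(0, -1e-9))    // reference value of 0: defaults to 1
	fmt.Println(relativeError(0.0025, 8.0)) // a huge shift: on the order of 10^3, i.e. hundreds of thousands of percent
}
```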
To solve this issue at the source, I modified the rounding algorithm so that it inserts the extra units determined from the accumulated rounding error when said error reaches 0.5, instead of 1.0. This is done differently from the original algorithm: by tracking the sum of the input float counts and the sum of the output integer counts, and using math.Round to compute the difference. This adds the extra unit somewhere in the middle of its logical range of bins, instead of at/past the end, and should not exhibit the same numerical instability in the computed quantiles, at least in the common case where the input DDSketch has integer counts. (When the DDSketch has fractional counts, we can still potentially see large shifts; please see the long comment in the code for an example.)

Switching to this algorithm fixes the failing tests, and also allows us to remove the previous exceptions that were made in the test. However, this changes the exact output of the algorithm in a lot of cases, so I had to update the test files in pkg/otlp/metrics/testdata for the TestExponentialHistogramTranslatorOptions test, which expects the output bin counts to have very precise values.

Note that the test failure is very brittle: even just slightly changing the number of points added to the DDSketch under test would have fixed it, so the passing tests are not a good indicator of the new algorithm being more accurate or stable. But I hope my theoretical explanation above is enough to convince that it is at least somewhat more stable.
Motivation for changes 3 and 4
As part of my investigation, I noticed a few parts of the calculation that seemed incorrect.
One of them is in Sketch.Quantile(), which, after finding the bin containing the wanted quantile, performs linear interpolation between its bounds to further improve the estimate. However, the bounds are flipped for negative bins, giving results that are incorrect, although still within the correct bin, so not very far off.

The second one is in createDDSketchWithSketchMapping, which remaps the original DDSketch to one with the same bins as the target Sketch. Despite the bin boundary calculation being the same between the two data structures, this code introduces a + 0.5 bias in the calculation, resulting in all bin boundaries being off by about half a bin.

I confirmed experimentally that both of these issues introduce unnecessary bias into the results.
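To illustrate the interpolation issue with a hypothetical standalone helper (this is not the actual Sketch.Quantile code in pkg/quantile/sparse.go, and the bin representation is assumed):

```go
package main

import "fmt"

// interpolate refines a quantile estimate linearly within the bin that
// holds the target rank, given the fraction of the bin's count lying
// below that rank. If a bin stores negative values via positive bounds
// lower..upper, it actually covers [-upper, -lower], so its bounds must
// be negated and swapped before interpolating; using them as-is (as the
// old code effectively did) biases the estimate within the bin.
// Hypothetical helper for illustration only.
func interpolate(lower, upper, fraction float64, negative bool) float64 {
	if negative {
		lower, upper = -upper, -lower
	}
	return lower + fraction*(upper-lower)
}

func main() {
	fmt.Println(interpolate(1, 2, 0.25, false)) // 1.25, inside [1, 2]
	fmt.Println(interpolate(1, 2, 0.25, true))  // -1.75, inside [-2, -1]
}
```

Note that both the correct and the flipped results stay inside the bin, which is why the bug only adds a sub-bin-width bias rather than gross errors.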
There is a comment in the original code justifying the + 0.5 modification, but I do not understand its reasoning. I would be interested if someone has an explanation.

Motivation for change 5
Another piece of code that seemed incorrect is the calculation, in the testing code, of the tolerated bound on the relative error that the conversion introduces on the quantiles. (Link to the current code)
A comment provides the link to the other codebase this calculation was taken from, but that code seems to be calculating a bound on the error introduced by the process of merging two DDSketches, which has little relation to the error introduced by the rounding algorithm. (By the way, the Javadoc referenced in that code is available at this link.)
I replaced this calculation with a new one, along with a long comment explaining the reasoning behind it. Ironically, the result is not very different (old bound: ~0.02325; new bound: ~0.02356), which is why this change does not really affect the test results.
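As background only (not part of the diff, and not necessarily how either bound is derived): DDSketch error analyses usually start from the standard relative-accuracy guarantee that, with bin boundaries at powers of γ and a bin's midpoint as the answer, the relative error on any quantile is at most (γ − 1)/(γ + 1). The γ value below is purely illustrative, not the one used in pkg/quantile:

```go
package main

import "fmt"

// relativeAccuracy computes the standard DDSketch guarantee: with bin
// boundaries at powers of gamma, answering with a bin midpoint keeps the
// relative error of any quantile within (gamma-1)/(gamma+1).
func relativeAccuracy(gamma float64) float64 {
	return (gamma - 1) / (gamma + 1)
}

func main() {
	// gamma = 1.02 is an illustrative value: about 1% relative accuracy.
	fmt.Println(relativeAccuracy(1.02))
}
```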