
Improve performance of T-digest functions #5391

Merged: 1 commit, merged Oct 4, 2020


@kasiafi (Member) commented Oct 2, 2020

Use the `TDigest.valuesAt()` method to compute values at multiple quantiles
in a single pass over the T-Digest structure, instead of making multiple
calls to the `TDigest.valueAt()` method.
Applies to:

  • approx_percentile() aggregation,
  • values_at_quantiles() function.
    NOTE: The function values_at_quantiles(tdigest, array<double>)
    requires the input percentile array to be sorted in ascending order.
    The similar function values_at_quantiles(qdigest, array<double>) accepts
    percentiles in arbitrary order.
    The approx_percentile() aggregation does not require the percentiles to be ordered.
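The single-pass idea can be illustrated with a simplified sketch. It assumes a hypothetical digest stored as centroid means (ascending) plus weights, and it is not the `TDigest` class used by this PR: real t-digest code interpolates between neighbouring centroids, whereas this sketch just returns the mean of the centroid in which each target rank falls. Because the quantiles arrive sorted, the centroid cursor only moves forward, so one scan serves every quantile (roughly O(n + k) here, versus O(n · k) when each quantile restarts the scan as separate `valueAt()`-style calls would). This is also why the sorted-input requirement appears.

```java
import java.util.Arrays;

// Hypothetical, simplified digest: centroid means sorted ascending plus their weights.
// Real t-digest implementations interpolate between neighbouring centroids; this sketch
// returns the mean of the centroid in which the target rank falls, so the single-pass
// cursor logic stays visible.
final class SinglePassQuantiles {
    static double[] valuesAt(double[] means, double[] weights, double[] sortedQuantiles) {
        if (means.length == 0 || means.length != weights.length) {
            throw new IllegalArgumentException("need a non-empty digest with one weight per centroid");
        }
        for (int i = 1; i < sortedQuantiles.length; i++) {
            if (sortedQuantiles[i] < sortedQuantiles[i - 1]) {
                throw new IllegalArgumentException("quantiles must be sorted in ascending order");
            }
        }
        double total = 0;
        for (double w : weights) {
            total += w;
        }

        double[] result = new double[sortedQuantiles.length];
        int centroid = 0;
        double cumulative = weights[0]; // weight covered up to and including `centroid`
        for (int q = 0; q < sortedQuantiles.length; q++) {
            double targetRank = sortedQuantiles[q] * total;
            // The cursor only moves forward: sorted quantiles mean sorted target ranks,
            // so the centroids are scanned once for all quantiles together.
            while (centroid < means.length - 1 && cumulative < targetRank) {
                centroid++;
                cumulative += weights[centroid];
            }
            result[q] = means[centroid];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] means = {1.0, 2.0, 3.0, 4.0};
        double[] weights = {1, 1, 1, 1};
        // prints [1.0, 2.0, 4.0]
        System.out.println(Arrays.toString(valuesAt(means, weights, new double[] {0.1, 0.5, 0.9})));
    }
}
```

Rejecting unsorted input up front, rather than silently sorting, mirrors the contract described above for values_at_quantiles(tdigest, array<double>).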

depends on: #5158

@cla-bot added the cla-signed label Oct 2, 2020
@kasiafi (Member, Author) commented Oct 2, 2020

Sorting of the percentiles list is done as suggested here: #5158 (comment)
This is more efficient than the previous approach, which used Ordering.natural().sortedCopy() to sort the percentiles list and a HashMap to map the results back to the original order.
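One way to reorder results without a HashMap is to sort an array of positions by percentile value, call the sorted-input routine once, and scatter each result back to its original position. The sketch below assumes that approach; `SortAndRestore`, `valuesInInputOrder`, and the `sortedValuesAt` parameter are hypothetical names, and this is not necessarily the exact technique referenced in #5158.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.Function;

final class SortAndRestore {
    // Hypothetical helper: `sortedValuesAt` stands in for any routine that, like the
    // single-pass computation, requires its quantiles in ascending order.
    static double[] valuesInInputOrder(double[] percentiles, Function<double[], double[]> sortedValuesAt) {
        int n = percentiles.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort the positions by percentile value; the caller's array is left untouched.
        Arrays.sort(order, Comparator.<Integer>comparingDouble(i -> percentiles[i]));

        double[] sorted = new double[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = percentiles[order[i]];
        }
        double[] sortedValues = sortedValuesAt.apply(sorted);

        // Scatter each result back to the position its percentile had in the input.
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            result[order[i]] = sortedValues[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] percentiles = {0.9, 0.1, 0.5};
        // Stand-in for a sorted-input routine: report each quantile's rank in sorted order.
        double[] ranks = valuesInInputOrder(percentiles, sorted -> {
            double[] r = new double[sorted.length];
            for (int i = 0; i < sorted.length; i++) {
                r[i] = i;
            }
            return r;
        });
        System.out.println(Arrays.toString(ranks)); // prints [2.0, 0.0, 1.0]
    }
}
```

In this sketch, the position array avoids hashing double keys and keeps repeated percentiles distinct, because each input position owns its own output slot.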

@kasiafi force-pushed the 134SinglePass branch 5 times, most recently from dcc633d to 850336b, October 3, 2020 22:21
@kasiafi (Member, Author) commented Oct 4, 2020

Rebased.

@martint merged commit 0d41a63 into trinodb:master Oct 4, 2020