Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust DateHistogram's bucket accounting to be iteratively #101012

Conversation

martijnvg
Copy link
Member

Adjust DateHistogram's consumeBucketsAndMaybeBreak to be iteratively during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.

…during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.
@martijnvg martijnvg marked this pull request as ready for review October 18, 2023 06:58
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 18, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should only call #consumeBucketsAndMaybeBreak for final reduce or we can rely that for partial reduce it should be a NOOP. Otherwise it is a good change. I want to hear your thought and I will approve it.

@@ -323,6 +324,10 @@ protected boolean lessThan(IteratorAndCurrent<Bucket> a, IteratorAndCurrent<Buck
// the key changes, reduce what we already buffered and reset the buffer for current buckets
final Bucket reduced = reduceBucket(currentBuckets, reduceContext);
if (reduced.getDocCount() >= minDocCount || reduceContext.isFinalReduce() == false) {
if (consumeBucketCount++ >= REPORT_EMPTY_EVERY) {
reduceContext.consumeBucketsAndMaybeBreak(consumeBucketCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only do this is is final reduce?

@@ -344,10 +349,14 @@ protected boolean lessThan(IteratorAndCurrent<Bucket> a, IteratorAndCurrent<Buck
final Bucket reduced = reduceBucket(currentBuckets, reduceContext);
if (reduced.getDocCount() >= minDocCount || reduceContext.isFinalReduce() == false) {
reducedBuckets.add(reduced);
if (consumeBucketCount++ >= REPORT_EMPTY_EVERY) {
reduceContext.consumeBucketsAndMaybeBreak(consumeBucketCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only do this is is final reduce?

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look in other places and rely in the operation to be a noop for partial reduce.

@martijnvg
Copy link
Member Author

I look in other places and rely in the operation to be a noop for partial reduce.

Right in case of ForPartial#consumeBucketCountAndMaybeBreak(...) this is a noop.

@martijnvg
Copy link
Member Author

@elasticmachine run elasticsearch-ci/part-3

@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.11

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Oct 18, 2023
…01012)

Adjust DateHistogram's consumeBucketsAndMaybeBreak to be iteratively during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.
elasticsearchmachine pushed a commit that referenced this pull request Oct 18, 2023
…101056)

Adjust DateHistogram's consumeBucketsAndMaybeBreak to be iteratively during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 14, 2023
Adjust Histogram's consumeBucketsAndMaybeBreak to be iteratively during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.

Similar to elastic#101012
martijnvg added a commit that referenced this pull request Nov 15, 2023
Adjust Histogram's consumeBucketsAndMaybeBreak to be iteratively during reduce instead accounting all buckets at the end of the reduce.

In case of many non-empty buckets accounting the number of buckets at the end of the reduce may be too late. Elasticsearch may already have failed with an OOME. This change changes the accounting to happen iteratively during the reduce for non-empty bucket.

Note that for empty buckets accounting of the number of buckets already happens iteratively.

Similar to #101012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.11.1 v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants