-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance: Use a specialized sum accumulator for retractable aggregregates #6888
Conversation
self.data_type.clone(), | ||
self.nullable, | ||
), | ||
Field::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of the PR is to remove this "count" field
@@ -146,7 +139,7 @@ impl AggregateExpr for Sum { | |||
} | |||
|
|||
fn create_sliding_accumulator(&self) -> Result<Box<dyn Accumulator>> { | |||
Ok(Box::new(SumAccumulator::try_new(&self.data_type)?)) | |||
Ok(Box::new(SlidingSumAccumulator::try_new(&self.data_type)?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the same pattern as Sliding{MinMax}Accumulator
https://github.com/apache/arrow-datafusion/blob/main/datafusion/physical-expr/src/aggregate/min_max.rs#L664
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
FYI @mustafasrepo |
Which issue does this PR close?
Closes #6878
Rationale for this change
I noticed this while working on #4973
The
SUM
accumulator is general (it can handle sliding windows) but to be general it carries through an extra count for each group in the "state" -- this count is ignored. For low cardinality groups this likely isn't a problem but for large cardinality groups creating and carrying through the counts takes a measurable amount of time (I saw it in #6800)What changes are included in this PR?
Use a specialized sum accumulator for retractable windows and only track the counts there. For non
Are these changes tested?
Existing coverage
Performance (the bigger changes like
q15 have a
sum` in them, so the change is plausible). The smaller changes don't have any sums which makes me suspiciousAre there any user-facing changes?
a little faster performance. I really expect this to help the code in #6800 so I can simplify the sum accumulator