Add spilling support for aggregations with sorting/ordering. #7455

Closed
spershin opened this issue Nov 7, 2023 · 2 comments
Assignees
Labels
aggregates enhancement New feature or request

Comments

@spershin
Contributor

spershin commented Nov 7, 2023

Description

Currently we don't support spilling if an aggregation node has aggregations with sorting/ordering, like this:

SELECT count(c0 ORDER BY c2) FROM tmp GROUP BY c1;

We need to add this support if we see such queries breaching memory limits.

@spershin spershin added enhancement New feature or request aggregates labels Nov 7, 2023
@mbasmanova
Contributor

Sorted aggregations need to accumulate all input rows, then sort them within each group, then compute aggregations over the sorted rows. For spilling purposes, we can use an Accumulator::extractFunction that returns an array of structs representing all the input rows for a group.
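
The three steps above can be illustrated with a minimal standalone sketch. This is not Velox code: `InputRow`, `SortedGroupState`, `accumulate`, and `arrayAggSorted` are hypothetical names, and an `array_agg`-style result is used instead of `count` because its output visibly depends on the order.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// One input row of a hypothetical table tmp(c0, c1, c2).
struct InputRow {
  int64_t c0; // value being aggregated
  int64_t c1; // grouping key
  int64_t c2; // sorting key
};

// Hypothetical per-group state: a sorted aggregation has to buffer
// every input row of the group until all input has been seen.
struct SortedGroupState {
  std::vector<InputRow> rows;
};

// Step 1: accumulate all input rows, bucketed by grouping key c1.
std::map<int64_t, SortedGroupState> accumulate(
    const std::vector<InputRow>& input) {
  std::map<int64_t, SortedGroupState> groups;
  for (const auto& row : input) {
    groups[row.c1].rows.push_back(row);
  }
  return groups;
}

// Steps 2 and 3: sort the buffered rows of one group by c2, then
// compute the aggregate (here: collect c0 values) over the sorted rows.
std::vector<int64_t> arrayAggSorted(SortedGroupState& state) {
  std::stable_sort(
      state.rows.begin(),
      state.rows.end(),
      [](const InputRow& a, const InputRow& b) { return a.c2 < b.c2; });
  std::vector<int64_t> result;
  result.reserve(state.rows.size());
  for (const auto& row : state.rows) {
    result.push_back(row.c0);
  }
  return result;
}
```

In this sketch the buffered `rows` vector is the only per-group state, so it is what would have to be written out when the operator spills. That is why extracting all input rows of a group is enough for spilling purposes.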

@mbasmanova
Contributor

We can serialize rows as strings (VARBINARY) to avoid converting to and from columnar format.
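
A rough sketch of the idea, assuming a fixed-width row layout: each buffered row is copied byte-for-byte into an opaque string and copied back on restore, with no per-column conversion. The `Row` struct and both helper functions are hypothetical and do not reflect the actual RowContainer API.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical fixed-width row: the aggregated value and the sorting key.
struct Row {
  int64_t c0;
  int64_t c2;
};

// A byte-wise copy is only valid for trivially copyable types.
static_assert(
    std::is_trivially_copyable<Row>::value,
    "Row must be trivially copyable to be serialized with memcpy");

// Serialize a row into an opaque byte string (the VARBINARY payload).
std::string serializeRow(const Row& row) {
  std::string bytes(sizeof(Row), '\0');
  std::memcpy(&bytes[0], &row, sizeof(Row));
  return bytes;
}

// Restore a row from its serialized bytes.
Row deserializeRow(const std::string& bytes) {
  if (bytes.size() != sizeof(Row)) {
    throw std::invalid_argument("unexpected serialized row size");
  }
  Row row{};
  std::memcpy(&row, bytes.data(), sizeof(Row));
  return row;
}
```

The serialized bytes are only meaningful to a reader that uses the same in-memory layout, which holds for spill files written and read back by the same process. Rows with variable-width fields would need a length-prefixed encoding, which this sketch does not cover.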

mbasmanova added a commit to mbasmanova/velox-1 that referenced this issue Nov 13, 2023
Summary:
Add spillType field to Accumulator struct and use it to generate RowType for
spilling.

This allows providing spilling support for Accumulators that do not correspond
to aggregate functions, i.e. SortedAggregations.

Part of facebookincubator#7455


Reviewed By: xiaoxmeng

Differential Revision: D51230793

Pulled By: mbasmanova
facebook-github-bot pushed a commit that referenced this issue Nov 13, 2023
…7519)

Summary:
Add APIs to RowContainer to extract rows in serialized format. Will be used in
spilling, initially in spilling of aggregation over sorted inputs.

Part of #7455

Pull Request resolved: #7519

Reviewed By: xiaoxmeng

Differential Revision: D51213589

Pulled By: mbasmanova

fbshipit-source-id: 6b0d5fc03b7bb301ae229af509143d2a1c14ab55
facebook-github-bot pushed a commit that referenced this issue Nov 13, 2023
Summary:
Add spillType field to Accumulator struct and use it to generate RowType for
spilling.

This allows providing spilling support for Accumulators that do not correspond
to aggregate functions, i.e. SortedAggregations.

Part of #7455

Pull Request resolved: #7525

Reviewed By: xiaoxmeng

Differential Revision: D51230793

Pulled By: mbasmanova

fbshipit-source-id: 38b1a2a389e96bc90e2d3291dc15b9b2fef191d5
@mbasmanova mbasmanova self-assigned this Nov 14, 2023
2 participants