[feature](agg) Support spill to disk in aggregation #18051

Merged: yiguolei merged 1 commit into apache:master on Apr 20, 2023

Member

@mrhhsg mrhhsg commented Mar 23, 2023

Proposed changes

When the aggregation node consumes too much memory, we should consider spilling the hash table and aggregation data to disk.
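
For readers unfamiliar with the technique, the idea can be illustrated with a toy aggregator. This is a minimal sketch under stated assumptions, not the code in this PR: `SpillingSumAggregator`, its group-count limit, and the in-memory vectors standing in for spill files are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy SUM(value) GROUP BY key aggregator (hypothetical, not Doris code).
// "Disk" is modelled as one in-memory vector per partition.
class SpillingSumAggregator {
public:
    SpillingSumAggregator(std::size_t max_groups_in_memory, std::size_t partition_count)
            : _max_groups(max_groups_in_memory), _spilled(partition_count) {
        assert(partition_count > 0);
    }

    void add(const std::string& key, std::int64_t value) {
        _table[key] += value;
        // Assumed trigger: too many resident groups. A real engine would
        // look at tracked memory usage instead.
        if (_table.size() > _max_groups) {
            spill();
        }
    }

    // Flush what is left, then merge one partition at a time so that only
    // a single partition's groups are rebuilt in a hash table at once.
    // (The result map holds everything here; a real engine would stream it.)
    std::map<std::string, std::int64_t> finish() {
        spill();
        std::map<std::string, std::int64_t> result;
        for (auto& partition : _spilled) {
            std::unordered_map<std::string, std::int64_t> merged;
            for (const auto& entry : partition) {
                merged[entry.first] += entry.second;
            }
            for (const auto& entry : merged) {
                result[entry.first] = entry.second;
            }
            partition.clear();
        }
        return result;
    }

    std::size_t spill_count() const { return _spill_count; }

private:
    // Route each partial result to a partition by key hash, then free the table.
    void spill() {
        if (_table.empty()) {
            return;
        }
        for (const auto& entry : _table) {
            std::size_t p = std::hash<std::string>{}(entry.first) % _spilled.size();
            _spilled[p].emplace_back(entry.first, entry.second);
        }
        _table.clear();
        ++_spill_count;
    }

    std::size_t _max_groups;
    std::unordered_map<std::string, std::int64_t> _table;
    std::vector<std::vector<std::pair<std::string, std::int64_t>>> _spilled;
    std::size_t _spill_count = 0;
};
```

Because a key always hashes to the same partition, every partial sum for that key lands in one place, so each partition can be merged independently of the others.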

Problem summary

Describe your changes.

Checklist(Required)

  • Does it affect the original behavior?
  • Have unit tests been added?
  • Has documentation been added or modified?
  • Does it need to update dependencies?
  • Does this PR support rollback? (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg mrhhsg force-pushed the agg_spill branch 2 times, most recently from dcb58dd to 43d7271 Compare March 23, 2023 09:59

@mrhhsg
Member Author

mrhhsg commented Mar 24, 2023

run buildall

@hello-stephen
Contributor

hello-stephen commented Mar 24, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 33.83 seconds
stream load tsv: 425 seconds loaded 74807831229 Bytes, about 167 MB/s
stream load json: 23 seconds loaded 2358488459 Bytes, about 97 MB/s
stream load orc: 59 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230419173420_clickbench_pr_131276.html

@mrhhsg
Member Author

mrhhsg commented Mar 25, 2023

run buildall

@mrhhsg
Member Author

mrhhsg commented Apr 1, 2023

run buildall

@github-actions
Contributor

github-actions bot commented Apr 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@@ -61,6 +61,7 @@ class DataTypeFixedLengthObject final : public IDataType {

bool is_categorial() const override { return is_value_represented_by_integer(); }
bool can_be_inside_low_cardinality() const override { return false; }
bool can_be_inside_nullable() const override { return true; }
Contributor

This method is useless?

@@ -227,6 +227,8 @@ class ColumnFixedLengthObject final : public COWHelper<IColumn, ColumnFixedLengt
LOG(FATAL) << "replace_column_data_default not supported";
}

bool can_be_inside_nullable() const override { return true; }
Contributor

This method is useless?

@@ -414,8 +419,8 @@ Status AggregationNode::prepare_profile(RuntimeState* state) {
_align_aggregate_states) *
_align_aggregate_states));
if constexpr (HashTableTraits<HashTableType>::is_partitioned_table) {
Contributor

Should we just revert the two-level hash map PR in the agg node?

return Status::OK();
}

Status AggregationNode::_spill_to_disk(bool eos) {
Contributor

rename to _try_spill_disk

@@ -1127,6 +1204,205 @@ Status AggregationNode::_pre_agg_with_serialized_key(doris::vectorized::Block* i
return Status::OK();
}

struct PartitionHelper {
static constexpr size_t PartitionCountBits = 4;
Contributor

Make this a session variable: if the query is very large, 16 sub hash tables will still consume a lot of memory. Some queries may need to spill into 256 sub hash tables.
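
One way to read this suggestion, as a sketch only: make the bit count a constructor argument so it can be fed from a session variable. `RuntimePartitionHelper` is a hypothetical name, and taking the top bits of a 32-bit hash is an assumption for illustration, not necessarily how the PR's `PartitionHelper` picks a partition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical variant of PartitionHelper: the bit count is a runtime
// value (for example, read from a session variable) instead of a constexpr.
struct RuntimePartitionHelper {
    explicit RuntimePartitionHelper(std::size_t bits) : partition_count_bits(bits) {
        // Keep the shift in partition_of() well defined.
        assert(bits >= 1 && bits <= 16);
    }

    // 4 bits give 16 partitions, 8 bits give 256.
    std::size_t partition_count() const { return std::size_t{1} << partition_count_bits; }

    // Assumption: the top bits of a 32-bit hash select the partition.
    std::size_t partition_of(std::uint32_t hash) const {
        return static_cast<std::size_t>(hash >> (32 - partition_count_bits));
    }

    std::size_t partition_count_bits;
};
```

With this shape, a small query could keep 4 bits (16 sub hash tables) while a very large one could be configured with 8 bits (256), at the cost of more spill files to manage.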

for (size_t i = 0; i < PartitionHelper::PartitionCount; ++i) {
Block block_to_write = block.clone_empty();
if (blocks_rows[i] == 0) {
writer->write(block_to_write);
Contributor

Why write an empty block when the partition has no rows?

@mrhhsg mrhhsg force-pushed the agg_spill branch 2 times, most recently from fcb636e to c36f2b1 Compare April 17, 2023 14:41
@mrhhsg
Member Author

mrhhsg commented Apr 17, 2023

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@mrhhsg
Member Author

mrhhsg commented Apr 18, 2023

run buildall

@mrhhsg mrhhsg force-pushed the agg_spill branch 2 times, most recently from 2783186 to 6c83624 Compare April 18, 2023 10:24
@mrhhsg
Member Author

mrhhsg commented Apr 18, 2023

run buildall

yiguolei previously approved these changes Apr 19, 2023
Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 19, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 19, 2023

@yiguolei
Contributor

run buildall

@mrhhsg
Member Author

mrhhsg commented Apr 19, 2023

run buildall

@mrhhsg
Member Author

mrhhsg commented Apr 19, 2023

run buildall

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit c4e469c into apache:master Apr 20, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023