-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature](agg) Support spill to disk in aggregation #18051
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
clang-tidy review says "All clean, LGTM! 👍" |
dcb58dd
to
43d7271
Compare
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
TeamCity pipeline, clickbench performance test result: |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
@@ -61,6 +61,7 @@ class DataTypeFixedLengthObject final : public IDataType { | |||
|
|||
bool is_categorial() const override { return is_value_represented_by_integer(); } | |||
bool can_be_inside_low_cardinality() const override { return false; } | |||
bool can_be_inside_nullable() const override { return true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is useless?
@@ -227,6 +227,8 @@ class ColumnFixedLengthObject final : public COWHelper<IColumn, ColumnFixedLengt | |||
LOG(FATAL) << "replace_column_data_default not supported"; | |||
} | |||
|
|||
bool can_be_inside_nullable() const override { return true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is useless?
@@ -414,8 +419,8 @@ Status AggregationNode::prepare_profile(RuntimeState* state) { | |||
_align_aggregate_states) * | |||
_align_aggregate_states)); | |||
if constexpr (HashTableTraits<HashTableType>::is_partitioned_table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just revert two level hash map pr in agg node?
return Status::OK(); | ||
} | ||
|
||
Status AggregationNode::_spill_to_disk(bool eos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to _try_spill_disk
@@ -1127,6 +1204,205 @@ Status AggregationNode::_pre_agg_with_serialized_key(doris::vectorized::Block* i | |||
return Status::OK(); | |||
} | |||
|
|||
struct PartitionHelper { | |||
static constexpr size_t PartitionCountBits = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set it to session variable, because if the query is very large, 16 sub hashtables will also consume a lot of memory. Maybe spill it to 256 sub hash tables for some query.
for (size_t i = 0; i < PartitionHelper::PartitionCount; ++i) { | ||
Block block_to_write = block.clone_empty(); | ||
if (blocks_rows[i] == 0) { | ||
writer->write(block_to_write); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why write an empty block?
fcb636e
to
c36f2b1
Compare
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
2783186
to
6c83624
Compare
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Proposed changes
When the aggregation node consumes too much memory, we should consider spilling the hash table and aggregation data to disk.
Problem summary
Describe your changes.
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...