Add Exchange before GroupId to improve Partial Aggregation #24047

Open
wants to merge 1 commit into
base: master

@aaneja aaneja commented Nov 14, 2024

See #23475 for more details

Previously closed PR - #11741

Description

Motivation and Context

See Javadoc of the new AddExchangesBelowPartialAggregationOverGroupIdRuleSet

Impact

Better performance for TPCDS Q22, Q67
See plan diffs (TPCDS SF 1000, unpartitioned) - https://aaneja.github.io/mypages/PR_24047_AddExchangesBelowPartialAggregationOverGroupId_OffVsOn.html

Test Plan

TODO: Add a new planner test

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Added a new optimizer rule to add exchanges below a combination of partial aggregation + GroupId. Enabled with the boolean session property `enable_forced_exchange_below_group_id`


@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Nov 14, 2024
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 95fb49c to 62aaab6 Compare November 14, 2024 12:13
@steveburnett
Contributor

Thanks for the release note entry! Minor formatting nits, and include the PR number.

== RELEASE NOTES ==

General Changes
* Add a new optimizer rule to add exchanges below a combination of partial aggregation + GroupId. Enabled with the boolean session property ``enable_forced_exchange_below_group_id``. :pr:`24047`

@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 62aaab6 to fa61dfd Compare November 18, 2024 12:58
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from fa61dfd to 39222b3 Compare January 9, 2025 17:03
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch 2 times, most recently from 86b145a to 5868a2f Compare January 21, 2025 14:34
@aaneja aaneja marked this pull request as ready for review January 21, 2025 14:35
Contributor

@ZacBlanco ZacBlanco left a comment

high level first pass. Seems good for the most part. I will take another pass and look at the details of the rule tomorrow.

aaneja added a commit to aaneja/mypages that referenced this pull request Jan 24, 2025
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 5868a2f to 437a09a Compare January 24, 2025 05:16
return false;
}

return isEnabledAddExchangeBelowGroupId(session);
Contributor

maybe this should have 3 possible values - ALWAYS, COST_BASED, and NEVER (similar to partial aggregation pushdown). that way someone can enable this if they don't have stats or if the stats estimates are no good.

Contributor Author

What would we re-partition on if ALWAYS is chosen (for the non-trivial case of more than one partition variable)?

Contributor

good point. i guess we can leave as is for now, unless we want to hash on all of them?

Contributor Author

I'm in favor of leaving this as-is until a use case arises for ALWAYS
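
For reference, the three-valued setting suggested above could be sketched roughly as follows. This is a minimal illustration, not code from this PR: the class `ExchangeBelowGroupIdDecision`, the `Strategy` enum, and the `estimatedBenefit` and `minBenefit` parameters are all hypothetical names. It also leaves open the question raised in the thread, namely which variables to re-partition on when ALWAYS is chosen without usable stats.

```java
import java.util.OptionalDouble;

// Hypothetical sketch: none of these names come from the PR.
final class ExchangeBelowGroupIdDecision
{
    // Assumed tri-state setting, mirroring the ALWAYS / COST_BASED / NEVER suggestion.
    enum Strategy
    {
        ALWAYS, COST_BASED, NEVER
    }

    private ExchangeBelowGroupIdDecision() {}

    // estimatedBenefit is empty when no stats are available (assumption).
    static boolean shouldAddExchange(Strategy strategy, OptionalDouble estimatedBenefit, double minBenefit)
    {
        switch (strategy) {
            case ALWAYS:
                // Fires regardless of stats; choosing the partitioning columns is left unresolved here.
                return true;
            case NEVER:
                return false;
            case COST_BASED:
                // Without stats the cost-based mode declines to fire.
                return estimatedBenefit.isPresent() && estimatedBenefit.getAsDouble() >= minBenefit;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

With a boolean flag only the COST_BASED and NEVER branches exist, which matches the decision to leave the property as-is until a use case for ALWAYS arises.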

}

@Test
public void testAddExchangesWithoutProjection()
Contributor

what about a withProjection test? Also tests that the rule doesn't fire if it's disabled, if there is only one grouping set, or if there are pass-through keys.

Contributor Author

'only has one grouping set': added with this test

withProjection, does not fire if disabled -> will add

only has one grouping set, has pass through keys -> I could not build a use case where this occurs, and I'm unclear on when it could occur. Can you help me out with an example?

Contributor

sorry, i didn't mean one grouping set with pass through keys, i meant the case covered by https://github.com/prestodb/presto/pull/24047/files#diff-2e788a27c31ea3e4d5d404dba18eee33a3868d656bae3b0c0380269afb838f24, so multiple grouping sets that all share some common keys.
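
To make the "multiple grouping sets that all share some common keys" case concrete, here is a small sketch that computes the keys present in every grouping set. It is an illustration only: `GroupingSetKeys` and `commonKeys` are hypothetical names, grouping keys are modeled as plain strings rather than planner variables, and nothing here is taken from the rule's implementation. The thread suggests the rule is expected not to fire when this intersection is non-empty.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models grouping keys as strings for illustration.
final class GroupingSetKeys
{
    private GroupingSetKeys() {}

    // Returns the keys that appear in every grouping set (their intersection).
    static Set<String> commonKeys(List<List<String>> groupingSets)
    {
        if (groupingSets.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> common = new LinkedHashSet<>(groupingSets.get(0));
        for (List<String> groupingSet : groupingSets.subList(1, groupingSets.size())) {
            common.retainAll(groupingSet);
        }
        return common;
    }
}
```

For example, grouping sets `(a, b)` and `(a, c)` share the key `a`, whereas a rollup-style list that includes the empty grouping set has no common keys.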

Contributor Author

@rschlussel I think I've covered all the test cases now. Please take a look

@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 437a09a to 979d204 Compare February 3, 2025 12:47
@steveburnett
Contributor

Thanks for the release note entry! Minor formatting nits, and include the PR number.

== RELEASE NOTES ==

General Changes
* Add a new optimizer rule to add exchanges below a combination of partial aggregation + GroupId. Enabled with the boolean session property ``enable_forced_exchange_below_group_id``.

The minor formatting nits still apply. Under the new release note guidelines as of last week, PR #24354 automatically adds links to this PR to the release notes, so please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@aaneja
Contributor Author

aaneja commented Feb 4, 2025

@steveburnett Thanks for the feedback! We're still working through some naming+defaults for the new session/feature flags. I will update the release notes + optimizer docs once we close on this

@aaneja aaneja force-pushed the groupIdExchangeOptimization branch 2 times, most recently from 0874d82 to f95fab9 Compare February 4, 2025 06:13
ZacBlanco previously approved these changes Feb 4, 2025
Contributor

@ZacBlanco ZacBlanco left a comment

lgtm. one minor thing

@aaneja aaneja force-pushed the groupIdExchangeOptimization branch 4 times, most recently from e196f21 to 74178b4 Compare February 7, 2025 13:21
@@ -81,7 +81,9 @@ public GroupIdNode(
{
super(sourceLocation, id, statsEquivalentPlanNode);
this.source = requireNonNull(source);
this.groupingSets = listOfListsCopy(requireNonNull(groupingSets, "groupingSets is null"));
checkArgument(requireNonNull(groupingSets, "groupingSets is null").size() > 1,
Contributor Author

I noticed that we create a GroupId node only if the number of grouping sets is > 1, so adding this invariant here made sense. No tests failed, so IMO this should be added. cc: @rschlussel
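
As a standalone illustration of this kind of constructor invariant, the sketch below rejects fewer than two grouping sets before making a defensive copy. It is not the `GroupIdNode` code: `GroupingSetsHolder` is a hypothetical class, keys are modeled as strings, and plain JDK checks stand in for the `checkArgument` and `listOfListsCopy` helpers seen in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a node that requires more than one grouping set.
final class GroupingSetsHolder
{
    private final List<List<String>> groupingSets;

    GroupingSetsHolder(List<List<String>> groupingSets)
    {
        Objects.requireNonNull(groupingSets, "groupingSets is null");
        // Invariant: a single grouping set needs no GroupId-style expansion.
        if (groupingSets.size() <= 1) {
            throw new IllegalArgumentException("Expected more than one grouping set, got " + groupingSets.size());
        }
        // Defensive deep copy so later mutation of the argument cannot break the invariant.
        List<List<String>> copy = new ArrayList<>();
        for (List<String> groupingSet : groupingSets) {
            copy.add(Collections.unmodifiableList(new ArrayList<>(groupingSet)));
        }
        this.groupingSets = Collections.unmodifiableList(copy);
    }

    List<List<String>> getGroupingSets()
    {
        return groupingSets;
    }
}
```

Validating before copying keeps the null check in one place; whether that ordering suits the real constructor is a judgment call for the PR.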

Based on: trinodb/trino@dc1d66fb
co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Based on: trinodb/trino@c573b34
co-authored-by: Lukasz Stec <lukasz.stec@starburstdata.com>
Based on: trinodb/trino@29328d3
co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
@aaneja aaneja force-pushed the groupIdExchangeOptimization branch from 74178b4 to 84cafec Compare February 7, 2025 15:18
Labels
from:IBM PR from IBM
5 participants