Remove redundant distinct over group by #18512

Merged: kaikalur merged 1 commit into prestodb:master from remove_distinct_over_group_by on Oct 21, 2022

@feilong-liu feilong-liu commented Oct 18, 2022

What's the change?

Add an optimization that removes DISTINCT when the corresponding output is already distinct after a GROUP BY operation.

An example is the query "SELECT DISTINCT orderpriority, SUM(totalprice) FROM orders GROUP BY orderpriority", where the DISTINCT operation is redundant.
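
To illustrate why the DISTINCT in that example does no work, here is a minimal in-memory sketch. It does not use Presto plan nodes: the class `GroupByDistinctSketch`, its helper names, and the sample rows are assumptions made up for this illustration. It emulates the GROUP BY, then applies a DISTINCT on top and checks that the rows are unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: rows are modeled as (key, value) entries, not as Presto plan nodes.
final class GroupByDistinctSketch {
    // Emulates: SELECT orderpriority, SUM(totalprice) FROM orders GROUP BY orderpriority.
    // Each input entry is (orderpriority, totalprice); each output entry is (orderpriority, sum).
    static List<Map.Entry<String, Double>> groupBySum(List<Map.Entry<String, Double>> orders) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Double> order : orders) {
            sums.merge(order.getKey(), order.getValue(), Double::sum);
        }
        List<Map.Entry<String, Double>> rows = new ArrayList<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            rows.add(Map.entry(e.getKey(), e.getValue()));
        }
        return rows;
    }

    // Emulates SELECT DISTINCT over the grouped rows, preserving first-seen order.
    static List<Map.Entry<String, Double>> distinct(List<Map.Entry<String, Double>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the orders table.
        List<Map.Entry<String, Double>> orders = List.of(
                Map.entry("1-URGENT", 10.0),
                Map.entry("2-HIGH", 5.0),
                Map.entry("1-URGENT", 2.5),
                Map.entry("2-HIGH", 5.0));
        List<Map.Entry<String, Double>> grouped = groupBySum(orders);
        // Each grouping key appears exactly once, so DISTINCT leaves the rows unchanged.
        System.out.println(grouped.equals(distinct(grouped)));
    }
}
```

Because every output row carries a different grouping key, no two rows can be equal, whatever the aggregate values are.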

Test plan - (Please fill in how you tested your changes)

Add unit test.

Benchmark results

SQL query: select distinct orderkey, partkey, suppkey, avg(extendedprice) from lineitem group by orderkey, partkey, suppkey.
Control: 100.324 cpu ms
Test: 71.382 cpu ms
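
As a cross-check of the two summary figures, the sketch below averages the ten `cpu_nanos` values logged for each configuration and derives the relative CPU reduction. The class and method names are hypothetical; only the numbers come from the benchmark log.

```java
import java.util.Arrays;

// Illustrative arithmetic over the logged cpu_nanos values; not part of the PR.
final class BenchmarkSummary {
    // cpu_nanos from the ten runs without the optimization.
    static final long[] CONTROL_CPU_NANOS = {
        106_122_000L, 116_225_000L, 103_269_000L, 98_592_000L, 98_834_000L,
        101_054_000L, 97_290_000L, 97_041_000L, 94_008_000L, 90_805_000L
    };
    // cpu_nanos from the ten runs with the optimization.
    static final long[] TEST_CPU_NANOS = {
        69_991_000L, 69_767_000L, 69_947_000L, 71_003_000L, 68_328_000L,
        67_286_000L, 67_608_000L, 76_142_000L, 80_436_000L, 73_309_000L
    };

    // Mean CPU time in milliseconds (NaN for an empty array).
    static double meanCpuMillis(long[] cpuNanos) {
        return Arrays.stream(cpuNanos).average().orElse(Double.NaN) / 1_000_000.0;
    }

    // Fraction of CPU time saved relative to the control run.
    static double relativeReduction(double controlMillis, double testMillis) {
        return (controlMillis - testMillis) / controlMillis;
    }

    public static void main(String[] args) {
        double control = meanCpuMillis(CONTROL_CPU_NANOS);
        double test = meanCpuMillis(TEST_CPU_NANOS);
        System.out.printf("control=%.3f ms, test=%.3f ms, reduction=%.1f%%%n",
                control, test, 100 * relativeReduction(control, test));
    }
}
```

The means match the reported 100.324 and 71.382 cpu ms, which works out to roughly 29% less CPU time for this query.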

INFO: Without optimization
peak_memory:14040686,elapsed_millis:107,input_rows_per_second:563343,output_rows_per_second:563222,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:106817513,cpu_nanos:106122000,user_nanos:105276000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:121,input_rows_per_second:496354,output_rows_per_second:496247,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:121233823,cpu_nanos:116225000,user_nanos:112490000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:104,input_rows_per_second:577529,output_rows_per_second:577405,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:104193760,cpu_nanos:103269000,user_nanos:102787000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:99,input_rows_per_second:608323,output_rows_per_second:608191,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:98919466,cpu_nanos:98592000,user_nanos:98192000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:103,input_rows_per_second:586097,output_rows_per_second:585970,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:102670618,cpu_nanos:98834000,user_nanos:98230000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:226,input_rows_per_second:265874,output_rows_per_second:265816,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:226328984,cpu_nanos:101054000,user_nanos:100322000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:99,input_rows_per_second:608878,output_rows_per_second:608746,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:98829296,cpu_nanos:97290000,user_nanos:96909000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:97,input_rows_per_second:618267,output_rows_per_second:618134,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:97328389,cpu_nanos:97041000,user_nanos:96738000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:94,input_rows_per_second:639660,output_rows_per_second:639522,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:94073333,cpu_nanos:94008000,user_nanos:93760000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:14040686,elapsed_millis:91,input_rows_per_second:660615,output_rows_per_second:660472,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:91089298,cpu_nanos:90805000,user_nanos:90014000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
remove_redundant_distinct_aggregation ::  100.324 cpu ms :: 13.4MB peak memory :: in 60.2K,      0B,    600K/s,      0B/s :: out 60.2K,  2.07MB,    600K/s,  20.6MB/s

INFO: With optimization
peak_memory:7632508,elapsed_millis:74,input_rows_per_second:812957,output_rows_per_second:812781,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:74019902,cpu_nanos:69991000,user_nanos:69738000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:70,input_rows_per_second:853687,output_rows_per_second:853503,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:70488303,cpu_nanos:69767000,user_nanos:69512000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:70,input_rows_per_second:856021,output_rows_per_second:855836,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:70296139,cpu_nanos:69947000,user_nanos:69751000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:72,input_rows_per_second:839667,output_rows_per_second:839485,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:71665290,cpu_nanos:71003000,user_nanos:70721000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:69,input_rows_per_second:877072,output_rows_per_second:876882,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:68608956,cpu_nanos:68328000,user_nanos:67990000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:67,input_rows_per_second:894244,output_rows_per_second:894050,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:67291472,cpu_nanos:67286000,user_nanos:67140000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:68,input_rows_per_second:887679,output_rows_per_second:887487,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:67789139,cpu_nanos:67608000,user_nanos:67258000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:77,input_rows_per_second:783050,output_rows_per_second:782881,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:76846899,cpu_nanos:76142000,user_nanos:74259000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:85,input_rows_per_second:710566,output_rows_per_second:710412,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:84686002,cpu_nanos:80436000,user_nanos:76732000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
peak_memory:7632508,elapsed_millis:74,input_rows_per_second:815461,output_rows_per_second:815285,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:73792574,cpu_nanos:73309000,user_nanos:72923000,input_rows:60175,input_bytes:0,output_rows:60162,output_bytes:2165832
remove_redundant_distinct_aggregation ::   71.382 cpu ms :: 7.28MB peak memory :: in 60.2K,      0B,    843K/s,      0B/s :: out 60.2K,  2.07MB,    843K/s,  28.9MB/s
== RELEASE NOTES ==

General Changes
* Add an optimization that removes a redundant DISTINCT if the output is already distinct after a GROUP BY operation.
   The optimization is controlled by the session property `remove_redundant_distinct_aggregation`, which defaults to false.

@feilong-liu feilong-liu requested a review from a team as a code owner October 18, 2022 05:02
@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch 2 times, most recently from ab20709 to e20e813 Compare October 18, 2022 05:24
@feilong-liu feilong-liu requested a review from kaikalur October 18, 2022 05:27
@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch from e20e813 to bcb24f9 Compare October 18, 2022 05:56
@@ -238,6 +238,7 @@
private String nativeExecutionExecutablePath = "./presto_server";
private boolean randomizeOuterJoinNullKey;
private boolean isOptimizeConditionalAggregationEnabled;
private boolean isRemoveRedundantDistinctAggregationEnabled;
Set default to true

@kaikalur kaikalur requested a review from rschlussel October 18, 2022 14:49
@kaikalur
Let's make the default true, as this is a safe and general optimization.

@ClarenceThreepwood
Is this functionally different from the existing rule RemoveRedundantDistinct?

@kaikalur
Interesting. I thought that rule works from uniqueness constraints, whereas this one uses the plan properties. If/when we unify these two concepts, we can get rid of one of them.

@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch from bcb24f9 to bec5eea Compare October 18, 2022 19:32
@kaikalur kaikalur requested a review from highker October 18, 2022 21:02
@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch from bec5eea to a080ac3 Compare October 18, 2022 21:26
@highker highker left a comment

please rebase as well

Comment on lines 454 to 456
new PruneRedundantProjectionAssignments(),
new InlineProjections(metadata.getFunctionAndTypeManager()),
new RemoveRedundantIdentityProjections())),
Same again: do we really need to run these again? Or can we just merge RemoveRedundantDistinctAggregation into the above rule?

Always keep in mind that running the optimizer is costly.

Moved this optimizer rule and got rid of these additional projection rules.

@@ -238,6 +238,7 @@
private String nativeExecutionExecutablePath = "./presto_server";
private boolean randomizeOuterJoinNullKey;
private boolean isOptimizeConditionalAggregationEnabled;
private boolean isRemoveRedundantDistinctAggregationEnabled = true;
are we sure we want to enable it by default?

Yes, this is a very general optimization that should always help. We are being very conservative and are also adding more tests to make sure it works for different patterns.

My main worry is that if there is a bug in the code, it would be a big pain to fix in prod. If we have high confidence in full correctness from the verifiers, I'm also OK either way.

@kaikalur kaikalur Oct 19, 2022

Yes - we will do some targeted verifier runs as the pattern is relatively easy to look for in the logs.

The main issue is that if we add something turned off, we rarely turn it on; e.g., optimize_nulls_in_join has been there for ~2 years but we never turned it on.

Yeah, I will run the verifier tests and report here.

I got about 40 queries that trigger this optimization, with most of them showing about a 20% reduction in CPU time.

}
}

private class Rewriter
static

Done.

@highker highker self-assigned this Oct 19, 2022
@kaikalur
@feilong-liu like we discussed yesterday, add more tests to the harness:

select distinct x, random() from (.. group by x)
select distinct x from (... group by x, y)
select distinct x+1 as x from (... group by x)
select distinct x from (.. group by x) AS T1 join T2 using(x)
select distinct x from (.. group by x) AS T1 join T2 using(y)
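
One conservative way to reason about the simplest of these patterns is sketched below. This is not the logic from this PR: `DistinctRedundancyCheck` and `isRedundant` are hypothetical names, and the sketch assumes the DISTINCT sits directly on top of the aggregation and that columns are passed through unchanged. Under those assumptions, the DISTINCT is redundant when its columns include every grouping key.

```java
import java.util.Set;

// Hypothetical helper for illustration; not the rule implemented in this PR.
final class DistinctRedundancyCheck {
    // Assumes DISTINCT sits directly above the aggregation and that each column name refers to
    // an unchanged output column of that aggregation. If every grouping key is among the
    // DISTINCT columns, each row is already unique and the DISTINCT can be dropped.
    static boolean isRedundant(Set<String> distinctColumns, Set<String> groupingKeys) {
        return distinctColumns.containsAll(groupingKeys);
    }

    public static void main(String[] args) {
        // select distinct x from (... group by x): the key x is kept, so DISTINCT is redundant.
        System.out.println(isRedundant(Set.of("x"), Set.of("x")));
        // select distinct x from (... group by x, y): y is dropped, so x may repeat.
        System.out.println(isRedundant(Set.of("x"), Set.of("x", "y")));
        // select distinct x, random() from (... group by x): extra columns cannot create duplicates.
        System.out.println(isRedundant(Set.of("x", "random"), Set.of("x")));
    }
}
```

Derived expressions such as `x+1` and intervening joins are outside what this sketch models. A join can duplicate rows, so a check like this would have to treat those cases as not redundant unless more is known about the plan.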

@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch from a080ac3 to 463c909 Compare October 20, 2022 05:31
@highker
highker commented Oct 20, 2022

Can we rebase and push?

@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch 5 times, most recently from 4fbf590 to 2cc71b9 Compare October 20, 2022 20:35
@simmend
simmend commented Oct 20, 2022

Excuse me, but this problem is already handled in a very general way by #16416. If there are specific use cases that are not covered by that implementation, or if you have difficulty understanding it, please shoot me an email at dave@ahana.io and I will be happy to discuss. Please close this PR in the meantime. Thanks.

@simmend
simmend commented Oct 20, 2022

Quick addendum to the last comment. The work of #16416 is disabled by default (it shouldn't be, but that was the only way to get reviewers to approve it). If you enable it, you will likely see that the existing rule RemoveRedundantDistinct has already done the job. In fact, there is another rule, RemoveRedundantAggregateDistinct, that removes the distinct specification from aggregate functions if it is based on preexisting keys or a provable max cardinality of 1. I would be thrilled if you would extend this implementation to cover any additional use cases. Again, I am available to discuss at dave@ahana.io.

Remove distinct if the corresponding output is already distinct after a
group by operation.
@feilong-liu feilong-liu force-pushed the remove_distinct_over_group_by branch from 2cc71b9 to bd1aec4 Compare October 21, 2022 18:35
@highker highker removed the request for review from rschlussel October 21, 2022 18:35
@highker highker removed their assignment Oct 21, 2022
@kaikalur
Yes, we should definitely look into integrating these properties, but we currently don't have the bandwidth to test it, as the constraints PR is a rather big one that touches a lot of parts (which is also the reason it was merged disabled by default). So for now, we will get this PR in and look into unifying these things later. Tracking issue: #18547

@kaikalur kaikalur merged commit 87f543a into prestodb:master Oct 21, 2022
@wanglinsong wanglinsong mentioned this pull request Jan 12, 2023