Set filter_and_project_min_output_page_row_count to 1 for final DistinctLimit #17640

kaikalur · 2022-04-13T21:27:44Z

Addressing issue:

For the specific case of DistinctLimit.

Test plan - tests already exist

== RELEASE NOTES ==
General Changes
* Added a new optimization that shows results of (interactive) distinct limit N queries as soon as they become available, without buffering. This can be enabled by setting the session property quick_distinct_limit_enabled to true.

rschlussel · 2022-04-14T16:29:51Z

What's the effect on queries that have lots of distinct values (which didn't end up waiting a long time for results before)? Also, what happens if the distinct limit isn't at the end of the query, or the query writes to a table, so the project isn't flushing straight to output?

kaikalur · 2022-04-14T16:51:23Z

It should still be fine because it's the final step and the limit is generally low. Those are corner cases, and they have other issues.

As for lots of distinct values, our RR shuffle fixes that part. Note that we are doing this only for the FINAL distinct limit.

kaikalur · 2022-04-19T14:08:17Z

Also, the issue we had was not so much about having too many distinct values; it's about the limit. When the limit is greater than the number of distinct values, the query was getting stuck. When the limit is lower than the number of distinct values, it's fast, and that behavior hasn't changed, as we simply short-circuit the distinct limit.

kaikalur · 2022-04-21T15:26:54Z

Ping - anything else needed?

rschlussel · 2022-04-21T15:55:32Z

Yeah, I understand how this fixes the limit issue. My concern is that when there are lots of distinct values (where it would have worked fine before), you can now end up creating lots of pages. Can we gate this with the distinct limit session property in case things go wrong?

kaikalur · 2022-04-21T16:47:12Z

Note that this is completely independent of the total number of distinct values. It will produce at most N pages if the LIMIT is N, as we do this only for the FINAL distinct limit. But sure, I can rename and use the previous flag.
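To make the page-count argument concrete, here is a minimal, self-contained Java toy model. It is a sketch under stated assumptions, not Presto's operator code: the class FinalDistinctLimitSketch and its run method are hypothetical, the FINAL distinct limit and the downstream filter/project buffering are collapsed into one loop, and minOutputPageRowCount stands in for filter_and_project_min_output_page_row_count. With a threshold of 1, every new distinct value becomes its own page, so LIMIT N yields at most N pages no matter how many distinct values the input has. With a large threshold and a limit larger than the number of distinct values, nothing is emitted until the whole input has been consumed, which is the "stuck" behavior described earlier in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FinalDistinctLimitSketch {
    // Hypothetical model: a FINAL distinct limit feeding an operator that only
    // emits a page once it has buffered minOutputPageRowCount rows.
    static List<List<Integer>> run(int[] input, int limit, int minOutputPageRowCount) {
        Set<Integer> seen = new LinkedHashSet<>();
        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (int value : input) {
            if (seen.size() >= limit) {
                break; // limit reached: short-circuit and stop reading input
            }
            if (seen.add(value)) { // only new distinct values are passed downstream
                buffer.add(value);
                if (buffer.size() >= minOutputPageRowCount) {
                    pages.add(buffer); // emit a page as soon as the threshold is met
                    buffer = new ArrayList<>();
                }
            }
        }
        if (!buffer.isEmpty()) {
            pages.add(buffer); // leftover rows are emitted only when the operator finishes
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] input = {1, 1, 2, 3, 3, 4, 5, 6, 7};
        System.out.println(run(input, 3, 1));    // [[1], [2], [3]]: one page per value, at most 3 pages
        System.out.println(run(input, 3, 1024)); // [[1, 2, 3]]
        // Limit larger than the number of distinct values: with a large threshold,
        // the only page appears after all input has been consumed.
        System.out.println(run(new int[] {1, 2, 1, 2}, 10, 1024)); // [[1, 2]]
    }
}
```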

kaikalur · 2022-04-21T17:07:16Z

Done.

I renamed the previous flag we were using for the RR shuffle to quick_distinct_limit_enabled so we can gate these two together. Please take a look.
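A hypothetical sketch of the gating described above, assuming the override is a simple per-operator decision: when quick_distinct_limit_enabled is true and the operator sits above the FINAL distinct limit, the minimum output page row count is forced to 1; otherwise the configured filter_and_project_min_output_page_row_count value is used. The class and method names are made up for illustration and do not reflect the actual Presto change, and 256 is an arbitrary example value, not the property's default.

```java
final class MinOutputPageRowCountSketch {
    // Hypothetical helper: picks the minimum number of rows a filter/project
    // operator buffers before emitting an output page.
    static int effectiveMinOutputPageRowCount(
            boolean quickDistinctLimitEnabled, // session flag named in this PR
            boolean isFinalDistinctLimit,      // true only above the FINAL (not PARTIAL) step
            int configuredMinRowCount) {       // stands in for filter_and_project_min_output_page_row_count
        if (quickDistinctLimitEnabled && isFinalDistinctLimit) {
            return 1; // flush every row immediately so results stream to the client
        }
        return configuredMinRowCount; // otherwise keep the normal batching behavior
    }

    public static void main(String[] args) {
        System.out.println(effectiveMinOutputPageRowCount(true, true, 256));  // 1
        System.out.println(effectiveMinOutputPageRowCount(false, true, 256)); // 256
        System.out.println(effectiveMinOutputPageRowCount(true, false, 256)); // 256
    }
}
```

Gating on the session flag means that if the per-row flushing causes problems, turning quick_distinct_limit_enabled off restores the previous batching behavior.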

Often, when specifying the join condition in a JOIN ON clause, users make mistakes and do not reference the joined table alongside the other table in the join condition, which results in a performance issue because the conditional join becomes a CROSS JOIN. This change reduces the number of cases in which the user is shown a PERFORMANCE_WARNING even though the JOIN ON clause is actually correct. See also: prestodb#17333, prestodb#17640

kaikalur requested review from mbasmanova and rschlussel April 13, 2022 21:27

kaikalur force-pushed the fix_distinct_limit branch from 2f04de2 to 166f3e3 April 21, 2022 17:06

rschlussel approved these changes Apr 21, 2022

kaikalur force-pushed the fix_distinct_limit branch from 166f3e3 to fa56678 April 21, 2022 17:35

Adjust filter_and_project_min_output_page_row_count to 1 for final DistinctLimit (2fe7ca1)

kaikalur force-pushed the fix_distinct_limit branch from fa56678 to 2fe7ca1 April 21, 2022 17:43

rschlussel merged commit d5290b5 into prestodb:master Apr 25, 2022

rohanpednekar mentioned this pull request May 5, 2022

Default filter_and_project_min_output_page_row_count is too high for some environments #17631

Closed

mshang816 mentioned this pull request May 17, 2022

Add release notes for 0.273 #17775

Merged

rmarduga mentioned this pull request Jul 15, 2022

Fix false positives in PERFORMANCE_WARNING for JOIN ON clause #18041

Merged

kaikalur mentioned this pull request Dec 1, 2022

Remove unused code #18749

Merged
