-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set filter_and_project_min_output_page_row_count to 1 for final DistinctLimit #17640
Conversation
what's the effect on queries that have lots of distinct values (didn't end up waiting a long time for results before)? Also, if the distinct limit isn't at the end of the query or it writes to a table, so the project isn't flushing straight to output. |
Should be fine still because it's final step and the limit is generally low. Those are corner cases and they have other issues. As for lot of distinct values, our RR shuffle fixes that part. Note that we are doing this for the FINAL distinct limit. |
Also the issue we had with was not so much about having too many distinct values - it's about the limit - when the limit is greater than the number of distinct values, it was getting stuck. When the limit is lower than the distinct values, it's fast and that behavior hasn't changed as we simply short circuit the distinct limit. |
Ping - anything else needed? |
Yeah, i understand how this fixes the limit issue. My concern is when there are a lot of distinct values (where it would have worked fine before) you can now end up creating lots of pages. Can we gate this with the distinct limit session property in case things go wrong? |
Note that this is completely indepedent of the total number of distinct values. It will have at most N pages if the LIMIT is N as we do this for only the FINAL distinctlimit. But sure I can rename and use the previous flag. |
2f04de2
to
166f3e3
Compare
Done. I renamed the previous flag we were using for the RR shuffle as quick_distinct_limit_enabled so we can do these two together. Please take a look. |
166f3e3
to
fa56678
Compare
fa56678
to
2fe7ca1
Compare
Often times, while specifying the joining condition in JOIN ON clause, users make mistakes and not referencing the joining table along side with other table in join condition that result in performance issue due to conditional join resulting in CROSS JOIN. This change will reduce the number of cases when user is shown the PERFORMANCE_WARNING when JOIN ON clause is actually correct. See also: prestodb#17333, prestodb#17640
Often times, while specifying the joining condition in JOIN ON clause, users make mistakes and not referencing the joining table along side with other table in join condition that result in performance issue due to conditional join resulting in CROSS JOIN. This change will reduce the number of cases when user is shown the PERFORMANCE_WARNING when JOIN ON clause is actually correct. See also: #17333, #17640
Often times, while specifying the joining condition in JOIN ON clause, users make mistakes and not referencing the joining table along side with other table in join condition that result in performance issue due to conditional join resulting in CROSS JOIN. This change will reduce the number of cases when user is shown the PERFORMANCE_WARNING when JOIN ON clause is actually correct. See also: prestodb#17333, prestodb#17640
Addressing issue:
#17631
For specific case of DistinctLimit
Test plan - tests already exist