A common ad-hoc/exploratory query is:
SELECT DISTINCT c1, c2 .. FROM table WHERE ... LIMIT N;
But due to optimize_hash_generation, the plan adds projections for (a) hash computation in the scan step and (b) a project for the results after the final distinct. If we disable hash generation, the plan is simply a distributed scan plus a hash table with no blocking operations in the middle, so results show up quicker (even if the query doesn't complete). This makes the query very useful, as users can see mostly complete results early. Often the limit is far higher than the actual number of distinct values, which makes this especially useful.
So I think we should disable optimizing hash generation for DistinctLimit.
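As a rough way to compare the two plan shapes described above, the optimization can be turned off for a single session and the plan inspected with EXPLAIN. This is only a sketch: it assumes `optimize_hash_generation` is settable as a session property under that name, and `my_table`, `c3`, and the limit value are hypothetical placeholders rather than anything from a real schema.

```sql
-- Assumption: optimize_hash_generation can be set per session.
-- my_table, c1, c2, c3 and the limit are hypothetical placeholders.
SET SESSION optimize_hash_generation = false;

-- Inspect the plan; with hash generation off, the extra hash
-- projections around the distinct should no longer appear.
EXPLAIN
SELECT DISTINCT c1, c2
FROM my_table
WHERE c3 = 'x'
LIMIT 1000;
```

Running the same EXPLAIN with the property set back to `true` should show the additional projections for comparison.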
CC: @mbasmanova @rongrong @nlaptev