feat: Stageify plan on shuffle boundaries #3781
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Refactor the current adaptive planner to stageify the plan on, works as follows:
Essentially, the logical boundaries are joins (these are the only places where data size affects the planning decision) and the physical boundaries are shuffles.
Results of TPCH SF1000, 4 nodes
After:
Daft Q1 took 31.26 seconds
Daft Q2 took 35.46 seconds
Daft Q3 took 53.77 seconds
Daft Q4 took 19.98 seconds
Daft Q5 took 206.01 seconds
Daft Q6 took 10.50 seconds
Daft Q7 took 91.34 seconds
Daft Q8 took 142.85 seconds
Daft Q9 took 271.16 seconds
Daft Q10 took 53.28 seconds
Total time: 918.28 seconds
Spilled 2194637 MiB
Before:
Q1 took 31.05 seconds
Q2 took 24.95 seconds
Q3 took 50.91 seconds
Q4 took 24.11 seconds
Q5 took 177.07 seconds
Q6 took 11.17 seconds
Q7 took 75.97 seconds
Q8 took 150.76 seconds
Q9 took 263.51 seconds
Q10 took 59.37 seconds
Total: 868.87 seconds
Spilled 2200948 MiB