Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Stageify plan on shuffle boundaries #3781

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Feb 7, 2025

Refactor the current adaptive planner to stageify the plan on, works as follows:

  1. Translate logical plan until the root is reached or we hit a point where materialized data can better inform the planning decision, i.e. a join.
  2. Split the translated physical plan on shuffle exchanges, first emitting the plan until the exchange, then the exchange, then repeat until the physical plan is complete.
  3. Optimize the logical plan and repeat from step 1.

Essentially, the logical boundaries are joins (these are the only places where data size affects the planning decision) and the physical boundaries are shuffles.

Results of TPCH SF1000, 4 nodes

After:

Daft Q1 took 31.26 seconds
Daft Q2 took 35.46 seconds
Daft Q3 took 53.77 seconds
Daft Q4 took 19.98 seconds
Daft Q5 took 206.01 seconds
Daft Q6 took 10.50 seconds
Daft Q7 took 91.34 seconds
Daft Q8 took 142.85 seconds
Daft Q9 took 271.16 seconds
Daft Q10 took 53.28 seconds
Total time: 918.28 seconds
Spilled 2194637 MiB

Before:

Q1 took 31.05 seconds
Q2 took 24.95 seconds
Q3 took 50.91 seconds
Q4 took 24.11 seconds
Q5 took 177.07 seconds
Q6 took 11.17 seconds
Q7 took 75.97 seconds
Q8 took 150.76 seconds
Q9 took 263.51 seconds
Q10 took 59.37 seconds
Total: 868.87 seconds
Spilled 2200948 MiB

@github-actions github-actions bot added the feat label Feb 7, 2025
Copy link

codspeed-hq bot commented Feb 7, 2025

CodSpeed Performance Report

Merging #3781 will degrade performances by 16.45%

Comparing colin/stageify (0f51cc7) with main (e6db43b)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_iter_rows_first_row[100 Small Files] 152.5 ms 182.5 ms -16.45%

Copy link

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 12.17712% with 238 lines in your changes missing coverage. Please review.

Project coverage is 77.75%. Comparing base (f9a4b70) to head (0f51cc7).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...daft-physical-plan/src/physical_planner/planner.rs 1.16% 169 Missing ⚠️
src/daft-physical-plan/src/ops/placeholder.rs 0.00% 28 Missing ⚠️
...rc/daft-logical-plan/src/optimization/optimizer.rs 0.00% 13 Missing ⚠️
src/daft-physical-plan/src/plan.rs 9.09% 10 Missing ⚠️
src/daft-scheduler/src/adaptive.rs 0.00% 5 Missing ⚠️
daft/runners/pyrunner.py 0.00% 3 Missing ⚠️
daft/runners/ray_runner.py 0.00% 3 Missing ⚠️
...ft-physical-plan/src/physical_planner/translate.rs 90.00% 3 Missing ⚠️
daft/plan_scheduler/physical_plan_scheduler.py 50.00% 2 Missing ⚠️
src/daft-physical-plan/src/display.rs 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3781      +/-   ##
==========================================
+ Coverage   75.60%   77.75%   +2.14%     
==========================================
  Files         748      751       +3     
  Lines       99035    94901    -4134     
==========================================
- Hits        74875    73788    -1087     
+ Misses      24160    21113    -3047     
Files with missing lines Coverage Δ
src/daft-catalog/src/lib.rs 59.63% <ø> (-0.37%) ⬇️
src/daft-logical-plan/src/display.rs 97.97% <ø> (-0.05%) ⬇️
src/daft-logical-plan/src/ops/source.rs 77.27% <100.00%> (+8.04%) ⬆️
...lan/src/optimization/rules/eliminate_cross_join.rs 88.09% <ø> (ø)
...lan/src/optimization/rules/simplify_expressions.rs 92.06% <ø> (-0.13%) ⬇️
src/daft-logical-plan/src/source_info/mod.rs 70.96% <ø> (+2.21%) ⬆️
src/daft-physical-plan/src/ops/mod.rs 0.00% <ø> (ø)
...n/src/optimization/rules/reorder_partition_keys.rs 77.36% <ø> (ø)
src/daft-sql/src/lib.rs 100.00% <ø> (ø)
src/daft-physical-plan/src/display.rs 0.00% <0.00%> (ø)
... and 10 more

... and 37 files with indirect coverage changes

@colin-ho colin-ho requested a review from samster25 February 7, 2025 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant