perf(optimizer): Infer additional join graph edges during join reordering #3807

Open: wants to merge 2 commits into main

desmondcheongzx
Contributor

Our join ordering implementation is missing three pieces:

  1. Infer additional join graph edges from existing edges. For example, if we have (A.x join B.x) and (B.x join C.x), then we can infer (A.x join C.x). Additionally, in the absence of NDV stats, the total domain of these three columns (A.x, B.x, C.x) should be the minimum of |A|, |B|, and |C|.
  2. Change the total domain computation from total domain = |size of relation| to total domain = |size of relation| / (selectivity of relation). |size of relation| does not reflect the true total domain of a join, because filters reduce the size of the relation without affecting the selectivity of the join (assuming a primary key-foreign key join).
  3. Set a lower bound of 1 on the estimated number of rows per scan task. This prevents us from underestimating the number of rows for selective predicates.
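Item 1 amounts to computing equivalence classes of join columns. The Python sketch below is an illustration under stated assumptions, not the PR's Rust code: the `Column` tuple representation and the helper names `UnionFind`, `equivalence_classes`, `infer_edges`, and `total_domain` are hypothetical. It groups equi-joined columns with a union-find, emits an edge for every pair of columns from different relations in the same class, and, assuming no NDV stats are available, bounds each class's total domain by its smallest relation.

```python
from itertools import combinations

# Hypothetical representation (not from the PR): a column is a
# (relation, column) pair, and a join edge is a pair of equi-joined columns.
Column = tuple[str, str]


class UnionFind:
    """Union-find over columns, grouping columns that must hold equal values."""

    def __init__(self) -> None:
        self.parent: dict[Column, Column] = {}

    def find(self, c: Column) -> Column:
        self.parent.setdefault(c, c)
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]  # path halving
            c = self.parent[c]
        return c

    def union(self, a: Column, b: Column) -> None:
        self.parent[self.find(a)] = self.find(b)


def equivalence_classes(edges: list[tuple[Column, Column]]) -> list[list[Column]]:
    """Group columns connected (directly or transitively) by join edges."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    classes: dict[Column, list[Column]] = {}
    for c in list(uf.parent):
        classes.setdefault(uf.find(c), []).append(c)
    return [sorted(cols) for cols in classes.values()]


def infer_edges(edges: list[tuple[Column, Column]]) -> set[tuple[Column, Column]]:
    """Return every equi-join edge implied by transitivity, originals included."""
    inferred = set()
    for cols in equivalence_classes(edges):
        for a, b in combinations(cols, 2):
            if a[0] != b[0]:  # only join columns that belong to different relations
                inferred.add((a, b))
    return inferred


def total_domain(cols: list[Column], relation_sizes: dict[str, int]) -> int:
    """Assumed fallback without NDV stats: the smallest relation in the class."""
    return min(relation_sizes[rel] for rel, _ in cols)


edges = [(("A", "x"), ("B", "x")), (("B", "x"), ("C", "x"))]
sizes = {"A": 100, "B": 10, "C": 1000}
print(sorted(infer_edges(edges)))
# [(('A', 'x'), ('B', 'x')), (('A', 'x'), ('C', 'x')), (('B', 'x'), ('C', 'x'))]
print(total_domain(equivalence_classes(edges)[0], sizes))  # 10
```

A union-find keeps the inference close to linear in the number of edges, and enumerating pairs within each class yields the same edges as a transitive closure of the original join graph.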

Combining these three changes allows us to speed up TPC-H queries.
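Items 2 and 3 both adjust cardinality estimates. The sketch below assumes the common equi-join estimate |L| * |R| / total domain, which is an assumption here rather than something the PR confirms, and uses hypothetical helper names. It shows that dividing a filtered relation's size by its selectivity recovers the domain of a primary key, so a filter on the dimension side scales the join estimate down instead of leaving it unchanged. It also shows the floor of one row per scan task.

```python
# Hypothetical helpers illustrating items 2 and 3; the join formula is the
# standard |L| * |R| / domain estimate, assumed rather than taken from the PR.


def adjusted_total_domain(filtered_rows: float, selectivity: float) -> float:
    """Item 2: divide out the filter selectivity to recover the unfiltered domain."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    return filtered_rows / selectivity


def estimate_scan_task_rows(raw_estimate: float) -> float:
    """Item 3: never estimate fewer than one row for a scan task."""
    return max(raw_estimate, 1.0)


def estimate_join_rows(left_rows: float, right_rows: float, domain: float) -> float:
    """Estimate equi-join output size as |L| * |R| / total domain (assumed formula)."""
    return left_rows * right_rows / domain


# Primary key-foreign key join: a 1,000-row dimension table filtered to 25%
# of its rows, joined with a 10,000-row fact table.
dim_rows, selectivity, fact_rows = 1_000, 0.25, 10_000
filtered_dim = dim_rows * selectivity  # 250.0

old = estimate_join_rows(filtered_dim, fact_rows, filtered_dim)  # 10000.0
new = estimate_join_rows(
    filtered_dim, fact_rows, adjusted_total_domain(filtered_dim, selectivity)
)  # 2500.0
print(old, new)  # 10000.0 2500.0
print(estimate_scan_task_rows(0.02))  # 1.0
```

Using the filtered size as the domain predicts that every fact row survives the join, while the adjusted domain of 1,000 predicts about a quarter of them, which is what a uniform primary key-foreign key join with a 25% dimension filter would produce.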

@github-actions github-actions bot added the perf label Feb 14, 2025
@desmondcheongzx desmondcheongzx changed the title perf: Infer additional join graph edges during join reordering perf(optimizer): Infer additional join graph edges during join reordering Feb 14, 2025

codspeed-hq bot commented Feb 14, 2025

CodSpeed Performance Report

Merging #3807 will improve performance by 10.13%

Comparing desmondcheongzx:join-order-infer-edges (784adbe) with main (f9a4b70)

Summary

⚡ 1 improvement
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_count[1 Small File] | 3.9 ms | 3.6 ms | +10.13% |


codecov bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.90%. Comparing base (f9a4b70) to head (784adbe).

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3807      +/-   ##
==========================================
+ Coverage   75.60%   77.90%   +2.30%     
==========================================
  Files         748      748              
  Lines       99035    94933    -4102     
==========================================
- Hits        74875    73961     -914     
+ Misses      24160    20972    -3188     
| Files with missing lines | Coverage Δ |
|---|---|
| ...tion/rules/reorder_joins/brute_force_join_order.rs | 99.75% <100.00%> (+99.75%) ⬆️ |
| ...src/optimization/rules/reorder_joins/join_graph.rs | 90.48% <100.00%> (+90.48%) ⬆️ |
| .../rules/reorder_joins/naive_left_deep_join_order.rs | 94.78% <100.00%> (-0.87%) ⬇️ |
| src/daft-scan/src/lib.rs | 65.23% <100.00%> (+0.05%) ⬆️ |

... and 33 files with indirect coverage changes
