Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using AST on inner joins and avoid coalesce after nested loop join filter #3746

Merged
merged 3 commits into from
Oct 5, 2021

Conversation

jlowe
Copy link
Contributor

@jlowe jlowe commented Oct 4, 2021

Fixes #3736.

@abellina and I looked into the nested loop join performance degradation and there were primarily two issues:

  1. Computing the AST during the join was particularly expensive, resulting in approximately 4X performance improvement for the microbenchmark nested loop join if we revert back to generating a full cross join output and then post-filtering the results.
  2. The GpuFilterExec extracted from the join in Add filter in query plan for conditional nested loop and cartesian joins #3105 was causing a subsequent batch coalesce to occur that wasn't there before. Traces showed contiguous_split was quite expensive on the post-join output for this query. Updating the post-join filter to not coalesce batches resulted in another approximate 4X speedup from the previous change.

This updates nested loop joins to avoid using the AST if a post-filter can be used and also updates the post-filter injection code for joins to specify no coalesce after goal on the filter node.

@jlowe jlowe added the performance A performance related task/issue label Oct 4, 2021
@jlowe jlowe added this to the Oct 4 - Oct 15 milestone Oct 4, 2021
@jlowe jlowe self-assigned this Oct 4, 2021
@jlowe
Copy link
Contributor Author

jlowe commented Oct 4, 2021

build

tgravescs
tgravescs previously approved these changes Oct 4, 2021
abellina
abellina previously approved these changes Oct 4, 2021
@jlowe jlowe changed the title Avoid using AST on inner joins and avoid coalesce after post-join filter Avoid using AST on inner joins and avoid coalesce after post-join filter [databricks] Oct 4, 2021
@jlowe
Copy link
Contributor Author

jlowe commented Oct 4, 2021

build

@jlowe jlowe marked this pull request as draft October 4, 2021 22:01
@jlowe
Copy link
Contributor Author

jlowe commented Oct 4, 2021

Marking this as a draft until NDS performance runs are complete to verify there are no regressions.

@jlowe jlowe dismissed stale reviews from abellina and tgravescs via cb20aee October 5, 2021 16:18
@jlowe jlowe changed the title Avoid using AST on inner joins and avoid coalesce after post-join filter [databricks] Avoid using AST on inner joins and avoid coalesce after nested loop join filter Oct 5, 2021
@jlowe
Copy link
Contributor Author

jlowe commented Oct 5, 2021

NDS saw a perf regression in q72, the coalesce after the post-join filter turned out to be crucial in that case. Refined this to only avoid the coalesce on a nested loop join due to the microbenchmark results, filed #3749 and #3750 to track fixing this in the general case.

@jlowe
Copy link
Contributor Author

jlowe commented Oct 5, 2021

build

@jlowe jlowe marked this pull request as ready for review October 5, 2021 17:46
@jlowe
Copy link
Contributor Author

jlowe commented Oct 5, 2021

NDS and cross join microbenchmark results on latest patch look good, marking this ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot
4 participants