ARROW-12441: [Rust][DataFusion] Support cartesian join #10092
Codecov Report

| Coverage Diff | master | #10092 | +/- |
|---|---:|---:|---:|
| Coverage | 78.87% | 78.87% | |
| Files | 286 | 287 | +1 |
| Lines | 64808 | 64974 | +166 |
| Hits | 51119 | 51250 | +131 |
| Misses | 13689 | 13724 | +35 |
Moved to apache/datafusion#11
This is a first (naive, but probably not that bad) implementation of the cartesian join and `CROSS JOIN` syntax. The left side is loaded into memory and the right side is streamed; each right-side batch is combined with the full left side.
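A minimal sketch of the approach described above, assuming a toy row-oriented batch type: `Row`, `Batch` and `naive_cross_join` are hypothetical stand-ins for illustration, not DataFusion's API (which works on Arrow `RecordBatch`es). The left side is collected into memory, and every streamed right batch is paired with all left rows, so each output batch holds `left_rows * right_batch_rows` rows.

```rust
/// Hypothetical simplified batch types (not Arrow/DataFusion types):
/// a row is a list of i64 column values, a batch is a list of rows.
pub type Row = Vec<i64>;
pub type Batch = Vec<Row>;

/// Naive cartesian (cross) join sketch.
/// Build phase: the entire left side is loaded into memory.
/// Probe phase: the right side is streamed, and each right batch is combined
/// with *all* left rows, producing one potentially very large output batch.
pub fn naive_cross_join<L, R>(left: L, right: R) -> Vec<Batch>
where
    L: IntoIterator<Item = Batch>,
    R: IntoIterator<Item = Batch>,
{
    // Flatten all left batches into a single in-memory list of rows.
    let left_rows: Vec<Row> = left.into_iter().flatten().collect();

    right
        .into_iter()
        .map(|right_batch| {
            let mut out = Vec::with_capacity(left_rows.len() * right_batch.len());
            for l in &left_rows {
                for r in &right_batch {
                    // Output row = left columns followed by right columns.
                    let mut row = l.clone();
                    row.extend_from_slice(r);
                    out.push(row);
                }
            }
            out
        })
        .collect()
}
```

With two left rows and a right batch of three rows, the single output batch already has six rows; with large inputs on both sides this is where the memory pressure comes from.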
Memory consumption could be improved: the current implementation produces large batches when both sides are big. This could be solved by keeping a "cursor" into the left side and producing the output batches one at a time, instead of concatenating the full cartesian product.
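One possible shape for the cursor-based improvement, again only a sketch with hypothetical names (`CursorCrossJoin`, `Row`, `Batch`) rather than the actual DataFusion code: keep an index into the in-memory left rows and emit one output batch per (left row, right batch) pair, so no output batch is larger than the right batch it came from.

```rust
/// Hypothetical simplified batch types (not Arrow/DataFusion types).
pub type Row = Vec<i64>;
pub type Batch = Vec<Row>;

/// Streaming cross join sketch with a cursor over the left side.
pub struct CursorCrossJoin<R: Iterator<Item = Batch>> {
    left_rows: Vec<Row>,          // build side, fully in memory
    right: R,                     // probe side, streamed batch by batch
    current_right: Option<Batch>, // right batch currently being combined
    left_cursor: usize,           // next left row to pair with `current_right`
}

impl<R: Iterator<Item = Batch>> CursorCrossJoin<R> {
    pub fn new<L: IntoIterator<Item = Batch>>(left: L, right: R) -> Self {
        CursorCrossJoin {
            left_rows: left.into_iter().flatten().collect(),
            right,
            current_right: None,
            left_cursor: 0,
        }
    }
}

impl<R: Iterator<Item = Batch>> Iterator for CursorCrossJoin<R> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // An empty left side yields an empty product.
        if self.left_rows.is_empty() {
            return None;
        }
        // Once every left row has been paired with the current right batch,
        // pull the next non-empty right batch (or stop when the right side ends).
        while self.current_right.is_none() || self.left_cursor == self.left_rows.len() {
            let batch = self.right.next()?;
            if !batch.is_empty() {
                self.current_right = Some(batch);
                self.left_cursor = 0;
            }
        }
        let right_batch = self.current_right.as_ref().unwrap();
        let l = &self.left_rows[self.left_cursor];
        // Combine one left row with every row of the current right batch.
        let out: Batch = right_batch
            .iter()
            .map(|r| {
                let mut row = l.clone();
                row.extend_from_slice(r);
                row
            })
            .collect();
        self.left_cursor += 1;
        Some(out)
    }
}
```

The trade-off is more, smaller batches; a real implementation would more likely pair several left rows per output batch to reach a target batch size.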
FYI @andygrove @alamb @jorgecarleitao
This also makes query 9 run in DataFusion (though performance is not yet acceptable; I believe that is caused by a separate issue rather than by the cross join itself).