[SPARK-29145][SQL] Support sub-queries in join conditions #25854
Changes from 18 commits
c3de557
2cf3153
e5cd06c
569ab8a
91d1031
6dc61e7
f087b10
5ef8dad
5aa2ed6
fa55b3a
bd7c098
3108da2
dd37df8
6b58893
25f31dc
2ead378
6e210e1
3db4aaf
4ba7a17
307802a
@@ -204,6 +204,84 @@ class SubquerySuite extends QueryTest with SharedSparkSession {

```scala
    }
  }

  test("SPARK-29145: JOIN Condition use QueryList") {
```
Can we move it to SQLQueryTestSuite? It doesn't seem to contain any test case that checks an EXISTS subquery. Could you also add one?

OK, I will raise a PR following your comment.
```scala
    withTempView("s1", "s2", "s3") {
      Seq(1, 3, 5, 7, 9).toDF("id").createOrReplaceTempView("s1")
      Seq(1, 3, 4, 6, 9).toDF("id").createOrReplaceTempView("s2")
      Seq(3, 4, 6, 9).toDF("id").createOrReplaceTempView("s3")

      checkAnswer(
        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select 9)"),
```
Can we put a correlated subquery in a join condition?

The subquery is in the join condition; the LogicalPlan is as below:
```scala
        Row(9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id NOT IN (select 9)"),
        Row(1) :: Row(3) :: Nil)

      // case `IN`
      checkAnswer(
        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id IN (select id from s3)"),
```
For example, do we support

We can't, since the strategy's idempotence is broken. Writing SQL like this doesn't seem reasonable...

Also cc @dilipbiswal. I checked with pgsql and it's supported. We need to update

We should support it; I'm checking on this issue.

Do we need to address the support in this PR? I think it's OK to do it in another JIRA. Kindly ping @dilipbiswal
```scala
        Row(3) :: Row(9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id as id2 from s1 LEFT SEMI JOIN s2 " +
```
nit: can you follow the format of the other tests? In multi-line cases, the format seems to be like this:

Changed.
```scala
          "ON s1.id = s2.id and s1.id IN (select id from s3)"),
        Row(3) :: Row(9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id as id2 from s1 LEFT ANTI JOIN s2 " +
          "ON s1.id = s2.id and s1.id IN (select id from s3)"),
        Row(1) :: Row(5) :: Row(7) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 LEFT OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id IN (select id from s3)"),
        Row(1, null) :: Row(3, 3) :: Row(5, null) :: Row(7, null) :: Row(9, 9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 RIGHT OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id IN (select id from s3)"),
        Row(null, 1) :: Row(3, 3) :: Row(null, 4) :: Row(null, 6) :: Row(9, 9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 FULL OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id IN (select id from s3)"),
        Row(1, null) :: Row(3, 3) :: Row(5, null) :: Row(7, null) :: Row(9, 9) ::
          Row(null, 1) :: Row(null, 4) :: Row(null, 6) :: Nil)
```
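Every expected row in the `IN` block above is consistent with treating `s1.id IN (select id from s3)` as one more conjunct of the ON condition: it changes which row pairs match, but it never removes rows from the preserved side of an outer join. The following is a minimal plain-Scala sketch of that reading, not Spark's implementation; `JoinConditionModel`, `JoinKind` and `joinRows` are hypothetical names, and SQL NULL is modelled as `None`.

```scala
// Hypothetical model of the join semantics exercised by the test above:
// plain Scala collections, not Spark. A row is a Seq of columns and SQL NULL
// is modelled as None.
object JoinConditionModel {
  sealed trait JoinKind
  case object Inner extends JoinKind
  case object LeftSemi extends JoinKind
  case object LeftAnti extends JoinKind
  case object LeftOuter extends JoinKind
  case object RightOuter extends JoinKind
  case object FullOuter extends JoinKind

  type Row = Seq[Option[Int]]

  // Joins on `l == r && extra(l)`; `extra` plays the role of the sub-query
  // predicate (e.g. s1.id IN (select id from s3)) placed in the ON clause.
  def joinRows(kind: JoinKind, left: Seq[Int], right: Seq[Int],
      extra: Int => Boolean): Seq[Row] = {
    def matches(l: Int, r: Int): Boolean = l == r && extra(l)
    def hasMatchL(l: Int): Boolean = right.exists(r => matches(l, r))
    def hasMatchR(r: Int): Boolean = left.exists(l => matches(l, r))

    val matched: Seq[Row] =
      for (l <- left; r <- right if matches(l, r)) yield Seq(Some(l), Some(r))
    // Rows of the preserved side that found no partner, padded with NULL.
    val leftOnly: Seq[Row] = left.filterNot(hasMatchL).map(l => Seq(Some(l), None))
    val rightOnly: Seq[Row] = right.filterNot(hasMatchR).map(r => Seq(None, Some(r)))

    kind match {
      case Inner      => matched
      case LeftSemi   => left.filter(hasMatchL).map(l => Seq(Some(l)))    // left columns only
      case LeftAnti   => left.filterNot(hasMatchL).map(l => Seq(Some(l))) // left columns only
      case LeftOuter  => matched ++ leftOnly
      case RightOuter => matched ++ rightOnly
      case FullOuter  => matched ++ leftOnly ++ rightOnly
    }
  }
}
```

With `extra = x => s3.contains(x)` the model returns the same rows as the `IN` assertions (for example, `LeftAnti` yields 1, 5 and 7), and passing `x => !s3.contains(x)` reproduces the `NOT IN` cases that follow. Negating the predicate like this is only valid because `s3` contains no NULLs; with a NULL in the sub-query result, SQL `NOT IN` can evaluate to NULL rather than true.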
```scala
      // case `NOT IN`
      checkAnswer(
        sql("SELECT s1.id from s1 JOIN s2 ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(1) :: Nil)

      checkAnswer(
        sql("SELECT s1.id as id2 from s1 LEFT SEMI JOIN s2 " +
          "ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(1) :: Nil)

      checkAnswer(
        sql("SELECT s1.id as id2 from s1 LEFT ANTI JOIN s2 " +
          "ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(3) :: Row(5) :: Row(7) :: Row(9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 LEFT OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(1, 1) :: Row(3, null) :: Row(5, null) :: Row(7, null) :: Row(9, null) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 RIGHT OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(1, 1) :: Row(null, 3) :: Row(null, 4) :: Row(null, 6) :: Row(null, 9) :: Nil)

      checkAnswer(
        sql("SELECT s1.id, s2.id as id2 from s1 FULL OUTER JOIN s2 " +
          "ON s1.id = s2.id and s1.id NOT IN (select id from s3)"),
        Row(1, 1) :: Row(3, null) :: Row(5, null) :: Row(7, null) :: Row(9, null) ::
          Row(null, 3) :: Row(null, 4) :: Row(null, 6) :: Row(null, 9) :: Nil)
    }
  }
```
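The outer-join cases are where the placement of the sub-query matters. Inside ON, the `NOT IN` predicate only decides which `s2` row an `s1` row pairs with, so all five `s1` rows survive the LEFT OUTER JOIN; written as a WHERE filter over the join result, the same predicate removes rows instead. A minimal plain-Scala sketch of that contrast, assuming the test's data; `OnVersusWhere`, `leftOuter` and `notInS3` are hypothetical helpers, not Spark API.

```scala
// Hypothetical comparison of an ON-clause predicate with a WHERE filter for a
// LEFT OUTER JOIN over the test's data; plain Scala, not Spark. NULL is None.
object OnVersusWhere {
  val s1 = Seq(1, 3, 5, 7, 9)
  val s2 = Seq(1, 3, 4, 6, 9)
  val s3 = Seq(3, 4, 6, 9)

  // Stand-in for `s1.id NOT IN (select id from s3)`; valid because s3 has no NULLs.
  def notInS3(id: Int): Boolean = !s3.contains(id)

  // LEFT OUTER JOIN of s1 with s2 using `cond` as the whole join condition:
  // an s1 row with no matching s2 row is kept once, padded with None.
  def leftOuter(cond: (Int, Int) => Boolean): Seq[(Int, Option[Int])] =
    s1.flatMap { l =>
      val hits = s2.filter(r => cond(l, r))
      if (hits.isEmpty) Seq(l -> Option.empty[Int]) else hits.map(r => l -> Option(r))
    }

  // ON s1.id = s2.id AND s1.id NOT IN (...): the predicate only affects matching.
  val inOn: Seq[(Int, Option[Int])] = leftOuter((l, r) => l == r && notInS3(l))

  // ON s1.id = s2.id ... WHERE s1.id NOT IN (...): the predicate filters the result.
  val inWhere: Seq[(Int, Option[Int])] =
    leftOuter((l, r) => l == r).filter { case (l, _) => notInS3(l) }
}
```

`inOn` matches the five rows the test expects for the LEFT OUTER JOIN with `NOT IN`, whereas `inWhere` keeps only `(1, Some(1))`, `(5, None)` and `(7, None)`.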
```scala

  test("SPARK-14791: scalar subquery inside broadcast join") {
    val df = sql("select a, sum(b) as s from l group by a having a > (select avg(a) from l)")
    val expected = Row(3, 2.0, 3, 3.0) :: Row(6, null, 6, null) :: Nil
```
Can't recall the details, but why is it not Seq(j.left, j.right)?

That is probably a mistake. Shall I raise a PR to remove it?