[SPARK-29145][SQL][FOLLOW-UP] Move tests from SubquerySuite to subquery/in-subquery/in-joins.sql #26406

Closed
wants to merge 4 commits into from

AngersZhuuuu
Contributor

What changes were proposed in this pull request?

Follows up on the review comment in #25854 (comment).

Why are the changes needed?

NO

Does this PR introduce any user-facing change?

NO

How was this patch tested?

ADD TEST CASE

AngersZhuuuu changed the title from "[SPARK-29145][SQL][FOLLOW-UP] Port in-subquery-on-join-condition.sql" to "[WIP][SPARK-29145][SQL][FOLLOW-UP] Port in-subquery-on-join-condition.sql" on Nov 6, 2019
@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Nov 6, 2019

@gatorsmile @maropu
Currently, using an EXISTS subquery in a join condition is a problem for LEFT OUTER JOIN and FULL OUTER JOIN.
The query can be explained but can't be executed.

@AngersZhuuuu
Contributor Author


create temporary view s1 as select * from values
    (1), (3), (5), (7), (9)
  as s1(id);

create temporary view s2 as select * from values
    (1), (3), (4), (6), (9)
  as s2(id);

create temporary view s3 as select * from values
    (3), (4), (6), (9)
  as s3(id);

 explain extended SELECT s1.id, s2.id as id2 FROM s1
 LEFT OUTER JOIN s2 ON s1.id = s2.id
 AND EXISTS (SELECT * FROM s3 WHERE s3.id > 6)

LogicalPlan

== Parsed Logical Plan ==
'Project ['s1.id, 's2.id AS id2#291]
+- 'Join LeftOuter, (('s1.id = 's2.id) AND exists#290 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation [s3]
   :- 'UnresolvedRelation [s1]
   +- 'UnresolvedRelation [s2]

== Analyzed Logical Plan ==
id: int, id2: int
Project [id#244, id#250 AS id2#291]
+- Join LeftOuter, ((id#244 = id#250) AND exists#290 [])
   :  +- Project [id#256]
   :     +- Filter (id#256 > 6)
   :        +- SubqueryAlias `s3`
   :           +- Project [value#253 AS id#256]
   :              +- LocalRelation [value#253]
   :- SubqueryAlias `s1`
   :  +- Project [value#241 AS id#244]
   :     +- LocalRelation [value#241]
   +- SubqueryAlias `s2`
      +- Project [value#247 AS id#250]
         +- LocalRelation [value#247]

== Optimized Logical Plan ==
Project [id#244, id#250 AS id2#291]
+- Join LeftOuter, (exists#290 [] AND (id#244 = id#250))
   :  +- Project [value#253 AS id#256]
   :     +- Filter (value#253 > 6)
   :        +- LocalRelation [value#253]
   :- Project [value#241 AS id#244]
   :  +- LocalRelation [value#241]
   +- Project [value#247 AS id#250]
      +- LocalRelation [value#247]

== Physical Plan ==
*(2) Project [id#244, id#250 AS id2#291]
+- *(2) BroadcastHashJoin [id#244], [id#250], LeftOuter, BuildRight, exists#290 []
   :  +- Project [value#253 AS id#256]
   :     +- Filter (value#253 > 6)
   :        +- LocalRelation [value#253]
   :- *(2) Project [value#241 AS id#244]
   :  +- *(2) LocalTableScan [value#241]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))), [id=#670]
      +- *(1) Project [value#247 AS id#250]
         +- *(1) LocalTableScan [value#247]

The EXISTS predicate can't be rewritten into another form of join, and the child plan of the EXISTS subquery is not converted to a physical plan within the overall physical plan:

+- *(2) BroadcastHashJoin [id#244], [id#250], LeftOuter, BuildRight, exists#290 []
   :  +- Project [value#253 AS id#256]
   :     +- Filter (value#253 > 6)
   :        +- LocalRelation [value#253]
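
For reference, the rows this query should return once it can execute can be sketched with SQLite through Python's standard `sqlite3` module. This only illustrates standard SQL semantics on the same sample data and is not Spark output; using SQLite, plain tables instead of temporary views, the added `ORDER BY`, and the helper name `expected_rows` are all assumptions of this sketch.

```python
import sqlite3


def expected_rows():
    """Run the LEFT OUTER JOIN + EXISTS query on the sample data in SQLite."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Same values as the s1, s2, s3 temporary views above (plain tables here).
    data = {
        "s1": [1, 3, 5, 7, 9],
        "s2": [1, 3, 4, 6, 9],
        "s3": [3, 4, 6, 9],
    }
    for name, ids in data.items():
        cur.execute(f"CREATE TABLE {name} (id INTEGER)")
        cur.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    # ORDER BY is added only to make the output deterministic.
    cur.execute(
        """
        SELECT s1.id, s2.id AS id2 FROM s1
        LEFT OUTER JOIN s2 ON s1.id = s2.id
        AND EXISTS (SELECT * FROM s3 WHERE s3.id > 6)
        ORDER BY s1.id
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


print(expected_rows())
# [(1, 1), (3, 3), (5, None), (7, None), (9, 9)]
```

Because s3 contains 9, the EXISTS predicate is true for every row, so the result is the same as a plain left outer join on `s1.id = s2.id`.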

@maropu
Member

maropu commented Nov 6, 2019

How about moving these tests into subquery/in-subquery/in-joins.sql instead of making a new file? Also, I think we need to run these tests multiple times with different configurations:

--SET spark.sql.autoBroadcastJoinThreshold=10485760
--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false
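
To illustrate what these three header lines express, here is a small sketch that turns `--SET` comment lines into one configuration dictionary per run. The function name `parse_set_lines` is hypothetical, and this is not the SQLQueryTestSuite implementation, only an assumption about how such headers could be read.

```python
def parse_set_lines(lines):
    """Parse '--SET k1=v1,k2=v2' comment lines into one dict per line."""
    config_sets = []
    for line in lines:
        line = line.strip()
        if not line.startswith("--SET"):
            continue  # ignore ordinary comments and SQL text
        config = {}
        for pair in line[len("--SET"):].strip().split(","):
            key, _, value = pair.partition("=")
            config[key.strip()] = value.strip()
        config_sets.append(config)
    return config_sets


header = [
    "--SET spark.sql.autoBroadcastJoinThreshold=10485760",
    "--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true",
    "--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false",
]
for config in parse_set_lines(header):
    print(config)
```

Each dictionary corresponds to one run of the test file: one with broadcast joins allowed, and two with broadcast disabled and sort-merge join either preferred or not.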

@maropu
Member

maropu commented Nov 6, 2019

> Currently, using an EXISTS subquery in a join condition is a problem for LEFT OUTER JOIN and FULL OUTER JOIN.
> The query can be explained but can't be executed.

Sorry, but I'm missing your point. Do you mean that this test runs correctly in SubquerySuite.scala but can't run in SQLQueryTestSuite? I think it's OK to just move the tests you added in the previous PR into SQLQueryTestSuite.

@AngersZhuuuu
Contributor Author

> Sorry, but I'm missing your point. Do you mean that this test runs correctly in SubquerySuite.scala but can't run in SQLQueryTestSuite? I think it's OK to just move the tests you added in the previous PR into SQLQueryTestSuite.

No. Catalyst currently doesn't handle EXISTS well when it is used in a join's ON condition and the join type is LeftOuter or FullOuter, as I showed in #26406 (comment).

At the end of that comment, you can see that part of the physical plan is not actually a physical plan: it still contains a LocalRelation, so we can't run doCodeGen for the whole physical plan.
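
One rough way to spot this symptom in `explain extended` output is to scan the `== Physical Plan ==` section for operator names that only exist in logical plans. The sketch below is a text-based heuristic, not a Spark API; the function name `logical_nodes_in_physical_plan` and the `LOGICAL_ONLY` list are assumptions made for illustration.

```python
import re

# Assumed list of logical-only operator names; extend as needed.
LOGICAL_ONLY = ("LocalRelation", "UnresolvedRelation", "SubqueryAlias")


def logical_nodes_in_physical_plan(explain_text):
    """Return lines of the physical plan section that mention a logical-only operator."""
    pattern = re.compile(r"\b(" + "|".join(LOGICAL_ONLY) + r")\b")
    in_physical = False
    hits = []
    for line in explain_text.splitlines():
        stripped = line.strip()
        # Section headers look like '== Physical Plan =='.
        if stripped.startswith("==") and stripped.endswith("=="):
            in_physical = stripped == "== Physical Plan =="
            continue
        if in_physical and pattern.search(stripped):
            hits.append(stripped)
    return hits


# Shortened excerpt of the plan posted earlier in this thread.
sample = """== Optimized Logical Plan ==
Project [id#244, id#250 AS id2#291]
+- Join LeftOuter, (exists#290 [] AND (id#244 = id#250))
   :  +- Project [value#253 AS id#256]
   :     +- Filter (value#253 > 6)
   :        +- LocalRelation [value#253]

== Physical Plan ==
*(2) Project [id#244, id#250 AS id2#291]
+- *(2) BroadcastHashJoin [id#244], [id#250], LeftOuter, BuildRight, exists#290 []
   :  +- Project [value#253 AS id#256]
   :     +- Filter (value#253 > 6)
   :        +- LocalRelation [value#253]
   :- *(2) Project [value#241 AS id#244]
   :  +- *(2) LocalTableScan [value#241]
"""

print(logical_nodes_in_physical_plan(sample))
```

On this excerpt the only hit is the LocalRelation line under the EXISTS subquery; the LocalRelation in the optimized logical plan section is ignored because it is expected there.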

@maropu
Member

maropu commented Nov 6, 2019

Ah, I see. You found a bug caused by your previous PR, right? If so, could you file a new JIRA for it and investigate the root cause there?

@AngersZhuuuu
Contributor Author

> Ah, I see. You found a bug caused by your previous PR, right? If so, could you file a new JIRA for it and investigate the root cause there?

I don't think it is caused by my PR. My PR focuses on IN/NOT IN, and the behavior of IN/NOT IN differs from that of EXISTS/NOT EXISTS.
Here is the same example run on 2.4.0:

== Parsed Logical Plan ==
'Project ['s1.id, 's2.id AS id2#4]
+- 'Join LeftOuter, (('s1.id = 's2.id) && exists#3 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation `s3`
   :- 'UnresolvedRelation `s1`
   +- 'UnresolvedRelation `s2`

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27;
'Project ['s1.id, 's2.id AS id2#4]
+- 'Join LeftOuter, ((id#0 = id#1) && exists#3 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation `s3`
   :- SubqueryAlias `s1`
   :  +- Project [id#0]
   :     +- SubqueryAlias `s1`
   :        +- LocalRelation [id#0]
   +- SubqueryAlias `s2`
      +- Project [id#1]
         +- SubqueryAlias `s2`
            +- LocalRelation [id#1]

org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27;
'Project ['s1.id, 's2.id AS id2#4]
+- 'Join LeftOuter, ((id#0 = id#1) && exists#3 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation `s3`
   :- SubqueryAlias `s1`
   :  +- Project [id#0]
   :     +- SubqueryAlias `s1`
   :        +- LocalRelation [id#0]
   +- SubqueryAlias `s2`
      +- Project [id#1]
         +- SubqueryAlias `s2`
            +- LocalRelation [id#1]

== Optimized Logical Plan ==
org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27;
'Project ['s1.id, 's2.id AS id2#4]
+- 'Join LeftOuter, ((id#0 = id#1) && exists#3 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation `s3`
   :- SubqueryAlias `s1`
   :  +- Project [id#0]
   :     +- SubqueryAlias `s1`
   :        +- LocalRelation [id#0]
   +- SubqueryAlias `s2`
      +- Project [id#1]
         +- SubqueryAlias `s2`
            +- LocalRelation [id#1]

== Physical Plan ==
org.apache.spark.sql.AnalysisException: Table or view not found: `s3`; line 3 pos 27;
'Project ['s1.id, 's2.id AS id2#4]
+- 'Join LeftOuter, ((id#0 = id#1) && exists#3 [])
   :  +- 'Project [*]
   :     +- 'Filter ('s3.id > 6)
   :        +- 'UnresolvedRelation `s3`
   :- SubqueryAlias `s1`
   :  +- Project [id#0]
   :     +- SubqueryAlias `s1`
   :        +- LocalRelation [id#0]
   +- SubqueryAlias `s2`
      +- Project [id#1]
         +- SubqueryAlias `s2`
            +- LocalRelation [id#1]
Time taken: 1.455 seconds, Fetched 1 row(s)

@AngersZhuuuu
Contributor Author

> Ah, I see. You found a bug caused by your previous PR, right? If so, could you file a new JIRA for it and investigate the root cause there?

My PR lets this case get past the optimizer. Anyway, I will raise an issue explaining the change and the root cause.
I will also try to make it work.

@maropu
Member

maropu commented Nov 6, 2019

If so, could you separate the two pieces of work: the porting task you requested and the bug fix? I'm pretty confused now.

@AngersZhuuuu
Contributor Author

> If so, could you separate the two pieces of work: the porting task you requested and the bug fix? I'm pretty confused now.

Yeah, I wanted to make this clear to you because in comment #25854 (comment) @gatorsmile mentioned the two problems together. I will raise an issue for the EXISTS/NOT EXISTS problem.

@AngersZhuuuu
Contributor Author

> How about moving these tests into subquery/in-subquery/in-joins.sql instead of making a new file? Also, I think we need to run these tests multiple times with different configurations:
>
> --SET spark.sql.autoBroadcastJoinThreshold=10485760
> --SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
> --SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false

If I move them to in-joins.sql, will they automatically run under the three configurations?

@maropu
Member

maropu commented Nov 10, 2019

Yeah, you don't need to do anything for that. By the way, is this PR still WIP?

@AngersZhuuuu
Contributor Author

> Yeah, you don't need to do anything for that. By the way, is this PR still WIP?

Removed WIP.

@maropu
Member

maropu commented Nov 10, 2019

Can you remove the tests from SubquerySuite that you added in the previous PR? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala#L207-L353

maropu changed the title from "[WIP][SPARK-29145][SQL][FOLLOW-UP] Port in-subquery-on-join-condition.sql" to "[SPARK-29145][SQL][FOLLOW-UP] Port in-subquery-on-join-condition.sql" on Nov 10, 2019
maropu changed the title from "[SPARK-29145][SQL][FOLLOW-UP] Port in-subquery-on-join-condition.sql" to "[SPARK-29145][SQL][FOLLOW-UP] Move tests from SubquerySuite to subquery/in-subquery/in-joins.sql" on Nov 10, 2019
@AngersZhuuuu
Contributor Author

> Can you remove the tests from SubquerySuite that you added in the previous PR? https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala#L207-L353

Done

create temporary view s3 as select * from values
(3), (4), (6), (9)
as s3(id);

Member

@maropu maropu Nov 10, 2019

nit: Could you please drop the views at the end?

Contributor Author

> Could you please drop the views at the end?

As for t1, t2, and t3, it seems the other test files don't drop them.

@maropu
Member

maropu commented Nov 10, 2019

ok to test

@SparkQA

SparkQA commented Nov 10, 2019

Test build #113536 has finished for PR 26406 at commit 6df31c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 10, 2019

Test build #113538 has finished for PR 26406 at commit 65fe860.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@maropu The tests passed. Is there any more work to do?

Member

@maropu maropu left a comment

@dongjoon-hyun
Member

Thank you for pinging me, @maropu .

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master.

@maropu
Member

maropu commented Nov 13, 2019

Thanks, @dongjoon-hyun !

4 participants