[SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null #20333

Closed
wants to merge 2 commits into from


mgaido91
Contributor

What changes were proposed in this pull request?

CheckCartesianProduct also raises an AnalysisException when the join condition is always false or null. It should not do so in that case, because the result is not a Cartesian product.
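
To make the intended behaviour concrete, here is a minimal sketch of such a check over a toy expression type. It is not Spark's Catalyst code: `Expr`, `Attr`, `BoolLit`, `EqualTo`, `And`, and the signature of `isCartesianProduct` are hypothetical names introduced only for illustration. The sketch also assumes that the existing check flags a join as a Cartesian product when no conjunct of the condition references columns from both sides.

```scala
// Toy model of the check described above; none of these types are Spark classes.
object CartesianCheckSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  // value = None models a null boolean literal.
  case class BoolLit(value: Option[Boolean]) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Column names referenced by an expression.
  def references(e: Expr): Set[String] = e match {
    case Attr(n)       => Set(n)
    case BoolLit(_)    => Set.empty
    case EqualTo(l, r) => references(l) ++ references(r)
    case And(l, r)     => references(l) ++ references(r)
  }

  // Split a condition into its top-level AND-ed parts.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  def isCartesianProduct(
      leftCols: Set[String],
      rightCols: Set[String],
      condition: Option[Expr]): Boolean = {
    val conds = condition.map(conjuncts).getOrElse(Nil)
    conds match {
      // An always-false or always-null condition matches no pair of rows,
      // so the join cannot produce a Cartesian product.
      case Seq(BoolLit(Some(false))) | Seq(BoolLit(None)) => false
      // Assumed pre-existing rule: Cartesian unless some conjunct
      // references columns from both sides.
      case _ =>
        !conds.exists { c =>
          val refs = references(c)
          refs.exists(leftCols) && refs.exists(rightCols)
        }
    }
  }
}
```

In this model a missing condition or a literal `true` is still reported as a Cartesian product, while a literal `false` or a null literal is not.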

How was this patch tested?

Added a unit test (UT).

spark.sessionState.executePlan(planNull).optimizedPlan

val dfOne = df.select(lit(1).as("a"))
val dfTwo = spark.range(10).select(lit(2).as("a"))
Member

a -> b

@SparkQA

SparkQA commented Jan 19, 2018

Test build #86401 has finished for PR 20333 at commit 9c88781.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

   def apply(plan: LogicalPlan): LogicalPlan =
     if (SQLConf.get.crossJoinEnabled) {
       plan
     } else plan transform {
-      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+      case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
Member

For inner joins, we will not hit this, because the join is already optimized to an empty relation. For the outer join types, we face exactly the same issue as when the condition is true. That is, the size of the join result set is still the same.

Contributor Author

Why do you say that the size of the result set is the same?
Suppose a relation A (of size n, say 1M rows) is outer joined with a relation B (of size m, say 1M rows). If the condition is true, the output has 1M * 1M (i.e. n * m) rows; if the condition is false, the result has 1M (n) rows for a left join, 1M (m) rows for a right join, and 1M + 1M (n + m) rows for a full outer join. Therefore the size is not the same at all. But maybe you meant something different; am I missing something?
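
The row counts above can be checked on a small scale. The following sketch models joins over in-memory sequences with n = 3 and m = 4. It is a toy model rather than Spark's join implementation, and `JoinSizes` and its methods are hypothetical names introduced only for this illustration.

```scala
// Toy joins over in-memory sequences; not Spark's join implementation.
object JoinSizes {
  type Row[A, B] = (Option[A], Option[B])

  // Left outer join: matched pairs, plus each unmatched left row padded with None.
  def leftOuter[A, B](left: Seq[A], right: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] =
    left.flatMap { l =>
      val matches = right.filter(r => cond(l, r))
      if (matches.isEmpty) Seq((Option(l), Option.empty[B]))
      else matches.map(r => (Option(l), Option(r)))
    }

  // Right outer join: a left outer join with the sides swapped.
  def rightOuter[A, B](left: Seq[A], right: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] =
    leftOuter(right, left)((r, l) => cond(l, r)).map(_.swap)

  // Full outer join: the left outer join plus the unmatched right rows.
  def fullOuter[A, B](left: Seq[A], right: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] =
    leftOuter(left, right)(cond) ++
      right.filter(r => !left.exists(l => cond(l, r))).map(r => (Option.empty[A], Option(r)))

  def main(args: Array[String]): Unit = {
    val a = Seq(1, 2, 3)        // n = 3
    val b = Seq(10, 20, 30, 40) // m = 4
    val alwaysTrue  = (_: Int, _: Int) => true
    val alwaysFalse = (_: Int, _: Int) => false

    println(leftOuter(a, b)(alwaysTrue).size)   // 12 = n * m
    println(leftOuter(a, b)(alwaysFalse).size)  // 3  = n
    println(rightOuter(a, b)(alwaysFalse).size) // 4  = m
    println(fullOuter(a, b)(alwaysFalse).size)  // 7  = n + m
  }
}
```

With an always-false condition the outer joins grow linearly in the input sizes instead of multiplicatively, which is the distinction the comment above relies on.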

Member

Yeah. For outer joins, it makes sense to remove this check.

@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
checkAnswer(innerJoin, Row(1) :: Nil)
}

test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
"is false or null") {
val df = spark.range(10)
Member

withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {

Contributor Author

Shouldn't it be false?

@gatorsmile
Member

LGTM except one minor comment.

@SparkQA

SparkQA commented Jan 21, 2018

Test build #86418 has finished for PR 20333 at commit a4a6ac8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master/2.3

@gatorsmile
Member

Will address my comment in my PR.

asfgit pushed a commit that referenced this pull request Jan 21, 2018
…on is false/null

Author: Marco Gaido <marcogaido91@gmail.com>

Closes #20333 from mgaido91/SPARK-23087.

(cherry picked from commit 121dc96)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@asfgit asfgit closed this in 121dc96 Jan 21, 2018