[SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier
## What changes were proposed in this pull request?

The introduction of `AnalysisBarrier` prevented `FixNullability` from traversing all the nodes of a plan. This introduced a bug that can lead to wrong results, because the nullability of the output attributes of an outer join can be wrong.
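
To illustrate why nullability matters for an outer join, here is a minimal sketch in plain Scala collections (no Spark involved). `OuterJoinSketch` and `leftOuterJoin` are hypothetical names introduced only for this illustration. Rows on the left without a match get a missing right-side value, so the right-side attribute has to be treated as nullable; a plan that still marks it as non-nullable could lead a filter such as `isNull` to be evaluated incorrectly.

```scala
// Hypothetical illustration, not Spark code.
object OuterJoinSketch {
  // Left outer join on id: the right side is None when there is no match,
  // which is the plain-Scala analogue of a nullable output attribute.
  def leftOuterJoin(left: Seq[Long], right: Seq[Long]): Seq[(Long, Option[Long])] = {
    val rightIds = right.toSet
    left.map(id => (id, if (rightIds.contains(id)) Some(id) else None))
  }

  // Same shape as the data in the unit test: ids 0..3 on the left, 0..2 on the right.
  val joined: Seq[(Long, Option[Long])] = leftOuterJoin(0L until 4L, 0L until 3L)

  // Rows whose right-side id is missing.
  val unmatched: Seq[(Long, Option[Long])] = joined.filter { case (_, r) => r.isEmpty }
}
```

With this data exactly one left row has no match, which corresponds to the single row the unit test expects from the `isNull` filter.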

This PR makes `FixNullability` go through `AnalysisBarrier`s.
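
The effect of the extra case can be sketched with a toy tree, assuming (as in Spark 2.3) that the barrier is a leaf node that hides its child from `transformUp`. Everything below (`Node`, `Leaf`, `Unary`, `Barrier`, `ToyFix`) is a hypothetical stand-in and not Spark's actual classes; "fixing" a leaf stands in for correcting nullability.

```scala
// Toy model only: these types are hypothetical, not Spark's LogicalPlan API.
sealed trait Node {
  def children: Seq[Node]
  def withChildren(cs: Seq[Node]): Node

  // Bottom-up rewrite: transform the visible children first, then this node.
  def transformUp(rule: PartialFunction[Node, Node]): Node = {
    val rebuilt =
      if (children.isEmpty) this
      else withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rebuilt, (n: Node) => n)
  }
}

final case class Leaf(name: String, fixed: Boolean = false) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(cs: Seq[Node]): Node = this
}

final case class Unary(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def withChildren(cs: Seq[Node]): Node = copy(child = cs.head)
}

// The barrier reports no children, so a traversal never sees what it wraps.
final case class Barrier(child: Node) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(cs: Seq[Node]): Node = this
}

object ToyFix {
  // Without a barrier case, the rule never reaches the wrapped leaf.
  def withoutBarrierCase(plan: Node): Node = plan transformUp {
    case l: Leaf => l.copy(fixed = true)
  }

  // With a barrier case, the rule recurses into the wrapped subtree,
  // replacing the barrier by its rewritten child.
  def withBarrierCase(plan: Node): Node = plan transformUp {
    case b: Barrier => withBarrierCase(b.child)
    case l: Leaf => l.copy(fixed = true)
  }
}
```

In this sketch, `ToyFix.withoutBarrierCase(Unary(Barrier(Leaf("t"))))` returns the plan unchanged, while `ToyFix.withBarrierCase` reaches the leaf under the barrier and rewrites it.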

## How was this patch tested?

Added a unit test in `DataFrameSuite`.

Author: Marco Gaido <marcogaido91@gmail.com>

Closes #22102 from mgaido91/SPARK-25051.
mgaido91 authored and gatorsmile committed Aug 14, 2018
1 parent 0856b82 commit 34191e6
Showing 2 changed files with 7 additions and 0 deletions.
@@ -1704,6 +1704,7 @@ class Analyzer(

     def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
       case p if !p.resolved => p // Skip unresolved nodes.
+      case ab: AnalysisBarrier => apply(ab.child)
       case p: LogicalPlan if p.resolved =>
         val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
           case (exprId, attributes) =>
@@ -2300,4 +2300,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
     }
   }
+
+  test("SPARK-25051: fix nullabilities of outer join attributes doesn't stop on AnalysisBarrier") {
+    val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
+    val df2 = spark.range(3).selectExpr("id")
+    assert(df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).collect().length == 1)
+  }
 }
