[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType) #17606

dbtsai · 2017-04-11T09:10:26Z

What changes were proposed in this pull request?

NaNvl(float value, null) will be converted into NaNvl(float value, Cast(null, DoubleType)) and finally NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType)).

This causes a mismatch in the output type when the input type is float.

Adding an extra rule in TypeCoercion resolves this issue.
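The sequence described above can be sketched with a small Scala toy model. `Lit`, `Cast`, `NaNvl` and the helper functions here are hypothetical stand-ins, not Catalyst classes. The model also assumes that NaNvl's result type follows its left input. `castNullToDouble` reproduces the first step from the description, and `widenFloatDouble` copies the FloatType/DoubleType widening case visible in the `TypeCoercion` hunk.

```scala
// Toy model only: hypothetical stand-ins for Catalyst types, not Spark classes.
object PrePatchCoercion {
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Lit(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  // Assumption: NaNvl's result type follows its left input.
  final case class NaNvl(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = left.dataType
  }

  // Step 1 as described in the PR: the null replacement is cast to DoubleType.
  def castNullToDouble(e: Expr): Expr = e match {
    case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, DoubleType))
    case other => other
  }

  // Step 2: the existing FloatType/DoubleType widening case from TypeCoercion.
  def widenFloatDouble(e: Expr): Expr = e match {
    case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
      NaNvl(Cast(l, DoubleType), r)
    case other => other
  }

  def coerce(e: Expr): Expr = widenFloatDouble(castNullToDouble(e))

  def main(args: Array[String]): Unit = {
    val out = coerce(NaNvl(Lit("x", FloatType), Lit("null", NullType)))
    println(out)          // NaNvl(Cast(Lit(x,FloatType),DoubleType),Cast(Lit(null,NullType),DoubleType))
    println(out.dataType) // DoubleType, although the input was FloatType
  }
}
```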

How was this patch tested?

Unit tests.

viirya · 2017-04-11T10:17:19Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

@@ -571,6 +571,7 @@ object TypeCoercion {
        NaNvl(l, Cast(r, DoubleType))
      case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
        NaNvl(Cast(l, DoubleType), r)
+      case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
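The added case can be illustrated with the same kind of toy model. All types are hypothetical stand-ins, and NaNvl's result type is assumed to follow its left input. Only the FloatType/DoubleType case and the new NullType case from the hunk are reproduced. The case whose guard is cut off above the hunk is omitted.

```scala
// Toy model only: hypothetical stand-ins for Catalyst types, not Spark classes.
object NewNaNvlRule {
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Lit(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  // Assumption: NaNvl's result type follows its left input.
  final case class NaNvl(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = left.dataType
  }

  // Two of the NaNvl cases from the hunk; the case preceding them is omitted.
  def coerceNaNvl(e: Expr): Expr = e match {
    case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
      NaNvl(Cast(l, DoubleType), r)
    // Added case: a NULL replacement is cast to the type of the left input.
    case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    println(coerceNaNvl(NaNvl(Lit("f", FloatType), Lit("null", NullType))).dataType)  // FloatType
    println(coerceNaNvl(NaNvl(Lit("d", DoubleType), Lit("null", NullType))).dataType) // DoubleType
  }
}
```

Because the new case only matches a NullType right side, NaNvl(FloatType, DoubleType) is still widened to DoubleType by the existing case.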


One question I have: why should NaNvl(FloatType, DoubleType) be cast to NaNvl(DoubleType, DoubleType), while NaNvl(FloatType, NullType) should not be?

Both change the input type from FloatType to DoubleType. Won't the first cast cause a mismatch as well?

dbtsai · 2017-04-11

Yeah, this PR prevents casting NaNvl(FloatType, NullType) to NaNvl(DoubleType, DoubleType) because we want to minimize casting as much as possible. Also, when replacing NaN with null, we want the output type to stay the same as the input type.

Whether NaNvl(FloatType, DoubleType) should be cast to NaNvl(DoubleType, DoubleType) is a separate question, and we should discuss and fix it in another PR. I agree with you; we should downcast the DoubleType replacement to FloatType. In my opinion, this kind of implicit casting is error-prone, and users should cast explicitly instead.

@gatorsmile maybe you can chime in and give feedback on whether we should cast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType).

gatorsmile · 2017-04-12

This is because FunctionArgumentConversion is executed before ImplicitTypeCasts. When there is no risk of losing information, the cast can be implicit for better usability. We can add an extra configuration flag that lets users disable implicit casting.

If we do not upcast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType), what would the output data type be?
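The ordering point can be sketched with a toy fixed-point loop. The loop assumes, as stated above, that FunctionArgumentConversion runs before ImplicitTypeCasts. It also assumes the pair of rules is re-applied until the expression stops changing. Both rule bodies are simplified hypothetical stand-ins, not Spark's implementations, and `withNullCase` is an illustrative toggle for the case this PR adds.

```scala
// Toy model only: hypothetical stand-ins, not Spark's RuleExecutor or TypeCoercion.
object RuleOrderingSketch {
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Lit(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  final case class NaNvl(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = left.dataType // assumption: result type follows the left input
  }

  type Rule = Expr => Expr

  // Simplified FunctionArgumentConversion; `withNullCase` toggles the case added by this PR.
  def functionArgumentConversion(withNullCase: Boolean): Rule = {
    case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
      NaNvl(Cast(l, DoubleType), r)
    case NaNvl(l, r) if withNullCase && r.dataType == NullType =>
      NaNvl(l, Cast(r, l.dataType))
    case other => other
  }

  // Simplified ImplicitTypeCasts: a remaining NullType argument defaults to DoubleType.
  val implicitTypeCasts: Rule = {
    case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, DoubleType))
    case other => other
  }

  // Apply the rules in order, repeating the whole batch until nothing changes.
  @annotation.tailrec
  def toFixedPoint(e: Expr, rules: Seq[Rule]): Expr = {
    val next = rules.foldLeft(e)((acc, rule) => rule(acc))
    if (next == e) e else toFixedPoint(next, rules)
  }

  def analyze(e: Expr, withNullCase: Boolean): Expr =
    toFixedPoint(e, Seq(functionArgumentConversion(withNullCase), implicitTypeCasts))

  def main(args: Array[String]): Unit = {
    val expr = NaNvl(Lit("x", FloatType), Lit("null", NullType))
    println(analyze(expr, withNullCase = false).dataType) // DoubleType
    println(analyze(expr, withNullCase = true).dataType)  // FloatType
  }
}
```

Without the new case, the implicit null-to-DoubleType cast fires on the first pass and the FloatType/DoubleType widening fires on the second. With the new case, the null is already typed as FloatType before the implicit cast sees it.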

dbtsai · 2017-04-12

Since NaNvl evaluates to right when left is NaN, I think right should always be cast to the type of left. I wonder how other engines behave.
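The following value-level sketch shows the "cast right to left" idea for a FloatType input. `nanvlFloat` is a hypothetical helper, not Spark's NaNvl. `None` models SQL NULL, and returning NULL for a NULL left input is an assumption. Narrowing the replacement keeps the result FloatType. However, as the last call shows, a DoubleType replacement outside the Float range silently becomes Infinity, which is the kind of implicit-cast hazard discussed earlier in the thread.

```scala
// Hypothetical value-level sketch; not Spark's NaNvl implementation.
object CastRightToLeftSketch {
  // None stands for SQL NULL. Assumption: a NULL left input yields NULL.
  // The replacement is narrowed to Float so the result keeps the left input's type.
  def nanvlFloat(left: Option[Float], right: Option[Double]): Option[Float] =
    left match {
      case Some(v) if v.isNaN => right.map(_.toFloat)
      case other              => other
    }

  def main(args: Array[String]): Unit = {
    println(nanvlFloat(Some(Float.NaN), None))       // None: NaN replaced by NULL, still Option[Float]
    println(nanvlFloat(Some(Float.NaN), Some(2.5)))  // Some(2.5)
    println(nanvlFloat(Some(1.5f), Some(2.5)))       // Some(1.5)
    println(nanvlFloat(Some(Float.NaN), Some(1e40))) // Some(Infinity): out of Float range
  }
}
```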

viirya · 2017-04-11T10:20:57Z

LGTM, if the above question doesn't matter.

SparkQA · 2017-04-11T10:50:38Z

Test build #75702 has finished for PR 17606 at commit fa5e1af.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-04-11T13:27:52Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala

    ruleTest(TypeCoercion.FunctionArgumentConversion,
      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)),
      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)))
+    ruleTest(TypeCoercion.FunctionArgumentConversion,
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)),
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))


Oh. In the expected result, the right-hand side should be Cast(Literal.create(null, NullType), FloatType).

dbtsai · 2017-04-11

Thanks. The test is fixed. :)

viirya · 2017-04-11T13:28:56Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala

+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
+    ruleTest(TypeCoercion.FunctionArgumentConversion,
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)),
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType)))


Then this should be Cast(Literal.create(null, NullType), DoubleType), I think.

dbtsai · 2017-04-11T18:31:05Z

+cc @cloud-fan @gatorsmile @rxin

SparkQA · 2017-04-11T20:46:59Z

Test build #75711 has finished for PR 17606 at commit e0625f5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-04-12T03:19:41Z

LGTM, merging to master!

…aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Closes #17606 from dbtsai/fixNaNvl. (cherry picked from commit 8ad63ee) Signed-off-by: DB Tsai <dbtsai@dbtsai.com>

… cast to N… …aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? This is a backport of #17606 `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Author: DB Tsai <dbtsai@dbtsai.com> Closes #17618 from dbtsai/branch-2.0.

Added new NaNvl type coercion rule

fa5e1af

viirya reviewed Apr 11, 2017

Fix the test

e0625f5

asfgit closed this in 8ad63ee Apr 12, 2017

dbtsai mentioned this pull request Apr 12, 2017

[SPARK-20291][SQL][BACKPORT] NaNvl(FloatType, NullType) should not be cast to N… #17618

Closed

dbtsai deleted the fixNaNvl branch November 11, 2019 23:05
