Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType) #17606

Closed
wants to merge 2 commits into from

Conversation

dbtsai
Copy link
Member

@dbtsai dbtsai commented Apr 11, 2017

What changes were proposed in this pull request?

NaNvl(float value, null) will be converted into NaNvl(float value, Cast(null, DoubleType)) and finally NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType)).

This will cause mismatching in the output type when the input type is float.

By adding extra rule in TypeCoercion can resolve this issue.

How was this patch tested?

unite tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@@ -571,6 +571,7 @@ object TypeCoercion {
NaNvl(l, Cast(r, DoubleType))
case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
NaNvl(Cast(l, DoubleType), r)
case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question I have is, why NaNvl(FloatType, DoubleType) should be cast to NaNvl(DoubleType, DoubleType), but NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType)?

They all change the input type from FloatType to DoubleType. Won't the first cast cause mismatching?

Copy link
Member Author

@dbtsai dbtsai Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR prevents casting from NaNvl(FloatType, NullType) to NaNvl(DoubleType, DoubleType) since we want to minimize the casting as much as possible. Also, if we want to replace NaN by null, we want to keep the output type the same as input type.

Whether NaNvl(FloatType, DoubleType) should be cast into NaNvl(DoubleType, DoubleType) is another story, and we should discuss it and fix it in another PR. I agree with you, we should downcast the replacement DoubleType into FloatType. And in my opinion, doing this implicit casting is error-prone, and we should do explicit casting by users instead.

@gatorsmile maybe you can chime in, and give the feedback whether we should cast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because FunctionArgumentConversion is executed before ImplicitTypeCasts. When there is no danger of loss of information, the cast can be implicit for better usability. We can add the extra configuration flag for users to stop implicit casting.

If we do not upcast NaNvl(FloatType, DoubleType) to NaNvl(DoubleType, DoubleType), what is the output data type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since NaNvl evaluates to right when left is NaN, I think right should always cast to left. I wonder what is the behavior of other engines?

@viirya
Copy link
Member

viirya commented Apr 11, 2017

LGTM, if the above question doesn't matter.

@SparkQA
Copy link

SparkQA commented Apr 11, 2017

Test build #75702 has finished for PR 17606 at commit fa5e1af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ruleTest(TypeCoercion.FunctionArgumentConversion,
NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)),
NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)))
ruleTest(TypeCoercion.FunctionArgumentConversion,
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)),
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. Literal.create(null, NullType) should be Cast(Literal.create(null, NullType), FloatType).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. The test is fixed. :)

NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
ruleTest(TypeCoercion.FunctionArgumentConversion,
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)),
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then this should be Cast(Literal.create(null, NullType), DoubleType), I think.

@dbtsai
Copy link
Member Author

dbtsai commented Apr 11, 2017

+cc @cloud-fan @gatorsmile @rxin

@SparkQA
Copy link

SparkQA commented Apr 11, 2017

Test build #75711 has finished for PR 17606 at commit e0625f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 8ad63ee Apr 12, 2017
asfgit pushed a commit that referenced this pull request Apr 12, 2017
…aNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

`NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`.

This will cause mismatching in the output type when the input type is float.

By adding extra rule in TypeCoercion can resolve this issue.

## How was this patch tested?

unite tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: DB Tsai <dbt@netflix.com>

Closes #17606 from dbtsai/fixNaNvl.

(cherry picked from commit 8ad63ee)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
asfgit pushed a commit that referenced this pull request Apr 12, 2017
… cast to N…

…aNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

This is a backport of #17606

`NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`.

This will cause mismatching in the output type when the input type is float.

By adding extra rule in TypeCoercion can resolve this issue.

## How was this patch tested?

unite tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: DB Tsai <dbt@netflix.com>
Author: DB Tsai <dbtsai@dbtsai.com>

Closes #17618 from dbtsai/branch-2.0.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…aNvl(DoubleType, DoubleType)

## What changes were proposed in this pull request?

`NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`.

This will cause mismatching in the output type when the input type is float.

By adding extra rule in TypeCoercion can resolve this issue.

## How was this patch tested?

unite tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: DB Tsai <dbt@netflix.com>

Closes apache#17606 from dbtsai/fixNaNvl.
@dbtsai dbtsai deleted the fixNaNvl branch November 11, 2019 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants