-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20291][SQL] NaNvl(FloatType, NullType) should not be cast to NaNvl(DoubleType, DoubleType) #17606
Conversation
@@ -571,6 +571,7 @@ object TypeCoercion { | |||
NaNvl(l, Cast(r, DoubleType)) | |||
case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType => | |||
NaNvl(Cast(l, DoubleType), r) | |||
case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I have is, why NaNvl(FloatType, DoubleType)
should be cast to NaNvl(DoubleType, DoubleType)
, but NaNvl(FloatType, NullType)
should not be cast to NaNvl(DoubleType, DoubleType)
?
They all change the input type from FloatType
to DoubleType
. Won't the first cast cause mismatching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this PR prevents casting from NaNvl(FloatType, NullType)
to NaNvl(DoubleType, DoubleType)
since we want to minimize the casting as much as possible. Also, if we want to replace NaN
by null
, we want to keep the output type the same as input type.
Whether NaNvl(FloatType, DoubleType)
should be cast into NaNvl(DoubleType, DoubleType)
is another story, and we should discuss it and fix it in another PR. I agree with you, we should downcast the replacement DoubleType
into FloatType
. And in my opinion, doing this implicit casting is error-prone, and we should do explicit casting by users instead.
@gatorsmile maybe you can chime in, and give the feedback whether we should cast NaNvl(FloatType, DoubleType)
to NaNvl(DoubleType, DoubleType)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because FunctionArgumentConversion
is executed before ImplicitTypeCasts
. When there is no danger of loss of information, the cast can be implicit for better usability. We can add the extra configuration flag for users to stop implicit casting.
If we do not upcast NaNvl(FloatType, DoubleType)
to NaNvl(DoubleType, DoubleType)
, what is the output data type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since NaNvl evaluates to right
when left
is NaN, I think right
should always cast to left
. I wonder what is the behavior of other engines?
LGTM, if the above question doesn't matter. |
Test build #75702 has finished for PR 17606 at commit
|
ruleTest(TypeCoercion.FunctionArgumentConversion, | ||
NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)), | ||
NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType))) | ||
ruleTest(TypeCoercion.FunctionArgumentConversion, | ||
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)), | ||
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. Literal.create(null, NullType)
should be Cast(Literal.create(null, NullType), FloatType)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. The test is fixed. :)
NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType))) | ||
ruleTest(TypeCoercion.FunctionArgumentConversion, | ||
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)), | ||
NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then this should be Cast(Literal.create(null, NullType), DoubleType)
, I think.
Test build #75711 has finished for PR 17606 at commit
|
LGTM, merging to master! |
…aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Closes #17606 from dbtsai/fixNaNvl. (cherry picked from commit 8ad63ee) Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
… cast to N… …aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? This is a backport of #17606 `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Author: DB Tsai <dbtsai@dbtsai.com> Closes #17618 from dbtsai/branch-2.0.
…aNvl(DoubleType, DoubleType) ## What changes were proposed in this pull request? `NaNvl(float value, null)` will be converted into `NaNvl(float value, Cast(null, DoubleType))` and finally `NaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))`. This will cause mismatching in the output type when the input type is float. By adding extra rule in TypeCoercion can resolve this issue. ## How was this patch tested? unite tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: DB Tsai <dbt@netflix.com> Closes apache#17606 from dbtsai/fixNaNvl.
What changes were proposed in this pull request?
NaNvl(float value, null)
will be converted intoNaNvl(float value, Cast(null, DoubleType))
and finallyNaNvl(Cast(float value, DoubleType), Cast(null, DoubleType))
.This will cause mismatching in the output type when the input type is float.
By adding extra rule in TypeCoercion can resolve this issue.
How was this patch tested?
unite tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.