[SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point #26573
Conversation
…gling decimal point
cc @cloud-fan @MaxGekk @maropu @HyukjinKwon thanks for reviewing in advance. The pre-discussion can be found here: #26491 (comment)
@@ -505,6 +505,7 @@ object IntervalUtils {
     var days: Int = 0
     var microseconds: Long = 0
     var fractionScale: Int = 0
+    val validOriginFractionScale = (NANOS_PER_SECOND / 10).toInt
how about just initialFractionScale?
Test build #114008 has finished for PR 26573 at commit
@@ -582,7 +583,7 @@ object IntervalUtils {
       case _ if '0' <= b && b <= '9' && fractionScale > 0 =>
         fraction += (b - '0') * fractionScale
         fractionScale /= 10
-      case ' ' =>
+      case ' ' if fractionScale != initialFractionScale =>
super nit: seems fractionScale < initialFractionScale is safer?
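For context on the two hunks above, here is a minimal REPL-style sketch (not the actual IntervalUtils code; the constant is defined locally and assumed to be the usual 1,000,000,000 ns per second) of how a fraction scale initialised to NANOS_PER_SECOND / 10 lets the parser consume fractional digits left to right:

// Standalone sketch of the fractional-digit accumulation discussed above.
val NANOS_PER_SECOND = 1000000000L
val initialFractionScale = (NANOS_PER_SECOND / 10).toInt // 100,000,000

var fraction = 0
var fractionScale = initialFractionScale
for (c <- "125") {              // the digits after the decimal point
  fraction += (c - '0') * fractionScale
  fractionScale /= 10
}
assert(fraction == 125000000)   // 0.125 seconds, expressed in nanoseconds
// After consuming at least one digit, fractionScale < initialFractionScale,
// which is exactly what the guard on the ' ' case inspects.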
Test build #114020 has finished for PR 26573 at commit
Test build #114024 has finished for PR 26573 at commit
}

test("string to interval: seconds with fractional part") {
  checkFromString("0.1 seconds", new CalendarInterval(0, 0, 100000))
  checkFromString("1. seconds", new CalendarInterval(0, 0, 1000000))
Will we explicitly disable this case?
scala> sql("select 0.")
res0: org.apache.spark.sql.DataFrame = [0: decimal(1,0)]
At least I know that Python, Java, Scala, and R support this form as well.
presto> select interval '1.' second;
_col0
----------------
0 00:00:01.000
(1 row)
Query 20191119_051438_00001_f5kcs, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [0 rows, 0B] [0 rows/s, 0B/s]
presto> select interval '.' second;
Query 20191119_051452_00002_f5kcs failed: Invalid INTERVAL SECOND value: .
Also checked with Presto: '1.' seconds is valid.
@HyukjinKwon good point. To be consistent with Spark's own parser, maybe we should allow interval '1.' second. But interval '.' second should be invalid.
^ +1!
updated, please recheck, thanks.
@@ -505,7 +505,9 @@ object IntervalUtils {
     var days: Int = 0
     var microseconds: Long = 0
     var fractionScale: Int = 0
+    val initialFractionScale = (NANOS_PER_SECOND / 10).toInt
do we still need it now?
@@ -582,7 +586,7 @@ object IntervalUtils {
       case _ if '0' <= b && b <= '9' && fractionScale > 0 =>
         fraction += (b - '0') * fractionScale
         fractionScale /= 10
-      case ' ' =>
+      case ' ' if !pointPrefixed || fractionScale < initialFractionScale =>
since 1. is allowed, I don't think fractionScale < initialFractionScale is a valid check any more.
we need this: the cases (1.0, 0.1, .1, 1.) should be accepted and (.) rejected, and this condition is equivalent to !(pointPrefixed && fractionScale == initialFractionScale)
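To make that case analysis concrete, here is a small sketch with hypothetical names (spaceAllowed and consumedDigit are illustrative, not identifiers from the parser) showing how the guard classifies the five inputs:

// consumedDigit stands for fractionScale < initialFractionScale, i.e. at least
// one fractional digit was read; pointPrefixed means the value started with '.'.
def spaceAllowed(pointPrefixed: Boolean, consumedDigit: Boolean): Boolean =
  !pointPrefixed || consumedDigit

assert(spaceAllowed(pointPrefixed = false, consumedDigit = true))   // "1.0"
assert(spaceAllowed(pointPrefixed = false, consumedDigit = true))   // "0.1"
assert(spaceAllowed(pointPrefixed = true,  consumedDigit = true))   // ".1"
assert(spaceAllowed(pointPrefixed = false, consumedDigit = false))  // "1."
assert(!spaceAllowed(pointPrefixed = true, consumedDigit = false))  // "." is rejected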
removing fractionScale < initialFractionScale would forbid the .1 case
ah i see!
Test build #114076 has finished for PR 26573 at commit
thanks, merging to master!
What changes were proposed in this pull request?
Currently, we parse '1. second' to 1s and even '. second' to 0s.
We fix this by adjusting the new interval parser's VALUE_FRACTIONAL_PART state.
With further digging, we found that 1. is valid in Python, R, Scala, Presto, and so on, so this PR
ONLY forbids invalid interval values of the form '. seconds'.
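To illustrate the resulting rule, here is a small self-contained sketch (a hypothetical helper, not Spark's parser) of which seconds values are accepted and rejected:

// Hypothetical validator mirroring the rule described above: a trailing '.' is
// fine as long as digits appear on at least one side, but a lone '.' is invalid.
def isValidSecondsValue(s: String): Boolean = {
  val (intPart, rest) = s.span(_ != '.')
  val digitsOnly = (t: String) => t.forall(_.isDigit)
  if (rest.isEmpty) intPart.nonEmpty && digitsOnly(intPart) // no decimal point at all
  else {
    val frac = rest.drop(1) // drop the '.'
    digitsOnly(intPart) && digitsOnly(frac) && (intPart.nonEmpty || frac.nonEmpty)
  }
}

assert(isValidSecondsValue("1.0"))
assert(isValidSecondsValue(".1"))
assert(isValidSecondsValue("1."))   // still accepted, consistent with select 0.
assert(!isValidSecondsValue("."))   // now rejected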
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
Yes, now we treat '. second' (and similar forms) as invalid intervals.
How was this patch tested?
Added unit tests.