
[SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point #26573

Closed
wants to merge 4 commits

Conversation

yaooqinn
Member

@yaooqinn yaooqinn commented Nov 18, 2019

What changes were proposed in this pull request?

Currently, we parse '1. second' as 1s, and even '. second' as 0s.

-- !query 118
select interval '1. seconds'
-- !query 118 schema
struct<1 seconds:interval>
-- !query 118 output
1 seconds


-- !query 119
select interval '. seconds'
-- !query 119 schema
struct<0 seconds:interval>
-- !query 119 output
0 seconds

By contrast, PostgreSQL rejects both forms:
postgres=# select interval '1. second';
ERROR:  invalid input syntax for type interval: "1. second"
LINE 1: select interval '1. second';

postgres=# select interval '. second';
ERROR:  invalid input syntax for type interval: ". second"
LINE 1: select interval '. second';

We fix this by correcting the new interval parser's VALUE_FRACTIONAL_PART state.

With further digging, we found that `1.` is valid in Python, R, Scala, Presto, and so on, so this PR
ONLY forbids invalid interval values of the form '. seconds'.
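The rule can be illustrated with a standalone sketch (hypothetical names such as `parseFraction`; this is not the actual `IntervalUtils` code). It keeps a trailing dangling point like `1.` valid while rejecting a value that is only a point:

```scala
object FractionParseSketch {
  // Mirrors the constant used by the parser: the scale of the first
  // fractional digit, in nanoseconds.
  val NANOS_PER_SECOND: Long = 1000000000L
  val initialFractionScale: Int = (NANOS_PER_SECOND / 10).toInt

  // Parses only the fractional part of a value like "1.5", ".5", or "1.".
  // Returns the fraction in nanoseconds, or None for invalid input.
  def parseFraction(s: String): Option[Int] = {
    val dot = s.indexOf('.')
    if (dot < 0) return None
    val pointPrefixed = dot == 0          // the value starts with '.'
    val digits = s.substring(dot + 1)
    var fraction = 0
    var fractionScale = initialFractionScale
    var i = 0
    while (i < digits.length) {
      val c = digits.charAt(i)
      // Reject non-digits and more than 9 fractional digits.
      if (c < '0' || c > '9' || fractionScale == 0) return None
      fraction += (c - '0') * fractionScale
      fractionScale /= 10
      i += 1
    }
    // The PR's rule: a point-prefixed value that consumed no fractional
    // digits (i.e. the bare ".") is invalid; "1." stays valid.
    if (pointPrefixed && fractionScale == initialFractionScale) None
    else Some(fraction)
  }
}
```

For example, `parseFraction("1.")` yields `Some(0)` while `parseFraction(".")` yields `None`, matching the behavior this PR targets.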

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

Yes. Strings such as '. second' are now treated as invalid intervals.

How was this patch tested?

Added unit tests.

@yaooqinn
Member Author

cc @cloud-fan @MaxGekk @maropu @HyukjinKwon thanks for reviewing in advance. Pre-discussion could be found here #26491 (comment)

@@ -505,6 +505,7 @@ object IntervalUtils {
var days: Int = 0
var microseconds: Long = 0
var fractionScale: Int = 0
val validOriginFractionScale = (NANOS_PER_SECOND / 10).toInt
Contributor

how about just initialFractionScale?

@SparkQA

SparkQA commented Nov 18, 2019

Test build #114008 has finished for PR 26573 at commit cdc7b2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -582,7 +583,7 @@ object IntervalUtils {
case _ if '0' <= b && b <= '9' && fractionScale > 0 =>
fraction += (b - '0') * fractionScale
fractionScale /= 10
case ' ' =>
case ' ' if fractionScale != initialFractionScale =>
Contributor

super nit: seems fractionScale < initialFractionScale is safer?

@SparkQA

SparkQA commented Nov 18, 2019

Test build #114020 has finished for PR 26573 at commit 42afa6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 18, 2019

Test build #114024 has finished for PR 26573 at commit 982b63d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

test("string to interval: seconds with fractional part") {
checkFromString("0.1 seconds", new CalendarInterval(0, 0, 100000))
checkFromString("1. seconds", new CalendarInterval(0, 0, 1000000))
Member

Will we explicitly disable this case?

scala> sql("select 0.")
res0: org.apache.spark.sql.DataFrame = [0: decimal(1,0)]

At least I know Python, Java, Scala and R support this way as well.

Member Author

presto> select interval '1.' second;
     _col0
----------------
 0 00:00:01.000
(1 row)

Query 20191119_051438_00001_f5kcs, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [0 rows, 0B] [0 rows/s, 0B/s]

presto> select interval '.' second;
Query 20191119_051452_00002_f5kcs failed: Invalid INTERVAL SECOND value: .

Also checked with Presto: '1.' seconds is valid there as well.

Contributor

@HyukjinKwon good point. To be consistent with Spark's own parser, maybe we should allow interval '1.' second. But interval '.' second should be invalid.

Member

^ +1!

Member Author

updated, please recheck, thanks.

@yaooqinn yaooqinn changed the title [SPARK-29926][SQL] Fix weird interval string whose value end with a dangling decimal point [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point Nov 19, 2019
@@ -505,7 +505,9 @@ object IntervalUtils {
var days: Int = 0
var microseconds: Long = 0
var fractionScale: Int = 0
val initialFractionScale = (NANOS_PER_SECOND / 10).toInt
Contributor

do we still need it now?

@@ -582,7 +586,7 @@ object IntervalUtils {
case _ if '0' <= b && b <= '9' && fractionScale > 0 =>
fraction += (b - '0') * fractionScale
fractionScale /= 10
case ' ' =>
case ' ' if !pointPrefixed || fractionScale < initialFractionScale =>
Contributor

since 1. is allowed, I don't think fractionScale < initialFractionScale is a valid check any more.

Member Author

We need this. The cases to cover are (1.0, 0.1, .1, 1.) plus the invalid (.); this condition is equivalent to !(pointPrefixed && fractionScale == initialFractionScale).
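The claimed equivalence can be checked mechanically. A hypothetical standalone check (names are illustrative), relying on the parser invariant that fractionScale only ever shrinks from initialFractionScale by division by 10:

```scala
object GuardEquivalence {
  val initialFractionScale = 100000000

  // The guard as written in the diff.
  def lhs(pointPrefixed: Boolean, fractionScale: Int): Boolean =
    !pointPrefixed || fractionScale < initialFractionScale

  // The form quoted in the comment above.
  def rhs(pointPrefixed: Boolean, fractionScale: Int): Boolean =
    !(pointPrefixed && fractionScale == initialFractionScale)

  // Exhaustively compare both forms over reachable fractionScale values
  // (fractionScale never exceeds initialFractionScale in the parser).
  def check(): Boolean = {
    val scales = Seq(initialFractionScale, initialFractionScale / 10, 10, 0)
    val flags = Seq(true, false)
    flags.forall(p => scales.forall(s => lhs(p, s) == rhs(p, s)))
  }
}
```

Under that invariant, `fractionScale != initialFractionScale` and `fractionScale < initialFractionScale` coincide, which is why the two guards agree.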

Member Author

Removing fractionScale < initialFractionScale would forbid the .1 case.

Contributor

ah i see!

@SparkQA

SparkQA commented Nov 19, 2019

Test build #114076 has finished for PR 26573 at commit f6b0edd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 79ed4ae Nov 19, 2019