Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses when ANSI mode enabled #20433

Closed
wants to merge 3 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Jan 30, 2018

What changes were proposed in this pull request?

This pr updated parsing rules in SqlBase.g4 to support a SQL query below when ANSI mode enabled;

SELECT CAST('2017-08-04' AS DATE) + 1 days;

The current master cannot parse it though, other dbms-like systems support the syntax (e.g., hive and mysql). Also, the syntax is frequently used in the official TPC-DS queries.

This pr added new tokens as follows;

YEAR | YEARS | MONTH | MONTHS | WEEK | WEEKS | DAY | DAYS | HOUR | HOURS | MINUTE 
MINUTES | SECOND | SECONDS | MILLISECOND | MILLISECONDS | MICROSECOND | MICROSECONDS

Then, it registered the keywords below as the ANSI reserved (this follows SQL-2011);

 DAY | HOUR | MINUTE | MONTH | SECOND | YEAR

How was this patch tested?

Added tests in SQLQuerySuite, ExpressionParserSuite, and TableIdentifierParserSuite.

@SparkQA
Copy link

SparkQA commented Jan 30, 2018

Test build #86799 has finished for PR 20433 at commit 830cf8d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jan 30, 2018

retest this please.

@SparkQA
Copy link

SparkQA commented Jan 30, 2018

Test build #86810 has finished for PR 20433 at commit 830cf8d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 31, 2018

Test build #86849 has finished for PR 20433 at commit 4173ff0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -561,8 +561,11 @@ class ExpressionParserSuite extends PlanTest {
Literal(CalendarInterval.fromSingleUnitString(u, s))
}

// Empty interval statement
intercept("interval", "at least one time unit should be given for interval literal")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shall still check the empty interval statement, now it shall produce a different error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, antlr just throws an exception when hitting this case;

scala> sql("select cast('2018-01-12' as DATE) + 1 days").show
+---------------------------------------------------------------------------+
|CAST(CAST(CAST(2018-01-12 AS DATE) AS TIMESTAMP) + interval 1 days AS DATE)|
+---------------------------------------------------------------------------+
|                                                                 2018-01-13|
+---------------------------------------------------------------------------+

scala> sql("select cast('2018-01-12' as DATE) + interval").show
org.apache.spark.sql.AnalysisException: cannot resolve '`interval`' given input columns: []; line 1 pos 36;
'Project [unresolvedalias((cast(2018-01-12 as date) + 'interval), None)]
+- OneRowRelation

@maropu
Copy link
Member Author

maropu commented Mar 2, 2018

@gatorsmile kindly ping

@maropu
Copy link
Member Author

maropu commented Mar 6, 2018

ping

HOUR: 'HOUR' | 'HOURS';
MINUTE: 'MINUTE' | 'MINUTES';
SECOND: 'SECOND' | 'SECONDS';
MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering which systems support MILLISECOND , MICROSECOND and NANOSECOND ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, it sounds like we already support them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea.

SECOND: 'SECOND' | 'SECONDS';
MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
MICROSECOND: 'MICROSECOND' | 'MICROSECONDS';
NANOSECOND: 'NANOSECOND' | 'NANOSECONDS';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support nanosecond.

@@ -790,6 +796,16 @@ ASC: 'ASC';
DESC: 'DESC';
FOR: 'FOR';
INTERVAL: 'INTERVAL';
YEAR: 'YEAR' | 'YEARS';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update TableIdentifierParserSuite

@gatorsmile
Copy link
Member

gatorsmile commented Mar 6, 2018

Could you create interval.sql by adding the test cases in https://issues.apache.org/jira/browse/HIVE-13557 ?

BTW, check the discussion in that JIRA?

@maropu
Copy link
Member Author

maropu commented Mar 6, 2018

ok, I'll update based on the comments soon

@maropu
Copy link
Member Author

maropu commented Mar 6, 2018

You meant the HIVE jira? If so, no (I'm checking now). Any point I should know?

@maropu
Copy link
Member Author

maropu commented Mar 6, 2018

better to port all the related tests in Hive clientpositive into the Spark sql query tests?

# hive-master/ql/src/test/queries/clientpositive: (master=)$ls | grep interval
interval_1.q
interval_2.q
interval_3.q
interval_alt.q
interval_arithmetic.q
interval_comparison.q
...

None
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to this pr though, I think it is some useful to run tests selectively in SQLQueryTestSuite (cuz the number of tests there grows recently...). If possibly, could we add this feature in a separate pr? Otherwise, I'll drop this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us create a separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll do later.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88191 has finished for PR 20433 at commit c0710d6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

This sounds good to me

@maropu
Copy link
Member Author

maropu commented Mar 13, 2018

ok, I added related tests from hive clientpositive in interval.sql.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88192 has finished for PR 20433 at commit 1eec819.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88198 has finished for PR 20433 at commit 39fe5dc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88204 has finished for PR 20433 at commit ba6cce5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88209 has finished for PR 20433 at commit f6210a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -155,6 +155,7 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
case (null, _) => "null"
case (s: String, StringType) => "\"" + s + "\""
case (decimal, DecimalType()) => decimal.toString
case (interval, CalendarIntervalType) => interval.toString
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case to capture this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try to add tests for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct<>
-- !query 30 output
org.apache.spark.sql.AnalysisException
cannot resolve '(DATE '2012-01-01' + (t.`a` + 1))' due to data type mismatch: differing types in '(DATE '2012-01-01' + (t.`a` + 1))' (date and int).; line 1 pos 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the columns having the matched data type?

Copy link
Member Author

@maropu maropu Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll add tests to check
You mean interval-interval cases? If so, this one? https://github.com/apache/spark/pull/20433/files#diff-24539a8bfdac0a14cec58755dd6565dbR237

@gatorsmile
Copy link
Member

Change the PR title to [SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses?

@maropu
Copy link
Member Author

maropu commented Jan 30, 2019

aha, got it.
Based on the ANSI one below, I think the the plural from is not reserved:
https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf
So, to comply with the ANSI SQL, we need to reserve year, month, week, day, hour, minute, second, millisecond, and microsecond for the interval.

Ah,... I just noticed the parser regards 2 seconds as a interval is not a ANSI behaviour because seconds is not reserved...? Postgre/Hive just regards 2 seconds as a interval though.

Anyway, we should include the fix to reserve these keywords in this pr?
To do so, we need to port some logics from #23259(https://github.com/apache/spark/pull/23259/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R1405) to here, and then I feel this pr gets a little complicated?

@cloud-fan
Copy link
Contributor

My understanding is, this PR is blocked by #23259. Without a well-defined list of keywords, we can't simply change the parser rule of interval literal.

To unblock it, we want to build a framework to optionally(controlled by a config) define something as keyword, and then define the interval related keywords first.

I just noticed the parser regards 2 seconds as a interval is not a ANSI behaviour because seconds is not reserved

Can you check some mainstream databases like postgres, sql server, oracle?

@maropu
Copy link
Member Author

maropu commented Jan 30, 2019

Can you check some mainstream databases like postgres, sql server, oracle?

ok, I'll check.

To unblock it, we want to build a framework to optionally(controlled by a config) define something as > keyword, and then define the interval related keywords first.

ok, I'll simplify #23259 first to build the framework, then I'll revisit this.

@maropu
Copy link
Member Author

maropu commented Feb 17, 2019

I checked if the time unit keywords (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, WEEK, MILLISECOND, MICROSECOND) are reserved in SQL-2011/database implementations:

SQL-2011

  • reserved: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
  • non-reserved: WEEK, MILLISECOND, MICROSECOND

PostgreSQL, MySQL, Oracle, and SQL Server

  • reserved:
  • non-reserved: YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND

Also, all the the plural forms (YEARS, MONTHS, ...) are not reserved in all the cases.
I'm not sure about the historical reason why these databases don't handle them as reserved though, IMHO we would be better to follow SQL-2011; YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND are reserved if ansi=true.

As for SQL server, these time unit keywords are listed in future keywords:
https://docs.microsoft.com/ja-jp/sql/t-sql/language-elements/reserved-keywords-transact-sql?view=sql-server-2017#future-keywords

@SparkQA
Copy link

SparkQA commented Feb 17, 2019

Test build #102428 has finished for PR 20433 at commit e11cb2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the SPARK-23264 branch 2 times, most recently from 41645a8 to 5699547 Compare February 24, 2019 00:48
@@ -0,0 +1,188 @@
-- Turns on ANSI mode
SET spark.sql.parser.ansi.enabled=true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving these kinds of ANSI-related tests into a new dir sql-tests/inputs/ansi/?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #102714 has finished for PR 20433 at commit 41645a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #102715 has finished for PR 20433 at commit 5699547.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Mar 8, 2019

ping @cloud-fan

@SparkQA
Copy link

SparkQA commented Mar 11, 2019

Test build #103287 has finished for PR 20433 at commit a76f8a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses [SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses when ANSI mode enabled Mar 13, 2019
@SparkQA
Copy link

SparkQA commented Mar 13, 2019

Test build #103413 has finished for PR 20433 at commit feccbc8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Mar 13, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Mar 13, 2019

Test build #103421 has finished for PR 20433 at commit feccbc8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM except one comment: #20433 (comment)

@SparkQA
Copy link

SparkQA commented Mar 13, 2019

Test build #103445 has finished for PR 20433 at commit 8e414ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Mar 14, 2019

Thanks! Merged to master.

@maropu maropu closed this Mar 14, 2019
maropu added a commit that referenced this pull request Mar 14, 2019
… when ANSI mode enabled

## What changes were proposed in this pull request?
This pr updated parsing rules in `SqlBase.g4` to support a SQL query below when ANSI mode enabled;
```
SELECT CAST('2017-08-04' AS DATE) + 1 days;
```
The current master cannot parse it though, other dbms-like systems support the syntax (e.g., hive and mysql). Also, the syntax is frequently used in the official TPC-DS queries.

This pr added new tokens as follows;
```
YEAR | YEARS | MONTH | MONTHS | WEEK | WEEKS | DAY | DAYS | HOUR | HOURS | MINUTE
MINUTES | SECOND | SECONDS | MILLISECOND | MILLISECONDS | MICROSECOND | MICROSECONDS
```
Then, it registered the keywords below as the ANSI reserved (this follows SQL-2011);
```
 DAY | HOUR | MINUTE | MONTH | SECOND | YEAR
```

## How was this patch tested?
Added tests in `SQLQuerySuite`, `ExpressionParserSuite`, and `TableIdentifierParserSuite`.

Closes #20433 from maropu/SPARK-23264.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
gatorsmile pushed a commit that referenced this pull request Jan 3, 2020
…nabled"

### What changes were proposed in this pull request?

Revert #20433 .
### Why are the changes needed?

According to the SQL standard, the INTERVAL prefix is required:
```
<interval literal> ::=
  INTERVAL [ <sign> ] <interval string> <interval qualifier>

<interval string> ::=
  <quote> <unquoted interval string> <quote>
```

### Does this PR introduce any user-facing change?

yes, but omitting the INTERVAL prefix is a new feature in 3.0

### How was this patch tested?

existing tests

Closes #27080 from cloud-fan/interval.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants