[SPARK-23264][SQL] Make INTERVAL keyword optional in INTERVAL clauses when ANSI mode enabled #20433
Test build #86799 has finished for PR 20433 at commit
retest this please.
Test build #86810 has finished for PR 20433 at commit
Test build #86849 has finished for PR 20433 at commit
@@ -561,8 +561,11 @@ class ExpressionParserSuite extends PlanTest {
Literal(CalendarInterval.fromSingleUnitString(u, s))
}

// Empty interval statement
intercept("interval", "at least one time unit should be given for interval literal")
We should still check the empty interval statement. Does it now produce a different error message?
yea, antlr just throws an exception when hitting this case;
scala> sql("select cast('2018-01-12' as DATE) + 1 days").show
+---------------------------------------------------------------------------+
|CAST(CAST(CAST(2018-01-12 AS DATE) AS TIMESTAMP) + interval 1 days AS DATE)|
+---------------------------------------------------------------------------+
| 2018-01-13|
+---------------------------------------------------------------------------+
scala> sql("select cast('2018-01-12' as DATE) + interval").show
org.apache.spark.sql.AnalysisException: cannot resolve '`interval`' given input columns: []; line 1 pos 36;
'Project [unresolvedalias((cast(2018-01-12 as date) + 'interval), None)]
+- OneRowRelation
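The behaviour discussed above (an optional `INTERVAL` prefix, and an error when no time unit follows) can be illustrated with a small stand-alone sketch. This is a toy tokenizer in plain Scala 2.13, not Spark's `SqlBase.g4` grammar or its AST builder; `IntervalSketch`, `parse`, and `intervalOptional` are hypothetical names, and the "INTERVAL keyword is required" message is invented for the example.

```scala
// Hypothetical toy parser for illustration only; it is not Spark's parser.
object IntervalSketch {
  // Unit tokens named in this PR; each accepts a singular and a plural spelling.
  private val units: Map[String, String] =
    Seq("YEAR", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND",
        "MILLISECOND", "MICROSECOND")
      .flatMap(u => Seq(u -> u, (u + "S") -> u))
      .toMap

  /** Parses "[INTERVAL] <n> <unit> [<n> <unit> ...]" into (value, singular unit) pairs. */
  def parse(text: String, intervalOptional: Boolean): Either[String, Seq[(Long, String)]] = {
    val tokens = text.trim.split("\\s+").filter(_.nonEmpty).toList
    val hasPrefix = tokens.headOption.exists(_.equalsIgnoreCase("INTERVAL"))
    if (!hasPrefix && !intervalOptional) {
      // Assumed message; stands in for the "prefix required" behaviour.
      Left("INTERVAL keyword is required")
    } else {
      val body = if (hasPrefix) tokens.tail else tokens
      if (body.isEmpty) {
        Left("at least one time unit should be given for interval literal")
      } else if (body.size % 2 != 0) {
        Left("expected <value> <unit> pairs")
      } else {
        val parsed = body.grouped(2).toList.map {
          case List(n, u) =>
            for {
              value <- n.toLongOption.toRight(s"not a number: $n")
              unit <- units.get(u.toUpperCase).toRight(s"unknown unit: $u")
            } yield (value, unit)
          case other => Left(s"unexpected tokens: $other")
        }
        // Report the first error, otherwise return all parsed pairs.
        parsed.collectFirst { case Left(e) => e } match {
          case Some(e) => Left(e)
          case None => Right(parsed.collect { case Right(p) => p })
        }
      }
    }
  }
}
```

With `intervalOptional = true`, `IntervalSketch.parse("1 days", intervalOptional = true)` yields `Right(List((1,DAY)))`, while a bare `interval` yields the "at least one time unit" error. With `intervalOptional = false`, the same `1 days` input is rejected.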
@gatorsmile kindly ping
ping
HOUR: 'HOUR' | 'HOURS';
MINUTE: 'MINUTE' | 'MINUTES';
SECOND: 'SECOND' | 'SECONDS';
MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
I am wondering which systems support MILLISECOND, MICROSECOND, and NANOSECOND?
nvm, it sounds like we already support them.
yea.
SECOND: 'SECOND' | 'SECONDS';
MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
MICROSECOND: 'MICROSECOND' | 'MICROSECONDS';
NANOSECOND: 'NANOSECOND' | 'NANOSECONDS';
We do not support nanosecond.
@@ -790,6 +796,16 @@ ASC: 'ASC';
DESC: 'DESC';
FOR: 'FOR';
INTERVAL: 'INTERVAL';
YEAR: 'YEAR' | 'YEARS';
Also update TableIdentifierParserSuite
Could you create BTW, check the discussion in that JIRA?
ok, I'll update based on the comments soon
You meant the HIVE jira? If so, no (I'm checking now). Any point I should know?
better to port all the related tests in Hive
None
}
}
This is not related to this PR, but I think it is useful to be able to run tests selectively in SQLQueryTestSuite (because the number of tests there has grown recently). If possible, could we add this feature in a separate PR? Otherwise, I'll drop this.
Let us create a separate PR?
ok, I'll do later.
Test build #88191 has finished for PR 20433 at commit
This sounds good to me
ok, I added related tests from hive
Test build #88192 has finished for PR 20433 at commit
Test build #88198 has finished for PR 20433 at commit
Test build #88204 has finished for PR 20433 at commit
Test build #88209 has finished for PR 20433 at commit
@@ -155,6 +155,7 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
case (null, _) => "null"
case (s: String, StringType) => "\"" + s + "\""
case (decimal, DecimalType()) => decimal.toString
case (interval, CalendarIntervalType) => interval.toString
Do we have a test case to capture this change?
ok, I'll try to add tests for that.
struct<>
-- !query 30 output
org.apache.spark.sql.AnalysisException
cannot resolve '(DATE '2012-01-01' + (t.`a` + 1))' due to data type mismatch: differing types in '(DATE '2012-01-01' + (t.`a` + 1))' (date and int).; line 1 pos 7
How about the columns having the matched data type?
ok, I'll add tests to check
You mean interval-interval cases? If so, this one? https://github.com/apache/spark/pull/20433/files#diff-24539a8bfdac0a14cec58755dd6565dbR237
Change the PR title to
aha, got it. Ah,... I just noticed Anyway, we should include the fix to reserve these keywords in this pr?
My understanding is that this PR is blocked by #23259. Without a well-defined list of keywords, we can't simply change the parser rule for interval literals. To unblock it, we want to build a framework that optionally (controlled by a config) defines something as a keyword, and then define the interval-related keywords first.
Can you check some mainstream databases like postgres, sql server, oracle?
ok, I'll check.
ok, I'll simplify #23259 first to build the framework, then I'll revisit this.
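The idea of treating a word as a keyword only when a config enables it can be sketched in a few lines of plain Scala. This is an illustration under assumptions, not the framework from #23259: `KeywordSketch`, `ansiReserved`, and `usableAsIdentifier` are hypothetical names, and the reserved set is simply the six keywords that the merged commit message for this PR lists as ANSI reserved.

```scala
// Hypothetical sketch; not Spark's parser or the framework built in #23259.
object KeywordSketch {
  // Keywords the merged commit message lists as ANSI reserved (following SQL-2011).
  val ansiReserved: Set[String] =
    Set("DAY", "HOUR", "MINUTE", "MONTH", "SECOND", "YEAR")

  /** True if `name` may be used as an unquoted identifier under the given mode. */
  def usableAsIdentifier(name: String, ansiEnabled: Boolean): Boolean =
    !(ansiEnabled && ansiReserved.contains(name.toUpperCase))
}
```

In this sketch `year` is rejected as an identifier only when `ansiEnabled` is true, while plural forms such as `years` and non-reserved units such as `week` remain usable in both modes.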
I checked whether the time unit keywords (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, WEEK, MICROSECOND) are reserved in SQL-2011 and in database implementations: SQL-2011
PostgreSQL, MySQL, Oracle, and SQL Server
Also, none of the plural forms (YEARS, MONTHS, ...) are reserved in any of these cases. As for SQL Server, these time unit keywords are listed in future keywords:
Test build #102428 has finished for PR 20433 at commit
41645a8
to
5699547
Compare
@@ -0,0 +1,188 @@
-- Turns on ANSI mode
SET spark.sql.parser.ansi.enabled=true;
How about moving these kinds of ANSI-related tests into a new dir sql-tests/inputs/ansi/?
SGTM
ok
Test build #102714 has finished for PR 20433 at commit
Test build #102715 has finished for PR 20433 at commit
ping @cloud-fan
Test build #103287 has finished for PR 20433 at commit
Test build #103413 has finished for PR 20433 at commit
retest this please
Test build #103421 has finished for PR 20433 at commit
LGTM except one comment: #20433 (comment)
Test build #103445 has finished for PR 20433 at commit
Thanks! Merged to master.
… when ANSI mode enabled

## What changes were proposed in this pull request?

This pr updated parsing rules in `SqlBase.g4` to support a SQL query below when ANSI mode enabled;

```
SELECT CAST('2017-08-04' AS DATE) + 1 days;
```

The current master cannot parse it, though other DBMS-like systems support the syntax (e.g., Hive and MySQL). Also, the syntax is frequently used in the official TPC-DS queries.

This pr added new tokens as follows;

```
YEAR | YEARS | MONTH | MONTHS | WEEK | WEEKS | DAY | DAYS | HOUR | HOURS | MINUTE
| MINUTES | SECOND | SECONDS | MILLISECOND | MILLISECONDS | MICROSECOND | MICROSECONDS
```

Then, it registered the keywords below as ANSI reserved (this follows SQL-2011);

```
DAY | HOUR | MINUTE | MONTH | SECOND | YEAR
```

## How was this patch tested?

Added tests in `SQLQuerySuite`, `ExpressionParserSuite`, and `TableIdentifierParserSuite`.

Closes #20433 from maropu/SPARK-23264.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
…nabled"

### What changes were proposed in this pull request?

Revert #20433 .

### Why are the changes needed?

According to the SQL standard, the INTERVAL prefix is required:

```
<interval literal> ::=
  INTERVAL [ <sign> ] <interval string> <interval qualifier>

<interval string> ::=
  <quote> <unquoted interval string> <quote>
```

### Does this PR introduce any user-facing change?

yes, but omitting the INTERVAL prefix is a new feature in 3.0

### How was this patch tested?

existing tests

Closes #27080 from cloud-fan/interval.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>