[SPARK-5213] [SQL] Pluggable SQL Parser Support #4015

chenghao-intel · 2015-01-13T08:29:09Z

This PR aims to make the SQL parser pluggable, so that users can register their own parser via the Spark SQL CLI.

# add the jar into the classpath
$hcheng@mydesktop:spark>bin/spark-sql --jars sql99.jar

-- switch to "hiveql" dialect
   spark-sql>SET spark.sql.dialect=hiveql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to "sql" dialect
   spark-sql>SET spark.sql.dialect=sql;
   spark-sql>SELECT * FROM src LIMIT 1;

-- switch to a custom dialect
   spark-sql>SET spark.sql.dialect=com.xxx.xxx.SQL99Dialect;
   spark-sql>SELECT * FROM src LIMIT 1;

-- register a nonexistent SQL dialect
   spark-sql> SET spark.sql.dialect=NotExistedClass;
   spark-sql> SELECT * FROM src LIMIT 1;
-- An exception is thrown and the dialect switches back to the default ("sql" for SQLContext and "hiveql" for HiveContext)
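
The lookup behaviour described above (short names for the built-in dialects, a class name for a custom one, and a fallback to the default when the class cannot be loaded) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchDialect`, `SketchSqlDialect`, `SketchHiveQlDialect` and `DialectResolver` are hypothetical names, not the classes added by this PR, and a plain `String` stands in for a logical plan.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins for illustration; not the Spark classes in this PR.
abstract class SketchDialect {
  def parse(sqlText: String): String
}

class SketchSqlDialect extends SketchDialect {
  def parse(sqlText: String): String = s"sql plan for: $sqlText"
}

class SketchHiveQlDialect extends SketchDialect {
  def parse(sqlText: String): String = s"hiveql plan for: $sqlText"
}

object DialectResolver {
  // Built-in short names; any other value is treated as a class name.
  private val builtIn: Map[String, () => SketchDialect] = Map(
    "sql" -> (() => new SketchSqlDialect),
    "hiveql" -> (() => new SketchHiveQlDialect)
  )

  // Loads the class reflectively and checks that it really is a dialect.
  private def instantiate(className: String): SketchDialect = {
    val instance: Any = Class.forName(className).getDeclaredConstructor().newInstance()
    instance match {
      case d: SketchDialect => d
      case other =>
        throw new IllegalArgumentException(
          s"${other.getClass.getName} is not a subclass of SketchDialect")
    }
  }

  // Returns the dialect to use, plus the error (if any) that forced a fallback.
  // defaultName is assumed to be one of the built-in short names.
  def resolve(name: String, defaultName: String): (SketchDialect, Option[Throwable]) =
    builtIn.get(name) match {
      case Some(factory) => (factory(), None)
      case None =>
        Try(instantiate(name)) match {
          case Success(d) => (d, None)
          case Failure(e) => (builtIn(defaultName)(), Some(e))
        }
    }
}
```

In this sketch, `DialectResolver.resolve("NotExistedClass", "hiveql")` yields the HiveQL stand-in together with the `ClassNotFoundException`, so the caller can report the error and keep working with the default.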

SparkQA · 2015-01-13T08:32:37Z

Test build #25465 has started for PR 4015 at commit 2fe7d99.

This patch merges cleanly.

OopsOutOfMemory · 2015-01-13T09:47:35Z

nice feature. 👍

SparkQA · 2015-01-13T10:32:37Z

Test build #25465 timed out for PR 4015 at commit 2fe7d99 after a configured wait of 120m.

AmplabJenkins · 2015-01-13T10:32:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25465/
Test FAILed.

scwf · 2015-01-13T11:37:19Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

+@AlphaComponent
+abstract class SQLDialect {
+  /**
+   * We assume the DDLParser has higher priority than any of the other SQL Parsers,


This assumption may lead to problems; see the example in #3935 (comment).

I agree with @scwf.
Since our goal is to support a variety of SQL dialects, we cannot expect them all to behave the same way, so the parser priority is a problem.
What about leaving this to each dialect, by adding an abstract method to SQLDialect that lets each dialect implement its own parsing order?
Sorry if I'm wrong.

I think the difference in describe table between Hive and Spark SQL is a known issue; we added the affected cases to the blacklist in HiveCompatibilitySuite.

Well, even if we moved the extended parser first, I don't think we want to skip the DDLParser, right? In the meantime, we would have to handle the parsing fallback (on failure, resort to the DDLParser) for EVERY extended parser. So why NOT just do the fallback in the DDLParser by running it first? That's exactly the current implementation!
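
For readers following the ordering argument, a minimal sketch of "DDL parser first, then fall back to the dialect parser" might look like the code below. Everything in it is a hypothetical stand-in: `Plan`, `DdlPlan`, `QueryPlan` and `FallbackParsing` are illustration-only names, and the toy DDL parser recognises just one statement prefix.

```scala
import scala.util.Try

// Hypothetical plan types, used only for this illustration.
sealed trait Plan
final case class DdlPlan(statement: String) extends Plan
final case class QueryPlan(dialect: String, statement: String) extends Plan

object FallbackParsing {
  type Parser = String => Plan

  // Toy DDL parser: accepts one statement prefix and fails on anything else.
  val ddlParser: Parser = sql =>
    if (sql.trim.toUpperCase.startsWith("CREATE TEMPORARY TABLE")) DdlPlan(sql)
    else throw new IllegalArgumentException(s"not a DDL statement: $sql")

  // Toy dialect parser: accepts everything and tags it with the dialect name.
  def dialectParser(name: String): Parser = sql => QueryPlan(name, sql)

  // Try the DDL parser first; on failure, hand the text to the dialect parser.
  // The fallback lives in one place instead of inside every extended parser.
  def withDdlFirst(dialect: Parser): Parser =
    sql => Try(ddlParser(sql)).getOrElse(dialect(sql))
}
```

With this shape, a custom dialect only has to supply its own `Parser`; the DDL handling is wrapped around it once.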

And I don't think the issue @scwf described is a reason to change the code here. A better solution is probably to define a unified DescribeCommand logical node that can be turned into a different execution depending on the context (HiveContext / SQLContext).

Agreed, @chenghao-intel: we can define a unified DescribeCommand for that issue. And the order of DDLParser and sqlParser is not a big point, since they cover different ranges of SQL syntax.
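
The unified DescribeCommand idea can be pictured as a single logical node that each context plans in its own way. The sketch below is an assumption-laden illustration, not code from this PR: `LogicalNode`, `ContextSketch`, `SqlContextSketch` and `HiveContextSketch` are hypothetical, the fields of `DescribeCommand` are guessed, and a `String` stands in for the physical execution.

```scala
// Hypothetical logical node shared by every parser and dialect.
sealed trait LogicalNode
final case class DescribeCommand(table: String, isExtended: Boolean) extends LogicalNode

// Each context decides how the shared node is executed.
trait ContextSketch {
  def plan(node: LogicalNode): String
}

object SqlContextSketch extends ContextSketch {
  def plan(node: LogicalNode): String = node match {
    case DescribeCommand(table, _) => s"native describe of $table"
  }
}

object HiveContextSketch extends ContextSketch {
  def plan(node: LogicalNode): String = node match {
    case DescribeCommand(table, extended) =>
      s"hive describe of $table (extended=$extended)"
  }
}
```

Under this arrangement the parsers all emit the same node, so the difference between Hive and Spark SQL output becomes a planning concern rather than a parsing one.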

SparkQA · 2015-01-14T07:17:38Z

Test build #25519 has started for PR 4015 at commit 336cd89.

This patch merges cleanly.

SparkQA · 2015-01-14T08:26:27Z

Test build #25519 has finished for PR 4015 at commit 336cd89.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class SQLDialect
- class DefaultSQLDialect extends SQLDialect
- sys.error(s"$clazz is not the subclass of $
- class HiveQLDialect extends SQLDialect

AmplabJenkins · 2015-01-14T08:26:30Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25519/
Test PASSed.

marmbrus · 2015-01-20T00:57:29Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

@@ -71,7 +178,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
  def getConf(key: String): String = conf.getConf(key)

  /**
-   * Return the value of Spark SQL configuration property for the given key. If the key is not set
+   * Return the value of Sparkf SQL configuration property for the given key. If the key is not set


SparkQA · 2015-01-20T07:27:38Z

Test build #25803 has started for PR 4015 at commit 983d53c.

This patch merges cleanly.

SparkQA · 2015-01-20T07:31:18Z

Test build #25803 has finished for PR 4015 at commit 983d53c.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-01-20T07:31:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25803/
Test FAILed.

SparkQA · 2015-01-20T07:37:48Z

Test build #25807 has started for PR 4015 at commit d958589.

This patch merges cleanly.

SparkQA · 2015-01-20T07:41:26Z

Test build #25807 has finished for PR 4015 at commit d958589.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class SQLDialect
- class DefaultSQLDialect extends SQLDialect
- class HiveQLDialect extends SQLDialect

AmplabJenkins · 2015-01-20T07:41:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25807/
Test FAILed.

SparkQA · 2015-01-20T07:47:40Z

Test build #25808 has started for PR 4015 at commit 1c6edfa.

This patch merges cleanly.

SparkQA · 2015-01-20T07:59:24Z

Test build #25808 has finished for PR 4015 at commit 1c6edfa.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class SQLDialect
- class DefaultSQLDialect extends SQLDialect
- class HiveQLDialect extends SQLDialect

AmplabJenkins · 2015-01-20T07:59:26Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25808/
Test FAILed.

SparkQA · 2015-01-21T03:02:43Z

Test build #25866 has started for PR 4015 at commit c8f154d.

This patch merges cleanly.

SparkQA · 2015-01-21T04:13:50Z

Test build #25866 has finished for PR 4015 at commit c8f154d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class SQLDialect
- class DefaultSQLDialect extends SQLDialect
- class HiveQLDialect extends SQLDialect

AmplabJenkins · 2015-01-21T04:13:53Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25866/
Test PASSed.

SparkQA · 2015-01-22T06:27:32Z

Test build #25947 has started for PR 4015 at commit b0e8084.

This patch merges cleanly.

SparkQA · 2015-01-22T07:31:29Z

Test build #25947 has finished for PR 4015 at commit b0e8084.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- protected case class Keyword(str: String)
- class SqlLexical extends StdLexical
- abstract class SQLDialect
- class DefaultSQLDialect extends SQLDialect
- class HiveQLDialect extends SQLDialect

marmbrus · 2015-04-24T19:37:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/Dialect.scala

+ * interface for advanced user.
+ *
+ */
+abstract class Dialect {


An abstract interface for adding a new SQL dialect. A `Dialect` is responsible for creating a logical plan from a string representation of a query. Since the `LogicalPlan` interface is not a public stable API, custom dialects will likely be tied to specific Spark releases.

Explicitly annotate this as `@DeveloperApi`.
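
To make "creating a logical plan from a string representation of a query" concrete, here is a small self-contained sketch of what a custom dialect might look like. It does not use Spark's types: `PlanSketch`, `Project`, `DialectSketch` and `TinySelectDialect` are hypothetical, and the regular expression accepts only the simplest `SELECT ... FROM ...` form.

```scala
// Hypothetical, heavily simplified stand-in for a logical plan.
sealed trait PlanSketch
final case class Project(columns: Seq[String], table: String) extends PlanSketch

// Hypothetical mirror of the abstract interface under review.
abstract class DialectSketch {
  def parse(sqlText: String): PlanSketch
}

// A toy dialect that understands only "SELECT cols FROM table".
class TinySelectDialect extends DialectSketch {
  private val Select = """(?i)\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*""".r

  override def parse(sqlText: String): PlanSketch = sqlText match {
    case Select(cols, table) => Project(cols.split(",").map(_.trim).toList, table)
    case _ => throw new IllegalArgumentException(s"Unsupported query: $sqlText")
  }
}
```

A real dialect would return Catalyst plan nodes instead, which is why the suggested doc comment warns that custom dialects will likely be tied to specific Spark releases.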

marmbrus · 2015-04-24T19:41:02Z

Final comments to improve user documentation. Otherwise LGTM.

chenghao-intel · 2015-04-27T01:04:47Z

test this please

chenghao-intel · 2015-04-27T02:34:48Z

retest this please

chenghao-intel · 2015-04-27T06:11:24Z

@liancheng @rxin @marmbrus can you trigger the unit test for me?

Thanks.

rxin · 2015-04-27T06:14:45Z

I think Jenkins is having some trouble right now.

rxin · 2015-04-27T06:14:52Z

Jenkins, retest this please.

chenghao-intel · 2015-04-28T00:30:14Z

Jenkins, retest this please.

SparkQA · 2015-04-28T00:32:45Z

Test build #31088 has started for PR 4015 at commit 493775c.

SparkQA · 2015-04-28T02:30:13Z

Test build #31088 has finished for PR 4015 at commit 493775c.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class Dialect
- class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)
This patch adds the following new dependencies:
- tachyon-0.6.4.jar
- tachyon-client-0.6.4.jar
This patch removes the following dependencies:
- tachyon-0.5.0.jar
- tachyon-client-0.5.0.jar

chenghao-intel · 2015-04-28T02:48:14Z

cc @marmbrus

marmbrus · 2015-05-01T01:51:01Z

Thanks, merged to master.

scwf · 2015-05-01T03:29:47Z

@marmbrus #5727 has been merged, so this PR actually fails the MiMa test; the master branch is now failing the MiMa check since you merged this PR.

Based on #4015, we should not delete `sqlParser` from SQLContext, since that makes the MiMa check fail. Users implement a dialect to provide a fallback for `sqlParser`, and we should construct `sqlParser` in SQLContext according to the dialect: `protected[sql] val sqlParser = new SparkSQLParser(getSQLDialect().parse(_))`

Author: Cheng Hao <hao.cheng@intel.com>
Author: scwf <wangfei1@huawei.com>

Closes #5827 from scwf/sqlparser1 and squashes the following commits:

81b9737 [scwf] comment fix
0878bd1 [scwf] remove comments
c19780b [scwf] fix mima tests
c2895cf [scwf] Merge branch 'master' of https://github.com/apache/spark into sqlparser1
493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing
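
One Scala detail in `getSQLDialect().parse(_)` is worth spelling out: the placeholder syntax expands to `x => getSQLDialect().parse(x)`, so the dialect is looked up again on every call rather than once at construction time. The sketch below illustrates this with hypothetical stand-ins only: `CommandParser` plays the role of the wrapping parser, `dialectName` plays the role of the `spark.sql.dialect` setting, and strings stand in for plans.

```scala
object DialectLookupSketch {
  // Hypothetical mutable setting standing in for spark.sql.dialect.
  @volatile var dialectName: String = "sql"

  trait SketchDialect {
    def parse(sqlText: String): String
  }

  // Builds a dialect from the setting as it is at the time of the call.
  def getSQLDialect(): SketchDialect = {
    val name = dialectName
    new SketchDialect {
      def parse(sqlText: String): String = s"[$name] $sqlText"
    }
  }

  // Hypothetical wrapper: handles SET itself and forwards everything else.
  class CommandParser(fallback: String => String) {
    def parse(sqlText: String): String =
      if (sqlText.trim.toUpperCase.startsWith("SET ")) s"[command] $sqlText"
      else fallback(sqlText)
  }

  // Expands to x => getSQLDialect().parse(x): the lookup happens per call.
  val sqlParser = new CommandParser(getSQLDialect().parse(_))
}
```

Because of that expansion, changing `dialectName` after `sqlParser` has been created still affects the next query, which matches the "switch dialect with SET" behaviour shown in the PR description.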

This PR aims to make the SQL parser pluggable, so that users can register their own parser via the Spark SQL CLI.

```
# add the jar into the classpath
$hchengmydesktop:spark>bin/spark-sql --jars sql99.jar

-- switch to "hiveql" dialect
spark-sql>SET spark.sql.dialect=hiveql;
spark-sql>SELECT * FROM src LIMIT 1;

-- switch to "sql" dialect
spark-sql>SET spark.sql.dialect=sql;
spark-sql>SELECT * FROM src LIMIT 1;

-- switch to a custom dialect
spark-sql>SET spark.sql.dialect=com.xxx.xxx.SQL99Dialect;
spark-sql>SELECT * FROM src LIMIT 1;

-- register a nonexistent SQL dialect
spark-sql> SET spark.sql.dialect=NotExistedClass;
spark-sql> SELECT * FROM src LIMIT 1;
-- An exception is thrown and the dialect switches back to the default ("sql" for SQLContext and "hiveql" for HiveContext)
```

Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#4015 from chenghao-intel/sqlparser and squashes the following commits:

493775c [Cheng Hao] update the code as feedback
81a731f [Cheng Hao] remove the unecessary comment
aab0b0b [Cheng Hao] polish the code a little bit
49b9d81 [Cheng Hao] shrink the comment for rebasing


scwf reviewed Jan 13, 2015

chenghao-intel force-pushed the sqlparser branch from 2fe7d99 to 336cd89 Compare January 14, 2015 07:15

marmbrus reviewed Jan 20, 2015

chenghao-intel force-pushed the sqlparser branch from 336cd89 to 4f7f626 Compare January 20, 2015 07:24

chenghao-intel mentioned this pull request Jan 22, 2015

[SPARK-5009] [SQL] Long keyword support in SQL Parsers #3926

Closed

chenghao-intel force-pushed the sqlparser branch from c8f154d to b0e8084 Compare January 22, 2015 06:26

chenghao-intel changed the title ~~[SPARK-5213] [SQL] [WIP] Sql Parser dialect support~~ [SPARK-5213] [SQL] Sql Parser dialect support Jan 22, 2015

marmbrus reviewed Apr 24, 2015

update the code as feedback

493775c

asfgit closed this in 3ba5aaa May 1, 2015

zhzhan mentioned this pull request May 1, 2015

[SPARK-6479][Block Manager]Create off-heap block storage API #5430

Closed

This was referenced May 1, 2015

[SPARK-4699] [SQL] Make caseSensitive configurable in spark sql analyzer #5806

Closed

[SPARK-5213] [SQL] Pluggable SQL Parser Support #5827

Closed

chenghao-intel deleted the sqlparser branch July 2, 2015 08:34

HyukjinKwon mentioned this pull request Mar 21, 2024

[SPARK-47482] Add HiveDialect to sql module #45609

Closed

dongjoon-hyun mentioned this pull request Mar 21, 2024

[SPARK-47482] Add HiveDialect to sql module #45644

Closed
