[SPARK-47482] Add HiveDialect to sql module #45644

xleoken · 2024-03-21T15:15:03Z

What changes were proposed in this pull request?

Add HiveDialect to sql module

Why are the changes needed?

In scenarios with multiple Hive catalogs, a ParseException is thrown.

SQL

bin/spark-sql \
  --conf "spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
  --conf "spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data" \
  --conf "spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver" \
  --conf "spark.sql.catalog.bbb=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
  --conf "spark.sql.catalog.bbb.url=jdbc:hive2://172.16.10.13:10000/data" \
  --conf "spark.sql.catalog.bbb.driver=org.apache.hive.jdbc.HiveDriver"

select count(1) from aaa.data.data_part;

Exception

24/03/19 21:58:25 INFO HiveSessionImpl: Operation log session directory is created: /tmp/root/operation_logs/f15a5434-6356-455b-aa8e-4ce9903c1b81
24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Submitting query 'SELECT * FROM "data"."data_part" WHERE 1=0' with a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Running query with a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 INFO DAGScheduler: Asked to cancel job group a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 ERROR SparkExecuteStatementOperation: Error executing query with a7459d6d-2a5c-4b56-945c-3159e58d12fd, currentState RUNNING, 
org.apache.spark.sql.catalyst.parser.ParseException: 
Syntax error at or near '"data"'(line 1, pos 14)

== SQL ==
SELECT * FROM "data"."data_part" WHERE 1=0
--------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)

Does this PR introduce any user-facing change?

no

How was this patch tested?

local test

Was this patch authored or co-authored using generative AI tooling?

no

xleoken · 2024-03-21T15:18:50Z

@xleoken I think you can implement the catalog plugin and register two custom Hive JDBC dialects.

Just FYI, SPARK-47496 makes loading a custom dialect much easier.

This is too heavy for users and there's no need for it.

As Daniel Fernandez said in https://issues.apache.org/jira/browse/SPARK-22016, only two functions need to be overridden.

https://issues.apache.org/jira/browse/SPARK-21063
https://issues.apache.org/jira/browse/SPARK-22016
https://issues.apache.org/jira/browse/SPARK-31457

dongjoon-hyun

To the reviewers: we discussed this here.

[SPARK-47482] Add HiveDialect to sql module #45609

And, old PRs,

xleoken · 2024-03-22T00:46:10Z

Hi @dongjoon-hyun @yaooqinn @HyukjinKwon, please take a serious look at this issue. The old related PRs have been inactive for a long time, so we can discuss it here.

When we hit this issue, the client reported "Table or view not found", while the server reported org.apache.spark.sql.catalyst.parser.ParseException. We spent a lot of time analyzing the issue before we solved it.

By the way, could Spark throw an exception directly stating that jdbc:hive2 is not supported? Or could the docs be updated to tell users that they need a custom dialect?
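The suggestion to reject jdbc:hive2 URLs directly could be pictured as a fail-fast check on the JDBC URL. The sketch below is purely illustrative Java: DialectGuard, requireDialect, and the set of supported prefixes are hypothetical names and do not correspond to any existing Spark API.

```java
import java.util.Locale;
import java.util.Set;

public class DialectGuard {
    // Hypothetical: URL prefixes for which a dedicated dialect is registered.
    private final Set<String> supportedPrefixes;

    public DialectGuard(Set<String> supportedPrefixes) {
        this.supportedPrefixes = supportedPrefixes;
    }

    // Throws a descriptive error up front instead of letting a
    // mis-quoted query fail later on the remote server.
    public void requireDialect(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String prefix : supportedPrefixes) {
            if (lower.startsWith(prefix)) {
                return;
            }
        }
        throw new UnsupportedOperationException(
            "No JDBC dialect registered for URL: " + url
                + "; register a custom dialect before using this catalog");
    }

    public static void main(String[] args) {
        DialectGuard guard = new DialectGuard(Set.of("jdbc:mysql", "jdbc:postgresql"));
        guard.requireDialect("jdbc:postgresql://localhost/db"); // passes silently
        try {
            guard.requireDialect("jdbc:hive2://172.16.10.12:10000/data");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind, the client would see the missing-dialect message rather than "Table or view not found".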

To summarize:

From the following exception stack trace, it takes a lot of time to work out that the root cause of this problem is JdbcDialect#quoteIdentifier.
The dialect could be provided as a third-party library or by implementing the catalog plugin, but that is too heavy for users.
As yaooqinn said, it's difficult to register a custom JDBC dialect. See [SPARK-47496][SQL] Java SPI Support for dynamic JDBC dialect registering #45626
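The quoting problem named in the list above can be illustrated without Spark. The following is a minimal, self-contained Java sketch, not Spark's code: QuotingSketch and its methods are hypothetical names. It assumes, based on the server log, that the schema probe has the form SELECT * FROM <table> WHERE 1=0 and that identifiers are wrapped in double quotes by default, which Spark SQL's parser treats as string literals unless configured otherwise. A Hive-oriented dialect would be expected to use backticks instead.

```java
import java.util.function.UnaryOperator;

public class QuotingSketch {
    // Default style seen in the server log: ANSI double quotes.
    public static String quoteWithDoubleQuotes(String name) {
        return "\"" + name + "\"";
    }

    // Backtick style that Hive and Spark SQL accept for identifiers.
    // Simplification: embedded backticks are not escaped in this sketch.
    public static String quoteWithBackticks(String name) {
        return "`" + name + "`";
    }

    // Builds the schema probe for db.table using the given quoting rule.
    public static String schemaQuery(UnaryOperator<String> quote, String db, String table) {
        return "SELECT * FROM " + quote.apply(db) + "." + quote.apply(table) + " WHERE 1=0";
    }

    public static void main(String[] args) {
        // Same text as the query rejected by the Thrift server.
        System.out.println(schemaQuery(QuotingSketch::quoteWithDoubleQuotes, "data", "data_part"));
        // Form that a backtick-quoting dialect would send instead.
        System.out.println(schemaQuery(QuotingSketch::quoteWithBackticks, "data", "data_part"));
    }
}
```

The first printed query matches the one in the server log below; the second shows what changes when only the quoting rule is swapped.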

1. Start the Thrift server

sbin/start-thriftserver.sh

2. Start spark-shell

bin/spark-shell \
--conf spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog \
--conf spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data \
--conf spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver

3. Query

select * from aaa.data.data_part limit 1

4. Client exception (Table or view not found: aaa.data.data_part)

scala> spark.sql("select * from aaa.data.data_part limit 1").show();
24/03/22 08:35:53 WARN HiveConnection: Failed to connect to 172.16.10.12:10000
org.apache.spark.sql.AnalysisException: Table or view not found: aaa.data.data_part; line 1 pos 14;
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Project [*]
      +- 'UnresolvedRelation [aaa, data, data_part], [], false

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:131)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)

5. Server exception (org.apache.spark.sql.catalyst.parser.ParseException)

24/03/22 08:45:42 INFO ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V10
24/03/22 08:45:42 INFO HiveSessionImpl: Operation log session directory is created: /tmp/root/operation_logs/4d373392-cc24-45fd-b9b7-4e27eeb48292
24/03/22 08:45:42 INFO SparkExecuteStatementOperation: Submitting query 'SELECT * FROM "data"."data_part" WHERE 1=0' with b5e0d91c-6d3f-4a79-9bd6-d78233150e56
24/03/22 08:45:42 INFO SparkExecuteStatementOperation: Running query with b5e0d91c-6d3f-4a79-9bd6-d78233150e56
24/03/22 08:45:42 INFO DAGScheduler: Asked to cancel job group b5e0d91c-6d3f-4a79-9bd6-d78233150e56
24/03/22 08:45:42 ERROR SparkExecuteStatementOperation: Error executing query with b5e0d91c-6d3f-4a79-9bd6-d78233150e56, currentState RUNNING, 
org.apache.spark.sql.catalyst.parser.ParseException: 
Syntax error at or near '"data"'(line 1, pos 14)

== SQL ==
SELECT * FROM "data"."data_part" WHERE 1=0
--------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)

charlesy6 · 2024-03-23T05:31:34Z

This patch works for me too.

xleoken · 2024-03-26T01:09:21Z

cc @dongjoon-hyun @yaooqinn @HyukjinKwon

github-actions · 2024-12-03T00:27:14Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the SQL label Mar 21, 2024

dongjoon-hyun previously requested changes Mar 21, 2024

xleoken force-pushed the patch branch 2 times, most recently from 4eda53e to bb58cab Compare March 25, 2024 10:35

xleoken force-pushed the patch branch 7 times, most recently from ebb3a3e to c3ebf30 Compare April 3, 2024 07:08

xleoken force-pushed the patch branch 2 times, most recently from 1a9a6f7 to 9fdb990 Compare April 8, 2024 01:02

kristgpt approved these changes Apr 8, 2024

xleoken force-pushed the patch branch 11 times, most recently from b20f84b to 80d5e4f Compare April 10, 2024 08:51

xleoken force-pushed the patch branch 12 times, most recently from b4ed745 to 2e9fc8b Compare May 15, 2024 00:34

xleoken force-pushed the patch branch 4 times, most recently from 52dd798 to a100cc9 Compare May 24, 2024 02:26

xleoken requested a review from kristgpt May 24, 2024 09:38

xleoken force-pushed the patch branch 2 times, most recently from 90b0e05 to 1859d90 Compare May 28, 2024 01:13

xleoken force-pushed the patch branch from 1859d90 to 855332a Compare June 3, 2024 03:00

xleoken force-pushed the patch branch 2 times, most recently from 36c61b3 to 5f2d815 Compare June 17, 2024 01:30

xleoken force-pushed the patch branch 2 times, most recently from e9e5778 to 3ac43da Compare June 26, 2024 05:19

xleoken force-pushed the patch branch from 3ac43da to 92240be Compare June 30, 2024 10:06

[SPARK-47482] Add HiveDialect to sql module

19e2452

xleoken force-pushed the patch branch from 92240be to 19e2452 Compare August 24, 2024 12:47

github-actions bot added the Stale label Dec 3, 2024

github-actions bot closed this Dec 4, 2024
