[SPARK-11783][SQL] Fixes execution Hive client when using remote Hive metastore #9895

liancheng
Contributor

When using a remote Hive metastore, `hive.metastore.uris` is set to the metastore URI. However, this setting unexpectedly overrides `javax.jdo.option.ConnectionURL`, so the execution Hive client connects to the actual remote Hive metastore instead of the Derby metastore created in the temporary directory. Clearing this configuration for the execution Hive client fixes the issue.
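To illustrate the idea, here is a minimal, dependency-free sketch. It is not the code in this patch: plain Scala `Map`s stand in for Hive configuration objects, and `ExecutionHiveConfSketch`, `executionClientConf`, and the `tempDir` parameter are hypothetical names. Only the two property keys come from the description above; the Derby URL shape is an assumption made for illustration.

```scala
// Dependency-free sketch; plain Maps stand in for Hive configuration objects.
object ExecutionHiveConfSketch {
  // Property names quoted in the PR description.
  val MetastoreUris = "hive.metastore.uris"
  val ConnectionUrl = "javax.jdo.option.ConnectionURL"

  // Hypothetical helper: builds settings intended for the execution client only.
  def executionClientConf(
      userConf: Map[String, String],
      tempDir: String): Map[String, String] = {
    userConf ++ Map(
      // Point the execution client at a throwaway embedded Derby database
      // (assumed URL shape, for illustration only).
      ConnectionUrl -> s"jdbc:derby:;databaseName=$tempDir/metastore;create=true",
      // Clear the remote URI so the Derby URL above is not overridden.
      MetastoreUris -> ""
    )
  }
}
```

Because the right-hand map wins in `++`, a user-supplied `hive.metastore.uris` such as a `thrift://` address ends up replaced by the empty string in the resulting settings, while unrelated user settings pass through unchanged.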

SparkQA commented Nov 22, 2015

Test build #46491 has finished for PR 9895 at commit 560ca28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Derby metastore created in the temporary directory. Cleaning this configuration for
// the execution Hive client fixes this issue.
timeConfVars ++ Seq(
ConfVars.METASTOREURIS.varname -> ""
Contributor

This configure() is only used by metadataHive, so I'm wondering how this patch could fix the problem.

Contributor Author

Sorry, I made a stupid mistake here. It should be added in HiveContext.newTemporaryConfigurations(). Unfortunately, I couldn't figure out a way to add a regression test for this change, since testing it requires a remote Hive metastore. I verified the result by remote debugging and checking the underlying Hive metastore client instance of executionHive, and somehow came to the wrong conclusion :(

@liancheng liancheng force-pushed the spark-11783.clean-remote-metastore-config branch from 560ca28 to 1d9b85a Compare November 24, 2015 11:27
@liancheng
Contributor Author

This has been broken ever since we introduced the isolated Hive client in 1.4. My theory about why people seldom noticed it:

  1. Commands executed by the execution Hive client are mostly transient and don't touch data stored in the real metastore, so logically it doesn't matter which Hive client executes them.
  2. Even if the remote Hive metastore runs a version lower than that of Spark SQL's execution Hive client, it still works as long as the Thrift protocols used by the commands involved are backwards compatible.
  3. Although we've already upgraded to Hive 1.2.1, we haven't yet implemented many advanced features that exist only in newer Hive versions, so most commands issued by the execution Hive client are indeed backwards compatible with lower versions.

Unfortunately, the only reliable way I found to verify this change is to inspect the internal HiveMetaStoreClient instance of the execution Hive client via remote debugging, because verification requires a remote Hive metastore. For example, we can start the Thrift server using:

$SPARK_HOME/sbin/start-thriftserver.sh\
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,address=localhost:5005,suspend=y"

Then attach the debugger to the endpoint localhost:5005. (Remote debugging facilities in IntelliJ IDEA can be quite neat here.)

Also, please refer to the JIRA ticket for more information about how to reproduce this issue locally.

SparkQA commented Nov 24, 2015

Test build #46601 has finished for PR 9895 at commit 1d9b85a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yhuai commented Nov 24, 2015

I tested it with a remote metastore setup. It does fix the issue.

yhuai commented Nov 24, 2015

https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin does mention that "If hive.metastore.uris is empty local mode is assumed, remote otherwise." Also, https://github.com/apache/hive/blob/release-1.2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2694-L2699 shows that if the metastore URI is set, the mode will not be local.
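The rule quoted above can be sketched in a few lines of Scala. This is only an illustration of the documented behaviour, not Hive's actual code; `MetastoreModeSketch` and `isLocalMetastore` are hypothetical names, and treating a whitespace-only value as empty is an assumption.

```scala
object MetastoreModeSketch {
  // Hypothetical helper mirroring the documented rule: a missing or blank
  // hive.metastore.uris value means local (embedded) mode, anything else
  // means remote mode. Whitespace-only values are assumed to count as blank.
  def isLocalMetastore(uris: Option[String]): Boolean =
    uris.forall(_.trim.isEmpty)
}
```

Under this rule, setting the URI to an empty string for the execution client puts it back in local mode, which is why the Derby connection URL is honoured again.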

yhuai commented Nov 24, 2015

I am going to merge it to master and branch 1.6. I will modify the comment to point to Hive's code and docs.

asfgit pushed a commit that referenced this pull request Nov 24, 2015
… metastore

When using remote Hive metastore, `hive.metastore.uris` is set to the metastore URI.  However, it overrides `javax.jdo.option.ConnectionURL` unexpectedly, thus the execution Hive client connects to the actual remote Hive metastore instead of the Derby metastore created in the temporary directory.  Cleaning this configuration for the execution Hive client fixes this issue.

Author: Cheng Lian <lian@databricks.com>

Closes #9895 from liancheng/spark-11783.clean-remote-metastore-config.

(cherry picked from commit c7f95df)
Signed-off-by: Yin Huai <yhuai@databricks.com>

yhuai commented Nov 24, 2015

Merged. @liancheng You can find my updated comments in c7f95df.

@asfgit asfgit closed this in c7f95df Nov 24, 2015
@liancheng liancheng deleted the spark-11783.clean-remote-metastore-config branch November 25, 2015 00:54
@liancheng
Contributor Author

@yhuai Cool, thanks for the update.

djh4230 commented Dec 27, 2016

@liancheng I use Spark 1.6.3 and Hive 1.2.1, but I still encountered the same error. Was it resolved in Spark 1.6.3? Below is my hive-site.xml:

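    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://spark05:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
      </property>
      <property>
        <name>hive.metastore.connect.retries</name>
        <value>3</value>
        <description>Number of retries while opening a connection to metastore</description>
      </property>
      <property>
        <name>hive.metastore.failure.retries</name>
        <value>3</value>
        <description>Number of retries upon failure of Thrift metastore calls</description>
      </property>
    </configuration>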
