[SPARK-11783][SQL] Fixes execution Hive client when using remote Hive metastore #9895
Test build #46491 has finished for PR 9895 at commit
// Derby metastore created in the temporary directory. Cleaning this configuration for
// the execution Hive client fixes this issue.
timeConfVars ++ Seq(
  ConfVars.METASTOREURIS.varname -> ""
This configure() is only used by metadataHive, so I'm wondering how this patch could fix the problem?
Sorry, I made a stupid mistake here... It should be added in HiveContext.newTemporaryConfigurations()
Unfortunately, I couldn't figure out a way to add a regression test for this change, since testing it requires a remote Hive metastore. I verified the result by remote debugging and checking the underlying Hive metastore client instance of executionHive, and somehow came to the wrong conclusion :(
560ca28 to 1d9b85a
This has been broken for quite a while ever since we introduced the isolated Hive client in 1.4. My theory about the reason why people seldom noticed it is that:
Unfortunately, the only reliable way I found to verify this change is to inspect the internal Hive metastore client instance of executionHive through remote debugging. Start the Thrift server with:

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,address=localhost:5005,suspend=y"

Then attach the debugger to the endpoint localhost:5005. (The remote debugging facilities in IntelliJ IDEA can be quite neat here.) Also, please refer to the JIRA ticket for more information about how to reproduce this issue locally.
Test build #46601 has finished for PR 9895 at commit
I tested it with a remote metastore setup. It does fix the issue.
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin does mention that
I am going to merge it to master and branch-1.6. I will modify the comment to point to Hive's code and docs.
… metastore

When using remote Hive metastore, `hive.metastore.uris` is set to the metastore URI. However, it overrides `javax.jdo.option.ConnectionURL` unexpectedly, thus the execution Hive client connects to the actual remote Hive metastore instead of the Derby metastore created in the temporary directory. Cleaning this configuration for the execution Hive client fixes this issue.

Author: Cheng Lian <lian@databricks.com>

Closes #9895 from liancheng/spark-11783.clean-remote-metastore-config.

(cherry picked from commit c7f95df)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Merged. @liancheng You can find my updated comments in c7f95df.
@yhuai Cool, thanks for the update.
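@liancheng I use Spark 1.6.3 and Hive 1.2.1, but I still encountered the same error. Was it resolved in Spark 1.6.3? Below is my hive-site.xml:

    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
      <description>location of default database for the warehouse</description>
    </property>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://spark05:9083</value>
      <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
      <name>hive.metastore.connect.retries</name>
      <value>3</value>
      <description>Number of retries while opening a connection to metastore</description>
    </property>
    <property>
      <name>hive.metastore.failure.retries</name>
      <value>3</value>
      <description>Number of retries upon failure of Thrift metastore calls</description>
    </property>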
When using remote Hive metastore, `hive.metastore.uris` is set to the metastore URI. However, it overrides `javax.jdo.option.ConnectionURL` unexpectedly, thus the execution Hive client connects to the actual remote Hive metastore instead of the Derby metastore created in the temporary directory. Cleaning this configuration for the execution Hive client fixes this issue.
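
The override described above can be illustrated with a small, self-contained Scala sketch. This is a simplified model and not Spark's or Hive's actual code: plain `Map`s stand in for `HiveConf`, the helper names (`usesEmbeddedMetastore`, `effectiveMetastore`, `temporaryConf`) are hypothetical, and the rule that a blank `hive.metastore.uris` means "use the embedded metastore" is stated here as an assumption about the client's behavior. The Thrift URI is the one quoted in the hive-site.xml earlier in this thread; the Derby URL and temporary path are made up for illustration.

```scala
// Simplified model only: Maps stand in for HiveConf; all helper names are hypothetical.
object ExecutionClientConfSketch {
  val MetastoreUris = "hive.metastore.uris"
  val ConnectionUrl = "javax.jdo.option.ConnectionURL"

  // Assumption: a missing or blank URI list means "embedded metastore";
  // only in that case is the JDO connection URL consulted.
  def usesEmbeddedMetastore(conf: Map[String, String]): Boolean =
    conf.get(MetastoreUris).forall(_.trim.isEmpty)

  // Which metastore the modeled client would actually talk to.
  def effectiveMetastore(conf: Map[String, String]): String =
    if (usesEmbeddedMetastore(conf)) conf.getOrElse(ConnectionUrl, "<unset>")
    else conf(MetastoreUris)

  // Hypothetical stand-in for the temporary configuration of the execution client:
  // point at a Derby database in a temp directory and blank out the remote URI.
  def temporaryConf(userConf: Map[String, String], tempDir: String): Map[String, String] =
    userConf ++ Map(
      ConnectionUrl -> s"jdbc:derby:;databaseName=$tempDir/metastore;create=true",
      MetastoreUris -> "" // the fix: clear the remote metastore URI
    )

  def main(args: Array[String]): Unit = {
    val userConf = Map(MetastoreUris -> "thrift://spark05:9083")

    // Without the fix: the Derby URL is set, but the remote URI still wins.
    val withoutFix = userConf +
      (ConnectionUrl -> "jdbc:derby:;databaseName=/tmp/spark-exec/metastore;create=true")
    println(effectiveMetastore(withoutFix))

    // With the fix: the blank URI lets the Derby URL take effect.
    println(effectiveMetastore(temporaryConf(userConf, "/tmp/spark-exec")))
  }
}
```

In this model, setting the Derby connection URL alone is not enough, because the non-empty remote URI takes precedence; only once the URI is blanked does the temporary Derby metastore get used.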