[SPARK-11191] [SQL] Looks up temporary function using execution Hive client #9664
Test build #45743 has finished for PR 9664 at commit
Opened PR #9671 for branch-1.5.
This LGTM, but what about functions that are added to the metastore? Should we also check the metadata Hive client next? Either way, I think we can merge this now to restore the old behavior.
…lient When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there. Author: Cheng Lian <lian@databricks.com> Closes #9664 from liancheng/spark-11191.fix-temp-function. (cherry picked from commit 4fe99c7) Signed-off-by: Michael Armbrust <michael@databricks.com>
@marmbrus Hive UDFs stored in the metastore are a harder problem. For those functions, we need to talk to `metadataHive`. But when we do the function lookup, we do not know which metastore to search. Maybe we can try `executionHive` first and, if that fails, try `metadataHive`.
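The fallback order suggested in this comment can be sketched as follows. This is only an illustration, not Spark's implementation: `FakeHiveClient`, `FallbackLookup`, and the function names in the usage note are hypothetical stand-ins, and the real lookup would go through the Hive clients rather than an in-memory map.

```scala
// Hypothetical stand-in for a Hive client (assumption: not Spark's real client class).
// It maps lower-cased function names to implementing class names.
final case class FakeHiveClient(label: String, functions: Map[String, String]) {
  def lookupFunction(name: String): Option[String] =
    functions.get(name.toLowerCase)
}

object FallbackLookup {
  // Try the execution client first (where temporary functions are registered),
  // and consult the metadata client only if the first lookup finds nothing.
  def lookup(
      name: String,
      executionHive: FakeHiveClient,
      metadataHive: FakeHiveClient): Option[String] =
    executionHive.lookupFunction(name).orElse(metadataHive.lookupFunction(name))
}
```

With an execution client that knows `temp_udf` and a metadata client that knows `perm_udf`, `FallbackLookup.lookup` would resolve both names and return `None` for anything else. One consequence of this ordering is that a temporary function shadows a metastore function of the same name.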
+1
@@ -454,7 +454,7 @@ class HiveContext private[hive](
   // Note that HiveUDFs will be overridden by functions registered in this context.
   @transient
   override protected[sql] lazy val functionRegistry: FunctionRegistry =
-    new HiveFunctionRegistry(FunctionRegistry.builtin.copy()) {
+    new HiveFunctionRegistry(FunctionRegistry.builtin.copy(), this) {
@liancheng ah, I just noticed that we override `lookupFunction` here and wrap `super.lookupFunction(name, children)` in `executionHive.withHiveState`. Does that already resolve the problem?
Thanks for pointing this out. At first I didn't notice this part either. From reading the code, I'd have assumed that this already fixes the issue, but it doesn't.
After some investigation, I'm quite puzzled by the behavior here. Without this PR, we can add a jar, create a UDTF from the jar, and apply this UDTF in SQL queries successfully. However, `DESCRIBE FUNCTION` still returns "Function: is not found". I tried single-step debugging `DescribeFunction` and noticed that the `sqlContext.functionRegistry.lookupFunction` call goes directly to `HiveFunctionRegistry.lookupFunction` without calling the overridden version defined in this anonymous class. I probably missed something important here.
Anyway, now we can remove this anonymous class.
Can we have a PR to remove this anonymous class from master?
Opened #9737 to clean this up.
…lient When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there. Author: Cheng Lian <lian@databricks.com> Closes apache#9664 from liancheng/spark-11191.fix-temp-function.
The main purpose of this PR is to backport #9664, which depends on #9277. Author: Cheng Lian <lian@databricks.com> Closes #9671 from liancheng/spark-11191.fix-temp-function.branch-1_5.
…ctionRegistry According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now. Author: Cheng Lian <lian@databricks.com> Closes #9737 from liancheng/spark-11191.follow-up.
…ctionRegistry According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now. Author: Cheng Lian <lian@databricks.com> Closes #9737 from liancheng/spark-11191.follow-up. (cherry picked from commit fa13301) Signed-off-by: Cheng Lian <lian@databricks.com>
When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there.
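To illustrate why the choice of session matters, here is a minimal sketch that assumes the current session is tracked per thread and that temporary functions live in that session. `FakeSessionState`, `withState`, and `TempFunctionLookup` are hypothetical names made up for this example; they are not Hive's or Spark's actual classes.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-session state object that owns temporary functions.
final class FakeSessionState(val label: String) {
  val tempFunctions: mutable.Map[String, String] = mutable.Map.empty
}

object FakeSessionState {
  // Assumption: the "current" session is stored in a thread-local slot.
  private val current = new ThreadLocal[FakeSessionState]

  def get: Option[FakeSessionState] = Option(current.get())

  // Run `body` with `state` installed as the current session,
  // restoring whatever was installed before once `body` completes.
  def withState[A](state: FakeSessionState)(body: => A): A = {
    val previous = current.get()
    current.set(state)
    try body finally current.set(previous)
  }
}

object TempFunctionLookup {
  // Resolves a name against whichever session is current on this thread.
  def lookup(name: String): Option[String] =
    FakeSessionState.get.flatMap(_.tempFunctions.get(name))
}
```

In this model, a function registered in the "execution" session is found only when the lookup runs inside `withState(executionSession)`. The same lookup under a different session, or with no session installed, finds nothing.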