Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11191] [SQL] Looks up temporary function using execution Hive client #9664

Closed

Conversation

liancheng
Copy link
Contributor

When looking up Hive temporary functions, we should always use the SessionState within the execution Hive client, since temporary functions are registered there.

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45743 has finished for PR 9664 at commit 8601fce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

Opened PR #9671 for branch-1.5.

@marmbrus
Copy link
Contributor

This LGTM, but what about functions that are added to the metastore? Should we also check the metadata hive next?

Eitherway, I think we can merge this now to restore old behavior.

asfgit pushed a commit that referenced this pull request Nov 12, 2015
…lient

When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there.

Author: Cheng Lian <lian@databricks.com>

Closes #9664 from liancheng/spark-11191.fix-temp-function.

(cherry picked from commit 4fe99c7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in 4fe99c7 Nov 12, 2015
@yhuai
Copy link
Contributor

yhuai commented Nov 12, 2015

@marmbrus Hive UDFs stored in metastore is a harder problem. For those functions, we need to talk to metadataHive. But, when we do function lookup, we do not know which metastore to search. Maybe we can first try executionHive and if it failed, we try metadataHive.

@marmbrus
Copy link
Contributor

+1

@@ -454,7 +454,7 @@ class HiveContext private[hive](
// Note that HiveUDFs will be overridden by functions registered in this context.
@transient
override protected[sql] lazy val functionRegistry: FunctionRegistry =
new HiveFunctionRegistry(FunctionRegistry.builtin.copy()) {
new HiveFunctionRegistry(FunctionRegistry.builtin.copy(), this) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liancheng ah, I just noticed that we override lookupFunction at here and wrap the super.lookupFunction(name, children) in executionHive.withHiveState. Does that already resolve the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. At first I didn't notice this part either. Just reading the code, I'd assume that this already fixes the issue. But it wasn't the case.

After some investigation, I'm quite puzzled by the behavior here. Without this PR, we can add a jar, create a UDTF from the jar, and apply this UDTF in SQL queries successfully. However, DESCRIBE FUNCTION still returns "Function: is not found". I tried single-step debugging DescribeFunction and noticed that the sqlContext.functionRegistry.lookupFunction call goes directly to HiveFunctionRegistry.lookupFunction without calling the overriden version defined in this anonymous class. I probably missed something important here.

Anyway, now we can remove this anonymous class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a PR to remove this anonymous class from master?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #9737 to clean this up.

dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
…lient

When looking up Hive temporary functions, we should always use the `SessionState` within the execution Hive client, since temporary functions are registered there.

Author: Cheng Lian <lian@databricks.com>

Closes apache#9664 from liancheng/spark-11191.fix-temp-function.
asfgit pushed a commit that referenced this pull request Nov 15, 2015
The main purpose of this PR is to backport #9664, which depends on #9277.

Author: Cheng Lian <lian@databricks.com>

Closes #9671 from liancheng/spark-11191.fix-temp-function.branch-1_5.
@liancheng liancheng deleted the spark-11191.fix-temp-function branch November 16, 2015 13:48
asfgit pushed a commit that referenced this pull request Nov 17, 2015
…ctionRegistry

According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now.

Author: Cheng Lian <lian@databricks.com>

Closes #9737 from liancheng/spark-11191.follow-up.
asfgit pushed a commit that referenced this pull request Nov 17, 2015
…ctionRegistry

According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now.

Author: Cheng Lian <lian@databricks.com>

Closes #9737 from liancheng/spark-11191.follow-up.

(cherry picked from commit fa13301)
Signed-off-by: Cheng Lian <lian@databricks.com>
asfgit pushed a commit that referenced this pull request Nov 17, 2015
…ctionRegistry

According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now.

Author: Cheng Lian <lian@databricks.com>

Closes #9737 from liancheng/spark-11191.follow-up.

(cherry picked from commit fa13301)
Signed-off-by: Cheng Lian <lian@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants