
[SPARK-21499] [SQL] Support creating persistent function for Spark UDAF(UserDefinedAggregateFunction) #18700

Closed
wants to merge 13 commits

Conversation

@gatorsmile (Member) commented Jul 21, 2017

What changes were proposed in this pull request?

This PR enables users to create a persistent Scala UDAF (one that extends UserDefinedAggregateFunction):

CREATE FUNCTION myDoubleAvg AS 'test.org.apache.spark.sql.MyDoubleAvg'

Before this PR, a Spark UDAF could only be registered through the API spark.udf.register(...).
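
For context, the end-to-end flow this enables looks roughly like the following. This is a minimal sketch; the class MyLongSum, its package, and the table/column names are illustrative and are not part of this PR (the PR's tests use test.org.apache.spark.sql.MyDoubleAvg).

package com.example

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// A simple UDAF that sums LONG values.
class MyLongSum extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("value", LongType)
  override def bufferSchema: StructType = new StructType().add("sum", LongType)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0L }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getLong(0) + input.getLong(0)
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}

// Before this PR: session-scoped registration only.
//   spark.udf.register("myLongSum", new MyLongSum)
// With this PR: persistent registration through SQL, assuming the class is on the classpath.
//   CREATE FUNCTION myLongSum AS 'com.example.MyLongSum'
//   SELECT myLongSum(value) FROM t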

How was this patch tested?

Added test cases

@SparkQA commented Jul 21, 2017

Test build #79829 has finished for PR 18700 at commit 4028155.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new AnalysisException(s"Can not load class '$className' when registering " +

@gatorsmile (Member, Author)

retest this please

@SparkQA commented Aug 11, 2017

Test build #80553 has finished for PR 18700 at commit 4028155.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new AnalysisException(s"Can not load class '$className' when registering " +

@SparkQA commented Aug 13, 2017

Test build #80575 has finished for PR 18700 at commit a65607c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new AnalysisException(s"Can not load class '$className' when registering " +

@SparkQA commented Aug 13, 2017

Test build #80577 has finished for PR 18700 at commit 12cefc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member, Author)

cc @cloud-fan @ueshin

(children: Seq[Expression]) => {
try {
val clsForUDAF =
Utils.classForName("org.apache.spark.sql.expressions.UserDefinedAggregateFunction")
Contributor

Shall we move the UDAF interface to Catalyst?

Member Author

/**
 * The base class for implementing user-defined aggregate functions (UDAF).
 *
 * @since 1.5.0
 */
@InterfaceStability.Stable
abstract class UserDefinedAggregateFunction

This interface has been marked as stable. Can we still move it, or should we make a trait in Catalyst?
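
For concreteness, the "trait in Catalyst" idea could look roughly like this. This is purely a hypothetical sketch with made-up names, not code from this PR; the point is that catalyst could reference a marker trait while the stable public class in sql/core keeps its current signature.

// Hypothetical sketch only; the trait name is invented.
package org.apache.spark.sql.catalyst.expressions {
  // Marker trait that SessionCatalog (in catalyst) could check directly,
  // avoiding the reflective Utils.classForName lookup shown above.
  trait UserDefinedAggregateLike
}

package org.apache.spark.sql.expressions {
  // The existing stable class would simply mix the trait in; its public API
  // and binary signature stay unchanged.
  abstract class UserDefinedAggregateFunction
    extends org.apache.spark.sql.catalyst.expressions.UserDefinedAggregateLike {
    // ... existing abstract methods (inputSchema, update, merge, evaluate, ...)
  }
}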

@@ -1096,8 +1099,42 @@ class SessionCatalog(
* This performs reflection to decide what type of [[Expression]] to return in the builder.
*/
protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
@cloud-fan (Contributor), Aug 14, 2017

This will be overridden by HiveSessionCatalog. Does that mean we cannot register a Spark UDAF if Hive support is enabled?

Member Author

The changes here are for HiveSessionCatalog. Also, we have a test case in HiveUDAFSuite.scala to verify it.

@SparkQA commented Aug 21, 2017

Test build #80924 has finished for PR 18700 at commit bd5ae26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// When we instantiate hive UDF wrapper class, we may throw exception if the input
// expressions don't satisfy the hive UDF, such as type mismatch, input number
// mismatch, etc. Here we catch the exception and throw AnalysisException instead.
override def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
Contributor

do we still need to overwrite this?

Member Author

This is for issuing different exceptions (see the sketch below).
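
For context, the exception-translation pattern being discussed is roughly the following. This is a simplified sketch, not the exact code in the PR; the real builder lives in SessionCatalog/HiveSessionCatalog, and the actual error messages differ.

package org.apache.spark.sql.catalyst.catalog

import scala.util.control.NonFatal

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.Expression

object FunctionBuilderSketch {
  // Wraps expression construction so that failures (type mismatch, wrong
  // number of arguments, ...) surface as a user-facing AnalysisException.
  // HiveSessionCatalog overrides this spot precisely so it can issue a
  // different, Hive-specific message.
  def builder(
      name: String,
      clazz: Class[_],
      makeExpr: (String, Class[_], Seq[Expression]) => Expression): Seq[Expression] => Expression = {
    (children: Seq[Expression]) =>
      try {
        makeExpr(name, clazz, children)
      } catch {
        case NonFatal(e) =>
          throw new AnalysisException(
            s"Invalid arguments for function $name (${clazz.getCanonicalName}): ${e.getMessage}")
      }
  }
}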

@SparkQA commented Aug 21, 2017

Test build #80933 has finished for PR 18700 at commit 7251be9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val clazz = Utils.classForName(functionClassName)
(children: Seq[Expression]) => {
try {
makeFunctionExpression(name, Utils.classForName(functionClassName), children).getOrElse {
Contributor

Utils.classForName(functionClassName) -> clazz (reuse the clazz loaded above instead of loading the class again)

protected def makeFunctionBuilder(name: String, functionClassName: String): FunctionBuilder = {
// TODO: at least support UDAFs here
throw new UnsupportedOperationException("Use sqlContext.udf.register(...) instead.")
protected def makeFunctionExpression(
Contributor

It seems we need to catch exceptions for this method anyway. How about making it return an Expression and documenting that it can throw an exception if the given class is not supported? Then HiveSessionCatalog can define its own exception message.
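
A rough sketch of the proposed contract (the signature and doc wording are approximate, not the code that was eventually merged):

import org.apache.spark.sql.catalyst.expressions.Expression

trait FunctionExpressionMaker {
  /**
   * Builds the function Expression for `clazz` applied to `input`.
   * Documented to throw (e.g. an AnalysisException) when `clazz` is not a
   * supported function class, so HiveSessionCatalog can override it and
   * supply its own error message for Hive UDF/UDAF classes.
   */
  protected def makeFunctionExpression(
      name: String,
      clazz: Class[_],
      input: Seq[Expression]): Expression
}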

@SparkQA commented Aug 22, 2017

Test build #80974 has finished for PR 18700 at commit d3fbdc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* Checks whether the Hive metastore is being used
*/
private def isUsingHiveMetastore: Boolean = {
Contributor

do we still need this?

}

/**
* Constructs a [[FunctionBuilder]] based on the provided class that represents a function.
Contributor

This method returns an Expression, not a FunctionBuilder; the doc comment should be updated.

@cloud-fan (Contributor)

LGTM

@SparkQA commented Aug 22, 2017

Test build #80992 has finished for PR 18700 at commit 57607b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 22, 2017

Test build #80995 has finished for PR 18700 at commit 05e8168.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 22, 2017

Test build #80997 has finished for PR 18700 at commit 50224a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 22, 2017

Test build #80996 has finished for PR 18700 at commit aff8f9e.

  • This patch fails SparkR unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile (Member, Author)

The latest build #80997 passed.

@gatorsmile (Member, Author)

Thanks! Merging to master.

@asfgit closed this in 43d71d9 on Aug 22, 2017