[SPARK-34388][SQL] Propagate the registered UDF name to ScalaUDF, ScalaUDAF and ScalaAggregator #31500
Test build #134963 has finished for PR 31500 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #134977 has finished for PR 31500 at commit
@cloud-fan this is from #31273 (comment)
@@ -1088,4 +1088,6 @@ trait ComplexTypeMergingExpression extends Expression {
  * Common base trait for user-defined functions, including UDF/UDAF/UDTF of different languages
  * and Hive function wrappers.
  */
-trait UserDefinedExpression
+trait UserDefinedExpression {
+  def name: String
Maybe default to using the class name or something?
It's an internal trait, seems OK to require it.
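To illustrate what requiring `name` on the trait makes possible, here is a minimal, self-contained sketch. It does not use Spark's real classes: `Expr`, `UserDefinedExpr`, `ScalaUdfLike`, `HiveUdfLike`, `Column`, `Cast` and `UdfNames` are hypothetical stand-ins invented for this example. The point is only that once every user-defined expression exposes a name, callers can match on the trait alone instead of on each concrete UDF class.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's classes.
sealed trait Expr { def children: Seq[Expr] }

// Mirrors the shape of the change: every user-defined expression must expose a name.
trait UserDefinedExpr extends Expr { def name: String }

final case class Column(id: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Cast(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

// Two hypothetical UDF flavours; the case class field implements the abstract `name`.
final case class ScalaUdfLike(name: String, children: Seq[Expr]) extends UserDefinedExpr
final case class HiveUdfLike(name: String, children: Seq[Expr]) extends UserDefinedExpr

object UdfNames {
  // Walk the tree top-down and collect names by matching on the trait only,
  // without listing each concrete UDF class.
  def collect(e: Expr): Seq[String] = {
    val own = e match {
      case u: UserDefinedExpr => Seq(u.name)
      case _                  => Nil
    }
    own ++ e.children.flatMap(collect)
  }
}
```

In this sketch, a caller that needs to resolve functions could feed the collected names into whatever lookup it has; a default such as the class name would compile just as well, but it would not necessarily be the name the function was registered under.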
Thanks, merging to master!
…laUDAF and ScalaAggregator

### What changes were proposed in this pull request?

This PR proposes to propagate the name used for registering UDFs to `ScalaUDF`, `ScalaUDAF` and `ScalaAggregator`.

Note that `PythonUDF` already gets the name correctly: https://github.com/apache/spark/blob/466c045bfac20b6ce19f5a3732e76a5de4eb4e4a/python/pyspark/sql/udf.py#L358-L359 , and the same holds for Hive UDFs: https://github.com/apache/spark/blob/466c045bfac20b6ce19f5a3732e76a5de4eb4e4a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala#L67

### Why are the changes needed?

This PR can help in the following scenarios:
1) Better EXPLAIN output
2) By adding `def name: String` to `UserDefinedExpression`, we can match an expression by `UserDefinedExpression` and look up the catalog, a use case needed for apache#31273.

### Does this PR introduce _any_ user-facing change?

The EXPLAIN output involving UDFs will change to use the name used for UDF registration.

For example, for the following:
```
sql("CREATE TEMPORARY FUNCTION test_udf AS 'org.apache.spark.examples.sql.Spark33084'")
sql("SELECT test_udf(col1) FROM VALUES (1), (2), (3)").explain(true)
```
The output of the optimized plan will change from:
```
Aggregate [spark33084(cast(col1#223 as bigint), org.apache.spark.examples.sql.Spark330846906be0f, 1, 1) AS spark33084(col1)#237]
+- LocalRelation [col1#223]
```
to
```
Aggregate [test_udf(cast(col1#223 as bigint), org.apache.spark.examples.sql.Spark330847a62d697, 1, 1, Some(test_udf)) AS test_udf(col1)#237]
+- LocalRelation [col1#223]
```

### How was this patch tested?

Added new tests.

Closes apache#31500 from imback82/udaf_name.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
It seems this PR has already been merged, so I'll close this.