[SPARK-8478][SQL] Harmonize UDF-related code to use uniformly UDF instead of Udf #6920
Conversation
Test build #35364 has finished for PR 6920 at commit

pinging @marmbrus

Test build #35589 has finished for PR 6920 at commit

Jenkins, retest this please.

Test build #35606 has finished for PR 6920 at commit

Jenkins, retest this please (#6974)

Test build #35663 has finished for PR 6920 at commit

Test build #36009 has finished for PR 6920 at commit

Jenkins, retest this please.

Test build #36017 has finished for PR 6920 at commit

Thanks, merged to master.
…sure `ObjectHashAggregateExecBenchmark` can run successfully on GitHub Actions

### What changes were proposed in this pull request?

This PR removes `originalUDFs` from `TestHive` to ensure `ObjectHashAggregateExecBenchmark` can run successfully on GitHub Actions.

### Why are the changes needed?

After SPARK-43225, `org.codehaus.jackson:jackson-mapper-asl` became a test-scope dependency, so when GitHub Actions runs a benchmark it is not on the classpath, because the workflow uses https://github.com/apache/spark/blob/d61c77cac17029ee27319e6b766b48d314a4dd31/.github/workflows/benchmark.yml#L179-L183 instead of the sbt `Test/runMain`. `ObjectHashAggregateExecBenchmark` uses `TestHive`, and before this PR `TestHive` always called `org.apache.hadoop.hive.ql.exec.FunctionRegistry#getFunctionNames` to initialize `originalUDFs`, so running `ObjectHashAggregateExecBenchmark` on GitHub Actions failed with the following exception:

```
Error: Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
	at org.apache.hadoop.hive.ql.udf.UDFJson.<clinit>(UDFJson.java:64)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
	at org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
	at org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
	at org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:322)
	at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:530)
	at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:185)
	at org.apache.spark.sql.hive.test.TestHiveContext.<init>(TestHive.scala:133)
	at org.apache.spark.sql.hive.test.TestHive$.<init>(TestHive.scala:54)
	at org.apache.spark.sql.hive.test.TestHive$.<clinit>(TestHive.scala:53)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.getSparkSession(ObjectHashAggregateExecBenchmark.scala:47)
	at org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark.$init$(SqlBasedBenchmark.scala:35)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark$.<clinit>(ObjectHashAggregateExecBenchmark.scala:45)
	at org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark.main(ObjectHashAggregateExecBenchmark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.benchmark.Benchmarks$.$anonfun$main$7(Benchmarks.scala:128)
	at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1328)
	at org.apache.spark.benchmark.Benchmarks$.main(Benchmarks.scala:91)
	at org.apache.spark.benchmark.Benchmarks.main(Benchmarks.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1025)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1116)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1125)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.type.TypeFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 40 more
```

`originalUDFs` is now an unused val in `TestHive` (SPARK-1251 | #6920 introduced it, and it became unused after SPARK-20667 | #17908), so this PR removes it from `TestHive` to avoid calling `FunctionRegistry#getFunctionNames`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Pass GitHub Actions
- Run `ObjectHashAggregateExecBenchmark` on GitHub Actions:

**Before** https://github.com/LuciferYang/spark/actions/runs/5128228630/jobs/9224706982 <img width="1181" alt="image" src="https://github.com/apache/spark/assets/1475305/02a58e3c-2dad-4ad4-85e4-f8576a5aabed">

**After** https://github.com/LuciferYang/spark/actions/runs/5128227211/jobs/9224704507 <img width="1282" alt="image" src="https://github.com/apache/spark/assets/1475305/27c70ec6-e55d-4a19-a6c3-e892789b97f7">

`ObjectHashAggregateExecBenchmark` runs successfully.

Closes #41369 from LuciferYang/hive-udf.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
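The failure mode described above, where an unused field's initializer still forces class loading, can be sketched in plain Java. The class names below are hypothetical stand-ins for `TestHiveSparkSession` and Hive's `FunctionRegistry`, not the real Spark/Hive code:

```java
import java.util.Set;

// Hypothetical stand-in for Hive's FunctionRegistry: the first reference to
// this class runs its static initializer. In the real case that initializer
// registers UDF classes, which throws NoClassDefFoundError when a dependency
// such as jackson-mapper-asl is missing from the classpath.
class RegistrySketch {
    static boolean classInitialized = false;

    static {
        classInitialized = true;
    }

    static Set<String> getFunctionNames() {
        return Set.of("get_json_object", "json_tuple");
    }
}

// Hypothetical stand-in for TestHiveSparkSession: the field initializer runs
// during construction even though nothing ever reads originalUDFs, so merely
// constructing the session forces RegistrySketch's class initialization.
public class TestHiveSketch {
    final Set<String> originalUDFs = RegistrySketch.getFunctionNames();

    public static void main(String[] args) {
        new TestHiveSketch();
        // prints "registry initialized: true"
        System.out.println("registry initialized: " + RegistrySketch.classInitialized);
    }
}
```

Deleting the unused field removes the only reference to the registry class, so its (potentially failing) static initializer never runs; that is the essence of dropping `originalUDFs` from `TestHive`.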
(cherry picked from commit 3472619)
Follow-up of #6902 for consistency between `Udf` and `UDF`.