-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27893][SQL][PYTHON][FOLLOW-UP] Allow Scalar Pandas and Python UDFs can be tested with Scala test base #24945
Conversation
sys.props("spark.test.home") | ||
assert(sys.props.contains("spark.test.home") || | ||
sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.") | ||
sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for IDE case. spark.test.home
can be missing if we run the tests in IDE without any other settings. In that case, it falls back to SPARK_HOME
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a comment for this reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I missed this. Actually there are multiple places like this. Let me fix them together later separately.
Test build #106814 has finished for PR 24945 at commit
|
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala
Outdated
Show resolved
Hide resolved
* }}} | ||
* | ||
* To use it in Scala API and SQL: | ||
* {{{ | ||
* sql("SELECT udf_name(1)") | ||
* spark.select(expr("udf_name(1)") | ||
* spark.range(10).select(expr("udf_name(id)") | ||
* spark.range(10).select(pandasTestUDF($"id")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we use it? In SQLQueryTestSuite
, I think udfs are all registered for UDFTestCase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this one will be used at #24946
Test build #106843 has finished for PR 24945 at commit
|
Thank you @viirya. This is not invasive at all. Let me merge it. Merged to master. |
…UDFs can be tested with Scala test base ## What changes were proposed in this pull request? After this PR, we can test Pandas and Python UDF as below **in Scala side**: ```scala import IntegratedUDFTestUtils._ val pandasTestUDF = TestScalarPandasUDF("udf") spark.range(10).select(pandasTestUDF($"id")).show() ``` ## How was this patch tested? Manually tested. Closes apache#24945 from HyukjinKwon/SPARK-27893-followup. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Late +1, very nice! |
…h in EpochTracker (to support Python UDFs) This PR proposes to use `InheritableThreadLocal` instead of `ThreadLocal` for current epoch in `EpochTracker`. Python UDF needs threads to write out to and read it from Python processes and when there are new threads, previously set epoch is lost. After this PR, Python UDFs can be used at Structured Streaming with the continuous mode. The test cases were written on the top of apache#24945. Unit tests were added. Manual tests. Closes apache#24946 from HyukjinKwon/SPARK-27234. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
After this PR, we can test Pandas and Python UDF as below in Scala side:
How was this patch tested?
Manually tested.