[SPARK-27893][SQL][PYTHON][FOLLOW-UP] Allow Scalar Pandas and Python UDFs can be tested with Scala test base #24945

Closed
wants to merge 2 commits into from

HyukjinKwon
Member

What changes were proposed in this pull request?

After this PR, Pandas and Python UDFs can be tested on the Scala side as follows:

import IntegratedUDFTestUtils._
val pandasTestUDF = TestScalarPandasUDF("udf")
spark.range(10).select(pandasTestUDF($"id")).show()

How was this patch tested?

Manually tested.

@HyukjinKwon
Member Author

cc @viirya, @dongjoon-hyun, @BryanCutler

- sys.props("spark.test.home")
+ assert(sys.props.contains("spark.test.home") ||
+   sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not set.")
+ sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
Member Author

This handles the IDE case: `spark.test.home` can be missing when the tests are run from an IDE without any extra settings. In that case, the code falls back to `SPARK_HOME`.

Member

Should we add a code comment explaining this reason?

Member Author

Oops, I missed this. There are actually multiple places like this, so let me fix them all together in a separate follow-up.
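
For readers following this thread, the fallback can be sketched in isolation roughly as below. This is only an illustration, not the code in the patch: the `SparkHomeResolver` object and its `resolve` method are hypothetical names, and properties and environment are passed in as plain maps so the behaviour can be checked without touching the real JVM settings.

```scala
object SparkHomeResolver {
  // Hypothetical helper, not part of Spark: resolves the Spark home directory,
  // preferring the "spark.test.home" property and falling back to the
  // SPARK_HOME environment variable (the IDE case described above).
  def resolve(props: Map[String, String], env: Map[String, String]): String = {
    // Fail early with a clear message if neither source is available.
    require(
      props.contains("spark.test.home") || env.contains("SPARK_HOME"),
      "spark.test.home or SPARK_HOME is not set.")
    // The default argument of getOrElse is by-name, so env("SPARK_HOME")
    // is only looked up when the property is missing.
    props.getOrElse("spark.test.home", env("SPARK_HOME"))
  }
}
```

With the real JVM state it could be called as `SparkHomeResolver.resolve(sys.props.toMap, sys.env)`.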

SparkQA commented Jun 24, 2019

Test build #106814 has finished for PR 24945 at commit 2e939f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • sealed trait TestUDF

* }}}
*
* To use it in Scala API and SQL:
* {{{
* sql("SELECT udf_name(1)")
* spark.range(1).select(expr("udf_name(1)"))
* spark.range(10).select(expr("udf_name(id)"))
* spark.range(10).select(pandasTestUDF($"id"))
Member

Will this be used? In SQLQueryTestSuite, I think the UDFs are all registered for UDFTestCase.

Member Author

Ah, this one will be used in #24946.

SparkQA commented Jun 24, 2019

Test build #106843 has finished for PR 24945 at commit 8fe2474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Thank you @viirya. This is not invasive at all. Let me merge it.

Merged to master.

kiku-jw pushed a commit to kiku-jw/spark that referenced this pull request Jun 26, 2019
…UDFs can be tested with Scala test base

Closes apache#24945 from HyukjinKwon/SPARK-27893-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@BryanCutler
Member

Late +1, very nice!

HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 15, 2019
…h in EpochTracker (to support Python UDFs)

This PR proposes to use `InheritableThreadLocal` instead of `ThreadLocal` for the current epoch in `EpochTracker`. Python UDFs need threads to write data out to, and read it back from, Python processes, and when new threads are created, the previously set epoch is lost.

After this PR, Python UDFs can be used in Structured Streaming with the continuous mode.

The test cases were written on top of apache#24945.
Unit tests were added.

Manual tests.

Closes apache#24946 from HyukjinKwon/SPARK-27234.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
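
The difference between `ThreadLocal` and `InheritableThreadLocal` described in the commit message above can be seen in a small standalone sketch. It is not the `EpochTracker` code: `EpochInheritanceDemo`, `plainEpoch`, `inheritedEpoch`, and `readOnNewThread` are names made up for this illustration, and only standard JDK classes are used.

```scala
import java.util.concurrent.atomic.AtomicReference

object EpochInheritanceDemo {
  // Plain ThreadLocal: a newly created thread starts with no value (null).
  val plainEpoch = new ThreadLocal[java.lang.Long]()
  // InheritableThreadLocal: a child thread receives a copy of the value
  // its parent thread held at the moment the child was created.
  val inheritedEpoch = new InheritableThreadLocal[java.lang.Long]()

  // Reads `local` from a freshly created thread and returns what it saw.
  def readOnNewThread(local: ThreadLocal[java.lang.Long]): Option[Long] = {
    val seen = new AtomicReference[Option[Long]](None)
    val t = new Thread(new Runnable {
      override def run(): Unit =
        seen.set(Option(local.get()).map(_.longValue))
    })
    t.start()
    t.join()
    seen.get()
  }

  def main(args: Array[String]): Unit = {
    plainEpoch.set(5L)
    inheritedEpoch.set(5L)
    println(readOnNewThread(plainEpoch))     // prints "None"
    println(readOnNewThread(inheritedEpoch)) // prints "Some(5)"
  }
}
```

The value is copied when the child thread is constructed, so anything the parent sets afterwards is not visible to that child.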
@HyukjinKwon HyukjinKwon deleted the SPARK-27893-followup branch March 3, 2020 01:19