[SPARK-21394][SPARK-21432][PYTHON] Reviving callable object/partial function support in UDF in PySpark #18615
cc @holdenk, could you take a look when you have some time?
Test build #79572 has finished for PR 18615 at commit
Thanks for catching this and for the quick fix. One suggestion: we should leave a comment about the intention behind what we are doing, but otherwise this looks reasonable.
python/pyspark/sql/functions.py
@@ -2087,10 +2087,13 @@ def _wrapped(self):
"""
Wrap this udf with a function and attach docstring from func
"""
@functools.wraps(self.func)
assignments = tuple(a for a in functools.WRAPPER_ASSIGNMENTS if a != "__name__")
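The diff line above builds a filtered tuple from functools.WRAPPER_ASSIGNMENTS, so functools.wraps no longer copies __name__ itself. Below is a minimal standalone sketch of that pattern. It uses a hypothetical helper, wrap_with_name, and assumes the caller has already resolved the name. In the PR, self._name plays that role. This is an illustration, not PySpark's actual code.

```python
import functools


def wrap_with_name(func, name):
    """Hypothetical helper: wrap any callable and give the wrapper an explicit name.

    __name__ is dropped from the attributes functools.wraps copies, because a
    callable instance (or a functools.partial) may not have one; it is then
    assigned by hand from the caller-supplied name.
    """
    assignments = tuple(
        a for a in functools.WRAPPER_ASSIGNMENTS if a != "__name__")

    @functools.wraps(func, assigned=assignments)
    def wrapper(*args):
        return func(*args)

    wrapper.__name__ = name
    return wrapper


class Identity(object):
    """A callable object with no __name__ attribute of its own."""

    def __call__(self, x):
        return x


wrapped = wrap_with_name(Identity(), "Identity")
print(wrapped.__name__, wrapped(42))  # Identity 42
```

Setting the name explicitly keeps the wrapper's name predictable whether or not the wrapped object has its own __name__.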
I'd put a comment here saying what this is for, just so that when we see this next year we don't forget why we stripped __name__ out of the attributes we asked functools to assign.
Sure, thanks @holdenk. I just addressed your comments.
Test build #79582 has finished for PR 18615 at commit
(gentle ping @holdenk)
LGTM, will merge tomorrow unless anyone else has anything to say.
python/pyspark/sql/functions.py
@functools.wraps(self.func)
# It is possible for a callable instance without __name__ attribute or/and
# __module__ attribute to be wrapped here For example, functools.partial. In this case,
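The code comment above notes that some callables lack __name__, __module__, or both, with functools.partial as one example. One hedged way to check a given object is to list which default functools.WRAPPER_ASSIGNMENTS attributes it does not expose. The missing_wrapper_attrs helper below is a hypothetical diagnostic, not part of PySpark. The exact list it returns varies with the Python version.

```python
import functools


def missing_wrapper_attrs(obj):
    """Hypothetical diagnostic: default wraps() attributes that obj does not have."""
    return [a for a in functools.WRAPPER_ASSIGNMENTS if not hasattr(obj, a)]


def add(x, y):
    return x + y


partial_add = functools.partial(add, 1)

# A plain function exposes the attributes functools.wraps copies by default.
print(missing_wrapper_attrs(add))
# A partial object is not given a __name__ (possibly among other attributes).
print(missing_wrapper_attrs(partial_add))
```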
here For example -> here. For example
Or:
here For example -> here, for example
LGTM
Test build #79653 has finished for PR 18615 at commit
Test build #79654 has finished for PR 18615 at commit
The update looks good to me. I'll merge this to master.
Merged into master.
Resolve issues with udf functions and nullability. Backport apache#18615 Eliminate whitespace diffs.
What changes were proposed in this pull request?
This PR proposes to avoid __name__ in the tuple naming the attributes assigned directly from the wrapped function to the wrapper function, and to use self._name (func.__name__ or obj.__class__.__name__) instead.
After SPARK-19161, we happened to break callable objects as UDFs in Python as below:
This worked in Spark 2.1:
After
In addition, we also happened to break partial functions as below:
This worked in Spark 2.1:
After
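The description says the wrapper's name now comes from self._name. For an ordinary function that is func.__name__, and otherwise it is the class name. Below is a minimal sketch of that fallback. It uses a hypothetical resolve_udf_name helper with an optional explicit name, an assumption made for illustration. It does not reflect PySpark's actual signature.

```python
import functools


def resolve_udf_name(func, name=None):
    """Hypothetical helper: pick a display name for any callable.

    An explicit name wins; otherwise use func.__name__ when it exists, and fall
    back to the class name for callable objects and functools.partial.
    """
    if name:
        return name
    if hasattr(func, "__name__"):
        return func.__name__
    return func.__class__.__name__


class F(object):
    def __call__(self, x):
        return x


def identity(x):
    return x


print(resolve_udf_name(identity))                        # identity
print(resolve_udf_name(F()))                             # F
print(resolve_udf_name(functools.partial(identity, 1)))  # partial
print(resolve_udf_name(F(), name="my_udf"))              # my_udf
```

Both regressions described above, callable objects and partial functions, fall into the class-name branch, because neither kind of object carries its own __name__.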
How was this patch tested?
Unit tests in python/pyspark/sql/tests.py and manual tests.