Commit
Clarify the length of each series is of each batch within scalar Pandas UDF
HyukjinKwon committed Jan 11, 2018
1 parent b46e58b commit d2cfed3
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions python/pyspark/sql/functions.py
@@ -2184,6 +2184,11 @@ def pandas_udf(f=None, returnType=None, functionType=None):
| 8| JOHN DOE| 22|
+----------+--------------+------------+
.. note:: The length of `pandas.Series` within a scalar UDF is not of the whole input column
but of the batch internally used, and it is called for each batch. Therefore,
this can be used, for example, to ensure the length of each returned `pandas.Series`
but should not be used as the length of the whole input.
2. GROUP_MAP
A group map UDF defines transformation: A `pandas.DataFrame` -> A `pandas.DataFrame`
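
The batching behaviour described in the note can be illustrated without a Spark cluster. The following is a minimal sketch in plain pandas, not Spark's actual implementation: `simulate_scalar_udf` and its `batch_size` parameter are hypothetical names introduced here, standing in for the way a scalar UDF is called once per internal batch.

```python
import pandas as pd


def simulate_scalar_udf(func, column, batch_size):
    """Hypothetical stand-in for Spark's batching: call func once per batch."""
    outputs = []
    for start in range(0, len(column), batch_size):
        batch = column.iloc[start:start + batch_size]
        result = func(batch)
        # A scalar UDF is expected to return a Series as long as its input batch.
        assert len(result) == len(batch)
        outputs.append(result)
    return pd.concat(outputs, ignore_index=True)


seen_lengths = []


def to_upper(s):
    # len(s) is the length of the current batch, not of the whole column.
    seen_lengths.append(len(s))
    return s.str.upper()


names = pd.Series(["john doe", "jane roe", "ann lee", "bob ray", "sue kim"])
result = simulate_scalar_udf(to_upper, names, batch_size=2)

print(seen_lengths)  # [2, 2, 1]
print(len(result))   # 5
```

With five rows and an assumed batch size of two, the function runs three times and never sees a length of five. This is why `len(s)` inside a scalar UDF is suitable for sizing the returned `pandas.Series` but not for measuring the whole input column.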
0 comments on commit d2cfed3