Clarify the length of each series is of each batch within scalar Pandas UDF
apache · Jan 11, 2018 · d2cfed3
1 parent b46e58b
commit d2cfed3
Showing 1 changed file with 5 additions and 0 deletions.
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
@@ -2184,6 +2184,11 @@ def pandas_udf(f=None, returnType=None, functionType=None):
        |         8|      JOHN DOE|          22|
        +----------+--------------+------------+
 
+       .. note:: The length of `pandas.Series` within a scalar UDF is not that of the whole input
+           column but of the batch used internally, and the function is called once per batch.
+           Therefore, it can be used, for example, to ensure the length of each returned
+           `pandas.Series`, but it must not be treated as the length of the whole input column.
+
     2. GROUP_MAP
 
        A group map UDF defines transformation: A `pandas.DataFrame` -> A `pandas.DataFrame`
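
The note added in this diff can be illustrated without running Spark. The sketch below is a minimal simulation using only pandas: `apply_in_batches` is a hypothetical stand-in for the way a scalar UDF is invoked once per batch, and the batch size of 2 is an arbitrary assumption chosen for the example, not a Spark default. It shows why `len(s)` inside the function reflects the batch rather than the whole column.

```python
import pandas as pd


def apply_in_batches(func, column, batch_size):
    """Hypothetical stand-in for Spark: call func once per batch, then concatenate."""
    results = []
    for start in range(0, len(column), batch_size):
        batch = column.iloc[start:start + batch_size]
        out = func(batch)
        # A scalar UDF must return a Series as long as the batch it received.
        assert len(out) == len(batch)
        results.append(out)
    return pd.concat(results)


def divide_by_len(s):
    # len(s) is the length of the current batch, not of the whole column.
    return s / len(s)


column = pd.Series([10.0] * 5)

# Called per batch (sizes 2, 2, 1): each value is divided by its batch length.
batched = apply_in_batches(divide_by_len, column, batch_size=2)

# Called once on the whole column: each value is divided by 5.
whole = divide_by_len(column)
```

The two results differ because the last batch holds a single row, so any logic that relies on `len(s)` being the column length gives batch-dependent output. Using `len(s)` to check or build a result of matching length, as the internal assertion does, remains safe.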