forked from apache/spark
Commit
[SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup
Add `applyInArrow` method to PySpark `groupBy` and `groupBy.cogroup` to allow for user functions that work on Arrow data, similar to the existing `mapInArrow`.

PySpark allows transforming a `DataFrame` via the Pandas and Arrow APIs:
```
df.mapInArrow(map_arrow, schema="...")
df.mapInPandas(map_pandas, schema="...")
```
For `df.groupBy(...)` and `df.groupBy(...).cogroup(...)`, there is only a Pandas interface, no Arrow interface:
```
df.groupBy("id").applyInPandas(apply_pandas, schema="...")
```
Providing a pure Arrow interface allows user code to use **any** Arrow-based data framework, not only Pandas, e.g. Polars:
```
import polars
import pyarrow

# Framework-specific logic; here an identity transformation on a Polars DataFrame.
def apply_polars(df: polars.DataFrame) -> polars.DataFrame:
    return df

# Arrow adapter: convert each group to Polars, apply the logic, convert back to Arrow.
def apply_arrow(table: pyarrow.Table) -> pyarrow.Table:
    df = polars.from_arrow(table)
    return apply_polars(df).to_arrow()

df.groupBy("id").applyInArrow(apply_arrow, schema="...")
```
This adds method `applyInArrow` to PySpark `groupBy` and `groupBy.cogroup`.

Tested with unit tests.

Closes apache#38624 from EnricoMi/branch-pyspark-grouped-apply-in-arrow.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 99d6f42, commit c117bb0
Showing 22 changed files with 1,668 additions and 125 deletions.