[SPARK-51112][CONNECT] Avoid using pyarrow's `to_pandas` on an empty table

### What changes were proposed in this pull request?

When the `pyarrow` table is empty, avoid calling the `to_pandas` method, because the call can segfault. Instead, an empty pandas dataframe is created manually.

### Why are the changes needed?

Consider the following code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, ArrayType, StringType, StructType, IntegerType
import faulthandler

# Dump the Python traceback if the interpreter segfaults.
faulthandler.enable()

spark = SparkSession.builder \
    .remote("sc://localhost:15002") \
    .getOrCreate()

# Empty dataframe with an int column and a nested array-of-strings column.
sp_df = spark.createDataFrame(
    data = [],
    schema=StructType(
        [
            StructField(
                name='b_int',
                dataType=IntegerType(),
                nullable=False,
            ),
            StructField(
                name='b',
                dataType=ArrayType(ArrayType(StringType(), True), True),
                nullable=True,
            ),
        ]
    )
)
print(sp_df)
print('Spark dataframe generated.')
print(sp_df.toPandas())
print('Pandas dataframe generated.')
```

Executing this may lead to a segfault when the line `sp_df.toPandas()` is run. Example:

```
Thread 0x00000001f1904f40 (most recent call first):
  File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 808 in table_to_dataframe
  File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyspark/sql/connect/client/core.py", line 949 in to_pandas
  File "/Users/venkata.gudesa/spark/test_env/lib/python3.13/site-packages/pyspark/sql/connect/dataframe.py", line 1857 in toPandas
  File "<python-input-3>", line 1 in <module>
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/code.py", line 92 in runcode
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/console.py", line 205 in runsource
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/code.py", line 313 in push
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/simple_interact.py", line 160 in run_multiline_interactive_console
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/main.py", line 59 in interactive_console
  File "/opt/homebrew/Cellar/python3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/__main__.py", line 6 in <module>
  File "<frozen runpy>", line 88 in _run_code
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49834 from vicennial/SPARK-51112.

Lead-authored-by: vicennial <venkata.gudesa@databricks.com>
Co-authored-by: Venkata Sai Akhil Gudesa <gvs.akhil1997@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>