[SPARK-45065][PYTHON][PS] Support Pandas 2.1.0 #42793
Closed
Changes from 6 commits
33 commits
bf79e7a [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0 (itholic)
e81a97a Fix tests (itholic)
246d2a0 Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
f874b85 Respect as_index=False when given funcs is a type of list (itholic)
49c5c5d Apply the Pandas 2.1.0 changes (itholic)
7184a3b Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
15c5aa7 Fix ordering for stack (itholic)
5dbf456 Added migration guide (itholic)
2a17d1d Deprecate all features from Pandas 2.1.0. (itholic)
f48215c Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
dc68af1 Fix linter (itholic)
4e89cf5 fix test (itholic)
a585fbe Fix linter (itholic)
1831923 fix (zhengruifeng)
91b865c resolve conflicts (itholic)
76433d0 Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
afecab9 Retrigger the CI (itholic)
6ff4df2 replace the import (itholic)
5a0fe26 revert unnecess change (itholic)
bba34a0 fix linter (itholic)
f66d824 Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
21a7dfe Remove circular import (itholic)
2323237 resolve conflicts (itholic)
0c07f55 resolve conflicts (itholic)
5c054ec resolve conflicts (itholic)
1cb9df4 do not call applymap (itholic)
0e8ea3b fix linter (itholic)
46cd7dd Recommend to use Pandas 2.0.0 and above (itholic)
357fbce fix linter (itholic)
76f0720 resolve conflicts (itholic)
cf54c67 resolve conflicts (itholic)
0008089 Import (itholic)
5723b6c Merge branch 'master' of https://github.com/apache/spark into pandas_… (itholic)
@@ -20,11 +20,6 @@
 import numpy as np
 import pandas as pd

-try:
-    from pandas._testing import makeMissingDataframe
-except ImportError:
-    from pandas.util.testing import makeMissingDataframe
-
 from pyspark import pandas as ps
 from pyspark.pandas.config import option_context
 from pyspark.testing.pandasutils import PandasOnSparkTestCase, SPARK_CONF_ARROW_ENABLED
@@ -273,7 +268,18 @@ def test_skew_kurt_numerical_stability(self):
         self.assert_eq(psdf.kurt(), pdf.kurt(), almost=True)

     def test_dataframe_corr(self):
-        pdf = makeMissingDataframe(0.3, 42)
+        pdf = pd.DataFrame(
+            index=[
+                "".join(
+                    np.random.choice(
+                        list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"), 10
+                    )
+                )
+                for _ in range(30)
+            ],
+            columns=list("ABCD"),
+            dtype="float64",
+        )

Review comment on lines -276 to +282: The testing util

         psdf = ps.from_pandas(pdf)

         with self.assertRaisesRegex(ValueError, "Invalid method"):
@@ -347,7 +353,18 @@ def test_dataframe_corr(self):
         )

     def test_series_corr(self):
-        pdf = makeMissingDataframe(0.3, 42)
+        pdf = pd.DataFrame(
+            index=[
+                "".join(
+                    np.random.choice(
+                        list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"), 10
+                    )
+                )
+                for _ in range(30)
+            ],
+            columns=list("ABCD"),
+            dtype="float64",
+        )
         pser1 = pdf.A
         pser2 = pdf.B
         psdf = ps.from_pandas(pdf)
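The inline frame in the tests above is given only an index, columns, and a dtype, so every cell starts out as NaN. For readers who want a frame that mixes real values and gaps without importing pandas' private testing helpers, here is a minimal sketch. `make_missing_frame` and its parameters (`missing_frac`, `seed`) are hypothetical names chosen for illustration; this is not the helper that the PR stops importing, and it does not claim to reproduce that helper's output.

```python
import numpy as np
import pandas as pd

ALPHABET = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def make_missing_frame(nrows=30, columns="ABCD", missing_frac=0.3, seed=42):
    """Hypothetical helper: float64 frame where roughly `missing_frac` of cells are NaN."""
    rng = np.random.default_rng(seed)
    # Random 10-character string labels, mirroring the index built in the tests.
    index = ["".join(rng.choice(ALPHABET, 10)) for _ in range(nrows)]
    data = rng.standard_normal((nrows, len(columns)))
    # Blank out a random subset of cells.
    data[rng.random(data.shape) < missing_frac] = np.nan
    return pd.DataFrame(data, index=index, columns=list(columns))


pdf = make_missing_frame()

# For comparison: a frame built from index/columns/dtype alone holds only NaN.
all_nan = pd.DataFrame(index=range(3), columns=list("ABCD"), dtype="float64")
```

Seeding the generator keeps the frame reproducible between runs, which matters when the same data is compared across pandas and pandas-on-Spark.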
@@ -487,23 +487,23 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType, ScalarTyp
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtypes
-    [dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False)]
+    [dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64)]

Review comment: Added

     >>> inferred.spark_type
     StructType([StructField('c0', LongType(), True), StructField('c1', LongType(), True)])

     >>> def func() -> ps.DataFrame[zip(pdf.columns, pdf.dtypes)]:
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtypes
-    [dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False)]
+    [dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64)]
     >>> inferred.spark_type
     StructType([StructField('a', LongType(), True), StructField('b', LongType(), True)])

     >>> def func() -> ps.Series[pdf.b.dtype]:
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtype
-    CategoricalDtype(categories=[3, 4, 5], ordered=False)
+    CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64)
     >>> inferred.spark_type
     LongType()

@@ -521,7 +521,8 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType, ScalarTyp
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtypes
-    [dtype('int64'), dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False)]
+    [dtype('int64'), dtype('int64'),
+     CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64)]
     >>> inferred.spark_type.simpleString()
     'struct<__index_level_0__:bigint,c0:bigint,c1:bigint>'
     >>> inferred.index_fields
@@ -533,7 +534,8 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType, ScalarTyp
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtypes
-    [CategoricalDtype(categories=[3, 4, 5], ordered=False), dtype('int64'), dtype('int64')]
+    [CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64),
+     dtype('int64'), dtype('int64')]
     >>> inferred.spark_type.simpleString()
     'struct<index:bigint,id:bigint,A:bigint>'
     >>> inferred.index_fields
@@ -544,7 +546,8 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType, ScalarTyp
     ...     pass
     >>> inferred = infer_return_type(func)
     >>> inferred.dtypes
-    [dtype('int64'), dtype('int64'), CategoricalDtype(categories=[3, 4, 5], ordered=False)]
+    [dtype('int64'), dtype('int64'),
+     CategoricalDtype(categories=[3, 4, 5], ordered=False, categories_dtype=int64)]
     >>> inferred.spark_type.simpleString()
     'struct<__index_level_0__:bigint,a:bigint,b:bigint>'
     >>> inferred.index_fields
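The doctest updates above change only the printed form of `CategoricalDtype`. A check that compares dtype objects instead of their `repr` strings should not depend on that printed form. The sketch below illustrates the idea with a small frame invented for this example; the `pdf` defined here is an assumption and is not taken from the docstring.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative frame (assumed): an int64 column and a categorical column
# whose categories are 3, 4, 5.
pdf = pd.DataFrame({"a": [1, 2, 3], "b": pd.Categorical([3, 4, 5])})

expected = CategoricalDtype(categories=[3, 4, 5], ordered=False)

# Equality looks at the categories and orderedness, not at the repr text.
same_dtype = pdf["b"].dtype == expected

# The pieces that the newer repr prints can also be inspected directly.
categories = list(pdf["b"].dtype.categories)
categories_dtype = str(pdf["b"].dtype.categories.dtype)
```

Doctests that print the dtype still need the expected text to match the installed pandas version, which is why the PR edits the expected output directly.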
Bug fixed in Pandas: pandas-dev/pandas#52849.