
[SPARK-40500][PS] Deprecate iteritems in DataFrame and Series #37947

Closed


zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Sep 20, 2022

What changes were proposed in this pull request?

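  1. Call pandas' items instead of the deprecated pandas iteritems in internal code.
  2. Deprecate pandas-on-Spark's own iteritems (ps.DataFrame.iteritems and ps.Series.iteritems).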
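The two changes can be sketched with a plain-Python stand-in. The class ToyFrame below is hypothetical, not the pyspark.pandas implementation, and the warning text is an assumed wording; it only illustrates the pattern of keeping iteritems as a thin alias that emits a FutureWarning and delegates to items, while internal code calls items directly.

```python
import warnings
from typing import Dict, Iterator, List, Tuple


class ToyFrame:
    """Hypothetical column container standing in for a DataFrame (not pyspark.pandas)."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        self._columns = dict(columns)

    def items(self) -> Iterator[Tuple[str, List[int]]]:
        """Yield (column label, column values) pairs; the preferred API."""
        for label, values in self._columns.items():
            yield label, values

    def iteritems(self) -> Iterator[Tuple[str, List[int]]]:
        """Deprecated alias of items(); kept so existing callers keep working."""
        # stacklevel=2 attributes the warning to the caller's line, not this one.
        warnings.warn(
            "iteritems is deprecated; use items instead.",  # assumed wording
            FutureWarning,
            stacklevel=2,
        )
        return self.items()


# Internal code calls items() directly, so it no longer triggers the warning.
frame = ToyFrame({"a": [3, 4], "b": [1, 5]})
for label, values in frame.items():
    print(label, values)
```

Because iteritems is an ordinary function rather than a generator, the warning fires as soon as it is called, even if the returned iterator is never consumed.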

Why are the changes needed?

pandas deprecated DataFrame.iteritems and Series.iteritems in version 1.5, so every internal call to them emits a FutureWarning:

before:

In [4]: import pyspark.pandas as ps

In [5]: ps.Series([3, 4, 1, 1, 5])
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  fields = [
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for column, series in pdf.iteritems():
0    3
1    4
2    1
3    1
4    5
dtype: int64

after:

In [1]: import pyspark.pandas as ps

In [2]: ps.Series([3, 4, 1, 1, 5])
0    3
1    4
2    1
3    1
4    5
dtype: int64

Does this PR introduce any user-facing change?

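Yes. The iteritems FutureWarnings no longer appear, and pandas-on-Spark's own DataFrame.iteritems and Series.iteritems are deprecated in favor of items.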

How was this patch tested?

Existing unit tests.
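The PR relied on existing unit tests. If one wanted to pin down the deprecation explicitly, a test could look like this sketch; Toy is a hypothetical stand-in rather than the pandas-on-Spark class, and the warning text is assumed. It checks that the deprecated alias warns and that items stays silent.

```python
import unittest
import warnings


class Toy:
    """Hypothetical stand-in; only the deprecation behavior matters here."""

    def items(self):
        return iter([("a", 1)])

    def iteritems(self):
        # Assumed wording; the real pandas-on-Spark message may differ.
        warnings.warn("iteritems is deprecated; use items instead.", FutureWarning, stacklevel=2)
        return self.items()


class DeprecationTest(unittest.TestCase):
    def test_iteritems_warns_and_delegates(self):
        # assertWarns fails the test unless a FutureWarning is raised inside the block.
        with self.assertWarns(FutureWarning):
            result = list(Toy().iteritems())
        self.assertEqual(result, [("a", 1)])

    def test_items_is_silent(self):
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning now raises an exception
            self.assertEqual(list(Toy().items()), [("a", 1)])


if __name__ == "__main__":
    unittest.main(argv=["deprecation-test"], exit=False)
```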

Member

@srowen srowen left a comment


Seems OK, but there also seem to be more usages in the code, e.g. in frame.py and test_dataframe.py. Are those the same usages, and are they also deprecated?

@zhengruifeng
Contributor Author

The remaining iteritems usages in frame.py and test_dataframe.py are the definition and tests of PS's own iteritems, so I think we should not modify them.

As for PS's iteritems, I think we can deprecate them now. WDYT @itholic @HyukjinKwon @Yikun?

@HyukjinKwon
Member

Yeah, let's match pandas.

Member

@Yikun Yikun left a comment


as to the deprecation of PS's iteritems, I think we can deprecate them now

Agree. items is also supported by _minimum_pandas_version (1.0.5), so it's a safe change:

>>> pd.__version__
'1.0.5'
>>> pd.Series([3, 4, 1, 1, 5]).iteritems()
<zip object at 0x7f2087983280>
>>> pd.Series([3, 4, 1, 1, 5]).items()
<zip object at 0x7f2087afcb80>

so LGTM

Contributor

@itholic itholic left a comment


Then should we also add .. deprecated:: 3.4.0 to the docstrings of DataFrame.iteritems and Series.iteritems?
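A docstring carrying the suggested directive might be laid out as follows. ToySeries is a hypothetical stand-in and the sentence under the directive is illustrative wording, not the text that was merged; in a real method the directive would sit alongside the runtime FutureWarning rather than replace it.

```python
import inspect


class ToySeries:
    """Hypothetical stand-in for a Series; shows only the docstring layout."""

    def items(self):
        """Lazily iterate over (index, value) tuples."""
        return iter([])

    def iteritems(self):
        """
        This is an alias of ``items``.

        .. deprecated:: 3.4.0
            iteritems is deprecated. Use :meth:`items` instead.
        """
        return self.items()


# Sphinx renders the directive as a "Deprecated since version 3.4.0" note.
print(inspect.getdoc(ToySeries.iteritems))
```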

Otherwise, LGTM.

@zhengruifeng zhengruifeng changed the title [SPARK-40500][PS] Use pd.items instead of pd.iteritems [SPARK-40500][PS] Deprecate iteritems in DataFrame and Series Sep 21, 2022
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the ps_iteritems_to_items branch September 21, 2022 02:09
@zhengruifeng
Contributor Author

Thank you all!
