COMPAT: Iteration should always yield a python scalar #17491

Merged
merged 1 commit into from
Sep 12, 2017

Conversation

jreback
Contributor

@jreback jreback commented Sep 10, 2017

xref #10904
closes #13236
closes #13258
xref #14216

@jreback added the API Design, Compat, and Dtype Conversions labels Sep 10, 2017
@jreback added this to the 0.21.0 milestone Sep 10, 2017
@codecov
codecov bot commented Sep 11, 2017

Codecov Report

Merging #17491 into master will increase coverage by <.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master   #17491      +/-   ##
==========================================
+ Coverage   91.15%   91.15%   +<.01%     
==========================================
  Files         163      163              
  Lines       49534    49540       +6     
==========================================
+ Hits        45153    45160       +7     
+ Misses       4381     4380       -1

Flag                               Coverage Δ
#multiple                          88.94% <94.73%> (+0.02%) ⬆️
#single                            40.22% <57.89%> (-0.07%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/series.py              94.92% <ø> (-0.03%) ⬇️
pandas/core/indexes/base.py        96.28% <ø> (-0.01%) ⬇️
pandas/core/base.py                96.01% <100%> (+0.05%) ⬆️
pandas/core/indexes/category.py    98.54% <100%> (ø) ⬆️
pandas/core/categorical.py         95.51% <100%> (+0.01%) ⬆️
pandas/core/sparse/array.py        91.3% <85.71%> (-0.12%) ⬇️
pandas/io/gbq.py                   25% <0%> (-58.34%) ⬇️
pandas/core/frame.py               97.72% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py   95.43% <0%> (-0.1%) ⬇️
pandas/plotting/_converter.py      65.05% <0%> (+1.81%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6aed2e...2ebcbfc. Read the comment docs.

@codecov
codecov bot commented Sep 11, 2017

Codecov Report

Merging #17491 into master will increase coverage by 0.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master   #17491      +/-   ##
==========================================
+ Coverage   91.15%   91.16%   +0.01%     
==========================================
  Files         163      163              
  Lines       49534    49543       +9     
==========================================
+ Hits        45153    45168      +15     
+ Misses       4381     4375       -6

Flag                               Coverage Δ
#multiple                          88.95% <94.73%> (+0.03%) ⬆️
#single                            40.21% <57.89%> (-0.07%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/series.py              94.92% <ø> (-0.03%) ⬇️
pandas/core/indexes/base.py        96.28% <ø> (-0.01%) ⬇️
pandas/core/indexes/category.py    98.54% <100%> (ø) ⬆️
pandas/core/categorical.py         95.51% <100%> (+0.01%) ⬆️
pandas/core/base.py                96.01% <100%> (+0.05%) ⬆️
pandas/core/sparse/array.py        91.3% <85.71%> (-0.12%) ⬇️
pandas/io/gbq.py                   25% <0%> (-58.34%) ⬇️
pandas/core/frame.py               97.77% <0%> (-0.05%) ⬇️
pandas/core/groupby.py             92.22% <0%> (+0.01%) ⬆️
pandas/core/reshape/pivot.py       96.35% <0%> (+0.99%) ⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a84274...05f8a6f. Read the comment docs.

@jreback force-pushed the map branch 4 times, most recently from c0fd989 to 6a02e4f on September 11, 2017 11:13
Member

@jorisvandenbossche jorisvandenbossche left a comment

Added some comments.

(I know you didn't merge it directly but only after a couple of days, but I think for API changes like this we should wait until at least one other core dev has had time to review, or explicitly ping us.)


Previously:

.. code-block:: python
Member

python -> ipython


.. ipython:: python

   s = Series([1, 2, 3])
Member

Series -> pd.Series

Iteration of Series/Index will now return python scalars
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, when using certain iteration methods for a ``Series`` with dtype ``int`` or ``float``, you would receive a ``numpy`` scalar, e.g. a ``np.int64``, rather than a python ``int``. Issue (:issue:`10904`) corrected this for ``Series.tolist()`` and ``list(Series)``. This change makes all iteration methods consistent, in particular, for ``__iter__()`` and ``.map()``; note that this only affect int/float dtypes. (:issue:`13236`, :issue:`13258`, :issue:`14216`).
Member

this only affect -> this only affects
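
A minimal sketch of the behaviour the whatsnew entry above describes, assuming pandas 0.21.0 or later is installed. The variable names are illustrative and do not come from the PR:

```python
import pandas as pd

s_int = pd.Series([1, 2, 3])      # stored as int64
s_float = pd.Series([1.5, 2.5])   # stored as float64

# Collect the exact element types produced by each access path.
iter_types = {type(x) for x in s_int}             # __iter__
tolist_types = {type(x) for x in s_int.tolist()}  # tolist()
map_types = set(s_int.map(type))                  # .map() calls type() on each element
float_types = {type(x) for x in s_float}

print(iter_types, tolist_types, map_types, float_types)
```

Comparing with `type(x)` rather than `isinstance` keeps the check strict, since some NumPy scalar types subclass their Python counterparts.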


Previously:

.. code-block:: python
Member

python -> ipython

@@ -884,6 +890,21 @@ def argmin(self, axis=None):
        """
        return nanops.nanargmin(self.values)

    def tolist(self):
        """
        return a list of the values; box to scalars
Member

Can you put a bit more explanation that python scalar types are returned?
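
For context on what "box to scalars" means, here is a small NumPy-only sketch (not taken from the PR) contrasting direct iteration over an array with `ndarray.tolist()`:

```python
import numpy as np

values = np.array([1, 2, 3], dtype="int64")

# Iterating the array directly yields NumPy scalar objects ...
direct_types = {type(x) for x in values}

# ... whereas ndarray.tolist() converts each element to a built-in Python type.
boxed_types = {type(x) for x in values.tolist()}

print(direct_types, boxed_types)
```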

        if is_datetimelike(self):
            return (_maybe_box_datetimelike(x) for x in self._values)
        else:
            return iter(self._values.tolist())
Member

For the tolist implementation, this seems like a bit of double work: values are converted to a list, then iterated over, and then converted to a list again.

Contributor Author

fixed
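
The concern in this thread can be sketched with a hypothetical stand-in class. `Values` and `tolist_via_iter` are made-up names for illustration and are not the pandas implementation or the actual fix:

```python
import numpy as np


class Values:
    """Hypothetical stand-in for a pandas container; not the pandas class."""

    def __init__(self, data):
        self._values = np.asarray(data)

    def __iter__(self):
        # One conversion: ndarray -> list of Python scalars -> iterator.
        return iter(self._values.tolist())

    def tolist_via_iter(self):
        # Double work: __iter__ already built a list, then list() copies it again.
        return list(self)

    def tolist(self):
        # Single pass: delegate straight to the ndarray.
        return self._values.tolist()


v = Values([1, 2, 3])
same_result = v.tolist_via_iter() == v.tolist()
```

Both paths return the same list of Python scalars; the second simply avoids building an intermediate list.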

@@ -253,6 +253,10 @@ def get_values(self):
        """ return the underlying data as an ndarray """
        return self._data.get_values()

    def __iter__(self):
        """ iterate like Categorical """
        return self._data.__iter__()
Member

this feels not clean. The tolist of Categorical should already ensure this?

Contributor Author

fixed up
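
A quick way to check the intended end result for categorical data, assuming a pandas version that includes this change. This is an illustrative sketch, not code from the PR:

```python
import pandas as pd

cat = pd.Categorical([1, 2, 3])
cat_index = pd.CategoricalIndex([1, 2, 3])

# Exact element types seen when iterating each container.
cat_types = {type(x) for x in cat}
index_types = {type(x) for x in cat_index}

print(cat_types, index_types)
```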

# gh-10904
# gh-13258
# coerce iteration to underlying python / pandas types
s = typ([1], dtype=dtype)
Member

I would make a separate construction for object/category, because the test does not ensure this is correct. For example, on master a categorical series of integers will box to np.int64, but a np.int64 scalar passes the isinstance(.., object) test.
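
The weakness of an `isinstance(.., object)` assertion can be shown in a few lines. This is a sketch assuming Python 3 and NumPy; the variable names are illustrative:

```python
import numpy as np

value = np.int64(1)

# Every Python object is an instance of object, so this check cannot fail.
passes_object_check = isinstance(value, object)

# On Python 3, np.int64 does not subclass the built-in int.
passes_int_check = isinstance(value, int)

# An exact type comparison is the strictest form of the check.
exact_type_is_int = type(value) is int

print(passes_object_check, passes_int_check, exact_type_is_int)
```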

Timestamp('2000-12-31')])

result = method(i)[0]
assert isinstance(result, Timestamp)
Member

can you add test for Series.iteritems and DataFrame.itertuples/iterrows as well?

Contributor Author

these are in tests/frame/test_api.py already

Member

Yes, but you said this PR changed the behaviour of itertuples ? (#13468 (comment)) Then we should have a test for that?

Contributor Author

and that's all well tested, see frame/test_api

Member

ah, missed that one-line change in existing test in the diff. Thanks for the clarification!
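
For readers who want to check the behaviour discussed in this thread, here is a hedged sketch, assuming a pandas version that includes this change. It is not the test from `tests/frame/test_api.py`, and the frame contents are made up:

```python
import pandas as pd

# Datetime-like values are boxed to pandas Timestamp objects on iteration.
dates = pd.Series(pd.to_datetime(["2000-12-31", "2001-01-01"]))
first_date = next(iter(dates))

# itertuples yields one namedtuple per row, with a field per column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
first_row = next(df.itertuples(index=False))
row_types = [type(v) for v in first_row]

print(type(first_date), row_types)
```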

jreback added a commit to jreback/pandas that referenced this pull request Sep 12, 2017
jreback added a commit to jreback/pandas that referenced this pull request Sep 13, 2017
jreback added a commit to jreback/pandas that referenced this pull request Sep 13, 2017
jreback added a commit that referenced this pull request Sep 13, 2017
jreback pushed a commit that referenced this pull request Sep 17, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Successfully merging this pull request may close these issues.

COMPAT: box int/floats in __iter__
COMPAT: .map iterates over python types rather than storage type