-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: Iteration should always yield a python scalar #17491
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -188,6 +188,53 @@ the target. Now, a ``ValueError`` will be raised when such an input is passed in | |
... | ||
ValueError: Cannot operate inplace if there is no assignment | ||
|
||
.. _whatsnew_0210.api_breaking.iteration_scalars: | ||
|
||
Iteration of Series/Index will now return python scalars | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Previously, when using certain iteration methods for a ``Series`` with dtype ``int`` or ``float``, you would receive a ``numpy`` scalar, e.g. a ``np.int64``, rather than a python ``int``. Issue (:issue:`10904`) corrected this for ``Series.tolist()`` and ``list(Series)``. This change makes all iteration methods consistent, in particular, for ``__iter__()`` and ``.map()``; note that this only affect int/float dtypes. (:issue:`13236`, :issue:`13258`, :issue:`14216`). | ||
|
||
.. ipython:: python | ||
|
||
s = Series([1, 2, 3]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Series -> pd.Series |
||
s | ||
|
||
Previously: | ||
|
||
.. code-block:: python | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. python -> ipython |
||
|
||
In [2]: type(list(s)[0]) | ||
Out[2]: numpy.int64 | ||
|
||
New Behaviour: | ||
|
||
.. ipython:: python | ||
|
||
type(list(s)[0]) | ||
|
||
Furthermore this will now correctly box the results of iteration for :func:`DataFrame.to_dict` as well. | ||
|
||
.. ipython:: python | ||
|
||
d = {'a':[1], 'b':['b']} | ||
df = DataFrame(d) | ||
|
||
Previously: | ||
|
||
.. code-block:: python | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. python -> ipython |
||
|
||
In [8]: type(df.to_dict()['a'][0]) | ||
Out[8]: numpy.int64 | ||
|
||
New Behaviour: | ||
|
||
.. ipython:: python | ||
|
||
type(df.to_dict()['a'][0]) | ||
|
||
.. _whatsnew_0210.api_breaking.dtype_conversions: | ||
|
||
Dtype Conversions | ||
^^^^^^^^^^^^^^^^^ | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,7 +8,12 @@ | |
|
||
from pandas.core.dtypes.missing import isna | ||
from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries, ABCIndexClass | ||
from pandas.core.dtypes.common import is_object_dtype, is_list_like, is_scalar | ||
from pandas.core.dtypes.common import ( | ||
is_object_dtype, | ||
is_list_like, | ||
is_scalar, | ||
is_datetimelike) | ||
|
||
from pandas.util._validators import validate_bool_kwarg | ||
|
||
from pandas.core import common as com | ||
|
@@ -18,7 +23,8 @@ | |
from pandas.compat import PYPY | ||
from pandas.util._decorators import (Appender, cache_readonly, | ||
deprecate_kwarg, Substitution) | ||
from pandas.core.common import AbstractMethodError | ||
from pandas.core.common import AbstractMethodError, _maybe_box_datetimelike | ||
|
||
from pandas.core.accessor import DirNamesMixin | ||
|
||
_shared_docs = dict() | ||
|
@@ -884,6 +890,21 @@ def argmin(self, axis=None): | |
""" | ||
return nanops.nanargmin(self.values) | ||
|
||
def tolist(self): | ||
""" | ||
return a list of the values; box to scalars | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you put a bit more explanation that python scalar types are returned? |
||
""" | ||
return list(self.__iter__()) | ||
|
||
def __iter__(self): | ||
""" | ||
provide iteration over the values; box to scalars | ||
""" | ||
if is_datetimelike(self): | ||
return (_maybe_box_datetimelike(x) for x in self._values) | ||
else: | ||
return iter(self._values.tolist()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
|
||
@cache_readonly | ||
def hasnans(self): | ||
""" return if I have any nans; enables various perf speedups """ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -253,6 +253,10 @@ def get_values(self): | |
""" return the underlying data as an ndarray """ | ||
return self._data.get_values() | ||
|
||
def __iter__(self): | ||
""" iterate like Categorical """ | ||
return self._data.__iter__() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this feels not clean. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed up |
||
|
||
@property | ||
def codes(self): | ||
return self._data.codes | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,9 +13,10 @@ | |
is_object_dtype, is_datetimetz, | ||
needs_i8_conversion) | ||
import pandas.util.testing as tm | ||
from pandas import (Series, Index, DatetimeIndex, TimedeltaIndex, PeriodIndex, | ||
Timedelta, IntervalIndex, Interval) | ||
from pandas.compat import StringIO, PYPY | ||
from pandas import (Series, Index, DatetimeIndex, TimedeltaIndex, | ||
PeriodIndex, Timedelta, IntervalIndex, Interval, | ||
CategoricalIndex, Timestamp) | ||
from pandas.compat import StringIO, PYPY, long | ||
from pandas.compat.numpy import np_array_datetime64_compat | ||
from pandas.core.base import PandasDelegate, NoNewAttributesMixin | ||
from pandas.core.indexes.datetimelike import DatetimeIndexOpsMixin | ||
|
@@ -433,7 +434,7 @@ def test_value_counts_unique_nunique(self): | |
# datetimetz Series returns array of Timestamp | ||
assert result[0] == orig[0] | ||
for r in result: | ||
assert isinstance(r, pd.Timestamp) | ||
assert isinstance(r, Timestamp) | ||
tm.assert_numpy_array_equal(result, | ||
orig._values.asobject.values) | ||
else: | ||
|
@@ -1031,3 +1032,73 @@ def f(): | |
|
||
pytest.raises(AttributeError, f) | ||
assert not hasattr(t, "b") | ||
|
||
|
||
class TestToIterable(object): | ||
# test that we convert an iterable to python types | ||
|
||
dtypes = [ | ||
('int8', (int, long)), | ||
('int16', (int, long)), | ||
('int32', (int, long)), | ||
('int64', (int, long)), | ||
('uint8', (int, long)), | ||
('uint16', (int, long)), | ||
('uint32', (int, long)), | ||
('uint64', (int, long)), | ||
('float16', float), | ||
('float32', float), | ||
('float64', float), | ||
('datetime64[ns]', Timestamp), | ||
('datetime64[ns, US/Eastern]', Timestamp), | ||
('timedelta64[ns]', Timedelta)] | ||
|
||
@pytest.mark.parametrize( | ||
'dtype, rdtype', | ||
dtypes + [ | ||
('object', object), | ||
('category', object)]) | ||
@pytest.mark.parametrize( | ||
'method', | ||
[ | ||
lambda x: x.tolist(), | ||
lambda x: list(x), | ||
lambda x: list(x.__iter__()), | ||
], ids=['tolist', 'list', 'iter']) | ||
@pytest.mark.parametrize('typ', [Series, Index]) | ||
def test_iterable(self, typ, method, dtype, rdtype): | ||
# gh-10904 | ||
# gh-13258 | ||
# coerce iteration to underlying python / pandas types | ||
s = typ([1], dtype=dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would make a separate construction for object/category, because the test is not ensuring this is correct. For example on master, a categorical series of integers will box to np.int64, but a np.int64 scalar passes the |
||
result = method(s)[0] | ||
assert isinstance(result, rdtype) | ||
|
||
@pytest.mark.parametrize( | ||
'dtype, rdtype', | ||
dtypes + [ | ||
('object', (int, long)), | ||
('category', (int, long))]) | ||
@pytest.mark.parametrize('typ', [Series, Index]) | ||
def test_iterable_map(self, typ, dtype, rdtype): | ||
# gh-13236 | ||
# coerce iteration to underlying python / pandas types | ||
s = typ([1], dtype=dtype) | ||
result = s.map(type)[0] | ||
if not isinstance(rdtype, tuple): | ||
rdtype = tuple([rdtype]) | ||
assert result in rdtype | ||
|
||
@pytest.mark.parametrize( | ||
'method', | ||
[ | ||
lambda x: x.tolist(), | ||
lambda x: list(x), | ||
lambda x: list(x.__iter__()), | ||
], ids=['tolist', 'list', 'iter']) | ||
def test_categorial_datetimelike(self, method): | ||
i = CategoricalIndex([Timestamp('1999-12-31'), | ||
Timestamp('2000-12-31')]) | ||
|
||
result = method(i)[0] | ||
assert isinstance(result, Timestamp) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add test for Series.iteritems and DataFrame.itertuples/iterrows as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. these are in tests/frame/test_api.py already There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but you said this PR changed the behaviour of itertuples ? (#13468 (comment)) Then we should have a test for that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and that's all well tested see There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, missed that one-line change in existing test in the diff. Thanks for the clarification! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only affect -> this only affects