-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH/BUG: implement __iter__ for IntegerArray so conversions (to_dict, tolist, etc.) return python native types #37377
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good - need to generalize this
also need a whats new note
pandas/core/arrays/integer.py
Outdated
@@ -357,6 +357,13 @@ def __pos__(self): | |||
def __abs__(self): | |||
return type(self)(np.abs(self._data), self._mask) | |||
|
|||
def __iter__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually be in our base class so that bool and float will work as well here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lambda s: list(iter(s))[0], | ||
], | ||
) | ||
def test_conversion_methods_return_type_is_native(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paramterize per all Int and UInt types
also add tests for float / bool
may need o move this to the base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the tests. Atm they're in:
pandas/tests/arrays/boolean/test_astype.py
pandas/tests/arrays/floating/test_astype.py
pandas/tests/arrays/integer/test_dtypes.py
Do you want them moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/core/arrays/integer.py
Outdated
@@ -357,6 +357,13 @@ def __pos__(self): | |||
def __abs__(self): | |||
return type(self)(np.abs(self._data), self._mask) | |||
|
|||
def __iter__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lambda s: list(iter(s))[0], | ||
], | ||
) | ||
def test_conversion_methods_return_type_is_native(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the tests. Atm they're in:
pandas/tests/arrays/boolean/test_astype.py
pandas/tests/arrays/floating/test_astype.py
pandas/tests/arrays/integer/test_dtypes.py
Do you want them moved?
Getting a doctest failure after this patch: is it ok to change the doc here?
|
#21256 - fixed on current master (I'll add tests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this!
As mentioned in #29738, if we see this as a general rule for EAs, we should also document this expected behaviour in the base class. And maybe also try to add a base extension test (although the expected result is probably not that easy to know, we could at least check it's not a numpy scalar)
lambda s: list(iter(s))[0], | ||
], | ||
) | ||
def test_conversion_methods_return_type_is_native(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could maybe move this test to the common arrays/masked/ section? We should be able to know the expected python class for each of the data types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure. I'll make a separate file there (call it test_conversions.py
maybe)
@@ -118,3 +118,19 @@ def test_astype_object(dtype): | |||
# check exact element types | |||
assert isinstance(result[0], float) | |||
assert result[1] is pd.NA | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah can you move this to the base class to avoid this duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use the data fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved + rewritten with fixtures (though not 100% sure that what I did is what you asked)
assert isinstance(func(s), int) | ||
|
||
|
||
def test_conversion_to_dict_oriented_record_returns_native(any_nullable_int_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments about about moving this then these tests are much simpler
@@ -32,11 +32,13 @@ def test_tolist(self, data): | |||
expected = list(data) | |||
assert result == expected | |||
|
|||
@pytest.mark.skip(reason="Floating precision issues") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this skip? Because the conversion to string is different for numpy scalars vs native scalars?
(the skip should also not be here, I think, but only for those dtypes that are failing them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. It's specifically that in the construction of expected
expected = pd.Series([str(x) for x in data[:5]], dtype=str)
we get a lot more decimal places. The astype operation works as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what if you remove the str(..)
call around x
? Because by specifying string dtype, we will already use our internal machinery to handle the conversion, which should do it consistently for numpy vs native numbers, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried this, problem persists (for both tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth opening an issue about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this. It's surprisingly hard to get a copy-pastable reproducer but the test failure is reliable - will keep looking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took a while but I finally got a copy-pastable example:
In [8]: import numpy as np
...: import pandas as pd
...:
...: data = pd.array([0, 0.1, 0.2, 0.3, 0.4, 0.5], dtype="Float32")
...: s = pd.Series(data)
...: res = s.astype(str)
...: expected = pd.Series([x for x in data], dtype=str)
In [9]: expected
Out[9]:
0 0.0
1 0.10000000149011612
2 0.20000000298023224
3 0.30000001192092896
4 0.4000000059604645
5 0.5
dtype: object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only the 32 bit that fails for me locally (on a 64-bit system) - I wonder if it's something to do with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a problem with .item()
with np.float32
scalar:
In [15]: scalar = np.float32(0.1)
In [16]: scalar
Out[16]: 0.1
In [17]: scalar.item()
Out[17]: 0.10000000149011612
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was resolved by the numpy folks at numpy/numpy#17880 (not a bug)
@@ -0,0 +1,60 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment was actually about moving them to a file in tests/arrays/masked/
(so to share the test with the different nullable dtypes).
Now, I am also fine to move it to the base extension tests as you did here, but then we should add this to all EAs. If it's only activated for nullable floating/integer/boolean as is done now, then it should move to tests/arrays/masked/
, as it's not a general test for all EAs.
Making it truly generic is probably a bit annoying (you need to know the exact expected type for the scalars), but so maybe the subclasses can override get_native_dtype
where needed? Or maybe it can be an attribute on the test class (if it's always a single type for each dtype, then that's simpler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment was actually about moving them to a file in
tests/arrays/masked/
(so to share the test with the different nullable dtypes).Now, I am also fine to move it to the base extension tests as you did here, but then we should add this to all EAs. If it's only activated for nullable floating/integer/boolean as is done now, then it should move to
tests/arrays/masked/
, as it's not a general test for all EAs.
I did misunderstand but this makes sense
Making it truly generic is probably a bit annoying (you need to know the exact expected type for the scalars), but so maybe the subclasses can override
get_native_dtype
where needed? Or maybe it can be an attribute on the test class (if it's always a single type for each dtype, then that's simpler)
This sounds like the way to go. I like not having to keep a registry of the mapping between scalars and native types in a single place
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
closing as stale. if you want to continue, pls ping and can re-open. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Reheating #31328 (as it's been stale for a few months)