ENH/BUG: implement `__iter__` for IntegerArray so conversions (to_dict, tolist, etc.) return python native types #37377

arw2019 · 2020-10-24T01:17:17Z

- closes API: ExtensionArrays and conversion to "native" types (eg in tolist, to_dict, iteration, ..) #29738
- closes BUG: DataFrame.to_dict() converts Nullable Int types to numpy.int #34665
- tests added / passed
- passes `black pandas`
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

Reheating #31328 (as it's been stale for a few months)

jreback

looks good - need to generalize this

also need a whatsnew note

jreback · 2020-10-24T02:59:04Z

pandas/core/arrays/integer.py

@@ -357,6 +357,13 @@ def __pos__(self):
    def __abs__(self):
        return type(self)(np.abs(self._data), self._mask)

+    def __iter__(self):


this should actually be in our base class so that bool and float work here as well
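What moving `__iter__` to the base class buys can be sketched without pandas. This is a minimal illustration, assuming a masked array that stores values in `_data` and a boolean `_mask` (as in the `__abs__` context of the hunk above); `ToyMaskedArray` and its subclasses are hypothetical names, not pandas classes, and `None` stands in for `pd.NA`.

```python
import numpy as np

# Stand-in for pandas.NA so the sketch only needs NumPy.
NA = None


class ToyMaskedArray:
    """Hypothetical base class: values in `_data`, missing flags in `_mask`."""

    def __init__(self, data, mask):
        self._data = np.asarray(data)
        self._mask = np.asarray(mask, dtype=bool)

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        # Defined once here, so every subclass inherits it. Iterating an
        # ndarray yields NumPy scalars (np.int64, np.float32, np.bool_, ...);
        # .item() converts each one to the corresponding Python scalar.
        for value, is_missing in zip(self._data, self._mask):
            yield NA if is_missing else value.item()


class ToyIntegerArray(ToyMaskedArray):
    """Hypothetical integer subclass; inherits __iter__ unchanged."""


class ToyFloatingArray(ToyMaskedArray):
    """Hypothetical floating subclass; inherits __iter__ unchanged."""


class ToyBooleanArray(ToyMaskedArray):
    """Hypothetical boolean subclass; inherits __iter__ unchanged."""
```

For example, `list(ToyIntegerArray(np.array([1, 2, 3]), [False, True, False]))` gives `[1, None, 3]` with plain `int` elements, and the float and boolean subclasses get the same behaviour without any extra code.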

jreback · 2020-10-24T03:00:14Z

pandas/tests/arrays/integer/test_dtypes.py

+        lambda s: list(iter(s))[0],
+    ],
+)
+def test_conversion_methods_return_type_is_native(func):


parametrize over all Int and UInt types

also add tests for float / bool

may need to move this to the base class

Added the tests. Atm they're in:

- pandas/tests/arrays/boolean/test_astype.py
- pandas/tests/arrays/floating/test_astype.py
- pandas/tests/arrays/integer/test_dtypes.py

Do you want them moved?
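The per-dtype parametrization requested above can be sketched with plain NumPy arrays, which also show the gap the PR closes: iterating a NumPy array yields NumPy scalars, while `ndarray.tolist()` returns Python scalars. `CASES`, `CONVERSIONS`, `is_native` and `check_all` are hypothetical names; in the pandas suite this would be a `pytest.mark.parametrize` over the nullable dtypes (or a fixture such as `any_nullable_int_dtype`).

```python
import numpy as np

# Hypothetical parametrization table: the NumPy storage dtype behind each
# nullable pandas dtype (Int8..Int64, UInt8..UInt64, Float32/Float64,
# boolean) paired with the Python type conversions should return.
CASES = (
    [(f"int{bits}", int) for bits in (8, 16, 32, 64)]
    + [(f"uint{bits}", int) for bits in (8, 16, 32, 64)]
    + [("float32", float), ("float64", float), ("bool", bool)]
)

# Conversion paths, in the style of the lambdas in the test under review.
CONVERSIONS = {
    "iter": lambda arr: list(iter(arr))[0],  # yields NumPy scalars
    "tolist": lambda arr: arr.tolist()[0],   # yields Python scalars
}


def is_native(value, native_type):
    # Compare exact types: np.float64 subclasses float, so isinstance()
    # would accept it even though it is a NumPy scalar.
    return type(value) is native_type


def check_all():
    """Return {(dtype, conversion name): is_native} for every combination."""
    results = {}
    for dtype, native_type in CASES:
        arr = np.array([1, 0], dtype=dtype)
        for name, convert in CONVERSIONS.items():
            results[(dtype, name)] = is_native(convert(arr), native_type)
    return results
```

The exact-type comparison is a deliberate choice: `isinstance(np.float64(1.0), float)` is `True`, so an `isinstance`-based assertion cannot tell a NumPy float64 from a native float.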

jreback · 2020-10-24T03:03:40Z

likely also close

Possibly related to: #27616, #25969, #21256

pls indicate this by adding tests (or comment that we don't close these)

arw2019


Looking into these

arw2019 · 2020-10-24T15:26:31Z



arw2019 · 2020-10-24T15:32:53Z


arw2019 · 2020-10-24T16:13:01Z

Getting a doctest failure after this patch. Is it OK to change the docstring example here?

_____________ [doctest] pandas.core.arrays.floating.FloatingArray ______________
240 
241     Returns
242     -------
243     FloatingArray
244 
245     Examples
246     --------
247     Create an FloatingArray with :func:`pandas.array`:
248 
249     >>> pd.array([0.1, None, 0.3], dtype=pd.Float32Dtype())
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
     <FloatingArray>
    -[0.1, <NA>, 0.3]
    +[0.10000000149011612, <NA>, 0.30000001192092896]
     Length: 3, dtype: Float32

arw2019 · 2020-10-24T19:06:26Z

> likely also close
>
> Possibly related to: #27616, #25969, #21256
>
> pls indicate this by adding tests (or comment that we don't close these)

- #21256: fixed on current master (I'll add tests)
- #27616, #25969: open and unaffected by this fix (they're bugs in the NumPy-backed data structures). I'll investigate in a separate PR.

jorisvandenbossche

Thanks a lot for working on this!

As mentioned in #29738, if we see this as a general rule for EAs, we should also document this expected behaviour in the base class. And maybe also try to add a base extension test (although the expected result is probably not that easy to know, we could at least check it's not a numpy scalar)
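For a base extension test, checking only that results are not NumPy scalars avoids having to know the exact expected type. A minimal sketch, relying on the fact that every NumPy scalar type derives from `np.generic`; `find_numpy_scalars` is a hypothetical helper, not part of pandas' test utilities.

```python
import numpy as np


def find_numpy_scalars(obj, path="root"):
    """Recursively collect paths to NumPy scalars inside lists, tuples and dicts.

    Hypothetical helper: a generic test can assert the result is empty
    for the output of tolist(), list(iter(...)) or to_dict().
    """
    found = []
    if isinstance(obj, np.generic):  # base class of all NumPy scalar types
        found.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            found.extend(find_numpy_scalars(key, f"{path}[key {key!r}]"))
            found.extend(find_numpy_scalars(value, f"{path}[{key!r}]"))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            found.extend(find_numpy_scalars(value, f"{path}[{i}]"))
    return found
```

Returning paths rather than a bare boolean makes a failing assertion point at the offending element, which helps when the structure is a nested dict from `to_dict`.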

jorisvandenbossche · 2020-10-28T21:55:24Z

pandas/tests/arrays/boolean/test_astype.py

+        lambda s: list(iter(s))[0],
+    ],
+)
+def test_conversion_methods_return_type_is_native(func):


We could maybe move this test to the common arrays/masked/ section? We should be able to know the expected python class for each of the data types

For sure. I'll make a separate file there (call it test_conversions.py maybe)

jreback · 2020-10-30T16:28:05Z

pandas/tests/arrays/floating/test_astype.py

@@ -118,3 +118,19 @@ def test_astype_object(dtype):
    # check exact element types
    assert isinstance(result[0], float)
    assert result[1] is pd.NA
+
+
+@pytest.mark.parametrize(


yeah can you move this to the base class to avoid this duplication

you can use the data fixture

Moved + rewritten with fixtures (though not 100% sure that what I did is what you asked)

jreback · 2020-10-30T16:28:52Z

pandas/tests/arrays/integer/test_dtypes.py

+    assert isinstance(func(s), int)
+
+
+def test_conversion_to_dict_oriented_record_returns_native(any_nullable_int_dtype):


see my comments about moving this; then these tests are much simpler
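Why native types matter for `to_dict(orient="records")` (#34665): a common consumer of records is `json.dumps`, which rejects NumPy scalars. A sketch with plain NumPy columns; `records_from_columns` is a hypothetical stand-in for the DataFrame method, not its implementation.

```python
import json

import numpy as np


def records_from_columns(columns, native=True):
    """Build row dicts (like to_dict(orient="records")) from {name: 1-D array}.

    Hypothetical helper. With native=True each value goes through .item(),
    so rows hold Python scalars; with native=False they hold NumPy scalars.
    """
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    rows = []
    for i in range(n_rows):
        row = {}
        for name in names:
            value = columns[name][i]  # indexing an ndarray yields a NumPy scalar
            row[name] = value.item() if native else value
        rows.append(row)
    return rows


cols = {"a": np.array([1, 2], dtype="int64"), "b": np.array([True, False])}
native_rows = records_from_columns(cols)
serialized = json.dumps(native_rows)  # works: only Python scalars inside
```

Calling `json.dumps(records_from_columns(cols, native=False))` instead raises `TypeError`, because `np.int64` and `np.bool_` are not JSON serializable.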

jorisvandenbossche · 2020-11-03T15:23:20Z

pandas/tests/extension/base/casting.py

@@ -32,11 +32,13 @@ def test_tolist(self, data):
        expected = list(data)
        assert result == expected

+    @pytest.mark.skip(reason="Floating precision issues")


What's the reason for this skip? Because the conversion to string is different for numpy scalars vs native scalars?

(I think the skip also shouldn't be here; it should apply only to the dtypes that fail these tests)

Yes I think so. It's specifically that in the construction of expected

expected = pd.Series([str(x) for x in data[:5]], dtype=str)

we get a lot more decimal places. The astype operation works as before

And what if you remove the str(..) call around x? Because by specifying string dtype, we will already use our internal machinery to handle the conversion, which should do it consistently for numpy vs native numbers, I think

Tried this, problem persists (for both tests)

Is this worth opening an issue about?

Looking into this. It's surprisingly hard to get a copy-pastable reproducer but the test failure is reliable - will keep looking

took a while but I finally got a copy-pastable example:

In [8]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: data = pd.array([0, 0.1, 0.2, 0.3, 0.4, 0.5], dtype="Float32")
   ...: s = pd.Series(data)
   ...: res = s.astype(str)
   ...: expected = pd.Series([x for x in data], dtype=str)

In [9]: expected
Out[9]:
0                    0.0
1    0.10000000149011612
2    0.20000000298023224
3    0.30000001192092896
4     0.4000000059604645
5                    0.5
dtype: object

Only the 32-bit dtype (Float32) fails for me locally (on a 64-bit system); I wonder if that's related.

It seems to be a problem with .item() on an np.float32 scalar:

In [15]: scalar = np.float32(0.1)

In [16]: scalar
Out[16]: 0.1

In [17]: scalar.item()
Out[17]: 0.10000000149011612

This was resolved by the numpy folks in numpy/numpy#17880 (not a bug)
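The `.item()` behaviour above can be reproduced with NumPy alone, and it is consistent with the report being closed as not a bug. NumPy prints a float32 using the shortest digits that identify it among float32 values, while `.item()` widens it to a Python float (a C double), whose repr must identify it among doubles. A sketch:

```python
import numpy as np

scalar = np.float32(0.1)

# NumPy formats float32 with the shortest repr that round-trips as float32.
float32_text = str(scalar)

# .item() widens to a Python float; the shortest repr that round-trips
# as a double exposes the float32 rounding error.
native = scalar.item()
native_text = repr(native)

# The numeric value is unchanged by the conversion; only the text differs.
same_value = float(scalar) == native
```

This plausibly also explains the Float32 doctest diff earlier in the thread: once elements are converted to Python floats before formatting, the extra digits appear, which is why the `astype(str)` tests were skipped for `Float32Dtype`.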

jorisvandenbossche · 2020-11-03T15:27:48Z

pandas/tests/extension/base/return_types.py

@@ -0,0 +1,60 @@
+import pytest


My previous comment was actually about moving them to a file in tests/arrays/masked/ (so to share the test with the different nullable dtypes).

Now, I am also fine to move it to the base extension tests as you did here, but then we should add this to all EAs. If it's only activated for nullable floating/integer/boolean as is done now, then it should move to tests/arrays/masked/, as it's not a general test for all EAs.

Making it truly generic is probably a bit annoying (you need to know the exact expected type for the scalars), but so maybe the subclasses can override get_native_dtype where needed? Or maybe it can be an attribute on the test class (if it's always a single type for each dtype, then that's simpler)

> My previous comment was actually about moving them to a file in tests/arrays/masked/ (so to share the test with the different nullable dtypes).
>
> Now, I am also fine to move it to the base extension tests as you did here, but then we should add this to all EAs. If it's only activated for nullable floating/integer/boolean as is done now, then it should move to tests/arrays/masked/, as it's not a general test for all EAs.

I did misunderstand, but this makes sense.

> Making it truly generic is probably a bit annoying (you need to know the exact expected type for the scalars), but so maybe the subclasses can override get_native_dtype where needed? Or maybe it can be an attribute on the test class (if it's always a single type for each dtype, then that's simpler)

This sounds like the way to go. I like not having to keep a registry of the mapping between scalars and native types in a single place.
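A sketch of the per-class approach discussed above, assuming the shared test lives on a base class and each dtype's subclass sets a `native_type` attribute, with an overridable `get_native_dtype` hook (its real signature in the PR is not shown; here it is assumed to take no arguments). Class names and `make_data` are hypothetical; in the pandas suite the data would come from the `data` fixture, and plain NumPy arrays (whose `tolist()` already returns Python scalars) stand in for extension arrays so the sketch runs on its own.

```python
import numpy as np


class BaseNativeTypeTests:
    """Hypothetical base test class; subclasses declare the expected type."""

    # Expected Python type of converted elements; subclasses override it.
    native_type = None

    def make_data(self):
        # Stand-in for the `data` fixture.
        raise NotImplementedError

    def get_native_dtype(self):
        # Hook for dtypes where a single class attribute is not enough.
        return self.native_type

    def test_tolist_returns_native(self):
        expected = self.get_native_dtype()
        for value in self.make_data().tolist():
            assert type(value) is expected, (value, type(value))


class TestInt32(BaseNativeTypeTests):
    native_type = int

    def make_data(self):
        return np.array([1, 2, 3], dtype="int32")


class TestFloat32(BaseNativeTypeTests):
    native_type = float

    def make_data(self):
        return np.array([0.5, 1.5], dtype="float32")


class TestBool(BaseNativeTypeTests):
    native_type = bool

    def make_data(self):
        return np.array([True, False])
```

Keeping the expected type next to each dtype's test class avoids a central scalar-to-native registry, which is the property preferred in the reply above.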


github-actions · 2020-12-31T00:26:10Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.


github-actions · 2021-01-31T00:17:18Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.


jreback · 2021-02-11T01:31:17Z

closing as stale. if you want to continue, pls ping and can re-open.

arw2019 added 2 commits October 24, 2020 01:10

TST: add tests from OP

07782bc

ENH: implement __iter__ from IntegerArray

1b164c3

jreback requested changes Oct 24, 2020


jreback added the ExtensionArray and Dtype Conversions labels Oct 24, 2020

jreback mentioned this pull request Oct 24, 2020

Ensure conversion to "native" types for integer EA #31328 (closed)

arw2019 added 4 commits October 24, 2020 15:38

feedback: move __iter__ method to base class

142c81f

TST: parametrize Int tests on dtype

3357901

TST: add floating tests

e38934e

Merge remote-tracking branch 'upstream/master' into GH29738

fa2166d

arw2019 commented Oct 24, 2020


TST: add boolean tests

c32aafa

TST: #346654

2eb7219

jorisvandenbossche reviewed Oct 28, 2020


jreback requested changes Oct 30, 2020


arw2019 added 11 commits October 30, 2020 18:36

feedback: gather tests in separate file + use fixtures

db4154d

TST: remove tests from original locations

30d2f09

Merge remote-tracking branch 'upstream/master' into GH29738

26314f7

Merge remote-tracking branch 'upstream/master' into GH29738

8d81ec9

TST: rewrite expected construction using pd.array

d7fced7

TST: add comment

7bdc0ac

TST: skip float-string conversion, reason:M f-p precision

9bf6b25

TST/BUG: correct test rewrite

ec837c0

TST: skip string conversion test due to fp-precision issues

c7db14a

TST: DRY the code using data fixture

e70a7df

CLN: remove unused code

ff1ede7

jorisvandenbossche reviewed Nov 3, 2020


TST/BUG: implement jorisvandenbossche suggestion to fix astype(str) tests

1df019c

arw2019 mentioned this pull request Nov 3, 2020

ENH: should a sequence of integers be a valid input to BooleanArray? #37614 (open)

arw2019 added 4 commits November 3, 2020 21:17

TST: skip boolean combine_add test

2a5df3e

Merge remote-tracking branch 'upstream/master' into GH29738

79582d4

Merge branch 'master' of https://github.com/pandas-dev/pandas into GH29738

1206096

skip str astype tests for Float32Dtype

bcf0896

arw2019 mentioned this pull request Nov 30, 2020

rounding problem in string representation of np.float32 numpy/numpy#17880 (closed)

github-actions bot added the Stale label Dec 31, 2020

Merge branch 'master' of https://github.com/pandas-dev/pandas into GH29738

cde954b

arw2019 removed the Stale label Dec 31, 2020

github-actions bot added the Stale label Jan 31, 2021

arw2019 added 2 commits January 31, 2021 11:16

Merge branch 'master' of https://github.com/pandas-dev/pandas into GH29738

d9f4809

docstring fix

225a260

jreback closed this Feb 11, 2021
