ENH: add NDArrayBackedExtensionArray to public API #45544

tswast · 2022-01-21T23:17:00Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

tswast · 2022-01-21T23:20:01Z

In the db-dtypes package (which currently provides time and date data types), we found NDArrayBackedExtensionArray quite useful. Unfortunately, it is currently a private API. Per googleapis/python-db-dtypes-pandas#28 (comment), I'm sending this PR to make it public.

jreback · 2022-01-23T00:24:55Z

no objections

cc @jbrockmendel @jorisvandenbossche

jreback · 2022-01-23T00:25:26Z

pandas/api/extensions/__init__.py

@@ -19,6 +19,7 @@
    ExtensionArray,
    ExtensionScalarOpsMixin,
 )
+from pandas.core.arrays._mixins import NDArrayBackedExtensionArray


I am shocked we don't have a test that asserts this API (e.g. in test_api.py). Can you add one?

We should probably import this into pandas.core.arrays.__init__ and then, here, import it from pandas.core.arrays.

jbrockmendel · 2022-01-23T00:29:27Z

I've been meaning to get around to this for a while. Thanks @tswast for beating me to it!

One other thing we should ideally do is add a paragraph about this to doc/source/development/extending.rst, something to the effect of "if your EA is a thin wrapper around an ndarray, you can save some effort by using ..."
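As a rough, pandas-independent illustration of the "thin wrapper around an ndarray" idea: the class `ThinNDArrayWrapper` below is hypothetical. It only mimics the convention discussed in this thread (state kept in `_ndarray`, array-returning operations routed through `_from_backing_data`); it is not the pandas implementation of `NDArrayBackedExtensionArray`.

```python
import numpy as np


class ThinNDArrayWrapper:
    """Hypothetical array type whose only state is one backing ndarray."""

    def __init__(self, values):
        self._ndarray = np.asarray(values)

    def _from_backing_data(self, arr: np.ndarray) -> "ThinNDArrayWrapper":
        # Build a new instance of the same type around a new backing array.
        return type(self)(arr)

    @property
    def dtype(self) -> np.dtype:
        return self._ndarray.dtype

    @property
    def shape(self) -> tuple:
        return self._ndarray.shape

    def __len__(self) -> int:
        return len(self._ndarray)

    def __getitem__(self, key):
        result = self._ndarray[key]
        if np.ndim(result) == 0:
            # Scalar access returns the element itself.
            return result
        # Slices and fancy indexing re-wrap the NumPy result.
        return self._from_backing_data(result)

    def copy(self) -> "ThinNDArrayWrapper":
        return self._from_backing_data(self._ndarray.copy())

    def take(self, indices) -> "ThinNDArrayWrapper":
        return self._from_backing_data(self._ndarray.take(indices))


# Usage: every array-returning method delegates to NumPy and re-wraps.
arr = ThinNDArrayWrapper(np.array([10, 20, 30]))
subset = arr.take([2, 0])
```

The point of the pattern is that each method is one line of NumPy plus a call to `_from_backing_data`, which is the effort a shared base class can save.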


tswast · 2022-01-24T17:09:48Z

Added some tests and docs. Should be ready for review now.


pandas/core/arrays/_mixins.py

doc/source/development/extending.rst


doc/source/development/extending.rst

tswast · 2022-08-25T20:11:24Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Just pulled in the latest changes. Would love to get this documented, as it has been helpful in building the dbtime and dbdate dtypes in https://github.com/googleapis/python-db-dtypes-pandas

jbrockmendel · 2022-08-25T23:37:35Z

doc/source/development/extending.rst

+  Convert a value or values for use in setting a value or values in the backing
+  NumPy array.
+
+``_validate_searchsorted_value``


In 2.0 I think this is going away, and we'll reuse _validate_setitem_value for this.

Clarified that most implementations will be identical to _validate_setitem_value.

_validate_searchsorted_value is gone now

Can you remove _validate_searchsorted_value here?
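The reviewer's point is that one validator can serve both `__setitem__` and `searchsorted`. A hedged sketch of that idea, using a hypothetical int64-backed class `IntBacked` (not a pandas class): the method names `_validate_scalar` and `_validate_setitem_value` mirror the doc snippets in this thread, but the bodies are illustrative only.

```python
import numpy as np


class IntBacked:
    """Hypothetical int64-backed array illustrating shared value validation."""

    def __init__(self, values):
        self._ndarray = np.asarray(values, dtype=np.int64)

    def _validate_scalar(self, value) -> np.int64:
        # Reject anything that cannot be stored losslessly in the int64 backing array.
        if isinstance(value, bool) or not isinstance(value, (int, np.integer)):
            raise TypeError(f"Invalid value {value!r} for {type(self).__name__}")
        return np.int64(value)

    def _validate_setitem_value(self, value):
        # Convert a scalar or list-like into something NumPy can store directly.
        if np.ndim(value) > 0:
            return np.array(
                [self._validate_scalar(v) for v in value], dtype=self._ndarray.dtype
            )
        return self._validate_scalar(value)

    def __setitem__(self, key, value):
        self._ndarray[key] = self._validate_setitem_value(value)

    def searchsorted(self, value, side="left"):
        # Reuse the setitem validator instead of a separate searchsorted validator.
        return self._ndarray.searchsorted(self._validate_setitem_value(value), side=side)
```

With a single validator, anything accepted by `__setitem__` is also a valid `searchsorted` needle, which is why a separate `_validate_searchsorted_value` becomes redundant.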

jbrockmendel · 2022-08-25T23:38:55Z

doc/source/development/extending.rst

+
+
+To support 2D arrays, use the ``_from_backing_data`` helper function when a
+method is called on multi-dimensional data.


Can you specify that the data should have the same dtype as self._ndarray?
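For 2D support, the idea is that reshaping-style methods (`reshape`, `T`, `ravel`) compute a new backing array and hand it to `_from_backing_data`, and that the new array keeps the dtype of `self._ndarray`. The sketch below uses a hypothetical class `Backed2D`; the explicit dtype check inside `_from_backing_data` is only an illustrative way to state the same-dtype requirement, not something pandas is known to do.

```python
import numpy as np


class Backed2D:
    """Hypothetical ndarray-backed array that supports 1D and 2D data."""

    def __init__(self, values):
        self._ndarray = np.asarray(values)

    def _from_backing_data(self, arr: np.ndarray) -> "Backed2D":
        # Guard the requirement that new backing data keeps self._ndarray's dtype.
        if arr.dtype != self._ndarray.dtype:
            raise TypeError(f"expected {self._ndarray.dtype}, got {arr.dtype}")
        return type(self)(arr)

    @property
    def shape(self) -> tuple:
        return self._ndarray.shape

    @property
    def ndim(self) -> int:
        return self._ndarray.ndim

    def reshape(self, *args, **kwargs) -> "Backed2D":
        return self._from_backing_data(self._ndarray.reshape(*args, **kwargs))

    @property
    def T(self) -> "Backed2D":
        return self._from_backing_data(self._ndarray.T)

    def ravel(self, order="C") -> "Backed2D":
        return self._from_backing_data(self._ndarray.ravel(order))
```

Because every shape-changing method funnels through one constructor helper, the dtype invariant only has to be enforced in one place.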

jbrockmendel · 2022-08-25T23:39:49Z

A couple of comments, otherwise LGTM. Thanks for your patience.

jbrockmendel · 2022-11-22T00:54:11Z

@tswast can you merge main? Some of the deprecations discussed above have been enforced.


tswast · 2022-11-23T17:00:22Z

@jbrockmendel I've synced to main and addressed a remaining docs build warning.

jbrockmendel · 2022-11-30T02:47:45Z

doc/source/development/extending.rst

+        _internal_fill_value = numpy.datetime64("NaT")
+
+        def __init__(self, values):
+            backing_array_dtype = "<M8[ns]"


Can you make this a np.dtype object instead of a string?
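A small NumPy-only sketch of the suggestion: build the `<M8[ns]` dtype from the snippet under review once, as a `np.dtype` object, and reuse it. The names `BACKING_DTYPE` and `to_backing_array` are hypothetical.

```python
import numpy as np

# Construct the backing dtype once as a dtype object instead of passing a string around.
BACKING_DTYPE = np.dtype("<M8[ns]")


def to_backing_array(values) -> np.ndarray:
    # Hypothetical helper: coerce input to the datetime64[ns] backing dtype.
    return np.asarray(values, dtype=BACKING_DTYPE)


arr = to_backing_array(["2022-11-30", "NaT"])
```

A dtype object can be inspected directly (for example `BACKING_DTYPE.kind` or `np.datetime_data(BACKING_DTYPE)`) and compared against `self._ndarray.dtype` without reparsing a string each time.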

jbrockmendel · 2022-11-30T02:49:06Z

doc/source/development/extending.rst

+
+        def _validate_setitem_value(self, value):
+            if pandas.api.types.is_list_like(value):
+                return [self._validate_scalar(v) for v in value]


This should be an ndarray of the same dtype as self._ndarray.
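Following that suggestion, the list-like branch of `_validate_setitem_value` would build an ndarray with the backing dtype rather than a Python list. A sketch under assumptions: the class name `DateTimeBacked` and the body of `_validate_scalar` are hypothetical; only `_internal_fill_value`, `_validate_setitem_value`, and the `<M8[ns]` backing dtype come from the snippets in this thread. To keep the example NumPy-only, `pandas.api.types.is_list_like` is swapped for `np.ndim(value) > 0`.

```python
import datetime

import numpy as np


class DateTimeBacked:
    """Hypothetical datetime64[ns]-backed array illustrating setitem validation."""

    _internal_fill_value = np.datetime64("NaT", "ns")

    def __init__(self, values):
        self._ndarray = np.asarray(values, dtype=np.dtype("<M8[ns]"))

    def _validate_scalar(self, value) -> np.datetime64:
        # Map missing values to the fill value, then coerce to nanosecond resolution.
        if value is None:
            return self._internal_fill_value
        if isinstance(value, (str, datetime.datetime, np.datetime64)):
            return np.datetime64(value).astype("M8[ns]")
        raise TypeError(f"Invalid value {value!r} for {type(self).__name__}")

    def _validate_setitem_value(self, value):
        if np.ndim(value) > 0:
            # Return an ndarray with the same dtype as self._ndarray, not a list.
            return np.array(
                [self._validate_scalar(v) for v in value], dtype=self._ndarray.dtype
            )
        return self._validate_scalar(value)

    def __setitem__(self, key, value):
        self._ndarray[key] = self._validate_setitem_value(value)
```

Returning an array of the backing dtype means the result can be assigned into `self._ndarray` (or passed to NumPy routines) without any further conversion.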

jbrockmendel · 2022-11-30T02:50:19Z

doc/source/whatsnew/v2.0.0.rst

@@ -65,6 +65,7 @@ Other enhancements
 - :func:`timedelta_range` now supports a ``unit`` keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (:issue:`49824`)
 - :meth:`DataFrame.to_json` now supports a ``mode`` keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (:issue:`35849`)
 - Added ``name`` parameter to :meth:`IntervalIndex.from_breaks`, :meth:`IntervalIndex.from_arrays` and :meth:`IntervalIndex.from_tuples` (:issue:`48911`)
+- :class:`NDArrayBackedExtensionArray` now exposed in the public API. (:issue:`45544`)


no trailing period

jbrockmendel · 2022-11-30T02:52:56Z

pandas/tests/api/test_api.py

+    def test_api(self):
+        checkthese = self.classes + self.funcs + self.misc
+
+        self.check(namespace=extensions, expected=checkthese)


I'm not that familiar with this test file. What is being tested here?
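In broad strokes, tests in `pandas/tests/api/test_api.py` appear to compare the public names found in a namespace against an expected list, so adding a name to `pandas.api.extensions` without updating the test (or vice versa) makes it fail. The stdlib-only sketch below reproduces that idea with hypothetical helpers `public_names` and `check_namespace` and a synthetic module; it is modeled loosely on the `check` method in that file, not copied from it.

```python
import types


def public_names(namespace, ignored=()) -> list:
    # Names reachable as ``namespace.<name>``, minus dunders and explicitly ignored ones.
    ignored = set(ignored)
    return sorted(
        name
        for name in dir(namespace)
        if not name.startswith("__") and name not in ignored
    )


def check_namespace(namespace, expected, ignored=()) -> None:
    # Fail loudly if the namespace gained or lost a public name.
    result = set(public_names(namespace, ignored))
    missing = sorted(set(expected) - result)
    unexpected = sorted(result - set(expected))
    if missing or unexpected:
        raise AssertionError(f"missing={missing}, unexpected={unexpected}")


# Synthetic stand-in for a module such as ``pandas.api.extensions``.
fake_extensions = types.ModuleType("fake_extensions")
fake_extensions.ExtensionArray = type("ExtensionArray", (), {})
fake_extensions.NDArrayBackedExtensionArray = type("NDArrayBackedExtensionArray", (), {})
```

Checking the exact set of names (rather than only that a name exists) is what turns an accidental addition or removal from the public API into a test failure.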

simonjayhawkins · 2023-02-22T13:39:11Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

MichaelTiemannOSC · 2024-01-20T22:17:50Z

@tswast now that 2.2 is out the door, and now that Pandas 2.x in general has made huge strides in its ExtensionArray implementation(s), could we get this PR back to active status by merging in main and resubmitting? If you don't have time for this, perhaps I can take a crack at it (with help from other Pint-Pandas people). In that case, what is the git-friendly way for me to pick up where you left off? We have an active use case that's ready to put this to the test.

CC: @andrewgsavage

andrewgsavage · 2024-01-21T22:15:56Z

@MichaelTiemannOSC I had a crack at it in my PR #56755. It's still active, just waiting on some help from pandas devs.

tswast · 2024-01-22T19:28:02Z

Thanks @MichaelTiemannOSC and @andrewgsavage. I don't have time to revive this, so I appreciate your efforts.

tswast added 2 commits January 21, 2022 17:16

ENH: add NDArrayBackedExtensionArray to public API

1f93779

add whatsnew

522b548

jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jan 23, 2022

Merge branch 'main' into python-db-dtypes-pandas-issue28

ee4e23d

jreback reviewed Jan 23, 2022


tswast added 4 commits January 24, 2022 10:09

add NDArrayBackedExtensionArray to pandas.core.arrays.__init__

945f840

add tests for extensions api

721ae11

add docs

ae68f9d

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pa…

05d0e08

…ndas-issue28

tswast marked this pull request as ready for review January 24, 2022 17:08

Merge remote-tracking branch 'origin/python-db-dtypes-pandas-issue28'…

1ad0338

… into python-db-dtypes-pandas-issue28

tswast requested review from jbrockmendel and jreback January 24, 2022 17:11

tswast added 2 commits January 24, 2022 14:29

add autosummary for methods and attributes

38113c8

remove unreferenced methods from docs

18ec784

jbrockmendel mentioned this pull request Jan 24, 2022

QDataFrame #45602

Closed

4 tasks

tswast added 2 commits January 25, 2022 09:33

fix docstrings

2919f60

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pa…

0c52366

…ndas-issue28

tswast commented Jan 25, 2022


pandas/core/arrays/_mixins.py

jbrockmendel reviewed Jan 25, 2022


doc/source/development/extending.rst

tswast added 3 commits January 26, 2022 10:11

use doc decorator

319ac2b

add code samples and reference to test suite

8513863

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pa…

5309895

…ndas-issue28

tswast requested a review from jbrockmendel January 26, 2022 23:02

jbrockmendel reviewed Jan 27, 2022


doc/source/development/extending.rst

jreback added this to the 1.5 milestone Jan 28, 2022

github-actions bot added the Stale label Aug 3, 2022

mroeschke removed this from the 1.5 milestone Aug 22, 2022

Merge branch 'main' into python-db-dtypes-pandas-issue28

f4df0e9

jbrockmendel reviewed Aug 25, 2022


clarify _validate_searchsorted_value and 2d backing array

8876b9a

tswast requested a review from jbrockmendel August 26, 2022 22:32

Merge branch 'main' into python-db-dtypes-pandas-issue28

1bdd1cd

tswast added 4 commits November 22, 2022 15:16

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pa…

4b0a948

…ndas-issue28

Merge branch 'python-db-dtypes-pandas-issue28' of github.com:tswast/p…

5920778

…andas into python-db-dtypes-pandas-issue28

DOC: make insert docstring have single line summary

38018e6

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pa…

9277cf5

…ndas-issue28

Merge branch 'main' into python-db-dtypes-pandas-issue28

0b86bd5

jbrockmendel reviewed Nov 30, 2022


simonjayhawkins closed this Feb 22, 2023

tswast mentioned this pull request Mar 3, 2023

chore: address issues identified in code review on pandas docs PR googleapis/python-db-dtypes-pandas#175

Open

andrewgsavage mentioned this pull request Nov 14, 2023

Add support for UFloat in PintArray (#139) hgrecco/pint-pandas#140

Closed

5 tasks

This was referenced Jan 1, 2024

add pandas uncertainty array lmfit/uncertainties#184

Closed

ENH: add NDArrayBackedExtensionArray to public API #56755

Closed

