-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add NDArrayBackedExtensionArray to public API #45544
Conversation
tswast
commented
Jan 21, 2022
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
In the |
no objections |
pandas/api/extensions/__init__.py
Outdated
@@ -19,6 +19,7 @@ | |||
ExtensionArray, | |||
ExtensionScalarOpsMixin, | |||
) | |||
from pandas.core.arrays._mixins import NDArrayBackedExtensionArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am shocked we don't have a tests that asserts this api e.g. test_api.py
can you add one ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably import this into pandas.core.arrays.__init__
and here do the import from pandas.core.arrays
I've been meaning to get around to this for a while. Thanks @tswast for beating me to it! One other thing we should ideally do is add a paragraph in the doc/source/development/extending.rst about this. something to the effect of "if your EA is a thin wrapper around an ndarray, you can save some effort by using ..." |
… into python-db-dtypes-pandas-issue28
Added some tests and docs. Should be ready for review now. |
Just pulled in the latest changes. Would love to get this documented, as it has been helpful in building the |
Convert a value or values for use in setting a value or values in the backing | ||
NumPy array. | ||
|
||
``_validate_searchsorted_value`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 2.0 i think this is going away and we'll re-use _validate_setitem_value for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified that most implementations will be identical to _validate_setitem_value
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_validate_searchsorted_value is gone now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove _validate_searchsorted_value here
doc/source/development/extending.rst
Outdated
|
||
|
||
To support 2D arrays, use the ``_from_backing_data`` helper function when a | ||
method is called on multi-dimensional data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify the data should be of the same dtype as self._ndarray?
couple of comments, otherwise lgtm. thanks for your patience |
@tswast can you merge main. some of the deprecations discussed above have been enforced |
…andas into python-db-dtypes-pandas-issue28
@jbrockmendel I've synced to main and addressed a remaining docs build warning. |
_internal_fill_value = numpy.datetime64("NaT") | ||
|
||
def __init__(self, values): | ||
backing_array_dtype = "<M8[ns]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a np.dtype object instead of a string
|
||
def _validate_setitem_value(self, value): | ||
if pandas.api.types.is_list_like(value): | ||
return [self._validate_scalar(v) for v in value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be an ndarray of the same dtype as self._ndarray
@@ -65,6 +65,7 @@ Other enhancements | |||
- :func:`timedelta_range` now supports a ``unit`` keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (:issue:`49824`) | |||
- :meth:`DataFrame.to_json` now supports a ``mode`` keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (:issue:`35849`) | |||
- Added ``name`` parameter to :meth:`IntervalIndex.from_breaks`, :meth:`IntervalIndex.from_arrays` and :meth:`IntervalIndex.from_tuples` (:issue:`48911`) | |||
- :class:`NDArrayBackedExtensionArray` now exposed in the public API. (:issue:`45544`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no trailing period
def test_api(self): | ||
checkthese = self.classes + self.funcs + self.misc | ||
|
||
self.check(namespace=extensions, expected=checkthese) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not that familiar with this test file. what is being tested here?
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
@tswast now that 2.2 is out the door, and now that Pandas 2.x in general has made huge strides in its ExtensionArray implementation(s), could we get this PR back to active status by merging in main and resubmitting? If you don't have time for this, perhaps I can take a crack at it (with help from other Pint-Pandas people). In that case, what is the git-friendly way for me to pick up where you left off? We have an active use case that's ready to put this to the test. CC: @andrewgsavage |
@MichaelTiemannOSC I had a crack at it in my PR #56755. It's still active, just waiting on some help from pandas devs. |
Thanks @MichaelTiemannOSC and @andrewgsavage , I don't have time to revive this, so I appreciate your efforts. |