ENH: Extending Pandas with custom types #19174
Conversation
Define an interface for custom types.
Hello @TomAugspurger! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 12, 2018 at 20:10 UTC
@@ -409,6 +410,11 @@ def dtype(self):
        """The :class:`~pandas.api.types.CategoricalDtype` for this instance"""
        return self._dtype

    @property
    def _block_type(self):
I'm still thinking about how best to handle this. The conflict is:

- We don't want 3rd party libs to worry about Blocks at all
- We (maybe) want Categorical, etc. to be instances of ExtensionArray
- We do want to use CategoricalBlock instead of the default ExtensionBlock
Why do we want to use CategoricalBlock instead of ExtensionBlock with categorical dtype? Because that would require more changes in the code?
As in theory/ideally, the categorical custom operations should all be defined in the array (Categorical) and the Block should not have to worry about the type?
Yeah, I'm looking through the methods CategoricalBlock implements that aren't part of the interface:

- fillna: should probably be added
- interpolate: actual interpolation probably isn't important enough, but I think this is called indirectly for fillna?
- shift
- to_native_types

Trying to balance complexity of implementation for 3rd parties here.
In general I think this is great. It might be worth trying to simplify the interface here a little bit, e.g., by insisting that it's only for 1D data.
My current understanding from @wesm (which I agree with) is that the plan is for pandas2 to be a new namespace and exist alongside pandas. So this shouldn't be a problem from a forward-compatibility perspective.
    base = np.dtype([('hi', '>u8'), ('lo', '>u8')])
    type = IPTypeType
    kind = 'O'
    fill_value = np.array([(0, 0)], dtype=base)
Clarify that these fields can be properties, e.g., based on parameters set in the constructor.
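For instance, a sketch of what that could look like (the `IPDtype` name and the byte-order parameter are hypothetical; only the field names come from the snippet above):

```python
import numpy as np


class IPDtype(object):
    """Hypothetical dtype whose fields are derived from constructor parameters."""

    def __init__(self, byteorder=">"):
        self._byteorder = byteorder

    @property
    def base(self):
        # the structured dtype backing the array depends on the byte order passed in
        return np.dtype([("hi", self._byteorder + "u8"),
                         ("lo", self._byteorder + "u8")])

    @property
    def kind(self):
        return "O"

    @property
    def fill_value(self):
        return np.array([(0, 0)], dtype=self.base)
```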
pandas/core/extensions.py
    """
    @property
    @abc.abstractmethod
    def type(self) -> type:
These should probably be comment type annotations instead :)
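That is, roughly the same abstract property with a PEP 484 type comment instead of the `->` annotation, for Python 2 compatibility (a sketch, re-using the ExtensionDtype name from this PR):

```python
import abc


class ExtensionDtype(metaclass=abc.ABCMeta):
    @property
    @abc.abstractmethod
    def type(self):
        # type: () -> type
        """Scalar type for the dtype, annotated with a type comment rather than `->`."""
        raise NotImplementedError
```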
pandas/core/extensions.py
        pass

    @abc.abstractmethod
    def to_dense(self):
I don't understand what this does/indicates.
pandas/core/extensions.py
    # ------------------------------------------------------------------------
    @abc.abstractmethod
    def take(self, indexer, allow_fill=True, fill_value=None):
        """For slicing"""
It would be good to clarify whether this is the NumPy version of take, or the pandas-internal version that uses -1 to indicate fill values.
If it's the latter (as I suspect is the case), it could be valuable to expose some of our internal utility functions for this rather than force users to re-implement it. It is non-trivial to get right in a performant way.
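For reference, the pandas-internal convention treats -1 as "missing" rather than "last element". A rough standalone sketch of that behavior (the function name is hypothetical, and the real helpers in pandas.core.algorithms are considerably more careful about dtypes and performance):

```python
import numpy as np


def take_with_fill(values, indexer, fill_value=np.nan):
    """Pandas-style take: -1 in `indexer` marks a missing value, not the last element."""
    indexer = np.asarray(indexer, dtype="int64")
    mask = indexer == -1
    out = values.take(indexer)          # -1 temporarily takes the last element
    if mask.any():
        out = out.astype(object)        # upcast so the fill value fits
        out[mask] = fill_value          # then overwrite the "missing" positions
    return out


# take_with_fill(np.array([10, 20, 30]), [0, -1, 2]) -> array([10, nan, 30], dtype=object)
```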
pandas/core/extensions.py
    @property
    @abc.abstractmethod
    def shape(self) -> T.Tuple[int, ...]:
Perhaps all ExtensionArray objects should be required to be 1D?
That would be quite a bit simpler. Full support for ND data would probably require quite a few other methods (e.g., multidimensional indexing, transpose, multi-dimensional concatenation).
That's reasonable for now I think.
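With a 1D-only requirement, shape and ndim could even stop being abstract and be derived from __len__, something like this sketch (re-using the ExtensionArray name from the diff; not part of the proposed interface):

```python
import abc


class ExtensionArray(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def __len__(self):
        """Length of the (always one-dimensional) array."""

    @property
    def shape(self):
        # derived from __len__, since the array is required to be 1D
        return (len(self),)

    @property
    def ndim(self):
        return 1
```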
        return issubclass(dtype, cls)


class ExtensionArray(metaclass=abc.ABCMeta):
Suppose you want to implement arithmetic or aggregations for your custom dtype (e.g., consider a units dtype). In principle almost any function that does computation might need to be overloaded in a dtype specific way. (This is what NumPy does for ufuncs.)
Would this be done via defining special methods on ExtensionArray? Or do we simply say that this isn't possible?
At the moment I've ignored arithmetic methods and reductions. I'm OK with saying it isn't possible for now, but am open to allowing it in the future.
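If it were allowed later, the natural spelling would presumably be ordinary dunder methods on the array, e.g. this purely illustrative sketch with a hypothetical unit-carrying array (nothing here is part of the proposed interface):

```python
import numpy as np


class UnitArray(object):
    """Hypothetical 1D array with an attached unit, illustrating dunder-based arithmetic."""

    def __init__(self, values, unit):
        self.values = np.asarray(values, dtype="float64")
        self.unit = unit

    def __add__(self, other):
        if isinstance(other, UnitArray):
            if other.unit != self.unit:
                raise ValueError("incompatible units: %s and %s" % (self.unit, other.unit))
            other = other.values
        # addition preserves the unit
        return UnitArray(self.values + other, self.unit)
```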
I've been working on making Series/Index/DataFrame arithmetic methods behave predictably by sharing implementations (in particular datetime and timedelta types). Eventually the implementations need to get out of core.ops and into some type of mixin classes that get mixed into both Index subclasses and Block subclasses (e.g. TimestampVector --> DatetimeIndex, DatetimeBlock). The base classes for that look a lot like ExtensionArray here; it's probably worth figuring out how these relate sooner rather than later.
Agreed. If you look at core/period.py, I had PeriodArray inherit from DatetimeIndexOpsMixin to get arithmetic ops to work, which doesn't feel great.
> I had PeriodArray inherit from DatetimeIndexOpsMixin to get arithmetic ops to work
But does a Series with periods also rely on those methods? (which for real extensions would be important; and if not, I am not sure we should add it to PeriodArray et al)
> I had PeriodArray inherit from DatetimeIndexOpsMixin to get arithmetic ops to work, which doesn't feel great.
Which methods did you need to get from DatetimeIndexOpsMixin? There's a lot of Index-specific stuff in that class that isn't needed for a vector base class.
> But does a Series with periods also rely on those methods? (which for real extensions would be important; and if not, I am not sure we should add it to PeriodArray et al)
@jorisvandenbossche not at the moment, no. But the idea as I understand it is that PeriodArray would get mixed into something like PeriodBlock, then Series[Period] would dispatch certain operations to self._data.blocks[0].__add__ or whatever.
    @classmethod
    def concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr.data.to_pandas() for arr in to_concat]))
You probably don't want to actually use np.concatenate() here, since converting to NumPy arrays can be slow and/or lossy.
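A sketch of the alternative for an array whose backing store concatenates natively (here a hypothetical structured-NumPy-backed IPArray; the point is only that nothing is round-tripped through `.to_pandas()` or object dtype — an Arrow-backed store would instead delegate to that library's own concatenation):

```python
import numpy as np


class IPArray(object):
    """Hypothetical extension array backed directly by a structured ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @classmethod
    def concat_same_type(cls, to_concat):
        # concatenate the raw backing arrays; no conversion to boxed/object values
        return cls(np.concatenate([arr.data for arr in to_concat]))
```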
pandas/core/extensions.py
    @property
    @abc.abstractmethod
    def base(self) -> np.dtype:
        # TODO
It is not clear to me how this makes sense if the data is not backed by a NumPy array. I would be inclined to remove it from the interface if possible.
Perhaps it could be used by an optional NumpyExtensionArray base class that pre-implements some methods if you are using a NumPy array as the backing store.
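Such an optional base class could look roughly like this (a sketch; only the NumpyExtensionArray name comes from the comment above, the rest is illustrative):

```python
import numpy as np


class NumpyExtensionArray(object):
    """Optional helper base class for extension arrays backed by a single ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    # pre-implemented methods that fall out of the ndarray backing
    def __len__(self):
        return len(self._data)

    @property
    def base(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

    def take(self, indexer, allow_fill=True, fill_value=None):
        # fill handling elided; see the discussion of `take` above
        return type(self)(self._data.take(indexer))
```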
@@ -76,7 +78,169 @@
from pandas.compat import range, map, zip, u


class Block(PandasObject):
class BlockOpsMixin(object):
It looks like there is a core.internal_ops that is empty. Was some of this intended to go in there? I very much like the idea of having blocks ("vectors"?) implement their arithmetic methods, so Series/DataFrame can just pass through to those implementations. shoyer opened an issue to that effect IIRC.
Ah, these were temporarily in internal_ops and shared between a couple of files, but then I made a refactor to remove internal_ops.
This needs to be done in a different way:

- add a reasonable interface to the ExtensionDtype/Array
- move pandas extensions to subclass this one by one
- then actually change core code

It's a bit counterintuitive to do it this way, but you end up changing so much code that needs review that I don't think it is avoidable. There are many, many things done to 'just make things work', so these need to be cleaned up to be more general before you can even attempt this change.
# handle Categorical and sparse,
if (is_extension_type(values) and not
        is_datetime64tz_dtype(values)):
    # Need the not is_datetime64tz_dtype since it's actually
huh? this is not friendly
@@ -891,7 +891,8 @@ def _map_values(self, mapper, na_action=None):
        return new_values

    # we must convert to python types
    if is_extension_type(self.dtype):
it needs to be iterable
# XXX: we have many places where we call this with a `.dtype`,
# instead of a type. Think about supporting that too...
from pandas.core.extensions import ExtensionArray, ExtensionDtype
return (isinstance(arr, ExtensionArray) or
this needs just to satisfy is_array_like (which is an iterable with a dtype)
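Per that description, the check is essentially as small as the sketch below (pandas' own is_array_like also goes through is_list_like rather than checking __iter__ directly):

```python
def is_array_like(obj):
    # "array-like" here just means: an iterable that carries a dtype
    return hasattr(obj, "dtype") and hasattr(obj, "__iter__")
```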
Yeah, I wanted to make sure we were comfortable with that. So we'll have a …
Refactored PeriodIndex to be backed by PeriodArray, and IntervalIndex to be backed by IntervalArray. Still a WIP.
FYI, I haven't thought through whether PeriodArray / IntervalArray should be part of the public API in the same way that Categorical is. On the one hand, you'll get them when you do pd.Series(pd.period_range("2000", periods=10)), so they should probably be documented, etc. On the other hand, it's just one more thing to worry about. Is it useful to generate a pandas.PeriodArray(...) on its own?
@@ -128,7 +131,9 @@ def _isna_ndarraylike(obj):
    values = getattr(obj, 'values', obj)
    dtype = values.dtype

    if is_string_dtype(dtype):
    if isinstance(values, ExtensionArray):
Any reason not to duck-type this to check for objects that implement an isna method? I had a use case a while back where it would have been nice.
I have a slight preference for using the interface here, instead of duck-typing, just since it's part of the interface (and I think isna is important enough to make it mandatory).
Virtual subclasses can register themselves if they want pd.isna to work, but don't want to inherit from ExtensionArray.
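Registration uses the standard abc machinery, along these lines (a sketch; the import path is this PR's pandas/core/extensions.py, and GeometryArray is a stand-in for any third-party array that doesn't inherit from the base class):

```python
from pandas.core.extensions import ExtensionArray


class GeometryArray(object):
    """Third-party array that does not inherit from ExtensionArray."""

    def isna(self):
        # return a boolean mask of missing values
        ...


# Register as a virtual subclass so isinstance checks (and hence pd.isna) recognize it.
ExtensionArray.register(GeometryArray)
```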
Thanks a lot for working on this! This is related to the work I have been doing on allowing custom Block types, sparked by geopandas (but this is more general!), so I very much like to see this area improve. I should also test the implementation with geopandas.

Some general feedback: I think the alternative we should consider is to better support custom Block types (although this might be harder in practice, as currently in many places arrays are passed around and not blocks), which I started for concat. From the developer point of view, I think this is very similar, as the additional methods on the ExtensionArray you need to implement are basically just wrapped in the ExtensionBlock (I mean, writing an ExtensionArray or an ExtensionBlock will probably be more or less the same code).

When designing a new pandas, I think we want array objects for all dtypes (and then a dataframe is a collection of such arrays with more methods / indexing capabilities), so in that regard this is a nice move. On the other hand, it is maybe too late in the pandas evolution to go this route for current pandas itself as well? We already have blocks, and also adding those arrays might complicate things further? (the goal of having arrays should partly be to not have blocks)

This also relates to the question of to what extent the pandas arrays (PeriodArray, IntervalArray, DatetimeTZArray) should be part of the public interface. In the ideal case, I think the answer would be yes (and eg Series.values (or another more backward-compatible attribute) should return those arrays). But again, it might be too late for pandas and add additional user complexity to change this now.
I think in principle it should be possible to have an Index with the underlying values an ExtensionArray (which is actually a good reason for the array-way, as in such a case we could try to combine things that are now duplicated in eg DatetimeIndex and Series with DatetimeBlock in a single place, the DatetimeArray).

Final note: I haven't really thought through in practice my comments above on using blocks instead, so it is quite possible that your current approach of an extension array is just much better/easier! (also for current pandas, and not only "in the ideal case of designing from scratch")
Sorry for my long post above. Trying to summarize my worries more shortly: to add to my comments above, I think we should at some point also solicit some more feedback (on the mailing list) on the extension array interface. We should also think about the broader implications of the change, eg for downstream users. Eg what happens if statsmodels or scikit-learn encounter dataframes with extension types (not our responsibility?).

There are also plans to make the numpy dtypes more extensible, which are more concrete now with the funding I think. Not saying that we should wait on that (as it can still take a couple of years), but it might be another reason to make our extension arrays not too public (in case we use arrays for our internal extension types).
Some more concrete comments!
'''''''''''''''

This class should describe your data type. The most important fields are
``name`` and ``base``:
you mentioned you removed base from the interface
    @property
    def type(self):
        # type: () -> T.Any
        """Typically a metaclass inheriting from 'type' with no methods."""
This is not a very clear explanation IMO
Probably since I don't understand what it's used for :) In my next push, this won't be abstract. I'll have a default implementation that seems to work, so subclasses shouldn't have to worry about it.
    @abc.abstractmethod
    def __len__(self):
        pass
Should there be a __setitem__ as well, to be able to specify how assigning new data to a Series backed by such an array should update the array?
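In other words, something along these lines (a sketch for a NumPy-backed array; exactly how Series.__setitem__ would route to it is the open question here):

```python
import numpy as np


class IPArray(object):
    """Hypothetical NumPy-backed extension array with a minimal __setitem__."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __setitem__(self, key, value):
        # the array decides how to validate/coerce whatever pandas hands it
        self._data[key] = value
```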
        return None

    @abc.abstractmethod
    def formatting_values(self):
I have the feeling that those block-related methods like formatting_values, concat_same_type, get_values, slice, ... should be private (so with underscore, but still part of the contract), because they are not that useful for direct usage of an ExtensionArray?
Further, in geopandas I think I also needed to implement view and ravel on the underlying GeometryArray to get things working
I've gone back and forth on this. Apparently, including private methods in an interface is bad practice. But we have two sets of "users" here:

- library authors implementing the interface. To them, formatting_values is not private.
- end-users of those library authors. To them, formatting_values is private.

I'd be curious to hear others' thoughts here.
    def slice(self, slicer):
        # TODO: is this right?
        # In general, no. Probably just remove it?
Currently in geopandas, I needed a custom slicer on the Block level (so I think we need at least something here or on the block level)
Sorry, by "remove" I meant remove the default implementation and make it abstract.
        if fill_tuple is None:
            fill_value = self.fill_value
            new_values = algos.take_nd(values, indexer, axis=axis,
Shouldn't this use the ExtensionArray's take method (so values.take)?
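That is, roughly the following dispatch, using the take signature defined earlier in the interface (a sketch of the suggestion, not the PR's code):

```python
# inside the block's take path (sketch)
if fill_tuple is None:
    fill_value = self.fill_value
    # let the extension array implement its own take instead of algos.take_nd,
    # so the block doesn't need to know about the array's internals
    new_values = values.take(indexer, allow_fill=True, fill_value=fill_value)
```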
        return "{}({})".format(self.__class__.__name__, self._format_data())

    @property
    def values(self):
Should this be called values on the Array?
I think there are in general (although already not fully applicable in the Interval case, but it is for periods) two possible arrays: the actual storage array (eg the integers in case of Periods, the record array in case of ipaddress) or a numpy array with boxed objects.
Both should be accessible I think (although the first is maybe a specific Array class implementation detail and should only be used inside the class itself?), but I'm not sure the second should be .values.
The question is also what __array__ should be if we add it?
class PeriodArray(DatetimeIndexOpsMixin, ExtensionArray):
    dtype = PeriodDtype()
shouldn't this dtype hold the freq when it is an actual instance?
@jorisvandenbossche You might want to weigh in on the block-registry discussion in #19189.
Thanks Joris, some replies below. I'm still pretty convinced that a public …
Initially I had a custom block as part of the interface, but removed it since I …
My main motivation for array over block is not having to expose as much of …
I think this argues for my approach :) If we have 3rd parties writing custom …
FWIW, we don't have to refactor the current code to use extension arrays. We …
Either way, I'll continue on with this since …
Yes, I'll reach out to the NumPy devs when this is a bit more stable. Review-wise, it may make sense to split this into multiple PRs:
Yes, this is probably a good reason. My idea at first was: this doesn't really matter, as it is only for the developers of third-party libraries and they can handle some complexity (if it was documented how to do it, the explanation for CustomArray or CustomBlock would not be that different or much more complex in either case), and because I thought of our Blocks as a bit 'set in stone' for the current version. But I agree (after working a while with it) that keeping open the possibility to do some refactorings / clean-up in this area is still potentially valuable (eg the non-consolidating discussion).
Yes, that would be nice indeed (although it's not necessarily an argument, as a similar but different effort could have created blocks for them; still, it would be a very nice consequence of the effort). But just to be clear, this is not yet the case in the currently pushed PR?
Yep, I know, that's certainly a good reason to do that :-) The question however still is to what extent they would be exposed publicly (but that can be a separate discussion later on)
From a reviewer standpoint, that would certainly be easier. But I am not sure how easy it would be for you. Any idea how you would want to split it?
So I think a reasonable architecture would be to form a new set of modules. Then using these re-write …. Then do the same for …. Then maybe even do this for …. Then of course, you would ideally just create a …. By this time I think the api will be very very clear of what needs to be in these super-classes in general. These can all be done incrementally w/o disrupting things or causing very long and sweeping code reviews (which are bad generally). This is not actually as much work as I am painting (at least for Categorical and dtz), as it's mostly about figuring out what should go in the super classes and what should end up in the sub-class. pandas has very full featured extension classes, so what better way to design an API than to use an extensive test suite and an already full featured implementation. Note that on purpose these extensions should be 'separate' from the internals; the internals will simply 'use' these blocks generically and conform to their exposed API rather than peer inside, as we do now. I suspect our internals can be quite a lot simpler if things are organized along these lines.
That sounds good. Two questions though:
IMO the goal of an ExtensionArray should be to not have to expose ExtensionBlock (we will have to see if it is fully possible, but at least it should be the goal I think).
Because of circular imports, or just for organization? That file is quite long.
I meant for file organization, that makes more sense IMO. If the file is too long, we can always make … I would rather re-use the existing …
The only problem with this is that those would need to be 2D, and I think it would be nice to limit our array-based types (and extension interface) to 1D.
so mod of proposal; create a sub-dir: …
dtype can be in …. The whole intent is to allow very piecemeal and incremental changes that can be merged. IOW the very first thing that should happen (after structure) is to make …
Not sure if you're suggesting this, but I don't think e.g. …
@TomAugspurger actually I am proposing ImmutableArray as well, though that just makes things even more complicated. So let's just stick to an Array which is mutable (make that very clear). But yes, we need to incorporate all of our existing functionality.
@jreback are you in favor of removing …? The tricky thing with datetimetz is that we don't have a …
no, we will still have all of the existing blocks BUT they will have common functions in a Block; the data is held in an Array (which wraps things like DTI and Categorical, or maybe you can simply make Array a base class as I said before). CategoricalBlock and such are just implementations of a generic Block (what you are calling an ExtensionBlock).
This, along with the mention a while back of PeriodArray subclassing DatetimelikeIndexOpsMixin, leads into the discussion of what exactly needs to be defined by an Array class. I've been making an effort to refactor some bits of the Index subclasses and Block subclasses to inherit from Array-like mixin classes. What I've found is:
Other Notes:
    def equals(self, other):
        if not isinstance(other, type(self)):
            return False
        return (self.freq == other.freq and
self.freq == other.freq may raise, in which case you generally want to return False here.
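In other words, guard the comparison, roughly like this sketch of the method body (the value comparison at the end is illustrative, not the PR's code):

```python
# sketch of PeriodArray.equals with a guarded freq comparison
def equals(self, other):
    if not isinstance(other, type(self)):
        return False
    try:
        if self.freq != other.freq:
            return False
    except (TypeError, ValueError):
        # incomparable freqs mean "not equal", not an error
        return False
    if len(self) != len(other):
        return False
    return bool((self._data == other._data).all())
```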
        return result

    def __iter__(self):
        return iter(self._data)
Isn't this iter(self._box_func(x) for x in self._data)?
@jbrockmendel agreed in general.
No kidding :) Let's enumerate some TODOs.
I think that 1 would be a relatively small, clean PR, but it wouldn't have any cleanup benefit. Anyway, I'm partway through implementing 1 and 2 as a first PR. The sticking point is the inheritance hierarchy of blocks. Currently I have …
but there are a few issues to work out
Split the first section off to #19268
@TomAugspurger for the discussion of refactoring the arithmetic ops out of …
Closing in favor of #19268
Adds an interface for 3rd-party libraries to define custom array types. When pandas encounters these objects, we'll resist the temptation to cast them to object dtype.

This isn't near being ready yet, but I wanted to make a PR to get some thoughts on the general approach / scope. I wouldn't recommend reviewing the code in detail yet.
Some questions:

Do we want to do this? The big concern being the pandas2 transition. I've tried to hide all block-related changes in an adapter class. I don't think that allowing extension types will hinder the transition in a major way.

Should our current extension types implement the interface?
- Categorical essentially satisfies it already.
- Interval basically does, but there isn't an IntervalArray class really, just an IntervalIndex.
- Period is the same as Interval.
- SparseArray is a bit weird since it's an actual subclass of ndarray. Needs some more thought.
- datetime-with-tz is also weird (ndarray, just with a custom type).

What to do with indexes? Having an ExtensionIndex will be useful I think (value_counts, groupby, etc.). But I haven't tackled it yet. Haven't really thought about indexing at all, since it sounds hard.

How much of the ndarray implementation should we push onto the pandas array interface?
Probably some others, but that should kick things off. For those interested in following along, I'm using this over in https://github.com/continuumio/pandas-ip. It's nice to have a concrete problem to solve.
Closes #18767