ExtensionArray meta-issue #19696
Comments
Should these have a _typ attribute? If so, should SparseArray._typ be changed from "array" to something more specific? |
We're adding an ABCExtensionArray and `EA._typ` with the value 'extension'
in #19520.
SparseArray._typ being "array" isn't the perfect name, but I don't think that
matters, do you? The more important name is the ABC.
On Wed, Feb 21, 2018 at 11:37 PM, jbrockmendel wrote:
Should these have a _typ attribute? If so, should SparseArray._typ be
changed from "array" to something more specific?
|
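For readers unfamiliar with the `_typ` convention discussed above, here is a minimal sketch of how an ABC can key its `isinstance` check on such an attribute. The names `make_typ_abc`, `ABCExtensionLike`, `MyArray`, and `LegacySparse` are hypothetical and are not pandas' own; this only illustrates the idea, not the code in #19520.

```python
# Hypothetical sketch: a duck-typed "ABC" keyed on a `_typ` class attribute.
# None of these names are pandas' own; they only illustrate the idea.

def make_typ_abc(name, allowed):
    """Build a class whose isinstance/issubclass checks look at `_typ`."""
    allowed = frozenset(allowed)

    class _TypMeta(type):
        def __instancecheck__(cls, obj):
            # An object "is an instance" if its `_typ` is in the allowed set.
            return getattr(obj, "_typ", None) in allowed

        def __subclasscheck__(cls, other):
            return getattr(other, "_typ", None) in allowed

    return _TypMeta(name, (), {})


ABCExtensionLike = make_typ_abc("ABCExtensionLike", {"extension"})


class MyArray:
    _typ = "extension"


class LegacySparse:
    _typ = "array"  # the generic value questioned in the thread
```

With this sketch, `isinstance(MyArray(), ABCExtensionLike)` is true while `isinstance(LegacySparse(), ABCExtensionLike)` is false, which is why the value of `_typ` matters less than which ABC accepts it.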
Quick status update. The two big ones are in, and now work can mostly proceed in parallel (aside from #19863, which a couple of my followups depend on). I'm currently rebasing all my branches on master and will make PRs today. I'll also go through all the TODOs we've added and make issues / checkboxes for those. If anyone is interested in working on these, |
|
Porting from #19902 (comment) When do we want to call it and release 0.23? I can get The big remaining question then is moving other array types to EAs
Do we want to block 0.23 for those? Do we have an estimate for how long that would take? |
For the datetime/timedelta/period arrays (or the arithmetic/comparison bits of them), if/when #19902 goes through, I can get the next two PRs in that sequence out pretty quickly. But after those are in there will be some non-trivial design decisions to be made. The end of March would be an optimistic goal. |
cc @shoyer @jreback @jorisvandenbossche @chris-b1 (don't think you got pinged on this, since I did it in an edit). |
I think ideally, we would already have converted Interval, Period, etc to use ExtensionArrays before doing a release, as this will exercise and stress-test the interface a lot. |
Another topic we need to discuss (although actually trying to implement it might make a discussion more tangible), is whether we want our current Index classes to subclass the ExtensionArrays (eg I think we previously thought to do composition, but @TomAugspurger said here #19902 (comment): (also @jreback mentioned this in #19957 (comment))
@TomAugspurger what do you see as advantages of using inheritance instead of composition? I didn't really think about it in detail (code-wise), but: |
I think Series might be the wrong comparison: FooBlock should be the analogous object to FooIndex, shouldn't it? |
So I am a big +1 on making Index a subclass of EA. This will unify the external interface with the internal one (e.g. the user-facing API), and will allow us to make sure the implementations share code (mainly this will involve changing some bespoke implementation details of Index to conform to the new patterns of EA).
We already have a de facto implementation of this for Datetime w/tz, meaning we are using it like an EA; subclassing DTI will just formalize this. We also get the Period & II impl for free if we do this. We could also change the categorical impl to actually use CI and again get some more code sharing / harmony.
The point of all this is to really dogfood EA internally in pandas. This will simplify code and promote a unified interface.
Once the above is done, we can actually remove all of the Block types that are not simple ones. These would then merely become EA implementations. Again, this would simplify the implementation details and reduce the friction that we currently have. |
What do you mean with unify interfaces? Which interfaces? You mean the ExtensionArray API and the Index API? I don't think those should be unified. The ExtensionArray interface is deliberately much smaller.
As far as I know,
We can perfectly dogfood EA internally in pandas using composition instead of subclassing. So this is not relevant for the discussion (it is not about whether we want to use EAs internally in pandas, but how to use them).
That's already the goal of the ExtensionBlock. Again, that is AFAIK no argument in favor of or against subclassing. |
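To make the two options in this exchange concrete, here is a rough sketch of an index built by inheritance versus one built by composition. All class and method names (`SmallArray`, `InheritingIndex`, `ComposingIndex`, `_internal_helper`) are hypothetical stand-ins, not pandas classes, and the array interface is reduced to two methods for illustration.

```python
# Hypothetical sketch of the two designs; class names are illustrative only.

class SmallArray:
    """Stand-in for an ExtensionArray with a deliberately small interface."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def take(self, indices):
        return type(self)(self._values[i] for i in indices)

    def _internal_helper(self):
        return "array-only detail"


class InheritingIndex(SmallArray):
    """Inheritance: every array method, public or not, is also on the index."""

    def __init__(self, values, name=None):
        super().__init__(values)
        self.name = name


class ComposingIndex:
    """Composition: the index holds an array and forwards selected methods."""

    def __init__(self, values, name=None):
        self._data = SmallArray(values)
        self.name = name

    def __len__(self):
        return len(self._data)

    def take(self, indices):
        # Delegate the work to the array, then re-wrap and keep the name.
        taken = self._data.take(indices)
        return type(self)(taken._values, name=self.name)
```

In this toy version, `InheritingIndex` picks up `_internal_helper` and an inherited `take` that drops `name` unless it is overridden, while `ComposingIndex` exposes only what it forwards explicitly. Either design uses the array internally; they differ in how much of the array interface leaks into the index.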
The main advantage I see to subclassing is avoiding |
@jbrockmendel you give an example with Series. But I don't think we are actually contemplating making Series a subclass of extension array? Or are we, which would be a huge change in design? (in my head at least it was only about Index) |
@jorisvandenbossche You're right, a more apt example would have been |
@jreback, @jbrockmendel do you have thoughts on this? |
Assuming we are going with the Index-subclasses-EA approach, then yes. That said, two preferences on naming conventions:
|
I don't follow. Doesn't the fact that |
Tom's question was rather about not assuming this. |
My answer then has to be a convex combination of "I don't have an opinion" and "I don't have an informed opinion." |
Should this issue have an ExtensionArray label? |
New thought on inheritance vs composition: there are |
FWIW, https://github.com/pandas-dev/pandas/pull/20611/files didn't require any changes to the read only attributes (e.g. |
I don't know the |
Can you explain this further? What is the problem with the Array class using it for the Index having cached it? The Index caching such properties of the underlying values is in principle somewhat brittle if somebody changes the underlying values of the Index in place; but users are not 'supposed' to do that, and you can currently already break some of the cached properties of Index in a similar way. Further, if we want, we can also think about caching expensive properties on Arrays as well. We just need to make sure to clear the cache once the underlying values (left, right, closed in the case of Interval) are changed. |
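The "clear the cache once the underlying values are changed" idea from the comment above could look roughly like the following. This is only a sketch: `CachedArray` and its `is_monotonic` property are hypothetical, and only `left` and `right` are modelled (not `closed`).

```python
class CachedArray:
    """Hypothetical array that caches an expensive property."""

    def __init__(self, left, right):
        self._left = list(left)
        self._right = list(right)
        self._cache = {}

    @property
    def is_monotonic(self):
        # Compute once, then serve from the cache until the data changes.
        if "is_monotonic" not in self._cache:
            pairs = list(zip(self._left, self._right))
            self._cache["is_monotonic"] = pairs == sorted(pairs)
        return self._cache["is_monotonic"]

    def __setitem__(self, i, value):
        left, right = value
        self._left[i] = left
        self._right[i] = right
        self._cache.clear()  # any in-place change invalidates cached results
```

This only protects against mutation that goes through `__setitem__`; anyone writing to `_left` or `_right` directly would still leave a stale cache, which is the brittleness described above.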
@jreback @TomAugspurger to come back to our discussion on hangout. As I understand it now (Jeff, correct me if I am wrong), one of the main arguments you make is:
I agree that those two points are important to have, but I think the main discussion is how to achieve this. IMO it is not needed for Index to be an ExtensionArray to have those points. But let's use some very concrete examples to discuss this, so using some methods that are currently part of the ExtensionArray interface
In both those examples, I think we indeed want Index and ExtensionArray to have a consistent public interface; I fully agree on that. But I don't think we want them both having the ExtensionArray interface (because this interface is more than the public
--> So let's think about how to make the public API consistent, with this consistency guaranteed by the code, without making Index an actual extension array? Just brainstorming here, but we could have a
@jreback does the above explanation make some sense? (which does not mean you have to agree :)) Do you understand our standpoint? |
One more specific case, this time in Series / DataFrame constructors which @jreback mentioned on the call. Indexes have names, while arrays don't. So |
Under composition, both the addition and the subtraction ops involve lookups of
Yah, if a user does that they're on their own. This usage would render the cache invalid in all scenarios being discussed.
Yah, it's doable, just less obvious. FWIW I'm coming around to liking the composition approach more and more. BTW sorry I missed the hangout. UTC-7. I'll take a look at the minutes. |
We didn't take many notes; I think the main thing that came out of the discussion is my long comment above. |
@jorisvandenbossche on your comments above: #19696 (comment) +1 on this. Pretty good summarization. Yes I think just like we expose
These could provide really just an interface or a trivial implementation (
Would for sure make use consistent in method names and to some extent signatures across all pandas objects. Could also test for this. |
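One possible shape for the "could also test for this" idea is a check that compares the signatures of shared public methods across classes. The helper `shared_signature_mismatches` and the toy classes below are hypothetical; the method names and parameters are placeholders and do not claim to match the real Index or ExtensionArray signatures.

```python
import inspect


def shared_signature_mismatches(cls_a, cls_b, method_names):
    """Return the method names whose signatures differ between two classes."""
    mismatches = []
    for name in method_names:
        sig_a = inspect.signature(getattr(cls_a, name))
        sig_b = inspect.signature(getattr(cls_b, name))
        if sig_a != sig_b:
            mismatches.append(name)
    return mismatches


class ToyArray:
    def unique(self): ...
    def take(self, indices, allow_fill=False, fill_value=None): ...


class ToyIndex:
    def unique(self): ...
    def take(self, indices, axis=0): ...
```

Calling `shared_signature_mismatches(ToyArray, ToyIndex, ["unique", "take"])` flags `take` only. A real test would also have to decide which differences are intentional, since the thread notes the interfaces may not need to be completely consistent.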
Just to avoid confusion in terminology: what we currently call the "extension array interface" is what is defined and explained in the
There are also some questions regarding to what extent we actually want them completely consistent. For example, ExtensionArray is 1D, so we did not add an |
Should |
I don't think it should share code. I think |
Is the IntervalArray checkbox check-able? |
I think so, yes (at least, IntervalArray is an ExtensionArray now) |
closable? |
Looks like most of the check boxes from the original issue are checked so I think we can close. We can open separate issues for the EA interface as they come up |
Just so things don't get lost
assert_*_equal (REF: Base class for all extension tests #19863)
__setitem__ works properly (ENH: ExtensionArray.setitem #19907)