Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should Pandas adopt a naming convention for its protocol methods #26915

Closed
ghost opened this issue Jun 18, 2019 · 6 comments
Closed

Should Pandas adopt a naming convention for its protocol methods #26915

ghost opened this issue Jun 18, 2019 · 6 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@ghost
Copy link

ghost commented Jun 18, 2019

numpy has defined various protocol extensions over the years, and has consistently
named them with an array prefix, and "dunder" notation:

pandas has't been as diligent. The one example I've been dealing with, is _reduce. Originally an internal Series method (not a protocol), it now dispatches to _reduce for subclasses of ExtensionArray.

pandas/pandas/core/series.py

Lines 3743 to 3745 in baa77c3

elif isinstance(delegate, ExtensionArray):
# dispatch to ExtensionArray interface
return delegate._reduce(name, skipna=skipna, **kwds)

When reading the code for a new EA project, its hard to pick out that _reduce is actually an override of the parent class, instead of just an internal function written by the EA author. Something like __pandas_reduce__ would have made this clearer.

Due to inexperience with EA, I lost a bit of time figuring out why s.sum() wasn't invoking the EA implementation I thought it would.

Granted, it's not exactly a protocol. For the forseeable future, this will always be an ExtensionArray subclass, rather than a duck typed object. But, for readability,
and while EA is experimental, this might be the right time to clean it up.

@simonjayhawkins simonjayhawkins added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jun 18, 2019
@TomAugspurger
Copy link
Contributor

Typically dunder methods are associated with a top-level method intended for users to call. Since there isn't a single reduce method that corresponds to ExtensionArray._reduce, I don't think a dunder method is appropriate.

@jorisvandenbossche
Copy link
Member

Note that this is partly also historically. Until recently with the EAs, we never had candidates for such methods. And the specific example of _reduce: this already existed internally before it was exposed as part of the EA interface, so this was mainly added to the EA interface to quickly have something that EAs can tap into.

But I think I agree with @TomAugspurger that at this moment, a full dunder method is maybe not warranted. It are indeed only overrides of the parent class, not actual protocols (and good documentation / comments in the source code can make this clear I think).

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 19, 2019 via email

@ghost
Copy link
Author

ghost commented Jun 19, 2019

#26935

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 19, 2019

Typically dunder methods are associated with a top-level method intended for users to call.

not so. array? array_ufunc? none of the numpy methods have to do with top level methods. As I see it, duner methods are about protocols and/or duck typing.

They are (but with top-level functions, not methods as Tom said, that was maybe confusing): __array__ is a way to tap into np.(as)array(..), __array_ufunc__ are a way to tap into np.add, np.subtract, .. top-level ufuncs.

I don't think we are ready at all to make official dunder methods. It took many years for numpy to get to this. Let's first further experiment with what we have, and for sure, look into what is still missing for other numerical ops (the things being discussed in the other issues you opened)

@jorisvandenbossche
Copy link
Member

This issue is not actually about _reduce, but about the suggestion in #26935, that we build a single dispatch interface which unifies all the series methods,

Well, but the example we were discussing was _reduce. And we have many similar other methods that are part of our "EA interface", such as _from_sequence, _from_factorized, _concat_same_type, ... I don't think there is any point to start renaming those at this point.

To be clear, that is not to say the discussion in #26935 is not worth having. Absolutely not, that are problems we need to find a solution for. But just that the starting to rename the already existing methods is currently not that useful, I think (not that we could have chosen better names if we would start over). So closing this, let's continue the discussion about the actual method dispatch problem in #26935.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

No branches or pull requests

3 participants