Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: add top-level functions as method #12640

Closed
1 of 4 tasks
jreback opened this issue Mar 16, 2016 · 10 comments
Closed
1 of 4 tasks

API: add top-level functions as method #12640

jreback opened this issue Mar 16, 2016 · 10 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jreback
Copy link
Contributor

jreback commented Mar 16, 2016

xref #12578

so its pretty easy to add (and eventually deprecate) certain top-level functions that could/should be methods. Promotes a cleaner syntax and method chaining.

In 0.18.1 would propose adding these, deprecating in 0.19.0.

This would eliminate the ability to use directly with np.ndarrays, but I view this is as a positive (note we effectively did this for the .rolling/expanding/ewm with np.ndarrays as well, though deprecated for a while).

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Mar 16, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 16, 2016
@jreback jreback modified the milestones: Next Major Release, 0.18.1 Apr 26, 2016
@jreback jreback added the Master Tracker High level tracker for similar issues label Jan 27, 2017
@jorisvandenbossche
Copy link
Member

From the above list, pivot_table is already a method, so I think that one can be removed.

Further, I don't think I find it necessary to deprecate those. Those are frequently used functions, so not sure it is worth deprecating them (which does not mean some of them would be valuable as a method as well).

@ResidentMario
Copy link
Contributor

@jreback I'm interested in taking a stab at this, but since I've never toyed with pandas before I wanted to first ask what your opinion is as to implementation?

The easiest way to port these to methods would be to simply thread a call to the globals from a method (akin to what is done in Series.from_csv/Series.to_csv). The advantage of this approach is that it's easy to do and avoids code refactoring. The disadvantage is that you would have to add a handful of additional dependencies (e.g. from pandas.tools.tile import (cut, qcut)) to series.py and dataframe.py, which I assume would slow down module initialization. Perhaps these imports could be made in the method body? Seems suboptimal...

As I see it, the best option would be to instead move the core of each of the method bodies into a core module (common.py?), then simplify the old global and new class methods into wrappers on those. This might allow object method implementations to use a (marginally?) faster code path, since less type-checking would be required. However, it would require a refactor.

@jreback
Copy link
Contributor Author

jreback commented Feb 25, 2017

@ResidentMario conceptually this is very easy, (look at something like .unstack()) or this:

def melt(self, ....):
    from pandas.core.reshape import melt
    return melt(self, ....)

BUT I didn't put a doc-string. And you cannot import things directly at the top-level in core/frame.py. It wouldn't slow down initialization much, the issue is dependencies (e.g. these modules import DataFrame, so they can't be imported directly.

so here's a way to do this.

create DataFrame.melt (like above), but use a _shared_docs doc-string (and define it in core/frame.py). Then you can set the doc-string directly in pandas.core.reshape.melt directly.

I think this is a nice pattern that will work.

@ResidentMario
Copy link
Contributor

Ok, great, glad I asked!

@ResidentMario
Copy link
Contributor

ResidentMario commented Feb 26, 2017

Line no. 186 in pivot.py in the current source:

DataFrame.pivot_table = pivot_table

git blame pandas/tools/pivot.py -L 186 shows that this is a very old Wes McKinney commit from circa 2011. Technically, this approach is correct, but it doesn't generate a reference in the docs. Do we want to refactor this into the new pattern?

Edit: actually it does generate a doc entry. Sphinx is more clever than I thought! So I think that pd.pivot_table can be marked "done" as-is.

@ResidentMario
Copy link
Contributor

Technically speaking, if reusing the original docstring is OK, this pattern could be used for all of the other conversions as well, since the files they are defined in already import that Series and DataFrame names where needed.

However I think that the old functions ought to get links to the new methods in their docstrings, and vice versa, and that the new methods ought to additionally get a versionadded docstring. To do that, Jeff's pattern is required.

In the case of pivot_table, the alias is already ancient. Nevertheless, refactoring it using Jeff's pattern is an arguable documentation improvement, since DataFrame.pivot_table would now link to pd.pivot_table in the docs (and vice versa).

@jreback what do you think?

@jreback
Copy link
Contributor Author

jreback commented Feb 26, 2017

@ResidentMario yeah ok with having a standard way of doing this for doc-strings and such (so should fix pivot_table).

So for how to do this, Ideally establish the pattern doing 1 or 2 methods, then in additional PR's can do the rest. IOW, can tweek to get the pattern that we want (and maybe make helpers / decorators to assist first).

FYI, I have made a decorator e75eacb (see pandas.utils.decorator for doing something almost like this.

I am going to split this off and push to master shortly. (the decorator)

jreback pushed a commit that referenced this issue Apr 4, 2017
xref #12640
xref #14876

Author: Aleksey Bilogur <aleksey.bilogur@gmail.com>

Closes #15521 from ResidentMario/12640 and squashes the following commits:

1657246 [Aleksey Bilogur] two doc changes
28a38f2 [Aleksey Bilogur] tweak whatsnew entry.
5f306a9 [Aleksey Bilogur] +whatsnew
ff895fe [Aleksey Bilogur] Add tests, update docs.
11f3fe4 [Aleksey Bilogur] rm stray debug.
3cbbed5 [Aleksey Bilogur] Melt docstring.
d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.
@jreback jreback modified the milestones: Next Major Release, High Level Issue Tracking Sep 24, 2017
@TomAugspurger TomAugspurger removed the Master Tracker High level tracker for similar issues label Jul 6, 2018
@TomAugspurger TomAugspurger removed this from the High Level Issue Tracking milestone Jul 6, 2018
@jbrockmendel
Copy link
Member

still relevant?

@jreback
Copy link
Contributor Author

jreback commented Sep 22, 2020

i think it's worth adding get_dummies (there is an existing PR for this) and cut/qcut to series - separately can consider deprecating the top level

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jan 23, 2023
@jbrockmendel
Copy link
Member

Discussed on today's dev call and the consensus was against, as in the time since the OP we have moved in the opposite direction towards not having e.g. DataFrame.read_csv. @mroeschke also pointed out that qcut can return one object or a tuple depending on arguments, which defeats chaining. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants