API/ENH: relabel method #15104

chris-b1 · 2017-01-11T01:22:19Z

closes API: allow DataFrame.rename to take a list-like of colums #14829
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

small example

In [7]: df = pd.DataFrame({'a': [1,2], 'b': [3, 4]})

In [8]: df.relabel(columns=['J', 'K'])
Out[8]: 
   J  K
0  1  3
1  2  4
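For readers trying this on a released pandas: the PR was closed without being merged, so `df.relabel` is not available there. A minimal sketch of the same list-like relabeling using the existing `set_axis` method, assuming pandas >= 1.0 (where `set_axis` returns a new object by default):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Replace every column label at once; the original frame is left untouched.
relabeled = df.set_axis(['J', 'K'], axis=1)

# As in the proposed docstring, the number of labels must match the axis length.
mismatch_error = None
try:
    df.set_axis(['J'], axis=1)
except ValueError as exc:
    mismatch_error = exc
```

Here `relabeled` has columns `J` and `K`, while `df` keeps `a` and `b`.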

jreback · 2017-01-11T01:26:59Z

how would this work with groupby? (just thinking out loud)

chris-b1 · 2017-01-11T01:31:23Z

Setup:

df = pd.DataFrame({'key': [1, 1, 2], 'value': [1, 2, 3]})
gb = df.groupby('key')

I can't think of a meaningful answer for what gb.relabel would do. You would of course be able to do:

In [17]: gb.sum()
Out[17]: 
     value
key       
1        3
2        3

In [18]: gb.sum().relabel(columns=['newvalue'])
Out[18]: 
     newvalue
key          
1           3
2           3
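The same "aggregate, then relabel the result" pattern can be sketched with methods that exist in released pandas. This is only an illustration, assuming pandas >= 0.25 for the named-aggregation form; `summed` and `named` are names chosen for the example:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2], 'value': [1, 2, 3]})
gb = df.groupby('key')

# Aggregate first, then rename the resulting column with a mapping.
summed = gb.sum().rename(columns={'value': 'newvalue'})

# Named aggregation picks the output label as part of the aggregation itself.
named = gb.agg(newvalue=('value', 'sum'))
```

Both results have a single `newvalue` column indexed by `key`.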

jreback · 2017-01-11T01:43:45Z

pandas/core/generic.py

+            Labels to construct new axis from - number of labels
+            must match the length of the existing axis
+        copy : boolean, default True
+            Also copy underlying data


I know we have this flag (copy) on .rename as well. I think these are confusing, though not sure what a better API is for these (and copy=True, inplace=True is pretty much meaningless).

Yeah, not sure - I agree it's confusing. I'd be tempted to deprecate copy altogether, though if you know what you're doing I suppose it's useful.

jreback · 2017-01-11T01:44:28Z

pandas/core/generic.py

+
+        See Also
+        --------
+        pandas.NDFrame.rename


prob best to add pandas.Series.rename, pandas.DataFrame.rename, pandas.Panel.rename here instead

codecov-io · 2017-01-11T10:43:54Z

Current coverage is 84.75% (diff: 93.75%)

Merging #15104 into master will decrease coverage by <.01%

@@             master     #15104   diff @@
==========================================
  Files           145        145          
  Lines         51220      51263    +43   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43415      43448    +33   
- Misses         7805       7815    +10   
  Partials          0          0

Powered by Codecov. Last update c71f214...db1779b

jreback · 2017-01-12T00:22:24Z

doc/source/indexing.rst

@@ -1586,6 +1586,23 @@ If you create an index yourself, you can just assign it to the ``index`` field:

   data.index = index

+.. versionadded:: 0.20.0
+
+Alternatively, the :meth:`~DataFrame.relabel` can be used to assign new


are these underlined? or is that the diff?

It's the diff - no clue why it looks like that though

jreback · 2017-01-12T21:19:30Z

what does SQL, Spark, R call this rename/relabel API?

not that we need to follow, but consistency is not a bad thing.

chris-b1 · 2017-01-12T21:42:21Z

dplyr and SQL would both do this through the select verb. dplyr also has rename, which basically works the same way, but keeps all the other columns too.

# R
df %>% select(J = a, K = b)
df %>% rename(J = a)  # includes column `b`

# SQL
SELECT a as J, b as K
FROM df

In base R it also looks like you can do label assignment via the names function (not 100% sure about this)

# R
names(df) <- c("J", "K")
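For comparison, a rough sketch of how the R and SQL snippets above map onto existing pandas calls. The variable names are illustrative only, and the mapping to `select` / `rename` / `names<-` is an approximation rather than an exact equivalence:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Roughly dplyr's select(J = a, K = b) or SQL's SELECT a AS J, b AS K.
selected = df[['a', 'b']].rename(columns={'a': 'J', 'b': 'K'})

# Roughly dplyr's rename(J = a): column 'b' is kept as-is.
renamed = df.rename(columns={'a': 'J'})

# Roughly base R's names(df) <- ...: assign the full set of labels.
assigned = df.copy()
assigned.columns = ['J', 'K']
```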

chris-b1 · 2017-01-12T22:02:09Z

I suppose #12392 also needs consideration. From the discussion there, there was some preference towards moving towards a relabel(labels=['J', 'K'], axis=1) API, though I personally prefer the named axis arguments for something like this.
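To make the two calling conventions concrete, here is a sketch using `rename`, which in released pandas (assuming >= 0.21) accepts both the named axis argument and a mapper plus `axis`. It illustrates the API styles under discussion, not the proposed `relabel` itself:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Named axis argument style.
named_axis = df.rename(columns={'a': 'J', 'b': 'K'})

# Mapper plus axis style.
mapper_axis = df.rename({'a': 'J', 'b': 'K'}, axis=1)
```

Both calls produce the same frame; the difference is purely in how the target axis is spelled.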

cc @nickeubank @jorisvandenbossche @shoyer

jorisvandenbossche · 2017-01-12T23:21:03Z

Yes, we need to take #12392 into consideration as well. For now, you have only implemented the new feature (renaming all columns using a list-like) in relabel, but I think the original idea was to have the full functionality of rename in relabel (at least, that was my interpretation, but that's certainly open for discussion!). In that case, we should first decide on #12392 before implementing it here.

> though I personally prefer the named axis arguments for something like this.

me as well, but I think there was a compromise in #12392 to have both possibilities at the same time. But let's leave that discussion over there. #12392 is something I would like to see happen for 0.20/1.0, so I can try to take a new look at that this weekend.

jreback · 2017-01-12T23:24:09Z

I view this differently than @jorisvandenbossche

This is like .set_index but w/o the 'guessing' whether you are meaning to set by a particular column (we had a PR that tried to implement this, #11944, but it got complicated). So I view this as complementary to .rename, and would actually have both (and keep the API consistent until / unless we change it in #12392).
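The 'guessing' mentioned here can be sketched as follows, assuming pandas >= 1.0. `set_index` treats strings as column keys, whereas `set_axis` always treats its argument as the new labels; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 2], 'value': [3, 4]})

# A string is interpreted as a column to move into the index.
by_column = df.set_index('key')

# set_axis never looks at the columns: the list is the new set of row labels.
by_labels = df.set_axis(['x', 'y'], axis=0)

# A list of strings passed to set_index is looked up as column names,
# so labels that are not columns fail rather than being assigned.
guess_error = None
try:
    df.set_index(['x', 'y'])
except KeyError as exc:
    guess_error = exc
```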

jorisvandenbossche · 2017-01-12T23:26:52Z

Regarding the relabel name: I personally also find it not ideal, as rename(columns={'a': 'b'}) flows very naturally for "renaming column 'a' to 'b'".
I think the main problem with rename is the conflict with the fact that a Series has a name attribute? Or were there other reasons to introduce a relabel method?

ibis also calls this relabel: http://docs.ibis-project.org/generated/ibis.expr.api.TableExpr.relabel.html#ibis.expr.api.TableExpr.relabel

cc @pandas-dev/pandas-core

jorisvandenbossche · 2017-01-12T23:29:21Z

> This is like .set_index but w/o the 'guessing' whether you are meaning to set by a particular column

That is certainly also a possibility, but then I would keep relabel for exactly that (and not being able to also rename individual labels like rename), and then maybe something like set_labels would be a clearer name (meaning that you replace the full set of labels).

chris-b1 · 2017-01-12T23:44:03Z

> Or were there other reasons to introduce a relabel method?

My thinking was that rename is already too overloaded and can run into corner cases with a list-like #14829 (comment), but can't really be changed for back-compat.

set_labels wouldn't be bad either, although somewhat confusingly we already have a set_axis that does the same thing, but inplace only.

TomAugspurger · 2017-01-13T01:56:03Z

I'm in favor of relabel gaining the dict and function behaviors of rename. I think relabel is the better name for changing the row or column labels. That leaves rename for changing the index and column names.


shoyer · 2017-01-13T03:08:26Z

The name "labels" is nicely unambiguous about referring to DataFrame.columns or DataFrame.index, unlike the highly overloaded "name". So all things being equal, relabel would seem more consistent with setting index/column values than rename.

BUT rename already basically does this and I'm not sure a deprecation cycle is worth it. I do think multiple methods may be a good thing, but they should be distinguished by which names they change, not by how they change them (dict vs list).

I am more inclined toward the solution in #14829, which enables list input to rename. I am OK with distinguishing between tuples and lists for dispatching on Series.name. If I had my way, np.array and pd.Series would not coerce tuples (treating them as immutable scalars), only lists.

It's also worth noting that we do have an existing rename_axis method, too. To summarize the current state of affairs:

Series.rename sets Series.index (dict/function) or Series.name, depending on the type of input.
Series.rename_axis sets Series.index (dict/function) or Series.index.name (others), depending on the type of input.
DataFrame.rename sets DataFrame.columns/DataFrame.index
DataFrame.rename_axis sets DataFrame.columns/DataFrame.index (dict/function) or DataFrame.columns.name/DataFrame.index.name (others), depending on the type of input
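The input-dependent dispatch summarized above can be sketched for the cases that still behave this way in current pandas (mapping or function relabels the index, scalar sets the name). The variable names are illustrative, and the dict/function form of `rename_axis` is left out because its behavior has changed across versions:

```python
import pandas as pd

s = pd.Series([1, 2], index=['a', 'b'], name='x')

# A mapping or a function changes the index labels and keeps the name.
by_mapping = s.rename({'a': 'A'})
by_function = s.rename(str.upper)

# A scalar changes Series.name and leaves the index labels alone.
by_scalar = s.rename('y')

# rename_axis with a scalar sets the name of the axis, not its labels.
axis_named = s.rename_axis('letters')

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
cols_named = df.rename_axis('fields', axis=1)
```

The same method name therefore touches three different things (labels, the Series name, and the axis name), which is the overloading the comment describes.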

CCing @MaximilianR in case he has ideas.

jreback · 2017-01-13T14:44:41Z

#14636 is a proposal to change .set_axis and make it a full-fledged function (which is similar to .relabel).

chris-b1 · 2017-03-20T15:02:34Z

I'm uncertain about the API here, and this specific change isn't solving a significant problem, so closing for now. Worth re-evaluating after / as part of #12392.

API/ENH: relabel method

a308d87

chris-b1 mentioned this pull request Jan 11, 2017

API: allow list-like to DataFrame rename #15029

Closed

4 tasks

chris-b1 added the API Design and Indexing (related to indexing on series/frames, not to indexes themselves) labels Jan 11, 2017

jreback reviewed Jan 11, 2017


fix test for py2

6bbf39f

catch panel4d warnings

db1779b

jreback reviewed Jan 12, 2017


chris-b1 closed this Mar 20, 2017

chris-b1 added this to the No action milestone Mar 20, 2017

jreback mentioned this pull request Jul 17, 2017

Add a .relabel method; deprecate .rename and .rename_axis for relabeling #16990

Closed

toobaz mentioned this pull request Jul 17, 2017

ENH: provide "inplace" argument to set_axis() #16994

Closed

4 tasks
