API: Allow dictionary argument in rename_axis to change some names of MultiIndex #19978

Closed
Dr-Irv opened this issue Mar 2, 2018 · 12 comments · Fixed by #20046

Comments

@Dr-Irv
Contributor

Dr-Irv commented Mar 2, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd

mi = pd.MultiIndex.from_product([['a','b','c'],[1,2]], names=['ll','nn'])
df = pd.DataFrame({'x': [i for i in range(len(mi))], 'y' : [i*10 for i in range(len(mi))]}, index=mi)
df.rename(columns={'x':'z'})  # This works to change name of one column
df.rename_axis({'x' : 'z'}, axis='columns')  # This also works to change name of one column (but you get a deprecation warning)
df.rename(index={'nn':'zz'}) # This does not work to change name of one level of a MultiIndex
df.rename_axis({'nn' : 'zz'}, axis='index') # This does not work to change name of one level of a MultiIndex

Problem description

Ideally, if you call df.rename_axis(dict, axis='index'), only the level names specified in the dictionary would be changed in the corresponding MultiIndex; the other names and all labels would be left alone.

Expected Output

For the expression

df.rename_axis({'nn' : 'zz'}, axis='index')

the output should be:

       x   y
ll zz       
a  1   0   0
   2   1  10
b  1   2  20
   2   3  30
c  1   4  40
   2   5  50

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 5, 2018

Duplicate of #4160. A pull request is welcome.

MI rename is entirely broken.

@jreback closed this as completed Mar 5, 2018
@jreback added the Duplicate Report and MultiIndex labels Mar 5, 2018
@jreback added this to the No action milestone Mar 5, 2018
@jorisvandenbossche added the Enhancement label and removed the Duplicate Report label Mar 6, 2018
@jorisvandenbossche removed this from the No action milestone Mar 6, 2018
@jorisvandenbossche
Member

This is another issue. #4160 is about renaming labels in a MultiIndex (with rename), this issue is about renaming MultiIndex level names (with rename_index)

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 6, 2018

@jorisvandenbossche Yes, this is about changing the level names, but using rename_axis (not rename_index - that doesn't exist!)
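The distinction between labels and level names can be illustrated with a minimal sketch. It reuses the frame from the original report (with only the x column) and relies only on rename with an index mapping and rename_axis with a full list of names, both of which appear in this thread. The variable names relabeled and renamed are illustrative.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]], names=['ll', 'nn'])
df = pd.DataFrame({'x': range(len(mi))}, index=mi)

# rename(index=...) maps the labels stored inside the levels;
# the level names 'll' and 'nn' are left untouched.
relabeled = df.rename(index={'a': 'A'})

# rename_axis with a full list replaces the level names;
# the labels inside the levels are left untouched.
renamed = df.rename_axis(['ll', 'zz'], axis='index')

print(relabeled.index.names, renamed.index.names)
```

The request in this issue is for the second operation to accept a partial mapping, in the same way the first one already does.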

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 6, 2018

So in order to allow this functionality (using a mapper to change some names of a MultiIndex) now without removing the deprecated behavior, I propose to add a parameter rename_names=False that keeps the deprecated behavior, and then we flip it to rename_names=True when the deprecated behavior (passing a dict or function to change some names, which changes the labels) is removed.

Otherwise, I can't see how to add this functionality until some future version where we can officially remove the deprecated behavior. Comments from @jorisvandenbossche and @jreback welcome.

@jreback
Contributor

jreback commented Mar 6, 2018

In [6]: mi = pd.MultiIndex.from_product([['a','b','c'],[1,2]], names=['ll','nn'])
   ...: df = pd.DataFrame({'x': [i for i in range(len(mi))], 'y' : [i*10 for i in range(len(mi))]}, index=mi)
   ...: 

In [7]: df.index = df.index.set_names('baz', level=1)

In [8]: df
Out[8]: 
        x   y
ll baz       
a  1    0   0
   2    1  10
b  1    2  20
   2    3  30
c  1    4  40
   2    5  50

@Dr-Irv You can just do this. It is quite idiomatic and should be used more anyhow.
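As a small sketch of the set_names approach shown above: the call returns a new index rather than modifying the existing one, which is why the result has to be assigned back to df.index. The variable name new_index is illustrative.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]], names=['ll', 'nn'])

# set_names with level=1 changes only the second level's name
# and returns a new MultiIndex; `mi` itself keeps its names.
new_index = mi.set_names('baz', level=1)

print(list(mi.names), list(new_index.names))
```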

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 6, 2018

@jreback Yes, but then I can't do the renaming inline, which is the capability that I'd like to have.

With the current rename_axis, I can rename all the names inline, but not individual ones. In other words, in your example:

df.rename_axis(['ll','baz'],axis='index')

produces the desired result, but then you have to specify all of the names, rather than just the ones you want to change.

Just like rename allows changing only some of the labels, I'd like to have rename_axis change some of the names.
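Until rename_axis accepts a partial mapping, one possible workaround for chained code is a small helper passed to pipe. This is only a sketch: rename_level_names is a hypothetical function, not part of pandas, and it builds on the list form of rename_axis shown above.

```python
import pandas as pd


def rename_level_names(frame, mapping):
    """Hypothetical helper: rename only the index level names found in `mapping`."""
    # Keep every name that is not a key of the mapping.
    new_names = [mapping.get(name, name) for name in frame.index.names]
    # The list form of rename_axis sets all level names and returns a new frame.
    return frame.rename_axis(new_names, axis='index')


mi = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]], names=['ll', 'nn'])
df = pd.DataFrame({'x': range(len(mi))}, index=mi)

# Usable inline in a method chain; `df` keeps its original names.
result = df.pipe(rename_level_names, {'nn': 'baz'})
print(list(result.index.names))
```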

@jreback
Contributor

jreback commented Mar 6, 2018

Sure you can: use inplace=True. It's completely non-idiomatic, but you can use it.

@jreback
Contributor

jreback commented Mar 6, 2018

I am -1 on changing this, which is why I closed it.

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 6, 2018

@jreback I'm looking at this from the point of view of cleanliness and symmetry in the APIs. Let's say that I have a DataFrame with lots of columns that has come from doing lots of computation, and it is indexed on a MultiIndex with multiple levels. In a chained computation, I can write df.rename(columns=dict(...)) to rename some of the columns. From my point of view, the names of the MultiIndex are just like columns. They just happen to be the index, i.e., like primary keys in a database. So I'd like to be able to do df.rename_axis(dict(...), axis='index') to change some of the names in the index, and use that in a chained computation.

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 6, 2018

@jreback Here's an example that illustrates the need for rename_axis with a dictionary argument. In this example, there are two DataFrames (df3 and df2), and the contents of the indexes are exactly the same, except that one of the level names of the index in df2 is different. So if I try to add the two DataFrames, I get an error because the names of the index levels are different. But if I change that single name, I can then do the add.


In [2]: mi3 = pd.MultiIndex.from_product([list('AB'),list('CD'),list('EF')], names=['AB', 'CD', 'EF'])
   ...: df3 = pd.DataFrame([i for i in range(len(mi3))], index=mi3, columns=['N'])
   ...: df3
   ...:
Out[2]:
          N
AB CD EF
A  C  E   0
      F   1
   D  E   2
      F   3
B  C  E   4
      F   5
   D  E   6
      F   7

In [3]: df2 = df3.reorder_levels(['EF','CD','AB']).sort_index().reorder_levels(['AB','CD','EF'])
   ...: df2.index.names = ['AB','CD','EF1']
   ...: df2
   ...:
Out[3]:
           N
AB CD EF1
A  C  E    0
B  C  E    4
A  D  E    2
B  D  E    6
A  C  F    1
B  C  F    5
A  D  F    3
B  D  F    7

In [4]: df3 + df2
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-4-b209b0c95b4e> in <module>()
----> 1 df3 + df2

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\ops.py in f(self, other, axis, level, fill_value)
   1493
   1494         if isinstance(other, ABCDataFrame):  # Another DataFrame
-> 1495             return self._combine_frame(other, na_op, fill_value, level)
   1496         elif isinstance(other, ABCSeries):
   1497             return _combine_series_frame(self, other, na_op,

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\frame.py in _combine_frame(self, other, func, fill_value, level)
   3967
   3968     def _combine_frame(self, other, func, fill_value=None, level=None):
-> 3969         this, other = self.align(other, join='outer', level=level, copy=False)
   3970         new_index, new_columns = this.index, this.columns
   3971

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\frame.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   3010                                             method=method, limit=limit,
   3011                                             fill_axis=fill_axis,
-> 3012                                             broadcast_axis=broadcast_axis)

   3013
   3014     @Appender(_shared_docs['reindex'] % _shared_doc_kwargs)

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   6324                                      copy=copy, fill_value=fill_value,
   6325                                      method=method, limit=limit,
-> 6326                                      fill_axis=fill_axis)
   6327         elif isinstance(other, Series):
   6328             return self._align_series(other, join=join, axis=axis, level=level,

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\generic.py in _align_frame(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   6346             if not self.index.equals(other.index):
   6347                 join_index, ilidx, iridx = self.index.join(
-> 6348                     other.index, how=join, level=level, return_indexers=True)
   6349
   6350         if axis is None or axis == 1:

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\indexes\base.py in join(self, other, how, level, return_indexers, sort)
   3229             else:
   3230                 return self._join_multi(other, how=how,
-> 3231                                         return_indexers=return_indexers)
   3232
   3233         # join on the level

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\indexes\base.py in _join_multi(self, other, how, return_indexers)
   3327                              "overlapping names")
   3328         if len(overlap) > 1:
-> 3329             raise NotImplementedError("merging with more than one level "
   3330                                       "overlap on a multi-index is not "
   3331                                       "implemented")

NotImplementedError: merging with more than one level overlap on a multi-index is not implemented

So now I change the name of the mismatched index level:

In [5]: df3.index.names = ['AB','CD','EF1']
   ...: df3 + df2
   ...:
Out[5]:
            N
AB CD EF1
A  C  E     0
      F     2
   D  E     4
      F     6
B  C  E     8
      F    10
   D  E    12
      F    14

The example is convoluted, but what is happening in my application is that I have two Series. In my application, the first Series s1 has a 3-level MultiIndex, names are ['a','b','k1'], and the other Series s2 has a 3-level MultiIndex, names are ['a','b','k']. I now need to add them together. That fails (as in the example above) because the level names are different. But if I could do in my example:

df3.rename_axis({'EF' : 'EF1'}) + df2

or

df3 + df2.rename_axis({'EF1':'EF'}) 

I wouldn't have to permanently change the index of df3, which is needed elsewhere with the original index names.
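A sketch of this use case follows, under the assumption of pandas 0.24 or later, where rename_axis accepts an index keyword with a dict (the keyword form discussed in this thread, rather than the positional dict written above). For brevity, df2 is built here by reversing df3 instead of reordering levels, so it has the same rows in a different order with one level named differently.

```python
import pandas as pd

mi3 = pd.MultiIndex.from_product(
    [list('AB'), list('CD'), list('EF')], names=['AB', 'CD', 'EF']
)
df3 = pd.DataFrame({'N': range(len(mi3))}, index=mi3)

# Same rows in reverse order, with the last level named 'EF1'.
df2 = df3.iloc[::-1].rename_axis(['AB', 'CD', 'EF1'], axis='index')

# Rename only the mismatched level inline; neither frame is modified.
total = df3 + df2.rename_axis(index={'EF1': 'EF'})

print(total.sort_index())
```

Both df3 and df2 keep their original level names after the addition, so they remain usable elsewhere.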

@jorisvandenbossche
Member

Jeff, you closed it as a duplicate (which it wasn't after further thought)

@Dr-Irv I am personally in general fine with the idea. But, I don't really like the introduction of a temporary keyword like rename_names=True.
But I was thinking of another way we might get around the backwards compatibility / deprecation. For reindex and rename we harmonized the labels, axis=1/0 idiom and the index=labels, columns=labels idiom (before each function did one of them, now both accept both ways to specify the columns or index to change). rename_axis currently uses the axis keyword, but we could also introduce here the index and columns keywords, and for those we can directly use the new behaviour. So something like df.rename_axis(index={'EF1':'EF'}) could then work.

@jreback The reason I am fine with this is that it feels consistent with how we do things in other places in pandas. E.g., rename with a dict lets you rename specific values of the columns/index without needing to specify them all; here it would be very similar, but for the columns/index level names.
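The keyword form proposed in this comment can be sketched as follows, assuming pandas 0.24 or later, where rename_axis accepts the index keyword with a dict. Only the names listed in the mapping change; the frame is the one from the original report, reduced to the x column.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]], names=['ll', 'nn'])
df = pd.DataFrame({'x': range(len(mi))}, index=mi)

# Only 'nn' is listed in the mapping, so 'll' keeps its name.
renamed = df.rename_axis(index={'nn': 'zz'})

print(list(renamed.index.names))
```

Because the call returns a new frame, it can sit in the middle of a method chain without touching df.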

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 8, 2018

@jorisvandenbossche I implemented your suggestion in my pull request #20046

@jreback added this to the 0.24.0 milestone Sep 26, 2018