sort_index doesn't work after concat #15622

Closed
mwiebusch78 opened this issue Mar 8, 2017 · 3 comments

@mwiebusch78

The sort_index method does not seem to work properly if the DataFrame was created with concat. See this example:

>>> df = pd.DataFrame(index=['a', 'b'])
>>> pd.concat([df, df], keys=[0.8, 0.5]).sort_index()
Empty DataFrame
Columns: []
Index: [(0.8, a), (0.8, b), (0.5, a), (0.5, b)]

The 0.5 tuples should come before the 0.8 ones. Everything works fine if I create the multi-index from a product:

>>> pd.DataFrame(index=pd.MultiIndex.from_product([[0.8, 0.5], ['a', 'b']]))
Empty DataFrame
Columns: []
Index: [(0.8, a), (0.8, b), (0.5, a), (0.5, b)]
>>> pd.DataFrame(index=pd.MultiIndex.from_product([[0.8, 0.5], ['a', 'b']])).sort_index()
Empty DataFrame
Columns: []
Index: [(0.5, a), (0.5, b), (0.8, a), (0.8, b)]

I'm on pandas version 0.18.1.

@chris-b1
Contributor

chris-b1 commented Mar 8, 2017

Thanks for the report. xref #14015

There are two issues here. The first, which is the linked issue above, is that sort_index sorts by the order of the levels, not necessarily lexicographically. In other words:

df = pd.DataFrame(index=pd.MultiIndex(levels=[[0.8, 0.5], ['a', 'b']], labels=[[0, 1], [0, 1]]))

df.index
Out[105]: 
MultiIndex(levels=[[0.8, 0.5], ['a', 'b']],
           labels=[[0, 1], [0, 1]])


df.sort_index()
Out[106]: 
Empty DataFrame
Columns: []
Index: [(0.8, a), (0.5, b)]

The specific problem here (which could be changed) is that concat doesn't lexicographically sort the levels when constructing the resulting MultiIndex. Most other methods of constructing a MultiIndex do sort the levels, so I think that change would be consistent.
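A rough stdlib-only model may help show why the level order matters. This is a hypothetical sketch, not pandas' actual code: a MultiIndex is treated as one list of values per level plus integer codes, and sorting "by the levels" means sorting rows by their codes. Building the first level in order of first appearance (what the comment above describes concat doing with keys=[0.8, 0.5]) versus in sorted order (what from_product does in the reporter's second example) changes the result. The helper names `build` and `sort_by_codes` are illustrative only.

```python
# Hypothetical toy model of a MultiIndex: one list of level values per
# position, plus integer codes pointing into those levels. Not pandas internals.


def build(tuples, sort_levels):
    """Encode tuples as (levels, codes); optionally sort each level's values."""
    n = len(tuples[0])
    levels = []
    for i in range(n):
        seen = list(dict.fromkeys(t[i] for t in tuples))  # first-appearance order
        levels.append(sorted(seen) if sort_levels else seen)
    codes = [[levels[i].index(t[i]) for t in tuples] for i in range(n)]
    return levels, codes


def sort_by_codes(levels, codes):
    """Sort rows by their integer codes, i.e. by the order of the levels."""
    rows = sorted(zip(*codes))
    return [tuple(levels[i][c] for i, c in enumerate(row)) for row in rows]


data = [(0.8, "a"), (0.8, "b"), (0.5, "a"), (0.5, "b")]

# Levels kept in first-appearance order: 0.8 gets code 0, so it sorts first.
print(sort_by_codes(*build(data, sort_levels=False)))
# [(0.8, 'a'), (0.8, 'b'), (0.5, 'a'), (0.5, 'b')]

# Levels sorted at construction: 0.5 gets code 0, so the result is lexicographic.
print(sort_by_codes(*build(data, sort_levels=True)))
# [(0.5, 'a'), (0.5, 'b'), (0.8, 'a'), (0.8, 'b')]
```

The rows are identical in both cases; only the encoding differs, and a code-based sort inherits whatever order the levels were built in.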

@mwiebusch78
Author

mwiebusch78 commented Mar 8, 2017

Is there another method which re-orders a level of a MultiIndex? I've tried sortlevel, but that doesn't have an effect either.

>>> df = pd.DataFrame(index=pd.MultiIndex(levels=[[0.8, 0.5], ['a', 'b']], labels=[[0, 1], [0, 1]]))
>>> df.sortlevel(0).index
MultiIndex(levels=[[0.8, 0.5], ['a', 'b']],
           labels=[[0, 1], [0, 1]])
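Re-ordering a level without changing the rows amounts to sorting that level's values and remapping the codes so each row still decodes to the same tuple. Below is a hypothetical stdlib-only sketch of that operation on the levels and labels from the example above; `sort_levels` and `decode_rows` are illustrative names, not pandas APIs.

```python
# Hypothetical sketch: sort each level's values and remap the integer codes so
# every row still decodes to the same tuple. Not a pandas API.


def sort_levels(levels, codes):
    new_levels, new_codes = [], []
    for level, level_codes in zip(levels, codes):
        # Old positions within this level, ordered by the value stored there.
        order = sorted(range(len(level)), key=lambda k: level[k])
        remap = {old: new for new, old in enumerate(order)}
        new_levels.append([level[k] for k in order])
        new_codes.append([remap[c] for c in level_codes])
    return new_levels, new_codes


def decode_rows(levels, rows):
    """Turn tuples of codes back into tuples of values."""
    return [tuple(levels[i][c] for i, c in enumerate(row)) for row in rows]


levels = [[0.8, 0.5], ["a", "b"]]
codes = [[0, 1], [0, 1]]

new_levels, new_codes = sort_levels(levels, codes)
print(new_levels, new_codes)  # [[0.5, 0.8], ['a', 'b']] [[1, 0], [0, 1]]

# The rows themselves are unchanged, in the same positions.
print(decode_rows(new_levels, zip(*new_codes)))  # [(0.8, 'a'), (0.5, 'b')]

# Sorting by the remapped codes now yields lexicographic order.
print(decode_rows(new_levels, sorted(zip(*new_codes))))  # [(0.5, 'b'), (0.8, 'a')]
```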

@chris-b1
Contributor

chris-b1 commented Mar 8, 2017

You can call .sort_values on the MultiIndex itself and pass the result to reindex.

df.reindex(index=df.index.sort_values())
Out[156]: 
Empty DataFrame
Columns: []
Index: [(0.5, b), (0.8, a)]
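The workaround sidesteps the level order because, as the output above shows, sorting the MultiIndex values compares the actual tuples. A hypothetical toy version of the same idea outside pandas: compute the row order from the decoded values, then apply that order to the index and to any row data, which is roughly the effect of passing the sorted index to reindex. The `row_data` payload is made up for illustration.

```python
# Hypothetical toy: order rows by their decoded tuple values, ignoring how the
# levels happen to be ordered. Not pandas internals.

levels = [[0.8, 0.5], ["a", "b"]]
codes = [[0, 1], [0, 1]]
row_data = ["first", "second"]  # hypothetical per-row payload

# Decode each row's codes into its actual tuple of values.
tuples = [tuple(levels[i][c] for i, c in enumerate(row)) for row in zip(*codes)]
# Sort row positions by value, then apply that permutation everywhere.
order = sorted(range(len(tuples)), key=lambda k: tuples[k])

sorted_index = [tuples[k] for k in order]
sorted_data = [row_data[k] for k in order]
print(sorted_index)  # [(0.5, 'b'), (0.8, 'a')]
print(sorted_data)   # ['second', 'first']
```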

@jreback added this to the 0.20.0 milestone Mar 14, 2017
@jreback closed this as completed in f478e4f Apr 7, 2017