
sort_index(axis=1) doesn't sort integers #15355

Closed
ohadle opened this issue Feb 9, 2017 · 3 comments

ohadle commented Feb 9, 2017

Original:

In [45]: a
Out[45]:
            commits
                 -1       -3       -2
auth_date
2013-01-04  42217.0  19804.0  37007.0
2013-01-05  44793.0  37007.0  42217.0

In [46]: a.sort_index(axis=1)
Out[46]:
            commits
                 -1       -3       -2
auth_date
2013-01-04  42217.0  19804.0  37007.0
2013-01-05  44793.0  37007.0  42217.0

In [47]: a.sort_index(axis=1, level=1)
Out[47]:
            commits
                 -1       -3       -2
auth_date
2013-01-04  42217.0  19804.0  37007.0
2013-01-05  44793.0  37007.0  42217.0

In [48]: a.sort_index(axis=1, ascending=False)
Out[48]:
            commits
                 -2       -3       -1
auth_date
2013-01-04  37007.0  19804.0  42217.0
2013-01-05  42217.0  37007.0  44793.0

However, I couldn't reproduce this with a copy-pastable version:

In [49]: print a.to_dict()
{('commits', -3): {Timestamp('2013-01-05 00:00:00', freq='D'): 37007.0, Timestamp('2013-01-04 00:00:00', freq='D'): 19804.0}, ('commits', -1): {Timestamp('2013-01-05 00:00:00', freq='D'): 44793.0, Timestamp('2013-01-04 00:00:00', freq='D'): 42217.0}, ('commits', -2): {Timestamp('2013-01-05 00:00:00', freq='D'): 42217.0, Timestamp('2013-01-04 00:00:00', freq='D'): 37007.0}}

In [50]: from pandas import Timestamp, DataFrame

In [51]: b = DataFrame({
    ...:     ('commits', -3): {Timestamp('2013-01-05 00:00:00', freq='D'): 37007.0, Timestamp('2013-01-04 00:00:00', freq='D'): 19804.0},
    ...:     ('commits', -1): {Timestamp('2013-01-05 00:00:00', freq='D'): 44793.0, Timestamp('2013-01-04 00:00:00', freq='D'): 42217.0},
    ...:     ('commits', -2): {Timestamp('2013-01-05 00:00:00', freq='D'): 42217.0, Timestamp('2013-01-04 00:00:00', freq='D'): 37007.0}})

In [52]: b
Out[52]:
            commits
                 -3       -2       -1
2013-01-04  19804.0  37007.0  42217.0
2013-01-05  37007.0  42217.0  44793.0

In [54]: b.sort_index(axis=1, ascending=False)
Out[54]:
            commits
                 -1       -2       -3
2013-01-04  42217.0  37007.0  19804.0
2013-01-05  44793.0  42217.0  37007.0

Problem description

I expected sort_index(axis=1) to sort the MultiIndex columns, but the integer level stays in its original order.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 28.6.1
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.9.1
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

jorisvandenbossche (Member) commented

Can you show the output of a.columns? I suspect this is due to the order of the labels in the levels, which sort_index uses to sort.

jorisvandenbossche added the Needs Info label Feb 9, 2017

ohadle commented Feb 9, 2017

Looks like it.

In [16]: a.columns
Out[16]:
MultiIndex(levels=[[u'commits'], [-1, -3, -2]],
           labels=[[0, 0, 0], [0, 1, 2]])

jorisvandenbossche (Member) commented

Indeed. We already have a few issues about this, and we should discuss further what we want to do about it, because this behavior can be really confusing and unexpected.

Some discussion is here: #14015, #13431, #14672
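
For anyone wanting a copy-pastable version of the situation diagnosed above, here is a minimal sketch that rebuilds the reported column index directly from its levels and integer codes, then orders the columns by their actual values. It assumes a newer pandas than the 0.19.2 in the report, where the MultiIndex argument shown as labels in the output above is named codes. The names cols, df, order and sorted_df are illustrative only. The behavior of sort_index on such an index may differ between pandas versions, so the sketch deliberately avoids relying on it.

```python
import numpy as np
import pandas as pd

# Rebuild the reported column index: level 1 is stored in the
# unsorted order [-1, -3, -2], and the codes simply point at
# positions 0, 1, 2 of that level.
cols = pd.MultiIndex(
    levels=[["commits"], [-1, -3, -2]],
    codes=[[0, 0, 0], [0, 1, 2]],  # called `labels` in older pandas
)

df = pd.DataFrame(
    [[42217.0, 19804.0, 37007.0],
     [44793.0, 37007.0, 42217.0]],
    index=pd.to_datetime(["2013-01-04", "2013-01-05"]),
    columns=cols,
)

# Order the columns by the values of the integer level rather than
# by the stored codes, so the result does not depend on how the
# levels happen to be ordered.
order = np.argsort(df.columns.get_level_values(1).to_numpy())
sorted_df = df.iloc[:, order]

print(df.columns.levels[1].tolist())
print(sorted_df.columns.get_level_values(1).tolist())
```

Sorting positionally with iloc sidesteps the level order entirely; it only handles the single integer level here, so a frame with several distinct top-level column groups would need a sort key covering every level.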

jorisvandenbossche added the MultiIndex label and removed the Needs Info label Feb 9, 2017
jorisvandenbossche added this to the No action milestone Feb 9, 2017