BUG: multi-indexing sorting on axis=1 on >0 levels #14015

WindJunkie · 2016-08-16T21:50:27Z

In a DataFrame with a MultiIndex, sorting on a level that holds date values does nothing (the order remains unchanged). This happens for both row and column indexes. In my case I start with a string index and convert it to a datetime index; I have not tried starting with datetime values.

Code Sample, a copy-pastable example if possible

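import pandas as pd

# Two columns whose second level holds date strings, deliberately out of order.
df = pd.DataFrame([[1, 2], [6, 7]])
df.columns = pd.MultiIndex.from_tuples([(0, '8/11/2016 12:00:00 AM'), (0, '8/9/2016 12:00:00 AM')], names=['l1', 'Date'])
# Convert the string level to datetimes in place (API as of pandas 0.18.1).
df.columns.set_levels(df.columns.levels[1].to_datetime(), level=1, inplace=True)
df.sort_index(axis=1, level=1)  # columns come back in their original order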

Expected Output

The expected output has the columns ordered by date.

Date  2016-08-09 00:00:00  2016-08-11 00:00:00
0                       2                    1
1                       7                    6

The actual output has the columns in their original order (not even lexicographically sorted).

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None


jreback · 2016-08-17T10:54:28Z

xref #13431

jreback · 2016-08-17T10:55:03Z

I seem to remember this exact issue, but can't find ATM.

chris-b1 · 2016-08-17T13:24:47Z

As discussed in some of those issues, MultiIndex sorting is based on the ordering in the levels. The levels are sorted on construction, but not on re-assignment. So a couple of workarounds would be to construct the MultiIndex with the converted levels, or to use the index's sort_values() method, which sorts by values.

In [22]: df.reindex(columns = df.columns.sort_values())
Out[22]: 
l1            0           
Date 2016-08-09 2016-08-11
0             2          1
1             7          6

jorisvandenbossche · 2016-08-17T22:56:07Z

"MultiIndex sorting is based on the ordering in the levels."

Is this something we could consider changing?
In my opinion, this behaviour does not make much sense from a user perspective. If you want such behaviour, you can now explicitly use a CategoricalIndex. For most users of MultiIndex, the fact that it is implemented with labels/levels (codes/categories) is only an implementation detail.

shoyer · 2016-08-17T23:20:52Z

Rather than basing MultiIndex sorting on something other than sorted levels, what about requiring that each level always be sorted? I believe this is already done by default in every case where pandas constructs the MultiIndex levels, so this would only break cases where levels are provided explicitly in the MultiIndex constructor or set using set_levels.
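The difference between constructed and explicitly supplied levels can be seen in a small sketch. It assumes pandas 0.24 or later, where the integer positions are exposed as `codes` (this thread calls them `labels`); the example values are made up.

```python
import pandas as pd

# Built by pandas: each level is sorted, and the codes point into it.
built = pd.MultiIndex.from_tuples([("b", 1), ("a", 2)])
built_level = list(built.levels[0])      # sorted level values
built_codes = list(built.codes[0])       # positions into that level

# Levels passed explicitly are stored exactly as given, sorted or not.
explicit = pd.MultiIndex(levels=[["b", "a"], [1, 2]], codes=[[0, 1], [0, 1]])
explicit_level = list(explicit.levels[0])

# set_levels swaps the level values but leaves the codes untouched,
# so the new level is not re-sorted either.
relabeled = built.set_levels(["z", "y"], level=0)
relabeled_level = list(relabeled.levels[0])
```

Here `built` and `explicit` hold the same tuples in the same order, but only `built` has a sorted first level.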

chris-b1 · 2016-08-17T23:24:49Z

That seems fairly reasonable, although the sorting behavior is documented and I'm sure it would break somebody's code. It is probably not too painful to detect/deprecate in 0.19 and fix in 0.20 / 1.0?

shoyer · 2016-08-17T23:39:26Z

Yes, assuredly someone relies on the existing behavior, but we could probably deprecate it. In my opinion the fact that levels and labels can be sorted differently is a major source of confusion.

One option with set_levels would be to automatically factorize new levels and change the underlying integer codes, too. That's probably not a good idea if someone explicitly wrote MultiIndex(levels, labels), though.

toobaz · 2016-08-21T12:28:28Z

@chris-b1 : are you referring to the sentence "the present implementation of MultiIndex requires that the labels be sorted for some of the slicing"? Actually, I think most people (including me, until a few minutes ago) interpret "the labels be sorted" as "the MultiIndex be sorted", not "the initialization arrays for .levels be sorted". And if instead they give the second interpretation, well then they should already be following the design proposed by @shoyer .

So if I'm not missing anything, it would be possible and great to have the following behaviour already in 0.19: for each component of .levels passed at initialization, if it is not sorted, sort it, rearranging the corresponding component of .labels.
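The proposed rearrangement can be sketched by hand. The helper name `with_sorted_levels` is hypothetical and not a pandas API, and the sketch assumes pandas 0.24 or later, where `.labels` is called `.codes`.

```python
import numpy as np
import pandas as pd

def with_sorted_levels(mi):
    """Return an equal MultiIndex whose levels are each sorted (hypothetical helper)."""
    new_levels, new_codes = [], []
    for level, codes in zip(mi.levels, mi.codes):
        order = level.argsort()                  # positions that sort this level
        remap = np.empty(len(order), dtype=np.intp)
        remap[order] = np.arange(len(order))     # old position -> new position
        codes = np.asarray(codes)
        new_levels.append(level.take(order))
        # -1 marks a missing value and must stay -1 after remapping.
        new_codes.append(np.where(codes == -1, -1, remap[codes]))
    return pd.MultiIndex(levels=new_levels, codes=new_codes, names=mi.names)

unsorted = pd.MultiIndex(levels=[["b", "a"], [1, 2]], codes=[[0, 1], [0, 1]])
fixed = with_sorted_levels(unsorted)
```

The tuples in `fixed` are unchanged; only the internal level order and the codes pointing into it differ.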

By the way: another case in which bad things currently happen is in a .join() of unsorted dataframes:

In [1]: import pandas as pd
   ...: labels = ['c', 'b']
   ...: comp = pd.DataFrame(index=pd.MultiIndex.from_product([labels, labels],
   ...:                                names =['uid', 'oth']))
   ...: idists = pd.Series(0, index=labels, name='Charles')
   ...: idists.index.name = 'uid'
   ...: cc = comp.join(idists, how='inner').sort_index()
   ...: len(cc.index.levels), cc.index.lexsort_depth, cc.index.is_monotonic
   ...: 
Out[1]: (2, 0, True)

chris-b1 · 2016-08-21T12:54:32Z

I meant this note at the end of the paragraph, though I agree that isn't super clear either.

"... labels are grouped and sorted by the original ordering of the associated factor at that level. Note that this does not necessarily mean the labels will be sorted lexicographically!"

jorisvandenbossche · 2016-08-21T14:01:15Z

Yes, it is rather hidden in the docs that sorting does sort according to the order of the levels.

I am +1 on changing the behaviour of sort to actually sort. But if we do this by sorting the levels (on initialization, or when sorting), how many people would rely on the actual order of the levels? E.g. if you use set_levels, you implicitly rely on the order ...

shoyer · 2016-08-22T00:57:11Z

Looking at the MultiIndex docs, it looks like sort_index() was originally written to ensure that a MultiIndex is "lexsorted" in the way that MultiIndex needs for efficient operations (sorted integer labels a.k.a. codes #13443). I would be much happier using something more explicit like sort_index_codes() for that, though, and reserving sort_index() for actually sorting the index.

I'm less certain now that it's the right thing to always require that levels be sorted. There are some cases where this lets you do different types of indexing efficiently, and right now we expose most of the MultiIndex implementation directly as public API, so people are indeed probably making use of this.

toobaz · 2016-08-23T11:38:42Z

While I understand that both behaviours can be useful, to my eyes it is a bit unnecessary to provide another way (than Categoricals) to impose an order on labels in a level. I'd rather have nice parameters that, for instance, create the required Categoricals on the fly when creating a MultiIndex.from, if a specific ordering is desired.
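Imposing a custom order through a Categorical could look like the sketch below. The category names and values are made up for illustration, and it assumes a pandas version in which `MultiIndex.from_arrays` keeps the categorical dtype of a level.

```python
import pandas as pd

# An ordered categorical carries the desired order: low < mid < high.
size = pd.Categorical(
    ["low", "high", "mid"],
    categories=["low", "mid", "high"],
    ordered=True,
)
index = pd.MultiIndex.from_arrays([size, [1, 2, 3]], names=["size", "n"])
s = pd.Series([10, 20, 30], index=index)

# Sorting follows the category order rather than alphabetical order.
ordered = s.sort_index()
sizes_in_order = list(ordered.index.get_level_values("size"))
```

The order lives in the data type of the level, so it is stated explicitly instead of depending on how the levels happened to be assigned.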

But if supporting the two different behaviours is considered worth the effort, then maybe a parameter (e.g. codes_sort=False) in sort_index() is sufficient, rather than adding a new method?

What I strongly agree on is that the default behaviour should be changed, despite the backwards incompatibility.


jreback added the Bug, Indexing, MultiIndex, and Difficulty Advanced labels Aug 17, 2016

jreback added this to the Next Major Release milestone Aug 17, 2016

jreback changed the title ~~Pandas does not sort on date values in multiindex~~ BUG: multi-indexing sorting on axis=1 on >0 levels Aug 17, 2016

shoyer mentioned this issue Aug 22, 2016

WIP: add sort_levels to MultiIndex.from_product #14062

Closed

3 tasks

jorisvandenbossche modified the milestones: 0.20.0, Next Major Release Aug 30, 2016

shoyer mentioned this issue Nov 16, 2016

Add an option not to sort levels in MultiIndex.from_product? #14672

Open

jorisvandenbossche mentioned this issue Nov 28, 2016

ENH: support kind and na_position kwargs in Series.sort_index #14445

Merged

brandonmburroughs mentioned this issue Dec 1, 2016

na_position doesn't work for sort_index() with MultiIndex #14784

Closed

jreback mentioned this issue Jan 9, 2017

When running set_index on a categorical to a MultiIndex, it gets coerced to a string. #15058

Closed

jorisvandenbossche mentioned this issue Feb 9, 2017

sort_index(axis=1) doesn't sort integers #15355

Closed

chris-b1 mentioned this issue Mar 8, 2017

sort_index doesn't work after concat #15622

Closed

jreback added a commit to jreback/pandas that referenced this issue Mar 15, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

5155ebb

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015

jreback mentioned this issue Mar 15, 2017

BUG: DataFrame.sort_index broken if not both lexsorted and monotonic in levels #15694

Closed

jreback added a commit to jreback/pandas that referenced this issue Mar 15, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

e7c0c14

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 xref pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 15, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

72bc7d0

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 xref pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

698e05f

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

a6f352c

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

54c6e93

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

1a9be09

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

685dadc

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 16, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

efaf233

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 19, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

dd85330

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 22, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

933710c

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 22, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

53c7e6c

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 22, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

083b4b0

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 23, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

7784ec0

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Mar 25, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

09f842d

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Apr 2, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

ae3777e

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Apr 4, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

84c8999

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Apr 4, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

53b844d

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Apr 6, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

a1390ce

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback added a commit to jreback/pandas that referenced this issue Apr 7, 2017

BUG: construct MultiIndex identically from levels/labels when concatting

47c67d6

closes pandas-dev#15622 closes pandas-dev#15687 closes pandas-dev#14015 closes pandas-dev#13431

jreback closed this as completed in f478e4f Apr 7, 2017
