ENH: change sort behavior in stack() so it's user-directed #35343

Closed
pmberkeley opened this issue Jul 19, 2020 · 11 comments
Labels
Bug; Duplicate Report (Duplicate issue or pull request); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@pmberkeley

pmberkeley commented Jul 19, 2020

this = this.sort_index(level=level_to_sort, axis=1)

Is your feature request related to a problem?
I need a MultiIndex DataFrame to stack in a specific order. I also need the columns to fail the this.columns.is_lexsorted() test (the duplicated column names are how I'm merging the data while stacking it, as a workaround for not being able to get a linspace result out of the rolling method). Currently, the sort_index call reorders the DataFrame alphabetically, which is not the order I need it to stack in.

Describe the solution you'd like
One of the following (preferably not the last):

  1. be able to choose the index/column order by passing in a list or set of lists
  2. be able to choose to keep the input order of the stacking indexes/columns (or to have it default to keeping the input order; this seems the easiest)
  3. receive a warning or be provided with documentation that indicates all indexes/columns will be reordered alphabetically if they have any duplicated labels, and to name their secondary sorting indexes/columns accordingly.
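
Option 2 can be approximated after the fact. The following is a minimal sketch, not pandas API: `stack_keep_order` is a hypothetical helper name, and it assumes a single-level, unique row index. It stacks as usual, then reindexes so the stacked labels follow their first-seen column order, and drops rows that are entirely NaN (which is what the legacy stack() does by default).

```python
import pandas as pd


def stack_keep_order(df, level=0):
    """Stack one column level, keeping that level's labels in column order.

    Hypothetical helper; assumes a single-level, unique row index.
    """
    # Labels of the stacked level, in the order they first appear in the columns.
    order = df.columns.get_level_values(level).unique()
    stacked = df.stack(level=level)
    # Every (row, label) pair, with labels in the original column order.
    target = pd.MultiIndex.from_product(
        [df.index, order], names=list(stacked.index.names)
    )
    # Rows that are entirely NaN are dropped, as the legacy stack() does by default.
    return stacked.reindex(target).dropna(how="all")


# A small frame whose top column level is ordered B, then A.
B = pd.DataFrame({"x": [0.0, 0.1, 0.2], "y": [0.0, 0.375, 0.75]})
A = pd.DataFrame({"x": [0.05, 0.15], "y": [0.1875, 0.5625]})
BA = pd.concat([B, A], axis=1)
BA.columns = pd.MultiIndex(
    levels=[["B", "A"], ["x", "y"]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]]
)

ordered = stack_keep_order(BA, level=0)
print(ordered)
```

Because the target order is built from the columns themselves, the helper does not depend on whether the installed pandas version sorts inside stack().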

API breaking implications
Unsure about option 1, but the default suggested in option 2 should be fine.

Describe alternatives you've considered
This is already a workaround for the rolling method not working with non-scalar outputs. I'll be renaming the columns in the short term, but this seems hacky.
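
The renaming workaround might look roughly like this. It is an illustration under assumptions rather than a recommended pattern: the zero-padded prefixes and the names `to_sortable` and `from_sortable` are made up for the example, and the top-level labels are assumed to be strings.

```python
import pandas as pd

# A small frame whose top column level is ordered B, then A.
B = pd.DataFrame({"x": [0.0, 0.1, 0.2], "y": [0.0, 0.375, 0.75]})
A = pd.DataFrame({"x": [0.05, 0.15], "y": [0.1875, 0.5625]})
BA = pd.concat([B, A], axis=1)
BA.columns = pd.MultiIndex(
    levels=[["B", "A"], ["x", "y"]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]]
)

# Hypothetical prefixes: a zero-padded position makes alphabetical order
# coincide with the original column order.
labels = BA.columns.get_level_values(0).unique()
to_sortable = {label: f"{i:03d}_{label}" for i, label in enumerate(labels)}
from_sortable = {prefixed: label for label, prefixed in to_sortable.items()}

stacked = (
    BA.rename(columns=to_sortable, level=0)  # B -> 000_B, A -> 001_A
    .stack(level=0)                          # any sorting now keeps B before A
    .rename(index=from_sortable, level=1)    # restore the original labels
)
print(stacked)
```

The round trip through temporary names is what makes this feel hacky: the data has to be relabeled only to satisfy a sort that the caller never asked for.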

Additional context
This issue is closely related to the functionality of rolling. The stack method is being used/suggested as a workaround for the inability to use rolling to output linear results. MultiIndex is the pandas way of dealing with additional dimensionality of data. When the rolling method doesn't play nice with adding a dimension to the data set, the user ends up trying to recreate a rolling equivalent via stack (and other means). For stack and other methods to work well in this context, sorting behavior has to be explicit.

@pmberkeley changed the title from "Unpredictable/counterintuitive stack() behavior caused by sort_index()" to "ENH: change sort behavior in stack() so it's user-directed or more intuitive" Jul 19, 2020
@pmberkeley changed the title from "ENH: change sort behavior in stack() so it's user-directed or more intuitive" to "ENH: change sort behavior in stack() so it's user-directed" Jul 19, 2020
@jreback
Contributor

jreback commented Jul 19, 2020

pls show an example that reproduces

@pmberkeley
Author

pmberkeley commented Jul 19, 2020

import pandas as pd

# Two frames of different lengths that share the column names 'x' and 'y'.
B = pd.DataFrame({'x': [0., 0.1, 0.2], 'y': [0., 0.375, 0.75]})
A = pd.DataFrame({'x': [0.05, 0.15], 'y': [0.1875, 0.5625]})

# Place them side by side; the top column level is deliberately ordered B, then A.
BA = pd.concat([B, A], axis=1)
col = pd.MultiIndex(levels=[['B', 'A'], ['x', 'y']], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
BA.columns = col

# As the output shows, this stacks in sorted (A, B) order rather than the desired (B, A) order.
AB = BA.stack(level=0)

Output:

BA 
   B            A        
     x      y     x       y
0  0.0  0.000  0.05  0.1875
1  0.1  0.375  0.15  0.5625
2  0.2  0.750   NaN     NaN

AB

        x       y
0 A  0.05  0.1875
  B  0.00  0.0000
1 A  0.15  0.5625
  B  0.10  0.3750
2 B  0.20  0.7500

@jreback
Contributor

jreback commented Jul 19, 2020

pls show versions as instructed

@pmberkeley
Author

I opened this request by quoting the line of code. I didn't get any instructions; I just copied the format I saw other people use.

The version is quoted above, in the link to the source code. It's pretty clearly described how df.sort_index() works, and sort_index() is used in the source code linked to above.

@jreback
Contributor

jreback commented Jul 19, 2020

there are pretty clear instructions when u open an issue

w/o showing versions
this is not very informative

@simonjayhawkins
Member

Thanks @pmberkeley for the report. One of the properties of a DataFrame is implicit ordering.

from https://arxiv.org/abs/2001.00888

an intuitive data model that embraces an implicit ordering on both rows and columns and treats them symmetrically

so I would regard this as a bug rather than an enhancement.

I think the OP is more detailed than necessary; a small example of stack changing the order of the index elements would probably be enough.

@pmberkeley
Author

@simonjayhawkins thanks for the info! I was surprised by the behavior, but didn't know that implicit ordering was the expectation. Do I need to change the title to "BUG" (or whatever it is supposed to be)?

@pmberkeley
Author

@jreback I recommend you go to the source code and try out the github functionality that lets you open an issue directly from a line of code. It provides you with zero instructions.

@pmberkeley
Author

This seems to be related to #18265 and #9514, in that the fixes proposed for these appear to be causing the broken behavior.

@simonjayhawkins
Member

I think this issue is covered by #15105, so closing as duplicate. lmk if I misunderstood something.

@simonjayhawkins added the labels Bug; Duplicate Report (Duplicate issue or pull request); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) Jul 19, 2020
@pmberkeley
Author

@simonjayhawkins nope, that's exactly it. Thanks!

3 participants