BUG: unstack() does not always sort index in 0.23 #21675
Comments
As mentioned in the first linked comment, there isn't any explicit ordering going on in an unstack operation, and I think it's preferable NOT to do that (i.e. the previous behavior was wrong). You as the end user can easily sort after the fact if you want to, but you wouldn't as easily be able to maintain the original ordering if this operation implicitly sorted for you. I certainly see where the documentation is misleading on that though. Any interest in submitting a PR to update the docs and add a test case to ensure this behavior doesn't regress going forward?
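To make the "sort after the fact" suggestion concrete, here is a minimal sketch. The data and the variable names (`s`, `wide`, `row_order`, `col_order`) are made up for illustration and are not the example from the original report. It sorts both axes explicitly after unstacking, and also shows one way to get back the order in which labels first appeared in the input, without depending on the order `unstack()` itself returns.

```python
import pandas as pd

# Hypothetical data: a Series whose MultiIndex is deliberately unsorted.
index = pd.MultiIndex.from_tuples(
    [(3, "b"), (1, "a"), (2, "b"), (1, "b"), (3, "a")],
    names=["row", "col"],
)
s = pd.Series([10, 20, 30, 40, 50], index=index)

# Pivot the "col" level into columns; the (2, "a") cell becomes NaN.
wide = s.unstack("col")

# Option 1: sort both axes explicitly instead of relying on unstack().
wide_sorted = wide.sort_index(axis=0).sort_index(axis=1)

# Option 2: restore the order of first appearance in the input.
row_order = s.index.get_level_values("row").unique()
col_order = s.index.get_level_values("col").unique()
wide_original = wide.reindex(index=row_order, columns=col_order)
```

Either form gives a deterministic layout regardless of which pandas version produced `wide`.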
@WillAyd, #15105 asks for a new I suggest that the longstanding and clearly documented behavior (sorted unstack) should be restored, and then #15105 can continue to explore new ideas such as adding an option to not sort. If the default behavior is to be changed, a FutureWarning could be used to help users transition. You have marked this as a Docs issue. But it is a regression in Pandas 0.23, and a functional bug in the code, not a cosmetic one in the docs.
Hi, @jzwinck @WillAyd
will result in
The column order has been kept consistent, but in pandas 0.23.0 removing empty columns is part of Unstacker.__init__()
and here is the issue: this method will place rows and columns in the order of provided labels, e.g.
will result in
Rows are no longer consistent because labels[0] starts from 4, but the columns look sorted because labels[1] is sorted. I think this behaviour really is a regression and should be fixed as a bug. This code was originally introduced in #18460 as part of a bug fix. I would like to work on this issue, and I'd appreciate any recommendations. As of now I see the following options:
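To illustrate the point about label order made in the comment above: a MultiIndex stores, for each level, a set of unique values plus integer positions into those values. The sketch below uses made-up data and is not the code from the comment. It assumes pandas 0.24 or later, where the attribute called `labels` in 0.23 was renamed to `codes`. It shows that the level values can be sorted while the integer positions follow the order of the input.

```python
import pandas as pd

# Hypothetical index built from tuples that are not in sorted order.
mi = pd.MultiIndex.from_tuples([(4, "y"), (1, "x"), (4, "x")])

# Unique values per level; from_tuples stores these sorted.
levels0 = list(mi.levels[0])
levels1 = list(mi.levels[1])

# Integer positions into the levels, in input order
# (named `labels` in pandas 0.23, `codes` from 0.24 on).
codes0 = list(mi.codes[0])
codes1 = list(mi.codes[1])

print(levels0, codes0)
print(levels1, codes1)
```

Any code that lays out rows and columns by walking the integer positions rather than the level values will therefore reflect the input order of each level independently, which matches the description of rows looking unsorted while columns look sorted.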
@jreback Would you mind weighing in on @deisdenis's question just above? Should we rollback from using
a lot has changed since 0.23
@jreback The behaviour is still the same as 0.23 on master
Hi @deisdenis, I have also looked into this issue. It seems that the function descriptor of
Hi @jiangyue12392, sounds cool, I like your idea! I'll dig into it.
@deisdenis Did you make any progress?
Hey @jzwinck, I'm sorry, but I don't think we need to complicate
Code Sample
Problem description
In Pandas 0.20, 0.21, and 0.22, this gave the expected result:
But in Pandas 0.23, the result is not sorted:
The documentation says "The level involved will automatically get sorted", and while I've seen the explanation of confusing implementation details leaking out in #15105 and some other outright bugs in #9514, this seems to be a different bug, and a regression.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Linux
machine: x86_64
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.1
numpy: 1.13.1