Removing Unused Levels in MultiIndex with NaN values corrupts Index #18616
Comments
cc @toobaz anything in here we should add to tests?
@jlandercy note that we never have
Yes I guess dealing with So, if I understand, when I ship v0.22.0 the problem will vanish. Thank you for your work, Pandas is a great tool.
In principle, NaNs in indexes should behave just like normal values with respect to comparison (differently from NaNs in values). However, this is currently affected by #18455 (which should be fixed soon) for flat
I don't think so... the case of the MVCE above (with no unused levels in the input) is already covered.
Minimal Verifiable Complete Example
Below is an MVCE of the behavior:
Problem description
Using the method remove_unused_levels on a MultiIndex containing NaN creates a new MultiIndex that is not equal to the original, although the documentation says the result will be equal to it. This is why I suspect it is a bug.
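The documented invariant can be checked directly. The following is a minimal sketch with hypothetical data (the column names and values are assumptions, not the original MVCE); on a pandas release that contains the fix discussed in this thread, the check is expected to pass.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a float key column that contains one NaN.
df = pd.DataFrame(
    {
        "key": [100.0, 200.0, np.nan, 300.0],
        "label": ["a", "b", "c", "d"],
        "value": [1, 2, 3, 4],
    }
)

# Build a two-level MultiIndex from the key columns.
mi = df.set_index(["key", "label"]).index

# No level is unused here, so the result should compare equal to the input.
cleaned = mi.remove_unused_levels()
same = cleaned.equals(mi)
print(same)
```

If `same` comes out as `False`, the installed pandas version most likely still shows the behavior reported here.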
Float Index
A single Index uses NaN as a modality. A MultiIndex does not; it has a negative modality index instead.
MultiIndex corruption
When refreshed, NaN values point to a copy of the last float modality (here 300.0) of the level. This leads to a kind of corrupted index, because those auto-filled values do not have any meaning.
As a consequence, the indexes are not equal (which contradicts the documentation).
Even worse, the original value (300.0) is no longer referenced, so it becomes an unused value/modality in the newly generated index.
Applying the method a second time confirms this.
Expected Output
I believe the expected output of set_index and remove_unused_levels should be:
The problem also occurs when rows are removed from the DataFrame, in which case it makes sense to use the method remove_unused_levels to clean up the index. Anyway, when building the MCVE I found it was working on the whole Index, whatever the level order.
Pandas Versions