Warn on duplicate names in MI? #19029
cc @toobaz
I don't have a strong opinion on this... we could revert and emit a warning, we could revert and abandon the idea of forbidding duplicated names (and solve otherwise the problems exposed in #18872), or we could rename

My personal preference, at least if we think that our main source of concern should be

This said, in the above examples we would probably better serve the user by dropping one of the two levels, which are exact copies. Certainly we can come up with examples which would still fail, but maybe both things are worth implementing together, so that the "black magic" (reset name to
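The "dropping one of the two levels, which are exact copies" idea above can be sketched as follows. This is a hypothetical helper, not a pandas API; the name `drop_copied_level` and the example index are invented for illustration:

```python
import pandas as pd

# Hypothetical helper (not a pandas API): if two levels of a
# MultiIndex are exact copies, keep only the first of them.
def drop_copied_level(mi: pd.MultiIndex) -> pd.Index:
    for i in range(mi.nlevels):
        for j in range(i + 1, mi.nlevels):
            if mi.get_level_values(i).equals(mi.get_level_values(j)):
                return mi.droplevel(j)
    return mi

# Two levels with identical values and the same (duplicated) name:
mi = pd.MultiIndex.from_arrays([[1, 2, 3], [1, 2, 3]], names=["a", "a"])
idx = drop_copied_level(mi)
print(list(idx))  # the single remaining level: [1, 2, 3]
```

With only one level left, `droplevel` returns a flat `Index`, so downstream selection by the name `"a"` is unambiguous again.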
Hmm, this was supposed to be done for 0.23, but we missed it. I still think it's worthwhile doing for 0.23.1 (cc @guenteru if you have time to make a PR).
As of version 0.23.0, MultiIndex throws an exception in case it contains duplicated level names. This can happen as a result of various groupby operations (pandas-dev#21075). This commit changes the behavior of groupby slightly: in case there are duplicated names in the index, these names get suffixed by their corresponding position (i.e. [name, name] => [name0, name1])
I think this is actually an important one to decide upon. The cases we have seen are in my opinion genuine use cases that we should somehow enable (eg the
Mangling the name like
Though we allow non-string values for names, so mangling isn't always straightforward.
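To make the mangling discussion concrete, here is one possible scheme, sketched as a hypothetical helper (this is not what pandas does, and `mangle_duplicate_names` is an invented name). Non-string names are one wrinkle; this sketch converts them with `repr` before suffixing:

```python
import pandas as pd

# Hypothetical mangling scheme (not a pandas behavior): suffix only the
# names that occur more than once, using the level position. Non-string
# names (e.g. integers, tuples) are first converted with repr().
def mangle_duplicate_names(mi: pd.MultiIndex) -> pd.MultiIndex:
    names = list(mi.names)
    counts = {n: names.count(n) for n in names}
    new = [
        f"{n if isinstance(n, str) else repr(n)}_{i}" if counts[n] > 1 else n
        for i, n in enumerate(names)
    ]
    return mi.set_names(new)

mi = pd.MultiIndex.from_arrays([[1, 2], [3, 4], [5, 6]], names=["a", "a", "b"])
print(list(mangle_duplicate_names(mi).names))  # ['a_0', 'a_1', 'b']
```

The downside mentioned in the thread applies: after mangling, the user can no longer look up a level by its original name, which is its own kind of surprise.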
Yeah, mangling wouldn't be a very general solution. I'd rather set problematic names to

As an alternative, it should be trivial to add an

As I stated, I'm not necessarily against re-allowing duplicate names, but on an index with duplicated names, all level selection by name (e.g. `mi.get_level_values("string_label")`, but also unstacking) should then just error.
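For reference, this is what the behavior looks like once duplicate names are allowed again (as they are in pandas ≥ 0.24, after #21423 below): construction and positional selection work, while selection by the ambiguous name raises:

```python
import pandas as pd

# Duplicate level names are allowed at construction time...
mi = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=["a", "a"])

# ...and selecting a level by position still works:
print(list(mi.get_level_values(0)))  # [1, 2]

# ...but selecting by the duplicated name raises a ValueError,
# since the lookup is ambiguous.
try:
    mi.get_level_values("a")
except ValueError as exc:
    print("ValueError:", exc)
```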
This is certainly fine, I think.
Moving this to 0.23.2. There are a number of solutions; need to see an implementation.
And we seem to already do this. At least all code that uses
So to make this more concrete, I put up a PR for the option to again allow duplicate index level names: #21423. IMO, this is the most sensible thing to do for now in the short term. Alternatives:
Any feedback on my last comments here / the PR?
Any feedback here?
Will look at the PR now.
I agree that in the short term, re-allowing duplicate names is the best path forward. I think we (I) didn't fully appreciate all the cases that can lead to duplicate names. So a sequence of
seems sensible.
That sequence seems sensible indeed. I just don't yet really know what "providing ways to avoid getting in a situation with duplicate names" would look like, or whether we would find a solution here.
No objection. I would even dare to say that duplicate index level names are analogous to duplicate elements in axes: not ideal, and we should avoid producing them in our API, but if the user creates them, fair enough; we will just raise an error any time levels are requested by name. In particular, I don't see a MI with repeated names as more problematic than a MI with no/missing names.
Closed by #21423
Opening a new issue so this isn't lost.
In #18882 we banned duplicate names in a MultiIndex. I think this is a good change, since allowing duplicates hit a lot of edge cases when you went to actually do something. I want to make sure we understand all the cases that actually produce duplicate names in the MI, though, specifically groupby.apply.
Another, more realistic example: groupwise drop_duplicates:
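A minimal sketch of how a groupwise drop_duplicates can end up with a duplicated level name (the frame and names here are invented for illustration; the group-key prepending that groupby.apply performs is emulated with `pd.concat` to keep the example version-stable):

```python
import pandas as pd

# Invented example: a frame already indexed by the grouping key "a".
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 3, 4]}).set_index("a")

# Deduplicate within each group, then stitch the pieces back together
# with the group key as an outer index level -- which is effectively
# what groupby.apply does when it prepends group keys:
parts = {key: grp.drop_duplicates() for key, grp in df.groupby(level="a")}
out = pd.concat(parts, names=["a"])

# The outer level is named "a" and the inner (original) index is also
# named "a" -- a duplicated level name.
print(list(out.index.names))  # ['a', 'a']
```

This is the shape of result that #18882 made an error to construct, and that #21423 re-allows.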
Is it possible to throw a warning on this for now, in case duplicate names are more common than we thought?