Warn on boolean frame indexer #39373
pandas/core/frame.py
@@ -3043,6 +3043,11 @@ def __getitem__(self, key):

        # Do we have a (boolean) DataFrame?
        if isinstance(key, DataFrame):
            if not (key.index.equals(self.index) and key.columns.equals(self.columns)):
I am not sure if equals is the right thing to do here.
idx = Index([1, 2, 3])
idx2 = Index([3, 2, 1])
idx.equals(idx2)
This returns False, but we can certainly align both indexes.
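To make the point above concrete, here is a minimal sketch (the `mask` frame and its values are made up for illustration) showing that `Index.equals` is order-sensitive while `reindex` can still line the two indexes up:

```python
import pandas as pd

idx = pd.Index([1, 2, 3])
idx2 = pd.Index([3, 2, 1])

# equals() compares element by element, so a reordered index is "not equal".
same_order = idx.equals(idx2)

# Both indexes hold the same labels, which is what alignment cares about.
same_labels = idx.sort_values().equals(idx2.sort_values())

# Illustrative boolean frame keyed by idx2, then aligned to idx.
mask = pd.DataFrame({"a": [True, False, False]}, index=idx2)
aligned = mask.reindex(idx)
```

After the `reindex`, `aligned` carries the same label-to-value mapping as `mask`, only in the order of `idx`.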
I agree that they are alignable. I thought the idea was to not accept alignables anymore. I just wanted to make an exception where alignables are accepted if they are already aligned, so that stuff like df[df > 0] is still possible. For every unaligned boolean frame indexer, the warning refers to where.
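A rough sketch of the distinction being drawn here, assuming a hypothetical helper `is_aligned` that mirrors the check in the diff (it is not part of pandas), with `DataFrame.where` as the explicit alternative for a mask that still needs alignment:

```python
import pandas as pd


def is_aligned(key: pd.DataFrame, frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if key already matches frame's labels and order."""
    return key.index.equals(frame.index) and key.columns.equals(frame.columns)


df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# Mask derived from df itself: already aligned, so df[mask] is unambiguous.
mask = df > 0
mask_is_aligned = is_aligned(mask, df)
positive = df[mask]

# Same labels in reverse row order: this is the case the warning targets.
shuffled = mask.iloc[::-1]
shuffled_is_aligned = is_aligned(shuffled, df)

# where() aligns the condition on labels before masking.
via_where = df.where(shuffled)
```

Both paths yield the same frame here, because `where` aligns the reordered mask on labels before applying it.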
If we are doing this, we should be consistent here, meaning setitem and getitem for both Frame and Series. Additionally, we have to discuss what to do with the rhs in setitem.
Both cases currently align on index, but ignore the column. Edit: Additionally, we should document the differences between loc and []. #37516
We merged #39403. Does this cover this use case?
No, this fixed a bug but did not touch alignment of missing labels.
@jreback There is no point in me continuing if there is no consensus. I want to warn on unaligned indexers (by indexer I mean a boolean DataFrame), @phofl does not want to do this because unaligned indexers can be aligned (?), and you want to warn on any indexer. What do I implement now? This is also what I asked in the original pull request as a question:
@gooney47 @phofl was fixing a bug; this is a separate issue. I am ok with warning on a dataframe that is a boolean array, whether it needs alignment or not. Your original example should show the warning. It is possible that other things may break and need fixing.
Warning in all cases sounds good to me too.
What to do with tests that throw the new warning? There are
Also there might be tests that throw this warning, that are only using
can you test for setitem as well; also would need to catch anywhere we are actually using it now (or rather simply change those tests)
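One way such a test could look, as a sketch only: `getitem_with_warning` is a hypothetical stand-in for the patched `__getitem__`, and the warning class and message are assumptions rather than what the PR emits. The same pattern would apply to a setitem path.

```python
import warnings

import pandas as pd


def getitem_with_warning(frame: pd.DataFrame, key):
    """Hypothetical stand-in for the patched __getitem__ (not pandas API)."""
    if isinstance(key, pd.DataFrame) and not (
        key.index.equals(frame.index) and key.columns.equals(frame.columns)
    ):
        # Warning class and message are assumptions made for this sketch.
        warnings.warn(
            "unaligned boolean DataFrame indexer; use DataFrame.where instead",
            FutureWarning,
            stacklevel=2,
        )
    return frame[key]


def unaligned_warnings(frame: pd.DataFrame, key) -> list:
    """Collect only the warnings raised by the stand-in above."""
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        getitem_with_warning(frame, key)
    return [w for w in record if "unaligned" in str(w.message)]


df = pd.DataFrame({"a": [1, -2], "b": [-3, 4]})
aligned_hits = unaligned_warnings(df, df > 0)
unaligned_hits = unaligned_warnings(df, (df > 0).iloc[::-1])
```

Filtering on the message keeps the check independent of any unrelated warnings pandas itself might raise.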
Co-authored-by: Jeff Reback <jeff@reback.net>
I will change the triggering tests on the weekend.
I feel like I made the world a worse place. These changes do not feel good to me, especially because the setitem/getitem indexing feels much more intuitive than using
ok i think maybe i pushed you in the wrong direction here. we only want to warn when I think this should significantly narrow the cases where we warn.
closes BUG: IndexError: positional indexers are out-of-bounds iloc boolean indexing #39004
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry
Is it ok to check index/column to allow things like df[df > 0] or should I just warn on DataFrame in general?
Can I do the same thing for setitem as well?
Current version warns in 3 already existing tests: