Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on boolean frame indexer #39373

Closed
wants to merge 10 commits into from
Closed

Conversation

gooney47
Copy link

Current version warns in 3 already existing tests:

  • pandas/tests/frame/indexing/test_where.py::TestDataFrameIndexingWhere::test_where_datetime
  • pandas/tests/frame/test_nonunique_indexes.py::TestDataFrameNonuniqueIndexes::test_getitem_boolean_frame_unaligned_with_duplicate_columns
  • pandas/tests/indexing/test_indexing.py::TestMisc::test_partial_boolean_frame_indexing

@@ -3043,6 +3043,11 @@ def __getitem__(self, key):

# Do we have a (boolean) DataFrame?
if isinstance(key, DataFrame):
if not (key.index.equals(self.index) and key.columns.equals(self.columns)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure is equals is the right thing to do here.

idx = Index([1, 2, 3])

idx2 = Index([3, 2, 1])

idx.equals(idx2)

This returns False but we can certainly align bot indexes

Copy link
Author

@gooney47 gooney47 Jan 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that they are alignable. I thought the idea was to not accept alignables anymore. I just wanted to make an exception were alignables are accepted if they are already aligned, so that stuff like df[df > 0] is still possible. For every unaligned boolean frame indexer the warning refers to where.

@phofl
Copy link
Member

phofl commented Jan 24, 2021

If we are doing this we should be consistent here. Meaning setitem and getitem for Frame and Series.

Additionally we have to discuss what to do with the rhs in setitem

df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rhs = DataFrame({"b": [7, 8, 9]}, index=Index([1, 2, 3]))
df[["b"]] = rhs
rhs = DataFrame({"c": [70, 80, 90]}, index=Index([1, 2, 3]))
df[["b"]] = rhs

Both cases currently align on index, but ignore the column.

Edit: Additionally we should document the differences between loc and []. #37516

@jreback
Copy link
Contributor

jreback commented Feb 2, 2021

we merged #39403

does this cover this use case?

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Feb 2, 2021
@phofl
Copy link
Member

phofl commented Feb 2, 2021

No, this fixed a bug but did not touch alignment of missing labels

@jreback
Copy link
Contributor

jreback commented Feb 3, 2021

thanks @phofl

@gooney47 if you can revise according the comments and merge master

@gooney47
Copy link
Author

gooney47 commented Feb 6, 2021

@jreback There is no point for me to continue anything if there is no consensus. I want to warn on unaligned indexers (with indexer I mean boolean DataFrame), @phofl does not want to do this because unaligned indexers can be aligned (?) and you want to warn on any indexer. What do I implement now?

This is also what I asked in the original pull request as a question:

Is it ok to check index/column to allow things like df[df > 0] or should I just warn on DataFrame in general?

@jreback
Copy link
Contributor

jreback commented Feb 7, 2021

@jreback There is no point for me to continue anything if there is no consensus. I want to warn on unaligned indexers (with indexer I mean boolean DataFrame), @phofl does not want to do this because unaligned indexers can be aligned (?) and you want to warn on any indexer. What do I implement now?

This is also what I asked in the original pull request as a question:

Is it ok to check index/column to allow things like df[df > 0] or should I just warn on DataFrame in general?

@gooney47 @phofl was fixing a bug. this is a separate issue. I am ok with warning on a dataframe that that is a boolean array, e.g. needs alignment or not. your original example should show the warning. it is possible that other things may break and need fixing.

@phofl
Copy link
Member

phofl commented Feb 7, 2021

Warning in all cases sounds good to me too.

@gooney47
Copy link
Author

gooney47 commented Feb 8, 2021

What to do with tests that throw the new warning?

There are getitem tests that throw the new warning. The question is if they test actual behavior of getitem or of where. If we are lucky there is no actual behavior of getitem in the case of a DataFrame, which would mean all these tests actually are proper where tests.

@gooney47
Copy link
Author

Also there might be tests that throw this warning, that are only using getitem/where to test some other functionality in which case there is no need to rename the test.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test for setitem as well; also would need to catch anywhere we are actually using it now (or rather simply change those tests)

pandas/core/frame.py Outdated Show resolved Hide resolved
@gooney47
Copy link
Author

I will change the triggering tests on the weekend

@gooney47 gooney47 changed the title Warn on unaligned boolean frame indexer Warn on boolean frame indexer Feb 12, 2021
@gooney47
Copy link
Author

gooney47 commented Feb 14, 2021

I feel like I made the world a worse place. These changes do not feel good to me, especially because the setitem/getitem indexing feels much more intuitive, than using where and mask. Theoretically you could only use one of them (where or mask), but then you would have to deal with negation of the mask which obfuscates things. If you're on the other hand using both you have to learn two new functions, while people already are familiar with getitem and setitem from all kinds of other objects like lists or numpy arrays. To give example in code: df[mask] translates to either df.where(mask) or df.mask(~mask), while df[mask] = val tanslates to either df.mask(mask, val, True) or df.where(~mask, val, True).

@jreback
Copy link
Contributor

jreback commented Feb 15, 2021

@gooney47

ok i think maybe i pushed you in the wrong direction here. we only want to warn when .iloc is used and a DataFrame indexer. using getitem / .loc are a-ok because these by-definition do alignment (and they are of course convenient). Note we should do this for both .iloc setting and getting.

I think this should significantly narrow the cases where we warn.

@gooney47
Copy link
Author

@jreback Ah ok, we had this already here: #39022

@jreback
Copy link
Contributor

jreback commented Feb 15, 2021

@jreback Ah ok, we had this already here: #39022

ok let's talk about this case there.

@gooney47 gooney47 closed this Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: IndexError: positional indexers are out-of-bounds iloc boolean indexing
3 participants