Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Regression in .loc accepting a boolean Index as an indexer #17738

Merged
merged 1 commit into from
Oct 2, 2017

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Oct 2, 2017

closes #17131

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 2, 2017
@jreback jreback added this to the 0.21.0 milestone Oct 2, 2017
@codecov
Copy link

codecov bot commented Oct 2, 2017

Codecov Report

Merging #17738 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17738      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49813    49813              
==========================================
- Hits        45442    45433       -9     
- Misses       4371     4380       +9
Flag Coverage Δ
#multiple 89% <100%> (ø) ⬆️
#single 40.33% <100%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/common.py 91.42% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf5b089...1d63701. Read the comment docs.

@jreback jreback merged commit a607872 into pandas-dev:master Oct 2, 2017
@jorisvandenbossche
Copy link
Member

@jreback This is not a regression bug fix, but an backwards incompatible API change.

Not saying that we shouldn't do this, but that we have to take a bit more care to think about it (and possibly deprecate, although given that the previous behaviour of using it as labels was broken in the latest release, maybe not needed), and certainly note it in the whatsnew as such.

@jorisvandenbossche
Copy link
Member

Actually, it was only broken in master, not in the latest release.

So to clarify what I mean, before the boolean values were interpreted as labels:

In [53]: pd.Series([1, 2, 3]).loc[pd.Index([True, False, True])]
Out[53]: 
1    2
0    1
1    2
dtype: int64


In [54]: pd.__version__
Out[54]: '0.20.3'

which is not useful, so I agree changing this is a good idea.
But the question is then what to do with booleans in the index? Shouldn't this be kept label-based rather than boolean mask ?

@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

@jorisvandenbossche oh I though this was broken in 0.20.3; its a regression none the less.

your example is unambiguous. an Index of booleans is de-facto the same as a boolean ndarray, it has no other interpretation that as a boolean indexer (a boolean Series is completely different of course).

I'll remove the whatsnew note then.

@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

I think some refactoring broke this (and was not explicity tested).

@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

@jorisvandenbossche actually looking at your example [53] again, I think that is completely broken and not expected behavior at all. A boolean indexer by-definition is not label based. I guess could add a note about that.

@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

@jorisvandenbossche actually, that is even moar broken [53]. Its interpreting the bools as labels, again completely wrong.

@jorisvandenbossche
Copy link
Member

As I said, I agree the behaviour was kind of 'broken' (or at least strange) and good to fix, but still I wouldn't call this a regression fix (it has been like that for a long time). Although I can agree calling it a 'bug fix' for the integer case.

But for the boolean index case it is really a change in behaviour, because there the original behaviour made some sense:

In [58]: pd.__version__
Out[58]: '0.20.3'

In [59]: pd.Series([1, 2, 3], index=[False, True, False]).loc[pd.Index([True, False, True])]
Out[59]: 
True     2
False    1
False    3
True     2
dtype: int64
In [2]: pd.__version__
Out[2]: '0.21.0.dev+560.ga607872'

In [3]: pd.Series([1, 2, 3], index=[False, True, False]).loc[pd.Index([True, False, True])]
Out[3]: 
False    1
False    3
dtype: int64

@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

@jorisvandenbossche ok I'll add a note. We don't have first class support for boolean index anyhow (esp when its duplicated its really weird).

@jorisvandenbossche
Copy link
Member

But we could also keep the behaviour (so not use boolean indexing when the index (inferred) type is bool)

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 2, 2017
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
@jreback
Copy link
Contributor Author

jreback commented Oct 2, 2017

But we could also keep the behaviour (so not use boolean indexing when the index (inferred) type is bool)

this is very awkward, an Index is virtually a ndarray except works with all types, for it not to work 'as-expected' in indexing would be odd (and enabling instead for a special case of selecting out a non-supported boolean Index with True/False values)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Minor Bug: Boolean masking not working for Index objects
2 participants