BUG: Regression in .loc accepting a boolean Index as an indexer #17738

jreback · 2017-10-02T10:32:10Z

codecov · 2017-10-02T11:58:59Z

Codecov Report

Merging #17738 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17738      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49813    49813              
==========================================
- Hits        45442    45433       -9     
- Misses       4371     4380       +9

Flag	Coverage Δ
#multiple	`89% <100%> (ø)`	⬆️
#single	`40.33% <100%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/common.py	`91.42% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.73% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf5b089...1d63701. Read the comment docs.

closes #17131

jorisvandenbossche · 2017-10-02T12:08:49Z

@jreback This is not a regression bug fix, but an backwards incompatible API change.

Not saying that we shouldn't do this, but that we have to take a bit more care to think about it (and possibly deprecate, although given that the previous behaviour of using it as labels was broken in the latest release, maybe not needed), and certainly note it in the whatsnew as such.

jorisvandenbossche · 2017-10-02T12:17:18Z

Actually, it was only broken in master, not in the latest release.

So to clarify what I mean, before the boolean values were interpreted as labels:

In [53]: pd.Series([1, 2, 3]).loc[pd.Index([True, False, True])]
Out[53]: 
1    2
0    1
1    2
dtype: int64


In [54]: pd.__version__
Out[54]: '0.20.3'

which is not useful, so I agree changing this is a good idea.
But the question is then what to do with booleans in the index? Shouldn't this be kept label-based rather than boolean mask ?

jreback · 2017-10-02T12:26:28Z

@jorisvandenbossche oh I though this was broken in 0.20.3; its a regression none the less.

your example is unambiguous. an Index of booleans is de-facto the same as a boolean ndarray, it has no other interpretation that as a boolean indexer (a boolean Series is completely different of course).

I'll remove the whatsnew note then.

jreback · 2017-10-02T12:26:56Z

I think some refactoring broke this (and was not explicity tested).

jreback · 2017-10-02T12:36:35Z

@jorisvandenbossche actually looking at your example [53] again, I think that is completely broken and not expected behavior at all. A boolean indexer by-definition is not label based. I guess could add a note about that.

jreback · 2017-10-02T12:37:38Z

@jorisvandenbossche actually, that is even moar broken [53]. Its interpreting the bools as labels, again completely wrong.

jorisvandenbossche · 2017-10-02T12:43:47Z

As I said, I agree the behaviour was kind of 'broken' (or at least strange) and good to fix, but still I wouldn't call this a regression fix (it has been like that for a long time). Although I can agree calling it a 'bug fix' for the integer case.

But for the boolean index case it is really a change in behaviour, because there the original behaviour made some sense:

In [58]: pd.__version__
Out[58]: '0.20.3'

In [59]: pd.Series([1, 2, 3], index=[False, True, False]).loc[pd.Index([True, False, True])]
Out[59]: 
True     2
False    1
False    3
True     2
dtype: int64

In [2]: pd.__version__
Out[2]: '0.21.0.dev+560.ga607872'

In [3]: pd.Series([1, 2, 3], index=[False, True, False]).loc[pd.Index([True, False, True])]
Out[3]: 
False    1
False    3
dtype: int64

jreback · 2017-10-02T12:44:50Z

@jorisvandenbossche ok I'll add a note. We don't have first class support for boolean index anyhow (esp when its duplicated its really weird).

jorisvandenbossche · 2017-10-02T12:47:12Z

But we could also keep the behaviour (so not use boolean indexing when the index (inferred) type is bool)

* 'master' of github.com:pandas-dev/pandas: (188 commits) Separate out _convert_datetime_to_tsobject (pandas-dev#17715) DOC: remove whatsnew note for xref pandas-dev#17131 BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738) DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691) CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735) Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736) TST: add backward compat for offset testing for pickles (pandas-dev#17733) remove unused time conversion funcs (pandas-dev#17711) DEPR: Deprecate convert parameter in take (pandas-dev#17352) BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587) BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153) BUG: Fix unexpected sort in groupby (pandas-dev#17621) DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731) BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654) DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675) Doc improvements for IntervalIndex and Interval (pandas-dev#17714) BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly Last of the timezones funcs (pandas-dev#17669) Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712) update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713) ...

jreback · 2017-10-02T12:50:33Z

But we could also keep the behaviour (so not use boolean indexing when the index (inferred) type is bool)

this is very awkward, an Index is virtually a ndarray except works with all types, for it not to work 'as-expected' in indexing would be odd (and enabling instead for a special case of selecting out a non-supported boolean Index with True/False values)

…as-dev#17738) closes pandas-dev#17131

BUG: Regression in .loc accepting a boolean Index as an indexer

1d63701

closes pandas-dev#17131

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 2, 2017

jreback added this to the 0.21.0 milestone Oct 2, 2017

jreback mentioned this pull request Oct 2, 2017

BUG: boolean masking with Index objects #17219

Closed

jreback added a commit that referenced this pull request Oct 2, 2017

BUG: Regression in .loc accepting a boolean Index as an indexer (#17738)

a607872

closes #17131

jreback merged commit a607872 into pandas-dev:master Oct 2, 2017

jreback mentioned this pull request Oct 2, 2017

DOC: note for changes in #17738, .loc with a boolean Index #17740

Closed

jorisvandenbossche mentioned this pull request Oct 2, 2017

DEPR: deprecate .select() #17633

Merged

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

BUG: Regression in .loc accepting a boolean Index as an indexer (pand…

e512d8b

…as-dev#17738) closes pandas-dev#17131

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG: Regression in .loc accepting a boolean Index as an indexer (pand…

cc827fe

…as-dev#17738) closes pandas-dev#17131

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Regression in .loc accepting a boolean Index as an indexer #17738

BUG: Regression in .loc accepting a boolean Index as an indexer #17738

jreback commented Oct 2, 2017

codecov bot commented Oct 2, 2017 •

edited

Loading

jorisvandenbossche commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

BUG: Regression in .loc accepting a boolean Index as an indexer #17738

BUG: Regression in .loc accepting a boolean Index as an indexer #17738

Conversation

jreback commented Oct 2, 2017

codecov bot commented Oct 2, 2017 • edited Loading

Codecov Report

jorisvandenbossche commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jreback commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

jorisvandenbossche commented Oct 2, 2017

jreback commented Oct 2, 2017

codecov bot commented Oct 2, 2017 •

edited

Loading