Can't slice float indices, but can slice integer indices? #7501

Closed
mrjbq7 opened this issue Jun 18, 2014 · 94 comments · Fixed by #45324
Labels
Bug Docs Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves

@mrjbq7

mrjbq7 commented Jun 18, 2014

Using pandas 0.14, the slicing changed strangely. I can slice rows using start/end with an integer index, but not a float index:

In [18]: df = pandas.DataFrame(np.random.randn(10, 5), index=np.arange(10, 20))

In [19]: df
Out[19]: 
           0         1         2         3         4
10 -1.878123 -0.581537  0.189536  0.173014  0.132059
11  1.229246 -0.988689  0.632404  0.939126 -0.186367
12  0.376735  0.329723  1.480293 -0.209164  0.080897
13  0.461558  0.303541 -0.669196 -1.032077  1.634512
14 -0.972455  0.657357 -0.566609  0.154165 -0.561543
15 -2.502244  1.022540 -1.019376 -0.934582  1.751852
16 -1.875567  0.504288 -0.524922  0.048277 -1.587904
17  0.636652  0.441224 -1.391552  0.650876  0.374673
18 -1.503102  0.822411  1.776667 -0.879583  1.035291
19 -0.620467  0.319855 -0.779280 -0.168827  0.502470

In [20]: df[3:5]
Out[20]: 
           0         1         2         3         4
13  0.461558  0.303541 -0.669196 -1.032077  1.634512
14 -0.972455  0.657357 -0.566609  0.154165 -0.561543

In [21]: df.index = [float(x) for x in df.index]

In [22]: df
Out[22]: 
           0         1         2         3         4
10 -1.878123 -0.581537  0.189536  0.173014  0.132059
11  1.229246 -0.988689  0.632404  0.939126 -0.186367
12  0.376735  0.329723  1.480293 -0.209164  0.080897
13  0.461558  0.303541 -0.669196 -1.032077  1.634512
14 -0.972455  0.657357 -0.566609  0.154165 -0.561543
15 -2.502244  1.022540 -1.019376 -0.934582  1.751852
16 -1.875567  0.504288 -0.524922  0.048277 -1.587904
17  0.636652  0.441224 -1.391552  0.650876  0.374673
18 -1.503102  0.822411  1.776667 -0.879583  1.035291
19 -0.620467  0.319855 -0.779280 -0.168827  0.502470

In [23]: df[3:5]
Out[23]: 
Empty DataFrame
Columns: [0, 1, 2, 3, 4]
Index: []

In [24]: df.index
Out[24]: Float64Index([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0], dtype='float64')

In [25]: pandas.__version__
Out[25]: '0.14.0'

Was this an intentional change, or a bug?

@hayd
Contributor

hayd commented Jun 18, 2014

Is this supposed to act like loc or ix, or something else?

Behaviour of In [20]: df[3:5] seems so wrong/unstable!

@cpcloud
Member

cpcloud commented Jun 18, 2014

[] is supposed to act like ix IIRC

@cpcloud
Member

cpcloud commented Jun 18, 2014

In which case, this is a bug.

@mrjbq7
Author

mrjbq7 commented Jun 18, 2014

If you're suggesting slicing by rows is undesirable, what is the new way to do that?

@cpcloud
Member

cpcloud commented Jun 18, 2014

You should use loc for label-based indexing and use iloc for position based indexing

@jreback
Contributor

jreback commented Jun 18, 2014

@mrjbq7 You are slicing the columns

df.loc[10:12] works just fine, or df.iloc[3:5] as well

or df.loc[10.0:12.0]
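
To make the suggestion above concrete, here is a minimal sketch of the two explicit indexers on a frame shaped like the one in the report. The integer payload from `np.arange` is an assumption chosen so the output is deterministic; the original used random values.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report: 10 rows labelled 10..19.
# Deterministic values replace the random data (illustrative assumption).
df = pd.DataFrame(np.arange(50).reshape(10, 5), index=np.arange(10, 20))

# Label-based: both endpoint labels are included.
by_label = df.loc[13:14]

# Position-based: the end position is excluded, like a Python list slice.
by_position = df.iloc[3:5]

print(by_label.index.tolist())
print(by_position.index.tolist())
```

Both selections pick the rows labelled 13 and 14, and neither depends on how plain `df[x:y]` chooses to interpret its bounds.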

@mrjbq7
Author

mrjbq7 commented Jun 18, 2014

@jreback the In[20] shows the results of slicing rows, however.

@cpcloud
Member

cpcloud commented Jun 18, 2014

@jreback he's slicing the rows

@mrjbq7
Author

mrjbq7 commented Jun 18, 2014

Okay, I can use iloc, however the behavior still seems buggy and inconsistent! :)

@cpcloud
Member

cpcloud commented Jun 18, 2014

it's definitely a bug, probably introduced by the small(ish) refactoring of Float64Index to use the float64 dtype instead of object.

@jreback
Contributor

jreback commented Jun 18, 2014

In[20] has always been that way, it's a fallback indexing

cpcloud added this to the 0.14.1 milestone Jun 18, 2014
@jreback
Contributor

jreback commented Jun 18, 2014

though there is a bug there somewhere, @cpcloud?

cpcloud self-assigned this Jun 18, 2014
@cpcloud
Member

cpcloud commented Jun 18, 2014

on it 😄

@mrjbq7
Author

mrjbq7 commented Jun 18, 2014

P.S. you guys are awesomely responsive. Thanks.

@jorisvandenbossche
Member

So slicing in [] as df[x:y] is always slicing by location, also if the integer labels would overlap? (in other words, df[x:y] is actually equivalent with df.iloc[x:y]?)
Is this somewhere mentioned in the docs? As I don't see it in this section: http://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges, and it is indeed a bit counterintuitive, as with df.ix[x:y] it tries first label-based (so in above example it gives an empty frame).

@cpcloud
Member

cpcloud commented Jun 18, 2014

no, if the labels are there, then it will use loc, otherwise it will try using iloc

@jorisvandenbossche
Member

That does not seem to be the case:

In [7]: df = pandas.DataFrame(np.random.randn(10, 5), index=np.arange(5, 15))

In [8]: df
Out[8]:
           0         1         2         3         4
5   1.027523 -0.481625  0.525546 -0.604405 -1.644525
6  -0.568643  0.385232 -0.661878  0.373214  2.326299
7  -1.163296 -0.118817 -1.528926 -1.937901  0.659142
8   1.138747  0.480652 -1.105340 -0.151181 -0.100053
9  -0.065683 -0.755676  0.578010 -0.350439  0.446478
10  0.035460  1.164672  0.489051  0.289033  0.309896
11  1.250149  0.032059  1.687558 -1.313212  0.645179
12 -1.393927 -0.903836 -2.174578 -0.206523 -1.483739
13  1.313273  1.569998 -0.326552  0.955845  0.138290
14 -0.629166  0.861509 -0.057021  1.336045  0.207536

In [9]: df[5:7]
Out[9]:
           0         1         2         3         4
10  0.035460  1.164672  0.489051  0.289033  0.309896
11  1.250149  0.032059  1.687558 -1.313212  0.645179

@cpcloud
Member

cpcloud commented Jun 18, 2014

huh. guess I'm wrong then. imho this style of indexing should be banned for life

@mrjbq7
Author

mrjbq7 commented Jun 18, 2014

It's old syntax, which has worked in my application for several pandas versions and I only noticed just now that it was strangely not working.

@cpcloud
Member

cpcloud commented Jun 18, 2014

this was a change in 0.14 ... i refactored some of the "guessing" code and didn't catch this case

@jreback
Contributor

jreback commented Jun 18, 2014

[x:y] is a convenience slice on the rows (it's basically .ix[x:y]), but pretty odd if you ask me.
But since it has been there a long time, we left it.

@jorisvandenbossche
Member

So the bug is that the FloatIndex is not doing this location based slicing but tries label-based? (which is more logical, but inconsistent with the other indexers)

@jreback it's not like .ix[x:y] as ix will first try label based and fall back to integer location, while df[x:y] only tries integer location:

In [14]: df[3:5]
Out[14]:
           0         1         2         3         4
13  0.098103 -0.290480  0.716710 -0.533959 -0.890271
14 -0.738622  0.325792  1.106741  0.442422 -1.087715

In [15]: df.ix[3:5]
Out[15]:
Empty DataFrame
Columns: [0, 1, 2, 3, 4]
Index: []

@cpcloud
Member

cpcloud commented Jun 18, 2014

@jorisvandenbossche i'm glad you found this, except that now we have 4th style of indexing to support .... :)

@jreback
Contributor

jreback commented Jun 18, 2014

hmm, by definition on Float64Index this cannot do integer based, and MUST be label based. This whole 4th type is very odd if you ask me.

@cpcloud
Member

cpcloud commented Jun 18, 2014

so we have

  • ix default to labels
  • mystery_meat i.e., [] default to pos
  • loc labels only
  • iloc pos only

there's a strangely beautiful symmetry to this mess

jreback removed this from the 0.16.0 milestone Mar 6, 2015
@jreback
Contributor

jreback commented Apr 19, 2016

I believe most of the oddness was fixed in 0.18.0 for the float slicing fixup. so going to revisit this.

jreback modified the milestones: 0.18.2, Next Major Release Apr 19, 2016
@jorisvandenbossche
Member

@jreback What do you mean with 'revisit'? I don't think anything has changed to the original issue here ([] being label based for FloatIndex, while location based for IntIndex)

@jreback
Contributor

jreback commented Apr 19, 2016

no, some of the 'other' issues are fixed (IOW, the slicing is all now consistent), e.g.

In [23]: s = Series(np.arange(5), index=np.arange(5) * 2.5, dtype=np.int64)

In [24]: s
Out[24]: 
0.0     0
2.5     1
5.0     2
7.5     3
10.0    4
dtype: int64

In [25]: s[2:5]
Out[25]: 
2.5    1
5.0    2
dtype: int64

In [26]: s[2.0:5.0]
Out[26]: 
2.5    1
5.0    2
dtype: int64
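
For the same Series, the explicit indexers give results that do not depend on how `[]` treats integer bounds on a float index. This is a small sketch, not a statement about what `s[2:5]` returns in any particular pandas version.

```python
import numpy as np
import pandas as pd

# Float labels 0.0, 2.5, 5.0, 7.5, 10.0 with values 0..4.
s = pd.Series(np.arange(5), index=np.arange(5) * 2.5)

# Label-based: every label between 2.0 and 5.0 inclusive, i.e. 2.5 and 5.0.
label_part = s.loc[2.0:5.0]

# Position-based: positions 2, 3 and 4, i.e. labels 5.0, 7.5 and 10.0.
position_part = s.iloc[2:5]

print(label_part.tolist())
print(position_part.tolist())
```

Here the two readings of "2 to 5" select different elements, which is why the choice made by `[]` matters for float indexes.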

@jreback
Contributor

jreback commented Mar 23, 2017

we should prob close this issue and open a new clarifying issue about what the problems are with FloatIndex slicing.

jreback modified the milestones: 0.20.0, 0.21.0 Mar 28, 2017
@jreback
Contributor

jreback commented Sep 23, 2017

can someone parse this issue and see if we should open an issue w.r.t. float slicing?

@jorisvandenbossche
Member

The original reported 'issue' is still present (or better: 'debatable behaviour')

To summarize in a specific way (disregarding the .ix part of the above discussion, as that is deprecated anyway): for all index types, using integers in [] (__getitem__) is positional (like iloc), except for Float64Index, making this a special case.
So also for an integer index, [] is positional and not label based.

I think it would be nice to make int index and float index consistent for []. Making Int64Index label based would be a change with a lot of impact, changing Float64Index to do positional (when using integers, when using floats it stays label-based) would be easier I think.
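
Until `[]` is consistent across index types, code that must behave the same on integer and float indexes can avoid it for row slices. The sketch below rebuilds the float-indexed frame from the original report (with deterministic values, an assumption) and uses only `loc` and `iloc`.

```python
import numpy as np
import pandas as pd

# The frame from the report after its index was converted to floats.
df = pd.DataFrame(
    np.arange(50).reshape(10, 5),
    index=np.arange(10, 20).astype("float64"),
)

# Label-based with integer bounds: no labels lie between 3 and 5, so this is empty.
empty = df.loc[3:5]

# Position-based: rows at positions 3 and 4, labelled 13.0 and 14.0.
by_position = df.iloc[3:5]

# Label-based with the float labels spelled out: the same two rows.
by_float_label = df.loc[13.0:14.0]

print(len(empty))
print(by_position.index.tolist())
```

The empty result from `loc[3:5]` corresponds to the `Empty DataFrame` the reporter saw from `df[3:5]`, where the integer bounds were being read as labels.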

jreback modified the milestones: 0.21.0, 1.0 Oct 2, 2017
@TomAugspurger
Contributor

Pushing this off 1.0.

TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@MarcoGorelli
Member

> can someone parse this issue and see if we should open an issue w.r.t. float slicing?

Looks like it's been opened here #31344

@jbrockmendel
Member

Reading over this thread I was briefly optimistic "with ix gone this might be easier!" but nope

> I think it would be nice to make int index and float index consistent for []. Making Int64Index label based would be a change with a lot of impact, changing Float64Index to do positional (when using integers, when using floats it stays label-based) would be easier I think.

I agree with @jorisvandenbossche here.

I'd also be on board with a "nuke it from space" option to deprecate the ambiguous behaviors

8 participants