BUG: RangeIndex.get_indexer is incorrect for some decreasing RangeIndex #28678

jschendel · 2019-09-30T03:49:22Z

For some decreasing RangeIndex, the get_indexer method reports all of its own values as missing and finds matches for values that are not in the index:

In [2]: ri = pd.RangeIndex(10, 0, -3)

In [3]: ri.get_indexer(ri)
Out[3]: array([-1, -1, -1, -1])

In [4]: ri.get_indexer(ri - 1)
Out[4]: array([ 1,  2,  3, -1])

This will in turn result in methods like Series.reindex not working properly:

In [5]: s = pd.Series(list('abcd'), index=ri) 

In [6]: s
Out[6]: 
10    a
7     b
4     c
1     d
dtype: object

In [7]: s.reindex([10, 9, 7])
Out[7]: 
10    NaN
9       b
7     NaN
dtype: object

The issue appears to occur specifically for a decreasing RangeIndex that is not in its canonical form. By canonical form, I mean that stop is the next valid value in the range that's not included. For a more standard range like range(1, 7, 1), 7 is the next valid value that's not present. When the absolute value of the step is larger than 1, stop no longer determines the range uniquely (i.e. range(1, 6, 2) == range(1, 7, 2)).
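To make the canonical form concrete, here is a minimal sketch using plain Python range objects. The helper name canonical_stop is hypothetical (it is not part of pandas); it just computes the first value past the end of the range along its step.

```python
def canonical_stop(r: range) -> int:
    """Hypothetical helper: first value past the end of r, following its step."""
    return r.start + len(r) * r.step


# Different stop values can describe the same range once |step| > 1.
print(range(1, 6, 2) == range(1, 7, 2))    # True
print(canonical_stop(range(1, 6, 2)))      # 7

# The decreasing range from this report: 10, 7, 4, 1.
print(canonical_stop(range(10, 0, -3)))    # -2
print(range(10, 0, -3) == range(10, -2, -3))  # True
```

This matches the observation that RangeIndex(10, 0, -3) and RangeIndex(start=10, stop=-2, step=-3) hold the same values while only the latter has a canonical stop.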

Note that the code above works properly for the equivalent RangeIndex in its canonical form:

In [8]: ri2 = pd.RangeIndex(start=10, stop=-2, step=-3)

In [9]: ri2.equals(ri)
Out[9]: True

In [10]: ri2.get_indexer(ri2)
Out[10]: array([0, 1, 2, 3])

In [11]: ri2.get_indexer(ri2 - 1)
Out[11]: array([-1, -1, -1, -1])

In [12]: s2 = pd.Series(list('abcd'), index=ri2)

In [13]: s2
Out[13]: 
10    a
7     b
4     c
1     d
dtype: object

In [14]: s2.reindex([10, 9, 7])
Out[14]: 
10      a
9     NaN
7       b
dtype: object

The cause of the issue appears to be that the code in get_indexer that determines start, stop, step for a decreasing RangeIndex assumes self.stop is in canonical form:

pandas/pandas/core/indexes/range.py

Lines 386 to 390 in c4489cb

    
        if self.step > 0:
            start, stop, step = self.start, self.stop, self.step
        else:
            # Work on reversed range for simplicity:
            start, stop, step = (self.stop - self.step, self.start + 1, -self.step)
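As a rough illustration of why that formula goes wrong, the sketch below applies the same arithmetic to plain Python ranges. The function name old_reversed is hypothetical and only mirrors the expression in the else branch; it is not pandas code.

```python
def old_reversed(start: int, stop: int, step: int) -> range:
    """Hypothetical mirror of the else branch: reverse a decreasing range by hand."""
    return range(stop - step, start + 1, -step)


# Non-canonical stop (0): the "reversed" range holds values that were never in the index.
print(list(old_reversed(10, 0, -3)))    # [3, 6, 9]

# Canonical stop (-2): the formula gives the expected reversed values.
print(list(old_reversed(10, -2, -3)))   # [1, 4, 7, 10]
```

The values 3, 6, 9 are exactly the ones that ri.get_indexer(ri - 1) reports as found in the session above.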

Instead of directly computing the reversed values ourselves, I think we should simply take the values from the reversed underlying range object:

diff --git a/pandas/core/indexes/range.py b/pandas/core/indexes/range.py
index 8783351cc..4c5904e5a 100644
--- a/pandas/core/indexes/range.py
+++ b/pandas/core/indexes/range.py
@@ -387,7 +387,8 @@ class RangeIndex(Int64Index):
             start, stop, step = self.start, self.stop, self.step
         else:
             # Work on reversed range for simplicity:
-            start, stop, step = (self.stop - self.step, self.start + 1, -self.step)
+            reverse = self._range[::-1]
+            start, stop, step = reverse.start, reverse.stop, reverse.step
 
         target_array = np.asarray(target)
         if not (is_integer_dtype(target_array) and target_array.ndim == 1):
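For reference, here is a simplified pure-Python sketch of the lookup with the proposed change applied. It is an assumption-laden stand-in rather than the pandas implementation: get_indexer_sketch is a hypothetical name, it works on a plain range and a list of ints, and it skips the dtype checks and NumPy vectorization.

```python
def get_indexer_sketch(r: range, target: list) -> list:
    """Hypothetical stand-in: position of each target value in r, or -1 if absent."""
    if r.step > 0:
        start, stop, step = r.start, r.stop, r.step
    else:
        # Let range compute its own reversal, so a non-canonical stop is harmless.
        reverse = r[::-1]
        start, stop, step = reverse.start, reverse.stop, reverse.step

    result = []
    for value in target:
        offset = value - start
        if offset % step == 0 and start <= value < stop:
            pos = offset // step
            if step != r.step:
                # We searched the reversed range, so map back to the original order.
                pos = len(r) - 1 - pos
            result.append(pos)
        else:
            result.append(-1)
    return result


r = range(10, 0, -3)  # 10, 7, 4, 1 with a non-canonical stop
print(r[::-1])                                   # range(1, 13, 3)
print(get_indexer_sketch(r, [10, 7, 4, 1]))      # [0, 1, 2, 3]
print(get_indexer_sketch(r, [9, 6, 3, 0]))       # [-1, -1, -1, -1]
print(get_indexer_sketch(r, [10, 9, 7]))         # [0, -1, 1]
```

Slicing with [::-1] yields range(1, 13, 3) regardless of how stop was originally written, which is why the non-canonical and canonical forms behave the same here.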


jschendel · 2019-09-30T06:26:28Z

Note that this is a regression as this worked in 0.24.2:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.2'

In [2]: ri = pd.RangeIndex(10, 0, -3)

In [3]: ri.get_indexer(ri)
Out[3]: array([0, 1, 2, 3])

In [4]: ri.get_indexer(ri - 1)
Out[4]: array([-1, -1, -1, -1])

In [5]: s = pd.Series(list('abcd'), index=ri)

In [6]: s
Out[6]: 
10    a
7     b
4     c
1     d
dtype: object

In [7]: s.reindex([10, 9, 7])
Out[7]: 
10      a
9     NaN
7       b
dtype: object

jschendel added the Bug and Indexing (Related to indexing on series/frames, not to indexes themselves) labels Sep 30, 2019

jschendel added this to the Contributions Welcome milestone Sep 30, 2019

jschendel mentioned this issue Sep 30, 2019

BUG: Fix RangeIndex.get_indexer for decreasing RangeIndex #28680

Merged

5 tasks

jschendel modified the milestones: Contributions Welcome, 0.25.2 Sep 30, 2019

jschendel added the Regression (Functionality that used to work in a prior pandas version) label Sep 30, 2019

jorisvandenbossche closed this as completed in #28680 Oct 2, 2019
