Fix no raise dup index when using drop with axis=0 #19230

Merged 14 commits on Jan 18, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -453,6 +453,7 @@ Reshaping
- Bug in :func:`Dataframe.pivot_table` which fails when the ``aggfunc`` arg is of type string. The behavior is now consistent with other methods like ``agg`` and ``apply`` (:issue:`18713`)
- Bug in :func:`DataFrame.merge` in which merging using ``Index`` objects as vectors raised an Exception (:issue:`19038`)
- Bug in :func:`DataFrame.stack`, :func:`DataFrame.unstack`, :func:`Series.unstack` which were not returning subclasses (:issue:`15563`)
- Bug in :func:`DataFrame.drop`, `ValueError` now raises when dropping an `Index` that has duplicates
Member

  • `ValueError` --> ``ValueError`` (double backticks)
  • `Index` --> ``Index`` (double backticks)
  • Add the issue number: (:issue:`19186`)

-

Numeric
3 changes: 3 additions & 0 deletions pandas/core/generic.py
@@ -2909,6 +2909,9 @@ def _drop_axis(self, labels, axis, level=None, errors='raise'):
        else:
            indexer = ~axis.isin(labels)

            if all(indexer) and errors == 'raise':
                raise ValueError('{} not found in axis'.format(labels))
Contributor

This should be a KeyError.

Contributor Author

Should line 3770 in pandas.core.indexes.base also be KeyError then?

Contributor

Can you show a user-facing example?

Contributor Author (@aschade, Jan 14, 2018)

In[3]: pd.DataFrame(index=['a', 'b']).drop('c', axis=1)
ValueError: labels ['c'] not contained in axis

Contributor

If you change that, what breaks?

Contributor Author

It breaks 15 tests that are expecting that scenario to be a ValueError

Contributor

Can you show some more on this, e.g. which tests?

Contributor Author (@aschade, Jan 14, 2018)

============================================================= FAILURES =============================================================
_______________________________________________________ TestPanel.test_drop ________________________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/test_panel.py:2305: in test_drop
    pytest.raises(ValueError, panel.drop, 'Three')
pandas/core/generic.py:2863: in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
pandas/core/generic.py:2895: in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['Three'] not contained in axis"
____________________________________________ TestDataFrameSelectReindex.test_drop_names ____________________________________________
[gw1] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/frame/test_axis_select_reindex.py:44: in test_drop_names
    pytest.raises(ValueError, df.drop, ['g'])
pandas/core/generic.py:2863: in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
pandas/core/generic.py:2895: in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['g'] not contained in axis"
_______________________________________________ TestDataFrameSelectReindex.test_drop _______________________________________________
[gw0] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/frame/test_axis_select_reindex.py:90: in test_drop
    pytest.raises(ValueError, simple.drop, 5)
pandas/core/generic.py:2863: in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
pandas/core/generic.py:2895: in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: 'labels [5] not contained in axis'
_______________________________________________________ TestIndex.test_drop ________________________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1399: in test_drop
    pytest.raises(ValueError, self.strIndex.drop, ['foo', 'bar'])
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['foo' 'bar'] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop0-values0] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['a'] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop0-values1] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['a'] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop0-values2] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels ['a'] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop1-values0] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels [('c', 'd')] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop1-values1] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels [('c', 'd')] not contained in axis"
___________________________________________ TestIndex.test_drop_tuple[to_drop1-values2] ____________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/indexes/test_base.py:1451: in test_drop_tuple
    pytest.raises(ValueError, removed.drop, drop_me)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: "labels [('c', 'd')] not contained in axis"
_______________________________________________ TestPivotTable.test_pivot_no_values ________________________________________________
[gw3] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/reshape/test_pivot.py:214: in test_pivot_no_values
    res = df.pivot_table(index=df.index.month, columns=df.index.day)
pandas/core/frame.py:4513: in pivot_table
    margins_name=margins_name)
pandas/core/reshape/pivot.py:77: in pivot_table
    values = values.drop(key)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: 'labels [1 2 1 1 1] not contained in axis'
____________________________________________________ TestPivotTable.test_daily _____________________________________________________
[gw3] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/reshape/test_pivot.py:916: in test_daily
    columns=ts.index.dayofyear)
pandas/core/reshape/pivot.py:77: in pivot_table
    values = values.drop(key)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: 'labels [2000 2000 2000 ..., 2004 2004 2004] not contained in axis'
___________________________________________________ TestPivotTable.test_monthly ____________________________________________________
[gw3] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/reshape/test_pivot.py:934: in test_monthly
    columns=ts.index.month)
pandas/core/reshape/pivot.py:77: in pivot_table
    values = values.drop(key)
pandas/core/indexes/base.py:3771: in drop
    labels[mask])
E   KeyError: 'labels [2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2001 2001 2001\n 2001 2001 2001 2001 2001 2001 2001 2001 2001 2002 2002 2002 2002 2002 2002\n 2002 2002 2002 2002 2002 2002 2003 2003 2003 2003 2003 2003 2003 2003 2003\n 2003 2003 2003 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004] not contained in axis'
___________________________________________________ TestSeriesIndexing.test_drop ___________________________________________________
[gw2] darwin -- Python 3.5.4 /Users/Alex/miniconda3/envs/pandas/bin/python
pandas/tests/series/test_indexing.py:1841: in test_drop
    pytest.raises(ValueError, s.drop, 'bc')
pandas/core/generic.py:2863: in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
pandas/core/generic.py:2895: in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
pandas/core/indexes/base.py:3771: in drop
    'labels %s not contained in axis' % labels[mask])
E   KeyError: "labels ['bc'] not contained in axis"

Wouldn't be a ton of effort to just update the tests.

Contributor

Yeah, let me have a closer look.
We could probably just change this to be a KeyError, which is more consistent with other indexing.

Contributor Author

Great, thanks!
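
The failures listed in this thread follow from Python's exception hierarchy: `KeyError` derives from `LookupError`, not from `ValueError`, so a test written as `pytest.raises(ValueError, ...)` no longer matches once the code raises `KeyError`. A minimal stdlib-only sketch of that point follows; the helper name `caught_by` is hypothetical and is not part of pandas or pytest.

```python
def caught_by(handler_type, raised_type):
    """Return True if `except handler_type` would catch `raised_type`."""
    try:
        # Message text borrowed from the traceback quoted in the thread.
        raise raised_type("labels ['c'] not contained in axis")
    except handler_type:
        return True
    except Exception:
        return False


# A handler written for ValueError does not see a KeyError.
print(caught_by(ValueError, KeyError))   # False
# KeyError is caught by itself and by its base class LookupError.
print(caught_by(KeyError, KeyError))     # True
print(caught_by(LookupError, KeyError))  # True
```

Any caller that currently catches `ValueError` around a drop would need the same update as the tests.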


            slicer = [slice(None)] * self.ndim
            slicer[self._get_axis_number(axis_name)] = indexer
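
The check added in this hunk can be illustrated without pandas. The sketch below is an assumption-laden stand-in, not the library's implementation: `drop_labels` is a hypothetical helper that works on plain lists, and it raises `KeyError` as the review thread suggests, whereas the diff as shown raises `ValueError`.

```python
def drop_labels(axis_labels, labels, errors="raise"):
    """Drop every occurrence of `labels` from a possibly non-unique axis.

    Hypothetical stand-in for the non-unique branch in the hunk above.
    """
    wanted = set(labels)
    # True marks positions to keep, mirroring `~axis.isin(labels)`.
    keep = [label not in wanted for label in axis_labels]

    # If every position is kept, none of the requested labels was present.
    if all(keep) and errors == "raise":
        raise KeyError("{} not found in axis".format(list(labels)))

    return [label for label, k in zip(axis_labels, keep) if k]


# Duplicate labels are all removed in one pass.
print(drop_labels(["a", "a", "b"], ["a"]))                # ['b']
# A missing label is tolerated when errors='ignore'.
print(drop_labels(["a", "a"], ["c"], errors="ignore"))    # ['a', 'a']
```

Because the guard is `all(keep)`, it only fires when none of the requested labels is found; a request that mixes present and missing labels drops the present ones without raising in this sketch.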

18 changes: 18 additions & 0 deletions pandas/tests/frame/test_mutate_columns.py
@@ -257,3 +257,21 @@ def test_insert_column_bug_4032(self):
        expected = DataFrame([[1.3, 1, 1.1], [2.3, 2, 2.2]],
                             columns=['c', 'a', 'b'])
        assert_frame_equal(result, expected)

    data = [[1, 2, 3], [1, 2, 3]]
Contributor

Can you move this to test_axis_select_reindex, where the other drop tests are?


    @pytest.mark.parametrize('actual', [
Contributor

We need to test the axis=1 case as well (see the original issue).

        DataFrame(data=data, index=['a', 'a']),
        DataFrame(data=data, index=['a', 'b']),
        DataFrame(data=data, index=['a', 'b']).set_index([0, 1]),
        DataFrame(data=data, index=['a', 'a']).set_index([0, 1])
    ])
    def test_raise_on_drop_duplicate_index(self, actual):

        # issue 19186
        level = 0 if isinstance(actual.index, MultiIndex) else None
        with pytest.raises(ValueError):
            actual.drop('c', level=level, axis=0)
        expected_no_err = actual.drop('c', axis=0, level=level,
                                      errors='ignore')
        assert_frame_equal(expected_no_err, actual)