Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/DEPR: Remove +/- as setops for Index (GH8227) #14127

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 37 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -919,6 +919,43 @@ of ``int64`` (:issue:`13988`)
pi = pd.PeriodIndex(['2011-01', '2011-02'], freq='M')
pi.values


.. _whatsnew_0190.api.setops:

Index ``+`` / ``-`` no longer used for set operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Addition and subtraction of the base Index type (not the numeric subclasses)
previously performed set operations (set union and difference). This
behaviour was already deprecated since 0.15.0 (in favor using the specific
``.union()`` and ``.difference()`` methods), and is now disabled. When
possible, ``+`` and ``-`` are now used for element-wise operations, for
example for concatenating strings (:issue:`8227`, :issue:`14127`).

Previous Behavior:

.. code-block:: ipython

In [1]: pd.Index(['a', 'b']) + pd.Index(['a', 'c'])
FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union()
Out[1]: Index(['a', 'b', 'c'], dtype='object')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this clear that this ONLY applies to Index itself and not-subclasses (maybe show that numeric/datetimelike ops are unchanged)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will try to clarify

The same operation will now perform element-wise addition:

.. ipython:: python

pd.Index(['a', 'b']) + pd.Index(['a', 'c'])

Note that numeric Index objects already performed element-wise operations.
For example, the behaviour of adding two integer Indexes:

.. ipython:: python

pd.Index([1, 2, 3]) + pd.Index([2, 3, 4])

is unchanged. The base ``Index`` is now made consistent with this behaviour.


.. _whatsnew_0190.api.difference:

``Index.difference`` and ``.symmetric_difference`` changes
Expand Down
25 changes: 7 additions & 18 deletions pandas/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1739,28 +1739,16 @@ def argsort(self, *args, **kwargs):
return result.argsort(*args, **kwargs)

def __add__(self, other):
if is_list_like(other):
warnings.warn("using '+' to provide set union with Indexes is "
"deprecated, use '|' or .union()", FutureWarning,
stacklevel=2)
if isinstance(other, Index):
return self.union(other)
return Index(np.array(self) + other)

def __radd__(self, other):
if is_list_like(other):
warnings.warn("using '+' to provide set union with Indexes is "
"deprecated, use '|' or .union()", FutureWarning,
stacklevel=2)
return Index(other + np.array(self))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to remain I think? (the isinstance part)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the isinstance part? (we don't only allow to add an Index, but also a scalar or list/array)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this really should use _ensure_index I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But scalars should also work?

In [13]: pd.Index(['a', 'b']) + '1'
Out[13]: Index(['a1', 'b1'], dtype='object')

BTW, on numeric indexes, using lists is not allowed:

In [12]: pd.Index([1, 2]) + [10, 20]
TypeError: can only perform ops with scalar values

In [13]: pd.Index([1, 2]) + np.array([10, 20])
Out[13]: Int64Index([11, 22], dtype='int64')

while on base Index it is currently allowed:

In [15]: pd.Index(['a', 'b']) + ['c', 'd']
Out[15]: Index(['ac', 'bd'], dtype='object')

Unifying that would also be nice. Using _ensure_index in this case would still allow lists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are expanding the api w/o any testing of __add__. Its ok, but should create an issue to add tests and/or limit it. (so ideally we don't expand it at all for Index and add back if needed later), esp NOT for MultiIndex and Categorical (I think you just renamed, but just to be sure).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for MultiIndex/Categorical nothing should be changed. I just renamed the method to disable add/sub for those, as there are no longer set ops.

For Index, I agree we should be more careful on what to allow, but I think that this is already the case in current master.
The 'old' __add__ did basically:

def __add__(self, other):
    if isinstance(other, Index):
        self.union(other)
    return Index(self.values + other)

So by removing the if instance part I am not broading the scope, previously everything that was not an Index was already handled to do addition.

But, as I said, I agree we should be more careful / do more careful testing (eg Int64Index does not allow to add with lists, base Index does).
I will see if I do that here, or leave it for another issue.

Copy link
Contributor

@jreback jreback Sep 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be a break in here (e.g. where it currently calls .union) and see if we actually have test code that hits this. It obviously works, so prob ok to remove, just checking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this comment. We had indeed tests that tested the union behaviour of + for Index, and I changed this test to check for concatenation. If there are other tests that would hit this path, they should now fail given that I removed it.

__iadd__ = __add__

def __sub__(self, other):
warnings.warn("using '-' to provide set differences with Indexes is "
"deprecated, use .difference()", FutureWarning,
stacklevel=2)
return self.difference(other)
raise TypeError("cannot perform __sub__ with this index type: "
"{typ}".format(typ=type(self)))

def __and__(self, other):
return self.intersection(other)
Expand Down Expand Up @@ -1990,7 +1978,8 @@ def symmetric_difference(self, other, result_name=None):
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``(idx1 - idx2) + (idx2 - idx1)`` with duplicates dropped.
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.

Examples
--------
Expand Down Expand Up @@ -3333,8 +3322,8 @@ def _evaluate_compare(self, other):
cls.__ge__ = _make_compare(operator.ge)

@classmethod
def _add_numericlike_set_methods_disabled(cls):
""" add in the numeric set-like methods to disable """
def _add_numeric_methods_add_sub_disabled(cls):
""" add in the numeric add/sub methods to disable """

def _make_invalid_op(name):
def invalid_op(self, other=None):
Expand All @@ -3349,7 +3338,7 @@ def invalid_op(self, other=None):

@classmethod
def _add_numeric_methods_disabled(cls):
""" add in numeric methods to disable """
""" add in numeric methods to disable other than add/sub """

def _make_invalid_op(name):
def invalid_op(self, other=None):
Expand Down
2 changes: 1 addition & 1 deletion pandas/indexes/category.py
Original file line number Diff line number Diff line change
Expand Up @@ -649,7 +649,7 @@ def _add_accessors(cls):
typ='method', overwrite=True)


CategoricalIndex._add_numericlike_set_methods_disabled()
CategoricalIndex._add_numeric_methods_add_sub_disabled()
CategoricalIndex._add_numeric_methods_disabled()
CategoricalIndex._add_logical_methods_disabled()
CategoricalIndex._add_comparison_methods()
Expand Down
1 change: 1 addition & 0 deletions pandas/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -2219,6 +2219,7 @@ def isin(self, values, level=None):


MultiIndex._add_numeric_methods_disabled()
MultiIndex._add_numeric_methods_add_sub_disabled()
MultiIndex._add_logical_methods_disabled()


Expand Down
24 changes: 14 additions & 10 deletions pandas/tests/indexes/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -730,16 +730,6 @@ def test_union(self):
expected = Index(list('ab'), name='A')
tm.assert_index_equal(union, expected)

def test_add(self):

# - API change GH 8226
with tm.assert_produces_warning():
self.strIndex + self.strIndex
with tm.assert_produces_warning():
self.strIndex + self.strIndex.tolist()
with tm.assert_produces_warning():
self.strIndex.tolist() + self.strIndex

with tm.assert_produces_warning(RuntimeWarning):
firstCat = self.strIndex.union(self.dateIndex)
secondCat = self.strIndex.union(self.strIndex)
Expand All @@ -755,13 +745,27 @@ def test_add(self):
tm.assert_contains_all(self.strIndex, secondCat)
tm.assert_contains_all(self.dateIndex, firstCat)

def test_add(self):
idx = self.strIndex
expected = Index(self.strIndex.values * 2)
self.assert_index_equal(idx + idx, expected)
self.assert_index_equal(idx + idx.tolist(), expected)
self.assert_index_equal(idx.tolist() + idx, expected)

# test add and radd
idx = Index(list('abc'))
expected = Index(['a1', 'b1', 'c1'])
self.assert_index_equal(idx + '1', expected)
expected = Index(['1a', '1b', '1c'])
self.assert_index_equal('1' + idx, expected)

def test_sub(self):
idx = self.strIndex
self.assertRaises(TypeError, lambda: idx - 'a')
self.assertRaises(TypeError, lambda: idx - idx)
self.assertRaises(TypeError, lambda: idx - idx.tolist())
self.assertRaises(TypeError, lambda: idx.tolist() - idx)

def test_append_multiple(self):
index = Index(['a', 'b', 'c', 'd', 'e', 'f'])

Expand Down
17 changes: 10 additions & 7 deletions pandas/tests/indexes/test_multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -1408,21 +1408,24 @@ def test_intersection(self):
# result = self.index & tuples
# self.assertTrue(result.equals(tuples))

def test_difference(self):
def test_sub(self):

first = self.index
result = first.difference(self.index[-3:])

# - API change GH 8226
with tm.assert_produces_warning():
# - now raises (previously was set op difference)
with tm.assertRaises(TypeError):
first - self.index[-3:]
with tm.assert_produces_warning():
with tm.assertRaises(TypeError):
self.index[-3:] - first
with tm.assert_produces_warning():
with tm.assertRaises(TypeError):
self.index[-3:] - first.tolist()
with tm.assertRaises(TypeError):
first.tolist() - self.index[-3:]

self.assertRaises(TypeError, lambda: first.tolist() - self.index[-3:])
def test_difference(self):

first = self.index
result = first.difference(self.index[-3:])
expected = MultiIndex.from_tuples(sorted(self.index[:-3].values),
sortorder=0,
names=self.index.names)
Expand Down