Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix IntervalArray equality comparisions #30640

Merged
merged 2 commits into from
Jan 5, 2020

Conversation

jschendel
Copy link
Member

cc @TomAugspurger

@jschendel jschendel added Bug Interval Interval data type labels Jan 3, 2020
@jschendel jschendel added this to the 1.0 milestone Jan 3, 2020
@@ -547,6 +550,58 @@ def __setitem__(self, key, value):
right.values[key] = value_right
self._right = right

def __eq__(self, other):
# ensure pandas array for list-like and eliminate non-interval scalars
if is_list_like(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this particular check can be pushed to the base class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow. Whose the base class here? ExtensionArray?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

return np.zeros(len(self), dtype=bool)

# determine the dtype of the elements we want to compare
if isinstance(other, Interval):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point can’t u just wrap other in array()?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite nice overall.

Are you able to add tests inheriting from pandas.tests.extensions.base.BaseComparisonOpsTests to pandas/tests/extension/test_interval.py now?

@@ -547,6 +550,58 @@ def __setitem__(self, key, value):
right.values[key] = value_right
self._right = right

def __eq__(self, other):
# ensure pandas array for list-like and eliminate non-interval scalars
if is_list_like(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow. Whose the base class here? ExtensionArray?

"overlaps",
"contains",
"__eq__",
"__ne__",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with our other EAs, these need to be dispatched/wrapped the same way they are in datetimelike or categorical. im planning to move the relevant code to indexes.extension so this can re-use the existing code.

The relevant test will be arr == series --> Series, idx == series --> ndarray[bool]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should now be able to use indexes.extension.make_wrapped_comparison_op

@@ -93,6 +113,221 @@ def test_set_na(self, left_right_dtypes):
tm.assert_extension_array_equal(result, expected)


class TestComparison:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly put in tests.arithmetic.test_interval and parametrize with box_with_array

@jreback
Copy link
Contributor

jreback commented Jan 4, 2020

@jschendel ok, hopefully some of this can be pushed down to base classes; @jbrockmendel merge if your comments have been addressed.

isinstance(obj, Interval)
and self.closed == obj.closed
and self.left[i] == obj.left
and self.right[i] == obj.right
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this check be just self[i] == obj?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could but there'd be a perf hit for actually materializing the Interval object.

@jschendel
Copy link
Member Author

jschendel commented Jan 5, 2020

hopefully some of this can be pushed down to base classes

Some of this probably could be pushed down to base classes but need to be careful because it could cause a perf hit. In the IntervalArray implementation I took a bit of care to not actually create any Interval objects and instead use comparisons against the left/right arrays. I think using the base class implementation would end up creating Interval objects. Did some quick ad hoc timings and looks like the perf hit can be ~3x in worst case.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2020

ok fIr @jschendel yep we often balance perf of specialized code paths vs more generic code

@jreback jreback merged commit 5e488a0 into pandas-dev:master Jan 5, 2020
@jreback
Copy link
Contributor

jreback commented Jan 5, 2020

k thanks @jschendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IntervalArray equality operators
4 participants