-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: change IntervalIndex.contains to work elementwise #17753
API: change IntervalIndex.contains to work elementwise #17753
Conversation
Any comments on this one? |
pandas/core/indexes/interval.py
Outdated
return True | ||
except KeyError: | ||
return False | ||
return np.array([key in interval for interval in self], dtype='bool') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do
return self.get_indexer(key) != -1
non-withstanding my comment on the impl. I actually don't think this is a good idea. The reason for But you are now returning a boolean array, which is contrary to a single boolean result. I agree we need a method to return a boolean for each element, but |
If .contains() is different from __contains__, why does it need to return a
scalar in the same way? What's the use case for returning a scalar? We
could have different method name of course (I think we considered .within()
earlier), but I think we need something like this.
…On Tue, Oct 10, 2017 at 3:44 AM Jeff Reback ***@***.***> wrote:
non-withstanding my comment on the impl. I actually don't think this is a
good idea. The reason for .contains in the first place is to allow us to
control what __contains__ does (and for II have it mean something
differnt), so the default of scalar in index is preserved.
But you are now returning a boolean array, which is contrary to a single
boolean result. I agree we need a method to return a boolean *for each
element*, but .contains shouldn't be it.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17753 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1kBcmKw9cJ_zOjp5xpeblqScaD4tks5squgGgaJpZM4PrQMj>
.
|
Also, if you want the return value of the current contains, you can always do We can indeed discuss what would be the best name for this method (the one that returns boolean array), and it might not necessarily be 'contains'. But if we decide for 'contains', the fact that it is different from |
Agreed with @jorisvandenbossche here. |
It seems like this would be inconsistent with every other type of index, as I recall there being plans to add an interval accessor. Perhaps that would be a more consistent place for an elementwise |
I am not aware of a general existing It is true that DatetimeIndex/PeriodIndex have one, but this was added together with the introduction of IntervalIndex, and is thus rather coupled. Indeed, when changing it for IntervalIndex, we need to change it for DatetimeIndex/PeriodIndex as well (or for DatetimeIndex, we could also remove it, as this has no notion of intervals).
We have no accessors for any other index, so that would also be inconsistent. Such an accessor would be to access the special methods when the data are in a Series (but I may misremember the previous discussion, do you know where this took place?)
IMO it would be unfortunate that |
In general: note that the proposed behaviour is consistent with how 'contains' works on an individual |
There is a In [2]: idx = pd.Index(['foo', 'bar', 'baz'])
In [3]: idx.contains('foo')
Out[3]: True
In [4]: idx.contains('a')
Out[4]: False
Note that this already happens with the string accessor: In [5]: idx.str.contains('a')
Out[5]: array([False, True, True], dtype=bool) I do see your point though, and am not trying to imply that this would be fine simply because there is precedence for it. Just pointing out that it's not an unbroken rule.
I was thinking of #16401. Seems like the proposal was just for columns, so I guess I was mistaken in thinking that it would also work on an I probably wasn't clear in my original post, but I'm not against this change (nor do I feel like I've been around long enough to cast +1/-1 votes on stuff like this anyways). I was actually quite surprised when I first used |
The entire point of introducing But I think the intent was still a scalar return value, and not an array. The idea was to allow value in II to work. maybe if we had/have |
Whoops it seemed I really looked bad, I thought I was sure to have checked it :-) @jschendel Thanks for pointing this out. That is a bit unfortunate for the changes I am proposing here.
Ah, yes, string accessor is indeed available on Index. I was thinking about the
I don't fully understand this point, as Index.contains is (more or less) an exact alias for |
closing as stale |
I don't agree with closing this :-), but it is true the discussion died. Let me summarize my reasoning for this PR:
So my proposal is:
And also if we do not agree on the above, I think we need the functionality of this PR, so then we need to come up with another name than |
ok with implementing this. pls rebase / update. |
@jorisvandenbossche : Any chance this gets implemented relatively soon? I'm looking for an |
@jschendel can you review and rebase if you think good? |
@jreback : Sure, but will wait until |
closing as stale. if you'd like to continue pls ping. |
IntervalArray is done in the meantime, so we can rebase this now |
Hello @jorisvandenbossche! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-30 16:37:27 UTC |
@jschendel If you want to take over, feel free. Otherwise I will take this up next week. |
closing as stale. if someone would like to revive, pls ping. |
I think it would be nice to include this along the other IntervalIndexing changes in #27100 (still need to add whatsnew docs + tests for deprecated methods) |
@@ -615,6 +615,7 @@ Other deprecations | |||
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`) | |||
- :meth:`Series.put` is deprecated. (:issue:`18262`) | |||
- :meth:`Index.item` and :meth:`Series.item` is deprecated. (:issue:`18262`) | |||
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is fine, but I would add a sub-section on the usage for IntervalIndex itself
Will add one once the other interval PR is merged
Op ma 1 jul. 2019 12:28 schreef Jeff Reback <notifications@github.com>:
… ***@***.**** requested changes on this pull request.
------------------------------
In doc/source/whatsnew/v0.25.0.rst
<#17753 (comment)>:
> @@ -615,6 +615,7 @@ Other deprecations
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`)
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` is deprecated. (:issue:`18262`)
+- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
so this is fine, but I would add a sub-section on the usage for
IntervalIndex itself
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#17753?email_source=notifications&email_token=AAHZEUGJNGZRKLLGKIU5ALLP5I5DBA5CNFSM4D5NAMR2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB5EWQRI#pullrequestreview-256469061>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHZEUFHONUV22VC7GZVUDDP5I5DBANCNFSM4D5NAMRQ>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
def contains(self, other): | ||
if isinstance(other, Interval): | ||
raise NotImplementedError( | ||
'contains not implemented for two intervals' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should be able to implement this in the next day or two if we want to get it in for 0.25.0; it should be relatively straight-forward and we have good testing infrastructure from overlaps
that could be partially reused.
thanks @jorisvandenbossche |
xref #16316
This is more a proof-of-concept, to see if there is agreement on the behaviour change (I suppose the implementation itself can be easily vectorized based on the left/right attributes).
cc @zfrenchee @shoyer @buyology @jreback