API/ENH: create Interval class #8625

Closed
jreback opened this issue Oct 24, 2014 · 12 comments
Labels
API Design Categorical Categorical Data Type Enhancement Interval Interval data type
Milestone

Comments

@jreback
Contributor

jreback commented Oct 24, 2014

xref #8595

something like: Interval(left=(-1.5,'open'),right=(0,'closed'))

reprs to (-1.5,0]

can be used immediately in the index values for pd.cut and such (as an object index)
eventually can form the basis for IntervalIndex
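A minimal sketch of the constructor proposed above, assuming each endpoint is passed as a `(value, 'open' | 'closed')` pair. This `Interval` class is a hypothetical illustration of the idea in this comment, not the pandas API.

```python
class Interval:
    """Hypothetical sketch: each endpoint is a (value, 'open'|'closed') pair."""

    def __init__(self, left, right):
        self.left, self.left_kind = left
        self.right, self.right_kind = right
        for kind in (self.left_kind, self.right_kind):
            if kind not in ("open", "closed"):
                raise ValueError(f"endpoint kind must be 'open' or 'closed', got {kind!r}")

    def __repr__(self):
        # '(' / ')' mark open endpoints, '[' / ']' mark closed ones
        lb = "(" if self.left_kind == "open" else "["
        rb = ")" if self.right_kind == "open" else "]"
        return f"{lb}{self.left},{self.right}{rb}"


iv = Interval(left=(-1.5, "open"), right=(0, "closed"))
print(repr(iv))  # (-1.5,0]
```

A list of such objects could then serve as the object-dtype labels that pd.cut returns.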

cc @shoyer
cc @JanSchulz
cc @rosnfeld

@jreback jreback added this to the 0.16.0 milestone Oct 24, 2014
@shoyer
Member

shoyer commented Oct 24, 2014

Sounds good to me. As a reminder: here is the IntervalIndex proposal: #7640

Let's consider whether you should be able to control whether the left and right margins are closed/open separately, or if you should only be able to say closed='left' or closed='right'. I am inclined to go for the latter, since the former will be more complex for other cases in IntervalIndex.

@jreback
Contributor Author

jreback commented Oct 24, 2014

@shoyer

you could also do

Interval(left, right, closed='left|right|both|neither'), I guess, for full compatibility (and we could have both constructors, I guess)
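A sketch of how both constructors could coexist, assuming the `closed` keyword is the primary form and the per-endpoint form maps onto it. `from_endpoints`, `closed_left` and `closed_right` are hypothetical names chosen for illustration.

```python
_VALID_CLOSED = ("left", "right", "both", "neither")


class Interval:
    """Hypothetical sketch: Interval(left, right, closed=...)."""

    def __init__(self, left, right, closed="right"):
        if closed not in _VALID_CLOSED:
            raise ValueError(f"closed must be one of {_VALID_CLOSED}, got {closed!r}")
        self.left, self.right, self.closed = left, right, closed

    @classmethod
    def from_endpoints(cls, left, right):
        """Alternate constructor taking (value, 'open'|'closed') pairs."""
        (lval, lkind), (rval, rkind) = left, right
        # Each combination of endpoint kinds maps to exactly one `closed` value
        closed = {
            ("closed", "closed"): "both",
            ("closed", "open"): "left",
            ("open", "closed"): "right",
            ("open", "open"): "neither",
        }[(lkind, rkind)]
        return cls(lval, rval, closed=closed)

    @property
    def closed_left(self):
        return self.closed in ("left", "both")

    @property
    def closed_right(self):
        return self.closed in ("right", "both")


a = Interval.from_endpoints((-1.5, "open"), (0, "closed"))
print(a.closed)  # right
```

Since the four `closed` values cover every open/closed combination, the keyword form loses nothing compared with specifying each side separately.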

@jnmclarty
Contributor

Idea: def __init__(self, leftside='[', leftnum=0, rightnum=0, rightside=']')

Then...

Interval('[',left,right,']') == Interval('closed',left,right,'closed') == Interval(leftnum=left,rightnum=right)

could subclass and create left(leftside,leftnum) and right(rightnum, rightside), so that left('(',-1) + right(2,']') == Interval('(',-1,2,']')

...but individually they would repr to (-1,inf] and [-inf,2]
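A sketch of the bracket-string idea, assuming brackets and the words 'open'/'closed' are interchangeable markers. `LeftBound` and `RightBound` are hypothetical stand-ins for the comment's `left(...)` and `right(...)`; they are plain helper classes here rather than subclasses, and their standalone reprs are left out.

```python
_LEFT_SIDES = {"[": "closed", "closed": "closed", "(": "open", "open": "open"}
_RIGHT_SIDES = {"]": "closed", "closed": "closed", ")": "open", "open": "open"}


def _normalize(side, table):
    try:
        return table[side]
    except KeyError:
        raise ValueError(f"unrecognized endpoint marker: {side!r}") from None


class Interval:
    def __init__(self, leftside="[", leftnum=0, rightnum=0, rightside="]"):
        # '[' and 'closed' describe the same endpoint, so normalize both
        self.leftside = _normalize(leftside, _LEFT_SIDES)
        self.rightside = _normalize(rightside, _RIGHT_SIDES)
        self.leftnum, self.rightnum = leftnum, rightnum

    def _key(self):
        return (self.leftside, self.leftnum, self.rightnum, self.rightside)

    def __eq__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class LeftBound:
    """Only the left endpoint, e.g. LeftBound('(', -1)."""

    def __init__(self, leftside, leftnum):
        self.leftside, self.leftnum = leftside, leftnum

    def __add__(self, other):
        # A left bound plus a right bound yields a full Interval
        if not isinstance(other, RightBound):
            return NotImplemented
        return Interval(self.leftside, self.leftnum, other.rightnum, other.rightside)


class RightBound:
    """Only the right endpoint, e.g. RightBound(2, ']')."""

    def __init__(self, rightnum, rightside):
        self.rightnum, self.rightside = rightnum, rightside
```

With these definitions, `Interval('[', 0, 1, ']') == Interval('closed', 0, 1, 'closed') == Interval(leftnum=0, rightnum=1)` holds, as does `LeftBound('(', -1) + RightBound(2, ']') == Interval('(', -1, 2, ']')`.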

@shoyer
Member

shoyer commented Oct 27, 2014

One follow-up thought: if we support closed='both' and closed='neither', we don't want the repr to look like [-1.5, 0] or (-1.5, 0). It's probably better to use the more verbose repr Interval(-1.5, 0, closed='both') for these cases.
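The repr rule suggested here could look like the following sketch: the short bracket form only for half-open intervals, and the explicit constructor form for 'both' and 'neither'. `interval_repr` is a hypothetical helper, not an existing function.

```python
_VALID_CLOSED = ("left", "right", "both", "neither")


def interval_repr(left, right, closed="right"):
    """Hypothetical repr rule: brackets for half-open intervals, verbose otherwise."""
    if closed not in _VALID_CLOSED:
        raise ValueError(f"closed must be one of {_VALID_CLOSED}, got {closed!r}")
    if closed == "right":
        return f"({left}, {right}]"
    if closed == "left":
        return f"[{left}, {right})"
    # '[a, b]' and '(a, b)' would look like a list or a tuple in an
    # object-dtype array, so spell out 'both' / 'neither' explicitly.
    return f"Interval({left!r}, {right!r}, closed={closed!r})"


print(interval_repr(-1.5, 0, "right"))  # (-1.5, 0]
print(interval_repr(-1.5, 0, "both"))   # Interval(-1.5, 0, closed='both')
```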

@jankatins
Contributor

Why this verbose repr? In the repr of a DataFrame (as index or values), I would expect to get only the "short" repr...

@shoyer
Member

shoyer commented Oct 28, 2014

The problem is ambiguity with list and tuple, at least as long as the array has dtype=object.


@jnmclarty
Contributor

Could clear up the ambiguity with a dash (or even a double dash). Nobody should confuse "(0 - 2)" with a tuple, nor "[0 - 2]" with a list.
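A sketch of the dash-separated repr suggested in this comment, assuming the same four `closed` values discussed above. `dash_repr` is a hypothetical helper name.

```python
def dash_repr(left, right, closed="right"):
    """Hypothetical repr: brackets for the closed sides, ' - ' between endpoints."""
    if closed not in ("left", "right", "both", "neither"):
        raise ValueError(f"invalid closed: {closed!r}")
    lb = "[" if closed in ("left", "both") else "("
    rb = "]" if closed in ("right", "both") else ")"
    return f"{lb}{left} - {right}{rb}"


print(dash_repr(0, 2, "neither"))  # (0 - 2)
print(dash_repr(0, 2, "both"))     # [0 - 2]
```

One trade-off to weigh: with negative endpoints the dash can read like subtraction, e.g. `dash_repr(-2, -1)` gives `(-2 - -1]`.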

shoyer added a commit to shoyer/pandas that referenced this issue Nov 2, 2014
Fixes pandas-dev#7640, pandas-dev#8625

This is a work in progress, but it's far enough along that I'd love to get
some feedback.

TODOs (more called out in the code):

- [ ] documentation + docstrings
- [ ] finish the index methods:
   - [ ] `get_loc`
   - [ ] `get_indexer`
   - [ ] `slice_locs`
- [ ] comparison operations
- [ ] fix `is_monotonic` (pending pandas-dev#8680)
- [ ] ensure sorting works
- [ ] arithmetic operations (not essential for MVP)
- [ ] cythonize the bottlenecks:
   - [ ] `from_breaks`
   - [ ] `_data`
   - [ ] `Interval`?
- [ ] `MultiIndex`
- [ ] `Categorical`/`cut`
- [ ] serialization
- [ ] lots more tests

CC @jreback @cpcloud @immerrr
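The `from_breaks` and `Categorical`/`cut` items in the TODO list can be illustrated with a pure-Python sketch, assuming right-closed bins as in the `(-1.5,0]` example earlier in this thread. `from_breaks` and `cut_one` here are hypothetical helpers working on plain tuples, not the implementation in #8707.

```python
from bisect import bisect_left


def from_breaks(breaks):
    """Hypothetical: consecutive breaks -> right-closed (left, right] pairs."""
    return list(zip(breaks[:-1], breaks[1:]))


def cut_one(value, breaks):
    """Return the (left, right] pair containing value, or None if out of range."""
    # bisect_left returns the first index i with breaks[i] >= value; for
    # right-closed bins, value lies in (breaks[i - 1], breaks[i]].
    i = bisect_left(breaks, value)
    if i == 0 or i == len(breaks):
        return None
    return (breaks[i - 1], breaks[i])


breaks = [0, 1, 2, 3]
print(from_breaks(breaks))   # [(0, 1), (1, 2), (2, 3)]
print(cut_one(1, breaks))    # (0, 1)
```

Because the bins are right-closed, the lowest break (0) falls outside every bin, while the highest break (3) lands in the last one.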
@shoyer shoyer mentioned this issue Nov 2, 2014
@shoyer
Member

shoyer commented Nov 2, 2014

Rough draft implementation up for review in #8707.

It turns out closed='neither' and closed='both' are probably no harder for IntervalIndex than 'left'/'right'. But I still like the constructor Interval(left, right, closed='right').

We could also have a to_interval function like to_datetime that parses strings, e.g., to_interval('(0, 1]') -> Interval(0, 1, closed='right') (lists of strings would turn into IntervalIndex).
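A minimal sketch of the proposed `to_interval` parser, assuming numeric endpoints and returning a plain named tuple instead of an Interval object. `to_interval` and `ParsedInterval` are hypothetical here; the bracket pair alone determines `closed`.

```python
import re
from typing import NamedTuple


class ParsedInterval(NamedTuple):
    left: float
    right: float
    closed: str


# Bracket, endpoint, comma, endpoint, bracket; surrounding whitespace is allowed.
_INTERVAL_RE = re.compile(r"^\s*([\[(])\s*([^,\s]+)\s*,\s*([^,\s]+)\s*([\])])\s*$")

_CLOSED_FROM_BRACKETS = {
    ("(", "]"): "right",
    ("[", ")"): "left",
    ("[", "]"): "both",
    ("(", ")"): "neither",
}


def to_interval(text):
    """Hypothetical parser: '(0, 1]' -> ParsedInterval(0.0, 1.0, 'right')."""
    match = _INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse an interval from {text!r}")
    lbracket, left, right, rbracket = match.groups()
    # float() raises ValueError for endpoints that are not numbers
    return ParsedInterval(float(left), float(right), _CLOSED_FROM_BRACKETS[(lbracket, rbracket)])


print(to_interval("(0, 1]"))  # ParsedInterval(left=0.0, right=1.0, closed='right')
```

A list of strings could be handled element-wise, e.g. `[to_interval(s) for s in strings]`, as a stand-in for building an IntervalIndex.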

@jnmclarty
Contributor

+1 for to_interval('(0,1]')

@shoyer
Member

shoyer commented Nov 5, 2014

Here's another design question: how should comparison operations work with intervals?

My initial thought was to support all comparison operations, but when I attempted to write it, I realized that it's not obvious to me what the result of 0.5 < Interval(0, 1) should be (it's even less clear when both values are intervals).

So, my current proposal is that we do not actually want to support most comparisons with intervals. Instead, we should encourage users to write things like 0.5 < interval.left. The only comparison-like operations we want to define are __eq__/__ne__, for comparing two interval objects (e.g., Interval(0, 1) == Interval(0, 1)), and __contains__, for checking whether a scalar value is contained in an interval (e.g., 0.5 in Interval(0, 1)).
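A sketch of this "equality and membership only" proposal, assuming `closed` defaults to 'right'. The class is hypothetical; because no ordering methods are defined, Python itself raises `TypeError` for `0.5 < Interval(0, 1)`.

```python
class Interval:
    """Hypothetical sketch: supports ==, != and `in`, but no ordering."""

    def __init__(self, left, right, closed="right"):
        if closed not in ("left", "right", "both", "neither"):
            raise ValueError(f"invalid closed: {closed!r}")
        if left > right:
            raise ValueError("left must be <= right")
        self.left, self.right, self.closed = left, right, closed

    def __eq__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return (self.left, self.right, self.closed) == (other.left, other.right, other.closed)

    # Python 3 derives __ne__ from __eq__, so only __hash__ is added here.
    def __hash__(self):
        return hash((self.left, self.right, self.closed))

    def __contains__(self, value):
        # Closed sides use <=, open sides use <
        if self.closed in ("left", "both"):
            left_ok = self.left <= value
        else:
            left_ok = self.left < value
        if self.closed in ("right", "both"):
            right_ok = value <= self.right
        else:
            right_ok = value < self.right
        return left_ok and right_ok
```

Users who need an ordering would then compare endpoints directly, e.g. `0.5 < iv.left`.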

@jreback
Contributor Author

jreback commented Nov 5, 2014

sure

though you will need to provide an ordering for use in IntervalIndex

e.g., imagine splitting these, though it's sort of trivial:

i = IntervalIndex(...)
i.take(...) equiv to IntervalIndex(_left=i._left.take(...), _right=i._right.take(...))
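The `take` equivalence above can be made concrete with a minimal stand-in that stores the endpoints in two parallel lists. `IntervalIndexSketch` and `to_tuples` are hypothetical; only the `_left`/`_right` naming follows the comment.

```python
class IntervalIndexSketch:
    """Hypothetical IntervalIndex stand-in: endpoints kept in parallel lists."""

    def __init__(self, left, right, closed="right"):
        if len(left) != len(right):
            raise ValueError("left and right must have the same length")
        self._left, self._right, self.closed = list(left), list(right), closed

    def take(self, indices):
        # Selecting positions from the index is the same as selecting the
        # same positions from each endpoint list independently.
        return IntervalIndexSketch(
            [self._left[i] for i in indices],
            [self._right[i] for i in indices],
            closed=self.closed,
        )

    def to_tuples(self):
        return list(zip(self._left, self._right))


idx = IntervalIndexSketch([0, 1, 2], [1, 2, 3])
print(idx.take([2, 0]).to_tuples())  # [(2, 3), (0, 1)]
```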

@shoyer
Member

shoyer commented Nov 5, 2014

So after testing out this alternative, I have now waffled back to defining comparison operations -- otherwise we can't get sorting to work, even in cases of non-overlapping intervals. I would like set operations like union to be able to sort, at least in unambiguous cases, and I don't see any other obvious way to do this.
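One possible ordering that makes sorting work is lexicographic on `(left, right)`, with `closed` as an arbitrary (alphabetical) tie-breaker so the order stays total. This is a hypothetical sketch of that choice, not necessarily what #8707 implements. For non-overlapping intervals, sorting this way puts them in their natural left-to-right order.

```python
from functools import total_ordering


@total_ordering
class Interval:
    """Hypothetical sketch: intervals ordered by (left, right, closed)."""

    def __init__(self, left, right, closed="right"):
        self.left, self.right, self.closed = left, right, closed

    def _key(self):
        # closed only breaks ties between otherwise identical endpoints
        return (self.left, self.right, self.closed)

    def __eq__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())

    def __repr__(self):
        return f"Interval({self.left!r}, {self.right!r}, closed={self.closed!r})"


ivs = sorted([Interval(2, 3), Interval(0, 1), Interval(1, 2)])
print([(iv.left, iv.right) for iv in ivs])  # [(0, 1), (1, 2), (2, 3)]
```

Using `closed` in both `__eq__` and `__lt__` keeps the two consistent, which `functools.total_ordering` relies on when deriving `<=`, `>` and `>=`.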

@jreback I'm not sure I understand your point about take.

@jreback jreback added the Interval Interval data type label Nov 21, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
jreback pushed a commit to jreback/pandas that referenced this issue Feb 5, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Feb 6, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Feb 7, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Feb 8, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Feb 15, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Feb 16, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 8, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 14, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 17, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 20, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 24, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 27, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 28, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Mar 31, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Apr 3, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Apr 4, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Apr 6, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Apr 7, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Apr 11, 2017
jreback pushed a commit to jreback/pandas that referenced this issue Apr 13, 2017
5 participants