REF: IntervalIndex[IntervalArray] #20611

Merged 24 commits on Jul 13, 2018
23 changes: 18 additions & 5 deletions doc/source/basics.rst
@@ -1924,11 +1924,24 @@ untouched. If the data is modified, it is because you did so explicitly.
dtypes
------

The main types stored in pandas objects are ``float``, ``int``, ``bool``,
``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
``category`` and ``object``. In addition these dtypes have item sizes, e.g.
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
for more detail on ``datetime64[ns, tz]`` dtypes.
For the most part, pandas uses NumPy arrays and dtypes for Series or individual
columns of a DataFrame. The main types allowed in pandas objects are ``float``,
``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
timezone-aware datetimes).

In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
NumPy's type-system for a few cases.

* :ref:`Categorical <categorical>`
* :ref:`Datetime with Timezone <timeseries.timezone_series>`
* :ref:`Period <timeseries.periods>`
* :ref:`Interval <advanced.indexing.intervallindex>`

Pandas uses the ``object`` dtype for storing strings.

Finally, arbitrary objects may be stored using the ``object`` dtype, but should
be avoided to the extent possible (for performance and interoperability with
other libraries and methods). See :ref:`basics.object_conversion`.

A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
with the data type of each column.
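
The dtype categories listed in this hunk can be inspected with a small frame. This is a minimal sketch, assuming a pandas version that includes this change (0.24 or later); the column names are made up for illustration, and the exact dtype reprs vary between pandas versions.

```python
import pandas as pd

# Illustrative frame mixing a NumPy-backed column with pandas extension types.
# Column names are hypothetical and not taken from the pandas documentation.
df = pd.DataFrame({
    "floats": [1.0, 2.5],                                        # NumPy float64
    "cats": pd.Categorical(["a", "b"]),                          # Categorical
    "tz": pd.date_range("2018-01-01", periods=2, tz="UTC"),      # tz-aware datetime
    "periods": pd.period_range("2018-01", periods=2, freq="M"),  # Period
    "intervals": pd.interval_range(0, 2),                        # Interval
})

# DataFrame.dtypes returns a Series with one dtype per column.
print(df.dtypes)
```

Only the ``floats`` column is backed by a plain NumPy dtype here; the other four use the extension types named in the list above.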
70 changes: 70 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -66,6 +66,36 @@ Current Behavior:

result


.. _whatsnew_0240.enhancements.interval:

Storing Interval Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Interval data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
:class:`IntervalIndex` like previously (:issue:`19453`).

.. ipython:: python

   ser = pd.Series(pd.interval_range(0, 5))
   ser
   ser.dtype

Previously, these would be cast to a NumPy array of ``Interval`` objects. In general,
this should result in better performance when storing an array of intervals in
a :class:`Series`.

Note that the ``.values`` of a ``Series`` containing intervals is no longer a NumPy
array, but rather an ``ExtensionArray``:

.. ipython:: python

   ser.values

This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.


.. _whatsnew_0240.enhancements.other:

Other Enhancements
@@ -90,6 +120,45 @@ Other Enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. _whatsnew_0240.api_breaking.interval_values:

``IntervalIndex.values`` is now an ``IntervalArray``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :attr:`~IntervalIndex.values` attribute of an :class:`IntervalIndex` now returns an
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects (:issue:`19453`).

Previous Behavior:

.. code-block:: ipython

   In [1]: idx = pd.interval_range(0, 4)

   In [2]: idx.values
   Out[2]:
   array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
          Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
         dtype=object)

New Behavior:

.. ipython:: python

   idx = pd.interval_range(0, 4)
   idx.values

This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.

For situations where you need an ``ndarray`` of ``Interval`` objects, use
:meth:`numpy.asarray` or ``idx.astype(object)``.

.. ipython:: python

   np.asarray(idx)
   idx.values.astype(object)
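
Callers that only need the interval endpoints may be able to avoid the object conversion altogether. The sketch below assumes the ``left``, ``right`` and ``closed`` attributes of ``IntervalArray`` as found in released pandas versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.interval_range(0, 4)
arr = idx.values  # IntervalArray after this change

# Endpoints are available as whole arrays, without building Interval objects.
lefts = np.asarray(arr.left)
rights = np.asarray(arr.right)
widths = rights - lefts

print(lefts, rights, widths, arr.closed)
```

Whether this is preferable to ``np.asarray(idx)`` depends on what the downstream code expects; it is shown only as one possible migration path.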


.. _whatsnew_0240.api.datetimelike.normalize:

Tick DateOffset Normalize Restrictions
@@ -345,6 +414,7 @@ Interval
^^^^^^^^

- Bug in the :class:`IntervalIndex` constructor where the ``closed`` parameter did not always override the inferred ``closed`` (:issue:`19370`)
- Bug in the ``IntervalIndex`` repr where a trailing comma was missing after the list of intervals (:issue:`20611`)
-
-

20 changes: 20 additions & 0 deletions pandas/_libs/interval.pyx
@@ -98,6 +98,26 @@ cdef class IntervalMixin(object):
msg = 'cannot compute length between {left!r} and {right!r}'
raise TypeError(msg.format(left=self.left, right=self.right))

    def _check_closed_matches(self, other, name='other'):
        """Check if the closed attribute of `other` matches.

        Note that 'left' and 'right' are considered different from 'both'.

        Parameters
        ----------
        other : Interval, IntervalIndex, IntervalArray
        name : str
            Name to use for 'other' in the error message.

        Raises
        ------
        ValueError
            When `other` is not closed exactly the same as self.
        """
        if self.closed != other.closed:
            msg = "'{}.closed' is '{}', expected '{}'."
            raise ValueError(msg.format(name, other.closed, self.closed))


cdef _interval_like(other):
return (hasattr(other, 'left')
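
The behaviour of the ``_check_closed_matches`` helper added in this hunk can be sketched in plain Python. ``ClosedSketch`` below is a hypothetical stand-in that carries only a ``closed`` attribute; it is not a pandas class, and the real method lives on the Cython ``IntervalMixin``.

```python
from dataclasses import dataclass


@dataclass
class ClosedSketch:
    """Hypothetical stand-in for an interval-like object (not part of pandas)."""

    closed: str

    def check_closed_matches(self, other, name="other"):
        # Same comparison and message format as the method in the diff.
        if self.closed != other.closed:
            msg = "'{}.closed' is '{}', expected '{}'."
            raise ValueError(msg.format(name, other.closed, self.closed))


left = ClosedSketch("left")

# Matching 'closed' values pass silently.
left.check_closed_matches(ClosedSketch("left"))

# 'both' is not treated as compatible with 'left'.
try:
    left.check_closed_matches(ClosedSketch("both"), name="value")
except ValueError as err:
    message = str(err)
    print(message)
```

The ``name`` argument only affects the error text, which lets callers report the parameter name the user actually passed.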
1 change: 1 addition & 0 deletions pandas/core/arrays/__init__.py
@@ -2,5 +2,6 @@
ExtensionScalarOpsMixin)
from .categorical import Categorical # noqa
from .datetimes import DatetimeArrayMixin # noqa
from .interval import IntervalArray # noqa
from .period import PeriodArrayMixin # noqa
from .timedelta import TimedeltaArrayMixin # noqa
6 changes: 6 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -20,6 +20,7 @@
_ensure_int64,
_ensure_object,
_ensure_platform_int,
is_extension_array_dtype,
is_dtype_equal,
is_datetimelike,
is_datetime64_dtype,
@@ -1243,6 +1244,11 @@ def __array__(self, dtype=None):
ret = take_1d(self.categories.values, self._codes)
if dtype and not is_dtype_equal(dtype, self.categories.dtype):
return np.asarray(ret, dtype)
if is_extension_array_dtype(ret):

Review comment (Contributor, author): `__array__` has to return an ndarray. Without this, `Categorical[ExtensionArray]` would fail, as `take_1d(...)` would be an ExtensionArray.

Review comment (Contributor): comment this.

Review comment (Contributor): I have an update on this section already in intna

# When we're a Categorical[ExtensionArray], like Interval,
# we need to ensure __array__ gets all the way to an
# ndarray.
ret = np.asarray(ret)
return ret

def __setstate__(self, state):
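
The effect of the extra ``np.asarray`` call can be observed with a ``Categorical`` whose categories are intervals. This is a minimal sketch, assuming a pandas version that includes this change; ``pd.cut`` is used here only as a convenient way to obtain such a ``Categorical``.

```python
import numpy as np
import pandas as pd

# pd.cut on a list returns a Categorical whose categories are an IntervalIndex.
cat = pd.cut([0.5, 1.5, 2.5], bins=[0, 1, 2, 3])

# __array__ is what np.asarray calls; it should produce a real ndarray
# of Interval objects rather than an ExtensionArray.
result = np.asarray(cat)

print(type(result).__name__, result.dtype)
```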