REF: IntervalIndex[IntervalArray] #20611
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1925,11 +1925,23 @@ untouched. If the data is modified, it is because you did so explicitly. | |
dtypes | ||
------ | ||
|
||
The main types stored in pandas objects are ``float``, ``int``, ``bool``, | ||
``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``, | ||
``category`` and ``object``. In addition these dtypes have item sizes, e.g. | ||
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>` | ||
for more detail on ``datetime64[ns, tz]`` dtypes. | ||
For the most part, pandas uses NumPy arrays and dtypes for Series or individual | ||
columns of a DataFrame. The main types allowed in pandas objects are ``float``, | ||
``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support | ||
timezone-aware datetimes). | ||
|
||
In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>` | ||
NumPy's type-system for a few cases. | ||
|
||
* :ref:`Categorical <categorical>` | ||
* :ref:`Datetime with Timezone <timeseries.timezone_series>` | ||
* Interval | ||
Review comment: do we have a ref for this? (maybe should create one if not)? |
||
|
||
Pandas uses the ``object`` dtype for storing strings. | ||
|
||
Finally, arbitrary objects may be stored using the ``object`` dtype, but should | ||
be avoided to the extent possible (for performance and interoperability with | ||
other libraries and methods). See :ref:`basics.object_conversion`. | ||
|
||
A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series | ||
with the data type of each column. | ||
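As a rough illustration of the dtypes listed in this hunk, the sketch below builds a small DataFrame and inspects ``dtypes``. The column names are made up for the example, and the ``text`` column is given ``dtype=object`` explicitly, on the assumption that string inference may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical column names, chosen only for this illustration.
df = pd.DataFrame({
    "floats": np.array([1.0, 2.5], dtype="float64"),
    "ints": np.array([1, 2], dtype="int64"),
    "flags": [True, False],
    "stamps": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "cats": pd.Categorical(["a", "b"]),
    "spans": pd.interval_range(0, 2),
    # Explicit object dtype so the result does not depend on string inference.
    "text": pd.Series(["x", "y"], dtype=object),
})

# One entry per column, indexed by column name.
dtypes = df.dtypes
print(dtypes)
```

The result is itself a Series, so a single column's dtype can be looked up by name, for example ``dtypes["spans"]``.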
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -299,7 +299,42 @@ Supplying a ``CategoricalDtype`` will make the categories in each column consist | |
df['A'].dtype | ||
df['B'].dtype | ||
|
||
.. _whatsnew_023.enhancements.extension: | ||
.. _whatsnew_0230.enhancements.interval: | ||
Review comment: technically i guess this is an api breaking change (for
Review comment: this will need changing during the rebase |
||
|
||
Storing Interval Data in Series and DataFrame | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Interval data may now be stored in a Series or DataFrame, in addition to an | ||
Review comment: this is not very clear as it's not new. This works now, but is just inefficient. |
||
:class:`IntervalIndex` like before (:issue:`19453`). | ||
|
||
.. ipython:: python | ||
|
||
ser = pd.Series(pd.interval_range(0, 5)) | ||
ser | ||
ser.dtype | ||
|
||
Previously, these would be cast to a NumPy array of Interval objects. In general, | ||
this should result in better performance when storing an array of intervals in | ||
a Series. | ||
|
||
Note that the ``.values`` of a Series containing intervals is no longer a NumPy | ||
array. Rather, it's an ``ExtensionArray``, composed of two arrays ``left`` and | ||
Review comment: IntervalArray instead of ExtensionArray ? |
||
``right``. | ||
|
||
.. ipython:: python | ||
|
||
ser.values | ||
|
||
To recover the NumPy array of Interval objects, use :func:`numpy.asarray`: | ||
Review comment: I don't think you should show this (the recovering) |
||
|
||
.. ipython:: python | ||
|
||
np.asarray(ser.values) | ||
|
||
This is the same behavior as ``Series.values`` for categorical data. See | ||
:ref:`whatsnew_0230.api_breaking.interval_values` for more. | ||
|
||
.. _whatsnew_0230.enhancements.extension: | ||
|
||
Extending Pandas with Custom Types | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
@@ -482,6 +517,43 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use | |
'Taxes': -200, | ||
'Net result': 300}).sort_index() | ||
|
||
.. _whatsnew_0230.api_breaking.interval_values: | ||
|
||
``IntervalIndex.values`` is now an ``IntervalArray`` | ||
Review comment: i just put a single sentence for it. it's not that big of a deal that this is changed. |
||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The ``.values`` attribute of an :class:`IntervalIndex` now returns an | ||
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects | ||
(:issue:`19453`). | ||
|
||
Previous Behavior: | ||
|
||
.. code-block:: ipython | ||
|
||
In [1]: idx = pd.interval_range(0, 4) | ||
|
||
In [2]: idx.values | ||
Out[2]: | ||
array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'), | ||
Interval(2, 3, closed='right'), Interval(3, 4, closed='right')], | ||
dtype=object) | ||
|
||
New Behavior: | ||
|
||
.. ipython:: python | ||
|
||
idx = pd.interval_range(0, 4) | ||
idx.values | ||
|
||
This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``. | ||
|
||
For situations where you need an ``ndarray`` of Interval objects, use | ||
:func:`numpy.asarray` or ``idx.astype(object)``. | ||
|
||
.. ipython:: python | ||
|
||
idx.values.astype(object) | ||
|
||
.. _whatsnew_0230.api_breaking.deprecate_panel: | ||
|
||
Deprecate Panel | ||
|
@@ -1067,6 +1139,7 @@ Indexing | |
- Bug in ``Index`` subclasses constructors that ignore unexpected keyword arguments (:issue:`19348`) | ||
- Bug in :meth:`Index.difference` when taking difference of an ``Index`` with itself (:issue:`20040`) | ||
- Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` in presence of entire rows of NaNs in the middle of values (:issue:`20499`). | ||
- Bug in the ``IntervalIndex`` repr missing a trailing comma at the end of the "data" section (:issue:`20611`) | ||
- Bug in :class:`IntervalIndex` where some indexing operations were not supported for overlapping or non-monotonic ``uint64`` data (:issue:`20636`) | ||
|
||
MultiIndex | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -451,3 +451,11 @@ def is_platform_mac(): | |
|
||
def is_platform_32bit(): | ||
return struct.calcsize("P") * 8 < 64 | ||
|
||
|
||
class _WritableDoc(type): | ||
Review comment: should be in pandas.util._decorators
Review comment: It's not a decorator. Still want it there?
Review comment: actually we have a pandas.util._doctools, move there |
||
# Remove this when Python2 support is dropped | ||
# __doc__ is not mutable for new-style classes in Python2, which means | ||
# we can't use @Appender to share class docstrings. This can be used | ||
# with `add_metaclass` to make cls.__doc__ mutable. | ||
pass |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,10 @@ | ||
from .base import ExtensionArray # noqa | ||
from .categorical import Categorical # noqa | ||
from .base import ExtensionArray | ||
from .categorical import Categorical | ||
from .interval import IntervalArray | ||
|
||
|
||
__all__ = [ | ||
Review comment: you don’t need the all
Review comment: That's so the
Review comment: we don’t do this anywhere else - don’t need introduce new ways of doing things |
||
'Categorical', | ||
'ExtensionArray', | ||
'IntervalArray', | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
_ensure_int64, | ||
_ensure_object, | ||
_ensure_platform_int, | ||
is_extension_array_dtype, | ||
is_dtype_equal, | ||
is_datetimelike, | ||
is_datetime64_dtype, | ||
|
@@ -1218,6 +1219,11 @@ def __array__(self, dtype=None): | |
ret = take_1d(self.categories.values, self._codes) | ||
if dtype and not is_dtype_equal(dtype, self.categories.dtype): | ||
return np.asarray(ret, dtype) | ||
if is_extension_array_dtype(ret): | ||
Review comment: comment this.
Review comment: I have an update on this section already in intna |
||
# When we're a Categorical[ExtensionArray], like Interval, | ||
# we need to ensure __array__ gets all the way to an | ||
# ndarray. | ||
ret = np.asarray(ret) | ||
return ret | ||
|
||
def __setstate__(self, state): | ||
|
Review comment: also Periods ?
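The ``Categorical.__array__`` change above concerns categoricals whose categories are themselves extension-backed, such as intervals. The sketch below shows the intended result and assumes a pandas version that includes this behavior. ``pd.cut`` is used only as a convenient way to obtain such a categorical.

```python
import numpy as np
import pandas as pd

# pd.cut on a plain list returns a Categorical whose categories are intervals.
cat = pd.cut([0.5, 1.5, 2.5], bins=[0, 1, 2, 3])

# __array__ should go all the way to a plain ndarray of Interval objects.
arr = np.asarray(cat)

print(type(arr).__name__, arr.dtype)
```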