REF: use _validate_fill_value in Index.insert #38102

Merged · 16 commits · Dec 13, 2020
Changes from 11 commits
Commits (16)
214820d
BUG: NumericIndex.insert(0, False) casting to int
jbrockmendel Nov 24, 2020
631cf65
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 25, 2020
eb8bc34
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 25, 2020
6d03c75
CLN: remove ABCIndex (#38055)
jbrockmendel Nov 26, 2020
47fea62
ENH: Implement cross method for Merge Operations (#37864)
phofl Nov 26, 2020
d05a12c
BUG: __getitem__ raise blank KeyError for IntervalIndex and missing k…
phofl Nov 26, 2020
fd5cb3f
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 26, 2020
c28fb2b
REF: use validate_fill_value idiom in Index.insert
jbrockmendel Nov 27, 2020
19086ea
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 28, 2020
6e2c0aa
use fixture, docstring
jbrockmendel Nov 28, 2020
c4aeb97
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 29, 2020
7382321
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 29, 2020
de857e5
flesh out docstring
jbrockmendel Nov 29, 2020
f121087
raises section for maybe_promote docstring
jbrockmendel Nov 29, 2020
d222c32
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 30, 2020
97ada24
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 13, 2020
13 changes: 10 additions & 3 deletions pandas/core/dtypes/cast.py
@@ -20,7 +20,7 @@

import numpy as np

from pandas._libs import lib, tslib, tslibs
from pandas._libs import lib, missing as libmissing, tslib, tslibs
from pandas._libs.tslibs import (
NaT,
OutOfBoundsDatetime,
@@ -577,6 +577,9 @@ def maybe_promote(dtype, fill_value=np.nan):
dtype = np.dtype(np.object_)
elif is_integer(fill_value) or (is_float(fill_value) and not isna(fill_value)):
dtype = np.dtype(np.object_)
elif is_valid_nat_for_dtype(fill_value, dtype):
# e.g. pd.NA, which is not accepted by Timestamp constructor
fill_value = np.datetime64("NaT", "ns")
Comment on lines +558 to +560
Member:
But pd.NA should not be an allowed fill_value for datetime dtypes?

Member Author:
i thought (half of) the whole idea of pd.NA was that it would be a valid NA value for every dtype.

(the other half being kleene logic)

Contributor:
I don't think we support pd.NA atm in datetimelikes (agree eventually we should but we should do this deliberately). However doesn't is_valid_nat_for_dtype distinguish this explicitly?

Member (@jorisvandenbossche), Nov 29, 2020:
> i thought (half of) the whole idea of pd.NA was that it would be a valid NA value for every type.

In the future yes, for sure. But at the moment, we don't use pd.NA for datetimelikes, and also don't properly support it in operations with datetimelikes.

So therefore I am wondering if we should allow it here.

(now, it seems that it already does work on master as well, though)

Member Author:
> However doesn't is_valid_nat_for_dtype distinguish this explicitly?

is_valid_nat_for_dtype considers pd.NA valid for all dtypes, bc that was my understanding of the intent of pd.NA.
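
A quick check of the helper under discussion, as it behaves on this branch (is_valid_nat_for_dtype lives in pandas.core.dtypes.missing in this era); this is a hand-run sketch, not output from the PR's test suite:

import numpy as np
import pandas as pd
from pandas.core.dtypes.missing import is_valid_nat_for_dtype

# pd.NA is accepted as a valid NA for every dtype, which is why the new
# maybe_promote branches above map it to the dtype-appropriate NaT
is_valid_nat_for_dtype(pd.NA, np.dtype("datetime64[ns]"))                   # True
# mismatched NaT-likes are still rejected
is_valid_nat_for_dtype(np.timedelta64("NaT"), np.dtype("datetime64[ns]"))   # False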

else:
try:
fill_value = Timestamp(fill_value).to_datetime64()
@@ -590,6 +593,9 @@ def maybe_promote(dtype, fill_value=np.nan):
):
# TODO: What about str that can be a timedelta?
dtype = np.dtype(np.object_)
elif is_valid_nat_for_dtype(fill_value, dtype):
# e.g pd.NA, which is not accepted by the Timedelta constructor
fill_value = np.timedelta64("NaT", "ns")
else:
try:
fv = Timedelta(fill_value)
@@ -663,7 +669,7 @@ def maybe_promote(dtype, fill_value=np.nan):
# e.g. mst is np.complex128 and dtype is np.complex64
dtype = mst

elif fill_value is None:
elif fill_value is None or fill_value is libmissing.NA:
if is_float_dtype(dtype) or is_complex_dtype(dtype):
fill_value = np.nan
elif is_integer_dtype(dtype):
@@ -673,7 +679,8 @@ def maybe_promote(dtype, fill_value=np.nan):
fill_value = dtype.type("NaT", "ns")
else:
dtype = np.dtype(np.object_)
fill_value = np.nan
if fill_value is not libmissing.NA:
fill_value = np.nan
else:
dtype = np.dtype(np.object_)

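A rough sketch of what the new branches mean for maybe_promote with pd.NA as the fill value, hand-traced against this diff (not exhaustive):

import numpy as np
import pandas as pd
from pandas.core.dtypes.cast import maybe_promote

# datetime64/timedelta64 dtypes: pd.NA is a valid NA and is mapped to NaT
maybe_promote(np.dtype("datetime64[ns]"), pd.NA)   # -> (datetime64[ns], np.datetime64("NaT", "ns"))

# float and integer dtypes: pd.NA is handled by the widened
# `fill_value is None or fill_value is libmissing.NA` branch
maybe_promote(np.dtype("float64"), pd.NA)          # -> (float64, np.nan)
maybe_promote(np.dtype("int64"), pd.NA)            # -> (float64, np.nan)

# dtypes that fall through to object (e.g. bool) now keep pd.NA itself
# instead of replacing it with np.nan
maybe_promote(np.dtype("bool"), pd.NA)             # -> (object, pd.NA)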
39 changes: 18 additions & 21 deletions pandas/core/indexes/base.py
@@ -34,6 +34,7 @@

from pandas.core.dtypes.cast import (
maybe_cast_to_integer_array,
maybe_promote,
validate_numeric_casting,
)
from pandas.core.dtypes.common import (
@@ -4165,28 +4166,10 @@ def _string_data_error(cls, data):
"to explicitly cast to a numeric type"
)

@final
def _coerce_scalar_to_index(self, item):
"""
We need to coerce a scalar to a compat for our index type.

Parameters
----------
item : scalar item to coerce
"""
dtype = self.dtype

if self._is_numeric_dtype and isna(item):
# We can't coerce to the numeric dtype of "self" (unless
# it's float) if there are NaN values in our output.
dtype = None

return Index([item], dtype=dtype, **self._get_attributes_dict())

def _validate_fill_value(self, value):
"""
Check if the value can be inserted into our array, and convert
it to an appropriate native type if necessary.
Check if the value can be inserted into our array without casting,
and convert it to an appropriate native type if necessary.
"""
return value

@@ -5487,8 +5470,22 @@ def insert(self, loc: int, item):
"""
# Note: this method is overridden by all ExtensionIndex subclasses,
# so self is never backed by an EA.

try:
item = self._validate_fill_value(item)
except TypeError:
if is_scalar(item):
dtype, item = maybe_promote(self.dtype, item)
else:
# maybe_promote would raise ValueError
dtype = np.dtype(object)

return self.astype(dtype).insert(loc, item)

arr = np.asarray(self)
item = self._coerce_scalar_to_index(item)._values

# Use Index constructor to ensure we get tuples cast correctly.
item = Index([item], dtype=self.dtype)._values
idx = np.concatenate((arr[:loc], item, arr[loc:]))
return Index(idx, name=self.name)

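The net effect on Index.insert for a non-EA-backed index, sketched by hand against this branch (Int64Index/Float64Index reprs as in pandas 1.x):

import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3], dtype="int64")

# value passes _validate_fill_value: dtype is preserved
idx.insert(1, 5)        # Int64Index([1, 5, 2, 3], dtype='int64')

# _validate_fill_value raises TypeError, maybe_promote chooses the target
# dtype, and insert is retried on the cast index
idx.insert(1, np.nan)   # Float64Index([1.0, nan, 2.0, 3.0], dtype='float64')

# the bug from the first commit: False is no longer silently cast to 0
idx.insert(0, False)    # Index([False, 1, 2, 3], dtype='object')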
32 changes: 20 additions & 12 deletions pandas/core/indexes/numeric.py
@@ -16,6 +16,7 @@
is_float,
is_float_dtype,
is_integer_dtype,
is_number,
is_numeric_dtype,
is_scalar,
is_signed_integer_dtype,
@@ -114,21 +115,37 @@ def _shallow_copy(self, values=None, name: Label = lib.no_default):

def _validate_fill_value(self, value):
"""
Convert value to be insertable to ndarray.
Check if the value can be inserted into our array without casting,
and convert it to an appropriate native type if necessary.
"""
if is_bool(value) or is_bool_dtype(value):
# force conversion to object
# so we don't lose the bools
raise TypeError
elif isinstance(value, str) or lib.is_complex(value):
raise TypeError
elif is_scalar(value) and isna(value):
if is_valid_nat_for_dtype(value, self.dtype):
value = self._na_value
if self.dtype.kind != "f":
# raise so that caller can cast
raise TypeError
else:
# NaT, np.datetime64("NaT"), np.timedelta64("NaT")
raise TypeError

elif is_scalar(value):
Contributor:
might make sense to override this in Float vs Int index to avoid some of this complication

Contributor:
can you do this as a follow-on. adding all of this logic here (in the index itself) makes for duplication in other places more likely. this is almost like maybe_promote.

if not is_number(value):
# e.g. datetime64, timedelta64, datetime, ...
raise TypeError

elif lib.is_complex(value):
# at least until we have a ComplexIndx
raise TypeError

elif is_float(value) and self.dtype.kind != "f":
if not value.is_integer():
raise TypeError
value = int(value)

return value

def _convert_tolerance(self, tolerance, target):
@@ -164,15 +181,6 @@ def _is_all_dates(self) -> bool:
"""
return False

@doc(Index.insert)
def insert(self, loc: int, item):
try:
item = self._validate_fill_value(item)
except TypeError:
return self.astype(object).insert(loc, item)

return super().insert(loc, item)

def _union(self, other, sort):
# Right now, we treat union(int, float) a bit special.
# See https://github.com/pandas-dev/pandas/issues/26778 for discussion
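In isolation, the new NumericIndex._validate_fill_value behaves roughly as follows on an int64 index (a hand-run sketch; the method is private and shown here only to illustrate which values raise and hand control back to Index.insert):

import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3], dtype="int64")

idx._validate_fill_value(4.0)    # returns 4: integer-valued float, dtype can be kept
idx._validate_fill_value(4.5)    # raises TypeError -> caller promotes to float64
idx._validate_fill_value(True)   # raises TypeError -> caller promotes to object, keeping the bool
idx._validate_fill_value(np.datetime64("2020-01-01"))   # raises TypeError (not a number)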
11 changes: 8 additions & 3 deletions pandas/tests/dtypes/cast/test_promote.py
@@ -110,6 +110,8 @@ def _assert_match(result_fill_value, expected_fill_value):
assert res_type == ex_type or res_type.__name__ == ex_type.__name__

match_value = result_fill_value == expected_fill_value
if match_value is pd.NA:
match_value = False

# Note: type check above ensures that we have the _same_ NA value
# for missing values, None == None (which is checked
@@ -569,8 +571,8 @@ def test_maybe_promote_any_with_object(any_numpy_dtype_reduced, object_dtype):
_check_promote(dtype, fill_value, expected_dtype, exp_val_for_scalar)


@pytest.mark.parametrize("fill_value", [None, np.nan, NaT])
def test_maybe_promote_any_numpy_dtype_with_na(any_numpy_dtype_reduced, fill_value):
def test_maybe_promote_any_numpy_dtype_with_na(any_numpy_dtype_reduced, nulls_fixture):
fill_value = nulls_fixture
dtype = np.dtype(any_numpy_dtype_reduced)

if is_integer_dtype(dtype) and fill_value is not NaT:
@@ -597,7 +599,10 @@ def test_maybe_promote_any_numpy_dtype_with_na(any_numpy_dtype_reduced, fill_val
else:
# all other cases cast to object, and use np.nan as missing value
expected_dtype = np.dtype(object)
exp_val_for_scalar = np.nan
if fill_value is pd.NA:
exp_val_for_scalar = pd.NA
else:
exp_val_for_scalar = np.nan

_check_promote(dtype, fill_value, expected_dtype, exp_val_for_scalar)
