REF: use _validate_fill_value in Index.insert #38102

Merged Dec 13, 2020 (16 commits)

jbrockmendel (Member)

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

jbrockmendel mentioned this pull request Nov 27, 2020
pandas/core/dtypes/cast.py (resolved)
pandas/core/indexes/base.py (resolved)
pandas/core/indexes/numeric.py (resolved)
pandas/tests/dtypes/cast/test_promote.py (outdated, resolved)
jreback added the labels Index (Related to the Index class or subclasses), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Refactor (Internal refactoring of code) on Nov 28, 2020
jreback (Contributor) commented Nov 28, 2020

I assume you are going to move some of this logic to arrays/_mixins (or is that the goal?).

jbrockmendel (Member, Author)

> I assume you are going to move some of this logic to arrays/_mixins (or is that the goal?).

For all of our existing ExtensionIndex subclasses, the logic is already there.

I haven't fully decided where I expect this logic to end up; I am still tracking down all the places where we do something similar and figuring out whether we can align their behavior.

Comment on lines +580 to +582
elif is_valid_nat_for_dtype(fill_value, dtype):
    # e.g. pd.NA, which is not accepted by Timestamp constructor
    fill_value = np.datetime64("NaT", "ns")
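A minimal sketch of what this branch does, written against the public pandas API rather than the internal helpers. `normalize_datetime_fill_value` is a hypothetical name, and its NA test (`is_scalar` plus `isna`, with timedelta64 values rejected first) only approximates `is_valid_nat_for_dtype`; it is meant to show the conversion target, not pandas's actual code path.

```python
import numpy as np
import pandas as pd


def normalize_datetime_fill_value(fill_value):
    """Hypothetical helper: coerce a scalar fill value for a datetime64 index.

    Approximates the branch above: every scalar NA except a timedelta64
    is mapped to np.datetime64("NaT", "ns"), so pd.NA never has to go
    through the Timestamp constructor.
    """
    if isinstance(fill_value, np.timedelta64):
        # np.timedelta64("NaT") is NA, but it is the wrong kind for datetimes
        raise TypeError(f"{fill_value!r} is not a valid datetime fill value")
    if pd.api.types.is_scalar(fill_value) and pd.isna(fill_value):
        # None, np.nan, pd.NaT, np.datetime64("NaT") and pd.NA all land here
        return np.datetime64("NaT", "ns")
    # non-NA values are parsed by the Timestamp constructor
    return pd.Timestamp(fill_value).to_datetime64()


idx = pd.DatetimeIndex(["2020-12-13"])
out = idx.insert(1, normalize_datetime_fill_value(pd.NA))
print(out)
```

Converting to a datetime64 NaT up front means the downstream insert only ever sees a value the datetime array already knows how to store.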
Member

But pd.NA should not be an allowed fill_value for datetime dtypes?

Member Author

I thought (half of) the whole idea of pd.NA was that it would be a valid NA value for every dtype.

(The other half being Kleene logic.)
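For reference, the Kleene part is visible with the scalar pd.NA in pandas's logical operators: the result is known whenever the missing operand could not change it, and stays NA otherwise. A small illustration:

```python
import pandas as pd

# Kleene (three-valued) logic with the pd.NA scalar
print(pd.NA | True)   # True: "unknown OR True" is True either way
print(pd.NA & False)  # False: "unknown AND False" is False either way
print(pd.NA | False)  # <NA>: the answer depends on the missing value
print(pd.NA & True)   # <NA>
```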

Contributor

I don't think we support pd.NA in datetimelikes at the moment (I agree we eventually should, but we should do it deliberately). However, doesn't is_valid_nat_for_dtype distinguish this explicitly?

jorisvandenbossche (Member) Nov 29, 2020

> I thought (half of) the whole idea of pd.NA was that it would be a valid NA value for every type.

In the future, yes, for sure. But at the moment we don't use pd.NA for datetimelikes, and we also don't properly support it in operations with datetimelikes.

So I am wondering whether we should allow it here.

(That said, it seems it already works on master as well.)

Member Author

> However, doesn't is_valid_nat_for_dtype distinguish this explicitly?

is_valid_nat_for_dtype considers pd.NA valid for all dtypes, because that was my understanding of the intent of pd.NA.
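A rough, hypothetical re-implementation of the rule described here, to make it concrete: the scalar must be NA, and the only things rejected are NaT values of the wrong kind. The name `nat_ok_for_dtype` and the exact per-kind checks are assumptions for illustration; the real helper lives in pandas internals and may differ between versions.

```python
import numpy as np
import pandas as pd


def nat_ok_for_dtype(obj, dtype) -> bool:
    """Hypothetical sketch: can the NA scalar `obj` be stored in `dtype`?"""
    dtype = np.dtype(dtype)
    if not pd.api.types.is_scalar(obj) or not pd.isna(obj):
        return False  # only scalar missing values qualify
    if dtype.kind == "M":
        # datetime64: anything NA except a timedelta64 NaT (pd.NA included)
        return not isinstance(obj, np.timedelta64)
    if dtype.kind == "m":
        # timedelta64: anything NA except a datetime64 NaT (pd.NA included)
        return not isinstance(obj, np.datetime64)
    # numeric and other dtypes: reject every datetime-like NaT
    return obj is not pd.NaT and not isinstance(obj, (np.datetime64, np.timedelta64))


for dtype in ["datetime64[ns]", "timedelta64[ns]", "float64"]:
    print(dtype, nat_ok_for_dtype(pd.NA, dtype))  # True for all three
```

Under this rule pd.NA passes for every dtype, which is exactly the behavior being questioned for datetimelikes above.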

jreback mentioned this pull request Dec 2, 2020
    else:
        # NaT, np.datetime64("NaT"), np.timedelta64("NaT")
        raise TypeError

elif is_scalar(value):
Contributor

It might make sense to override this separately in the Float and Int index subclasses to avoid some of this complication.
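One way to read this suggestion, sketched with hypothetical plain-Python classes rather than the real pandas index hierarchy: keep the shared checks (datetime-like NaT is always rejected) in a base class, and let the float and int variants decide only how NA and non-integral numbers are handled. The class and method names are illustrative assumptions.

```python
import numpy as np
import pandas as pd


class _NumericFillBase:
    """Hypothetical base: checks shared by all numeric index types."""

    def validate_fill_value(self, value):
        # NaT, np.datetime64("NaT"), np.timedelta64("NaT"): never numeric
        if value is pd.NaT or isinstance(value, (np.datetime64, np.timedelta64)):
            raise TypeError(f"{value!r} cannot be inserted into a numeric index")
        if pd.api.types.is_scalar(value) and pd.isna(value):
            return self._fill_na(value)
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(
            value, (int, float, np.integer, np.floating)
        ):
            raise TypeError(f"{value!r} is not a real number")
        return self._fill_number(value)


class IntFill(_NumericFillBase):
    def _fill_na(self, value):
        # an int64 index has no missing-value sentinel
        raise TypeError(f"int64 cannot hold {value!r}")

    def _fill_number(self, value):
        if isinstance(value, (float, np.floating)) and not float(value).is_integer():
            raise TypeError(f"{value!r} would lose precision as int64")
        return np.int64(value)


class FloatFill(_NumericFillBase):
    def _fill_na(self, value):
        # every scalar NA (None, pd.NA, NaN) becomes NaN
        return np.nan

    def _fill_number(self, value):
        return np.float64(value)
```

With this split, the NaT rejection is written once, and the float/int difference shrinks to two small methods per class.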

Contributor

Can you do this as a follow-on? Adding all of this logic here (in the index itself) makes duplication in other places more likely. This is almost like maybe_promote.
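For contrast with the validation approach, a hypothetical and much-simplified promote step: instead of raising when the fill value does not fit, it returns a wider dtype together with the fill value to use. `promote_for_fill` is an illustrative name, it only handles int and float dtypes, and it ignores integer overflow; it is not pandas's `maybe_promote`.

```python
import numpy as np
import pandas as pd


def _is_numeric_na(value):
    # None, pd.NA and float NaN; datetime-like NaT is deliberately excluded
    return (
        value is None
        or value is pd.NA
        or (isinstance(value, (float, np.floating)) and np.isnan(value))
    )


def promote_for_fill(dtype, fill_value):
    """Hypothetical promote step: return (dtype_to_use, fill_value_to_use)."""
    dtype = np.dtype(dtype)
    is_int = isinstance(fill_value, (int, np.integer)) and not isinstance(fill_value, bool)
    is_float = isinstance(fill_value, (float, np.floating))
    if dtype.kind in "iu":
        if _is_numeric_na(fill_value):
            # integers have no NaN, so widen to float64 instead of raising
            return np.dtype(np.float64), np.nan
        if is_int:
            return dtype, fill_value  # overflow is not checked in this sketch
        if is_float:
            return np.dtype(np.float64), float(fill_value)
    elif dtype.kind == "f":
        if _is_numeric_na(fill_value):
            return dtype, np.nan
        if is_int or is_float:
            return dtype, float(fill_value)
    # anything else (NaT, strings, bools, ...) only fits in object
    return np.dtype(object), fill_value


print(promote_for_fill("int64", np.nan))
```

Where a validator for an int64 index would raise TypeError on NaN, this version answers "use float64 instead", which is why the two pieces of logic overlap so much.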

jreback (Contributor) commented Dec 2, 2020

The PR itself looks OK to me. @jorisvandenbossche @TomAugspurger, any thoughts on the pd.NA comments above (e.g. the NA check will actually work for datetimelikes)?

jbrockmendel (Member, Author)

Any more thoughts here? I'm trying to unify all of our casting code.

jreback (Contributor) left a comment

I am OK merging as-is if the intent is to consolidate this fill logic elsewhere.

jreback added this to the 1.3 milestone Dec 13, 2020
jbrockmendel (Member, Author)

> if the intent is to consolidate this fill logic elsewhere

Very much so.

jreback merged commit 67305b2 into pandas-dev:master on Dec 13, 2020
jbrockmendel deleted the ref-insert-2 branch on December 13, 2020 18:26
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021