REF: Block._astype defer to astype_nansafe in more cases #38562

Merged Dec 21, 2020 (9 commits)

jbrockmendel
Member

Makes astype_nansafe for (td64|dt64) -> (object|str|string) match DTA/TDA/Series behavior.
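As a rough illustration of the consistency this aims for, the sketch below casts a Series and its backing array to object and compares the results. It only uses public API (`Series.astype`, `Series.array`), not the internal `astype_nansafe`, and it assumes "DTA/TDA" refers to DatetimeArray/TimedeltaArray. The helper name `compare_object_casts` is hypothetical.

```python
import pandas as pd


def compare_object_casts(values):
    """Cast a Series and its backing extension array to object dtype.

    Hypothetical helper for illustration; it only touches public pandas API.
    """
    ser = pd.Series(values)
    arr = ser.array  # DatetimeArray or TimedeltaArray for dt64/td64 data
    from_series = list(ser.astype(object))
    from_array = list(arr.astype(object))
    return from_series, from_array


dt_series, dt_array = compare_object_casts(pd.date_range("2021", periods=3))
td_series, td_array = compare_object_casts(pd.timedelta_range("1 Day", periods=3))

# Both paths are expected to box values as pandas scalars.
print(dt_series == dt_array, type(dt_series[0]).__name__)
print(td_series == td_array, type(td_series[0]).__name__)
```

The string casts are left out of this sketch on purpose, since their exact output is what the later comments in this thread disagree about.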

In the medium term (weeks), the goal is to get rid of Block._astype altogether and just use astype_nansafe, which among other things will be helpful for ArrayManager.

This changes Series[dt64].astype("string") behavior in a way that causes a new xfail in test_astype_roundtrip, but as discussed in #36153 that test is already wrong for other reasons.

This also has a side-effect of changing Series(dt64, dtype="Sparse[object]") behavior, discussed in #38508 as possibly not-desirable.

@jreback
Contributor

jreback commented Dec 19, 2020

This also has a side-effect of changing Series(dt64, dtype="Sparse[object]") behavior, discussed in #38508 as possibly not-desirable.

where is this test case changed?

@jreback added the "Dtype Conversions" (Unexpected or buggy dtype conversions) and "Refactor" (Internal refactoring of code) labels Dec 19, 2020
@jreback added this to the 1.3 milestone Dec 19, 2020
@jbrockmendel
Member Author

where is this test case changed?

We don't have a test for this; #38508 introduced new ones.

@jreback merged commit 9a46a4b into pandas-dev:master Dec 21, 2020
@jbrockmendel deleted the ref-blk-astype-3 branch December 21, 2020 17:29
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
@simonjayhawkins
Member

This changes Series[dt64].astype("string") behavior in a way that causes a new xfail in test_astype_roundtrip, but as discussed in #36153 that test is already wrong for other reasons.

I'm not sure about the new behaviour. I think this should at least have a release note, if not be reverted.

old behaviour

>>> pd.__version__
'1.2.4'
>>> 
>>> tdi = pd.timedelta_range("1 Day", periods=3)
>>> ser = pd.Series(tdi)
>>> ser
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]
>>> ser.astype("string")
0    1 days
1    2 days
2    3 days
dtype: string
>>> 
>>> dti = pd.date_range("2021", periods=3)
>>> ser = pd.Series(dti)
>>> ser
0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]
>>> ser.astype("string")
0    2021-01-01
1    2021-01-02
2    2021-01-03
dtype: string

new behaviour

>>> pd.__version__
'1.3.0.dev0+1567.g67c9385787'
>>> 
>>> tdi = pd.timedelta_range("1 Day", periods=3)
>>> ser = pd.Series(tdi)
>>> ser
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]
>>> ser.astype("string")
0     86400000000000 nanoseconds
1    172800000000000 nanoseconds
2    259200000000000 nanoseconds
dtype: string
>>> 
>>> dti = pd.date_range("2021", periods=3)
>>> ser = pd.Series(dti)
>>> ser
0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]
>>> ser.astype("string")
0    2021-01-01T00:00:00.000000000
1    2021-01-02T00:00:00.000000000
2    2021-01-03T00:00:00.000000000
dtype: string
>>> 
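The new strings look like NumPy's own scalar formatting rather than pandas' formatting. That reading is an assumption; the thread does not state the cause. A minimal check using only NumPy, with values taken from the first row of each example above:

```python
import numpy as np

# One day and 2021-01-01, both at nanosecond resolution, matching the
# first element of each Series in the transcripts above.
one_day_ns = np.timedelta64(86_400_000_000_000, "ns")
new_year = np.datetime64("2021-01-01", "ns")

td_text = str(one_day_ns)
dt_text = str(new_year)

print(td_text)  # 86400000000000 nanoseconds
print(dt_text)  # 2021-01-01T00:00:00.000000000
```

These match the first rows of the "new behaviour" output, which would be consistent with the values being stringified as raw NumPy scalars instead of going through pandas' datetime-like formatting.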

@jreback
Contributor

jreback commented May 10, 2021

@simonjayhawkins can you open a new issue? This is going to be very hard to revert, but I agree the original behavior is correct, so we should fix it.

@simonjayhawkins
Member

#41409
