-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clib.conversion._to_numpy: Add tests for pandas.Series with datetime dtypes #3670
Conversation
850ac31
to
a64e9e3
Compare
a64e9e3
to
5867999
Compare
068c5cb
to
56a266d
Compare
This reverts commit 56a266d.
6b7017b
to
fb14509
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok, just one suggestion to add a reminder to remove the pandas<2.1 workaround.
if Version(pd.__version__) < Version("2.1"): | ||
# In pandas 2.0, dtype.numpy_type is dtype("O"). | ||
numpy_dtype = np.dtype(f"M8[{dtype.pyarrow_dtype.unit}]") # type: ignore[assignment, attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO
here to remove this once we drop support for pandas 2.0? Should be after 2025-08-29 according to https://scientific-python.org/specs/spec-0000/#support-window
This PR adds tests for pandas.Series with datetime dtypes. Address #3600.
In pandas, datetime dtypes can be specified in following ways:
"datetime64[s]"
pd.DatetimeTZDtype(s, tz="UTC")
or"datetime64[s, UTC]"
pd.ArrowDtype(pyarrow.Timestamp(s, tz="UTC"))
or"timestamp[s, UTC][pyarrow]"
The following codes help us understand the default conversion behaviors:
Via NumPy dtypes. The conversions are done as expected.
Via pd.DateTimeTZDtype with TZ. The pandas.series object is converted to
object
dtype. So we need to deal with the conversion manually. The expected numpy dtype and TZ information can be accessed viaseries.dtype.base
andseries.dtype.tz
.In pandas 2.0, there was a bug (pandas-dev/pandas#52705) that pd.DateTimeTZDtype with any units are stored with dtype in
ns
resolution. The bug was fixed in pandas 2.1 (pandas-dev/pandas#52706), but there is no workaround on our side so the related tests are marked as xfail for pandas 2.0Via pa.Timestamp. The pandas.Series object is converted to
object
dtype. So, we need to deal with it manually. The expected numpy datetime type and TZ information can be accessed viaseries.dtype.numpy_dtype
andseries.dtype.pyarrow_dtype.tz
.In pandas 2.0,
series.dtype.numpy_dtype
isdtype('O')
, and it doesn't have theseries.dt.tz_convert
method.