ENH: Feature request, date type #32473
Comments
Period[D] behaves a lot like date. Could that be adapted to solve the parquet issue? |
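To illustrate the suggestion above, here is a minimal sketch of how a daily-frequency period column behaves much like a date column. The variable names are made up for illustration, and this is not a claim about how a real date dtype would be implemented.

```python
import pandas as pd

# A daily-frequency PeriodIndex: each element stands for a whole calendar day,
# with no time-of-day component.
days = pd.period_range("2020-03-01", periods=3, freq="D")
s = pd.Series(days)

print(s.dtype)            # period[D]
print(s.dt.day.tolist())  # day-of-month for each element

# Internally each period is stored as an integer ordinal (days since 1970-01-01).
first_ordinal = days[0].ordinal
print(first_ordinal)
```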
That seems kind of like a workaround. Is there a real reason that pandas doesn’t have a date type? It’s a pretty essential and standard type for most data platforms and frameworks. |
@zbrookle you are welcome to spend the time to develop one. Normalized Timestamps are pretty easy to understand and easily act like a date type. |
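For readers unfamiliar with "normalized Timestamps", a short sketch of what that looks like in practice: the time-of-day is reset to midnight, but the column keeps a datetime64 dtype rather than becoming a distinct date type.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2020-03-01 13:45:00", "2020-03-02 08:00:00"]))

# normalize() sets every timestamp to midnight of the same day.
dates = ts.dt.normalize()

print(dates.dtype)  # still a datetime64 dtype, not a date dtype
print(dates.iloc[0])
```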
@jreback Okay awesome I’d definitely love to work on it |
Our current setup would actually make it pretty reasonable to support a dtype and extension array of type datetime64[D] (and maybe other freqs); it may also have to be named slightly differently. It could be backed by a nullable integer array that represents ordinals from the epoch. I don’t think it’s that hard, actually. |
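A rough sketch of the storage idea described above, assuming ordinals count days since 1970-01-01. It only shows the backing data and a conversion to `datetime.date`, and is not an actual `ExtensionArray` implementation; the names `EPOCH`, `ordinals`, and `dates` are hypothetical.

```python
import datetime as dt

import pandas as pd

EPOCH = dt.date(1970, 1, 1)  # assumed origin for the ordinals

# Nullable integer array: missing dates are stored as pd.NA.
ordinals = pd.array([0, 18322, None], dtype="Int64")

# Convert each ordinal back to a calendar date, keeping missing values as None.
dates = [
    None if pd.isna(o) else EPOCH + dt.timedelta(days=int(o))
    for o in ordinals
]
print(dates)
```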
As reference, Apache Arrow has two different For pandas, having two different types feels a bit overkill though. And compatibility with numpy's |
@jorisvandenbossche I think the compatibility with NumPy is probably the most important since that's what the backend of pandas is. It wouldn't be too hard to have logic that converts to the appropriate pyarrow format when a dataframe writes to parquet. I think there would just have to be an error raised if they tried to write out a date that is beyond 32 bits, which I don't think will happen very frequently. |
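The range check mentioned above could look roughly like this. `days_to_int32` is a hypothetical helper written for illustration, not an existing pandas or pyarrow function, and it assumes dates are stored as int64 day counts since the epoch.

```python
import numpy as np

INT32 = np.iinfo(np.int32)

def days_to_int32(days):
    """Hypothetical helper: narrow int64 days-since-epoch to int32.

    Raises OverflowError if any value does not fit in 32 bits.
    """
    days = np.asarray(days, dtype=np.int64)
    if days.size and (days.min() < INT32.min or days.max() > INT32.max):
        raise OverflowError("date out of range for a 32-bit day count")
    return days.astype(np.int32)

# datetime64[D] values are day counts, so they can be viewed as int64 directly.
day_counts = np.array(["2020-03-01"], dtype="datetime64[D]").view("int64")
narrowed = days_to_int32(day_counts)
print(narrowed.dtype)
```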
Compatibility with numpy is important, but I am not fully sure the numpy being the backend is important in this case. I think many of the datetime-specific functionalities are implemented in pandas, and are not coming from numpy (although I am not very much up to date with the exact details here) |
@jorisvandenbossche I'm actually very confident that the backend for at least the datetime64 object in pandas is NumPy (because a lot of my implementation of the date was based off the datetime type) and so I think the biggest priority in terms of conversion is that these two types at the very least work well and efficiently together |
Yes, we use the numpy datetime64 dtype, but that doesn't mean we use much of numpy to do datetime-related things with it (in the end, it's just an int64 array with an annotation). E.g. the code to get "fields" from a date (like the month, or the day of the month, etc.): https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/fields.pyx |
This is correct, largely because the relevant C functions in numpy are not exposed. If they become exposed (xref numpy/numpy#16364) I expect we'll start using them (once our minimum np version catches up) instead of having our own mostly-copy/pasted versions. |
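The "int64 array with an annotation" point can be seen directly in NumPy. The sketch below views the same dates as raw integers under two different units; it uses only NumPy and says nothing about pandas internals.

```python
import numpy as np

arr = np.array(["2020-03-01", "2020-03-02"], dtype="datetime64[D]")

# With unit D, the underlying integers count days since 1970-01-01.
as_days = arr.view("int64")
print(as_days)

# With unit ns, the same dates are stored as nanoseconds since the epoch.
as_ns = arr.astype("datetime64[ns]").view("int64")
print(as_ns)
```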
Those functions all assume the int64s being passed represent nanosecond unit timestamps, which wouldn't apply here. What would apply is wrapping Period[D] |
I’m aware that the actual date functions and abstractions are not through NumPy, but the backing array containing the data is, and that's what will impact conversion between the different data types within pandas using the .astype method. I think the only real decision here is whether to have the DateDtype backed by an int32 array or an int64 array.
|
Copying my comment from here: googleapis/python-db-dtypes-pandas#30 [In the db-dtypes package], we did try |
We now have pyarrow-backed date dtypes. I'm curious whether that handles this use case. |
I think this is handled by the |
There currently isn’t any native date dtype in pandas, which makes it impossible to integrate this type into file formats like parquet, where the schema is defined.