ENH: apply np.ufunc.accumulate along the columns/blocks (to preserve dtypes) #39275
Comments
Take |
@AnnaDaglis Thanks for taking a look at this! If you need any pointers, let me know |
@jorisvandenbossche Yes, please, would appreciate some pointers! I found the 2) point in #39260 (comment) relating to
Some changes in the code along these lines work fine on the toy examples I tried, but break a lot of tests.
Would be great to have your thoughts/ideas/pointers! :) |
Yes, in general the ExtensionBlock (or subclasses like DatetimeTZBlock) is only 1D, so for those the axis should not be changed; it should only be changed for the blocks that store their data as 2D. An alternative could be to apply the ufunc column-wise instead of per block. Then we don't need to deal with this axis difference. |
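To make the axis difference concrete, here is a minimal NumPy-only sketch. It assumes that a 2D block stores its values with shape (n_columns, n_rows), i.e. transposed relative to the DataFrame. The array names are hypothetical and this is not the actual pandas internals code.

```python
import numpy as np

# Assumed layout: a 2D block holding 2 columns x 3 rows, i.e. the transpose
# of how the columns appear in the DataFrame.
block_values = np.array([[1, 2, 3], [10, 20, 30]])

# Accumulating down the rows of the frame (frame axis=0) then means
# accumulating along axis=1 of the 2D block values.
acc_2d = np.add.accumulate(block_values, axis=1)

# A 1D block (a single column) has only one axis, so no axis flip applies.
column_values = np.array([1, 2, 3])
acc_1d = np.add.accumulate(column_values, axis=0)

print(acc_2d)
print(acc_1d)
```

Applying the ufunc column by column always takes the 1D path, which is why it sidesteps the axis flip.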
@jorisvandenbossche The alternative approach looks somewhat "cleaner" to me, thank you! Will try to implement it. |
Follow-up on #39260 (comment)
Currently, an "accumulate" ufunc is applied on the full DataFrame at once, with the consequence that it doesn't preserve dtypes if you have mixed numeric columns, eg:
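A minimal sketch of the effect, working on the underlying NumPy arrays so that it does not depend on the pandas version. The frame and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype="int64"),
        "b": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
    }
)

# Converting the whole frame to one 2D array forces a common dtype (float64),
# so accumulating on that array cannot keep column "a" as integers.
values = df.to_numpy()
whole = np.add.accumulate(values, axis=0)

# Accumulating a single column's own array keeps its integer dtype.
col_a = np.add.accumulate(df["a"].to_numpy())

print(whole.dtype)  # float64
print(col_a.dtype)  # int64
```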
It is certainly possible for the default case (corresponding to `.accumulate(axis=0)`) to apply this ufunc on each column or block, to preserve the column dtypes. When `axis=1` is passed to the ufunc, this is not possible.
See the linked PR discussion above for more details on what is involved in implementing this.
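The column-wise idea for the default `axis=0` case could be sketched roughly as follows. `accumulate_columnwise` is a hypothetical helper written for illustration, not part of pandas and not the implementation from the linked PR. It assumes unique column names and NumPy-backed numeric columns.

```python
import numpy as np
import pandas as pd


def accumulate_columnwise(ufunc, df):
    """Hypothetical helper: apply ufunc.accumulate to each column separately."""
    # Each column is accumulated on its own 1D array, so its dtype is kept.
    result = {name: ufunc.accumulate(df[name].to_numpy()) for name in df.columns}
    return pd.DataFrame(result, index=df.index)


df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype="int64"),
        "b": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
    }
)

out = accumulate_columnwise(np.add, df)
print(out.dtypes)

# With axis=1 the accumulation runs across columns, so the values from
# different columns must first be brought to one common dtype.
across = np.add.accumulate(df.to_numpy(), axis=1)
print(across.dtype)
```

With `axis=1`, each result mixes values from several columns, so there is no per-column dtype left to preserve.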