ENH: apply np.ufunc.accumulate along the columns/blocks (to preserve dtypes) #39275

jorisvandenbossche · 2021-01-19T17:59:42Z

Currently, an "accumulate" ufunc is applied to the full DataFrame at once, with the consequence that dtypes are not preserved if you have mixed numeric columns, e.g.:

In [4]: df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [0.1, 4.0, 3.0, 2.0]})

In [5]: df
Out[5]: 
   a    b
0  1  0.1
1  3  4.0
2  2  3.0
3  4  2.0

In [6]: np.maximum.accumulate(df)
Out[6]: 
     a    b
0  1.0  0.1
1  3.0  4.0
2  3.0  4.0
3  4.0  4.0

For the default case (corresponding to .accumulate(axis=0)), it is certainly possible to apply this ufunc to each column or block separately, which preserves the column dtypes. When axis=1 is passed to the ufunc, this is not possible.
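A minimal sketch of where the upcast can come from, assuming the mixed-dtype frame is first combined into a single 2D array (the variable names here are illustrative only). Accumulating the combined array yields floats everywhere, while accumulating the integer column on its own keeps an integer dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [0.1, 4.0, 3.0, 2.0]})

# Combining int64 and float64 columns into one 2D array forces a common dtype.
combined = df.to_numpy()
whole = np.maximum.accumulate(combined, axis=0)

# Accumulating a single column works on a 1D array that keeps its own dtype.
col_a = np.maximum.accumulate(df["a"].to_numpy())

print(combined.dtype, whole.dtype, col_a.dtype)
```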

See the linked PR discussion above for more details on what is involved in implementing this.

AnnaDaglis · 2021-01-27T19:58:17Z

Take

jorisvandenbossche · 2021-01-29T15:11:16Z

@AnnaDaglis Thanks for taking a look at this! If you need any pointers, let me know

AnnaDaglis · 2021-01-29T15:55:32Z

@jorisvandenbossche Yes, please, I would appreciate some pointers! I found point 2) in #39260 (comment), relating to axis, somewhat challenging. E.g. if we have a DatetimeTZBlock, we would not actually need to change to axis=1, as it throws an error. So we would need to go back to axis=0 there. For example, the following throws an error:

df = pd.DataFrame(pd.date_range("20210129", periods=4, tz="UTC"))
getattr(np.maximum, "accumulate")(df._mgr.blocks[0].values, axis=1)

Some changes in the code along these lines work fine on the toy examples I tried, but break a lot of tests.

df = pd.DataFrame(pd.date_range("20210129", periods=4, tz="UTC"))
getattr(np.maximum, "accumulate")(df._mgr.blocks[0].values, axis=0)

Would be great to have your thoughts/ideas/pointers! :)

jorisvandenbossche · 2021-01-29T16:17:30Z

> E.g. if we have a DatetimeTZBlock, we would not actually need to change to axis=1, as it throws an error. So we would need to go back to axis=0 there.

Yes, in general the ExtensionBlock (or subclasses like DatetimeTZBlock) is only 1D, and so for those the axis should not be changed, only for the blocks storing their data as 2D.
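To illustrate the axis difference with plain NumPy arrays standing in for block values (an assumption for illustration: a 2D block is laid out as (n_columns, n_rows), so accumulating down the rows of the frame means axis=1 on the block, whereas 1D values only have axis=0):

```python
import numpy as np

# Stand-in for 2D block values: one row per frame column, shape (n_columns, n_rows).
block_2d = np.array([[1, 3, 2, 4], [5, 0, 7, 6]])
down_the_rows = np.maximum.accumulate(block_2d, axis=1)

# Stand-in for 1D values (as held by an ExtensionBlock): only axis=0 exists.
values_1d = np.array([1, 3, 2, 4])
along_only_axis = np.maximum.accumulate(values_1d, axis=0)

# Asking for axis=1 on 1D data is out of bounds and raises.
try:
    np.maximum.accumulate(values_1d, axis=1)
    axis1_failed = False
except (ValueError, IndexError):
    axis1_failed = True

print(down_the_rows, along_only_axis, axis1_failed)
```

This is only an analogy: the exact error raised for a tz-aware datetime array may differ from the NumPy one shown here.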

Now, an alternative could also be to apply the ufunc column-wise instead of per block. Then we don't need to deal with this axis difference.
Dummy code would be something like:

result = [ufunc(arr, ...) for arr in df._iter_column_arrays()]
pd.DataFrame._from_arrays(result, df.columns, df.index, verify_integrity=False)
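A runnable variant of the column-wise idea, written as a sketch against the public API only. The helper name accumulate_columnwise is hypothetical, and it handles only the default axis=0 case on plain NumPy-backed columns; it is not the implementation proposed for pandas itself:

```python
import numpy as np
import pandas as pd


def accumulate_columnwise(ufunc, df):
    """Apply ufunc.accumulate to each column separately (hypothetical helper)."""
    # Key by position so that duplicate column labels are handled too.
    results = {
        i: ufunc.accumulate(df.iloc[:, i].to_numpy())
        for i in range(df.shape[1])
    }
    out = pd.DataFrame(results, index=df.index)
    out.columns = df.columns
    return out


df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [0.1, 4.0, 3.0, 2.0]})
out = accumulate_columnwise(np.maximum, df)
print(out.dtypes)
```

Because each column is accumulated on its own 1D array, no axis translation is needed and the integer column is never combined with the float column.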

AnnaDaglis · 2021-01-29T19:13:50Z

@jorisvandenbossche The alternative approach looks somewhat "cleaner" to me, thank you! Will try to implement it.

jorisvandenbossche added the Enhancement and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Jan 19, 2021

jorisvandenbossche added this to the Contributions Welcome milestone Jan 19, 2021

jorisvandenbossche mentioned this issue Jan 19, 2021

REGR: fix numpy accumulate ufuncs for DataFrame #39260

Merged

github-actions bot assigned AnnaDaglis Jan 27, 2021

jorisvandenbossche mentioned this issue Apr 12, 2021

REGR: ufunc with DataFrame input not passing all kwargs #40878

Merged

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
