[ENH] Add reverse parameter to cumsum(), cumprod(), cummin() and `cummax()` (#3381)

Add a `reverse` parameter to control the direction of the cumulative functions' calculations:
- when `False`, calculation is done from top to bottom (default);
- when `True`, calculation is done from bottom to top.

Closes #3279
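
As a rough illustration of the two directions (plain Python, not datatable's implementation; the helper name `cumulative` is hypothetical), a bottom-to-top cumulative sum gives the same result as reversing the input, accumulating, and reversing the output:

```python
from itertools import accumulate
import operator

def cumulative(values, op=operator.add, reverse=False):
    """Cumulative `op` over `values`; hypothetical helper, not the datatable API."""
    if reverse:
        # Bottom to top: accumulate over the reversed input, then flip back.
        return list(accumulate(reversed(values), op))[::-1]
    # Top to bottom (the default direction).
    return list(accumulate(values, op))

print(cumulative([1, 2, 3, 4]))                # [1, 3, 6, 10]
print(cumulative([1, 2, 3, 4], reverse=True))  # [10, 9, 7, 4]
```

The same helper covers products, minimums and maximums by passing a different `op`.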
samukweku authored Nov 20, 2022
1 parent 5576dea commit 58507fd
Showing 13 changed files with 298 additions and 74 deletions.
26 changes: 22 additions & 4 deletions docs/api/dt/cummax.rst
@@ -3,18 +3,22 @@
:src: src/core/expr/fexpr_cumminmax.cc pyfn_cummax
:tests: tests/dt/test-cumminmax.py
:cvar: doc_dt_cummax
:signature: cummax(cols)
:signature: cummax(cols, reverse=False)

.. x-version-added:: 1.1.0

For each column from `cols` calculate cumulative maximum. In the presence of :func:`by()`,
the cumulative maximum is computed within each group.
For each column from `cols` calculate cumulative maximum. In the presence
of :func:`by()`, the cumulative maximum is computed within each group.

Parameters
----------
cols: FExpr
Input data for cumulative maximum calculation.

reverse: bool
If ``False``, computation is done from top to bottom.
If ``True``, it is done from bottom to top.

return: FExpr
f-expression that converts input columns into the columns filled
with the respective cumulative maximums.
@@ -57,6 +61,20 @@
3 | 5
4 | 5
[5 rows x 1 column]

Calculate the cumulative maximum from bottom to top::

>>> DT[:, dt.cummax(f.A, reverse=True)]
| A
| int32
-- + -----
0 | 5
1 | 5
2 | 5
3 | 0
4 | 0
[5 rows x 1 column]


Calculate the cumulative maximum in multiple columns::
@@ -73,7 +91,7 @@
[5 rows x 3 columns]


In the presence of :func:`by()` calculate the cumulative maximum within each group::
For a grouped frame calculate the cumulative maximum within each group::

>>> DT[:, dt.cummax(f[:]), by('D')]
| D A B C
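
The bottom-to-top output documented in this file can be cross-checked in plain Python. The sketch assumes column `A` holds `[2, None, 5, -1, 0]`; that input is inferred from the printed results and is not shown in this diff. `na_max` and `cummax` are hypothetical helpers in which a missing value carries the running maximum along, mirroring the C++ loop in `cumminmax.h`.

```python
from itertools import accumulate

def na_max(running, value):
    """Running maximum where None (NA) never replaces a valid value."""
    if value is None:
        return running
    if running is None:
        return value
    return max(running, value)

def cummax(values, reverse=False):
    """Hypothetical pure-Python stand-in for dt.cummax on one column."""
    seq = values[::-1] if reverse else values
    out = list(accumulate(seq, na_max))
    return out[::-1] if reverse else out

A = [2, None, 5, -1, 0]  # assumed input, inferred from the documented output
print(cummax(A))                # [2, 2, 5, 5, 5]
print(cummax(A, reverse=True))  # [5, 5, 5, 0, 0]
```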
22 changes: 20 additions & 2 deletions docs/api/dt/cummin.rst
@@ -3,7 +3,7 @@
:src: src/core/expr/fexpr_cumminmax.cc pyfn_cummin
:tests: tests/dt/test-cumminmax.py
:cvar: doc_dt_cummin
:signature: cummin(cols)
:signature: cummin(cols, reverse=False)

.. x-version-added:: 1.1.0

@@ -15,6 +15,10 @@
cols: FExpr
Input data for cumulative minimum calculation.

reverse: bool
If ``False``, computation is done from top to bottom.
If ``True``, it is done from bottom to top.

return: FExpr
f-expression that converts input columns into the columns filled
with the respective cumulative minimums.
@@ -57,6 +61,20 @@
3 | -1
4 | -1
[5 rows x 1 column]

Calculate the cumulative minimum from bottom to top::

>>> DT[:, dt.cummin(f.A, reverse=True)]
| A
| int32
-- + -----
0 | -1
1 | -1
2 | -1
3 | -1
4 | 0
[5 rows x 1 column]


Calculate the cumulative minimum in multiple columns::
@@ -73,7 +91,7 @@
[5 rows x 3 columns]


In the presence of :func:`by()` calculate the cumulative minimum within each group::
For a grouped frame calculate the cumulative minimum within each group::

>>> DT[:, dt.cummin(f[:]), by('D')]
| D A B C
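
One consequence of the direction, going by the loops in `cumminmax.h` from this commit: missing values stay missing only until the first valid value is met, so with `reverse=True` it is trailing NAs, not leading ones, that remain in the result. A minimal pure-Python sketch (hypothetical `cummin` helper, with `None` standing in for NA):

```python
def cummin(values, reverse=False):
    """Hypothetical stand-in for dt.cummin on a single column; None means NA."""
    n = len(values)
    order = range(n - 1, -1, -1) if reverse else range(n)
    out = [None] * n
    running = None  # stays None (NA) until the first valid value is seen
    for i in order:
        v = values[i]
        if v is not None:
            running = v if running is None else min(running, v)
        out[i] = running
    return out

col = [None, 3, 1, None]
print(cummin(col))                # [None, 3, 1, 1]
print(cummin(col, reverse=True))  # [1, 1, 1, None]
```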
22 changes: 20 additions & 2 deletions docs/api/dt/cumprod.rst
@@ -3,7 +3,7 @@
:src: src/core/expr/fexpr_cumsumprod.cc pyfn_cumprod
:tests: tests/dt/test-cumprod.py
:cvar: doc_dt_cumprod
:signature: cumprod(cols)
:signature: cumprod(cols, reverse=False)

.. x-version-added:: 1.1.0

@@ -16,6 +16,10 @@
cols: FExpr
Input data for cumulative product calculation.

reverse: bool
If ``False``, computation is done from top to bottom.
If ``True``, it is done from bottom to top.

return: FExpr
f-expression that converts input columns into the columns filled
with the respective cumulative products.
@@ -58,6 +62,20 @@
3 | -10
4 | 0
[5 rows x 1 column]

Calculate the cumulative product from bottom to top::

>>> DT[:, dt.cumprod(f.A, reverse=True)]
| A
| int64
-- + -----
0 | 0
1 | 0
2 | 0
3 | 0
4 | 0
[5 rows x 1 column]


Calculate cumulative products in multiple columns::
@@ -74,7 +92,7 @@
[5 rows x 3 columns]


In the presence of :func:`by()` calculate cumulative products within each group::
For a grouped frame calculate cumulative products within each group::

>>> DT[:, dt.cumprod(f[:]), by('D')]
| D A B C
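
The all-zero result in the bottom-to-top example follows from zero being absorbing under multiplication: once the running product hits a zero, every row above it is zero as well. The sketch below assumes `A` is `[2, None, 5, -1, 0]` (inferred from the documented outputs, not shown in this diff) and treats NA as 1, as the loop in `cumsumprod.h` does; `cumprod` here is a hypothetical pure-Python helper.

```python
from itertools import accumulate
import operator

def cumprod(values, reverse=False):
    """Hypothetical stand-in for dt.cumprod; None (NA) is treated as 1."""
    filled = [1 if v is None else v for v in values]
    seq = filled[::-1] if reverse else filled
    out = list(accumulate(seq, operator.mul))
    return out[::-1] if reverse else out

A = [2, None, 5, -1, 0]  # assumed input, inferred from the documented output
print(cumprod(A))                # [2, 2, 10, -10, 0]
print(cumprod(A, reverse=True))  # [0, 0, 0, 0, 0]
```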
22 changes: 20 additions & 2 deletions docs/api/dt/cumsum.rst
@@ -3,7 +3,7 @@
:src: src/core/expr/fexpr_cumsumprod.cc pyfn_cumsum
:tests: tests/dt/test-cumsum.py
:cvar: doc_dt_cumsum
:signature: cumsum(cols)
:signature: cumsum(cols, reverse=False)

.. x-version-added:: 1.1.0

@@ -16,6 +16,10 @@
cols: FExpr
Input data for cumulative sum calculation.

reverse: bool
If ``False``, computation is done from top to bottom.
If ``True``, it is done from bottom to top.

return: FExpr
f-expression that converts input columns into the columns filled
with the respective cumulative sums.
@@ -60,6 +64,20 @@
[5 rows x 1 column]


Calculate the cumulative sum from bottom to top::

>>> DT[:, dt.cumsum(f.A, reverse=True)]
| A
| int64
-- + -----
0 | 6
1 | 4
2 | 4
3 | -1
4 | 0
[5 rows x 1 column]

Calculate cumulative sums in multiple columns::

>>> DT[:, dt.cumsum(f[:-1])]
@@ -74,7 +92,7 @@
[5 rows x 3 columns]


In the presence of :func:`by()` calculate cumulative sums within each group::
For a grouped frame calculate cumulative sums within each group::

>>> DT[:, dt.cumsum(f[:]), by('D')]
| D A B C
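
A bottom-to-top cumulative sum is a suffix sum, so it can also be derived from the top-to-bottom result: `rev[i] = total - fwd[i] + a[i]`. The sketch checks that identity against the documented output, assuming `A` is `[2, None, 5, -1, 0]` (inferred, not shown in this diff) with NA counted as 0, as in `cumsumprod.h`. It is only a cross-check, not how the commit computes the result.

```python
from itertools import accumulate

A = [2, None, 5, -1, 0]                 # assumed input, inferred from the docs
a = [0 if v is None else v for v in A]  # NA contributes 0 to a sum

fwd = list(accumulate(a))               # top to bottom: [2, 2, 7, 6, 6]
total = fwd[-1]

# Suffix sum from the prefix sum: the total minus everything above row i.
rev = [total - f + x for f, x in zip(fwd, a)]
print(rev)                              # [6, 4, 4, -1, 0]
```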
50 changes: 35 additions & 15 deletions src/core/column/cumminmax.h
@@ -29,7 +29,7 @@
namespace dt {


template <typename T, bool MIN>
template <typename T, bool MIN, bool REVERSE>
class CumMinMax_ColumnImpl : public Virtual_ColumnImpl {
private:
Column col_;
@@ -57,25 +57,43 @@ class CumMinMax_ColumnImpl : public Virtual_ColumnImpl {
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);

T val;
bool res_valid = col_.get_element(i1, &val);
data[i1] = res_valid? val : GETNA<T>();

for (size_t i = i1 + 1; i < i2; ++i) {
bool is_valid = col_.get_element(i, &val);
if (is_valid) {
if (MIN) {
data[i] = (res_valid && data[i - 1] < val)? data[i - 1] : val;

if (REVERSE) {
bool res_valid = col_.get_element(i2 - 1, &val);
data[i2 - 1] = res_valid? val : GETNA<T>();

for (size_t i = i2 - 1; i-- > i1;) {
bool is_valid = col_.get_element(i, &val);
if (is_valid) {
if (MIN) {
data[i] = (res_valid && data[i + 1] < val)? data[i + 1] : val;
} else {
data[i] = (res_valid && data[i + 1] > val)? data[i + 1] : val;
}
res_valid = true;
} else {
data[i] = (res_valid && data[i - 1] > val)? data[i - 1] : val;
data[i] = data[i + 1];
}
}
} else {
bool res_valid = col_.get_element(i1, &val);
data[i1] = res_valid? val : GETNA<T>();

for (size_t i = i1 + 1; i < i2; ++i) {
bool is_valid = col_.get_element(i, &val);
if (is_valid) {
if (MIN) {
data[i] = (res_valid && data[i - 1] < val)? data[i - 1] : val;
} else {
data[i] = (res_valid && data[i - 1] > val)? data[i - 1] : val;
}
res_valid = true;
} else {
data[i] = data[i - 1];
}
res_valid = true;
} else {
data[i] = data[i - 1];
}
}


});

@@ -87,10 +105,12 @@ class CumMinMax_ColumnImpl : public Virtual_ColumnImpl {
return new CumMinMax_ColumnImpl(Column(col_), gby_);
}


size_t n_children() const noexcept override {
return 1;
}


const Column& child(size_t i) const override {
xassert(i == 0); (void)i;
return col_;
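
The reverse loop is written as `for (size_t i = i2 - 1; i-- > i1;)` because `size_t` is unsigned: a conventional `i >= i1` test could never fail when `i1` is 0. The condition compares first and decrements afterwards, so the body sees `i2 - 2` down to `i1`, the last row `i2 - 1` having been seeded before the loop. A small Python sketch of that visiting order (the function name is hypothetical):

```python
def reverse_visit_order(i1, i2):
    """Indices seen by the body of `for (size_t i = i2 - 1; i-- > i1;)`."""
    visited = []
    i = i2 - 1
    while True:
        keep_going = i > i1  # the test uses the value before the decrement
        i -= 1               # the post-decrement happens regardless of the test
        if not keep_going:
            break
        visited.append(i)
    return visited

print(reverse_visit_order(0, 5))  # [3, 2, 1, 0]
print(reverse_visit_order(2, 5))  # [3, 2]
print(reverse_visit_order(4, 5))  # [] (single-row group: only the seed row)
```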
36 changes: 29 additions & 7 deletions src/core/column/cumsumprod.h
@@ -29,7 +29,7 @@

namespace dt {

template <typename T, bool SUM>
template <typename T, bool SUM, bool REVERSE>
class CumSumProd_ColumnImpl : public Virtual_ColumnImpl {
private:
Column col_;
@@ -44,19 +44,36 @@ namespace dt {
xassert(col_.can_be_read_as<T>());
}


void materialize(Column &col_out, bool) override {
Latent_ColumnImpl::vivify<T>(col_);
Column col = Column::new_data_column(col_.nrows(), col_.stype());
auto data = static_cast<T*>(col.get_data_editable());

auto offsets = gby_.offsets_r();
dt::parallel_for_dynamic(
gby_.size(),
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);
gby_.size(),
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);
T val;

T val;
if (REVERSE) {
bool is_valid = col_.get_element(i2 - 1, &val);
if (SUM) {
data[i2 - 1] = is_valid? val : 0;
} else {
data[i2 - 1] = is_valid? val : 1;
}
for (size_t i = i2 - 1; i-- > i1;) {
is_valid = col_.get_element(i, &val);
if (SUM) {
data[i] = data[i + 1] + (is_valid? val : 0);
} else {
data[i] = data[i + 1] * (is_valid? val : 1);
}
}
} else {
bool is_valid = col_.get_element(i1, &val);
if (SUM) {
data[i1] = is_valid? val : 0;
@@ -71,19 +88,24 @@ namespace dt {
data[i] = data[i - 1] * (is_valid? val : 1);
}
}
});
}
}
);

col_out = std::move(col);
}


ColumnImpl *clone() const override {
return new CumSumProd_ColumnImpl(Column(col_), gby_);
}


size_t n_children() const noexcept override {
return 1;
}


const Column &child(size_t i) const override {
xassert(i == 0);
(void)i;
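
How the per-group ranges drive the computation can be sketched in Python. Here `offsets` plays the role of `gby_.offsets_r()`: group `gi` covers rows `offsets[gi]` up to but excluding `offsets[gi + 1]`. The function is a hypothetical sequential stand-in for the `parallel_for_dynamic` loop, shown for the sum only, with `None` as NA.

```python
def grouped_cumsum(values, offsets, reverse=False):
    """Cumulative sum restarted in every group; None (NA) counts as 0."""
    out = [0] * len(values)
    for gi in range(len(offsets) - 1):
        i1, i2 = offsets[gi], offsets[gi + 1]
        rows = range(i2 - 1, i1 - 1, -1) if reverse else range(i1, i2)
        running = 0  # additive identity, so the first row equals its own value
        for i in rows:
            running += 0 if values[i] is None else values[i]
            out[i] = running
    return out

values = [1, 2, 3, 10, None, 30]
offsets = [0, 3, 6]  # two groups: rows 0-2 and rows 3-5
print(grouped_cumsum(values, offsets))                # [1, 3, 6, 10, 10, 40]
print(grouped_cumsum(values, offsets, reverse=True))  # [6, 5, 3, 40, 30, 30]
```

Because each group writes only to its own row range, the groups are independent of one another, which is what makes a parallel loop over `gi` possible.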