Skip to content

Commit

Permalink
Add cumprod() function for cumulative product calculation (#3304)
Browse files Browse the repository at this point in the history
- add `cumprod()` function for cumulative product calculation;
- refactor `cumsum()` internals to also support `cumprod()`;

WIP for #3279
  • Loading branch information
samukweku authored Jun 28, 2022
1 parent 029646f commit 838cf5e
Show file tree
Hide file tree
Showing 17 changed files with 409 additions and 151 deletions.
2 changes: 1 addition & 1 deletion docs/api/dt/cummax.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@

Create a sample datatable frame::

>>> from datatable import dt, f
>>> from datatable import dt, f, by
>>> DT = dt.Frame({"A": [2, None, 5, -1, 0],
... "B": [None, None, None, None, None],
... "C": [5.4, 3, 2.2, 4.323, 3],
Expand Down
2 changes: 1 addition & 1 deletion docs/api/dt/cummin.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@

Create a sample datatable frame::

>>> from datatable import dt, f
>>> from datatable import dt, f, by
>>> DT = dt.Frame({"A": [2, None, 5, -1, 0],
... "B": [None, None, None, None, None],
... "C": [5.4, 3, 2.2, 4.323, 3],
Expand Down
89 changes: 89 additions & 0 deletions docs/api/dt/cumprod.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@

.. xfunction:: datatable.cumprod
:src: src/core/expr/fexpr_cumsumprod.cc pyfn_cumprod
:tests: tests/dt/test-cumprod.py
:cvar: doc_dt_cumprod
:signature: cumprod(cols)

.. x-version-added:: 1.1.0

For each column from `cols` calculate cumulative product. The product of
the missing values is calculated as one. In the presence of :func:`by()`,
the cumulative product is computed within each group.

Parameters
----------
cols: FExpr
Input data for cumulative product calculation.

return: FExpr
f-expression that converts input columns into the columns filled
with the respective cumulative products.

except: TypeError
The exception is raised when one of the columns from `cols`
has a non-numeric type.


Examples
--------

Create a sample datatable frame::

>>> from datatable import dt, f, by
>>> DT = dt.Frame({"A": [2, None, 5, -1, 0],
... "B": [None, None, None, None, None],
... "C": [5.4, 3, 2.2, 4.323, 3],
... "D": ['a', 'a', 'b', 'b', 'b']})
| A B C D
| int32 void float64 str32
-- + ----- ---- ------- -----
0 | 2 NA 5.4 a
1 | NA NA 3 a
2 | 5 NA 2.2 b
3 | -1 NA 4.323 b
4 | 0 NA 3 b
[5 rows x 4 columns]


Calculate cumulative product in a single column::

>>> DT[:, dt.cumprod(f.A)]
| A
| int64
-- + -----
0 | 2
1 | 2
2 | 10
3 | -10
4 | 0
[5 rows x 1 column]


Calculate cumulative products in multiple columns::

>>> DT[:, dt.cumprod(f[:-1])]
| A B C
| int64 int64 float64
-- + ----- ----- -------
0 | 2 1 5.4
1 | 2 1 16.2
2 | 10 1 35.64
3 | -10 1 154.072
4 | 0 1 462.215
[5 rows x 3 columns]


In the presence of :func:`by()` calculate cumulative products within each group::

>>> DT[:, dt.cumprod(f[:]), by('D')]
| D A B C
| str32 int64 int64 float64
-- + ----- ----- ----- -------
0 | a 2 1 5.4
1 | a 2 1 16.2
2 | b 5 1 2.2
3 | b -5 1 9.5106
4 | b 0 1 28.5318
[5 rows x 4 columns]

4 changes: 2 additions & 2 deletions docs/api/dt/cumsum.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@

.. xfunction:: datatable.cumsum
:src: src/core/expr/fexpr_cumsum.cc pyfn_cumsum
:src: src/core/expr/fexpr_cumsumprod.cc pyfn_cumsum
:tests: tests/dt/test-cumsum.py
:cvar: doc_dt_cumsum
:signature: cumsum(cols)
Expand Down Expand Up @@ -30,7 +30,7 @@

Create a sample datatable frame::

>>> from datatable import dt, f
>>> from datatable import dt, f, by
>>> DT = dt.Frame({"A": [2, None, 5, -1, 0],
... "B": [None, None, None, None, None],
... "C": [5.4, 3, 2.2, 4.323, 3],
Expand Down
6 changes: 5 additions & 1 deletion docs/api/fexpr.rst
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,10 @@

* - :meth:`.cummax()`
- Same as :func:`dt.cummax()`.


* - :meth:`.cumprod()`
- Same as :func:`dt.cumprod()`.

* - :meth:`.cumsum()`
- Same as :func:`dt.cumsum()`.

Expand Down Expand Up @@ -297,6 +300,7 @@
.countna() <fexpr/countna>
.cummin() <fexpr/cummin>
.cummax() <fexpr/cummax>
.cumprod() <fexpr/cumprod>
.cumsum() <fexpr/cumsum>
.extend() <fexpr/extend>
.first() <fexpr/first>
Expand Down
8 changes: 8 additions & 0 deletions docs/api/fexpr/cumprod.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@

.. xmethod:: datatable.FExpr.cumprod
:src: src/core/expr/fexpr.cc PyFExpr::cumprod
:cvar: doc_FExpr_cumprod
:signature: cumprod()

Equivalent to :func:`dt.cumprod(self)`.

3 changes: 3 additions & 0 deletions docs/api/index-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,8 @@ Functions
- Calculate the cumulative maximum of values per column
* - :func:`cummin()`
- Calculate the cumulative minimum of values per column
* - :func:`cumprod()`
- Calculate the cumulative product of values per column
* - :func:`cumsum()`
- Calculate the cumulative sum of values per column
* - :func:`cov()`
Expand Down Expand Up @@ -240,6 +242,7 @@ Other
cov() <dt/cov>
cummax() <dt/cummax>
cummin() <dt/cummin>
cumprod() <dt/cumprod>
cumsum() <dt/cumsum>
cut() <dt/cut>
dt <dt/dt>
Expand Down
3 changes: 3 additions & 0 deletions docs/releases/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@
the corresponding :meth:`.cummin()` and :meth:`.cummax()` methods,
to calculate the cumulative minimum and maximum of values per column. [#3279]
-[new] Added function :func:`dt.cumprod()`, as well as :meth:`.cumprod()` method,
to calculate the cumulative product of values per column. [#3279]
-[enh] Added reducer functions :func:`dt.countna()` and :func:`dt.nunique()`. [#2999]
-[new] Class :class:`dt.FExpr` now has method :meth:`.nunique()`,
Expand Down
67 changes: 35 additions & 32 deletions src/core/column/cumsum.h → src/core/column/cumsumprod.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,66 +25,69 @@
#include "parallel/api.h"
#include "stype.h"

namespace dt {

namespace dt {

template <typename T>
class Cumsum_ColumnImpl : public Virtual_ColumnImpl {
template <typename T, bool SUM>
class CumSumProd_ColumnImpl : public Virtual_ColumnImpl {
private:
Column col_;
Groupby gby_;

public:
Cumsum_ColumnImpl(Column&& col, const Groupby& gby)
: Virtual_ColumnImpl(col.nrows(), col.stype()),
col_(std::move(col)),
gby_(gby)
CumSumProd_ColumnImpl(Column &&col, const Groupby &gby)
:Virtual_ColumnImpl(col.nrows(), col.stype()),
col_(std::move(col)),
gby_(gby)
{
xassert(col_.can_be_read_as<T>());
}


void materialize(Column& col_out, bool) override {
void materialize(Column &col_out, bool) override {
Column col = Column::new_data_column(col_.nrows(), col_.stype());
auto data = static_cast<T*>(col.get_data_editable());

auto offsets = gby_.offsets_r();
dt::parallel_for_dynamic(
gby_.size(),
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);

T val;
bool is_valid = col_.get_element(i1, &val);
data[i1] = is_valid? val : 0;

for (size_t i = i1 + 1; i < i2; ++i) {
is_valid = col_.get_element(i, &val);
data[i] = data[i - 1] + (is_valid? val : 0);
}

});
gby_.size(),
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);

T val;
bool is_valid = col_.get_element(i1, &val);
if (SUM) {
data[i1] = is_valid? val : 0;
} else {
data[i1] = is_valid? val : 1;
}
for (size_t i = i1 + 1; i < i2; ++i) {
is_valid = col_.get_element(i, &val);
if (SUM) {
data[i] = data[i - 1] + (is_valid? val : 0);
} else {
data[i] = data[i - 1] * (is_valid? val : 1);
}
}
});

col_out = std::move(col);
}


ColumnImpl* clone() const override {
return new Cumsum_ColumnImpl(Column(col_), gby_);
ColumnImpl *clone() const override {
return new CumSumProd_ColumnImpl(Column(col_), gby_);
}

size_t n_children() const noexcept override {
return 1;
}

const Column& child(size_t i) const override {
xassert(i == 0); (void)i;
const Column &child(size_t i) const override {
xassert(i == 0);
(void)i;
return col_;
}

};

};

} // namespace dt

Expand Down
2 changes: 2 additions & 0 deletions src/core/documentation.h
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ extern const char* doc_dt_countna;
extern const char* doc_dt_cov;
extern const char* doc_dt_cummax;
extern const char* doc_dt_cummin;
extern const char* doc_dt_cumprod;
extern const char* doc_dt_cumsum;
extern const char* doc_dt_cut;
extern const char* doc_dt_first;
Expand Down Expand Up @@ -283,6 +284,7 @@ extern const char* doc_FExpr_count;
extern const char* doc_FExpr_countna;
extern const char* doc_FExpr_cummax;
extern const char* doc_FExpr_cummin;
extern const char* doc_FExpr_cumprod;
extern const char* doc_FExpr_cumsum;
extern const char* doc_FExpr_extend;
extern const char* doc_FExpr_first;
Expand Down
13 changes: 10 additions & 3 deletions src/core/expr/fexpr.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
//------------------------------------------------------------------------------
// Copyright 2020-2021 H2O.ai
// Copyright 2020-2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
Expand Down Expand Up @@ -289,8 +289,6 @@ DECLARE_METHOD(&PyFExpr::re_match)





//------------------------------------------------------------------------------
// Miscellaneous
//------------------------------------------------------------------------------
Expand Down Expand Up @@ -347,6 +345,15 @@ DECLARE_METHOD(&PyFExpr::cummin)
->name("cummin")
->docs(dt::doc_FExpr_cummin);

oobj PyFExpr::cumprod(const XArgs&) {
auto cumprodFn = oobj::import("datatable", "cumprod");
return cumprodFn.call({this});
}

DECLARE_METHOD(&PyFExpr::cumprod)
->name("cumprod")
->docs(dt::doc_FExpr_cumprod);


oobj PyFExpr::cumsum(const XArgs&) {
auto cumsumFn = oobj::import("datatable", "cumsum");
Expand Down
1 change: 1 addition & 0 deletions src/core/expr/fexpr.h
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,7 @@ class PyFExpr : public py::XObject<PyFExpr> {
py::oobj countna(const py::XArgs&);
py::oobj cummin(const py::XArgs&);
py::oobj cummax(const py::XArgs&);
py::oobj cumprod(const py::XArgs&);
py::oobj cumsum(const py::XArgs&);
py::oobj extend(const py::XArgs&);
py::oobj first(const py::XArgs&);
Expand Down
Loading

0 comments on commit 838cf5e

Please sign in to comment.