[ENH] Implement cumsum() function (#3257)
Implement `cumsum()` function.

WIP for #3279
samukweku authored May 17, 2022
1 parent 6e3cbf8 commit 34263ec
Showing 13 changed files with 432 additions and 1 deletion.
87 changes: 87 additions & 0 deletions docs/api/dt/cumsum.rst
@@ -0,0 +1,87 @@

.. xfunction:: datatable.cumsum
:src: src/core/expr/fexpr_cumsum.cc pyfn_cumsum
:tests: tests/dt/test-cumsum.py
:cvar: doc_dt_cumsum
:signature: cumsum(cols)

.. x-version-added:: 1.1.0

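For each column from `cols`, calculate the cumulative sum. Missing values are
treated as zeros, so the running total carries over them unchanged. In the
presence of :func:`by()`, the cumulative sum is computed separately within each group.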

Parameters
----------
cols: FExpr
Input data for cumulative sum calculation.

return: FExpr
f-expression that converts the input columns into columns filled
with the respective cumulative sums. Void, boolean and integer columns
produce `int64` results; floating-point columns keep their type.

except: TypeError
The exception is raised when one of the columns from `cols`
has a type other than void, boolean, integer or float.


Examples
--------

Create a sample datatable frame::

>>> from datatable import dt, f, by
>>> DT = dt.Frame({"A": [2, None, 5, -1, 0],
... "B": [None, None, None, None, None],
... "C": [5.4, 3, 2.2, 4.323, 3],
... "D": ['a', 'a', 'b', 'b', 'b']})
>>> DT
| A B C D
| int32 void float64 str32
-- + ----- ---- ------- -----
0 | 2 NA 5.4 a
1 | NA NA 3 a
2 | 5 NA 2.2 b
3 | -1 NA 4.323 b
4 | 0 NA 3 b
[5 rows x 4 columns]


Calculate cumulative sum in a single column::

>>> DT[:, dt.cumsum(f.A)]
| A
| int64
-- + -----
0 | 2
1 | 2
2 | 7
3 | 6
4 | 6
[5 rows x 1 column]


Calculate cumulative sums in multiple columns::

>>> DT[:, dt.cumsum(f[:-1])]
| A B C
| int64 int64 float64
-- + ----- ----- -------
0 | 2 0 5.4
1 | 2 0 8.4
2 | 7 0 10.6
3 | 6 0 14.923
4 | 6 0 17.923
[5 rows x 3 columns]


Calculate cumulative sums per group in the presence of :func:`by()`::

>>> DT[:, dt.cumsum(f[:]), by('D')]
| D A B C
| str32 int64 int64 float64
-- + ----- ----- ----- -------
0 | a 2 0 5.4
1 | a 2 0 8.4
2 | b 5 0 2.2
3 | b 4 0 6.523
4 | b 4 0 9.523
[5 rows x 4 columns]
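
The behaviour shown in these examples can be modelled without datatable. The following is a minimal pure-Python sketch, not the library's implementation: `cumsum_model` is a hypothetical helper, `None` stands in for NA, and rows are assumed to be already ordered by group key, as they are in the sample frame.

```python
def cumsum_model(values, groups=None):
    """Running sum of `values`, treating None (NA) as 0.

    If `groups` is given, a separate running total is kept per group key.
    Hypothetical helper for illustration only.
    """
    if groups is None:
        groups = [None] * len(values)   # everything in a single group
    totals = {}                         # running total per group key
    out = []
    for value, key in zip(values, groups):
        totals[key] = totals.get(key, 0) + (0 if value is None else value)
        out.append(totals[key])
    return out


A = [2, None, 5, -1, 0]
D = ["a", "a", "b", "b", "b"]

print(cumsum_model(A))      # [2, 2, 7, 6, 6]
print(cumsum_model(A, D))   # [2, 2, 5, 4, 4]
```

Both results agree with column `A` in the ungrouped and grouped outputs above. An all-NA column yields all zeros, which matches column `B`.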
4 changes: 4 additions & 0 deletions docs/api/fexpr.rst
@@ -163,6 +163,9 @@
* - :meth:`.countna()`
- Same as :func:`dt.countna()`.

* - :meth:`.cumsum()`
- Same as :func:`dt.cumsum()`.

* - :meth:`.first()`
- Same as :func:`dt.first()`.

@@ -286,6 +289,7 @@
.as_type() <fexpr/as_type>
.count() <fexpr/count>
.countna() <fexpr/countna>
.cumsum() <fexpr/cumsum>
.extend() <fexpr/extend>
.first() <fexpr/first>
.last() <fexpr/last>
7 changes: 7 additions & 0 deletions docs/api/fexpr/cumsum.rst
@@ -0,0 +1,7 @@

.. xmethod:: datatable.FExpr.cumsum
:src: src/core/expr/fexpr.cc PyFExpr::cumsum
:cvar: doc_FExpr_cumsum
:signature: cumsum()

Equivalent to :func:`dt.cumsum(self)`.
3 changes: 3 additions & 0 deletions docs/api/index-api.rst
@@ -163,6 +163,8 @@ Functions
- Count non-missing values per column
* - :func:`countna()`
- Count the number of NA values per column
* - :func:`cumsum()`
- Calculate the cumulative sum of values per column
* - :func:`cov()`
- Calculate covariance between two columns
* - :func:`max()`
@@ -232,6 +234,7 @@ Other
count() <dt/count>
countna() <dt/countna>
cov() <dt/cov>
cumsum() <dt/cumsum>
cut() <dt/cut>
dt <dt/dt>
f <dt/f>
3 changes: 3 additions & 0 deletions docs/releases/v1.1.0.rst
@@ -71,6 +71,9 @@
-[new] Added reducer function :func:`dt.prod()` and the corresponding :meth:`.prod()`
method to calculate product of values in columns. [#3140]
-[new] Added function :func:`dt.cumsum()`, as well as :meth:`.cumsum()` method,
to calculate the cumulative sum of values per column. [#3279]
-[enh] Added reducer functions :func:`dt.countna()` and :func:`dt.nunique()`. [#2999]
-[new] Class :class:`dt.FExpr` now has method :meth:`.nunique()`,
92 changes: 92 additions & 0 deletions src/core/column/cumsum.h
@@ -0,0 +1,92 @@
//------------------------------------------------------------------------------
// Copyright 2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
// IN THE SOFTWARE.
//------------------------------------------------------------------------------
#ifndef dt_COLUMN_CUMSUM_h
#define dt_COLUMN_CUMSUM_h
#include "column/virtual.h"
#include "parallel/api.h"
#include "stype.h"

namespace dt {


template <typename T>
class Cumsum_ColumnImpl : public Virtual_ColumnImpl {
private:
Column col_;
Groupby gby_;

public:
Cumsum_ColumnImpl(Column&& col, const Groupby& gby)
: Virtual_ColumnImpl(col.nrows(), col.stype()),
col_(std::move(col)),
gby_(gby)
{
xassert(col_.can_be_read_as<T>());
}


void materialize(Column& col_out, bool) override {
Column col = Column::new_data_column(col_.nrows(), col_.stype());
auto data = static_cast<T*>(col.get_data_editable());

auto offsets = gby_.offsets_r();
dt::parallel_for_dynamic(
gby_.size(),
[&](size_t gi) {
size_t i1 = size_t(offsets[gi]);
size_t i2 = size_t(offsets[gi + 1]);

T val;
bool is_valid = col_.get_element(i1, &val);
data[i1] = is_valid? val : 0;

for (size_t i = i1 + 1; i < i2; ++i) {
is_valid = col_.get_element(i, &val);
data[i] = data[i - 1] + (is_valid? val : 0);
}

});

col_out = std::move(col);
}


ColumnImpl* clone() const override {
return new Cumsum_ColumnImpl(Column(col_), gby_);
}

size_t n_children() const noexcept override {
return 1;
}

const Column& child(size_t i) const override {
xassert(i == 0); (void)i;
return col_;
}

};


} // namespace dt


#endif
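
The loop in `materialize()` walks contiguous row ranges described by the group offsets. The sketch below restates that idea in Python under stated assumptions: `cumsum_by_offsets` is a hypothetical name, `None` stands in for an invalid (NA) element, and `offsets` is assumed to hold one more entry than there are groups, so that group `g` covers rows `offsets[g]` up to but not including `offsets[g + 1]`.

```python
def cumsum_by_offsets(values, offsets):
    """Per-group running sum over contiguous row ranges.

    Hypothetical illustration of the kernel's logic, not a port of it.
    """
    out = [0] * len(values)
    for g in range(len(offsets) - 1):
        start, stop = offsets[g], offsets[g + 1]
        running = 0                      # total restarts for every group
        for i in range(start, stop):
            if values[i] is not None:    # NA contributes nothing
                running += values[i]
            out[i] = running
    return out


# No grouping: one group spanning all rows.
print(cumsum_by_offsets([2, None, 5, -1, 0], [0, 5]))      # [2, 2, 7, 6, 6]
# Two groups: rows 0-1 and rows 2-4.
print(cumsum_by_offsets([2, None, 5, -1, 0], [0, 2, 5]))   # [2, 2, 5, 4, 4]
```

The sketch keeps a running total in a single loop instead of writing the first element of each group before the loop, as the C++ code does; with that choice an empty range is simply skipped. Because each group writes only to its own row range, the groups are independent of one another, which is what allows the C++ version to process them with `parallel_for_dynamic`.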
3 changes: 2 additions & 1 deletion src/core/documentation.h
@@ -23,14 +23,14 @@
#define dt_DOCUMENTATION_h
namespace dt {


extern const char* doc_dt_as_type;
extern const char* doc_dt_by;
extern const char* doc_dt_cbind;
extern const char* doc_dt_corr;
extern const char* doc_dt_count;
extern const char* doc_dt_countna;
extern const char* doc_dt_cov;
extern const char* doc_dt_cumsum;
extern const char* doc_dt_cut;
extern const char* doc_dt_first;
extern const char* doc_dt_fread;
@@ -279,6 +279,7 @@ extern const char* doc_FExpr;
extern const char* doc_FExpr_as_type;
extern const char* doc_FExpr_count;
extern const char* doc_FExpr_countna;
extern const char* doc_FExpr_cumsum;
extern const char* doc_FExpr_extend;
extern const char* doc_FExpr_first;
extern const char* doc_FExpr_last;
10 changes: 10 additions & 0 deletions src/core/expr/fexpr.cc
@@ -578,6 +578,16 @@ DECLARE_METHOD(&PyFExpr::countna)
->name("countna")
->docs(dt::doc_FExpr_countna);


oobj PyFExpr::cumsum(const XArgs&) {
auto cumsumFn = oobj::import("datatable", "cumsum");
return cumsumFn.call({this});
}

DECLARE_METHOD(&PyFExpr::cumsum)
->name("cumsum")
->docs(dt::doc_FExpr_cumsum);

//------------------------------------------------------------------------------
// Class decoration
//------------------------------------------------------------------------------
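
`PyFExpr::cumsum` above does no work of its own: it looks up the module-level `cumsum` function and calls it with the expression itself. A rough Python analogue of that delegation pattern is sketched here; the `Expr` class and the string-returning `cumsum` are hypothetical stand-ins, not datatable objects.

```python
def cumsum(cols):
    """Stand-in for the module-level function; returns a description string."""
    return f"cumsum({cols!r})"


class Expr:
    """Hypothetical miniature of an f-expression."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"f.{self.name}"

    def cumsum(self):
        # Inside a method body, the bare name `cumsum` resolves to the
        # module-level function, so the method only forwards `self`.
        return cumsum(self)


print(Expr("A").cumsum())   # cumsum(f.A)
print(cumsum(Expr("A")))    # cumsum(f.A)
```

Keeping the method as a thin wrapper means the function form and the method form cannot drift apart, since there is only one implementation.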
1 change: 1 addition & 0 deletions src/core/expr/fexpr.h
@@ -182,6 +182,7 @@ class PyFExpr : public py::XObject<PyFExpr> {
py::oobj as_type(const py::XArgs&);
py::oobj count(const py::XArgs&);
py::oobj countna(const py::XArgs&);
py::oobj cumsum(const py::XArgs&);
py::oobj extend(const py::XArgs&);
py::oobj first(const py::XArgs&);
py::oobj last(const py::XArgs&);
113 changes: 113 additions & 0 deletions src/core/expr/fexpr_cumsum.cc
@@ -0,0 +1,113 @@
//------------------------------------------------------------------------------
// Copyright 2022 H2O.ai
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
// IN THE SOFTWARE.
//------------------------------------------------------------------------------
#include "column/const.h"
#include "column/cumsum.h"
#include "column/latent.h"
#include "documentation.h"
#include "expr/fexpr_func.h"
#include "expr/eval_context.h"
#include "expr/workframe.h"
#include "python/xargs.h"
#include "stype.h"


namespace dt {
namespace expr {

class FExpr_cumsum : public FExpr_Func {
private:
ptrExpr arg_;

public:
FExpr_cumsum(ptrExpr&& arg)
: arg_(std::move(arg)) {}

std::string repr() const override{
std::string out = "cumsum(";
out += arg_->repr();
out += ')';
return out;
}


Workframe evaluate_n(EvalContext& ctx) const override{
Workframe wf = arg_->evaluate_n(ctx);
Groupby gby = Groupby::single_group(wf.nrows());

if (ctx.has_groupby()) {
wf.increase_grouping_mode(Grouping::GtoALL);
gby = ctx.get_groupby();
}

for (size_t i = 0; i < wf.ncols(); ++i) {
Column coli = evaluate1(wf.retrieve_column(i), gby);
wf.replace_column(i, std::move(coli));
}
return wf;
}


Column evaluate1(Column&& col, const Groupby& gby) const {
SType stype = col.stype();
switch (stype) {
case SType::VOID:
case SType::BOOL:
case SType::INT8:
case SType::INT16:
case SType::INT32:
case SType::INT64: return make<int64_t>(std::move(col), SType::INT64, gby);
case SType::FLOAT32: return make<float>(std::move(col), SType::FLOAT32, gby);
case SType::FLOAT64: return make<double>(std::move(col), SType::FLOAT64, gby);
default: throw TypeError()
<< "Invalid column of type " << stype << " in " << repr();
}
}


template <typename T>
Column make(Column&& col, SType stype, const Groupby& gby) const {
if (col.stype() == SType::VOID) {
return Column(new ConstInt_ColumnImpl(col.nrows(), 0, stype));
} else {
col.cast_inplace(stype);
return Column(new Latent_ColumnImpl(
new Cumsum_ColumnImpl<T>(std::move(col), gby)
));
}
}
};


static py::oobj pyfn_cumsum(const py::XArgs& args) {
auto cumsum = args[0].to_oobj();
return PyFExpr::make(new FExpr_cumsum(as_fexpr(cumsum)));
}


DECLARE_PYFN(&pyfn_cumsum)
->name("cumsum")
->docs(doc_dt_cumsum)
->arg_names({"cols"})
->n_positional_args(1)
->n_required_args(1);

}} // dt::expr
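
The `switch` in `evaluate1()` decides the result type of each column: void, boolean and integer inputs are summed as `int64`, floating-point inputs keep their type, and anything else raises `TypeError`. The table-driven sketch below restates that mapping; the plain strings are stand-ins for stype values, and both `RESULT_STYPE` and `cumsum_result_stype` are hypothetical names.

```python
# Input type -> type of the cumulative-sum column, following the switch
# in evaluate1(). Strings are illustrative stand-ins for stype values.
RESULT_STYPE = {
    "void": "int64",
    "bool8": "int64",
    "int8": "int64",
    "int16": "int64",
    "int32": "int64",
    "int64": "int64",
    "float32": "float32",
    "float64": "float64",
}


def cumsum_result_stype(stype):
    """Return the result type for a column of type `stype`."""
    try:
        return RESULT_STYPE[stype]
    except KeyError:
        # Wording loosely modelled on the C++ error message.
        raise TypeError(f"Invalid column of type {stype} in cumsum(...)") from None


print(cumsum_result_stype("int32"))     # int64
print(cumsum_result_stype("float32"))   # float32
```

This agrees with the documented example, where the `int32` column `A` and the void column `B` both come back as `int64` while the `float64` column `C` is unchanged. Widening the integer types reduces the chance of overflow in long running sums.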