feat: add `Expr|Series.rolling_mean` method #1290

FBruzzesi · 2024-10-30T22:48:59Z

What type of PR is this? (check all applicable)

Related issues

Related issue feat: support rolling / ewm #1254

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

Opening as draft because of the following:

Dask is raising an error
pandas does not support weights; we could follow the same approach as for arrow, but only when weights are provided
For arrow I tried to keep a generic approach that can be reused for ~~any~~ most other aggregate functions, but I would like feedback on that

MarcoGorelli · 2024-11-03T11:10:52Z

thanks @FBruzzesi ! weights might even get removed from the polars ones, shall we keep it out for now?

FBruzzesi · 2024-11-03T21:47:38Z

> thanks @FBruzzesi ! weights might even get removed from the polars ones, shall we keep it out for now?

Sure, even better to start simpler from there :)

MarcoGorelli

awesome, thanks

pending on the unstable api warning being introduced (#1367), I just have a comment about pyarrow

if we're not sure, i think it's ok to leave it out for now

maybe one day we can get to the point where just having something be part of the Narwhals API is enough to put light pressure on dataframe authors to support a function 😄

MarcoGorelli · 2024-11-17T09:46:32Z

narwhals/_arrow/utils.py

@@ -452,3 +455,40 @@ def _parse_time_format(arr: pa.Array) -> str:
        if pc.all(matches.is_valid()).as_py():
            return time_fmt
    return ""
+
+
+def _window_agg(


we're still iterating in Python here...dunno, maybe we should just raise NotImplementedError here?

curious to see a timing comparison of:

convert to pandas, do rolling mean, back to pyarrow

do it with this function

I will give it a shot and report back :)

Yep, this is definitely orders of magnitude slower. I will try with a specialized implementation

FBruzzesi · 2024-11-17T10:00:52Z

Thanks for the feedback Marco

> if we're not sure, i think it's ok to leave it out for now

My concern is that, for plotly, we convert dataframes that support the interchange protocol to pyarrow and not to pandas. I am afraid that not supporting it would break some user workflows. If that's how we aim to proceed, I would suggest converting interchange-protocol dataframes to pandas, as was done before the narwhals PR

> maybe one day we can get to the point where just having something be part of the Narwhals API is enough to put light pressure on dataframe authors to support a function 😄

That would be nice 😁

MarcoGorelli · 2024-11-17T10:51:49Z

are we sure it would break a workflow? as far as I can tell, in plotly, you convert to pandas for non-pandas input in trendline

so, if someone was using plotly before and interchanging to pandas, then it means they already have pandas installed, so this shouldn't break their workflow if i understand correctly

FBruzzesi · 2024-11-17T11:01:42Z

> are we sure it would break a workflow? as far as I can tell, in plotly, you convert to pandas for non-pandas input in trendline

Yes, correct, this is the current behavior. However, the use case for supporting rolling, expanding, and ewm is to eventually remove that conversion from plotly's trendline module entirely and use the narwhals functionality instead.

Therefore, if a user provides a dataframe that supports the interchange protocol but that we don't support natively, it gets converted to arrow, and if rolling is not supported there, it would end up breaking.

Does this make sense?

MarcoGorelli · 2024-11-17T11:03:14Z

> and use the narwhals functionalities instead

would it work to use the narwhals functionalities if available, and otherwise just convert to pandas?
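The fallback suggested here could look roughly like the sketch below. Everything in it is hypothetical and for illustration only: `rolling_mean_with_fallback`, `unsupported_backend`, and `reference_rolling_mean` are stand-in names, not part of narwhals or plotly, and the reference implementation assumes the input has no nulls.

```python
from typing import Callable, List, Optional, Sequence

RollingFn = Callable[[Sequence[float], int], List[Optional[float]]]


def rolling_mean_with_fallback(
    values: Sequence[float],
    window_size: int,
    native: RollingFn,
    fallback: RollingFn,
) -> List[Optional[float]]:
    """Use `native` when the backend supports it, otherwise use `fallback`."""
    try:
        return native(values, window_size)
    except NotImplementedError:
        # The backend opted out of rolling operations: take the slower path.
        return fallback(values, window_size)


def unsupported_backend(values: Sequence[float], window_size: int) -> List[Optional[float]]:
    """Stand-in for a backend that does not implement rolling_mean."""
    raise NotImplementedError("rolling_mean is not supported for this backend")


def reference_rolling_mean(values: Sequence[float], window_size: int) -> List[Optional[float]]:
    """Stand-in for the 'convert to pandas' path (assumes no nulls in `values`)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1) : i + 1]
        out.append(sum(window) / window_size if len(window) == window_size else None)
    return out


result = rolling_mean_with_fallback(
    [1.0, 2.0, 3.0], 2, unsupported_backend, reference_rolling_mean
)
print(result)
```

With this shape, users who already had pandas installed keep a working path, while backends that do support rolling avoid the conversion.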

FBruzzesi · 2024-11-20T14:55:32Z

narwhals/utils.py

@@ -732,3 +733,39 @@ def validate_strict_and_pass_though(
        msg = "Cannot pass both `strict` and `pass_through`"
        raise ValueError(msg)
    return pass_through
+
+
+def _validate_rolling_arguments(


Not sure if this is the right place to keep private functionality

MarcoGorelli

you're too cool for school

FBruzzesi · 2024-11-20T15:04:02Z

narwhals/_arrow/series.py

+        rolling_sum = self.rolling_sum(
+            window_size=window_size, min_periods=min_periods, center=center
+        )
+        rolling_count = (
+            (~self.is_null())
+            .cast(self._dtypes.Int32())
+            .rolling_sum(window_size=window_size, min_periods=min_periods, center=center)
+        )
+        return rolling_sum / rolling_count


This is a lazy way of doing it. I wanted to give it a try, but timing on 1M rows is 50% slower than pandas.

Performance can be improved by using the same routine as rolling_sum and dividing by the count at the end. I will address it.
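The idea in the diff above (rolling mean as rolling sum divided by the rolling count of non-null values) can be illustrated in plain Python. This is a sketch under assumptions, not the narwhals implementation: it uses cumulative sums as one common way to get windowed totals, treats `None` as null, and ignores `center`. The function name and signature are made up for the example.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def rolling_mean(
    values: Sequence[Optional[float]],
    window_size: int,
    min_periods: Optional[int] = None,
) -> List[Optional[float]]:
    """Trailing rolling mean computed as windowed sum / windowed non-null count."""
    if min_periods is None:
        min_periods = window_size
    # Cumulative totals with a leading zero, so a window is a difference of two entries.
    cum_sum = [0.0, *accumulate(0.0 if v is None else v for v in values)]
    cum_count = [0, *accumulate(0 if v is None else 1 for v in values)]
    out: List[Optional[float]] = []
    for end in range(1, len(values) + 1):
        start = max(0, end - window_size)
        total = cum_sum[end] - cum_sum[start]
        count = cum_count[end] - cum_count[start]
        # Nulls do not count towards min_periods; an empty window stays null.
        out.append(total / count if count >= max(min_periods, 1) else None)
    return out


print(rolling_mean([1.0, None, 3.0, 4.0], window_size=2, min_periods=1))
```

In an array library the two cumulative passes and the subtraction would be vectorized, so the sum and the count share one routine and only a single division is left at the end.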

FBruzzesi added 4 commits October 30, 2024 21:34

feat: Series.rolling_mean

136889e

feat: Expr.rolling_mean

92cfd45

doc api reference

59edc61

add test, fix arrow

0e9dced

github-actions bot added the enhancement New feature or request label Oct 30, 2024

FBruzzesi added 2 commits October 31, 2024 09:01

old arrow, xfail for modin

6f220fb

merge main

7c5363e

DeaMariaLeon mentioned this pull request Nov 1, 2024

feat: Adding ewm_mean #1298

Merged

10 tasks

FBruzzesi marked this pull request as ready for review November 1, 2024 15:26

FBruzzesi changed the title ~~feat: Adds Expr|Series.rolling_mean~~ feat: add Expr|Series.rolling_mean method Nov 2, 2024

merge main, rm weight kwarg

5342191

MarcoGorelli mentioned this pull request Nov 13, 2024

api: "unstable" features #1367

Closed

merge main

7ebcb38

MarcoGorelli reviewed Nov 17, 2024

Merge branch 'main' into feat/rolling-mean

fcb935d

arrow wip

13a1b9f

FBruzzesi mentioned this pull request Nov 17, 2024

feat: add Series|Expr.rolling_sum method #1395

Merged

10 tasks

FBruzzesi added 2 commits November 20, 2024 15:01

refactor rolling_mean

8613cf6

perf arrow

82e295a

FBruzzesi commented Nov 20, 2024

MarcoGorelli approved these changes Nov 20, 2024

FBruzzesi commented Nov 20, 2024

FBruzzesi merged commit db9a048 into main Nov 21, 2024
23 checks passed

FBruzzesi deleted the feat/rolling-mean branch November 21, 2024 08:23
