ENH: Support skipna parameter in GroupBy min, max, prod, median, var, std and sem methods #60752

snitish · 2025-01-22T01:04:56Z

Closes #15675 (ENH: enable skipna on groupby reduction ops)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.

Second (and final) batch of GroupBy reductions being enhanced to support the skipna parameter.
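To make the intended semantics concrete, here is a minimal pure-Python sketch for three of the reductions in this batch (min, max and prod). It is not the pandas implementation. `group_reduce`, `OPS` and the integer group labels are hypothetical, and NaN stands in for every kind of missing value. The empty-group result for prod assumes the `min_count=0` convention, where an empty product is 1. The rule it encodes is the following: with `skipna=True`, NA values are ignored; with `skipna=False`, any NA in a group makes that group's result NA; and NA values are never counted as observations.

```python
import math

NAN = float("nan")

# Hypothetical op table: (binary combine function, result for a group with no
# valid observations). Assumption: prod follows the min_count=0 convention,
# i.e. an empty product is 1.0.
OPS = {
    "min": (min, NAN),
    "max": (max, NAN),
    "prod": (lambda a, b: a * b, 1.0),
}


def group_reduce(values, labels, ngroups, op, skipna=True):
    """Reference grouped min/max/prod with a skipna flag (illustrative only)."""
    combine, empty = OPS[op]
    acc = [None] * ngroups       # None means no valid observation seen yet
    saw_na = [False] * ngroups   # whether the group contained any NaN
    for val, lab in zip(values, labels):
        if math.isnan(val):
            saw_na[lab] = True   # NA is recorded but never counted as an observation
            continue
        acc[lab] = val if acc[lab] is None else combine(acc[lab], val)

    out = []
    for a, na in zip(acc, saw_na):
        if na and not skipna:
            out.append(NAN)      # skipna=False: a single NA makes the group NA
        elif a is None:
            out.append(empty)    # no valid observations in the group
        else:
            out.append(a)
    return out


values = [1.0, NAN, 3.0, 4.0, 2.0, NAN]
labels = [0, 0, 0, 1, 1, 2]
print(group_reduce(values, labels, 3, "min"))                # [1.0, 2.0, nan]
print(group_reduce(values, labels, 3, "min", skipna=False))  # [nan, 2.0, nan]
```

Keeping the "saw an NA" flag separate from the accumulator means the skipna decision is made once per group at the end, so the same loop serves both settings.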

pandas/_libs/groupby.pyx

WillAyd

Looks pretty good. One question

pandas/core/_numba/kernels/min_max_.py

pandas/_libs/groupby.pyx

WillAyd

A few smaller comments. @rhshadrach, mind taking a look as well?

WillAyd · 2025-01-22T23:28:03Z

pandas/_libs/groupby.pyx


                if not isna_entry:
                    nobs[lab, j] += 1
                    oldmean = mean[lab, j]
                    mean[lab, j] += (val - oldmean) / nobs[lab, j]
                    out[lab, j] += (val - mean[lab, j]) * (val - oldmean)
+                elif not skipna:


In the case where skipna is True, wouldn't we still need to assign to out here?

I don't think so. If skipna is True and the value is NA, we skip the value and thus retain the existing behavior.
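The answer above can be checked with a pure-Python transliteration of the loop in the diff. The local names follow the snippet (`nobs`, `mean`, `out`), but `group_var` itself is hypothetical. The body of the `elif not skipna:` branch is an assumption, because the diff excerpt cuts off before it: here it simply sets the group's accumulator to NaN, similar to the `output[lab] = np.nan` line in the numba kernel. With `skipna=True`, an NA value falls through both branches, so `out` keeps its previous value and no extra assignment is needed.

```python
import math

NAN = float("nan")


def group_var(values, labels, ngroups, skipna=True, ddof=1):
    """Per-group Welford variance with a skipna branch (illustrative sketch)."""
    nobs = [0] * ngroups
    mean = [0.0] * ngroups
    out = [0.0] * ngroups  # running sum of squared deviations per group
    for val, lab in zip(values, labels):
        if not math.isnan(val):
            nobs[lab] += 1
            oldmean = mean[lab]
            mean[lab] += (val - oldmean) / nobs[lab]
            out[lab] += (val - mean[lab]) * (val - oldmean)
        elif not skipna:
            # Assumed body of the new branch: poison the accumulator. Because
            # NaN + x is NaN, later valid values cannot overwrite it.
            out[lab] = NAN
        # skipna=True and NA: nothing happens, as before this change
    return [out[i] / (nobs[i] - ddof) if nobs[i] > ddof else NAN
            for i in range(ngroups)]


values = [1.0, 2.0, 3.0, NAN, 4.0, 6.0, NAN]
labels = [0, 0, 0, 0, 1, 1, 2]
print(group_var(values, labels, 3))                # [1.0, 2.0, nan]
print(group_var(values, labels, 3, skipna=False))  # [nan, 2.0, nan]
```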

What's the expected result when the group has all NA values?

@rhshadrach In case of all-NA values, the result would be NA regardless of skipna, i.e. consistent with Series.mean() etc.

>>> pd.Series([np.nan]*10).groupby(by=["A","B"]*5).mean(skipna=True)
A   NaN
B   NaN
dtype: float64
>>> pd.Series([np.nan]*10).groupby(by=["A","B"]*5).mean(skipna=False)
A   NaN
B   NaN
dtype: float64
>>> pd.Series([np.nan]*10).mean(skipna=True)
nan
>>> pd.Series([np.nan]*10).mean(skipna=False)
nan
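The all-NA result follows from a simple rule: a group with no valid observations has nothing to report, whatever `skipna` says. The sketch below applies that rule to median, one of the methods added in this batch, and mirrors the `mean` example above. `group_median` and the integer labels are hypothetical, and `statistics.median` stands in for pandas' own kernel.

```python
import math
import statistics

NAN = float("nan")


def group_median(values, labels, ngroups, skipna=True):
    """Grouped median with a skipna flag (illustrative sketch)."""
    valid = [[] for _ in range(ngroups)]   # non-NA values per group
    saw_na = [False] * ngroups
    for val, lab in zip(values, labels):
        if math.isnan(val):
            saw_na[lab] = True
        else:
            valid[lab].append(val)
    # NaN if skipna=False and an NA was seen, or if the group has no valid values
    return [
        NAN if (na and not skipna) or not vals else statistics.median(vals)
        for vals, na in zip(valid, saw_na)
    ]


all_na = [NAN] * 10
labels = [0, 1] * 5   # stands in for by=["A", "B"] * 5
print(group_median(all_na, labels, 2, skipna=True))   # [nan, nan]
print(group_median(all_na, labels, 2, skipna=False))  # [nan, nan]
```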

pandas/core/resample.py

rhshadrach

Looks great! Can you add a test (adding to your current parametrizations would be fine) where the entire group is NA.

rhshadrach · 2025-01-25T21:02:19Z

pandas/_libs/groupby.pyx



What's the expected result when the group has all NA values?

rhshadrach · 2025-01-25T21:20:02Z

pandas/core/_numba/kernels/var_.py

+
+        if not skipna and np.isnan(val):
+            output[lab] = np.nan
+            nobs_arr[lab] += 1


Might make no difference, but don't we usually think of NA values as not being observations?

Agree that it makes no difference, but my rationale was that if skipna is False, NAs can be considered valid observations. Happy to change it if you think it should not update nobs.

No real disagreement with your rationale (or agreement, for that matter 😄), but for the ops I spot-checked, we consistently do not count NA values as observations, regardless of skipna. I think we should be consistent here.

Good point! Removed that line.
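Whether the NA increments `nobs` changes the observation count but not the result, because once a group's accumulator is NaN it stays NaN. The numpy sketch below makes that explicit. `grouped_var` and its `count_na` switch are hypothetical (the switch just reproduces the `nobs_arr[lab] += 1` line that was removed), and the Welford update is the textbook form rather than a copy of `var_.py`.

```python
import numpy as np


def grouped_var(values, labels, ngroups, skipna=True, ddof=1, count_na=False):
    """Sketch of a grouped-variance kernel; count_na is a hypothetical toggle
    reproducing the nobs increment that was removed during review."""
    nobs = np.zeros(ngroups, dtype=np.int64)
    mean = np.zeros(ngroups)
    ssqdm = np.zeros(ngroups)  # sum of squared deviations from the mean
    for val, lab in zip(values, labels):
        if np.isnan(val):
            if not skipna:
                ssqdm[lab] = np.nan  # the group's result becomes NaN
                if count_na:
                    nobs[lab] += 1   # treat the NA as an observation
            continue
        nobs[lab] += 1
        delta = val - mean[lab]
        mean[lab] += delta / nobs[lab]
        ssqdm[lab] += delta * (val - mean[lab])
    out = np.full(ngroups, np.nan)
    ok = nobs > ddof
    out[ok] = ssqdm[ok] / (nobs[ok] - ddof)
    return out, nobs


values = np.array([1.0, np.nan, 3.0, 2.0, 4.0])
labels = np.array([0, 0, 0, 1, 1])
with_na, nobs_with = grouped_var(values, labels, 2, skipna=False, count_na=True)
without, nobs_without = grouped_var(values, labels, 2, skipna=False)
print(with_na, without)
print(nobs_with, nobs_without)  # [3 2] [2 2]
```

Both variants return NaN for group 0 and the same variance for group 1; only `nobs` differs, so leaving NA out of the count costs nothing and keeps the kernel consistent with the other ops.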

pandas/core/groupby/groupby.py

pandas/core/resample.py

snitish · 2025-01-25T23:09:03Z

Thanks for the review @rhshadrach. Added the all-NA tests and responded to comments.

rhshadrach · 2025-01-28T02:16:08Z

The failure in the future infer string job is unrelated (and is fixed by #60796). Rerunning Ubuntu 310 just to be sure.

ENH: Support skipna parameter in GroupBy prod, var, std and sem methods

72276dc

snitish requested review from rhshadrach and WillAyd as code owners January 22, 2025 01:04

Fix docstring error

0414465

snitish marked this pull request as draft January 22, 2025 01:33

WillAyd requested changes Jan 22, 2025

pandas/_libs/groupby.pyx (outdated)

snitish added 3 commits January 21, 2025 17:58

Merge branch 'main' into issue15675

bad439a

Address review comment and add skipna to min and max

e2233f8

Undo temporary change

0c58a7d

snitish changed the title from "ENH: Support skipna parameter in GroupBy prod, var, std and sem methods" to "ENH: Support skipna parameter in GroupBy min, max, prod, var, std and sem methods" Jan 22, 2025

snitish added 2 commits January 21, 2025 19:02

Add skipna to groupby median

e259679

Fix docstring error

f40aa16

snitish marked this pull request as ready for review January 22, 2025 04:14

Add min and max to groupby numba vs cython test

574708e

snitish changed the title from "ENH: Support skipna parameter in GroupBy min, max, prod, var, std and sem methods" to "ENH: Support skipna parameter in GroupBy min, max, prod, median, var, std and sem methods" Jan 22, 2025

WillAyd reviewed Jan 22, 2025

pandas/core/_numba/kernels/min_max_.py (outdated)

WillAyd requested changes Jan 22, 2025

pandas/_libs/groupby.pyx (outdated)

Use _get_na_val to determine nan_val in group_prod

a1444c9

WillAyd reviewed Jan 22, 2025

rhshadrach added the Enhancement, Groupby and Missing-data labels Jan 25, 2025

rhshadrach added this to the 3.0 milestone Jan 25, 2025

rhshadrach requested changes Jan 25, 2025

Add test for all-NA case

d31aa79

snitish added 2 commits January 27, 2025 15:58

Address review comment

7a30d59

Remove more no-op lines

0fc49df
