-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Ensure ignore_nulls
is respected in horizontal sum/mean
#20469
Conversation
ignore_nulls
parameter respects columns with null dtypeignore_nulls
parameter respects columns with null dtype
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20469 +/- ##
==========================================
- Coverage 78.98% 78.98% -0.01%
==========================================
Files 1563 1563
Lines 220559 220573 +14
Branches 2488 2488
==========================================
+ Hits 174213 174217 +4
- Misses 45773 45783 +10
Partials 573 573 ☔ View full report in Codecov by Sentry. |
ignore_nulls
parameter respects columns with null dtypeignore_nulls
is respected in horizontal sum/mean
if ignore_nulls and dtype_in != pl.Null: | ||
values = [2, 0, 1] | ||
else: | ||
values = [None, None, None] # type: ignore[list-item] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. If our sum doesn't ignore nulls, it doesn't propagate them, but replaces them with the identity: 0.
The horizontal semantics should be the same as the vertical semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have it reversed.
- Ignore nulls: 1 + 2 + null = 3 (nulls replaced by 0)
- Don't ignore nulls: 1 + 2 + null = null (nulls are propagated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that our sum
is agnostic to nulls. I think we made a mistake exposing this to sum_horizontal
as our vertical sum is agnostic to nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're saying is that the ignore_nulls
parameter should be removed entirely, and the ignore_nulls=True
behavior should simply be the default. Is that correct? If so, that sounds like a breaking change.
Given that the current implementation has the parameter, and that there is an issue with it (pl.Null
columns are not subject to the parameter, but nulls in other columns are), is the path forward to accept this PR (if I didn't mess anything up! which I don't think I did), and 2) remove the parameter in a follow-up PR to be merged in 1.19.0?
This PR contains a small fix for the float32 case where mean_horizontal
returns f64 for f32 columns. I could make that a separate PR as well if you don't want to accept this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you're right. Consider it an observation. ;) Will take a look a bit later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ritchie. I have a follow-up to this adding temporals for mean horizontal but I'll wait for this one first.
if !ignore_nulls && non_null_cols.len() < columns.len() { | ||
// We must first determine the correct return dtype. | ||
let return_dtype = match dtypes_to_supertype(non_null_cols.iter().map(|c| c.dtype()))? { | ||
DataType::Boolean => DataType::UInt32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have to be in this PR, but this should actually be IndexType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, looks good @mcrumiller. Thank you.
Closes #20455.