Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ensure ignore_nulls is respected in horizontal sum/mean #20469

Merged
merged 1 commit into from
Dec 31, 2024

Conversation

mcrumiller
Copy link
Contributor

@mcrumiller mcrumiller commented Dec 26, 2024

Closes #20455.

@mcrumiller mcrumiller changed the title Fix: Ensure ignore_nulls parameter respects columns with null dtype fix: Ensure ignore_nulls parameter respects columns with null dtype Dec 26, 2024
@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars and removed title needs formatting labels Dec 26, 2024
Copy link

codecov bot commented Dec 26, 2024

Codecov Report

Attention: Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.98%. Comparing base (f0b9018) to head (dc087fe).
Report is 23 commits behind head on main.

Files with missing lines Patch % Lines
crates/polars-ops/src/series/ops/horizontal.rs 97.29% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20469      +/-   ##
==========================================
- Coverage   78.98%   78.98%   -0.01%     
==========================================
  Files        1563     1563              
  Lines      220559   220573      +14     
  Branches     2488     2488              
==========================================
+ Hits       174213   174217       +4     
- Misses      45773    45783      +10     
  Partials      573      573              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mcrumiller mcrumiller marked this pull request as ready for review December 27, 2024 00:24
@mcrumiller mcrumiller changed the title fix: Ensure ignore_nulls parameter respects columns with null dtype fix: Ensure ignore_nulls is respected in horizontal sum/mean Dec 27, 2024
if ignore_nulls and dtype_in != pl.Null:
values = [2, 0, 1]
else:
values = [None, None, None] # type: ignore[list-item]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. If our sum doesn't ignore nulls, it doesn't propagate them, but replaces them with the identity: 0.

The horizontal semantics should be the same as the vertical semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have it reversed.

  • Ignore nulls: 1 + 2 + null = 3 (nulls replaced by 0)
  • Don't ignore nulls: 1 + 2 + null = null (nulls are propagated)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that our sum is agnostic to nulls. I think we made a mistake exposing this to sum_horizontal as our vertical sum is agnostic to nulls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're saying is that the ignore_nulls parameter should be removed entirely, and the ignore_nulls=True behavior should simply be the default. Is that correct? If so, that sounds like a breaking change.

Given that the current implementation has the parameter, and that there is an issue with it (pl.Null columns are not subject to the parameter, but nulls in other columns are), is the path forward to accept this PR (if I didn't mess anything up! which I don't think I did), and 2) remove the parameter in a follow-up PR to be merged in 1.19.0?

This PR contains a small fix for the float32 case where mean_horizontal returns f64 for f32 columns. I could make that a separate PR as well if you don't want to accept this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right. Consider it an observation. ;) Will take a look a bit later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ritchie. I have a follow-up to this adding temporals for mean horizontal but I'll wait for this one first.

if !ignore_nulls && non_null_cols.len() < columns.len() {
// We must first determine the correct return dtype.
let return_dtype = match dtypes_to_supertype(non_null_cols.iter().map(|c| c.dtype()))? {
DataType::Boolean => DataType::UInt32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't have to be in this PR, but this should actually be IndexType.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, looks good @mcrumiller. Thank you.

@ritchie46 ritchie46 merged commit dff1ad7 into pola-rs:main Dec 31, 2024
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Columns with null dtype are ignored in sum_horizontal/mean_horizontal regardless of ignore_nulls parameter
2 participants