fix: binary_mut should work if only one input array has null buffer #6396
@@ -313,7 +313,7 @@ where
        ))));
    }

    let nulls = NullBuffer::union(a.logical_nulls().as_ref(), b.logical_nulls().as_ref());
If only a has a null buffer, NullBuffer::union will clone it. The clone shares the underlying buffer, which makes the later into_mutable call on the null buffer fail.
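The failure mode can be modelled with the standard library alone. The sketch below is only an analogy: it assumes the buffer is reference-counted and that the conversion to a mutable buffer requires unique ownership. SharedBuf, into_mutable_model and demo_shared_vs_unique are hypothetical names for illustration, not arrow-rs APIs.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted, immutable buffer.
#[derive(Clone, Debug)]
struct SharedBuf(Arc<Vec<u8>>);

/// Models a conversion that only succeeds when no other handle shares the
/// allocation. On failure the original handle is handed back.
fn into_mutable_model(buf: SharedBuf) -> Result<Vec<u8>, SharedBuf> {
    Arc::try_unwrap(buf.0).map_err(SharedBuf)
}

/// Returns (unique_ok, shared_ok): whether the conversion succeeded for a
/// uniquely owned buffer and for a clone of a buffer that is still alive.
fn demo_shared_vs_unique() -> (bool, bool) {
    let unique = SharedBuf(Arc::new(vec![0b0000_0101]));
    let unique_ok = into_mutable_model(unique).is_ok();

    let original = SharedBuf(Arc::new(vec![0b0000_0101]));
    // A cheap clone: both handles point at the same allocation.
    let cloned = original.clone();
    let shared_ok = into_mutable_model(cloned).is_ok();
    // `original` was still alive during the call above, so it failed.
    drop(original);

    (unique_ok, shared_ok)
}
```

In this model the uniquely owned buffer converts successfully, while the clone does not because the original handle still references the same allocation.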
Since we have a by value, perhaps we could add a PrimitiveArray::into_parts that would deconstruct the PrimitiveArray into its underlying NullBuffer and values 🤔
Following the model of https://docs.rs/arrow/latest/arrow/array/struct.GenericByteArray.html#method.into_parts
That way we could directly use the NullBuffer and avoid the copy in create_union_null_buffer
Hmm, let me see if it works.
Hmm, for binary_mut, it is okay to defer computing the union of the two null buffers, because we don't need to own the null buffers before invoking op on the two arrays.
Since the null buffer is not changed on the builder, we can compute the union after finishing the builder. So for binary_mut, we can avoid copying the null buffer.
But for try_binary_mut, we need the union of the two null buffers before invoking op on the two arrays, so we still need to copy it as this PR does.
PrimitiveArray::into_parts (it already exists) also cannot help here, because the builder needs to own the null buffer.
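The ordering difference described above can be sketched with plain vectors. This is a simplified model, not the arrow-rs implementation: values are a Vec<i32>, validity is an optional slice of bool (true = valid), and it assumes the fallible variant only invokes op on slots that are valid in both inputs. All function names here are hypothetical.

```rust
/// Union of two optional validity masks: a slot is valid only if it is valid
/// in every mask that is present.
fn union_validity(a: Option<&[bool]>, b: Option<&[bool]>) -> Option<Vec<bool>> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a.iter().zip(b).map(|(x, y)| *x && *y).collect()),
        (Some(n), None) | (None, Some(n)) => Some(n.to_vec()),
        (None, None) => None,
    }
}

/// Infallible model: `op` runs on every slot, so the validity masks are not
/// needed until the values are finished.
fn binary_mut_model(
    mut a_values: Vec<i32>,
    a_valid: Option<&[bool]>,
    b_values: &[i32],
    b_valid: Option<&[bool]>,
    op: impl Fn(i32, i32) -> i32,
) -> (Vec<i32>, Option<Vec<bool>>) {
    for (l, r) in a_values.iter_mut().zip(b_values) {
        *l = op(*l, *r);
    }
    // The union is computed only after the in-place update is done.
    (a_values, union_validity(a_valid, b_valid))
}

/// Fallible model: `op` is skipped on null slots, so the union has to be
/// known before the first call to `op`.
fn try_binary_mut_model(
    mut a_values: Vec<i32>,
    a_valid: Option<&[bool]>,
    b_values: &[i32],
    b_valid: Option<&[bool]>,
    op: impl Fn(i32, i32) -> Result<i32, String>,
) -> Result<(Vec<i32>, Option<Vec<bool>>), String> {
    let valid = union_validity(a_valid, b_valid);
    for (i, (l, r)) in a_values.iter_mut().zip(b_values).enumerate() {
        let is_valid = valid.as_ref().map_or(true, |v| v[i]);
        if is_valid {
            *l = op(*l, *r)?;
        }
    }
    Ok((a_values, valid))
}
```

In the infallible model the masks are only borrowed at the very end, which mirrors why the union can be deferred there. In the fallible model the mask is consulted inside the loop, so it has to exist up front.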
So I only avoid copying the null buffer in binary_mut.
It still feels to me that we could remove the copy on try_binary_mut, but this seems like an improvement to me 🚀
) -> Option<NullBuffer> {
    match (lhs, rhs) {
        (Some(lhs), Some(rhs)) => Some(NullBuffer::new(lhs.inner() & rhs.inner())),
        (Some(n), None) | (None, Some(n)) => Some(NullBuffer::new(n.inner() & n.inner())),
If I understand this correctly, it forces a copy of the null buffer's contents?
Yes. Previously it did a clone that shares the underlying buffer.
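Why ANDing a bitmap with itself forces a copy can be shown with a small model. The sketch below assumes a validity bitmap is a reference-counted byte vector and that the bitwise AND always writes into a fresh allocation. Bitmap, and_bitmaps and union_with_copy are hypothetical names, and the (None, None) arm is an assumption because the hunk above is truncated.

```rust
use std::sync::Arc;

/// Hypothetical validity bitmap: one bit per slot, 1 = valid.
type Bitmap = Arc<Vec<u8>>;

/// Bitwise AND of two equal-length bitmaps, written into a new allocation.
fn and_bitmaps(lhs: &Bitmap, rhs: &Bitmap) -> Bitmap {
    Arc::new(lhs.iter().zip(rhs.iter()).map(|(l, r)| l & r).collect())
}

/// Models the union with a forced copy in the one-sided case: `n & n` keeps
/// the same bits but no longer shares the input's allocation.
fn union_with_copy(lhs: Option<&Bitmap>, rhs: Option<&Bitmap>) -> Option<Bitmap> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(and_bitmaps(l, r)),
        (Some(n), None) | (None, Some(n)) => Some(and_bitmaps(n, n)),
        // Assumed behaviour: no bitmap on either side means no nulls.
        (None, None) => None,
    }
}
```

Because the result never aliases an input, a later conversion that needs unique ownership can succeed, at the cost of copying the bitmap.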
Thank you @viirya -- this looks good to me
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Thanks @alamb @MaxenceMaire
A copy of the NullBuffer in some cases seems a lot better than having to copy the entire array, as was needed previously.
Let's go with this and iterate as a follow-on.
Thank you @alamb
Which issue does this PR close?
Closes #6374.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?