fix[RowContainer]: Fix bug in RowContainer::extractValuesWithNulls #12301
base: main
Summary:
The bug happens when:
We have a target (result) vector that already has some rows and not a single null (so its null buffer is not allocated).
We extract a column from a RowContainer into that target vector, and that column has a null in at least one row.
The bug manifests as every row that the target vector held before the extraction becoming null.
This is because the null buffer was allocated with 'setNotNull' set to false.
We fix it by passing true when 'resultOffset' is positive (meaning that the target vector already has some rows).
Differential Revision: D69445179
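The failure mode described in the summary can be reproduced with a small standalone model. This is only a sketch, not Velox code: `ToyVector`, `extractWithNulls`, `countNulls`, and the `applyFix` switch are hypothetical names invented for illustration, and the model assumes a flag value of true means "not null" and that the flags are allocated lazily.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a flat vector; NOT the Velox API.
// Assumed convention: flag true = "not null", false = "null".
struct ToyVector {
  std::vector<int> values;
  // Lazily allocated: an absent buffer means "no row is null".
  std::optional<std::vector<bool>> notNull;

  bool isNullAt(std::size_t row) const {
    return notNull.has_value() && !(*notNull)[row];
  }

  // Allocates (or grows) the flags, filling new slots with 'setNotNull'.
  std::vector<bool>& mutableNulls(std::size_t size, bool setNotNull = false) {
    if (!notNull.has_value()) {
      notNull.emplace(size, setNotNull);
    } else if (notNull->size() < size) {
      notNull->resize(size, setNotNull);
    }
    return *notNull;
  }
};

// Hypothetical extraction: writes 'column' into 'result' starting at
// 'resultOffset'. With applyFix == false the flags are allocated as
// "all null", which is the behaviour the summary describes as the bug.
void extractWithNulls(
    const std::vector<std::optional<int>>& column,
    std::size_t resultOffset,
    bool applyFix,
    ToyVector& result) {
  const std::size_t newSize = resultOffset + column.size();
  result.values.resize(newSize);

  const bool hasNull = std::any_of(
      column.begin(), column.end(), [](const std::optional<int>& v) {
        return !v.has_value();
      });

  std::vector<bool>* flags = nullptr;
  if (hasNull || result.notNull.has_value()) {
    // The fix: pre-existing rows must start out as "not null".
    const bool setNotNull = applyFix && resultOffset > 0;
    flags = &result.mutableNulls(newSize, setNotNull);
  }

  for (std::size_t i = 0; i < column.size(); ++i) {
    const std::size_t row = resultOffset + i;
    if (flags != nullptr) {
      (*flags)[row] = column[i].has_value();
    }
    if (column[i].has_value()) {
      result.values[row] = *column[i];
    }
  }
}

// Counts how many of the first 'count' rows read as null.
std::size_t countNulls(const ToyVector& v, std::size_t count) {
  std::size_t n = 0;
  for (std::size_t row = 0; row < count; ++row) {
    n += v.isNullAt(row) ? 1 : 0;
  }
  return n;
}
```

In this model, extracting `{10, std::nullopt}` at offset 3 into a vector holding three non-null rows turns all three existing rows into nulls when `applyFix` is false, and leaves them intact when it is true.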
This pull request was exported from Phabricator. Differential Revision: D69445179
/// pointer to the buffer containing them.
/// Optional parameter 'setNotNull' is passed to ensureNullsCapacity() and is
/// used to ensure all the rows will be 'not nulls'.
BufferPtr& mutableNulls(vector_size_t size, bool setNotNull = false) {
It's oddly inconsistent that mutableRawNulls calls ensureNulls(), which always sets "setNotNull" to true, while mutableNulls (prior to this change) calls ensureNullsCapacity directly, which defaults "setNotNull" to false.
This is probably the safest approach to addressing it, given how core this is to Vectors, but it seems like it was an accident waiting to happen.
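The inconsistency described in this comment can be pictured with a toy model. The member names below mirror the ones mentioned in the comment, but the struct `ToyNullFlags` and all bodies are assumptions made for illustration; they are not the actual Velox signatures or implementation. A flag value of true is assumed to mean "not null".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; bodies and types are assumptions, not Velox code.
struct ToyNullFlags {
  std::vector<bool> notNull; // true = "not null" (assumed convention)

  // Grows the flags, filling new slots with 'setNotNull'.
  void ensureNullsCapacity(std::size_t size, bool setNotNull = false) {
    if (notNull.size() < size) {
      notNull.resize(size, setNotNull);
    }
  }

  // Always initializes new slots as "not null".
  void ensureNulls(std::size_t size) {
    ensureNullsCapacity(size, true);
  }

  // Path 1: goes through ensureNulls(), so untouched rows stay "not null".
  std::vector<bool>& mutableRawNulls(std::size_t size) {
    ensureNulls(size);
    return notNull;
  }

  // Path 2: forwards 'setNotNull', which defaults to false, so untouched
  // rows read as null unless the caller opts in.
  std::vector<bool>& mutableNulls(std::size_t size, bool setNotNull = false) {
    ensureNullsCapacity(size, setNotNull);
    return notNull;
  }
};
```

Under these assumptions the two accessors hand back differently initialized flags for the same request, which is why a caller that only writes some of the rows has to pass true explicitly on the second path.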
Indeed, in this PR I strove to keep the change as small as possible, to avoid breaking something else or introducing overhead where we don't want it.
Our bug occurs when we preallocate a vector and then initialize its rows in multiple batches.
I don't know how common this case is compared to "allocate and fully initialize the vector", where we probably don't want to fill the buffer upfront.
Most optimizations introduce some risk; it is a trade-off.
If we want to streamline the oddness you mention, we should probably do it in a separate change.