fix[RowContainer]: Fix bug in RowContainer::extractValuesWithNulls #12301

Open

spershin
Contributor

Summary:
The bug occurs when both of the following hold:
The target (result) vector already has some rows and not a single null, so its
null buffer has not been allocated.
We extract a column from a RowContainer into that target vector, and the column
has a null in at least one row.

The bug manifests as every row that the target vector held before the
extraction becoming null.

This happens because the null buffer was allocated with 'setNotNull' set to false.

The fix passes true when 'resultOffset' is positive (meaning that the
target vector already has some rows).
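
The failure mode described in the summary can be reproduced with a small toy model. This is only a sketch, not the Velox code: `ToyVector`, its `mutableNulls`, and `extractWithNulls` with its `applyFix` switch are hypothetical stand-ins, and the sketch assumes a lazily allocated validity bitmap in which a set bit means "not null".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a result vector; NOT the Velox classes.
// Assumption: a set bit means "not null", and the bitmap is allocated
// lazily, so an absent bitmap means "no nulls at all".
struct ToyVector {
  std::vector<int> values;
  std::optional<std::vector<bool>> notNull;

  // Hypothetical analogue of mutableNulls(size, setNotNull): on first
  // allocation every bit starts as 'setNotNull'; growth keeps old bits.
  std::vector<bool>& mutableNulls(std::size_t size, bool setNotNull = false) {
    if (!notNull.has_value()) {
      notNull.emplace(size, setNotNull);
    } else if (notNull->size() < size) {
      notNull->resize(size, setNotNull);
    }
    return *notNull;
  }

  bool isNullAt(std::size_t row) const {
    return notNull.has_value() && !(*notNull)[row];
  }
};

// Writes 'source' (nullopt = null) into 'result' starting at 'resultOffset'.
// 'applyFix' toggles the behaviour described in the summary: request
// "not null" initialisation when the target already holds rows.
void extractWithNulls(
    const std::vector<std::optional<int>>& source,
    std::size_t resultOffset,
    ToyVector& result,
    bool applyFix) {
  const std::size_t newSize = resultOffset + source.size();
  result.values.resize(newSize);
  const bool setNotNull = applyFix && resultOffset > 0;
  auto& bits = result.mutableNulls(newSize, setNotNull);
  for (std::size_t i = 0; i < source.size(); ++i) {
    const std::size_t row = resultOffset + i;
    bits[row] = source[i].has_value();
    if (source[i].has_value()) {
      result.values[row] = *source[i];
    }
  }
}
```

With a target that already holds two non-null rows and a source of `{null, 7}` written at offset 2, the toy marks rows 0 and 1 as null when `applyFix` is false and leaves them non-null when it is true. Rows written by the extraction are set explicitly either way, which is why only the pre-existing rows are affected.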

Differential Revision: D69445179

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 11, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D69445179

netlify bot commented Feb 11, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 53b25dc
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67ab035486c46a00082065e4

/// pointer to the buffer containing them.
/// Optional parameter 'setNotNull' is passed to ensureNullsCapacity() and is
/// used to ensure all the rows will be 'not nulls'.
BufferPtr& mutableNulls(vector_size_t size, bool setNotNull = false) {
Contributor

It's oddly inconsistent that mutableRawNulls calls ensureNulls() which always sets "setNotNull" to true and mutableNulls (prior to this change) calls ensureNullsCapacity directly which defaults "setNotNull" to false.

This is probably the safest approach to addressing it, given how core this is to Vectors, but it seems like it was an accident waiting to happen.
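
The inconsistency raised in this comment can be illustrated with a toy class. This is a sketch under assumptions, not the Velox implementation: `ToyNulls` and its private `ensureCapacity` helper are hypothetical, the two public accessors only borrow the names from the discussion, and a set bit is assumed to mean "not null".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model only; names mirror the discussion but this is NOT Velox code.
// Assumption: a set bit means "not null"; the bitmap is allocated lazily.
class ToyNulls {
 public:
  explicit ToyNulls(std::size_t size) : size_(size) {}

  // Path A: forwards the caller's flag, which defaults to false.
  std::vector<bool>& mutableNulls(bool setNotNull = false) {
    ensureCapacity(setNotNull);
    return *bits_;
  }

  // Path B: always asks for "not null" initialisation.
  std::vector<bool>& mutableRawNulls() {
    ensureCapacity(true);
    return *bits_;
  }

  // Number of rows currently marked null (0 when no bitmap exists).
  std::size_t countNulls() const {
    if (!bits_.has_value()) {
      return 0;
    }
    return static_cast<std::size_t>(
        std::count(bits_->begin(), bits_->end(), false));
  }

 private:
  // Hypothetical helper: allocates the bitmap once, filling it with
  // 'setNotNull'; later calls leave the existing bits untouched.
  void ensureCapacity(bool setNotNull) {
    if (!bits_.has_value()) {
      bits_.emplace(size_, setNotNull);
    }
  }

  std::size_t size_;
  std::optional<std::vector<bool>> bits_;
};
```

In this toy, whichever accessor happens to trigger the first allocation decides whether the untouched rows read as null, so two call sites that look equivalent can leave the same vector in different states.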

Contributor Author

@spershin spershin Feb 11, 2025

Indeed, in this PR I strove to make the change as small as possible, to avoid breaking something else or introducing overhead where we don't want it.

Our bug occurs when we preallocate a vector and then initialize its rows in multiple batches.
I don't know how common this case is compared to "allocate and fully initialize the vector", where we probably don't want to fill the buffer upfront.

Most optimizations introduce risk; it is a trade-off.

If we want to resolve the inconsistency you mention, we should probably do it in a separate change.
