Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow BaseVector.addNulls to grow the Vector #1411

Closed

Conversation

kevinwilfong
Copy link
Contributor

Summary:
When we call setDictionaryWrap to peel dictionary encoding, and then setWrapped to
reapply the dictionary encoding the size of the DictionaryVector can change. This is
because we do not use the original vector size, but rather rows.end() for the size of the new
unpeeled vector.

This causes problems when nulls are removed, as when we call addNulls, the
DictionaryVector may be smaller than the number of rows when there were nulls at the end
of the batch. In this case, addNulls attempts to resize the DictionaryVector which is not
supported.

To fix this, I've modified addNulls to effectively resize the Vector to append NULLs,
provided that all the new elements are NULL (otherwise we'd end up with non-null
undefined values in the Vector).

This leads to a bit of an inconsistency where mayAddNulls may return true, while
addNulls may throw if not all the new elements are NULL.

Looking at mayAddNulls, we only call it in 4 places in the code base, 2 of which are in
clearNulls, which does not call addNulls. This implies the intended use of this function is
to see if the Vector type supports modifying nulls, and changed the name to reflect this.
This also resolves that inconsistency.

Differential Revision: D35617668

Summary:
When we call setDictionaryWrap to peel dictionary encoding, and then setWrapped to
reapply the dictionary encoding the size of the DictionaryVector can change.  This is
because we do not use the original vector size, but rather rows.end() for the size of the new
unpeeled vector.

This causes problems when nulls are removed, as when we call addNulls, the
DictionaryVector may be smaller than the number of rows when there were nulls at the end
of the batch.  In this case, addNulls attempts to resize the DictionaryVector which is not
supported.

To fix this, I've modified addNulls to effectively resize the Vector to append NULLs,
provided that all the new elements are NULL (otherwise we'd end up with non-null
undefined values in the Vector).

This leads to a bit of an inconsistency where mayAddNulls may return true, while
addNulls may throw if not all the new elements are NULL.

Looking at mayAddNulls, we only call it in 4 places in the code base, 2 of which are in
clearNulls, which does not call addNulls.  This implies the intended use of this function is
to see if the Vector type supports modifying nulls, and changed the name to reflect this.
This also resolves that inconsistency.

Differential Revision: D35617668

fbshipit-source-id: 691a3c019ecd6d83ebe019d8a9a431cb023f8868
@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Apr 13, 2022
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D35617668

@kevinwilfong
Copy link
Contributor Author

Fuzzer tests are broken

@kevinwilfong kevinwilfong marked this pull request as draft April 13, 2022 20:26
@kevinwilfong
Copy link
Contributor Author

Going back to my original approach

marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023
…bator#1411)

What changes were proposed in this pull request?
This pr converts partition names get from hdfs path to lower case, as spark paritiions are in lower case.

(Fixes: facebookincubator#1410)

How was this patch tested?
This patch was tested manually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants