Allow BaseVector.addNulls to grow the Vector #1411
Closed
Summary:
When we call setDictionaryWrap to peel the dictionary encoding and then setWrapped to
reapply it, the size of the DictionaryVector can change. This is because we use
rows.end(), rather than the original vector size, as the size of the new unpeeled
vector.
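The size change can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Velox code: `Rows`, `Dictionary` and `rewrap` are hypothetical stand-ins, and `Rows::end()` is assumed (as the summary implies) to be one past the last selected row, so it shrinks when trailing null rows are deselected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a row selection: selected[i] is true if row i is active.
struct Rows {
  std::vector<bool> selected;

  // One past the last selected row. Assumed to shrink when trailing null rows
  // are deselected; it is NOT the original vector size.
  std::size_t end() const {
    std::size_t e = 0;
    for (std::size_t i = 0; i < selected.size(); ++i) {
      if (selected[i]) {
        e = i + 1;
      }
    }
    return e;
  }
};

// Hypothetical dictionary wrapper: one index into a base vector per row.
struct Dictionary {
  std::vector<int> indices;
  std::size_t size() const { return indices.size(); }
};

// Re-wraps peeled results, sizing the new wrapper by rows.end() rather than
// by original.size(), mirroring the behavior described in the summary.
Dictionary rewrap(const Dictionary& original, const Rows& rows) {
  const std::size_t n = std::min(rows.end(), original.size());
  return Dictionary{std::vector<int>(
      original.indices.begin(),
      original.indices.begin() + static_cast<std::ptrdiff_t>(n))};
}
```

With five rows where the last two were null and deselected, the rewrapped dictionary has only three entries, while a selection with no trailing gaps preserves the original size.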
This causes problems when nulls have been removed. If the batch ended with null rows,
the rewrapped DictionaryVector is smaller than the number of rows, so when we later call
addNulls, it attempts to resize the DictionaryVector, which is not supported.
To fix this, I've modified addNulls so that it can resize the Vector to append NULLs,
provided that all the new elements are NULL (otherwise we'd end up with non-null,
undefined values in the Vector).
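The growth rule can be sketched on a toy vector. This is a hypothetical model, not the Velox implementation: `ToyVector` and its simplified `addNulls(const std::vector<bool>&)` signature are assumptions made for illustration. It grows only when every appended row is null, and it validates before mutating, so a rejected call leaves the vector unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy vector: values plus a per-row null flag (true = null).
class ToyVector {
 public:
  explicit ToyVector(std::vector<int> values)
      : values_(std::move(values)), nulls_(values_.size(), false) {}

  std::size_t size() const { return values_.size(); }
  bool isNullAt(std::size_t i) const { return nulls_[i]; }

  // Sets row i to null wherever newNulls[i] is true. If newNulls is longer
  // than the vector, the vector grows, but only when every appended row is
  // null; otherwise the appended rows would be non-null with undefined values.
  void addNulls(const std::vector<bool>& newNulls) {
    // Validate first so that a rejected call leaves the vector unchanged.
    for (std::size_t i = size(); i < newNulls.size(); ++i) {
      if (!newNulls[i]) {
        throw std::invalid_argument("cannot grow vector with non-null rows");
      }
    }
    if (newNulls.size() > size()) {
      // Appended rows get a placeholder value and start out null.
      values_.resize(newNulls.size(), 0);
      nulls_.resize(newNulls.size(), true);
    }
    for (std::size_t i = 0; i < newNulls.size(); ++i) {
      if (newNulls[i]) {
        nulls_[i] = true;
      }
    }
  }

 private:
  std::vector<int> values_;
  std::vector<bool> nulls_;
};
```

Calling `addNulls` with five flags on a three-row vector grows it to five rows when rows 3 and 4 are null, and throws `std::invalid_argument` if any appended row would be non-null.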
This leads to a small inconsistency: mayAddNulls may return true while addNulls still
throws if not all the new elements are NULL.
mayAddNulls is called in only 4 places in the code base, 2 of which are in clearNulls,
which does not call addNulls. This suggests the function is intended to check whether
the Vector type supports modifying nulls at all, so I renamed it to reflect this.
The rename also resolves the inconsistency.
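The distinction between a type-level capability query and a per-call precondition can be sketched as follows. Everything here is hypothetical: `ToyEncoding`, `supportsModifyingNulls` and `clearNullsSketch` are illustrative names (not the actual new name chosen in this PR), and which encoding permits modifying nulls is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encodings; which one permits modifying nulls is an assumption
// made for illustration only.
enum class ToyEncoding { kWritable, kReadOnly };

struct ToyVector {
  ToyEncoding encoding;
  std::vector<bool> nulls;  // true = null

  // Type-level capability query, the role the summary gives the renamed
  // mayAddNulls. It never inspects rows, so it cannot promise that a
  // growing addNulls call will succeed.
  bool supportsModifyingNulls() const {
    return encoding == ToyEncoding::kWritable;
  }

  // Hypothetical clearNulls-like caller: it needs the capability check but
  // never appends rows, so the "all appended rows are null" rule is irrelevant.
  // (Here an unsupported vector is simply left unchanged.)
  void clearNullsSketch() {
    if (!supportsModifyingNulls()) {
      return;
    }
    nulls.assign(nulls.size(), false);
  }
};
```

Keeping the query free of any row argument is what makes it consistent: it answers "may this vector's nulls be changed?", while the growth rule is enforced inside the call that actually appends rows.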
Differential Revision: D35617668