Improve internal row to columnar host memory by using a combined spillable buffer #10450
This improves the host memory OOM handling for InternalRowToColumnarBatchIterator by using a single spillable host buffer for both the data and offsets, and then slicing it to pull out the individual host buffers whenever we need to operate on them. This simplifies the code and makes the host memory handling work better.
Previously we were allocating one host buffer and making it spillable, and then allocating another buffer and making it spillable, all inside a single withRetry block. By doing this, we were sometimes able to allocate the second buffer only because we had spilled the first, when we really need both or nothing. When we later tried to get both host buffers back from the two separate spillable buffers, we more frequently hit OOM with no retries.
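To illustrate the idea of one combined allocation that is sliced into data and offsets views, here is a minimal sketch. It is not the code from this PR: it uses java.nio.ByteBuffer as a stand-in for the plugin's host buffer and leaves out spilling and retry entirely, and the names CombinedBufferSketch, allocateCombined, slice, and Slices are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative only: ByteBuffer stands in for a host memory buffer.
// All class and method names here are hypothetical.
public class CombinedBufferSketch {

    /** Two views carved out of the same backing allocation. */
    public static final class Slices {
        public final ByteBuffer data;
        public final ByteBuffer offsets;

        Slices(ByteBuffer data, ByteBuffer offsets) {
            this.data = data;
            this.offsets = offsets;
        }
    }

    /**
     * One allocation covers both regions, so the request either
     * succeeds for both data and offsets or fails for both.
     */
    public static ByteBuffer allocateCombined(int dataBytes, int offsetBytes) {
        return ByteBuffer.allocate(Math.addExact(dataBytes, offsetBytes));
    }

    /** Layout assumed here: [0, dataBytes) is data, the rest is offsets. */
    public static Slices slice(ByteBuffer combined, int dataBytes) {
        // Work on duplicates so the caller's position and limit are untouched.
        ByteBuffer d = combined.duplicate();
        d.clear();
        d.limit(dataBytes);
        ByteBuffer data = d.slice();

        ByteBuffer o = combined.duplicate();
        o.clear();
        o.position(dataBytes);
        ByteBuffer offsets = o.slice();

        return new Slices(data, offsets);
    }

    public static void main(String[] args) {
        ByteBuffer combined = allocateCombined(64, 16);
        Slices s = slice(combined, 64);
        // A write through the offsets view lands in the combined buffer.
        s.offsets.putInt(0, 42);
        System.out.println(combined.getInt(64));
    }
}
```

Because the slices are only views, nothing new is allocated when they are taken, which is why getting both buffers back cannot fail halfway in this model.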
Prior to this change, I needed about a 32GB host memory limit to run q75 and q10 with the ShuffleExchangeExec operator disabled on my desktop at scale 100 with 16 executor cores. With this change, I can run them with a 12GB host memory limit.
I've run unit tests and integration tests for this and tested q10 and q75 locally.
I am in the process of running the full nds benchmark with host memory restricted.