[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887

Open
revans2 opened this issue Jul 31, 2023 · 5 comments · Fixed by #10011, #9929, #10450 or #10617
Assignees
Labels
reliability: Features to improve reliability or bugs that severely impact the reliability of the plugin
task: Work required that improves the product but is not user facing

Comments

@revans2
Collaborator

revans2 commented Jul 31, 2023

Is your feature request related to a problem? Please describe.
This gets to be a little difficult. We have two different implementations that we need to update to deal with this: GeneratedInternalRowToCudfRowIterator and RowToColumnarIterator.

#9862 will add limits on the amount of host memory being allocated, but we still need to make sure that the allocations can be retried if needed.

The generated version uses code generation to put the data into a host buffer with a fixed format, and then uses the GPU to transform that data into a Table on the GPU. This is relatively simple to deal with because we know up front how big a buffer we are going to allocate, and the logic for swapping buffers and allocating new ones is already in place. Switching this over to use the new allocation APIs with a blocking allocation is almost good enough (a rough sketch of that pattern is below). We just have to make sure that the buffer(s) we allocate can be made spillable after we finish inserting each row; this is where having a mutable spillable buffer would be nice. We also need a way to deal with a single really large row. But in general it is not too bad.
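To make the idea concrete, a minimal sketch of the blocking-allocation-plus-retry pattern for the generated path might look like the following. This is only an illustration: `allocHostBlocking`, `copyRowsIntoBuffer`, `makeSpillable`, and `SpillableBuffer` are hypothetical placeholders, not the plugin's actual APIs.

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical sketch: build the code-gen row buffer under a blocking host
// allocation, then hand it to the spill framework once it is fully written.
def buildCudfRowBuffer(rows: Iterator[InternalRow], bufferSize: Long): SpillableBuffer = {
  // Blocking allocation: this may pause the task until enough host memory is
  // available, or throw a retry exception that the surrounding retry
  // framework catches and replays.
  val hostBuf = allocHostBlocking(bufferSize)
  var succeeded = false
  try {
    // Fill the buffer with the code-generated row format. A single row that
    // is larger than bufferSize needs special handling (a larger allocation
    // or a row-at-a-time fallback).
    copyRowsIntoBuffer(rows, hostBuf)
    // Hand ownership to the spill framework so the buffer can be spilled
    // while we wait for our turn on the GPU.
    val spillable = makeSpillable(hostBuf)
    succeeded = true
    spillable
  } finally {
    // Do not leak the host buffer if copying or registration failed.
    if (!succeeded) {
      hostBuf.close()
    }
  }
}
```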

It also looks fairly simple to make the batches smaller so that we can chunk them, and then pass several of these chunks down to the kernel so we are not holding onto too much non-spillable CPU memory at a time (a rough sketch of the chunk sizing is below).
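The chunk sizing itself could be as simple as the following; the names (per-row estimate, host budget) are illustrative only and assume we already have some estimate of the bytes each row will take in the code-gen format.

```scala
// Hypothetical sketch: pick a per-chunk row count so that each chunk's
// non-spillable host footprint stays under a budget.
def rowsPerChunk(perRowEstimateBytes: Long, hostBudgetBytes: Long): Int = {
  // Always allow at least one row per chunk; a single row larger than the
  // budget needs its own special handling, as noted above.
  val rows = math.max(1L, hostBudgetBytes / math.max(1L, perRowEstimateBytes))
  math.min(rows, Int.MaxValue.toLong).toInt
}
```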

The other implementation, RowToColumnarIterator, is the slow path. It is much harder because we need a way to limit the host memory and still follow the rule that we never do a blocking allocation of host memory while holding on to non-spillable buffers. This is especially hard because all of the buffers we want to work with live behind the CUDF Java APIs.

The problem here is that there is a lot of coupling between the different parts of the code, and it might be simpler to copy much of the HostColumnVector.ColumnBuilder code into the plugin for the short term, until we understand exactly how to do this; then we can move it back once we have clean and generic APIs. Right now we create a GpuColumnarBatchBuilder inside the plugin with an estimate of the number of rows we want for that table. The buffers are allocated lazily and can grow at any point in time. There is no good way for us to say: do an initial allocation that looks a particular way, but don't grow the data; if you have to stop part way through a row, roll back anything that was partially written for that row; and make the buffers spillable in between each call to next to get another row.

I think what we want to do is allocate one host memory buffer that is close to a target batch size (possibly smaller if we want to chunk it). We can then slice that buffer up into smaller buffers corresponding to the estimated size we expect per column buffer. The initial guess would come from just the schema, but future guesses could be adjusted based on where earlier estimates turned out to be wrong. We would then populate these buffers with data, but each update would happen as a two-phase commit (sketched below). The first phase copies the data into the buffers, if there is room, and keeps track of which buffers did not have enough room. If every buffer had enough room, we commit the result and go on to the next row. If any of them did not have enough room, we do not commit the result; we update any stats we want for a better allocation next time, and process what we have, assuming that there is enough memory for at least a single row.
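A sketch of that two-phase commit, assuming a hypothetical `SlicedColumnBuilder` wrapper over one slice of the pre-allocated host buffer (not a real plugin or CUDF class): `tryAppend` writes past the committed offset without advancing it, `commit` advances it, and `rollback` discards the uncommitted write.

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical sketch of appending one row across all column slices with a
// two-phase commit, so no column is left holding a partially written row.
def appendRow(row: InternalRow, builders: Array[SlicedColumnBuilder]): Boolean = {
  // Phase 1: try to copy the row's value into every column's slice.
  var allFit = true
  var i = 0
  while (i < builders.length) {
    if (!builders(i).tryAppend(row, i)) {
      allFit = false
    }
    i += 1
  }
  // Phase 2: commit only if every column had room; otherwise roll back and
  // let the caller record the misestimate and flush the rows gathered so far.
  if (allFit) {
    builders.foreach(_.commit())
  } else {
    builders.foreach(_.rollback())
  }
  allFit
}
```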

@revans2 revans2 added ? - Needs Triage Need team to review and classify task Work required that improves the product but is not user facing reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Jul 31, 2023
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Aug 8, 2023
@revans2 revans2 changed the title [FEA] Add Host Memory Limits for Row to Columnar Conversion [FEA] Add Host Memory Retry for Row to Columnar Conversion Nov 27, 2023
@jbrennan333
Contributor

I put up #9929 as a first step for GeneratedInternalRowToCudfRowIterator.

@jbrennan333
Contributor

I put up another pr to improve this: #10450

@jbrennan333
Contributor

This should not have been closed. Only the first part (GeneratedInternalRowToCudfRowIterator) is done. I am still working on the second part (RowToColumnarIterator).

@jbrennan333
Contributor

I put up a draft PR #10617 that is a large part of the second part of this feature.
It allocates a single buffer up front and slices it up among the column builders.

@sameerz
Collaborator

sameerz commented Apr 22, 2024

Assigning to @jlowe for triage after #10617 was reverted
