[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887
Comments
I put up #9929 as a first step for GeneratedInternalRowToCudfRowIterator.
I put up another PR to improve this: #10450
This should not have been closed. Only the first part (
I put up a draft PR #10617 that covers a large part of the second half of this feature.
Is your feature request related to a problem? Please describe.
This gets to be a little difficult. We have two different implementations that we need to update to deal with this: GeneratedInternalRowToCudfRowIterator and RowToColumnarIterator. #9862 will add in limits to the host memory being allocated, but we still need to make sure that the allocations can be retried if needed.
The generated version uses code generation to write the data into a buffer in a fixed format, and then uses the GPU to transform that data into a Table on the GPU. This is relatively simple to deal with because we know how big a buffer we are going to allocate, and the logic for swapping buffers, allocating new ones, and so on is already in place. Just switching this over to use the new allocation APIs with a blocking allocation is almost good enough. We also have to make sure that the buffer(s) we allocated can be made spillable after we are done inserting each new row; this is where having a mutable buffer would be nice. And we need a way to deal with a single really large row. But in general it is not too bad.
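As a rough illustration, here is a minimal sketch of that blocking, retryable allocation, assuming the plugin's RmmRapidsRetryIterator.withRetryNoSplit helper and cudf's HostMemoryBuffer (treat the exact signatures as assumptions, not a definitive implementation):

```scala
import ai.rapids.cudf.HostMemoryBuffer
import com.nvidia.spark.rapids.RmmRapidsRetryIterator.withRetryNoSplit

// Allocate the staging buffer inside a blocking, retryable allocation.
// The task should not be holding non-spillable buffers when it enters the
// block, so a retry is free to spill whatever else is needed and re-run it.
def fillBatch(sizeBytes: Long): HostMemoryBuffer =
  withRetryNoSplit {
    val buf = HostMemoryBuffer.allocate(sizeBytes)
    try {
      // ... code-gen writes rows into buf in the fixed CUDF row format ...
      buf // once full, the caller should wrap this so it becomes spillable
    } catch {
      case t: Throwable => buf.close(); throw t
    }
  }
```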
It also looks fairly simple to make the batches smaller so that we can chunk them, and then pass multiple of these batches down to the kernel at once, so that we are not holding onto too much non-spillable CPU memory at a time.
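A trivial sketch of that chunking (planChunks is a hypothetical helper, not a plugin API): split the target batch size into bounded pieces so no single staging buffer gets too big, and hand the kernel the resulting list.

```scala
// Hypothetical: split a target batch size into chunks no larger than
// maxChunkBytes, so no single non-spillable staging buffer gets too big.
def planChunks(targetBatchBytes: Long, maxChunkBytes: Long): Seq[Long] = {
  require(maxChunkBytes > 0 && targetBatchBytes >= 0)
  val fullChunks = (targetBatchBytes / maxChunkBytes).toInt
  val remainder = targetBatchBytes % maxChunkBytes
  Seq.fill(fullChunks)(maxChunkBytes) ++
    (if (remainder > 0) Seq(remainder) else Nil)
}

// e.g. planChunks(1L << 30, 256L << 20) yields four 256 MiB chunks
```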
The other implementation, RowToColumnarIterator, is the slow path, and it is much harder. We need a way to limit the host memory while still following the rule that we never do a blocking allocation of host memory while holding on to non-spillable buffers. This is especially hard because all of the buffers that we want to work with are behind the CUDF Java APIs.
The problem here is that there is a lot of coupling between the different parts of the code, and it might be simpler to copy much of the HostColumnVector.ColumnBuilder code out into the plugin for the short term, until we understand exactly how to do this. Then we can move it back once we have clean and generic APIs. Right now we create a GpuColumnarBatchBuilder inside the plugin with an estimate of the number of rows we want for that table. The buffers are allocated lazily and can grow at any point in time. There is just no good way for us to say: do an initial allocation that looks a particular way, but don't grow the data; if you have to stop part way through a row, roll back anything that was partially written for that row so far; and make the buffers spillable in between each call to next to get another row.

I think what we want to do is to allocate one host memory buffer that is close to a target batch size (possibly smaller if we want to chunk it). We can then slice that buffer up into smaller buffers corresponding to the estimated size we expect per buffer. The initial guess would come from just the schema, but later guesses could be based on where we know we made mistakes in earlier estimates. We would then populate these buffers with data, but each update would happen as a two-phase commit. The first phase copies the data into the buffers, if there is room, and keeps track of which buffers did not have enough room. If all of the buffers had enough room, we commit the result and go on to the next row. If any of them did not have enough room, we do not commit the result; instead we update any stats we want for a better allocation next time, and process what we have, assuming that there is enough memory for at least a single row.
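A minimal, self-contained sketch of that two-phase commit, using byte arrays as stand-ins for buffers sliced out of one large host allocation (all names here are hypothetical, not plugin or cudf APIs):

```scala
// TwoPhaseRowWriter holds one fixed-size buffer per column, sliced from a
// single large allocation in the real design. Bytes past the committed
// offset are uncommitted and may be overwritten on the next attempt.
class TwoPhaseRowWriter(columnSizes: Seq[Int]) {
  private val buffers = columnSizes.map(size => new Array[Byte](size)).toArray
  // Committed write offset per column.
  private val committed = Array.fill(buffers.length)(0)

  /** Phase one: copy one row's per-column data past the committed offsets.
   *  Returns true only if every column had room. Nothing is committed yet. */
  def tryWriteRow(row: Seq[Array[Byte]]): Boolean = {
    var fits = true
    row.indices.foreach { c =>
      val data = row(c)
      if (committed(c) + data.length <= buffers(c).length) {
        System.arraycopy(data, 0, buffers(c), committed(c), data.length)
      } else {
        fits = false // track the overflow; caller can refine size estimates
      }
    }
    fits
  }

  /** Phase two: commit the row by advancing the offsets. Only called when
   *  tryWriteRow returned true. Skipping this call is the rollback: the
   *  partially written bytes are simply overwritten by the next attempt. */
  def commitRow(row: Seq[Array[Byte]]): Unit =
    row.indices.foreach(c => committed(c) += row(c).length)
}
```

Usage would be: call tryWriteRow for each row, and only call commitRow when it returns true; on false, flush the committed rows as a batch, update the size estimates, and start over with fresh (possibly resized) slices.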