Making a batch spillable is expensive #3749
Comments
20% of the time of q72 is spent in contiguousSplit.
I would like to try this task if nobody already has work in progress or is preparing to work on it.
Nobody has work in progress on this, but note that I suspect this is a pretty significant task. Besides being relatively complex, with the packing and chunking of small and large tables, respectively, through the bounce buffer and the construction of the corresponding packed metadata, it will trigger refactoring around SpillableBatch / LazySpillableBatch, as there should be no reason for the latter if spilling is "free."
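For a concrete picture of the chunking this comment alludes to, here is a minimal sketch, assuming the copyFromMemoryBuffer calls from the cudf Java bindings and an arbitrary 64 MiB bounce buffer; it ignores stream overlap, packed metadata, and everything else that makes the real task significant:

```scala
import ai.rapids.cudf.{Cuda, DeviceMemoryBuffer, HostMemoryBuffer, MemoryBuffer}

// Hypothetical helper: stage a batch's (possibly non-contiguous) device
// buffers to host through one fixed-size device bounce buffer, so spilling
// never allocates a full per-batch contiguous copy on the GPU.
object BounceBufferSpill {
  // Illustrative size only; a real implementation would tune this.
  private val BounceSize = 64L * 1024 * 1024

  def spillToHost(srcs: Seq[MemoryBuffer], dest: HostMemoryBuffer): Unit = {
    val bounce = DeviceMemoryBuffer.allocate(BounceSize)
    try {
      var destOffset = 0L
      for (src <- srcs) {
        var srcOffset = 0L
        while (srcOffset < src.getLength) {
          val len = math.min(BounceSize, src.getLength - srcOffset)
          // Pack the next chunk into the bounce buffer (device-to-device)...
          bounce.copyFromMemoryBuffer(0, src, srcOffset, len, Cuda.DEFAULT_STREAM)
          // ...then move that contiguous chunk to host (device-to-host).
          dest.copyFromMemoryBuffer(destOffset, bounce, 0, len, Cuda.DEFAULT_STREAM)
          srcOffset += len
          destOffset += len
        }
      }
    } finally {
      bounce.close()
    }
  }
}
```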
@sperlingxx have you been able to look into this issue? If not, I should have time to look into it; if you have something, I am also happy to review it or help test it out.
Hi @abellina, I haven't started it; I am still just learning the background. I am happy for you to take over this task, since you are more familiar with the bounce buffer and spilling in spark-rapids.
I am on the hook to provide data on this. I'll update the issue with an SOL set of numbers so that we can prioritize it accordingly with the data on hand.
Sorry it has taken me a while to get back to provide some numbers. Here are a couple of queries that are very sensitive to contiguousSplit. In the past, calling contiguousSplit … As of today, if I run with a PoC that doesn't call contiguousSplit …
Running all queries with the SOL code, I get that the sum of time is about the same for both, with the proof of concept being 0.05x faster, so very much in the noise. So given the original request, and the complexity of adding the bounce buffer to actually handle this well, I don't believe it's the highest-priority task. @jlowe would you want to keep this open to have someone look into it in the future?
I think for now we can close this. We can reopen or file a new issue if we notice it becoming a significant drag on performance in the future. |
Making a batch spillable involves calling contiguousSplit to form a contiguous buffer, which is then spilled as a single unit. The contiguous split can be relatively expensive, especially when the schema is just a column of long values for some reason, and it is unnecessary if the batch is never spilled.

Ideally contiguousSplit should be as cheap as possible, but there may be ways to avoid calling it when it is unnecessary. For example, we could keep a "bounce buffer" for spilling, similar to the bounce buffers used for UCX and GDS, where we copy batch buffers into a contiguous form in device memory before copying the contiguous buffer to host memory, or potentially copy the buffers directly to host memory with a multi-buffer copy kernel if the host memory is pinned. Essentially, the idea is to perform an "on-the-fly" contiguous split only when it is needed. That makes a batch spillable very cheaply and performs no wasted GPU operations, since the transformation of the batch into a contiguous buffer for spilling happens lazily and only when needed.