[FEA] Limit host memory usage on python UDF execs. #8903
Labels
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
task
Work required that improves the product but is not user facing
Is your feature request related to a problem? Please describe.
#9862 should add a hard limit to most of the off-heap host memory used.
This might get a little complicated. Internally the data goes from CUDF on the GPU to Arrow on the CPU, and from Arrow on the CPU across the wire to python. We reverse that path coming back. There is a small 10 MiB buffer allocated by java for this, but that is more of a bounce buffer. The CPU data allocated for arrow is where most of the memory would come from. I honestly don't know if we can insert some way to have arrow allocate that data from a pool we control, so that we can reserve it and spill. Probably not. If we cannot figure this out, then we probably need to do something with the memory overhead for the tasks to try and account for this, or at least document it in some way.
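To make the "account for it in the overhead" option concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration, not how the plugin works today: the function name, the idea of one bounce buffer per task, and the guess that about two Arrow batches (one going out, one coming back) are live per task at a time.

```python
MIB = 1024 * 1024


def estimate_python_udf_host_overhead(
    concurrent_tasks,
    arrow_batch_bytes,
    arrow_copies_in_flight=2,       # ASSUMPTION: one outgoing and one returning batch
    bounce_buffer_bytes=10 * MIB,   # the 10 MiB bounce buffer mentioned above
):
    """Hypothetical estimate of untracked host memory for python UDF execs.

    Assumes each concurrent task holds one bounce buffer plus
    `arrow_copies_in_flight` Arrow batches on the CPU.
    """
    per_task = bounce_buffer_bytes + arrow_copies_in_flight * arrow_batch_bytes
    return concurrent_tasks * per_task


# Example: 4 concurrent tasks, each moving 256 MiB Arrow batches.
overhead = estimate_python_udf_host_overhead(4, 256 * MIB)
print(f"{overhead / MIB:.0f} MiB of host memory to budget for")
```

If the real number of live Arrow copies per task turns out to be different, only `arrow_copies_in_flight` would need to change. The point is just that the Arrow batches dominate the bounce buffer.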
If we can change how the allocation works, then we should add retry blocks around those allocations. If we cannot, which is likely the case, then we need to document how this all works.
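For reference, this is the general shape of a retry block around a host allocation, as a minimal standalone sketch. It is not the plugin's retry framework and not an Arrow API. `HostMemoryBudget`, `HostBudgetExceeded`, `allocate_with_retry`, and the `spill` callback are all hypothetical names made up for this example.

```python
class HostBudgetExceeded(MemoryError):
    """Raised when a reservation would exceed the host memory limit."""


class HostMemoryBudget:
    """Hypothetical tracker for a hard limit on host memory."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, nbytes):
        if self.used_bytes + nbytes > self.limit_bytes:
            raise HostBudgetExceeded(
                f"need {nbytes} bytes, only "
                f"{self.limit_bytes - self.used_bytes} available"
            )
        self.used_bytes += nbytes

    def release(self, nbytes):
        self.used_bytes = max(0, self.used_bytes - nbytes)


def allocate_with_retry(budget, nbytes, spill, max_attempts=3):
    """Reserve and allocate `nbytes`, spilling and retrying on failure.

    `spill(nbytes)` is a caller-supplied callback that tries to free host
    memory and returns how many bytes it actually freed.
    """
    for attempt in range(max_attempts):
        try:
            budget.reserve(nbytes)
            return bytearray(nbytes)  # stand-in for the real host buffer
        except HostBudgetExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the failure
            freed = spill(nbytes)
            if freed <= 0:
                raise  # nothing left to spill, retrying cannot help
            budget.release(freed)


# Example: an 80-byte reservation is already held against a 100-byte limit.
budget = HostMemoryBudget(limit_bytes=100)
budget.reserve(80)
buf = allocate_with_retry(budget, 60, spill=lambda needed: 50)
print(len(buf), budget.used_bytes)
```

The catch is the one already raised above: this only works if the Arrow allocations can be routed through something like `reserve`, which is probably not possible.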