[QST] Understanding of "Maximum pool size exceeded" #5373
-
What is your question?
My spark-rapids configuration is:
Is "Maximum pool size exceeded" an indication that the partitions are too big, i.e. that |
Replies: 3 comments
-
"Maximum pool size exceeded" from RMM means the GPU memory pool has been exhausted, and it was unable to satisfy a GPU memory allocation request. There can be lots of causes. Try to run with too much GPU data generated per task or running too many tasks simultaneously on the GPU are primary causes, so setting Increasing the number of shuffle partitions should also help, assuming your processing does not have high key skew, causing most of the data to show up in only a few task partitions.
Yes, increasing the GPU amount per task (spark.task.resource.gpu.amount) will only change how Spark assigns pending tasks to executors, not how much memory a single task will take once it starts running. The same is true for executor CPU memory -- it is not tracked or limited per task.
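To illustrate why this is a scheduling knob rather than a memory limit, here is a small sketch of the arithmetic Spark uses when placing tasks; the values are assumed for illustration.

# Tasks Spark runs concurrently on an executor is bounded by
# executor GPUs / spark.task.resource.gpu.amount (assumed values below).
executor_gpus = 1
task_gpu_amount = 0.25  # hypothetical spark.task.resource.gpu.amount
concurrent_tasks = int(executor_gpus / task_gpu_amount)  # -> 4
# Each of those 4 tasks can still allocate as much GPU memory as it
# asks for; the setting gates scheduling, not per-task memory use.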
-
@martinstuder can this be closed per the above explanation?
-
@jlowe Sorry for not getting back to you earlier. Yes, going to close.
"Maximum pool size exceeded" from RMM means the GPU memory pool has been exhausted, and it was unable to satisfy a GPU memory allocation request. There can be lots of causes. Try to run with too much GPU data generated per task or running too many tasks simultaneously on the GPU are primary causes, so setting
spark.rapids.sql.concurrentGpuTasks=1
from a higher initial value will reduce at least some of that memory pressure.Increasing the number of shuffle partitions should also help, assuming your processing does not have high key skew, causing most of the data to show up in only a few task partitions.