-
When the data size of a partition exceeds the size configured by spark.task.resource.gpu.amount, will the framework automatically process it in batches?
-
Each task will process its data in different ways, not necessarily collecting it all at once, so it's not guaranteed that a task will run out of memory when its input partition exceeds its memory budget. For example, if all the task is doing is filtering rows, then it could stream the data from the input partition in small batches and process a huge input partition with relatively low memory.

However, GPUs process data in a columnar fashion and perform better when there is a significant amount of input to process per batch, so the batches used will often be significantly larger than the row-based approach Spark uses on the CPU. The GPU batch size can be somewhat controlled by the `spark.rapids.sql.batchSizeBytes` configuration setting. Note that using an extreme setting is not a good idea: very small batches throw away the efficiency of columnar processing on the GPU, while very large batches increase the risk of GPU out-of-memory errors.

If you have not done so already, I suggest checking out our tuning guide and specifically the section on concurrent tasks per GPU. We typically have seen the best performance when the number of tasks per GPU is between 2 and 4, but it does depend on the query.
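If it helps, here is a minimal sketch of where those knobs would be set when building a session. The values are illustrative placeholders, not tuned recommendations, and it assumes the RAPIDS Accelerator jars are already on the application's classpath (e.g. supplied via spark-submit):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the values below are placeholders to show where the settings go.
val spark = SparkSession.builder()
  .appName("gpu-batch-size-sketch")
  // Target size (in bytes) of the columnar batches processed on the GPU.
  .config("spark.rapids.sql.batchSizeBytes", (512L * 1024 * 1024).toString)
  // Number of tasks allowed to process data on the GPU concurrently.
  .config("spark.rapids.sql.concurrentGpuTasks", "2")
  .getOrCreate()
```

The same settings can also be passed as `--conf` options on the spark-submit command line.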
-
@jlowe
-
Ultimately the number of tasks that can be allowed to run concurrently on the GPU is influenced by both configuration settings, and how many tasks will run concurrently at any point in time depends on what the tasks are trying to do at that moment. I'll try to explain in detail, so here's how those two configuration settings work in practice:
`spark.task.resource.gpu.amount` will direct Spark's scheduler to limit how many tasks are allowed to run concurrently on an executor, whether those tasks are actively using the GPU or not. For example, if `spark.task.resource.gpu.amount=0.25` then the executor can run at most 4 tasks at a time, just like when `spark.executor.cores=4`. This is the maximum number of tasks the executor will run at once, no matter what those tasks are doing.

`spark.rapids.sql.concurrentGpuTasks` further limits how many of those tasks can actively process data on the GPU at the same time. A task must acquire the GPU semaphore before doing GPU processing, and only the configured number of tasks can hold it at once. Work that does not need the GPU, such as reading the input from the distributed filesystem, is not limited by this setting, so all of the executor's tasks can do it concurrently.

Similarly, when tasks read and write shuffle, that is done without the GPU, so tasks can run up to the configured executor parallelism (i.e.: as determined by `spark.executor.cores` or `spark.task.resource.gpu.amount`). This interaction is easily seen when looking at an Nsight Systems trace of an executor running when more tasks are configured via the executor parallelism settings than are allowed on the GPU via `spark.rapids.sql.concurrentGpuTasks`.

So why have the two different configs? This is to allow the executor to run with a wider parallelism than the GPU memory could allow when tasks are not actively needing the GPU to perform processing (e.g.: during distributed filesystem reads, shuffle read/write, and other CPU-only portions of the query). If we only used the standard Spark executor core or task resource GPU settings, then we'd be stuck either running "too wide" and hitting OOM errors because there isn't sufficient GPU memory to accommodate that many concurrent tasks, or running "too narrow" and not going as fast as a CPU-only cluster could go for the portions of the query that are running only on the CPU. Having both of these configs allows us to run wider during CPU-only sections and narrower during GPU-only sections.
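To make the interplay concrete, here is a minimal sketch of one way the settings might be combined. The values are illustrative assumptions (one GPU per executor, 8 executor cores), not recommendations for any particular cluster:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: up to 8 tasks may run per executor, but at most 2 of them
// can be processing data on the GPU at any given moment.
val spark = SparkSession.builder()
  .appName("wide-cpu-narrow-gpu-sketch")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // Executor-level parallelism: Spark schedules up to 8 concurrent tasks.
  .config("spark.executor.cores", "8")
  .config("spark.executor.resource.gpu.amount", "1")
  .config("spark.task.resource.gpu.amount", "0.125") // 1/8 of the GPU per task
  // GPU-level parallelism: only 2 of those tasks hold the GPU at a time.
  .config("spark.rapids.sql.concurrentGpuTasks", "2")
  .getOrCreate()
```

In practice these would usually be supplied at submit time as `--conf` options. With a combination like this the executor stays "wide" (8 tasks) during scans and shuffle, while the GPU stays "narrow" (2 tasks) during the GPU-heavy sections.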
-
@jlowe if we need to write out this long explanation instead of pointing to https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/docs/tuning-guide.md#number-of-tasks-per-executor and/or https://github.com/NVIDIA/spark-rapids/blob/branch-22.02/docs/tuning-guide.md#number-of-tasks-per-executor then we probably need to re-write them to make it more clear what is happening.
-
@revans2 Totally agree. I already pointed to the tuning guide up above, and since questions remained, I'm using this issue to work out what additional clarifications are needed. Once the questions are all resolved, we can use this issue's discussion as input for what needs to be added to the tuning guide and FAQ.
-
@jlowe @revans2
-
Closing this as answered. I posted a FAQ change at #4692 to clarify this a bit further and point to the detailed sections of the tuning guide.