Search before asking
I have searched in the issues and found no similar issues.
Describe the feature
At least three situations can trigger a large number of small blocks in a shuffle map task:
Data skew: when most of the data processed by a map task belongs to a few reduce partitions, those partitions alone push the buffer past the spill threshold, and the remaining long-tail reduce partitions are spilled along with them even though they hold very little data.
An extremely large number of reduce tasks (>10k): even when each reduce partition holds very little data, the buffer of the ShuffleBufferManager of the shuffle map task can still exceed the spill threshold in aggregate, producing a large number of small blocks.
Memory competition between concurrent tasks: when an executor runs multiple tasks at the same time, tasks that start later may fail to acquire enough memory, and executor-wide memory pressure forces the ShuffleBufferManager to spill as a MemoryConsumer, even though it may hold only a small amount of data per reduce partition (see the sketch after this list).
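For context, here is a minimal sketch of the Spark MemoryConsumer hook involved in the third situation. Only the MemoryConsumer API is Spark's; the class and its body are illustrative assumptions, not the actual Uniffle code.

```java
import java.io.IOException;

import org.apache.spark.memory.MemoryConsumer;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.TaskMemoryManager;

// Illustrative only: a buffer manager registered as a MemoryConsumer is
// asked to spill whenever sibling consumers/tasks need memory, even if
// each reduce partition it holds contains only a little data.
class BufferManagerSketch extends MemoryConsumer {

  BufferManagerSketch(TaskMemoryManager memoryManager) {
    super(memoryManager, memoryManager.pageSizeBytes(), MemoryMode.ON_HEAP);
  }

  @Override
  public long spill(long size, MemoryConsumer trigger) throws IOException {
    // Spark invokes this under executor-wide memory pressure; flushing
    // every partition buffer here is what produces many small blocks.
    return 0L; // sketch only: nothing to flush
  }
}
```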
Problems caused by small blocks:
Unnecessary network overhead: each transmission carries less data, so the number of network interactions between the executor and the shuffle server grows, potentially by 10x or even 100x.
Larger index files when the shuffle server persists shuffle data, since every block adds its own index entry.
Motivation
No response
Describe the solution
When spilling shuffle data, spill only the reduce partitions whose buffers hold the majority of the space.
To do this, each spill pass in the WriteBufferManager.clear() method needs one more step: sort the to-be-spilled buffers by size and select only the top-N buffers to spill.
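A minimal sketch of the selection step, assuming a hypothetical PartitionBuffer type and pickSpillCandidates helper; the actual WriteBufferManager internals will differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the buffer type and field names are
// assumptions, not the real Uniffle WriteBufferManager internals.
class SpillSelector {

  /** A stand-in for a per-partition write buffer with a known in-memory size. */
  static class PartitionBuffer {
    final int partitionId;
    final long bytesInMemory;

    PartitionBuffer(int partitionId, long bytesInMemory) {
      this.partitionId = partitionId;
      this.bytesInMemory = bytesInMemory;
    }
  }

  /**
   * Pick the largest buffers until the selected set covers the requested
   * number of bytes, instead of spilling every partition's buffer.
   */
  static List<PartitionBuffer> pickSpillCandidates(
      Map<Integer, PartitionBuffer> buffers, long bytesToFree) {
    List<PartitionBuffer> sorted = new ArrayList<>(buffers.values());
    // Largest buffers first, so the fewest partitions free the most memory.
    sorted.sort(
        Comparator.comparingLong((PartitionBuffer b) -> b.bytesInMemory).reversed());

    List<PartitionBuffer> selected = new ArrayList<>();
    long freed = 0L;
    for (PartitionBuffer buffer : sorted) {
      if (freed >= bytesToFree) {
        break;
      }
      selected.add(buffer);
      freed += buffer.bytesInMemory;
    }
    return selected;
  }
}
```

The long-tail partitions stay in memory and are flushed later, once they have accumulated into larger blocks.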
Additional context
No response
Are you willing to submit PR?
Yes I am willing to submit a PR!
With this feature, the buffer size of a single partition (spark.rss.writer.buffer.size) can be raised well above the current default of 3MB, for example to 10MB or more.
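For example, assuming the value is given as a Spark-style size string (a sketch, not a tested configuration):

```java
import org.apache.spark.SparkConf;

public class RssWriterBufferExample {
  public static void main(String[] args) {
    // Sketch: raise the per-partition writer buffer from the 3MB default
    // to 10MB. With partial spilling, a larger buffer no longer forces
    // every partition to flush together under memory pressure.
    SparkConf conf = new SparkConf()
        .set("spark.rss.writer.buffer.size", "10m");
    System.out.println(conf.get("spark.rss.writer.buffer.size"));
  }
}
```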
…fle map task by spill partial partitions data (#1670)
### What changes were proposed in this pull request?
When spilling shuffle data, spill only the reduce partitions whose buffers hold the majority of the space.
To do this, each spill pass in the WriteBufferManager.clear() method needs one more step: sort the to-be-spilled buffers by size and select only the top-N buffers to spill.
### Why are the changes needed?
related feature #1594
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New UTs.
---------
Co-authored-by: leslizhang <leslizhang@tencent.com>