[SPARK-26700][CORE] enable fetch-big-block-to-disk by default
## What changes were proposed in this pull request?

This is a follow-up of #16989.

The fetch-big-block-to-disk feature is disabled by default because it is not compatible with external shuffle services prior to Spark 2.2: the client sends a stream request to fetch block chunks, which older shuffle services cannot handle.

After two years, Spark 2.2 has reached EOL, and it is now safe to turn this feature on by default.
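For illustration (not part of this patch), a minimal sketch of how a user on a cluster whose external shuffle service is still too old for this feature could restore the previous behavior by raising the threshold back to the old default. Only the config key comes from Spark; the app name is made up:

```scala
import org.apache.spark.SparkConf

// Hypothetical opt-out for clusters whose external shuffle service
// predates this feature: restore the old default threshold so remote
// blocks are effectively never fetched to disk.
val conf = new SparkConf()
  .setAppName("legacy-shuffle-service-app") // illustrative name
  .set("spark.maxRemoteBlockSizeFetchToMem", s"${Int.MaxValue - 512}")
```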

## How was this patch tested?

existing tests

Closes #23625 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan committed Jan 28, 2019
1 parent bd027f6 commit ed71a82
Showing 2 changed files with 19 additions and 19 deletions.
@@ -699,17 +699,19 @@ package object config {
   private[spark] val MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM =
     ConfigBuilder("spark.maxRemoteBlockSizeFetchToMem")
       .doc("Remote block will be fetched to disk when size of the block is above this threshold " +
-        "in bytes. This is to avoid a giant request takes too much memory. We can enable this " +
-        "config by setting a specific value(e.g. 200m). Note this configuration will affect " +
-        "both shuffle fetch and block manager remote block fetch. For users who enabled " +
-        "external shuffle service, this feature can only be worked when external shuffle" +
-        "service is newer than Spark 2.2.")
+        "in bytes. This is to avoid a giant request takes too much memory. Note this " +
+        "configuration will affect both shuffle fetch and block manager remote block fetch. " +
+        "For users who enabled external shuffle service, this feature can only work when " +
+        "external shuffle service is at least 2.3.0.")
       .bytesConf(ByteUnit.BYTE)
       // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
       // as well use fetch-to-disk in that case. The message includes some metadata in addition
       // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
       // extra room.
-      .createWithDefault(Int.MaxValue - 512)
+      .checkValue(
+        _ <= Int.MaxValue - 512,
+        "maxRemoteBlockSizeFetchToMem cannot be larger than (Int.MaxValue - 512) bytes.")
+      .createWithDefaultString("200m")
 
   private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
     ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
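For readers unfamiliar with `bytesConf`, a small sketch of what the new "200m" default resolves to, using `JavaUtils.byteStringAsBytes` from spark-network-common (assumed here to behave like the parser `bytesConf` relies on):

```scala
import org.apache.spark.network.util.JavaUtils

// "200m" resolves to 200 * 1024 * 1024 bytes, comfortably under the
// cap enforced by the new checkValue above.
val defaultBytes = JavaUtils.byteStringAsBytes("200m") // 209715200
val cap = Int.MaxValue - 512                           // 2147483135
assert(defaultBytes <= cap)
```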
docs/configuration.md — 11 additions, 13 deletions
@@ -626,19 +626,6 @@ Apart from these, the following properties are also available, and may be useful
     You can mitigate this issue by setting it to a lower value.
   </td>
 </tr>
-<tr>
-  <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
-  <td>Int.MaxValue - 512</td>
-  <td>
-    The remote block will be fetched to disk when size of the block is above this threshold in bytes.
-    This is to avoid a giant request that takes too much memory. By default, this is only enabled
-    for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
-    available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
-    memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
-    and block manager remote block fetch. For users who enabled external shuffle service,
-    this feature can only be used when external shuffle service is newer than Spark 2.2.
-  </td>
-</tr>
 <tr>
   <td><code>spark.shuffle.compress</code></td>
   <td>true</td>
@@ -1519,6 +1506,17 @@ Apart from these, the following properties are also available, and may be useful
     you can set larger value.
   </td>
 </tr>
+<tr>
+  <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
+  <td>200m</td>
+  <td>
+    Remote block will be fetched to disk when size of the block is above this threshold
+    in bytes. This is to avoid a giant request takes too much memory. Note this
+    configuration will affect both shuffle fetch and block manager remote block fetch.
+    For users who enabled external shuffle service, this feature can only work when
+    external shuffle service is at least 2.3.0.
+  </td>
+</tr>
 </table>
 
 ### Scheduling
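For completeness, a hedged sketch of tuning the new default downward at session startup, e.g. on memory-constrained executors; again only the config key is real, the rest is illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical override: lower the threshold further so even
// moderately sized remote blocks spill to disk instead of memory.
val spark = SparkSession.builder()
  .appName("fetch-to-disk-demo") // illustrative name
  .config("spark.maxRemoteBlockSizeFetchToMem", "100m")
  .getOrCreate()
```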
