
[SPARK-21253][CORE] Disable use DownloadCallback fetch big blocks #18466

Closed
wants to merge 1 commit into from

Conversation

wangyum
Member

@wangyum wangyum commented Jun 29, 2017

What changes were proposed in this pull request?

DownloadCallback has some issues that cause FetchFailedException.

This can be reproduced on a Spark cluster but not in local mode:

  1. Start a Spark context with spark.reducer.maxReqSizeShuffleToMem=1K:
$ spark-shell --conf spark.reducer.maxReqSizeShuffleToMem=1K
  2. Run a shuffle:
scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()

The error messages:

org.apache.spark.shuffle.FetchFailedException: Failed to send request for 1649611690367_2 to yhd-jqhadoop166.int.yihaodian.com/10.17.28.166:7337: java.io.IOException: Connection reset by peer
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
...
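
For context, here is a simplified, illustrative sketch (in Scala, not the actual ShuffleBlockFetcherIterator code) of the decision that spark.reducer.maxReqSizeShuffleToMem controls: requests at or below the threshold are buffered in memory, while larger requests go through the DownloadCallback path that streams the block to disk. With the 1K setting above, nearly every request takes the disk-streaming path, which is what exposes the failure.

// Illustrative sketch only -- not the real Spark implementation.
case class FetchRequest(blockId: String, sizeInBytes: Long)

def fetchRemoteBlock(req: FetchRequest, maxReqSizeShuffleToMem: Long): Unit = {
  if (req.sizeInBytes > maxReqSizeShuffleToMem) {
    // Large request: stream the block to a temporary file (DownloadCallback path).
    println(s"streaming ${req.blockId} (${req.sizeInBytes} bytes) to disk")
  } else {
    // Small request: buffer the fetched bytes in memory.
    println(s"buffering ${req.blockId} (${req.sizeInBytes} bytes) in memory")
  }
}

// With spark.reducer.maxReqSizeShuffleToMem=1K, a 4 KB block exceeds the 1024-byte threshold.
fetchRemoteBlock(FetchRequest("shuffle_0_0_0", 4096L), maxReqSizeShuffleToMem = 1024L)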

Since the 2.2 release is imminent, how about disabling the use of DownloadCallback to fetch big blocks?

How was this patch tested?

Manual tests, because a Spark cluster is needed to reproduce the issue.

@wangyum
Member Author

wangyum commented Jun 29, 2017

@cloud-fan
Contributor

@jinxing64 How hard is it to fix this? If it's hard, let's just disable it for 2.2.

@SparkQA

SparkQA commented Jun 29, 2017

Test build #78901 has finished for PR 18466 at commit 9336c15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64

@wangyum
Can you reproduce this? It would be great if you could give more details.
I cannot reproduce it locally.

@jinxing64

@cloud-fan
I'm not sure about the reason for this exception.
From the log, it seems the connection is broken, so I guess it is not related to DownloadCallback?
I support disabling it for 2.2 if this feature is risky.

@wangyum
Member Author

wangyum commented Jun 29, 2017

Yes, I can reproduce it on a YARN cluster, but local mode can't reproduce it. It seems DownloadCallback doesn't really work.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 29, 2017

Hi, @wangyum .
Is this a regression in 2.2? I tried to reproduce this on a YARN cluster, but 2.1.1 looks okay to me.

$ spark-shell --master yarn --conf spark.reducer.maxReqSizeShuffleToMem=1K
...
Spark context Web UI available at http://172.22.115.166:4041
Spark context available as 'sc' (master = yarn, app id = application_1498511273906_0006).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()
res0: Long = 3000000

@zsxwing
Member

zsxwing commented Jun 29, 2017

@wangyum I submitted #18467 to disable this feature via configuration instead. You will be the commit author when it's merged.
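
For readers who hit this before the fix is released, here is a hedged sketch of the configuration-level workaround that #18467 takes: raise the threshold so high that no fetch request ever triggers the DownloadCallback path (the exact default chosen in that PR may differ).

import org.apache.spark.SparkConf

// Workaround sketch: with the threshold at Long.MaxValue, shuffle blocks are
// always fetched into memory and the fetch-to-disk path is never used.
val conf = new SparkConf()
  .set("spark.reducer.maxReqSizeShuffleToMem", Long.MaxValue.toString)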

@dongjoon-hyun
Member

Sorry to bother you guys, but I'm just wondering if I missed something needed to see this bug.

[hive@hdp26-3 spark]$ bin/spark-shell --master yarn --conf spark.reducer.maxReqSizeShuffleToMem=1K
Spark context Web UI available at http://172.22.115.166:4041
Spark context available as 'sc' (master = yarn, app id = application_1498511273906_0011).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()
res0: Long = 3000000

@zsxwing
Member

zsxwing commented Jun 29, 2017

@dongjoon-hyun I have not yet figured out the root cause of this issue. The major reason to disable it is that this feature breaks the old shuffle service.

@dongjoon-hyun
Member

Thank you, @zsxwing. I agree with both this and #18467. Please proceed with disabling this.
Since this is not a revert, I just want to know how to check this later when we bring this feature back.

@zsxwing
Member

zsxwing commented Jun 30, 2017

FYI, I'm fixing the root issue in #18472

@dongjoon-hyun
Member

Wow, great!

@jinxing64

Great job! 👍

@asfgit asfgit closed this in 80f7ac3 Jun 30, 2017
asfgit pushed a commit that referenced this pull request Jun 30, 2017
Disable spark.reducer.maxReqSizeShuffleToMem because it breaks the old shuffle service.

Credits to wangyum

Closes #18466

How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>
Author: Yuming Wang <wgyumg@gmail.com>

Closes #18467 from zsxwing/SPARK-21253.

(cherry picked from commit 80f7ac3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@wangyum
Member Author

wangyum commented Jun 30, 2017

@dongjoon-hyun Try the following to reproduce it. I missed spark.serializer=org.apache.spark.serializer.KryoSerializer, which is part of my default config:

spark-shell --conf spark.reducer.maxReqSizeShuffleToMem=1K --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
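
For completeness, roughly the same reproduction expressed programmatically rather than through spark-shell flags; the yarn master is assumed since the issue only shows up on a cluster, and this is only a sketch of an equivalent setup.

import org.apache.spark.sql.SparkSession

// Programmatic equivalent of the spark-shell command above (assumed setup).
val spark = SparkSession.builder()
  .master("yarn")
  .config("spark.reducer.maxReqSizeShuffleToMem", "1K")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

spark.sparkContext.parallelize(0 until 3000000, 10).repartition(2001).count()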

@dongjoon-hyun
Member

Thank you, @wangyum. I'll try.
