
[SPARK-21253][CORE] Disable use DownloadCallback fetch big blocks #18466

Closed
wants to merge 1 commit into from

Conversation

wangyum
Member

@wangyum wangyum commented Jun 29, 2017

What changes were proposed in this pull request?

DownloadCallback has some issues that cause FetchFailedException.

This can be reproduced on a Spark cluster but not in local mode:

  1. Start a Spark context with spark.reducer.maxReqSizeShuffleToMem=1K:
$ spark-shell --conf spark.reducer.maxReqSizeShuffleToMem=1K
  2. Run a shuffle:
scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()

The error messages:

org.apache.spark.shuffle.FetchFailedException: Failed to send request for 1649611690367_2 to yhd-jqhadoop166.int.yihaodian.com/10.17.28.166:7337: java.io.IOException: Connection reset by peer
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
...
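
For context, here is a simplified, illustrative sketch (in Scala, not the actual ShuffleBlockFetcherIterator code) of the decision that spark.reducer.maxReqSizeShuffleToMem controls: requests at or below the threshold are buffered in memory, while larger requests go through the DownloadCallback path that streams the block to disk. With the 1K setting above, nearly every request takes the disk-streaming path, which is what exposes the failure.

// Illustrative sketch only -- not the real Spark implementation.
case class FetchRequest(blockId: String, sizeInBytes: Long)

def fetchRemoteBlock(req: FetchRequest, maxReqSizeShuffleToMem: Long): Unit = {
  if (req.sizeInBytes > maxReqSizeShuffleToMem) {
    // Large request: stream the block to a temporary file (DownloadCallback path).
    println(s"streaming ${req.blockId} (${req.sizeInBytes} bytes) to disk")
  } else {
    // Small request: buffer the fetched bytes in memory.
    println(s"buffering ${req.blockId} (${req.sizeInBytes} bytes) in memory")
  }
}

// With spark.reducer.maxReqSizeShuffleToMem=1K, a 4 KB block exceeds the 1024-byte threshold.
fetchRemoteBlock(FetchRequest("shuffle_0_0_0", 4096L), maxReqSizeShuffleToMem = 1024L)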

Since the 2.2 release is imminent, how about disabling the use of DownloadCallback to fetch big blocks?

How was this patch tested?

Manual tests, because a Spark cluster is needed to reproduce the issue.

@wangyum
Member Author

wangyum commented Jun 29, 2017

@cloud-fan
Contributor

@jinxing64 How hard is it to fix this? If it's hard, let's just disable it for 2.2.

@SparkQA

SparkQA commented Jun 29, 2017

Test build #78901 has finished for PR 18466 at commit 9336c15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64

@wangyum
Can you reproduce this? It would be great if you could give more details.
I cannot reproduce it locally.

@jinxing64

@cloud-fan
I'm not sure about the reason for this exception.
From the log, it seems the connection is broken, so I guess it is not related to DownloadCallback?
I support disabling it for 2.2 if this feature is risky.

@wangyum
Member Author

wangyum commented Jun 29, 2017

Yes, I can reproduce it on a YARN cluster, but local mode can't reproduce it. It seems DownloadCallback doesn't really work.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 29, 2017

Hi, @wangyum .
Is this a regression in 2.2? I tried to reproduce this on a YARN cluster, but 2.1.1 looks okay to me.

$ spark-shell --master yarn --conf spark.reducer.maxReqSizeShuffleToMem=1K
...
Spark context Web UI available at http://172.22.115.166:4041
Spark context available as 'sc' (master = yarn, app id = application_1498511273906_0006).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()
res0: Long = 3000000

@zsxwing
Member

zsxwing commented Jun 29, 2017

@wangyum I submitted #18467 to disable this feature via configuration instead. You will be the commit author when it's merged.
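
For readers who hit this before the fix is released, here is a hedged sketch of the configuration-level workaround that #18467 takes: raise the threshold so high that no fetch request ever triggers the DownloadCallback path (the exact default chosen in that PR may differ).

import org.apache.spark.SparkConf

// Workaround sketch: with the threshold at Long.MaxValue, shuffle blocks are
// always fetched into memory and the fetch-to-disk path is never used.
val conf = new SparkConf()
  .set("spark.reducer.maxReqSizeShuffleToMem", Long.MaxValue.toString)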

@dongjoon-hyun
Member

Sorry to bother you guys, but I'm just wondering if I missed something needed to see this bug.

[hive@hdp26-3 spark]$ bin/spark-shell --master yarn --conf spark.reducer.maxReqSizeShuffleToMem=1K
Spark context Web UI available at http://172.22.115.166:4041
Spark context available as 'sc' (master = yarn, app id = application_1498511273906_0011).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize(0 until 3000000, 10).repartition(2001).count()
res0: Long = 3000000

@zsxwing
Member

zsxwing commented Jun 29, 2017

@dongjoon-hyun I have not yet figured out the root cause of this issue. The major reason to disable it is that this feature breaks the old shuffle service.

@dongjoon-hyun
Member

Thank you, @zsxwing. I agree with both this and #18467. Please proceed with disabling this.
Since this is not a revert, I just want to know how to check this later when we bring this feature back.

@zsxwing
Member

zsxwing commented Jun 30, 2017

FYI, I'm fixing the root issue in #18472

@dongjoon-hyun
Member

Wow, great!

@jinxing64

Great job! 👍

@asfgit asfgit closed this in 80f7ac3 Jun 30, 2017
asfgit pushed a commit that referenced this pull request Jun 30, 2017
Disable spark.reducer.maxReqSizeShuffleToMem because it breaks the old shuffle service.

Credits to wangyum

Closes #18466

How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>
Author: Yuming Wang <wgyumg@gmail.com>

Closes #18467 from zsxwing/SPARK-21253.

(cherry picked from commit 80f7ac3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@wangyum
Member Author

wangyum commented Jun 30, 2017

@dongjoon-hyun Try the following to reproduce it. I missed spark.serializer=org.apache.spark.serializer.KryoSerializer, which is part of my default config:

spark-shell --conf spark.reducer.maxReqSizeShuffleToMem=1K --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
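
For completeness, roughly the same reproduction expressed programmatically rather than through spark-shell flags; the yarn master is assumed since the issue only shows up on a cluster, and this is only a sketch of an equivalent setup.

import org.apache.spark.sql.SparkSession

// Programmatic equivalent of the spark-shell command above (assumed setup).
val spark = SparkSession.builder()
  .master("yarn")
  .config("spark.reducer.maxReqSizeShuffleToMem", "1K")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

spark.sparkContext.parallelize(0 until 3000000, 10).repartition(2001).count()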

@dongjoon-hyun
Member

Thank you, @wangyum. I'll try.
