[SPARK-4691][shuffle] Restructure a few lines in shuffle code #3553
Conversation
Can one of the admins verify this patch?
@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
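For context, a rough sketch of how the branching in HashShuffleReader.read() looks once the redundant guard is dropped; only the lines in the hunk above and the final else (quoted in a review comment below) come from the PR, the rest of the structure is assumed:

    // Sketch of HashShuffleReader.read() after this change (simplified, not the exact file).
    val aggregatedIter: Iterator[Product2[K, C]] =
      if (dep.aggregator.isDefined) {
        if (dep.mapSideCombine) {
          new InterruptibleIterator(context, dep.aggregator.get.combineCombinersByKey(iter, context))
        } else {
          new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
        }
      } else if (dep.mapSideCombine) {
        // The removed `dep.aggregator.isEmpty &&` guard was redundant here:
        // this branch is only reached when the aggregator is not defined.
        throw new IllegalStateException("Aggregator is empty for map-side combine")
      } else {
        // Convert the Product2s to pairs since this is what downstream RDDs currently expect
        iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
      }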
I think the previous way is much clearer and more obvious, from my understanding :-).
"if(dep.aggregator.isDefined) else if (dep.aggregator.isEmpty)" seems duplicate. isEmpty == !isDefined
We need to do another one more judgement for "dep.aggregator.isEmpty".
Also, in SortShuffleWriter.scala, i think "dep.aggregator.isEmpty" is better than "!dep.aggregator.isDefined".
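As a small standalone illustration of that equivalence (plain Scala Option, not Spark code):

    // For any Option, isEmpty is the exact negation of isDefined, so a branch
    // already inside the else of `if (dep.aggregator.isDefined)` never needs
    // an additional `dep.aggregator.isEmpty` check.
    val someAgg: Option[String] = Some("aggregator")
    val noAgg: Option[String] = None
    assert(someAgg.isEmpty == !someAgg.isDefined)  // false == false
    assert(noAgg.isEmpty == !noAgg.isDefined)      // true == true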
Could also write this as:

    if (dep.aggregator.isDefined) {
      ...
    } else {
      require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")
      // Convert the Product2s to pairs since this is what downstream RDDs currently expect
      iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
    }
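One note on this suggestion: Scala's require throws an IllegalArgumentException with the given message when its condition is false, so the redundant isEmpty branch disappears and a map-side combine requested without an aggregator still fails fast with a clear error.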
Yes, this seems simple and elegant.
@@ -50,7 +50,7 @@ private[spark] class SortShuffleWriter[K, V, C](
   /** Write a bunch of records to this task's output */
   override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
     if (dep.mapSideCombine) {
-      if (!dep.aggregator.isDefined) {
+      if (dep.aggregator.isEmpty) {
         throw new IllegalStateException("Aggregator is empty for map-side combine")
Could change this one to a require as well, and maybe change the message too.
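For illustration, applying that suggestion to the hunk above might look like this (a sketch, not the merged code; the message is borrowed from the earlier review comment):

    /** Write a bunch of records to this task's output */
    override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
      if (dep.mapSideCombine) {
        // require fails with IllegalArgumentException and this message when no aggregator
        // is set, replacing the explicit isEmpty check + IllegalStateException.
        require(dep.aggregator.isDefined, "Map-side combine requested without Aggregator specified!")
        // ... rest of the map-side-combine path unchanged ...
      }
      // ... rest of write() unchanged ...
    }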
OK, LGTM once you fix Aaron's comment.
Done.
add to whitelist. Just realized we never started tests for this.
Test build #24125 has started for PR 3553 at commit
Test build #24125 has finished for PR 3553 at commit
Test PASSed.
Can we get a better title for this issue?
Done, updated to a new title.
@pwendell any idea about this title?
I would use
Should probably just merge it into master, and thus it's independent of the 1.2 release, right?
NP, done with the title change; the priority is defined in JIRA.
@aarondav I wanted to backport this into branch-1.2 as well. It would be good to minimize the divergence between master and 1.2 if possible. I'm merging this into master now and I'll mark it
I've merged this into
In HashShuffleReader.scala and HashShuffleWriter.scala, no need to judge "dep.aggregator.isEmpty" again as this is judged by "dep.aggregator.isDefined".
In SortShuffleWriter.scala, "dep.aggregator.isEmpty" is better than "!dep.aggregator.isDefined"?

Author: maji2014 <maji3@asiainfo.com>

Closes #3553 from maji2014/spark-4691 and squashes the following commits:

bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala
10d0cf0 [maji2014] change a elegant way
d8f52dc [maji2014] code optimization for judgement

(cherry picked from commit b310744)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>