[SPARK-4691][shuffle] Restructure a few lines in shuffle code #3553

Closed
wants to merge 3 commits into from

maji2014
Contributor

@maji2014 maji2014 commented Dec 2, 2014

In HashShuffleReader.scala and HashShuffleWriter.scala, there is no need to check "dep.aggregator.isEmpty" again, because the preceding branch already checks "dep.aggregator.isDefined".

In SortShuffleWriter.scala, isn't "dep.aggregator.isEmpty" better than "!dep.aggregator.isDefined"?

@AmplabJenkins

Can one of the admins verify this patch?

@@ -45,7 +45,7 @@ private[spark] class HashShuffleReader[K, C](
       } else {
         new InterruptibleIterator(context, dep.aggregator.get.combineValuesByKey(iter, context))
       }
-    } else if (dep.aggregator.isEmpty && dep.mapSideCombine) {
+    } else if (dep.mapSideCombine) {
Contributor

I think the previous way is much clearer and more obvious, from my understanding :-).

Contributor Author

"if (dep.aggregator.isDefined) ... else if (dep.aggregator.isEmpty)" seems redundant, since isEmpty == !isDefined.
Otherwise we perform one more check of "dep.aggregator.isEmpty".
Also, in SortShuffleWriter.scala, I think "dep.aggregator.isEmpty" is better than "!dep.aggregator.isDefined".
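
To illustrate the redundancy argument, here is a minimal sketch. The `OptionChecks` object, the `describe` helper, and the `Option[String]` stand-in for `dep.aggregator` are hypothetical and are not taken from the Spark code; the point is only that the `else` branch of an `isDefined` test already implies `isEmpty`.

```scala
object OptionChecks {
  // Hypothetical stand-in for dep.aggregator: any Option behaves the same way.
  def describe(aggregator: Option[String], mapSideCombine: Boolean): String =
    if (aggregator.isDefined) {
      "aggregate"
    } else if (mapSideCombine) {
      // aggregator.isEmpty is necessarily true here, so re-testing it adds nothing
      "error: map-side combine without aggregator"
    } else {
      "pass through"
    }

  def main(args: Array[String]): Unit = {
    val candidates: Seq[Option[String]] = Seq(Some("sum"), None)
    // isEmpty and !isDefined agree for every Option value
    candidates.foreach(o => assert(o.isEmpty == !o.isDefined))
    println(describe(None, mapSideCombine = true))
    println(describe(None, mapSideCombine = false))
    println(describe(Some("sum"), mapSideCombine = true))
  }
}
```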

Contributor

Could also write this as

if (dep.aggregator.isDefined) {
  ...
} else {
  require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")

  // Convert the Product2s to pairs since this is what downstream RDDs currently expect
  iter.asInstanceOf[Iterator[Product2[K, C]]].map(pair => (pair._1, pair._2))
}
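
A self-contained sketch of the same shape, runnable outside Spark, might look like the following. The `Dep` case class, the toy per-key aggregation, and the `read` helper are assumptions made for illustration only; they are not Spark's `ShuffleDependency` or `HashShuffleReader`. Only the `require` call and its message come from the suggestion above.

```scala
import scala.util.Try

object RequireSketch {
  // Hypothetical, simplified stand-in for ShuffleDependency; not Spark's real class.
  final case class Dep(aggregator: Option[(Int, Int) => Int], mapSideCombine: Boolean)

  def read(dep: Dep, iter: Iterator[(String, Int)]): Iterator[(String, Int)] =
    if (dep.aggregator.isDefined) {
      val combine = dep.aggregator.get
      // Toy aggregation: reduce the values of each key with the supplied function
      iter.toSeq.groupBy(_._1).iterator.map { case (k, kvs) =>
        (k, kvs.map(_._2).reduce(combine))
      }
    } else {
      // Only reached when no aggregator is set, so a single precondition suffices
      require(!dep.mapSideCombine, "Map-side combine requested without Aggregator specified!")
      iter
    }

  def main(args: Array[String]): Unit = {
    val records = Seq(("a", 1), ("a", 2), ("b", 3))
    val sum = (a: Int, b: Int) => a + b
    println(read(Dep(Some(sum), mapSideCombine = true), records.iterator).toMap)
    println(read(Dep(None, mapSideCombine = false), records.iterator).toList)
    // An invalid combination fails fast with IllegalArgumentException
    println(Try(read(Dep(None, mapSideCombine = true), records.iterator)).isFailure)
  }
}
```

Note that `require` throws `IllegalArgumentException` and prefixes the message with "requirement failed: ".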

Contributor Author

Yes, this seems simple and elegant

@@ -50,7 +50,7 @@ private[spark] class SortShuffleWriter[K, V, C](
   /** Write a bunch of records to this task's output */
   override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
     if (dep.mapSideCombine) {
-      if (!dep.aggregator.isDefined) {
+      if (dep.aggregator.isEmpty) {
         throw new IllegalStateException("Aggregator is empty for map-side combine")
Contributor

could change this guy to a require as well, and maybe change the message too
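
One thing worth keeping in mind with that change is that `require` raises a different exception type than the explicit `throw` in the diff. The sketch below compares the two; the helper names (`checkWithThrow`, `checkWithRequire`, `failureName`) and the `Option[String]` stand-in are hypothetical, and the `require` message is just one possible wording.

```scala
import scala.util.{Failure, Try}

object ExceptionTypeSketch {
  // Hypothetical helper mirroring the explicit throw shown in the diff
  def checkWithThrow(aggregator: Option[String]): Unit =
    if (aggregator.isEmpty) {
      throw new IllegalStateException("Aggregator is empty for map-side combine")
    }

  // Hypothetical helper using require; the message wording is an assumption
  def checkWithRequire(aggregator: Option[String]): Unit =
    require(aggregator.isDefined, "Map-side combine requested without Aggregator specified!")

  // Runs a check and reports the simple class name of the exception, if any
  def failureName(check: => Unit): String = Try(check) match {
    case Failure(e) => e.getClass.getSimpleName
    case _          => "no failure"
  }

  def main(args: Array[String]): Unit = {
    println(failureName(checkWithThrow(None)))
    println(failureName(checkWithRequire(None)))
    println(failureName(checkWithRequire(Some("sum"))))
  }
}
```

Code that catches `IllegalStateException` specifically would no longer see the failure after switching to `require`, since it throws `IllegalArgumentException`.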

@andrewor14
Contributor

Ok LGTM once you fix Aaron's comment

@maji2014
Contributor Author

maji2014 commented Dec 4, 2014

Done.

@andrewor14
Contributor

add to whitelist. Just realized we never started tests for this

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24125 has started for PR 3553 at commit bf7b14d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24125 has finished for PR 3553 at commit bf7b14d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24125/

@pwendell
Contributor

pwendell commented Dec 4, 2014

Can we get a better title for this issue?

@maji2014
Contributor Author

maji2014 commented Dec 4, 2014

Done, changed to a new title.

@maji2014 maji2014 changed the title from "[spark-4691][shuffle]code optimization for judgement" to "[spark-4691][shuffle]Code improvement for aggregator and mapSideCombine judgement" Dec 4, 2014
@maji2014
Contributor Author

maji2014 commented Dec 5, 2014

@pwendell any thoughts on this title?

@andrewor14
Contributor

I would use [SPARK-4691][Minor] Rewrite a few lines in shuffle code. Not a big deal if you don't change this. I'll merge this once we cut the new 1.2 release.

@aarondav
Contributor

aarondav commented Dec 5, 2014

Should probably just merge it into master, and thus it's independent of the 1.2 release, right?

@maji2014 maji2014 changed the title from "[spark-4691][shuffle]Code improvement for aggregator and mapSideCombine judgement" to "[SPARK-4691][Minor] Rewrite a few lines in shuffle code" Dec 6, 2014
@maji2014
Contributor Author

maji2014 commented Dec 6, 2014

NP, the title is changed. Priority is defined in JIRA.

@maji2014 maji2014 changed the title from "[SPARK-4691][Minor] Rewrite a few lines in shuffle code" to "[SPARK-4691][shuffle] Restructure a few lines in shuffle code" Dec 7, 2014
@andrewor14
Contributor

@aarondav I wanted to backport this into branch-1.2 as well. It would be good to minimize the divergence between master and 1.2 if possible.

I'm merging this into master now and I'll mark it backport-needed on the JIRA. Thanks.

@asfgit asfgit closed this in b310744 Dec 9, 2014
@JoshRosen
Contributor

I've merged this into branch-1.2.

asfgit pushed a commit that referenced this pull request Dec 17, 2014
In HashShuffleReader.scala and HashShuffleWriter.scala, no need to judge "dep.aggregator.isEmpty" again as this is judged by "dep.aggregator.isDefined"

In SortShuffleWriter.scala, "dep.aggregator.isEmpty"  is better than "!dep.aggregator.isDefined" ?

Author: maji2014 <maji3@asiainfo.com>

Closes #3553 from maji2014/spark-4691 and squashes the following commits:

bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala
10d0cf0 [maji2014] change a elegant way
d8f52dc [maji2014] code optimization for judgement

(cherry picked from commit b310744)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
8 participants