Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4488][PySpark] Add control over map-side aggregation #3366

Closed
wants to merge 3 commits into from

Conversation

uncleGen
Copy link
Contributor

No description provided.

@SparkQA
Copy link

SparkQA commented Nov 19, 2014

Test build #23611 has started for PR 3366 at commit 66561d4.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 19, 2014

Test build #23611 has finished for PR 3366 at commit 66561d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23611/
Test PASSed.

@uncleGen
Copy link
Contributor Author

@davies Could you help reviewing this patch? Thank you!

@davies
Copy link
Contributor

davies commented Nov 20, 2014

What's the cases that we should disable map side aggregation?

@JoshRosen
Copy link
Contributor

@uncleGen Could you comment here to provide examples of when it's beneficial to disable map-side aggregation? If there is a legitimate case for disabling it, then we should add this option in Scala / Java as well. Otherwise, do you mind closing this pull request?

@davies
Copy link
Contributor

davies commented Dec 1, 2014

@JoshRosen We already have this in Scala/Java.

@JoshRosen
Copy link
Contributor

@JoshRosen We already have this in Scala/Java.

What about reduceByKey? I don't see a variant with a flag for disabling map-side combining: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L262. We definitely have the mapSideCombine option for combineByKey but not for reduceByKey.

I guess I kind of pattern-matched on the reduceByKey in my earlier comment; the combineByKey flag makes sense and we should definitely include that for feature-parity.

@asfgit asfgit closed this in 06dc1b1 Dec 1, 2014
@JoshRosen
Copy link
Contributor

Actually, let's re-open this one since part of it should still go in.

@uncleGen uncleGen deleted the master-pyspark branch March 24, 2015 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants