
SPARK-4230. Doc for spark.default.parallelism is incorrect #3107

Closed
sryza wants to merge 2 commits from sryza/sandy-spark-4230

Conversation

@sryza (Contributor) commented Nov 5, 2014

No description provided.

@SparkQA commented Nov 5, 2014

Test build #22921 has started for PR 3107 at commit 14ca79b.

  • This patch merges cleanly.

@SparkQA commented Nov 5, 2014

Test build #22921 has finished for PR 3107 at commit 14ca79b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22921/

@@ -563,8 +566,8 @@ Apart from these, the following properties are also available, and may be useful
    </ul>
    </td>
    <td>
-     Default number of tasks to use across the cluster for distributed shuffle operations
-     (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+     Default number of output partitions for operations like <code>join</code>,
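
For context, a minimal sketch of the behavior this description covers (the local master, app name, and the value 8 are illustrative, not from the patch): a shuffle operation invoked without an explicit partition count falls back to spark.default.parallelism for the RDD it returns.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative local-mode setup; spark.default.parallelism is the
    // property the doc change describes, the value 8 is arbitrary.
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("default-parallelism-sketch")
      .set("spark.default.parallelism", "8")
    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val counts = pairs.reduceByKey(_ + _)  // no numPartitions argument
    println(counts.partitions.length)      // 8, from spark.default.parallelism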
Contributor commented on the diff:
Should this say "number of shuffle partitions"? It's slightly weird to me to say "output" when this refers to something that is totally internal to Spark - it's output on the map side but input on the read side. In other cases I think "output" tends to mean things like saving data to HDFS, etc.

sryza (Contributor Author) replied:
My thinking was that Spark's APIs have no mention of the concept of a "shuffle partition" (e.g. the term is referenced nowhere on https://spark.apache.org/docs/latest/programming-guide.html), but even novice Spark users are meant to understand that every transformation has input and output RDDs and that every RDD has a number of partitions.

Maybe "Default number of partitions for the RDDs produced by operations like ..."?

Contributor replied:

Ah I see - what about "Default number of partitions in RDDs returned by join, reduceByKey..."?
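
To make the wording under discussion concrete, a short sketch (reusing the hypothetical sc and spark.default.parallelism = 8 from the earlier snippet): the RDDs returned by these transformations carry the default partition count, and an explicit argument overrides it.

    val left  = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val right = sc.parallelize(Seq(("a", "x"), ("b", "y")))

    // No partition count given: the returned RDD picks up the default.
    println(left.join(right).partitions.length)             // 8
    // An explicit count takes precedence over spark.default.parallelism.
    println(left.reduceByKey(_ + _, 16).partitions.length)  // 16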

@pwendell (Contributor) commented Nov 9, 2014

Had some minor wording questions.

@SparkQA commented Nov 10, 2014

Test build #23129 has started for PR 3107 at commit 37a1d19.

  • This patch merges cleanly.

@SparkQA commented Nov 10, 2014

Test build #23129 has finished for PR 3107 at commit 37a1d19.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23129/

@sryza (Contributor Author) commented Nov 10, 2014

Test failure looks unrelated

@pwendell (Contributor)

LG - pulling it in.

@asfgit closed this in c6f4e70 on Nov 10, 2014
asfgit pushed a commit that referenced this pull request Nov 10, 2014
Author: Sandy Ryza <sandy@cloudera.com>

Closes #3107 from sryza/sandy-spark-4230 and squashes the following commits:

37a1d19 [Sandy Ryza] Clear up a couple things
34d53de [Sandy Ryza] SPARK-4230. Doc for spark.default.parallelism is incorrect

(cherry picked from commit c6f4e70)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>