SPARK-4230. Doc for spark.default.parallelism is incorrect #3107
Conversation
Test build #22921 has started for PR 3107 at commit
Test build #22921 has finished for PR 3107 at commit
Test PASSed.
@@ -563,8 +566,8 @@ Apart from these, the following properties are also available, and may be useful
     </ul>
   </td>
   <td>
-    Default number of tasks to use across the cluster for distributed shuffle operations
-    (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
+    Default number of output partitions for operations like <code>join</code>,
Should this say "number of shuffle partitions"? It's slightly weird to me to say "output" when this refers to something that is totally internal to Spark - it's output on the map side but input on the read side. In other cases I think "output" tends to mean things like saving as HDFS data, etc.
My thinking was that Spark's APIs have no mention of the concept of a "shuffle partition" (e.g. the term is referenced nowhere on https://spark.apache.org/docs/latest/programming-guide.html), but even novice Spark users are meant to understand that every transformation has input and output RDDs and that every RDD has a number of partitions.
Maybe "Default number of partitions for the RDDs produced by operations like ..."?
Ah I see - what about "Default number of partitions in RDDs returned by join, reduceByKey..."
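For readers following the wording discussion, the property in question can be set cluster-wide in spark-defaults.conf (or per application via SparkConf). A minimal sketch - the value 100 is purely illustrative, not a recommendation:

```
# spark-defaults.conf -- illustrative value only
# When set, shuffle-producing transformations (reduceByKey, join, groupByKey)
# use this as the number of partitions in the RDDs they return,
# unless a numPartitions argument is passed explicitly.
spark.default.parallelism   100
```

When the property is unset, these operations instead default to the largest number of partitions among their parent RDDs, which is the behavior the revised doc text in this PR is trying to describe accurately.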
Had some minor wording questions.
Force-pushed from 14ca79b to 37a1d19
Test build #23129 has started for PR 3107 at commit
Test build #23129 has finished for PR 3107 at commit
Test FAILed.
Test failure looks unrelated.
LG - pulling it in.
Author: Sandy Ryza <sandy@cloudera.com>

Closes #3107 from sryza/sandy-spark-4230 and squashes the following commits:

37a1d19 [Sandy Ryza] Clear up a couple things
34d53de [Sandy Ryza] SPARK-4230. Doc for spark.default.parallelism is incorrect

(cherry picked from commit c6f4e70)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>