Clean up and simplify Spark configuration #299

pwendell · 2014-04-02T06:13:39Z

Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run spark-submit to launch applications. This is a WIP patch but it makes the following improvements:

Improved spark-env.sh.template which was missing a lot of things users now set in that file.
Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
Adds ability to set these same variables for the driver using spark-submit.
Allows you to load system properties from a spark-defaults.conf file when running spark-submit. This will allow setting both SparkConf options and other system properties utilized by spark-submit.
Made SPARK_LOCAL_IP an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.

AmplabJenkins · 2014-04-02T06:17:23Z

Merged build triggered.

AmplabJenkins · 2014-04-02T06:17:29Z

Merged build started.

AmplabJenkins · 2014-04-02T07:23:56Z

Merged build finished.

AmplabJenkins · 2014-04-02T07:23:56Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13670/

aarondav · 2014-04-02T18:27:58Z

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala

+
+    // Set extra Java options for the executor
+    val executorOpts = sys.props.find(_._1.contains("spark.executor.extraJavaOptions"))
+    JAVA_OPTS += executorOpts


I think this returns an Option? String concatenation may not be the correct solution here.

mridulm · 2014-04-03T01:08:40Z

We will need to continue support for a few of these (SPARK_JAVA_OPTS for example) - which are currently getting pretty heavily used.
We can issue a deprecated warning to the user, but removing them will break too many cron'ed jobs already running.

aarondav · 2014-04-06T01:18:36Z

bin/run-example

@@ -75,7 +75,6 @@ fi

 # Set JAVA_OPTS to be able to load native libraries and to set heap size
 JAVA_OPTS="$SPARK_JAVA_OPTS"
-JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"


It seems there is currently no way to set the library path for examples (save via SPARK_JAVA_OPTS), do we need one?

andrewor14 · 2014-04-07T17:57:18Z

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

+    val properties = new Properties()
+    properties.load(inputStream)
+    properties.stringPropertyNames().toSeq.map(k => (k, properties(k)))
+  }


Would be good to add a try catch here (or just throw a nice exception)

AmplabJenkins · 2014-04-21T04:09:13Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14276/

AmplabJenkins · 2014-04-21T04:28:12Z

Merged build triggered.

AmplabJenkins · 2014-04-21T04:28:19Z

Merged build started.

AmplabJenkins · 2014-04-21T05:03:13Z

Merged build finished.

AmplabJenkins · 2014-04-21T05:03:14Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14277/

AmplabJenkins · 2014-04-21T06:23:12Z

Merged build triggered.

AmplabJenkins · 2014-04-21T06:23:19Z

Merged build started.

AmplabJenkins · 2014-04-21T07:04:22Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-21T07:04:22Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14281/

pwendell · 2014-04-21T17:23:09Z

Okay - I played around with this a bunch this weekend and added some more tests. Unfortunately this type of thing is hard to unit test well since the changes are pervasive throughout the code base and affect things exogenous to the creation of a SparkContext.

We'll continue testing this in the next few days during QA. For now I think I'm going to merge this.

sryza · 2014-04-21T17:28:32Z

Is this tied to a Spark JIRA?

Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements: 1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file. 2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath. 3. Adds ability to set these same variables for the driver using `spark-submit`. 4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`. 5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node. Author: Patrick Wendell <pwendell@gmail.com> Closes #299 from pwendell/config-cleanup and squashes the following commits: 127f301 [Patrick Wendell] Improvements to testing a006464 [Patrick Wendell] Moving properties file template. b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf 0086939 [Patrick Wendell] Minor style fixes af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide af0adf7 [Patrick Wendell] Automatically add user jar a56b125 [Patrick Wendell] Responses to Tom's review d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup a762901 [Patrick Wendell] Fixing test failures ffa00fe [Patrick Wendell] Review feedback fda0301 [Patrick Wendell] Note 308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN e83cd8f [Patrick Wendell] Changes to allow re-use of test applications be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set c2a2909 [Patrick Wendell] Test compile fixes 4ee6f9d [Patrick Wendell] Making YARN doc changes consistent afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors. b08893b [Patrick Wendell] Additional improvements. ace4ead [Patrick Wendell] Responses to review feedback. b72d183 [Patrick Wendell] Review feedback for spark env file 46555c1 [Patrick Wendell] Review feedback and import clean-ups 437aed1 [Patrick Wendell] Small fix 761ebcd [Patrick Wendell] Library path and classpath for drivers 7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script 5b0ba8e [Patrick Wendell] Don't ship executor envs 84cc5e5 [Patrick Wendell] Small clean-up 1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings 4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH 6eaf7d0 [Patrick Wendell] executorJavaOpts 0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS (cherry picked from commit fb98488) Signed-off-by: Patrick Wendell <pwendell@gmail.com>

tgravescs · 2014-04-22T14:32:01Z

I don't see where you addressed my comment about spark.authenticate not being propogated to the executors on yarn properly. I gave it a quick try and its not working.

pwendell · 2014-04-23T01:11:01Z

@tgravescs I'll fix this - sorry I thought it worked.

witgo · 2014-04-23T11:27:42Z

SPARK DAEMON_OPTS seems to have no effect

witgo · 2014-04-23T11:31:33Z

./bin/spark-submit /opt/spark/classes/toona-assembly-assembly-1.0.0-SNAPSHOT.jar --master spark://spark:7077 --deploy-mode client --class com.zhe800.toona.als.computation.DealCF --arg hdfs://192.168.10.39:8020/user/hadoop/testdata/20140304/*  --arg /opt/spark/toona/uidIndex --arg /opt/spark/toona/rating --arg /opt/spark/toona/model --verbose

=>

Using properties file: /opt/spark/spark-1.0.0-cdh3/conf/spark-defaults.conf
Adding default property: spark.executor.memory=13g
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.eventLog.dir=/opt/spark/logs/
Adding default property: spark.master=spark://spark:7077
Using properties file: /opt/spark/spark-1.0.0-cdh3/conf/spark-defaults.conf
Adding default property: spark.executor.memory=13g
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.eventLog.dir=/opt/spark/logs/
Adding default property: spark.master=spark://spark:7077

why ?

Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements: 1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file. 2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath. 3. Adds ability to set these same variables for the driver using `spark-submit`. 4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`. 5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node. Author: Patrick Wendell <pwendell@gmail.com> Closes apache#299 from pwendell/config-cleanup and squashes the following commits: 127f301 [Patrick Wendell] Improvements to testing a006464 [Patrick Wendell] Moving properties file template. b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf 0086939 [Patrick Wendell] Minor style fixes af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide af0adf7 [Patrick Wendell] Automatically add user jar a56b125 [Patrick Wendell] Responses to Tom's review d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup a762901 [Patrick Wendell] Fixing test failures ffa00fe [Patrick Wendell] Review feedback fda0301 [Patrick Wendell] Note 308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN e83cd8f [Patrick Wendell] Changes to allow re-use of test applications be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set c2a2909 [Patrick Wendell] Test compile fixes 4ee6f9d [Patrick Wendell] Making YARN doc changes consistent afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors. b08893b [Patrick Wendell] Additional improvements. ace4ead [Patrick Wendell] Responses to review feedback. b72d183 [Patrick Wendell] Review feedback for spark env file 46555c1 [Patrick Wendell] Review feedback and import clean-ups 437aed1 [Patrick Wendell] Small fix 761ebcd [Patrick Wendell] Library path and classpath for drivers 7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script 5b0ba8e [Patrick Wendell] Don't ship executor envs 84cc5e5 [Patrick Wendell] Small clean-up 1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings 4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH 6eaf7d0 [Patrick Wendell] executorJavaOpts 0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS

liyezhang556520 · 2014-07-17T08:36:44Z

core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala

-        "\"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %d".format(
-          runScript, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))
+        "\"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %s %d".format(
+          runScript, extraOpts, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))


@pwendell extraOpts will be read as arguments of org.apache.spark.executor.CoarseGrainedExecutorBackend, which will lead to parse error if extraOpts is not empty. Because the 4th argument of CoarseGrainedExecutorBackend is Int type, and if extraOpts is not empty, other values instead of numCores will be regarded as 4th argument to cast to Int and cause error. Do I have any misunderstanding of this:)

Merge upstream

…is reused ## What changes were proposed in this pull request? With this change, we can easily identify the plan difference when subquery is reused. When the reuse is enabled, the plan looks like ``` == Physical Plan == CollectLimit 1 +- *(1) Project [(Subquery subquery240 + ReusedSubquery Subquery subquery240) AS (scalarsubquery() + scalarsubquery())#253] : :- Subquery subquery240 : : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#250]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#256, count#257L]) : : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : : +- Scan[obj#12] : +- ReusedSubquery Subquery subquery240 +- *(1) SerializeFromObject +- Scan[obj#12] ``` When the reuse is disabled, the plan looks like ``` == Physical Plan == CollectLimit 1 +- *(1) Project [(Subquery subquery286 + Subquery subquery287) AS (scalarsubquery() + scalarsubquery())#299] : :- Subquery subquery286 : : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#296]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#302, count#303L]) : : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : : +- Scan[obj#12] : +- Subquery subquery287 : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)#298]) : +- Exchange SinglePartition : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#306, count#307L]) : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : +- Scan[obj#12] +- *(1) SerializeFromObject +- Scan[obj#12] ``` ## How was this patch tested? Modified the existing test. Closes #24258 from gatorsmile/followupSPARK-27279. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>

Add access key and secret key for huaweicloud credentials

…is reused With this change, we can easily identify the plan difference when subquery is reused. When the reuse is enabled, the plan looks like ``` == Physical Plan == CollectLimit 1 +- *(1) Project [(Subquery subquery240 + ReusedSubquery Subquery subquery240) AS (scalarsubquery() + scalarsubquery())apache#253] : :- Subquery subquery240 : : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)apache#250]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#256, count#257L]) : : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : : +- Scan[obj#12] : +- ReusedSubquery Subquery subquery240 +- *(1) SerializeFromObject +- Scan[obj#12] ``` When the reuse is disabled, the plan looks like ``` == Physical Plan == CollectLimit 1 +- *(1) Project [(Subquery subquery286 + Subquery subquery287) AS (scalarsubquery() + scalarsubquery())apache#299] : :- Subquery subquery286 : : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)apache#296]) : : +- Exchange SinglePartition : : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#302, count#303L]) : : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : : +- Scan[obj#12] : +- Subquery subquery287 : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as bigint))], output=[avg(key)apache#298]) : +- Exchange SinglePartition : +- *(1) HashAggregate(keys=[], functions=[partial_avg(cast(key#13 as bigint))], output=[sum#306, count#307L]) : +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] : +- Scan[obj#12] +- *(1) SerializeFromObject +- Scan[obj#12] ``` Modified the existing test. Closes apache#24258 from gatorsmile/followupSPARK-27279. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>

…t from keyStorePassword (apache#299)

aarondav reviewed Apr 2, 2014
View reviewed changes

aarondav reviewed Apr 6, 2014
View reviewed changes

andrewor14 mentioned this pull request Apr 7, 2014

[SPARK-1276] Add a HistoryServer to render persisted UI #204

Closed

andrewor14 reviewed Apr 7, 2014
View reviewed changes

pwendell added 2 commits April 13, 2014 13:27

Change spark.local.dir -> SPARK_LOCAL_DIRS

ac2d65e

Stash of adding config options in submit script and YARN

0faa3b6

Moving properties file template.

a006464

Improvements to testing

127f301

asfgit closed this in fb98488 Apr 21, 2014

liyezhang556520 reviewed Jul 17, 2014
View reviewed changes

gatesn pushed a commit to gatesn/spark that referenced this pull request Mar 14, 2018

Merge pull request apache#299 from palantir/rk/merge-upstream

05bd224

Merge upstream

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Merge pull request apache#299 from liu-sheng/ak-sk-huaweicloud

f43a1e3

Add access key and secret key for huaweicloud credentials

arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020

MapR [SPARK-263] Add possibility to use keyPassword which is differen…

fb8825c

…t from keyStorePassword (apache#299)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up and simplify Spark configuration #299

Clean up and simplify Spark configuration #299

pwendell commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

aarondav Apr 2, 2014

mridulm commented Apr 3, 2014

aarondav Apr 6, 2014

andrewor14 Apr 7, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

pwendell commented Apr 21, 2014

sryza commented Apr 21, 2014

tgravescs commented Apr 22, 2014

pwendell commented Apr 23, 2014

witgo commented Apr 23, 2014

witgo commented Apr 23, 2014

liyezhang556520 Jul 17, 2014

Clean up and simplify Spark configuration #299

Clean up and simplify Spark configuration #299

Conversation

pwendell commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

AmplabJenkins commented Apr 2, 2014

aarondav Apr 2, 2014

Choose a reason for hiding this comment

mridulm commented Apr 3, 2014

aarondav Apr 6, 2014

Choose a reason for hiding this comment

andrewor14 Apr 7, 2014

Choose a reason for hiding this comment

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

AmplabJenkins commented Apr 21, 2014

pwendell commented Apr 21, 2014

sryza commented Apr 21, 2014

tgravescs commented Apr 22, 2014

pwendell commented Apr 23, 2014

witgo commented Apr 23, 2014

witgo commented Apr 23, 2014

liyezhang556520 Jul 17, 2014

Choose a reason for hiding this comment