
[SPARK-2260] Fix standalone-cluster mode, which was broken #1538

Closed

andrewor14 wants to merge 17 commits into apache:master from andrewor14:standalone-cluster

Conversation

andrewor14
Contributor

The main thing was that Spark configs were not propagated to the driver, so applications that do not specify `master` or `appName` automatically failed. This PR fixes that and a couple of related miscellaneous things.

One thing that may or may not be an issue is that the jars must be available on the driver node. In `standalone-cluster` mode, this effectively means these jars must be available on all the worker machines, since the driver is launched on one of them. The semantics here are not the same as `yarn-cluster` mode, where all the relevant jars are uploaded to a distributed cache automatically and shipped to the containers. This is probably not a concern, but still worth mentioning.

The problem was that Spark properties were not propagated to the driver.
The solution is simple: pass the properties as part of the driver
description, so that the command that launches the driver
sets the Spark properties as its Java system properties,
which SparkConf then loads.
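A minimal sketch of that mechanism (names like `DriverDescription` and `buildLaunchCommand` here are illustrative stand-ins, not the exact ones in this patch):

```scala
// Illustrative sketch only: forwarding Spark properties to the driver JVM
// as -D system properties. SparkConf (with loadDefaults = true) picks up
// any system property whose name starts with "spark.".
case class DriverDescription(mainClass: String, sparkProps: Map[String, String])

def buildLaunchCommand(desc: DriverDescription): Seq[String] = {
  // Each Spark property becomes one -Dkey=value argument for the driver JVM
  val propOpts = desc.sparkProps.map { case (k, v) => s"-D$k=$v" }.toSeq
  Seq("java") ++ propOpts ++ Seq(desc.mainClass)
}

// buildLaunchCommand(DriverDescription("MyApp",
//   Map("spark.master" -> "spark://host:7077", "spark.app.name" -> "MyApp")))
// => List(java, -Dspark.master=spark://host:7077, -Dspark.app.name=MyApp, MyApp)
```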
@andrewor14
Contributor Author

test this please!

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17009/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17009/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17054/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17054/consoleFull

// Cap the JVM permanent generation size
val permGenOpt = Seq("-XX:MaxPermSize=128m")

// Convert Spark properties to java system properties
val sparkOpts = command.sparkProps.map { case (k, v) => s"-D$k=$v" }
Contributor

Not super important, but Yarn only uses system properties for the configs needed to open the akka connection, and then transfers the whole config. See: https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala#L65

Might be worth doing the same here, or coalescing that logic somewhere.
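Roughly what that would look like here, as a hypothetical sketch (the actual set of bootstrap properties Yarn forwards is defined in the file linked above, not here):

```scala
// Hypothetical sketch: forward only the properties needed to bootstrap
// the akka connection (plus auth), and ship the remaining Spark config
// over that connection once the executor has registered.
def bootstrapProps(sparkProps: Map[String, String]): Map[String, String] =
  sparkProps.filter { case (k, _) =>
    k.startsWith("spark.akka") || k.startsWith("spark.auth")
  }
```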

Contributor

Yeah, I agree - it would be better to only set this for the driver; this buildJavaOpts is used for both executors and the driver.

Contributor Author

Note that the logic here is more equivalent to https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L388, because the driver needs basic configs like spark.master and spark.app.name to launch the SparkContext, in addition to just the akka and authentication configs.

But yes, when we launch the executors we might actually want to only use the akka and authentication ones, and pull in the rest from the driver later on, similar to how yarn handles it in your link.

Contributor

Yeah, I'd prefer not to change the executor code path here.

@vanzin
Contributor

vanzin commented Jul 23, 2014

Other than the test failure, LGTM.

@SparkQA

SparkQA commented Jul 24, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17100/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17100/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17156/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17156/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17163/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17163/consoleFull

@@ -45,7 +45,7 @@ private[spark] class SparkDeploySchedulerBackend(
       conf.get("spark.driver.host"), conf.get("spark.driver.port"),
       CoarseGrainedSchedulerBackend.ACTOR_NAME)
     val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
-    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
+    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions").toSeq
Contributor

Since this no longer goes through Utils.splitCommandString, I don't think it will work with options that are quoted.
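To illustrate the concern with a made-up option string (Utils.splitCommandString is Spark's shell-like tokenizer):

```scala
val opt: Option[String] = Some("""-Dfoo="bar baz" -Dverbose=true""")

// Utils.splitCommandString honors quoting and would produce two tokens:
//   Seq("-Dfoo=bar baz", "-Dverbose=true")

// Option.toSeq instead wraps the WHOLE string as a single element:
opt.toSeq  // Seq("-Dfoo=\"bar baz\" -Dverbose=true")
// which then reaches the JVM as one malformed argument.
```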

Contributor Author

Thanks. I need to verify this for both yarn and standalone.

Contributor Author

This actually handles quoted strings, spaces, backslashes, and a combination of all the above; I have tested this on a standalone cluster in both deploy modes. This works because we pass around these options as a sequence of strings before using them in commands.

I still need to verify the same for YARN.
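A minimal sketch of why the sequence-of-strings approach sidesteps escaping (assuming Java's ProcessBuilder, which standalone mode uses; the option values below are made up):

```scala
import scala.collection.JavaConverters._

// Each element of the sequence becomes exactly one argv entry in the
// child process -- no shell is involved, so nothing needs re-escaping.
val cmd = Seq(
  "java",
  "-Dspark.app.name=name with spaces",              // spaces survive intact
  """-Dspark.foo=has "quotes" and \backslashes""",  // so do quotes and backslashes
  "MyDriverMainClass"                               // hypothetical main class
)
val pb = new ProcessBuilder(cmd.asJava)  // pb.start() would launch it verbatim
```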

There is currently no good way to handle quoted arguments and
backslashes in YARN. The new code does not do any escaping, which
is fine for standalone mode (which uses Java's ProcessBuilder) but
not for YARN mode. I will open a separate JIRA for this.
@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17308/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17312/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17312/consoleFull

@andrewor14
Contributor Author

I have reverted my changes for YARN in this PR and instead filed a JIRA at SPARK-2718. As of the latest commit, this PR changes the behavior only for standalone mode.

I have tested the latest changes on a standalone cluster with quoted configs, backslashes, and spaces, for both normal Spark configs and spark.{executor/driver}.extraJavaOptions, in both deploy modes. This is ready to go from my side.

@SparkQA

SparkQA commented Jul 29, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17324/consoleFull

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17324/consoleFull

@pwendell
Contributor

LGTM - thanks andrew!

@asfgit asfgit closed this in 4ce92cc Jul 30, 2014
@andrewor14 andrewor14 deleted the standalone-cluster branch August 2, 2014 02:28
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Andrew Or <andrewor14@gmail.com>

Closes apache#1538 from andrewor14/standalone-cluster and squashes the following commits:

8c11a0d [Andrew Or] Clean up imports / comments (minor)
2678d13 [Andrew Or] Handle extraJavaOpts properly
7660547 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
6f64a9b [Andrew Or] Revert changes in YARN
2f2908b [Andrew Or] Fix tests
ed01491 [Andrew Or] Don't go overboard with escaping
8e105e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
b890949 [Andrew Or] Abstract usages of converting spark opts to java opts
79f63a3 [Andrew Or] Move sparkProps into javaOpts
78752f8 [Andrew Or] Fix tests
5a9c6c7 [Andrew Or] Fix line too long
c141a00 [Andrew Or] Don't display "unknown app" on driver log pages
d7e2728 [Andrew Or] Avoid deprecation warning in standalone Client
6ceb14f [Andrew Or] Allow relevant configs to propagate to standalone Driver
7f854bc [Andrew Or] Fix test
855256e [Andrew Or] Fix standalone-cluster mode
fd9da51 [Andrew Or] Formatting changes (minor)