
[SPARK-2260] Fix standalone-cluster mode, which was broken #1538

Closed

andrewor14 wants to merge 17 commits into apache:master from andrewor14:standalone-cluster

Conversation

andrewor14
Contributor

The main thing was that Spark configs were not propagated to the driver, so applications that do not specify `master` or `appName` automatically failed. This PR fixes that and a couple of related miscellaneous things.

One thing that may or may not be an issue is that the jars must be available on the driver node. In `standalone-cluster` mode, this effectively means these jars must be available on all the worker machines, since the driver is launched on one of them. The semantics here are not the same as `yarn-cluster` mode, where all the relevant jars are uploaded to a distributed cache automatically and shipped to the containers. This is probably not a concern, but still worth mentioning.

The problem was that Spark properties were not propagated to the driver.
The solution is simple: pass the properties as part of the driver
description, so that the command that launches the driver
sets the Spark properties as its Java system properties,
which SparkConf then loads.
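A minimal sketch of that mechanism (names like `DriverDescription` and `buildLaunchCommand` here are illustrative stand-ins, not the exact ones in this patch):

```scala
// Illustrative sketch only: forwarding Spark properties to the driver JVM
// as -D system properties. SparkConf (with loadDefaults = true) picks up
// any system property whose name starts with "spark.".
case class DriverDescription(mainClass: String, sparkProps: Map[String, String])

def buildLaunchCommand(desc: DriverDescription): Seq[String] = {
  // Each Spark property becomes one -Dkey=value argument for the driver JVM
  val propOpts = desc.sparkProps.map { case (k, v) => s"-D$k=$v" }.toSeq
  Seq("java") ++ propOpts ++ Seq(desc.mainClass)
}

// buildLaunchCommand(DriverDescription("MyApp",
//   Map("spark.master" -> "spark://host:7077", "spark.app.name" -> "MyApp")))
// => List(java, -Dspark.master=spark://host:7077, -Dspark.app.name=MyApp, MyApp)
```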
@andrewor14
Contributor Author

test this please!

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17009/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17009/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17054/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17054/consoleFull

// Cap the JVM permanent generation size
val permGenOpt = Seq("-XX:MaxPermSize=128m")

// Convert Spark properties to java system properties
val sparkOpts = command.sparkProps.map { case (k, v) => s"-D$k=$v" }
Contributor

Not super important, but Yarn only uses system properties for the configs needed to open the akka connection, and then transfers the whole config. See: https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala#L65

Might be worth doing the same here, or coalescing that logic somewhere.
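Roughly what that would look like here, as a hypothetical sketch (the actual set of bootstrap properties Yarn forwards is defined in the file linked above, not here):

```scala
// Hypothetical sketch: forward only the properties needed to bootstrap
// the akka connection (plus auth), and ship the remaining Spark config
// over that connection once the executor has registered.
def bootstrapProps(sparkProps: Map[String, String]): Map[String, String] =
  sparkProps.filter { case (k, _) =>
    k.startsWith("spark.akka") || k.startsWith("spark.auth")
  }
```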

Contributor

Yeah, I agree - it would be better to only set this for the driver; this buildJavaOpts is used for both executors and the driver.

Contributor Author

Note that the logic here is more equivalent to https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L388, because the driver needs basic configs like spark.master and spark.app.name to launch the SparkContext, in addition to just the akka and authentication configs.

But yes, when we launch the executors we might actually want to only use the akka and authentication ones, and pull in the rest from the driver later on, similar to how yarn handles it in your link.

Contributor

Yeah, I'd prefer not to change the executor code path here.

@vanzin
Contributor

vanzin commented Jul 23, 2014

Other than the test failure, LGTM.

@SparkQA

SparkQA commented Jul 24, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17100/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17100/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17156/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1538:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17156/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17163/consoleFull

@SparkQA

SparkQA commented Jul 25, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17163/consoleFull

@@ -45,7 +45,7 @@ private[spark] class SparkDeploySchedulerBackend(
       conf.get("spark.driver.host"), conf.get("spark.driver.port"),
       CoarseGrainedSchedulerBackend.ACTOR_NAME)
     val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
-    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
+    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions").toSeq
Contributor

Since this no longer goes through Utils.splitCommandString, I don't think it will work with options that are quoted.
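To illustrate the concern with a made-up option string (Utils.splitCommandString is Spark's shell-like tokenizer):

```scala
val opt: Option[String] = Some("""-Dfoo="bar baz" -Dverbose=true""")

// Utils.splitCommandString honors quoting and would produce two tokens:
//   Seq("-Dfoo=bar baz", "-Dverbose=true")

// Option.toSeq instead wraps the WHOLE string as a single element:
opt.toSeq  // Seq("-Dfoo=\"bar baz\" -Dverbose=true")
// which then reaches the JVM as one malformed argument.
```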

Contributor Author

Thanks. I need to verify this for both yarn and standalone.

Contributor Author

This actually handles quoted strings, spaces, backslashes, and a combination of all the above; I have tested this on a standalone cluster in both deploy modes. This works because we pass around these options as a sequence of strings before using them in commands.

I still need to verify the same for YARN.
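A minimal sketch of why the sequence-of-strings approach sidesteps escaping (assuming Java's ProcessBuilder, which standalone mode uses; the option values below are made up):

```scala
import scala.collection.JavaConverters._

// Each element of the sequence becomes exactly one argv entry in the
// child process -- no shell is involved, so nothing needs re-escaping.
val cmd = Seq(
  "java",
  "-Dspark.app.name=name with spaces",              // spaces survive intact
  """-Dspark.foo=has "quotes" and \backslashes""",  // so do quotes and backslashes
  "MyDriverMainClass"                               // hypothetical main class
)
val pb = new ProcessBuilder(cmd.asJava)  // pb.start() would launch it verbatim
```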

There is currently no good way to handle quoted arguments and
backslashes in YARN. The new code does not do any escaping, which
is fine for standalone mode (which uses Java's ProcessBuilder) but
not for YARN mode. I will open a separate JIRA for this.
@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17308/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17312/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17312/consoleFull

@andrewor14
Contributor Author

I have reverted my changes for YARN in this PR and instead filed a JIRA at SPARK-2718. As of the latest commit, this PR changes the behavior only for standalone mode.

I have tested the latest changes on a standalone cluster with quoted configs, backslashes, and spaces, for both normal Spark configs and spark.{executor/driver}.extraJavaOptions, in both deploy modes. This is ready to go from my side.

@SparkQA

SparkQA commented Jul 29, 2014

QA tests have started for PR 1538. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17324/consoleFull

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1538:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17324/consoleFull

@pwendell
Contributor

LGTM - thanks andrew!

@asfgit asfgit closed this in 4ce92cc Jul 30, 2014
@andrewor14 andrewor14 deleted the standalone-cluster branch August 2, 2014 02:28
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Andrew Or <andrewor14@gmail.com>

Closes apache#1538 from andrewor14/standalone-cluster and squashes the following commits:

8c11a0d [Andrew Or] Clean up imports / comments (minor)
2678d13 [Andrew Or] Handle extraJavaOpts properly
7660547 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
6f64a9b [Andrew Or] Revert changes in YARN
2f2908b [Andrew Or] Fix tests
ed01491 [Andrew Or] Don't go overboard with escaping
8e105e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
b890949 [Andrew Or] Abstract usages of converting spark opts to java opts
79f63a3 [Andrew Or] Move sparkProps into javaOpts
78752f8 [Andrew Or] Fix tests
5a9c6c7 [Andrew Or] Fix line too long
c141a00 [Andrew Or] Don't display "unknown app" on driver log pages
d7e2728 [Andrew Or] Avoid deprecation warning in standalone Client
6ceb14f [Andrew Or] Allow relevant configs to propagate to standalone Driver
7f854bc [Andrew Or] Fix test
855256e [Andrew Or] Fix standalone-cluster mode
fd9da51 [Andrew Or] Formatting changes (minor)