[SPARK-2260] Fix standalone-cluster mode, which was broken #1538
Conversation
The problem was that Spark properties were not propagated to the driver. The solution is simple: pass the properties as part of the driver description, so that the command that launches the driver automatically sets them as its Java system properties, which are then loaded by SparkConf.
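Roughly, the flow described above works like the sketch below (illustrative only, not the literal code in this patch; the master URL and app name are made-up values):

```scala
import org.apache.spark.SparkConf

// 1. Submitting side: turn the current Spark properties into -D java options.
val conf = new SparkConf().setMaster("spark://host:7077").setAppName("example")
val sparkJavaOpts: Seq[String] = conf.getAll.map { case (k, v) => s"-D$k=$v" }
// e.g. Seq("-Dspark.master=spark://host:7077", "-Dspark.app.name=example")
// These flags become part of the java command the Worker uses to launch the driver.

// 2. Driver side: a fresh SparkConf picks them back up, because its default
//    constructor loads every JVM system property that starts with "spark.".
val driverConf = new SparkConf()
```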
test this please!
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
}

    val permGenOpt = Seq("-XX:MaxPermSize=128m")

    // Convert Spark properties to java system properties
    val sparkOpts = command.sparkProps.map { case (k, v) => s"-D$k=$v" }
Not super important, but Yarn only uses system properties for configs needed to open the akka connection and then transfer the whole config. See: https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala#L65
Might be worth doing the same here, or coalescing that logic somewhere.
Yeah, I agree - it would be better to only set this for the driver; this buildJavaOpts is used for both executors and the driver.
Note that the logic here is more equivalent to https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L388, because the driver needs basic configs like spark.master and spark.app.name to launch the SparkContext, in addition to just akka and authentication configs.
But yes, when we launch the executors we might actually want to only use the akka and authentication ones, and pull in the rest from the driver later on, similar to how YARN handles it in your link.
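To make the distinction concrete, here is an illustrative sketch (not code from this PR; the bootstrap prefix list is an assumed placeholder, not Spark's actual list) of forwarding all configs to the driver but only connection-related ones to executors:

```scala
// Illustrative only: choose which -D options go on the launch command,
// depending on whether we are launching the driver or an executor.
def launchOpts(isDriver: Boolean, sparkProps: Seq[(String, String)]): Seq[String] = {
  // Assumed prefixes for configs needed just to connect back to the driver.
  val bootstrapPrefixes = Seq("spark.auth", "spark.akka", "spark.ssl")
  val selected =
    if (isDriver) {
      sparkProps // the driver needs spark.master, spark.app.name, etc. to start a SparkContext
    } else {
      // executors could get only the bootstrap configs and fetch the rest from the driver later
      sparkProps.filter { case (k, _) => bootstrapPrefixes.exists(k.startsWith) }
    }
  selected.map { case (k, v) => s"-D$k=$v" }
}
```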
Yeah, I'd prefer not to change the executor code path here.
Other than the test failure, LGTM.
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
Conflicts: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
@@ -45,7 +45,7 @@ private[spark] class SparkDeploySchedulerBackend(
       conf.get("spark.driver.host"), conf.get("spark.driver.port"),
       CoarseGrainedSchedulerBackend.ACTOR_NAME)
     val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
-    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
+    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions").toSeq
Since this no longer goes through Utils.splitCommandString, I don't think it will work with options that are quoted.
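A rough illustration of the concern (the example value is made up, and the exact output of Utils.splitCommandString is sketched rather than verified; Utils is private[spark], so this assumes code inside the org.apache.spark package):

```scala
import org.apache.spark.util.Utils

val raw = """-Dfoo="hello world" -XX:+UseG1GC"""

// Old path: tokenize with quote handling, so the value containing a space
// stays attached to its option - roughly Seq("-Dfoo=hello world", "-XX:+UseG1GC").
val tokenized = Utils.splitCommandString(raw)

// Path in this diff: Option#toSeq just wraps the raw string as a single element,
// so the whole quoted string is later treated as one argument.
val asSingleElement = Some(raw).toSeq
```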
Thanks. I need to verify this for both yarn and standalone.
This actually handles quoted strings, spaces, backslashes, and a combination of all the above; I have tested this on a standalone cluster in both deploy modes. This works because we pass around these options as a sequence of strings before using them in commands.
I still need to verify the same for YARN.
There is currently no good way to handle quoted arguments and backslashes in YARN. The new code does not do any escaping, which is fine for standalone mode (which uses Java's ProcessBuilder) but not for YARN mode. I will open a separate JIRA for this.
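A short sketch of why standalone mode gets away without escaping (class name and option values below are made up for illustration): ProcessBuilder receives each argument as a separate list element and hands it to the child process verbatim, with no shell in between.

```scala
import scala.collection.JavaConverters._

// Each option is its own element, so spaces and backslashes need no shell escaping.
val javaOpts = Seq("-Dspark.foo=value with spaces", "-Dspark.bar=back\\slash")
val command = Seq("java") ++ javaOpts ++ Seq("-cp", "app.jar", "my.DriverMain")

// ProcessBuilder passes each element through as one argv entry - no shell is involved.
val builder = new ProcessBuilder(command.asJava)
// builder.start() would launch the child process with the options intact.
```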
QA tests have started for PR 1538. This patch merges cleanly.
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
I have reverted my changes for YARN for this PR and instead filed a JIRA at SPARK-2718. As of the latest commit, this PR should change the behavior for only standalone mode. I have tested the latest changes on a standalone cluster with quoted configs, backslashes and spaces, for both normal spark configs and
QA tests have started for PR 1538. This patch merges cleanly.
QA results for PR 1538:
LGTM - thanks andrew!
The main thing was that spark configs were not propagated to the driver, and so applications that do not specify `master` or `appName` automatically failed. This PR fixes that and a couple of miscellaneous things that are related. One thing that may or may not be an issue is that the jars must be available on the driver node. In `standalone-cluster` mode, this effectively means these jars must be available on all the worker machines, since the driver is launched on one of them. The semantics here are not the same as `yarn-cluster` mode, where all the relevant jars are uploaded to a distributed cache automatically and shipped to the containers. This is probably not a concern, but still worth a mention.

Author: Andrew Or <andrewor14@gmail.com>

Closes apache#1538 from andrewor14/standalone-cluster and squashes the following commits:

8c11a0d [Andrew Or] Clean up imports / comments (minor)
2678d13 [Andrew Or] Handle extraJavaOpts properly
7660547 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
6f64a9b [Andrew Or] Revert changes in YARN
2f2908b [Andrew Or] Fix tests
ed01491 [Andrew Or] Don't go overboard with escaping
8e105e1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-cluster
b890949 [Andrew Or] Abstract usages of converting spark opts to java opts
79f63a3 [Andrew Or] Move sparkProps into javaOpts
78752f8 [Andrew Or] Fix tests
5a9c6c7 [Andrew Or] Fix line too long
c141a00 [Andrew Or] Don't display "unknown app" on driver log pages
d7e2728 [Andrew Or] Avoid deprecation warning in standalone Client
6ceb14f [Andrew Or] Allow relevant configs to propagate to standalone Driver
7f854bc [Andrew Or] Fix test
855256e [Andrew Or] Fix standalone-cluster mode
fd9da51 [Andrew Or] Formatting changes (minor)