
[SPARK-21403][Mesos] fix --packages for mesos #18587

Closed
wants to merge 1 commit

Conversation

skonto
Contributor

@skonto skonto commented Jul 10, 2017

What changes were proposed in this pull request?

Fixes the --packages flag for Mesos in cluster mode. Standalone and YARN will probably be handled in another commit; those cases are different and need further investigation.

How was this patch tested?

Tested with a community DC/OS 1.9 cluster. Packages were successfully resolved in cluster mode within a container.

@andrewor14 @susanxhuynh @ArtRand @srowen please review.

@skonto skonto changed the title [SPARK-12559][mesos] fix --packages for mesos [SPARK-12559][Mesos] fix --packages for mesos Jul 10, 2017
printErrorAndExit("Cluster deploy mode is not applicable to Spark Thrift server.")
case _ =>
}

Contributor Author


This is for failing faster: there is no need to resolve dependencies if we are going to fail anyway.
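For context, a minimal self-contained sketch of the fail-fast idea (the object and method names below are illustrative, not the actual SparkSubmit internals): the deploy mode / main class combination is validated before the expensive Ivy resolution triggered by --packages.

// Minimal sketch of the fail-fast idea; names are illustrative, not Spark's internals.
object FailFastSketch {
  private val ThriftServerClass = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"

  // Cheap validation runs first, before any dependency resolution.
  def validate(deployMode: String, mainClass: String): Unit = (deployMode, mainClass) match {
    case ("cluster", ThriftServerClass) =>
      sys.error("Cluster deploy mode is not applicable to Spark Thrift server.")
    case _ => // valid combination, continue
  }

  def main(args: Array[String]): Unit = {
    validate("cluster", "com.example.MyApp") // passes
    // ...only then do the expensive work, e.g. resolving --packages via Ivy.
  }
}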

@SparkQA

SparkQA commented Jul 10, 2017

Test build #79462 has finished for PR 18587 at commit 6139631.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Jul 11, 2017

I'm 99% sure there's nothing to do for YARN. This line takes care of it:

args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)

YARN cluster mode will distribute all jars in args.jars to the app.
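Roughly, that merge amounts to concatenating comma-separated jar lists; a hedged sketch (not the exact Spark helper) is below.

// Hedged sketch of what the merge amounts to (not the exact Spark helper):
// comma-separated jar lists are concatenated with blanks dropped, so the jars
// resolved from --packages travel alongside the user's --jars and get
// distributed by the YARN client like any other jar.
def mergeFileLists(lists: String*): String =
  lists.filter(s => s != null && s.trim.nonEmpty)
    .flatMap(_.split(","))
    .mkString(",")

// mergeFileLists("app-dep.jar", "ivy/kafka-clients-0.10.2.1.jar")
//   => "app-dep.jar,ivy/kafka-clients-0.10.2.1.jar"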

As for the change, it seems to work because the Mesos backend starts the driver using spark-submit, right? (It would probably have to change the deploy mode from "cluster" to "client" when doing that but I didn't dig that much into the code...)

If that's the case it seems fine, although it kinda loses the ability to use the ivy cache on the machine launching the job...

Also, I'd be more comfortable if someone more familiar with the Mesos backend could take a look. Not sure who that person is.

@skonto
Contributor Author

skonto commented Jul 12, 2017

@vanzin

args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)

Correct, and this is my understanding: files intended for the application master are handled by copying them to HDFS. The distributed cache manager does the work, utilizing the Hadoop distributed cache for the jars used on the executor side:
https://github.com/apache/spark/blob/ab9872db1f9c0f289541ec5756d1a142d85545ce/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala
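As a rough illustration of that mechanism (not the real ClientDistributedCacheManager; the helper below is hypothetical): a jar is uploaded once to a staging directory on HDFS, and YARN then localizes it on the executor nodes instead of shipping it from the submitting machine.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, only to illustrate the idea: upload a local jar once to an
// HDFS staging dir; YARN's distributed cache then localizes it on each node.
def stageJarToHdfs(localJar: String, stagingDir: String): Path = {
  val fs = FileSystem.get(new Configuration())
  val dest = new Path(stagingDir, new Path(localJar).getName)
  fs.copyFromLocalFile(new Path(localJar), dest)
  dest
}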

As for the change, it seems to work because the Mesos backend starts the driver using spark-submit, right?

That is the idea. The second time spark-submit is called, in order to launch the driver, the submission is done in client mode, which is the default. The call that launches the driver in client mode is here:

val (executable, sandboxPath) = if (dockerDefined) {
  // Application jar is automatically downloaded in the mounted sandbox by Mesos,
  // and the path to the mounted volume is stored in $MESOS_SANDBOX env variable.
  ("./bin/spark-submit", "$MESOS_SANDBOX")
} else if (executorUri.isDefined) {
  val folderBasename = executorUri.get.split('/').last.split('.').head
  val entries = conf.getOption("spark.executor.extraLibraryPath")
    .map(path => Seq(path) ++ desc.command.libraryPathEntries)
    .getOrElse(desc.command.libraryPathEntries)
  val prefixEnv = if (!entries.isEmpty) Utils.libraryPathEnvPrefix(entries) else ""
  val cmdExecutable = s"cd $folderBasename*; $prefixEnv bin/spark-submit"
  // Sandbox path points to the parent folder as we chdir into the folderBasename.
  (cmdExecutable, "..")
} else {
  val executorSparkHome = desc.conf.getOption("spark.mesos.executor.home")
    .orElse(conf.getOption("spark.home"))
    .orElse(Option(System.getenv("SPARK_HOME")))
    .getOrElse {
      throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
    }
  val cmdExecutable = new File(executorSparkHome, "./bin/spark-submit").getPath
  // Sandbox points to the current directory by default with Mesos.
  (cmdExecutable, ".")
}
val cmdOptions = generateCmdOption(desc, sandboxPath).mkString(" ")
val primaryResource = new File(sandboxPath, desc.jarUrl.split("/").last).toString()
val appArguments = desc.command.arguments.mkString(" ")
s"$executable $cmdOptions $primaryResource $appArguments"

If that's the case it seems fine, although it kinda loses the ability to use the ivy cache on the machine launching the job...

That is correct. If you launch jobs from a machine within the same network, the cache makes sense; otherwise it is just copying over the internet. The cache also matters on restarts, so that a re-launched job can fetch its artifacts from the cache instead of downloading them again. From what I can see, there is no functionality in Spark for the Mesos code to exploit any kind of cache. I would prefer a unified cluster layer for things like this, as in other frameworks: https://issues.apache.org/jira/browse/FLINK-6177
I guess some refactoring would make the YARN machinery accessible from the Mesos part of Spark. For distributing jars to the executors, the Mesos fetcher does the work, much like the Hadoop distributed cache does on YARN.

For now, I see this PR as a first step to fix things, because customers want it. As a follow-up iteration it would make more sense to have distributed cache support for the Mesos branch as well. As for the standalone case, I am not sure right now; probably the same cache-based approach would apply.

@skonto skonto changed the title [SPARK-12559][Mesos] fix --packages for mesos [SPARK-21403][Mesos] fix --packages for mesos Jul 13, 2017
@vanzin
Contributor

vanzin commented Jul 13, 2017

Sounds good. Merging to master.

@asfgit asfgit closed this in d8257b9 Jul 13, 2017
@skonto
Contributor Author

skonto commented Jul 13, 2017

Thanks @vanzin, I am working on some code for the standalone case.

ArtRand pushed a commit to d2iq-archive/spark that referenced this pull request Nov 28, 2017
Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>

Closes apache#18587 from skonto/fix_packages_mesos_cluster.