
[SPARK-21403][Mesos] fix --packages for mesos #18587

Closed
wants to merge 1 commit

Conversation

skonto
Contributor

@skonto skonto commented Jul 10, 2017

What changes were proposed in this pull request?

Fixes the --packages flag for Mesos in cluster mode. Standalone and YARN will probably be handled in another commit; those cases are different and need further investigation.

How was this patch tested?

Tested with a community DC/OS 1.9 cluster. Packages were successfully resolved in cluster mode within a container.

@andrewor14 @susanxhuynh @ArtRand @srowen please review.

@skonto skonto changed the title [SPARK-12559][mesos] fix --packages for mesos [SPARK-12559][Mesos] fix --packages for mesos Jul 10, 2017
printErrorAndExit("Cluster deploy mode is not applicable to Spark Thrift server.")
case _ =>
}

Contributor Author


This is for failing faster: there is no need to resolve dependencies if we are going to fail anyway.
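For context, a minimal self-contained sketch of the fail-fast idea (the object and method names below are illustrative, not the actual SparkSubmit internals): the deploy mode / main class combination is validated before the expensive Ivy resolution triggered by --packages.

// Minimal sketch of the fail-fast idea; names are illustrative, not Spark's internals.
object FailFastSketch {
  private val ThriftServerClass = "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"

  // Cheap validation runs first, before any dependency resolution.
  def validate(deployMode: String, mainClass: String): Unit = (deployMode, mainClass) match {
    case ("cluster", ThriftServerClass) =>
      sys.error("Cluster deploy mode is not applicable to Spark Thrift server.")
    case _ => // valid combination, continue
  }

  def main(args: Array[String]): Unit = {
    validate("cluster", "com.example.MyApp") // passes
    // ...only then do the expensive work, e.g. resolving --packages via Ivy.
  }
}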

@SparkQA

SparkQA commented Jul 10, 2017

Test build #79462 has finished for PR 18587 at commit 6139631.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Jul 11, 2017

I'm 99% sure there's nothing to do for YARN. This line takes care of it:

args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)

YARN cluster mode will distribute all jars in args.jars to the app.
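Roughly, that merge amounts to concatenating comma-separated jar lists; a hedged sketch (not the exact Spark helper) is below.

// Hedged sketch of what the merge amounts to (not the exact Spark helper):
// comma-separated jar lists are concatenated with blanks dropped, so the jars
// resolved from --packages travel alongside the user's --jars and get
// distributed by the YARN client like any other jar.
def mergeFileLists(lists: String*): String =
  lists.filter(s => s != null && s.trim.nonEmpty)
    .flatMap(_.split(","))
    .mkString(",")

// mergeFileLists("app-dep.jar", "ivy/kafka-clients-0.10.2.1.jar")
//   => "app-dep.jar,ivy/kafka-clients-0.10.2.1.jar"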

As for the change, it seems to work because the Mesos backend starts the driver using spark-submit, right? (It would probably have to change the deploy mode from "cluster" to "client" when doing that but I didn't dig that much into the code...)

If that's the case it seems fine, although it kinda loses the ability to use the ivy cache on the machine launching the job...

Also, I'd be more comfortable if someone more familiar with the Mesos backend could take a look. Not sure who that person is.

@skonto
Contributor Author

skonto commented Jul 12, 2017

@vanzin

args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)

Correct, and this is my understanding: files intended for the application master are handled by copying them to HDFS. The distributed cache manager does the work, utilizing the Hadoop distributed cache for the jars used on the executor side:
https://github.com/apache/spark/blob/ab9872db1f9c0f289541ec5756d1a142d85545ce/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala
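As a rough illustration of that mechanism (not the real ClientDistributedCacheManager; the helper below is hypothetical): a jar is uploaded once to a staging directory on HDFS, and YARN then localizes it on the executor nodes instead of shipping it from the submitting machine.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, only to illustrate the idea: upload a local jar once to an
// HDFS staging dir; YARN's distributed cache then localizes it on each node.
def stageJarToHdfs(localJar: String, stagingDir: String): Path = {
  val fs = FileSystem.get(new Configuration())
  val dest = new Path(stagingDir, new Path(localJar).getName)
  fs.copyFromLocalFile(new Path(localJar), dest)
  dest
}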

As for the change, it seems to work because the Mesos backend starts the driver using spark-submit, right?

That is the idea. The second time spark-submit is called, in order to launch the driver, the submission is done in client mode, which is the default. The call that launches the driver in client mode is here:

val (executable, sandboxPath) = if (dockerDefined) {
  // Application jar is automatically downloaded in the mounted sandbox by Mesos,
  // and the path to the mounted volume is stored in $MESOS_SANDBOX env variable.
  ("./bin/spark-submit", "$MESOS_SANDBOX")
} else if (executorUri.isDefined) {
  val folderBasename = executorUri.get.split('/').last.split('.').head
  val entries = conf.getOption("spark.executor.extraLibraryPath")
    .map(path => Seq(path) ++ desc.command.libraryPathEntries)
    .getOrElse(desc.command.libraryPathEntries)
  val prefixEnv = if (!entries.isEmpty) Utils.libraryPathEnvPrefix(entries) else ""
  val cmdExecutable = s"cd $folderBasename*; $prefixEnv bin/spark-submit"
  // Sandbox path points to the parent folder as we chdir into the folderBasename.
  (cmdExecutable, "..")
} else {
  val executorSparkHome = desc.conf.getOption("spark.mesos.executor.home")
    .orElse(conf.getOption("spark.home"))
    .orElse(Option(System.getenv("SPARK_HOME")))
    .getOrElse {
      throw new SparkException("Executor Spark home `spark.mesos.executor.home` is not set!")
    }
  val cmdExecutable = new File(executorSparkHome, "./bin/spark-submit").getPath
  // Sandbox points to the current directory by default with Mesos.
  (cmdExecutable, ".")
}
val cmdOptions = generateCmdOption(desc, sandboxPath).mkString(" ")
val primaryResource = new File(sandboxPath, desc.jarUrl.split("/").last).toString()
val appArguments = desc.command.arguments.mkString(" ")
s"$executable $cmdOptions $primaryResource $appArguments"

If that's the case it seems fine, although it kinda loses the ability to use the ivy cache on the machine launching the job...

That is correct. If you launch jobs from a machine within the same network, the cache makes sense; otherwise it is just copying over the internet. The cache also matters on restarts, so that a re-launched job can fetch its artifacts from the cache instead of downloading them again. From what I can see, there is no functionality in Spark for the Mesos code to exploit any kind of cache. I would prefer a unified cluster layer for things like this, as in other frameworks: https://issues.apache.org/jira/browse/FLINK-6177
I guess some refactoring would make the YARN machinery accessible from the Mesos part of Spark. For distributing jars to the executors, the Mesos fetcher does the work, much like the Hadoop distributed cache does on YARN.

For now, I see this PR as a first step to fix things, because customers want it. As a follow-up iteration it would make more sense to have distributed cache support for the Mesos branch as well. As for the standalone case, I am not sure right now; probably the same cache-based approach would apply.

@skonto skonto changed the title [SPARK-12559][Mesos] fix --packages for mesos [SPARK-21403][Mesos] fix --packages for mesos Jul 13, 2017
@vanzin
Contributor

vanzin commented Jul 13, 2017

Sounds good. Merging to master.

@asfgit asfgit closed this in d8257b9 Jul 13, 2017
@skonto
Contributor Author

skonto commented Jul 13, 2017

Thanks @vanzin, I am working on some code for the standalone case.

ArtRand pushed a commit to d2iq-archive/spark that referenced this pull request Nov 28, 2017
Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>

Closes apache#18587 from skonto/fix_packages_mesos_cluster.