
Block spark-submit call until job is complete #46

Closed

ash211 opened this issue Jan 25, 2017 · 6 comments


ash211 commented Jan 25, 2017

When running spark-submit in YARN cluster mode, the spark-submit script stays running until the Spark job completes, printing the application status every second:

2017-01-25 01:28:28,346 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:29,348 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:30,350 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:31,352 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:32,355 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:33,357 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:34,362 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:35,364 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:36,366 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:37,368 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:38,370 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:39,372 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)

We should have spark-submit in k8s cluster mode (currently the only supported mode) do the same -- block the call and poll the pod status until the pod terminates.

The blocking call seems required to match the YARN feature set, though as a possible extension we could stream driver logs instead of per-second status polling, using the example below. I for one would find that a great usability improvement over YARN cluster mode's behavior.

https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/PodLogExample.java
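
For illustration, here's a rough Scala sketch against the fabric8 client linked above of what streaming the driver pod's logs could look like -- the pod name and namespace are placeholders, not anything we've settled on:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object DriverLogStreamingSketch {
  def main(args: Array[String]): Unit = {
    val driverPodName = "spark-driver-example" // placeholder pod name
    val client = new DefaultKubernetesClient()
    try {
      // Stream the driver pod's combined stdout/stderr to our stdout.
      // watchLog returns a LogWatch that keeps streaming until closed.
      val logWatch = client.pods()
        .inNamespace("default")      // placeholder namespace
        .withName(driverPodName)
        .watchLog(System.out)
      // ... block here until the pod reaches a terminal state,
      // then close the watch ...
      logWatch.close()
    } finally {
      client.close()
    }
  }
}
```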

P.S. I'm also interested in making this call blocking so I can more accurately benchmark the same job on YARN vs. Kubernetes by running time spark-submit ... against both clusters.


lins05 commented Jan 25, 2017

It's controlled by the flag spark.yarn.submit.waitAppCompletion, which defaults to true.

https://github.com/apache/spark/blob/v2.1.0/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala#L125

We can add a similar flag for k8s to match that behavior.
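
Something along these lines, mirroring how the YARN entry is declared in config.scala -- the package and the spark.kubernetes.* key names below are just a guess at this point:

```scala
package org.apache.spark.deploy.kubernetes  // hypothetical package

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

private[spark] object config {
  // Hypothetical k8s analogue of spark.yarn.submit.waitAppCompletion.
  val WAIT_FOR_APP_COMPLETION =
    ConfigBuilder("spark.kubernetes.submit.waitAppCompletion")
      .doc("In cluster mode, whether to wait for the application to finish " +
        "before exiting the launcher process.")
      .booleanConf
      .createWithDefault(true)

  // Hypothetical analogue of spark.yarn.report.interval.
  val REPORT_INTERVAL =
    ConfigBuilder("spark.kubernetes.report.interval")
      .doc("Interval between reports of the current application status.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("1s")
}
```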

@iyanuobidele

+1
This is arguably a better approach than the fire-and-forget semantics used in Spark standalone mode, for example. Even better if we can pick the behavior based on a flag.


ash211 commented Jan 25, 2017

Yes, looking at the description of spark.yarn.submit.waitAppCompletion, that's exactly the behavior I'd like by default for k8s. In addition to a report interval like spark.yarn.report.interval, in k8s we could probably do log tailing as well.
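
Roughly, the submission client's wait loop could look something like this sketch (fabric8 client; the pod name, namespace, and terminal-state check are illustrative only):

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object WaitForDriverCompletionSketch {
  def main(args: Array[String]): Unit = {
    val driverPodName  = "spark-driver-example" // placeholder pod name
    val reportInterval = 1000L                  // would come from a report-interval config
    val client = new DefaultKubernetesClient()
    try {
      var phase = ""
      // Poll the driver pod and report its phase until it terminates,
      // similar to what yarn.Client logs every second in YARN cluster mode.
      while (phase != "Succeeded" && phase != "Failed") {
        val pod = client.pods().inNamespace("default").withName(driverPodName).get()
        phase = pod.getStatus.getPhase
        println(s"Application report for $driverPodName (state: $phase)")
        if (phase != "Succeeded" && phase != "Failed") {
          Thread.sleep(reportInterval)
        }
      }
    } finally {
      client.close()
    }
  }
}
```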


tnachen commented Jan 25, 2017

We should probably try to unify these settings across Spark; Mesos and standalone cluster mode could potentially both use this flag.


foxish commented Jan 25, 2017

+1 Printing the status every N seconds seems like a good idea.
I'm not sure we should also tail the driver logs. Kubernetes combines the stderr and stdout streams, and we'd be spewing a lot of stderr output that users may not be used to looking at.


tnachen commented Jan 25, 2017

I don't think we should tail the logs unless the user asks for it; it's quite verbose.
