
Block spark-submit call until job is complete #46

Closed

ash211 opened this issue Jan 25, 2017 · 6 comments


ash211 commented Jan 25, 2017

When running spark-submit in YARN cluster mode, the spark-submit script stays running until the Spark job completes, printing the application status every second:

2017-01-25 01:28:28,346 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:29,348 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:30,350 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:31,352 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:32,355 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:33,357 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:34,362 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:35,364 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:36,366 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:37,368 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:38,370 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)
2017-01-25 01:28:39,372 INFO  [main] yarn.Client (Logging.scala:logInfo(54)) - Application report for application_1485242805299_0010 (state: RUNNING)

We should have spark-submit in k8s cluster mode (currently the only supported mode) do the same -- block the call and poll the pod status until the pod terminates.

The blocking call seems required to match the YARN feature set, though as a possible extension we could stream driver logs instead of per-second status polling, using the example below. I for one would find that a great usability improvement over YARN cluster mode's behavior.

https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/PodLogExample.java
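
For illustration, here's a rough Scala sketch against the fabric8 client linked above of what streaming the driver pod's logs could look like -- the pod name and namespace are placeholders, not anything we've settled on:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object DriverLogStreamingSketch {
  def main(args: Array[String]): Unit = {
    val driverPodName = "spark-driver-example" // placeholder pod name
    val client = new DefaultKubernetesClient()
    try {
      // Stream the driver pod's combined stdout/stderr to our stdout.
      // watchLog returns a LogWatch that keeps streaming until closed.
      val logWatch = client.pods()
        .inNamespace("default")      // placeholder namespace
        .withName(driverPodName)
        .watchLog(System.out)
      // ... block here until the pod reaches a terminal state,
      // then close the watch ...
      logWatch.close()
    } finally {
      client.close()
    }
  }
}
```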

P.S. I'm also interested in making this call blocking so I can more accurately benchmark the same job on YARN vs. Kubernetes by running time spark-submit ... against both clusters.


lins05 commented Jan 25, 2017

It's controlled by the flag spark.yarn.submit.waitAppCompletion, which defaults to true.

https://github.com/apache/spark/blob/v2.1.0/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala#L125

We can add a similar flag for k8s to match that behavior.
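
Something along these lines, mirroring how the YARN entry is declared in config.scala -- the package and the spark.kubernetes.* key names below are just a guess at this point:

```scala
package org.apache.spark.deploy.kubernetes  // hypothetical package

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

private[spark] object config {
  // Hypothetical k8s analogue of spark.yarn.submit.waitAppCompletion.
  val WAIT_FOR_APP_COMPLETION =
    ConfigBuilder("spark.kubernetes.submit.waitAppCompletion")
      .doc("In cluster mode, whether to wait for the application to finish " +
        "before exiting the launcher process.")
      .booleanConf
      .createWithDefault(true)

  // Hypothetical analogue of spark.yarn.report.interval.
  val REPORT_INTERVAL =
    ConfigBuilder("spark.kubernetes.report.interval")
      .doc("Interval between reports of the current application status.")
      .timeConf(TimeUnit.MILLISECONDS)
      .createWithDefaultString("1s")
}
```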

@iyanuobidele

+1
This is arguably a better approach than the fire-and-forget semantics used in Spark standalone mode, for example. Even better if we can pick the behavior based on a flag.


ash211 commented Jan 25, 2017

Yes, looking at the description of spark.yarn.submit.waitAppCompletion, that's exactly the behavior I'd like by default for k8s. In addition to a report interval like spark.yarn.report.interval, in k8s we could probably do log tailing as well.
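
Roughly, the submission client's wait loop could look something like this sketch (fabric8 client; the pod name, namespace, and terminal-state check are illustrative only):

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object WaitForDriverCompletionSketch {
  def main(args: Array[String]): Unit = {
    val driverPodName  = "spark-driver-example" // placeholder pod name
    val reportInterval = 1000L                  // would come from a report-interval config
    val client = new DefaultKubernetesClient()
    try {
      var phase = ""
      // Poll the driver pod and report its phase until it terminates,
      // similar to what yarn.Client logs every second in YARN cluster mode.
      while (phase != "Succeeded" && phase != "Failed") {
        val pod = client.pods().inNamespace("default").withName(driverPodName).get()
        phase = pod.getStatus.getPhase
        println(s"Application report for $driverPodName (state: $phase)")
        if (phase != "Succeeded" && phase != "Failed") {
          Thread.sleep(reportInterval)
        }
      }
    } finally {
      client.close()
    }
  }
}
```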


tnachen commented Jan 25, 2017

We should probably try to unify these settings across Spark; Mesos and standalone cluster mode could potentially both use this flag.


foxish commented Jan 25, 2017

+1 Printing the status every N seconds seems like a good idea.
I'm not sure we should also tail the driver logs. Kubernetes combines the stderr and stdout streams, and we'd be spewing a lot of stderr output that users may not be used to looking at.


tnachen commented Jan 25, 2017

I don't think we should tail the logs unless the user asks for it; it's quite verbose.
