[SPARK-4899][MESOS] Support for Checkpointing on Coarse Grained Mode #17750
Conversation
Would be great to have this soon in 2.2.x (maybe even backported to 2.1.x).
IMO we should not enable checkpointing in fine-grained mode. With checkpointing enabled, Mesos agents would persist all status updates to disk, which means a large I/O cost, because fine-grained mode uses Mesos status updates to send the task results back to the driver. Also, I'm not sure whether it makes sense to set the …
Hey @lins05, thanks for taking the time to look into this. Yes, it is true that there is an associated overhead in both modes; that's why the defaults have not been changed, i.e. the default behavior is not to checkpoint. Setting … And considering that this is being used in the latest version, I guess the Spark driver does support it.
The overhead in fine-grained mode would be much heavier than in coarse-grained mode. For example, with checkpointing enabled, each time you run … In contrast, in coarse-grained mode the executor would send the 100MB of data to the driver directly, without going through the Mesos agents. The only things agents write to disk are small task status messages like TASK_RUNNING/TASK_KILLED, which are typically several kilobytes.
The code in your link is the Mesos cluster scheduler, which is a Mesos framework that launches Spark drivers for you, not the Mesos scheduler inside the Spark driver that launches executors. It has …
Do you then think it would be a viable option to enable it by default in coarse-grained mode and not use it in fine-grained mode?
This makes sense now; I definitely did not consider this, but it explains it.
Could you expand on this a bit more? I assume we could maintain the state of the tasks similar to how driver state is maintained in … I'll start implementing that, if you think we could enable it to reconcile tasks with state.
SGTM, especially considering fine-grained mode is already deprecated.
I don't think it's an easy task at all, because the Spark driver is not designed to recover from a crash.

The state in the MesosClusterScheduler is pretty simple. It's just a REST server that accepts requests from clients and launches Spark drivers on their behalf. It only needs to persist its Mesos framework id, because it must re-register with the Mesos master using the same framework id if it's restarted. In the current implementation, MesosClusterScheduler uses ZooKeeper as the persistent storage. Aside from that, the MesosClusterScheduler has no other stateful information.

The Spark driver is totally different, because it contains lots of stateful information: the job/stage/task info, executor info, and the catalog that holds temporary views, to name a few. All of those are kept in the driver's memory and would be lost whenever the driver crashes. So it doesn't make sense to set …
I am looking at solving a problem where an intermittent network partition can result in the driver being killed unnecessarily. It's possible that adding a failover_timeout will solve that, but I'm still looking into it.
Updated the PR to only include checkpointing on coarse-grained mode.
docs/running-on-mesos.md
Outdated
<td><code>spark.mesos.checkpoint</code></td>
<td>false</td>
<td>
If set, agents running tasks started by this framework will write the framework pid, executor pids and status updates to disk.
Let's customize this copy a bit for Spark instead of just copying the protobuf docs. e.g. "tasks" should be "executors" and you should remove the part about "this framework", in place of something about Spark in particular.
@@ -158,7 +158,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
       sc.appName,
       sc.conf,
       sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.webUrl)),
-      None,
+      sc.conf.getOption("spark.mesos.checkpoint").map(_.toBoolean),
We're trying to move all config over to https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala
Please add this there.
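A sketch of what the reviewer is asking for (names and paths assumed from the surrounding diff, not final): define the option once as a constant in config.scala, then read it through the typed accessor instead of a raw string lookup:

```scala
// In resource-managers/mesos/.../deploy/mesos/config.scala (sketch):
private[spark] val CHECKPOINT =
  ConfigBuilder("spark.mesos.checkpoint")
    .booleanConf
    .createOptional

// Then in MesosCoarseGrainedSchedulerBackend, instead of
// sc.conf.getOption("spark.mesos.checkpoint").map(_.toBoolean):
sc.conf.get(CHECKPOINT)  // yields Option[Boolean]
```

This keeps the config key and its type in one place, so typos in the string key can't silently diverge between call sites.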
@@ -78,8 +78,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
       sc.appName,
       sc.conf,
       sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.webUrl)),
-      Option.empty,
-      Option.empty,
+      None,
Better not to touch this.
see comments
docs/running-on-mesos.md
Outdated
@@ -520,7 +520,7 @@ See the [configuration page](configuration.html) for information on Spark config
<td><code>spark.mesos.checkpoint</code></td>
<td>false</td>
<td>
- If set, agents running tasks started by this framework will write the framework pid, executor pids and status updates to disk.
+ If set to true, the agents that are running the spark-executors will write framework pids (Spark), executor pids and status updates to disk.
nits:
s/spark/Spark
remove the '-'
remove "(Spark)". All of this data applies to Spark, not just the framework pid.
s/pids/pid (there's only one framework)
@@ -56,4 +56,9 @@ package object config {
     .stringConf
     .createOptional

+  private[spark] val CHECKPOINT =
+    ConfigBuilder("spark.mesos.checkpoint")
Please add `.doc` like the others.
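For reference, the pattern used by the other options in config.scala looks roughly like this; the `.doc` text here is illustrative, adapted from the docs wording under review, not final:

```scala
private[spark] val CHECKPOINT =
  ConfigBuilder("spark.mesos.checkpoint")
    .doc("If set to true, the Mesos agents running the Spark executors will " +
      "write the framework pid, executor pids and status updates to disk.")
    .booleanConf
    .createOptional
```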
LGTM. @srowen Can we get a merge? Thanks.
Actually, first, @gkc2104 can you please remove "fine-grained mode" from the PR title?
docs/running-on-mesos.md
Outdated
<td><code>spark.mesos.checkpoint</code></td>
<td>false</td>
<td>
If set to true, the agents that are running the Spark executors will write the framework pid, executor pids and status updates to disk.
nit: the mesos agents
@gkc2104 @mgummelt Will there be a separate issue & PR for adding the failover_timeout?
Ping @srowen, I think this PR is ready to merge.
Can one of the admins verify this patch?
@srowen @mgummelt @gkc2104 Just curious, why did we not merge this? Or has this feature been addressed elsewhere already? I couldn't find it anywhere in the latest codebase and documentation. My apologies in advance if this feature has already been merged; being a beginner in Spark, I am still unaware of all the features. Thanks.
Support for Mesos checkpointing
https://issues.apache.org/jira/browse/SPARK-4899
#60
What changes were proposed in this pull request?
Enabled checkpointing in coarse-grained mode.
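With the default left at false, a user would opt in at submit time roughly like this (the master URL, class name, and jar are placeholders, not from this PR):

```shell
spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.checkpoint=true \
  --class com.example.MyApp \
  my-app.jar
```

When the flag is true, the scheduler backend passes checkpointing into the Mesos FrameworkInfo, and agents persist the framework pid, executor pids, and status updates so executors can survive an agent restart.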
How was this patch tested?
Unit tests ensure that the correct SchedulerDriver is created.
Please review http://spark.apache.org/contributing.html before opening a pull request.