
Submission client should provide the driver with a pod template to use for executors #434

Closed
mccheah opened this issue Aug 14, 2017 · 4 comments

Comments

mccheah commented Aug 14, 2017

Currently we have a SparkPodInitContainerBootstrap construct: a module that can bootstrap an arbitrary pod with the init container that fetches dependencies. This code is used in both the submission client and the scheduler backend to bootstrap the init container into the driver and executor pods, respectively. However, to ensure the bootstrap is identical between the driver and the executors, the submission client passes internal configuration along to the driver to inform it of the name of a specific config map to load onto the executors. The scheduler backend in turn has to know how to interpret the configuration sent from the submission client and construct a SparkPodInitContainerBootstrap accordingly (sketched after the list below). This is not ideal for two reasons:

  1. The submission client and the driver code need to agree on these internal configurations and on how to interpret them. Upgrading the submission client therefore necessarily requires upgrading the driver image as well.
  2. One could argue that the scheduler backend is doing unnecessary work: it merely repeats the work of setting up the executor's pod spec YAML even though the submission client has already done the same for the driver. The scheduler backend is just translating Spark configuration from SparkConf key-value pairs into the pod spec that will be used for the executors.
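
To make the handoff concrete, here is a minimal sketch of that pattern in Scala. The config key name and the function names are hypothetical; the real constants live in the project's config definitions.

```scala
import org.apache.spark.SparkConf

// Hypothetical internal key; the real constant is defined in the project.
val EXECUTOR_INIT_CONTAINER_CONFIG_MAP =
  "spark.kubernetes.internal.executor.initContainerConfigMap"

// Submission client side: record the config map it created so the driver sees it.
def exportInitContainerConfig(conf: SparkConf, configMapName: String): SparkConf =
  conf.clone().set(EXECUTOR_INIT_CONTAINER_CONFIG_MAP, configMapName)

// Scheduler backend side: re-interpret the same key to rebuild an identical
// SparkPodInitContainerBootstrap for every executor pod it creates.
def importInitContainerConfigMap(conf: SparkConf): Option[String] =
  conf.getOption(EXECUTOR_INIT_CONTAINER_CONFIG_MAP)
```

Both sides must agree on the key and its meaning, which is exactly the coupling described in point 1.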

The issue is amplified by #393, because we intend to programmatically create another config map to load small files from the user's local disk. We can use the submission steps architecture to fix the driver accordingly, but the files still need to be mounted on the executors. Following the existing precedent would mean yet another block of bootstrap code that has to be abstracted out and shared between the driver and the executors. What's more, even though interpreting spark.files seems to be a responsibility of the submission client rather than the driver, the user would again need to upgrade the driver image alongside the submission client.

I'm suggesting here that the submission client should be responsible for providing some of the executor pod's specification directly. The submission client can build a loose template that the scheduler backend applies to every executor it builds. Note that this is similar to, but decidedly distinct from, #38: in that issue the user provides the executor pod's YAML specification, whereas here the YAML specification applied to the executors is determined by the submission client, based on the parameters passed to spark-submit.
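
As a rough sketch of what this could look like, assuming the fabric8 model classes the project already uses (the function names here are illustrative, not an existing API in the repo): the submission client emits a partial Pod carrying the submission-time decisions, and the scheduler backend copies it and fills in only the per-executor fields.

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Submission client: build a loose template capturing submission-time decisions
// (e.g. a volume for the init container's config map), leaving per-executor
// fields such as the pod name unset.
def buildExecutorPodTemplate(initContainerConfigMap: String): Pod =
  new PodBuilder()
    .withNewSpec()
      .addNewVolume()
        .withName("init-container-properties")
        .withNewConfigMap()
          .withName(initContainerConfigMap)
          .endConfigMap()
        .endVolume()
      .endSpec()
    .build()

// Scheduler backend: start from the template and add only what varies per
// executor, instead of re-deriving everything from SparkConf.
def buildExecutorPod(template: Pod, executorId: String): Pod =
  new PodBuilder(template)
    .withNewMetadata()
      .withName(s"spark-executor-$executorId")
      .endMetadata()
    .build()
```

The template could travel to the driver rendered as YAML rather than being re-encoded as SparkConf key-value pairs.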

This would also require a change to the submission steps API, because steps would now need to be able to specify customizations for the executor pod template.
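
One possible shape for that API change, with hypothetical type and method names (the actual step interface in the repo customizes the driver spec only): thread an executor pod template through the steps next to the driver pod, so each step can amend both in one place.

```scala
import io.fabric8.kubernetes.api.model.Pod

// Hypothetical: the spec threaded through the submission steps now carries an
// executor pod template alongside the driver pod the steps already customize.
case class KubernetesSubmissionSpec(
    driverPod: Pod,
    executorPodTemplate: Pod)

// Hypothetical revision of the step interface: a step may amend either pod.
trait SubmissionStep {
  def configure(spec: KubernetesSubmissionSpec): KubernetesSubmissionSpec
}
```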

The significant concern with this kind of design is that it becomes unclear what the scheduler backend is responsible for building versus what the submission client is responsible for building. For example, should the submission client build the part of the executor template that specifies the amount of resources an executor needs? The division of responsibilities between what the driver applies and what the submission client applies may end up being uncomfortably arbitrary.


mccheah commented Aug 14, 2017

@aash @erikerlandson @ifilonenko @foxish for thoughts. I'm undecided on whether this is a philosophically sound approach.

erikerlandson (Member) commented

Is the idea that a template rendered as YAML, passed into the cluster from the submission client, would be a way to reduce the duplicated effort?


mccheah commented Aug 15, 2017

Something like that... though the more I think about this, the more I don't think it's the right approach, at least not yet. @ifilonenko might have thoughts on whether this would also help with the Kerberos work we've done, since that has a similar problem.

mccheah closed this as completed Aug 29, 2017

mccheah commented Aug 29, 2017

Closing because this wasn't yet necessary to accomplish what we needed to do.
