This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Support custom YAML for the driver pod spec #38

Open
mccheah opened this issue Jan 23, 2017 · 10 comments
@mccheah

mccheah commented Jan 23, 2017

We started with our base implementation, which hard-codes the pod spec to have specific fields. We then thought of the idea of supporting custom labels. However, there are plenty of other things in an arbitrary user's application that could be useful to customize, such as ports and mounted volumes.

This issue therefore proposes that we support users providing arbitrary YAML files that describe the pod, or at least any modifications and augmentations they would like to make. We need to be careful when considering the API and expectations here. One mode of operation could be that the user specifies a custom file for the pod via --driver-pod-spec-file or an equivalent SparkConf. We can then take the user's pod spec and augment it with whatever is missing, for example by adding the Spark UI and REST submission server ports, which are required for Spark.

The tricky part is that the user has to specify which container is running the Spark driver submission server, since that container is the one that needs the custom ports open. Thus we should probably support adding --driver-container, which must be set if the custom pod spec is set, so that we know which container we need to adjust to add the missing ports, etc. I don't know if there is any way we can make this easy to use, but in a sense usability seems to be a secondary issue here: I anticipate this will primarily be used for specific off-roading "power-user" scenarios.
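To make the augmentation idea concrete, here is a minimal sketch in Python of what "take the user's pod spec and add whatever is missing" could look like. It is not the project's implementation. The function name `augment_driver_pod`, the port names, and the submission server port number are assumptions for illustration; only 4040 is Spark's documented default UI port. The pod spec is represented as an already-parsed dictionary, so no YAML library is needed.

```python
import copy

# Assumed ports, for illustration only. 4040 is Spark's default UI port;
# the submission server port number is a placeholder, not a project constant.
REQUIRED_DRIVER_PORTS = {"spark-ui": 4040, "submission": 7077}


def augment_driver_pod(pod_spec, driver_container):
    """Return a copy of pod_spec with any missing required ports added
    to the container named by driver_container (hypothetical helper)."""
    pod = copy.deepcopy(pod_spec)  # never mutate the user's spec
    containers = pod.get("spec", {}).get("containers", [])
    matches = [c for c in containers if c.get("name") == driver_container]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one container named %r, found %d"
            % (driver_container, len(matches))
        )
    ports = matches[0].setdefault("ports", [])
    declared = {p.get("containerPort") for p in ports}
    for name, number in REQUIRED_DRIVER_PORTS.items():
        if number not in declared:
            ports.append({"name": name, "containerPort": number})
    return pod


# A user-supplied spec with a driver container and a side-car.
user_pod = {
    "spec": {
        "containers": [
            {
                "name": "spark-driver",
                "ports": [{"name": "spark-ui", "containerPort": 4040}],
            },
            {"name": "metrics-sidecar"},
        ]
    }
}
augmented = augment_driver_pod(user_pod, "spark-driver")
```

Only the named driver container is touched; side-cars are left exactly as the user wrote them, which is why the second option is needed once a custom spec is allowed.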

mccheah changed the title from "Support custom YAML for the pod spec" to "Support custom YAML for the driver pod spec" on Jan 23, 2017
@mccheah
Author

mccheah commented Jan 23, 2017

@foxish @erikerlandson curious as to your thoughts on this.

Could also be worthwhile for executors to support side-car containers - think custom metrics and reporting, etc.

@foxish
Member

foxish commented Jan 24, 2017

This is something Eric Tune and I had discussed earlier. It is a use case we want to support, but I think we should defer implementing this till we have the "default" specifications of driver and executor pods nailed down.

@mccheah
Author

mccheah commented Jan 24, 2017

+1 - probably not for phase 1 then but for down the line.

@erikerlandson
Member

We've done prototype work with side-car containers in a master pod for supporting carbon & graphite sinks for Spark's metrics. I'm not sure if something similar is needed for executors, but I can see how it might be.

I agree it's not a high priority.

On that topic, does the ability to customize YAML on the driver imply the potential to add containers in the driver pod?

@mccheah
Author

mccheah commented Jan 24, 2017

Yeah, side-cars in the driver would be supported as well. We'd need to identify which container is the actual driver container, hence the suggestion for a second config option to denote that.

@liyinan926
Member

I want to bring this issue up again, as I feel that now that we have more requirements for customizing the driver/executor pods, with issues like #393, #397, #299, etc., it's the right time to rethink this. I have some thoughts below on the use of YAML pod templates specifically. Whether to use YAML templates, PodPresets, or even something else remains a question.

  • We still support the individual configuration properties that, when set, override the same aspect set in the template. So individual configuration properties are considered more specific and should always take precedence.
  • We can limit what can be set in the template to those aspects that have corresponding configuration properties, i.e., aspects that are currently already configurable by the users, e.g., name, labels, annotations, memory, cpu cores, etc.
  • We can add validation logic in the submission client to make sure that templates don't contain aspects that are not allowed to be set in the template. We may relax this in the future if we have a good notion of a default specification.

With those, we really just use YAML templates for things that can be overridden by individual configuration properties. However, having the option to use templates saves users a lot of effort in setting individual properties repeatedly, while still offering flexibility through overriding by the individual properties. Thoughts on this?
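The three rules above (properties win over the template, the template is limited to already-configurable aspects, and the submission client validates it) can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a proposed implementation: the names `ALLOWED_TEMPLATE_FIELDS`, `validate_template`, and `resolve_pod_settings` are hypothetical, and the template is simplified to a flat dictionary of settings rather than a full pod spec.

```python
# Hypothetical allow-list: template aspects that already have a matching
# individual configuration property.
ALLOWED_TEMPLATE_FIELDS = {"name", "labels", "annotations", "memory", "cores"}


def validate_template(template):
    """Reject templates that set aspects users cannot already configure."""
    disallowed = sorted(set(template) - ALLOWED_TEMPLATE_FIELDS)
    if disallowed:
        raise ValueError(
            "template sets fields that are not allowed: " + ", ".join(disallowed)
        )


def resolve_pod_settings(template, properties):
    """Start from the template, then let explicitly set properties override it."""
    validate_template(template)
    resolved = dict(template)
    # Properties are more specific, so they take precedence when set.
    resolved.update({k: v for k, v in properties.items() if v is not None})
    return resolved


template = {"name": "etl-driver", "memory": "2g", "labels": {"team": "data"}}
properties = {"memory": "4g", "cores": 2}
resolved = resolve_pod_settings(template, properties)
```

Here `memory` comes from the property, while `name` and `labels` come from the template. A template that set, say, `volumes` would be rejected by `validate_template` until the allow-list is relaxed.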

@mccheah
Author

mccheah commented Jul 27, 2017

I think @foxish suggested using Pod Presets for this instead, in which case there would actually be no work to do on our part.

@liyinan926
Member

liyinan926 commented Jul 27, 2017

Yes, we are also considering PodPresets and having discussions on that. But it's not quite ready yet (it is currently alpha, so it is not guaranteed to be available on a cluster) and needs certain things to be enabled before it can be used. That's why I brought this up again, just to start discussions on the feasibility of using YAML templates as a potential solution.

@mccheah
Author

mccheah commented Jul 27, 2017

What's the timeline for Pod Presets to move from alpha to beta status?

@erikerlandson
Member

Doing it with YAML (as opposed to PodPresets, which of course are also YAML) feels like it will be an unguarded chainsaw kind of tool. Trying to make it both safe and easy to explain would be a hard needle to thread. From the POV that it's for power users anyway, that isn't necessarily a deal breaker, but at the very least it would have to be documented as a dangerous-power-tool category of feature.

My inclination is to wait for pod presets, although others might be feeling varying levels of urgency.
