This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Move executor and driver commands from dockerfile to scheduler #60

Conversation

tnachen

@tnachen tnachen commented Jan 27, 2017

What changes were proposed in this pull request?

Move the executor and driver commands from the Dockerfile into the scheduler code, so that it is easier to see and modify the commands without having to update the Dockerfile every time the scheduler changes.

How was this patch tested?

integration-tests

@mccheah

mccheah commented Jan 27, 2017

@tnachen we had a discussion about this a while back actually, when work was done on an older fork: foxish#7

I'm happy to discuss this some more, but I lean towards keeping the entire command in the Dockerfile, and having the client strictly parameterize the call via environment variables (see the sketch after the list below). This is for two primary reasons:

  1. Customization: If the user wishes to specify their own launch bootstrapping process, they have the flexibility to override the CMD in the Dockerfile for their specific use case. With this change, the user is tied to the specific command decided by spark-submit. I suppose this is partly remedied if we introduce Support custom YAML for the driver pod spec #38, but ideally users wouldn't have to specify their entire pod YAML if they just want to change the image and the logic of the command that is run.
  2. Visibility: The Dockerfile is an easily understood specification of what the driver and executors are expected to do, and of what would need to be done if the Docker image were to be extended. The Dockerfile is also easy to get hold of, since we would foreseeably provide it in the Spark distribution tarball - see Copy the Dockerfiles from docker-minimal-bundle into the distribution. #12. Even if we implemented Support custom YAML for the driver pod spec #38, it would remain unclear to users what the expected contract is for custom driver and executor images unless they looked at the Spark source code.
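To make the environment-variable approach above concrete, here is a rough sketch of what the scheduler side could look like, reusing the fabric8 builders this code already uses. The env var names are ones we already rely on; the values, helper name, and container name are placeholders, not the actual implementation:

    import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, EnvVarBuilder}

    // Sketch only: the image's CMD is left untouched; spark-submit simply injects
    // the parameters that the baked-in command expects as environment variables.
    def parameterizedDriverContainer(image: String): Container = {
      val submissionEnv = Seq(
        new EnvVarBuilder()
          .withName("SPARK_SUBMISSION_SECRET_LOCATION")          // existing key, placeholder value
          .withValue("/var/run/secrets/spark-submission/secret")
          .build(),
        new EnvVarBuilder()
          .withName("SPARK_DRIVER_LAUNCHER_SERVER_PORT")         // existing key, placeholder value
          .withValue("7077")
          .build())
      new ContainerBuilder()
        .withName("spark-kubernetes-driver")                     // placeholder container name
        .withImage(image)
        // No .withCommand(...) / .withArgs(...): the Dockerfile's CMD decides what runs
        // and reads the environment variables above.
        .withEnv(submissionEnv: _*)
        .build()
    }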

It is true that, with more and more features being added, this Dockerfile is getting more complex; see, for example, https://github.com/apache-spark-on-k8s/spark/pull/30/files#diff-1876654a7b8b8a0ad1f8252da08e5e36R22. Unfortunately, the Dockerfile's requirement to express all of the logic in a single command makes bundling all of that functionality there difficult.

An alternative proposal was to package a script that encapsulates all of the logic. This idea seems more favorable now than before, because the approach to distribution has shifted to shipping the Dockerfiles with the full build context, as opposed to shipping the Docker images to a public registry. This difference matters because, if we were providing the Docker images, the Dockerfile's command would invoke the script but it would be difficult for users to find out what the script actually did behind the scenes without visiting the Spark code repository. Fortunately, since we're now providing the entire build context in the Spark distribution, users can use the scripts included in the Spark tarball as a baseline reference when they want to extend the images.

@tnachen
Author

tnachen commented Jan 28, 2017

Thanks for the explanation and the background. The reason I proposed this change in the first place is that I think it will be more difficult for us to maintain the integration if we bake the command into the Docker image, especially since we don't really own the image that the user is potentially going to run. If we ever want to change or add parameters around the REST server, or change the class that runs in the driver pod, it becomes a public announcement and a burden for us to ask everyone to rewrite their Dockerfiles and to troubleshoot the results. Arguably, I'm not sure that letting the user see the exact command we run in the Dockerfile really gives them more flexibility, since the parameters are already controlled by us.

@foxish
Member

foxish commented Jan 28, 2017

Dockerfile CMD/ENTRYPOINT are typically used to allow images to run on systems other than k8s, and it is common for us to override them using the cmd/args in the container spec. We do not control the image, so it would make sense to have as small a dependency on it as possible.

I suppose this is partly remedied if we introduce #38 but ideally users wouldn't have to specify their entire pod YAML if they just want to change the image and the logic of the command that is run.

@mccheah, I am not sure I understand this point. What kinds of customization do we expect in the image? The way I see it, the user may want to change what packages are installed, or get some other libraries onto the image, but would still have to execute the same command. As for the logic of the command being run, I also don't see how that could be changed under either model. Currently, we depend on environment variables that are opaquely included in the pod spec by our scheduler, such as SPARK_SUBMISSION_SECRET_LOCATION and SPARK_DRIVER_LAUNCHER_SERVER_PORT.

The visibility aspect is a good point. If we want to leave some trace of what the contract is within the Dockerfile itself, we could set the ENTRYPOINT to be the class we want to execute and supply the various arguments from within our scheduler code. Perhaps that could be a good middle ground here? I can see many parallels, like https://github.com/wangqiang8511/docker-spark-mesos and https://github.com/berngp/mesos-spark-docker

@erikerlandson
Member

One example of customization that is very relevant to us is adding nss-wrapper logic, which allows Spark to run on OpenShift. We also do other kinds of setup for configuration. Moving the commands into the code would take away a lot of options for customizing behavior prior to actually invoking the Spark commands.

@tnachen
Author

tnachen commented Jan 30, 2017

If you are just adding a wrapper, then you simply need to add a different entrypoint, and you can still add whatever you want.

@mccheah

mccheah commented Jan 30, 2017

How would one add a different entry point when spark-submit is setting the entry point directly? Is the entry point different from the command?

@foxish
Member

foxish commented Jan 30, 2017

@mccheah, the ENTRYPOINT is different from the CMD, and typically used to specify the command (devoid of args). The arguments to it can be specified from the pod spec. This way, if the user wants to wrap our executable in their own script, they can go ahead and do that (by changing the entrypoint), and ensure that they forward the necessary arguments to the driver/executor class from their own receiver class.

The ENTRYPOINT in a Dockerfile is overridden by spec.containers[N].command, and the CMD by spec.containers[N].args, in the pod spec.
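As a purely illustrative sketch of that mapping with the fabric8 builders: setting only the args on the container leaves the image's ENTRYPOINT in effect, so a user's wrapper entrypoint would still receive the scheduler-provided arguments. The image name and argument values below are placeholders, not what the scheduler actually passes:

    import io.fabric8.kubernetes.api.model.ContainerBuilder

    // Illustrative only: no .withCommand(...), so the Dockerfile's ENTRYPOINT
    // (or a user's custom wrapper entrypoint) is kept, and Kubernetes appends
    // the args below to it.
    val driverContainer = new ContainerBuilder()
      .withName("spark-kubernetes-driver")          // placeholder name
      .withImage("spark-driver:latest")             // placeholder image
      .withArgs(
        "org.apache.spark.deploy.rest.kubernetes.KubernetesSparkRestServer",
        "--hostname", "0.0.0.0")                    // placeholder arguments
      .build()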

@mccheah

mccheah commented Jan 30, 2017

Right, but this PR makes spark-submit hardcode the command in the pod spec, which would override anything the user sets in the ENTRYPOINT.

@foxish
Member

foxish commented Jan 31, 2017

Ah, I see what you mean. I think we have agreement on keeping the executable in the ENTRYPOINT and the arguments in spec.containers[N].args. The PR needs an update to reflect our current thinking, @tnachen.

@tnachen
Author

tnachen commented Jan 31, 2017

Only overriding args is fine with me too, although I thought that if you only override the command, it still uses the same entrypoint from the Docker image and just ends up running ENTRYPOINT (from the Docker image) + COMMAND (from the Spark code).

@mccheah

mccheah commented Jan 31, 2017

I think we can move down that path. We might run into issues down the line if the user depends on the specific arguments being passed (e.g. their custom script expects --use-ssl and we change it to ssl-enabled). We can revisit this as the customization use cases become better defined in practice.

@tnachen watch out for #65, we might run into conflicts since I'm changing some of the config keys and env vars there.

@tnachen tnachen force-pushed the k8s-command branch 2 times, most recently from 6bc24e3 to 5ef78ae on February 3, 2017 13:50
@tnachen
Author

tnachen commented Feb 4, 2017

@mccheah @foxish Just updated this PR and rebased. PTAL

@mccheah mccheah left a comment

I think we discussed above that we only want to override the arguments. Can the PR be updated to reflect that direction?

val containerPorts = buildContainerPorts()
val probePingHttpGet = new HTTPGetActionBuilder()
  .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")
  .withPath("/v1/submissions/ping")
  .withNewPort(SUBMISSION_SERVER_PORT_NAME)
  .build()
val args = mutable.Buffer[String]()
args ++= Seq(
  "bin/spark-class", "org.apache.spark.deploy.rest.kubernetes.KubernetesSparkRestServer",

The command bin/spark-class should be in the Dockerfile, yes?

Author

Yes, that's similar to how Mesos works as well (although there is a bit more path wrangling).

@@ -189,6 +178,17 @@ private[spark] class KubernetesClusterSchedulerBackend(
  .withContainerPort(port._2)
  .build()
})
val args = scala.collection.mutable.Buffer[String]()
args ++= Seq(
  "${JAVA_HOME}/bin/java", s"-Dspark.executor.port=${executorPort.toString}",

Similarly here - at least ${JAVA_HOME}/bin/java should stay in the Dockerfile, but I would like the Dockerfile to handle the specification of the main class as well. That being said, it's tricky to separate out the various parts of the java command in this context, seeing as the VM options have to come before the main class.
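For example (illustrative only, not this PR's code), if the image's entrypoint were just ${JAVA_HOME}/bin/java, the scheduler-built arguments would have to preserve JVM ordering, with the VM options before the main class and the application arguments after it; the helper below is hypothetical and only shows that ordering constraint:

    // Hypothetical sketch: arguments appended after a bare `java` entrypoint must
    // arrive in JVM order: VM options first, then the main class, then its arguments.
    def executorJvmArgs(executorPort: Int, executorMemoryMb: Int, driverUrl: String): Seq[String] =
      Seq(
        s"-Dspark.executor.port=$executorPort",                     // VM options first
        s"-Xms${executorMemoryMb}m",
        s"-Xmx${executorMemoryMb}m",
        "org.apache.spark.executor.CoarseGrainedExecutorBackend",   // then the main class
        "--driver-url", driverUrl)                                  // then the application arguments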

Author

Letting the Dockerfile specify the main class is exactly what I wanted to avoid; then we become coupled to a particular version of the Dockerfile.

@tnachen
Author

tnachen commented Feb 11, 2017

I see. I thought that when we mentioned "arguments" it meant passing the command as arguments to K8s, but it sounds like we want to have an entrypoint and then tack on the arguments in the scheduler. This sounds trickier, since users then cannot override the entrypoint (what @erikerlandson wants) anymore if we only override arguments.

@tnachen
Author

tnachen commented Feb 11, 2017

Btw, if we don't want to merge this, since this is IMO more about future maintenance, I'll just close it for now, as I see we're changing the command part in the scheduler quite often.

@mccheah

mccheah commented Feb 13, 2017

This sounds trickier, since users then cannot override the entrypoint (what @erikerlandson wants) anymore if we only override arguments.

@tnachen @erikerlandson I think the approach as outlined here is harder for users to customize. My understanding is that if we put the arguments in the scheduler but the Dockerfile handles the base CMD / ENTRYPOINT that receives those arguments, that gives users the flexibility to provide their own Dockerfile which overrides the CMD but still receives the same scheduler-provided arguments. The custom CMD can handle the scheduler-provided arguments in its own way, most likely by forwarding them to what the default Dockerfile CMD specifies.

@ash211

ash211 commented Mar 16, 2017

Please rebase onto branch-2.1-kubernetes and send this PR into that branch instead of k8s-support-alternate-incremental which is now deprecated.

@erikerlandson
Member

Since this has been resolved in favor of assuming the commands are baked into the images, it seems best to close this.
