Move executor and driver commands from dockerfile to scheduler #60
Conversation
@tnachen we had a discussion about this a while back actually, when work was done on an older fork: foxish#7. I'm happy to discuss this some more, but I lean towards keeping the entire command in the Dockerfile, and having the client strictly parameterize the call via environment variables. This is for two primary reasons:
It is true that with more and more features being added this Dockerfile is getting more complex; see, for example, https://github.com/apache-spark-on-k8s/spark/pull/30/files#diff-1876654a7b8b8a0ad1f8252da08e5e36R22. Unfortunately the Dockerfile's requirement to include all of the logic in a single command makes bundling all the functionality there difficult. An alternative proposal was to package a script which encapsulates all the logic. This idea seems more favorable now than before, because the approach to distribution has shifted to shipping the Docker files with the full build context, as opposed to shipping the Docker images to a public registry. This difference in distribution matters because if we were providing the Docker images, the Dockerfile's command would invoke the script, but it would be difficult for users to find out what the script actually did behind the scenes without visiting the Spark code repository. Fortunately, since we're now providing the entire build context in the Spark distribution, they can use the scripts included in the Spark tarball as a baseline reference when they want to extend the images.
Thanks for the explanation and the background. The reason I proposed this change in the first place is that I think it's going to be more difficult for us to maintain the integration if we bake the command into the docker image, especially since we don't really own the image that the user is potentially going to run. If we ever want to change or add parameters around the rest server, or if we want to change the class to run when running the driver pod, it becomes a public announcement and a burden for us to tell everyone to rewrite their dockerfiles and troubleshoot. Arguably I am not sure having the user see the exact command we run in the dockerfile really gives them more flexibility, since the parameters are already controlled by us?
Dockerfile CMD/ENTRYPOINT are typically used to allow for images that run on systems other than k8s, and it is common for us to override those using the cmd/args in the container spec. We do not control the image, so it would make sense to have as small a dependency on it as possible.
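As an illustration of that override mechanism, here is a minimal sketch (not code from this PR) using the fabric8 client the scheduler already relies on; the image name and the command/argument values are placeholders:

```scala
import io.fabric8.kubernetes.api.model.ContainerBuilder

// Sketch: in a Kubernetes container spec, `command` replaces the image's ENTRYPOINT
// and `args` replaces its CMD; leaving `command` unset keeps the image's own ENTRYPOINT.
val driverContainer = new ContainerBuilder()
  .withName("spark-driver")
  .withImage("spark-driver:latest") // placeholder image name
  .withCommand("bin/spark-class")   // would replace the image's ENTRYPOINT
  .withArgs("org.apache.spark.deploy.rest.kubernetes.KubernetesSparkRestServer") // would replace CMD
  .build()
```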
@mccheah, I am not sure I understand this point. What kinds of customization do we expect in the image? The way I see it, the user may want to change what packages are installed, or want to get some other libraries on the image, but would still have to execute the same command. As for the logic of the command being run, I also don't see how that could be changed under either model. Currently, we depend on environment variables which are opaquely included within the pod spec by our scheduler, such as SPARK_SUBMISSION_SECRET_LOCATION and SPARK_DRIVER_LAUNCHER_SERVER_PORT. The visibility aspect is a good point. If we want to leave some trace of what the contract is within the Dockerfile itself, we could set the
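For concreteness, a hedged sketch (not the actual scheduler code; the secret path and port value are placeholders) of how environment variables like the ones mentioned above end up on the container spec via the fabric8 builder:

```scala
import io.fabric8.kubernetes.api.model.ContainerBuilder

// Sketch: the scheduler parameterizes the image's baked-in command through environment
// variables on the container spec instead of rewriting the command itself.
val submissionContainer = new ContainerBuilder()
  .withName("spark-submission-server")
  .withImage("spark-driver:latest") // placeholder image name
  .addNewEnv()
    .withName("SPARK_SUBMISSION_SECRET_LOCATION")
    .withValue("/var/run/secrets/spark-submission/submission-secret") // placeholder path
    .endEnv()
  .addNewEnv()
    .withName("SPARK_DRIVER_LAUNCHER_SERVER_PORT")
    .withValue("7077") // placeholder port
    .endEnv()
  .build()
```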
One example of customization that is very relevant to us is adding nss-wrapper logic which allows spark to run in openshift. We also do other kinds of setup for configurations. Moving the commands into the code would take away a lot of options for customizing behavior prior to actually invoking the spark commands.
If you are just adding a wrapper, then you simply need to add a different entry point, and you can still add what you want.
How would one add a different entry point when
@mccheah, the ENTRYPOINT in a dockerfile is overridden by
Right, but this PR makes
Ah, I see what you mean. I think we have agreement on keeping the executable in ENTRYPOINT and arguments in
Only overriding args is fine with me too, although I thought if you only override command it still uses the same entrypoint from the docker image and just ends up running ENTRYPOINT (docker image) + COMMAND (spark code).
I think we can move down that path. We might run into issues down the line if the user expects to pass on the arguments specifically (e.g. their custom script expects
@tnachen watch out for #65, we might run into conflicts since I'm changing some of the config keys and env vars there.
I think we discussed above that we only want to override the arguments. Can the PR be updated to reflect that direction?
val containerPorts = buildContainerPorts()
val probePingHttpGet = new HTTPGetActionBuilder()
  .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")
  .withPath("/v1/submissions/ping")
  .withNewPort(SUBMISSION_SERVER_PORT_NAME)
  .build()
val args = mutable.Buffer[String]()
args ++= Seq(
  "bin/spark-class", "org.apache.spark.deploy.rest.kubernetes.KubernetesSparkRestServer",
The command bin/spark-class should be in the Dockerfile, yes?
Yes, that's similar to how Mesos works as well (although there is a bit more path wrangling).
@@ -189,6 +178,17 @@ private[spark] class KubernetesClusterSchedulerBackend(
      .withContainerPort(port._2)
      .build()
  })
val args = scala.collection.mutable.Buffer[String]()
args ++= Seq(
  "${JAVA_HOME}/bin/java", s"-Dspark.executor.port=${executorPort.toString}",
Similarly here - at least JAVA_HOME/bin/java, but I would like the Dockerfile to at least handle the specification of the main class as well. That being said, it's tricky to separate out the various parts of the java command in this context, seeing as the VM options have to come before the main class.
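To make that ordering constraint concrete, a small sketch (the option values, classpath, and main class are illustrative, not what this PR actually emits): whichever side supplies the main class effectively has to assemble the whole line, because JVM options must precede it and application arguments must follow it.

```scala
// Sketch: JVM options go before the main class, application arguments after it,
// so the full java command line has to be put together in one place.
val javaOpts  = Seq("-Dspark.executor.port=7079", "-Xms1g", "-Xmx1g")                      // illustrative
val mainClass = "org.apache.spark.executor.CoarseGrainedExecutorBackend"                   // illustrative
val appArgs   = Seq("--driver-url", "spark://CoarseGrainedScheduler@driver-host:7077")     // illustrative

val executorCommand: Seq[String] =
  Seq("${JAVA_HOME}/bin/java") ++ javaOpts ++ Seq("-cp", "/opt/spark/jars/*", mainClass) ++ appArgs
  // "${JAVA_HOME}" is kept as a literal, as in the diff above
```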
Letting the Dockerfile specify the main class is exactly what I wanted to avoid, since then it becomes coupled to a particular version of the Dockerfile.
I see, I thought when we mentioned "arguments" it meant passing the command as arguments to K8s, but it sounds like we want to have an entrypoint and then tack on arguments in the scheduler. This sounds trickier, since then users cannot override the entrypoint (what @erikerlandson wants) anymore if we only override arguments.
Btw, since I think we don't really want to merge this (this is IMO more about future maintenance anyway), I'll just close it for now, as I see we're changing the command part in the scheduler quite often.
@tnachen @erikerlandson I think the approach as outlined here is harder for users to customize. My understanding is that if we put the arguments in the scheduler but the Dockerfile handles the base CMD / ENTRYPOINT that receives said arguments, that gives users the flexibility to provide their own Dockerfile which overrides the CMD but still receives the same scheduler-provided arguments. The custom CMD can handle the scheduler-provided arguments in its own way, most likely forwarding to what the default Dockerfile CMD specifies.
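A sketch of that direction, assuming the fabric8 builder (the image name and argument values are placeholders): the scheduler sets only args, so whatever ENTRYPOINT the image defines, default or user-customized, receives the same scheduler-provided arguments.

```scala
import io.fabric8.kubernetes.api.model.ContainerBuilder

// Sketch: no `command` is set, so the image's ENTRYPOINT stays in charge; a custom image
// can wrap or replace that entrypoint and still receive these scheduler-provided args.
val executorContainer = new ContainerBuilder()
  .withName("spark-executor")
  .withImage("spark-executor:latest") // placeholder image name
  .withArgs(
    "--driver-url", "spark://CoarseGrainedScheduler@driver-host:7077", // illustrative
    "--executor-id", "1")                                              // illustrative
  .build()
```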
Please rebase onto
Since this has been resolved in favor of assuming commands on the images, it seems best to close it.
What changes were proposed in this pull request?
Move the executor and driver commands from the Dockerfile into the scheduler code, so that it is easier to see and modify the commands without having to change the Dockerfile every time the scheduler changes.
How was this patch tested?
integration-tests