Replace submission v1 with submission v2. #286
Conversation
Force-pushed from df72c0e to ab07326.
docs/running-on-kubernetes.md
Outdated
- `spark.ssl.kubernetes.resourceStagingServer.keyPasswordFile`, to provide the keyStore's key password using a file
  instead of a static value. This is useful if the key password is to be mounted into the container with a secret.

Note that while the properties can be set in the ConfigMap, you will still need to consider the means of mounting the
@foxish I wonder if we can do better here. Perhaps it would be worthwhile to have a script that allows users to provide their TLS secrets from their local disk and the script then converts them to a Kubernetes secret that's mounted into the pod. Or if that's too heavyweight and we don't want to maintain that, perhaps we can have an alternate YML template that provides an outline for a Secret object and how it would be mounted into the resource staging service container.
Hmm.. I think we can wait till there is a push for using the security features extensively before making it more convenient. For now, I think this is fine.
@@ -3,23 +3,34 @@ layout: global
title: Running Spark on Kubernetes
---

Support for running on [Kubernetes](https://kubernetes.io/docs/whatisk8s/) is available in experimental status. The feature set is
currently limited and not well-tested. This should not be used in production environments.
Support for running on [Kubernetes](https://kubernetes.io/docs/whatisk8s/) is available in experimental status. The
@ash211 @erikerlandson @kimoonkim @lins05 @foxish
The documentation changes are by far the most important part of this change, as well as the workflow in general. This is our chance to catch any changes in the user's workflow as described in the documentation and weigh if it is still reasonable to move forward with this overhaul.
@@ -0,0 +1,60 @@
---
apiVersion: extensions/v1beta1
kind: ReplicaSet
Can we use a deployment here instead? It's a superset of the ReplicaSet in terms of features, providing rolling-update and the like. Also, I think it has recently moved to apps/v1beta1.
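The suggested change might look roughly like the following sketch; the names, labels, and structure here are illustrative, not taken from the PR's actual YAML file:

```yaml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: spark-resource-staging-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        # Illustrative label; a Service selector would need to match it.
        resource-staging-server-instance: default
    spec:
      containers:
        - name: spark-resource-staging-server
          image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.1.0-alpha.3
```

Unlike a bare ReplicaSet, a Deployment manages ReplicaSets on the user's behalf and adds rolling updates and rollback.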
name: spark-resource-staging-server-config
containers:
- name: spark-resource-staging-server
  image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.1.0-alpha.3
Does this image exist already?
- name: spark-resource-staging-server
  image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.1.0-alpha.3
  resources:
    requests:
Without a `limits` section, the pods will belong to the Burstable QoS class. Is this the intention?
Added limits.
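For context on the QoS point: when limits equal requests for every resource, the pod is classified Guaranteed; requests without limits yield Burstable. A sketch with placeholder values (the actual numbers added in the PR may differ):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    # Equal to the requests above, so the pod lands in the Guaranteed QoS class;
    # limits above requests would keep it Burstable but still bounded.
    cpu: 100m
    memory: 256Mi
```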
- protocol: TCP
  port: 10000
  targetPort: 10000
  nodePort: 31000
It is not typical to specify a `nodePort`, so this should probably be called out explicitly.
This is to make the documentation and deployment instructions simpler, basically removing the steps to find the NodePort.
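Pinning the port in the Service spec, as discussed above, might look like this sketch (the service name and selector label are illustrative assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-resource-staging-service
spec:
  type: NodePort
  selector:
    # Must match the labels on the resource staging server pods.
    resource-staging-server-instance: default
  ports:
    - protocol: TCP
      port: 10000
      targetPort: 10000
      # A fixed value spares users a lookup step, but 31000 must be unclaimed
      # and inside the cluster's NodePort range (30000-32767 by default).
      nodePort: 31000
```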
Thanks for the changes. Did one cursory review. The change looks good overall; will do a more detailed pass in a day or so.
- apiVersion: extensions/v1beta1
- kind: ReplicaSet
+ apiVersion: apps/v1beta1
+ kind: Deployment
is Deployment compatible with k8s 1.5 (which as of now is "prior version" of k8s)?
Hmm.. maybe we should use extensions/v1beta1 Deployment instead of the apps group here. The apps group move happened in 1.6 (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#deployment-2). I'll verify with the team about the plan for this moving forward.
Yes, Deployment from the group extensions/v1beta1 will work on 1.5, 1.6 and 1.7, and we'll probably deprecate it in 1.8. So, I think we can use that for now, and not apps/v1beta1.
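The compatibility-driven choice discussed above amounts to keeping the older API group in the manifest header, roughly:

```yaml
# Deployment under the extensions group works on Kubernetes 1.5 through 1.7,
# whereas apps/v1beta1 only exists from 1.6 onward.
apiVersion: extensions/v1beta1
kind: Deployment
```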
Force-pushed from ced7990 to 7cb0420.
docs/running-on-kubernetes.md
Outdated
using [kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes
cluster, you may setup a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
* We recommend that minikube be updated to the most recent version (0.18.0 at the time of this documentation), as some
0.19.0 now -- https://github.com/kubernetes/minikube/releases
docs/running-on-kubernetes.md
Outdated
earlier versions may not start up the kubernetes cluster with all the necessary components.
* You must have appropriate permissions to create and list [pods](https://kubernetes.io/docs/user-guide/pods/),
[ConfigMaps](https://kubernetes.io/docs/tasks/configure-pod-container/configmap/) and
[secrets](https://kubernetes.io/docs/concepts/configuration/secret/) in your cluster. You can verify that
Our sample resource staging server YAML is a Deployment now -- should add that kind to this list of required perms.
Strictly speaking, though, it's not necessary to deploy the resource staging server exactly as specified by the sample YAML file. We can add that separately to the documentation on running the resource staging server.
A section on "Optionality of the Resource Staging Server" would be good. I think it's basically when you have any local resources (jars or files) including the application jar, but we should spell that out
We discuss optionality here but should it be under a specific heading? https://github.com/apache-spark-on-k8s/spark/pull/286/files#diff-b5527f236b253e0d9f5db5164bdb43e9R136
I think separate heading -- it's an important distinction and determines whether the resource staging server is needed at all
`dockerfiles/driver/Dockerfile` and `dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the
Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images to the
registry.
You may also build these docker images from sources, or customize them as required. Spark distributions include the
should link `customize them as required` to a section about how to build a custom Docker image
We can follow up in a separate commit for documentation on creating custom Docker images.
Agreed, should do that later. That's captured as one of the items on #262 so nothing more on that for this PR
docs/running-on-kubernetes.md
Outdated
This YAML file configures a Deployment with one pod running the resource staging server configured with a ConfigMap,
and exposes the server through a Service with a given NodePort.

To run the resource staging server with default configurations, the Kubernetes resources can simply be created:
I generally prefer avoiding the word `simple` in documentation -- the situation is inevitably more complicated than it seems at first and `simple` comes off a bit as marketing-speak
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
remove the `https://` here -- it looks weird and is equivalent to the `k8s://`-only version
Done
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.driverSubmissionTimeout</code></td>
<td>60s</td>
<td><code>spark.kubernetes.resourceStagingServer.internal.uri</code></td>
<td>(none)</td>
defaults to the value of `spark.kubernetes.resourceStagingServer.uri`, right?
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.report.interval</code></td>
<td><code>1s</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.clientCertPem</code></td>
<td>(none)</td>
default of spark.ssl.kubernetes.resourceStagingServer.clientCertPem
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.submission.waitAppCompletion</code></td>
<td><code>true</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStoreType</code></td>
<td>(none)</td>
default of spark.ssl.kubernetes.resourceStagingServer.trustStoreType
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.driver.pod.name</code></td>
<td><code>(none)</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStorePassword</code></td>
<td>(none)</td>
default of spark.ssl.kubernetes.resourceStagingServer.trustStorePassword
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.driver.service.exposeUiPort</code></td>
<td><code>false</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStore</code></td>
<td>(none)</td>
default of spark.ssl.kubernetes.resourceStagingServer.trustStore
rerun unit tests please
LGTM
@foxish can you take a look at this today/tomorrow?
LGTM from my end. I think a couple of @ash211's comments still need addressing however.
I think that covers it -- will merge when the tests succeed. Thanks for the review @foxish !
* Replace submission v1 with submission v2.
* Address documentation changes.
* Fix documentation.
Removes all of the submission v1 code and makes submission v2 the only submission mode used by SparkSubmit.
The package structure has been re-organized once more to remove the v1 and v2 qualifiers, changing most of the imports in the code. I also attempted to remove as much dead code as possible in files that we still use in the new submission code. Documentation has been re-written to reflect the new submission workflow that uses a resource staging server for local dependencies.
This PR also includes a sample YAML file that deploys the minimum set of resources required for a resource staging server. It includes a ConfigMap with example properties that configure the resource staging server.
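A minimal sketch of such a ConfigMap follows; the data key and the property shown are illustrative placeholders, not the actual contents of the YAML file in this PR:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-resource-staging-server-config
data:
  # Mounted into the server container and handed to it as its properties file.
  resource-staging-server.properties: |
    # Placeholder property; see the sample YAML file in the PR for the real set.
    spark.kubernetes.resourceStagingServer.port=10000
```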