This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Replace submission v1 with submission v2. #286

Merged: ash211 merged 3 commits into branch-2.1-kubernetes from remove-submission-v1 on May 23, 2017

Conversation

mccheah (Author) commented May 19, 2017:

Removes all of the submission v1 code and makes submission v2 the only submission mode used by SparkSubmit.

The package structure has been reorganized once more to remove the v1 and v2 qualifiers, changing most of the imports in the code. I also attempted to remove as much dead code as possible in the files that the new submission code still uses. The documentation has been rewritten to reflect the new submission workflow, which uses a resource staging server for local dependencies.

This PR also includes a sample YAML file that deploys the minimum set of resources required for a resource staging server. It includes a ConfigMap with example properties that configure the resource staging server.
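For a rough sense of the shape of that sample (the metadata name mirrors one that appears later in this PR; the property key and value are illustrative, not the file's exact contents):

```yaml
# Illustrative sketch only -- the real sample file in this PR may differ.
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-resource-staging-server-config
data:
  resource-staging-server.properties: |
    # Example resource staging server property (illustrative key/value)
    spark.kubernetes.resourceStagingServer.port=10000
```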

@mccheah mccheah force-pushed the remove-submission-v1 branch from df72c0e to ab07326 Compare May 19, 2017 23:25
- `spark.ssl.kubernetes.resourceStagingServer.keyPasswordFile`, to provide the keyStore's key password using a file
instead of a static value. This is useful if the key password is to be mounted into the container with a secret.

Note that while the properties can be set in the ConfigMap, you will still need to consider the means of mounting the
mccheah (Author):

@foxish I wonder if we can do better here. Perhaps it would be worthwhile to have a script that allows users to provide their TLS secrets from their local disk and the script then converts them to a Kubernetes secret that's mounted into the pod. Or if that's too heavyweight and we don't want to maintain that, perhaps we can have an alternate YML template that provides an outline for a Secret object and how it would be mounted into the resource staging service container.
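Such a template might be outlined roughly like this (all names and paths below are hypothetical placeholders, not part of this PR):

```yaml
# Hypothetical outline of a Secret carrying the TLS keyStore, plus the pod-spec
# wiring that would mount it into the resource staging server container.
apiVersion: v1
kind: Secret
metadata:
  name: resource-staging-server-tls   # placeholder name
type: Opaque
data:
  keyStore.jks: <base64-encoded keyStore contents>
---
# Corresponding fragments in the server's pod spec:
#
#   volumes:
#   - name: tls-secret
#     secret:
#       secretName: resource-staging-server-tls
#
#   volumeMounts:
#   - name: tls-secret
#     mountPath: /etc/spark-staging-tls   # placeholder mount path
```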

Member:

Hmm.. I think we can wait till there is a push for using the security features extensively before making it more convenient. For now, I think this is fine.

@@ -3,23 +3,34 @@ layout: global
title: Running Spark on Kubernetes
---

Support for running on [Kubernetes](https://kubernetes.io/docs/whatisk8s/) is available in experimental status. The feature set is
currently limited and not well-tested. This should not be used in production environments.
Support for running on [Kubernetes](https://kubernetes.io/docs/whatisk8s/) is available in experimental status. The
mccheah (Author), May 19, 2017:

@ash211 @erikerlandson @kimoonkim @lins05 @foxish

The documentation changes are by far the most important part of this change, as well as the workflow in general. This is our chance to catch any changes in the user's workflow as described in the documentation and weigh if it is still reasonable to move forward with this overhaul.

@@ -0,0 +1,60 @@
---
apiVersion: extensions/v1beta1
kind: ReplicaSet
Member:

Can we use a Deployment here instead? It's a superset of the ReplicaSet in terms of features, providing rolling updates and the like. Also, I think it has recently moved to apps/v1beta1.

name: spark-resource-staging-server-config
containers:
- name: spark-resource-staging-server
image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.1.0-alpha.3
Member:

Does this image exist already?

- name: spark-resource-staging-server
image: kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.1.0-alpha.3
resources:
requests:
Member:

Without a limit section, we will have the pods belong to the burstable QoS class. Is this the intention?

mccheah (Author):

Added limits.
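For reference, this is the shape of a resources block with limits added (values are illustrative, not the sample file's actual numbers). When requests equal limits for every container the pod gets the Guaranteed QoS class; requests alone, or requests below limits, yield Burstable:

```yaml
# Illustrative values only; requests == limits puts the pod in Guaranteed QoS.
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 1Gi
```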

- protocol: TCP
port: 10000
targetPort: 10000
nodePort: 31000
Member:

It is not typical to specify a nodePort, and this should probably be called out explicitly.

mccheah (Author):

This is to make the documentation and deployment instructions simpler, basically removing the steps to find the NodePort.
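For context, the full Service this fragment belongs to would look roughly like the following; the ports are the ones in the diff above, while the name and selector are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-resource-staging-service   # illustrative name
spec:
  type: NodePort
  selector:
    resource-staging-server-instance: default   # illustrative selector
  ports:
  - protocol: TCP
    port: 10000
    targetPort: 10000
    nodePort: 31000   # pinned so the docs can reference <node-address>:31000 directly
```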

foxish (Member) commented May 22, 2017:

Thanks for the changes. Did one cursory review. The change looks good overall; will do a more detailed pass in a day or so.

apiVersion: extensions/v1beta1
kind: ReplicaSet
apiVersion: apps/v1beta1
kind: Deployment
Reviewer:

Is Deployment compatible with k8s 1.5 (which, as of now, is the prior version of k8s)?

foxish (Member), May 22, 2017:

Hmm.. maybe we should use extensions.v1beta1/Deployment instead of the apps group here. The apps group move happened in 1.6 (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#deployment-2). I'll verify with the team about the plan for this moving forward.

Member:

Yes, Deployment from the group extensions/v1beta1 will work on 1.5, 1.6 and 1.7, and we'll probably deprecate it in 1.8. So, I think we can use that for now, and not apps/v1beta1.
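In other words, the compatibility-safe manifest header per this thread would be:

```yaml
# extensions/v1beta1 Deployments work on 1.5, 1.6 and 1.7; apps/v1beta1 is 1.6+.
apiVersion: extensions/v1beta1
kind: Deployment
```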

@mccheah mccheah force-pushed the remove-submission-v1 branch from ced7990 to 7cb0420 Compare May 22, 2017 21:36
@mccheah mccheah changed the base branch from logging-pod-status-v2 to branch-2.1-kubernetes May 22, 2017 21:36
using [kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes
cluster, you may setup a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
* We recommend that minikube be updated to the most recent version (0.18.0 at the time of this documentation), as some
earlier versions may not start up the kubernetes cluster with all the necessary components.
* You must have appropriate permissions to create and list [pods](https://kubernetes.io/docs/user-guide/pods/),
[ConfigMaps](https://kubernetes.io/docs/tasks/configure-pod-container/configmap/) and
[secrets](https://kubernetes.io/docs/concepts/configuration/secret/) in your cluster. You can verify that
Reviewer:

our sample resource staging server yaml is a Deployment now -- should add that kind to this list of required perms

mccheah (Author):

Strictly speaking, though, it's not necessary to deploy the resource staging server as specified by the sample YAML file. We can add that separately to the documentation on running the resource staging server.
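One way a user could verify these permissions up front (a suggested check, not something this PR's docs prescribe) is with kubectl auth can-i:

```bash
# Suggested pre-flight permission checks; kinds taken from the docs' list,
# plus deployments for anyone using the sample staging server YAML.
kubectl auth can-i create pods
kubectl auth can-i create configmaps
kubectl auth can-i create secrets
kubectl auth can-i create deployments
```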

Reviewer:

A section on "Optionality of the Resource Staging Server" would be good. I think it's basically needed when you have any local resources (jars or files), including the application jar, but we should spell that out.

mccheah (Author):

We discuss optionality here but should it be under a specific heading? https://github.com/apache-spark-on-k8s/spark/pull/286/files#diff-b5527f236b253e0d9f5db5164bdb43e9R136

Reviewer:

I think a separate heading -- it's an important distinction and determines whether the resource staging server is needed at all.

`dockerfiles/driver/Dockerfile` and `dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the
Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images to the
registry.
You may also build these docker images from sources, or customize them as required. Spark distributions include the
Reviewer:

should link "customize them as required" to a section about how to build a custom Docker image

mccheah (Author):

We can follow up in a separate commit for documentation on creating custom Docker images.

Reviewer:

Agreed, should do that later. That's captured as one of the items on #262 so nothing more on that for this PR
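For concreteness, the build-tag-push flow the docs describe amounts to something like this (registry host and tag are placeholders; the Dockerfile paths are the ones named in the docs, run from the root of the Spark distribution):

```bash
# Placeholder registry/tag throughout.
docker build -t registry.example.com/spark-driver:my-tag -f dockerfiles/driver/Dockerfile .
docker build -t registry.example.com/spark-executor:my-tag -f dockerfiles/executor/Dockerfile .
docker push registry.example.com/spark-driver:my-tag
docker push registry.example.com/spark-executor:my-tag
```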

This YAML file configures a Deployment with one pod running the resource staging server configured with a ConfigMap,
and exposes the server through a Service with a given NodePort.

To run the resource staging server with default configurations, the Kubernetes resources can simply be created:
Reviewer:

I generally prefer avoiding the word simple in documentation -- the situation is inevitably more complicated than it seems at first and simple comes off a bit as marketing-speak
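(For reference, "created" here comes down to a single kubectl call against the sample file; the path below is a placeholder for wherever the YAML lives:)

```bash
kubectl create -f conf/kubernetes-resource-staging-server.yaml   # placeholder path
```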

bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
Reviewer:

remove the https:// here -- it looks weird and is equivalent to the k8s://-only version

mccheah (Author):

Done
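For the record, the two spellings are interchangeable; both of the following point spark-submit at the same API server over HTTPS (host and port are placeholders, as in the diff):

```bash
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
--master k8s://<k8s-apiserver-host>:<k8s-apiserver-port>
```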

<td><code>spark.kubernetes.driverSubmissionTimeout</code></td>
<td>60s</td>
<td><code>spark.kubernetes.resourceStagingServer.internal.uri</code></td>
<td>(none)</td>
Reviewer:

defaults to the value of spark.kubernetes.resourceStagingServer.uri right?
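A hedged sketch of why the internal variant exists at all: the submitter outside the cluster and the pods inside it may need different addresses for the same server (the URIs below are placeholders):

```bash
# spark-submit (outside the cluster) reaches the server via the NodePort...
--conf spark.kubernetes.resourceStagingServer.uri=http://<node-address>:31000 \
# ...while the driver and executors resolve a cluster-internal service address.
--conf spark.kubernetes.resourceStagingServer.internal.uri=http://<staging-service-dns>:10000
```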

<td><code>spark.kubernetes.report.interval</code></td>
<td><code>1s</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.clientCertPem</code></td>
<td>(none)</td>
Reviewer:

default of spark.ssl.kubernetes.resourceStagingServer.clientCertPem

<td><code>spark.kubernetes.submission.waitAppCompletion</code></td>
<td><code>true</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStoreType</code></td>
<td>(none)</td>
Reviewer:

default of spark.ssl.kubernetes.resourceStagingServer.trustStoreType

<td><code>spark.kubernetes.driver.pod.name</code></td>
<td><code>(none)</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStorePassword</code></td>
<td>(none)</td>
Reviewer:

default of spark.ssl.kubernetes.resourceStagingServer.trustStorePassword

<td><code>spark.kubernetes.driver.service.exposeUiPort</code></td>
<td><code>false</code></td>
<td><code>spark.ssl.kubernetes.resourceStagingServer.internal.trustStore</code></td>
<td>(none)</td>
Reviewer:

default of spark.ssl.kubernetes.resourceStagingServer.trustStore

mccheah (Author) commented May 22, 2017:

rerun unit tests please

ash211 left a comment:

LGTM

@foxish can you take a look at this today/tomorrow?

foxish (Member) commented May 23, 2017:

LGTM from my end. I think a couple of @ash211's comments still need addressing, however.

ash211 commented May 23, 2017:

I think that covers it -- will merge when the tests succeed. Thanks for the review @foxish !

@ash211 ash211 merged commit 8f3d965 into branch-2.1-kubernetes May 23, 2017
@ash211 ash211 deleted the remove-submission-v1 branch May 23, 2017 19:57
foxish pushed a commit that referenced this pull request Jul 24, 2017
* Replace submission v1 with submission v2.

* Address documentation changes.

* Fix documentation
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
* Replace submission v1 with submission v2.

* Address documentation changes.

* Fix documentation