This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Allow customizing external URI provision + External URI can be set via annotations #147

Merged
merged 7 commits into k8s-support-alternate-incremental from external-uri-provider on Mar 3, 2017

Conversation

mccheah

@mccheah mccheah commented Feb 24, 2017

Closes #140.

@kimoonkim
Member

@mccheah FYI, the Travis build failed not because of unit tests but because of style issues:

error file=/home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/constants.scala message=File line length exceeds 100 characters line=68
Saving to outputFile=/home/travis/build/apache-spark-on-k8s/spark/resource-managers/kubernetes/core/target/scalastyle-output.xml
Processed 19 file(s)
Found 1 errors
Found 0 warnings
Found 0 infos
Finished in 1617 ms
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (default) on project spark-kubernetes_2.11: Failed during scalastyle execution: You have 1 Scalastyle violation(s). -> [Help 1]

@@ -106,6 +106,33 @@ The above mechanism using `kubectl proxy` can be used when we have authenticatio
kubernetes-client library does not support. Authentication using X509 Client Certs and oauth tokens
is currently supported.

### Determining the Driver Base URI

Kubernetes pods run with their own IP address space. If Spark is run in cluster mode, the driver pod may not be
Author

This could perhaps be more concise or clear. Would appreciate anyone's thoughts on how this could be better explained.

ShutdownHookManager.DEFAULT_SHUTDOWN_PRIORITY + 1)(() =>
driverServiceManager.handleSubmissionError(
new SparkException("Submission shutting down early...")))
try {
Author

@ash211 I swapped the order here of the tryWithResource and try-finally blocks mainly because we want to clean up the secrets and the shutdown hooks after the pod has been launched but also before the application itself completes.


Why do we need to switch to shutdown hooks? Those accumulate global state, and you end up with ordering issues (like you had with the priorities), which I think is best avoided. Can we keep the try-finally blocks instead?

Author

try-finally doesn't catch the case where the JVM is stopped via a kill signal. I suppose the big question is whether we care about such scenarios. The rest of the Spark ecosystem relies heavily on shutdown hooks to clean up all kinds of temporary state.
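The ordering concern being discussed can be illustrated with a minimal sketch. The registry below is a hypothetical stand-in for a priority-ordered shutdown-hook manager (it is not Spark's actual `ShutdownHookManager`): hooks with a higher priority run first, which is why the snippet above registers the error handler at `DEFAULT_SHUTDOWN_PRIORITY + 1`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a priority-ordered shutdown-hook registry.
// Higher-priority hooks run first, mirroring the semantics relied on above.
object HookRegistry {
  private val hooks = mutable.Buffer.empty[(Int, () => Unit)]

  def register(priority: Int)(body: => Unit): Unit =
    hooks += ((priority, () => body))

  def runAll(): Unit =
    hooks.sortBy { case (priority, _) => -priority }
      .foreach { case (_, hook) => hook() }
}

val order = mutable.Buffer.empty[String]
HookRegistry.register(10) { order += "cleanupSecrets" }
HookRegistry.register(11) { order += "reportSubmissionError" }
HookRegistry.runAll()
// order is now Seq("reportSubmissionError", "cleanupSecrets")
```

The fragility is visible even in this toy: correctness depends on every registration site agreeing on relative priority values, which is exactly the global-state concern raised above.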

uris = nodeUrls,
maxRetriesPerServer = 3,
uris = serviceUris,
maxRetriesPerServer = 10,
Author

It may be the case that the external URI providers take a little longer to be ready. While testing #70 I noticed that the Ingress can take a while to set up in the background.
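The larger retry budget in the hunk above can be sketched as a small helper that walks each URI, retrying up to `maxRetriesPerServer` times before moving on. This is an illustration of the retry shape only, not the submission client's actual code:

```scala
import scala.util.{Success, Try}

// Illustrative retry helper (hypothetical, not the real submission client):
// each URI gets up to maxRetriesPerServer attempts, so a slow-to-provision
// Ingress behind an external URI gets more chances before we give up.
def tryOnServers[T](uris: Seq[String], maxRetriesPerServer: Int)
                   (request: String => T): T = {
  val attempts = for {
    uri <- uris.iterator
    _   <- (1 to maxRetriesPerServer).iterator
  } yield Try(request(uri))
  // The iterator is lazy, so no requests are sent after the first success.
  attempts.collectFirst { case Success(value) => value }
    .getOrElse(throw new RuntimeException(
      s"All ${uris.size} servers failed after $maxRetriesPerServer retries each"))
}

var calls = 0
val result = tryOnServers(Seq("http://node-a", "http://node-b"), maxRetriesPerServer = 3) { uri =>
  calls += 1
  if (calls < 3) throw new RuntimeException("service not ready") else uri
}
// result == "http://node-a": the third attempt, still on the first server, succeeded
```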

.asScala(0)
assert(driverService.getMetadata.getAnnotations.containsKey(ANNOTATION_PROVIDE_EXTERNAL_URI),
"External URI request annotation was not set on the driver service.")
// Unfortunately we can't check the correctness of the actual value of the URI, as it depends
Author

Although I think this highlights that we have an API discrepancy here. We document that the driverServiceManager will build the service that will end up being used, but the code still manipulates the created service after the fact to avoid exposing the NodePort. There's also the extra configuration parameter spark.kubernetes.driver.service.exposeUiPort. There should be a way to reconcile all of these configurations.


There may be cases where the Kubelet nodes cannot be reached by the submission client. For example, the cluster may
only be reachable through an external load balancer. The user may provide their own external IP for Spark driver
services. To use your own external IP instead of a Kubelet's IP, first set

To use a manually-specified external IP instead of a Kubelet's IP, ...

only be reachable through an external load balancer. The user may provide their own external IP for Spark driver
services. To use your own external IP instead of a Kubelet's IP, first set
`spark.kubernetes.driver.serviceManagerType` to `ExternalAnnotation`. This will cause a service to be created that
routes to the driver pod with the annotation `spark-job.alpha.apache.org/provideExternalUri`. You will need to run a

This will create a service with the annotation XYZ that routes to the driver pod

process that watches the API server for services that are created with this annotation in the application's namespace
(set by `spark.kubernetes.namespace`). The process should determine a URI that routes to this service, and patch the
service to include an annotation `spark-job.alpha.apache.org/resolvedExternalUri`, which has its value as the external
URI that your process has provided.

provide an example URI, e.g. https://example.com:8080/my-job

kubelet nodes at the port that the Kubelets expose for the service. If the Kubelets cannot be contacted from the
submitter's machine, consider setting this to <code>ExternalAnnotation</code> as described in "Determining the
Driver Base URI" above. One may also include a custom implementation of
<code>org.apache.spark.deploy.rest.kubernetes.DriverServiceManager</code> on the submitter's classpath -

the service-loaded class should be the value of that setting? e.g. spark.kubernetes.driver.serviceManagerType=com.mycompany.spark.MyCustomDriverServiceManager?

Author

It is not - the value here matches what is returned by DriverServiceManager#getType.

Author

The advantage of this is that the configuration value becomes easier to set for the things we bundle into it - NodePort vs. ExternalAnnotation and eventually ApiServerProxy is more elegant to remember than full class names.
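The tag-to-implementation lookup described here can be sketched as follows. The trait and classes are toy stand-ins (the real interface is `org.apache.spark.deploy.rest.kubernetes.DriverServiceManager`, and in the real code the candidates would be discovered via `java.util.ServiceLoader` rather than constructed directly):

```scala
// Toy stand-ins illustrating tag-based resolution: the config value is
// matched against each implementation's getType, not a class name.
trait ServiceManager { def getType: String }
class NodePortManager extends ServiceManager { def getType = "NodePort" }
class ExternalAnnotationManager extends ServiceManager { def getType = "ExternalAnnotation" }

// Hypothetical resolver; real candidates would come from ServiceLoader.
def resolveServiceManager(tag: String, candidates: Seq[ServiceManager]): ServiceManager =
  candidates.find(_.getType == tag).getOrElse(
    throw new IllegalArgumentException(s"No service manager found for type $tag"))

val manager = resolveServiceManager(
  "ExternalAnnotation", Seq(new NodePortManager, new ExternalAnnotationManager))
// manager.getType == "ExternalAnnotation"
```

This is what lets users write `spark.kubernetes.driver.serviceManagerType=ExternalAnnotation` instead of a fully-qualified class name.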

</td>
</tr>
<tr>
<td><code>spark.kubernetes.driver.serviceManagerType</code></td>
<td><code>NodePort</code></td>

we should also provide an implementation of this that uploads through the apiserver proxy

Author

Think this can be left for future work.


private[spark] class NodePortUrisDriverServiceManager extends DriverServiceManager with Logging {

override def getDriverServiceSubmissionServerUris(driverService: Service): Set[String] = {

put this method at the end so the flow is top to bottom. getServiceManagerType is called first, then customizeDriverService, then getDriverServiceSubmissionServerUris

"External URI request annotation was not set on the driver service.")
// Unfortunately we can't check the correctness of the actual value of the URI, as it depends
// on the driver submission port set on the driver service but we remove that port from the
// service once the submission is complete.

I think this service cleanup after the job is submitted should be part of the plugin point, rather than the core always opinionatedly deleting the service after the job has submitted.

Alternatively the contract could be that any created k8s resources need to have owner references to the service so that when Spark deletes the service the dependent resources are cleaned up too

Author

It's different here because the service isn't deleted; rather, it is modified so that its type becomes ClusterIP and it no longer exposes the submission server over the NodePort. Regardless, there is still the question of whether the plugin should be the component that makes this decision, or whether it's universal.

<td><code>spark.kubernetes.driver.serviceManagerType</code></td>
<td><code>NodePort</code></td>
<td>
A tag indicating which class to use for creating the Kubernetes service and determining its URI for the submission

provide a list of bundled implementations in a bullet list here -- right now this is just NodePort and ExternalAnnotation, but I think we should also do a ServerApi

Author

Didn't want to try to embed a bullet-list in the table - but added the valid values in prose form.


kubernetesResourceCleaner.registerOrUpdateResource(submitServerSecret)
val sslConfiguration = sslConfigurationProvider.getSslConfiguration()
// start outer watch for status logging of driver pod
// only enable interval logging if in waitForAppCompletion mode

these comments got disconnected from the LoggingPodStatusWatcher code

@foxish
Member

foxish commented Feb 25, 2017

Haven't done a full review yet of the code.

On the docs front, one broad comment:
The kubelet is the process that runs on each node. The kubelet has to do with bookkeeping and master communication. That name is not used to refer to the node. The node is referred to as "Kubernetes node", or simply "node". For example, NodePort opens a port on the node IP address and listens to it.

@foxish
Member

foxish commented Feb 25, 2017

Also, this PR should be targeted towards k8s-support-alternate-incremental? It is pointing at the ssl branch as of now.

@mccheah mccheah force-pushed the external-uri-provider branch from 3e34edb to ba50538 Compare February 25, 2017 02:04
@mccheah mccheah changed the base branch from refactor-ssl to k8s-support-alternate-incremental February 25, 2017 02:04
@mccheah
Author

mccheah commented Feb 25, 2017

@ash211 @foxish thanks for reviewing. I changed the base branch to k8s-support-alternate-incremental and also addressed all the comments so far.

@mccheah mccheah changed the title Listen for annotations that provide external URIs. Allow customizing external URI provision + External URI can be set via annotations Feb 28, 2017
@mccheah
Author

mccheah commented Feb 28, 2017

@foxish any more thoughts on this PR?

@ash211 ash211 left a comment

Some minor docs suggestions


There may be cases where the nodes cannot be reached by the submission client. For example, the cluster may
only be reachable through an external load balancer. The user may provide their own external IP for Spark driver
services. To use your own external IP instead of a node's IP, first set

external IP -> external URI

services. To use your own external IP instead of a node's IP, first set
`spark.kubernetes.driver.serviceManagerType` to `ExternalAnnotation`. A service will be created with the annotation
`spark-job.alpha.apache.org/provideExternalUri`, and this service routes to the driver pod. You will need to run a
process that watches the API server for services that are created with this annotation in the application's namespace

separate process

`spark.kubernetes.driver.serviceManagerType` to `ExternalAnnotation`. A service will be created with the annotation
`spark-job.alpha.apache.org/provideExternalUri`, and this service routes to the driver pod. You will need to run a
process that watches the API server for services that are created with this annotation in the application's namespace
(set by `spark.kubernetes.namespace`). The process should determine a URI that routes to this service, and patch the

... that routes to this service (potentially configuring infrastructure to handle the URI behind the scenes), and patch ...

URI that your process has provided (e.g. `https://example.com:8080/my-job`).

Note that if the URI provided by the annotation also provides a base path, the base path should be removed when the
request is forwarded to the back end pod.

I'm not following this sentence. If the Spark code ignores the base path, then let's throw when the external URI provider gives a URI with a base path? Alternatively, we should probably support submitting through a base path, since some external URI providers route based on the base path and require it for traffic to reach the underlying spark-submit service.

Author

The external URI can include a base path, just that when the request reaches the back end pod, the back end pod should be contacted at an empty base path. In the case that you describe where the external URI provider requires a base path, the URI provider should also rewrite the URI when forwarding to the back end to remove the base path. I'm not sure of the best way to phrase this.


Ah, maybe something like:

Note that the URI provided in the annotation needs to route traffic to the appropriate destination on the pod, which has a fixed path portion of the URI. This means the external URI provider will likely need to rewrite the path from the external URI to the destination on the pod, e.g. https://example.com:8080/spark-app-1/submit will need to route traffic to https://<pod_ip>:<service_port>/path/to/submit. Note that the paths of these two URLs are different.

I'm not sure what the destination /path/to/submit actually is in Spark -- could use some help there.

Author

Done - the path is empty on the back end.
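The rewrite being discussed can be sketched as a pure function (the helper name is hypothetical, not part of Spark): the external URI's base path is stripped before the request is forwarded, because the back end pod expects an empty base path.

```scala
// Hypothetical helper showing the rewrite an external URI provider must
// perform: /spark-app-1/submit on the external URI becomes /submit when
// forwarded to the driver pod, which serves from an empty base path.
def stripBasePath(requestPath: String, basePath: String): String = {
  val stripped =
    if (basePath.nonEmpty && requestPath.startsWith(basePath))
      requestPath.substring(basePath.length)
    else requestPath
  if (stripped.isEmpty) "/" else stripped
}

// e.g. external https://example.com:8080/spark-app-1/submit forwards to
// https://<pod_ip>:<service_port>/submit
val forwarded = stripBasePath("/spark-app-1/submit", "/spark-app-1")
// forwarded == "/submit"
```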


If the above is confusing, keep in mind that this functionality is only necessary if the submitter cannot reach any of
the nodes at the driver's node port. It is recommended to use the default configuration with the node port service
whenever possible.

mention that there's a way to serviceload implementations of this here

Author

I don't think we should include the service loading note here - it's a pretty advanced use case and is already covered in the configuration table below.

Author

Mostly because this section is confusing as it is - wouldn't want to add to the confusion here.


Ah right, missed that it's mentioned in the config table.

<td>
A tag indicating which class to use for creating the Kubernetes service and determining its URI for the submission
client. Valid values are currently <code>NodePort</code> and <code>ExternalAnnotation</code>. By default, a service
is created with the NodePort type, and the driver will be contacted at one of the kubelet nodes at the port that the

since NodePort is a config value in this paragraph, make sure it's always wrapped in code tags

Author

It's not exactly a config value when stating "a service is created with the NodePort type" - NodePort is the type of the service in this case. Granted, NodePort is an overloaded term here. Marked it with code tags anyway.

@foxish
Member

foxish commented Mar 1, 2017

LGTM

@ash211 ash211 left a comment

Looks good to me, though there's a merge conflict now

@ash211

ash211 commented Mar 2, 2017

One of the travis checks failed and it looked like a flake (couldn't find mvn?) so I retriggered it.

@ash211

ash211 commented Mar 3, 2017

Passed this time -- merging!

@ash211 ash211 merged commit f0a40b8 into k8s-support-alternate-incremental Mar 3, 2017
@ash211 ash211 deleted the external-uri-provider branch March 3, 2017 00:22
ash211 pushed a commit that referenced this pull request Mar 8, 2017
…a annotations (#147)

* Listen for annotations that provide external URIs.

* FIx scalstyle

* Address comments

* Fix doc style

* Docs updates

* Clearly explain path rewrites
foxish pushed a commit that referenced this pull request Jul 24, 2017
…a annotations (#147)

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
…a annotations (apache-spark-on-k8s#147)


(cherry picked from commit f0a40b8)
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
…a annotations (apache-spark-on-k8s#147)
