Use readiness probe instead of client-side ping. #75
Conversation
Keep one ping() just as a sanity check, but otherwise set up the readiness probe to report the container as ready only when the ping endpoint can be reached. Also add a liveness probe for convenience and symmetry.
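For orientation, here is a minimal sketch of wiring such probes with the fabric8 builder API used by this client; the path and the `driverSubmitPort` value are placeholders, not the values this PR actually uses:

```scala
import io.fabric8.kubernetes.api.model.{HTTPGetActionBuilder, IntOrString}

// Placeholder path and port -- illustrative only.
val probePingHttpGet = new HTTPGetActionBuilder()
  .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")
  .withPath("/v1/submissions/ping")
  .withPort(new IntOrString(driverSubmitPort))
  .build()

// Attached to the driver container spec, as in the diff below:
//   .withNewReadinessProbe().withHttpGet(probePingHttpGet).endReadinessProbe()
//   .withNewLivenessProbe().withHttpGet(probePingHttpGet).endLivenessProbe()
```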
Definitely prefer this over the client-side ping. Main question is around changing the watch to the service vs. the driver pod.
@@ -127,6 +121,11 @@ private[spark] class Client(
      .pods()
      .withLabels(driverKubernetesSelectors)
      .watch(podWatcher)) { _ =>
      val probePingHttpGet = new HTTPGetActionBuilder()
        .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")
these need to be all-caps? looks kinda weird to me
Uppercasing is done within the library anyway.
It does need to be capitalized; the Kubernetes API will throw an error otherwise.
@@ -162,6 +161,8 @@ private[spark] class Client(
        .endEnv()
        .addToEnv(sslEnvs: _*)
        .withPorts(containerPorts.asJava)
        .withNewReadinessProbe().withHttpGet(probePingHttpGet).endReadinessProbe()
        .withNewLivenessProbe().withHttpGet(probePingHttpGet).endLivenessProbe()
do we need initialDelaySeconds / timeoutSeconds / periodSeconds set here?
https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_probe
I'm also not sure that having both is necessary -- we only rely on the readiness probe in this PR?
The liveness probe could be useful to restart our server if it fails for some reason, but we would need an initialDelaySeconds, since otherwise it will fail early on, before the server starts listening.
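A sketch of what that might look like, continuing the container builder chain shown in the diff above; the timing values are illustrative, not what the PR settled on:

```scala
// Illustrative values only: give the submission server time to start
// listening before the first liveness check, then probe periodically.
.withNewLivenessProbe()
  .withHttpGet(probePingHttpGet)
  .withInitialDelaySeconds(10)
  .withPeriodSeconds(10)
  .withTimeoutSeconds(5)
  .endLivenessProbe()
```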
@@ -127,6 +121,11 @@ private[spark] class Client(
      .pods()
      .withLabels(driverKubernetesSelectors)
Currently our PodWatcher watches the driver pod to determine its liveness before uploading files to it. What we more directly depend on, though, is the service -- when the readiness probe succeeds and the pod gets added to the service, that seems like the more specific trigger.
Proposal: what if we change the DriverPodWatcher to watch for the service gaining a backing pod, instead of the driver pod being ready? That more directly monitors the condition we need before submitting the job.
We would need an extra layer of nesting to do that, since we want to delay the service creation for as long as possible: we only want to make the service after we make the pod, so we'd need a Watch for the pod being created to trigger creating the service, and another watch for the Service gaining the backing pod. It's doable, I'm just not sure how to write it in a way that makes the nesting easy to reason about.
It's worth discussing the ramifications of creating the service up front and perhaps having the node ports open for the duration of the launch - in the worst case, if the launch times out. If we created the service outside of the watch it would be easier to design this without two layers of nesting and futures to listen for, but there is the aforementioned tradeoff to consider.
In the meantime - @ash211 @foxish how about we merge this PR as is and follow up on that point moving forward?
I'm good with making this PR's change be to watch on the readiness probe of the driver pod, and potentially in a subsequent PR we change that to be a watch on the driver service directly if that makes more sense.
Filed #76 as a followup for discussion
I also agree to let this go in for now; perhaps we could table this fine-grained approach at the next meeting.
        driverSubmitSslOptions)
      val ping = Retry.retry(5, 5.seconds) {
        driverSubmitter.ping()
Comment that this ping is a final check of the service liveness before submitting the job (in addition to the k8s checks).
Might also be worth try/catching it and logging on failure that even though the k8s service is active, we are unable to connect to the driver's REST service from the submitter.
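A rough sketch of that extra logging around the existing ping, reusing Retry.retry and driverSubmitter.ping() from the diff above; the log message wording is illustrative:

```scala
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Final sanity check that the submission server is reachable from the
// submitter, in addition to the readiness check Kubernetes already performed.
val ping = Retry.retry(5, 5.seconds) {
  try {
    driverSubmitter.ping()
  } catch {
    case NonFatal(e) =>
      logWarning("Kubernetes reports the driver service as active, but the" +
        " submission client could not reach the driver's REST endpoint.", e)
      throw e
  }
}
```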
(meant to click request changes)
@mccheah you've got merge conflicts
…te-incremental' into readiness-probe
I found that this patch fails integration tests as the Client reports that it can't reach the Minikube host at the NodePort. I had thought that using the readiness probe would mean we would not have to retry the submission, since when the Pod is marked as "ready", any Services that proxy to the Pod should be available immediately over their NodePorts. This isn't necessarily the case though, and watching the Service doesn't give any indication as to whether a Service is "ready" or not. Thus I can only conclude that we still need retry logic client-side here. @foxish do you have any thoughts?
…te-incremental' into readiness-probe
I was thinking more about this. The pod enters …
Hm, I still prefer trying to get a reactive approach to work. The main reason is that pinging has an inherent latency: if the first attempt doesn't work, there's a delay before the subsequent attempt, and if the components become ready in between the two attempts, we waste a bit of time. Of course we could reduce the time between each ping, but only up to a certain frequency; we wouldn't want to flood the proxy. We could make a … What about this:
An alternate version of this, which delays the endpoint creation only until after the pod is successfully launched, has the … @foxish, thoughts?
@mccheah I think the double watch with two CountDownLatches would work well, and could even be added as a single … Given how much the retry logic is costing in my testing, I'd rather do the work to make this fully event-driven with no polling.
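A rough sketch of the double watch with two CountDownLatches using the fabric8 Watcher API; the condition checks, names, and timeouts are illustrative rather than the code that eventually landed:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.{Endpoints, Pod}
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

// Released once the driver pod reports the Ready condition.
val podReadyLatch = new CountDownLatch(1)
// Released once the driver service's Endpoints gain at least one address.
val endpointsReadyLatch = new CountDownLatch(1)

val podReadyWatcher = new Watcher[Pod] {
  override def eventReceived(action: Action, pod: Pod): Unit = {
    val ready = pod.getStatus.getConditions.asScala
      .exists(c => c.getType == "Ready" && c.getStatus == "True")
    if (ready) podReadyLatch.countDown()
  }
  override def onClose(cause: KubernetesClientException): Unit = {}
}

val endpointsReadyWatcher = new Watcher[Endpoints] {
  override def eventReceived(action: Action, endpoints: Endpoints): Unit = {
    if (endpoints.getSubsets.asScala.exists(_.getAddresses.asScala.nonEmpty)) {
      endpointsReadyLatch.countDown()
    }
  }
  override def onClose(cause: KubernetesClientException): Unit = {}
}

// After registering both watches and creating the pod and service, block
// until both conditions have fired (timeouts illustrative).
require(podReadyLatch.await(60, TimeUnit.SECONDS), "Driver pod never became ready.")
require(endpointsReadyLatch.await(60, TimeUnit.SECONDS), "Driver service never gained endpoints.")
```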
Let's pick this up on Monday? I will perform a couple of experiments and see how we can shorten the time taken in the retry loop and fail faster. If the event-based approach is the only way we can achieve decent latency, then we can go with the two latches as Matt suggested.
I'm attempting the two-latch approach and unfortunately even if the Endpoint-creation event comes in to the EndpointsWatcher, indicating the endpoint is ready, the client still can't reach the driver pod through the service immediately after. If I set a breakpoint and wait a while, the endpoint becomes reachable. I'm not sure what event I should be looking for, or perhaps the Watch's listener is being triggered prematurely?
@mccheah when watching for the service to be ready, what method in the fabric8 API are you checking?
I'll post a patch with what I have so far and we can work from there.
…te-incremental' into readiness-probe
This works on my machine. This has the caveat of creating the service and the NodePort endpoint a little earlier than we actually need it, but in practice I found that we needed to create the service earlier on so that it would be available by the time we try to query it. Thus the pod launch time serves as a "buffer" of sorts to allow the service to become ready... but this seems severely brittle and I would like to investigate why the endpoint is being reported as ADDED when it's not open yet.
Utils.tryWithResource(kubernetesClient
  .endpoints()
  .withName(kubernetesAppId)
  .watch(endpointsReadyWatcher)) { _ =>
nit: Can we simplify the three calls to Utils.tryWithResource with a helper function?
@lins05 the latest patch rearranges the logic a little bit - how does it look now? I don't think extracting the three tryWithResource calls is the real problem, so much as having this method do both the Kubernetes component creation and the adjustment for owner references seems like a bit of an overload.
I think the idea of having a tryWithResource that takes multiple Closeable factories is intriguing - the tricky part would be having the closure accept a variable-length argument list, if that makes sense.
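One way to sidestep the variable-arity issue is a fixed-arity helper for the two-resource case; a sketch (not existing code in this repo), assuming both resources are java.io.Closeable, as fabric8's Watch is:

```scala
import java.io.Closeable

// Hypothetical helper: acquire two resources, run the body, then close them
// in reverse order; the outer finally still closes the first resource even
// if the body or the inner close throws.
def tryWithResources[A <: Closeable, B <: Closeable, T](
    createA: => A, createB: => B)(body: (A, B) => T): T = {
  val a = createA
  try {
    val b = createB
    try {
      body(a, b)
    } finally {
      b.close()
    }
  } finally {
    a.close()
  }
}
```

This just nests the single-resource pattern of Utils.tryWithResource once, avoiding a varargs closure entirely.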
override def eventReceived(action: Action, endpoints: Endpoints): Unit = {
  if ((action == Action.ADDED) || (action == Action.MODIFIED)
    && endpoints.getSubsets.asScala.nonEmpty
    && endpoints.getSubsets.asScala.exists(_.getAddresses.asScala.nonEmpty)
If the service endpoints have an address, it means the pod must have been ready. This makes the pod status watcher unnecessary. Should we remove it?
On second thought, we could still keep the pod watcher, such that we can tell the user the exact stage at which the submission fails (failed when creating the driver pod vs. failed after the driver pod has been created).
I like more granular logging in failures (makes debugging much easier) so agreed on keeping the pod watcher. It's not just for better logging though, I think Matt has also seen instances where a Minikube cluster would send events to Watches that showed the service as having ready endpoints even when the endpoints themselves weren't actually ready. (please correct if this telling is off)
My only remaining concern is about liveness vs readiness probes, and the liveness probe timeout being too short.
}
Utils.tryLogNonFatalError {
  kubernetesClient.pods().delete(driverPod)
}
the 6 lines ending here are almost the same as lines 140-146 -- worth extracting out to a method with service and pod parameters?
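A sketch of the kind of helper being suggested; the name is hypothetical, while Utils.tryLogNonFatalError is the existing Spark utility already used in the diff:

```scala
import io.fabric8.kubernetes.api.model.{Pod, Service}
import io.fabric8.kubernetes.client.KubernetesClient
import org.apache.spark.util.Utils

// Hypothetical helper: best-effort cleanup of the driver service and pod,
// logging rather than rethrowing any failure from either delete call.
private def deleteDriverResources(
    kubernetesClient: KubernetesClient,
    driverService: Service,
    driverPod: Pod): Unit = {
  Utils.tryLogNonFatalError {
    kubernetesClient.services().delete(driverService)
  }
  Utils.tryLogNonFatalError {
    kubernetesClient.pods().delete(driverPod)
  }
}
```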
  .endReadinessProbe()
  .withNewLivenessProbe()
  .withHttpGet(probePingHttpGet)
  .withInitialDelaySeconds(DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS)
what happens when readinessProbe and livenessProbe are both failing? does k8s wait for the pod to become ready, or immediately restart?
If this check delay is too short (e.g. a kubelet can't pull the docker image in the 10s window) then I'm worried the driver pod would never get spun up anywhere.
We might need to bump this value to the longest we can imagine a docker image pull taking, or otherwise make it configurable.
I'd probably prefer putting in just the readiness probe and no liveness probe until we get this bit figured out
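If it were made configurable, a sketch could look like the following; the config key is hypothetical and not something this PR introduces:

```scala
// Hypothetical configuration key -- shown only to illustrate making the
// delay tunable instead of hard-coding it; falls back to the constant.
private val livenessInitialDelaySeconds = sparkConf.getInt(
  "spark.kubernetes.driver.livenessProbe.initialDelaySeconds",
  DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS)
```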
@@ -67,4 +67,5 @@ package object constants {
  // Miscellaneous
  private[spark] val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
  private[spark] val KUBERNETES_SUBMIT_SSL_NAMESPACE = "kubernetes.submit"
  private[spark] val DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS = 10
don't use this anymore?
@@ -33,6 +33,7 @@ private[spark] object HttpClientUtil {

  def createClient[T: ClassTag](
      uris: Array[String],
      maxRetriesPerServer: Int = 1,
does this param need to get passed through to the MultiServerFeignTarget constructor?
Done
@@ -33,6 +33,7 @@ private[spark] object HttpClientUtil {

  def createClient[T: ClassTag](
      uris: Array[String],
      maxRetriesPerServer: Int = 1,
      sslSocketFactory: SSLSocketFactory = SSLContext.getDefault.getSocketFactory,
      trustContext: X509TrustManager = null,
      readTimeoutMillis: Int = 20000,
I saw a cute trick recently for status checking we could use below. Instead of:
if (response.status() >= 200 && response.status() < 300) {
use
if (response.status() / 100 == 2) {
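The two forms are equivalent for HTTP status codes because integer division truncates; a quick check:

```scala
// Every 2xx code (200-299) divides to 2; everything else does not.
(100 to 599).foreach { status =>
  assert((status / 100 == 2) == (status >= 200 && status < 300))
}
```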
if (threadLocalCurrentAttempt.get < maxRetriesPerServer) {
  logWarning(s"Attempt $currentAttempt of $maxRetriesPerServer failed for" +
    s" server ${url()}. Retrying request...", e)
  Thread.sleep(delayBetweenRetriesMillis)
does the feign API expect Retryers to sleep in this method?
Yep
Or rather - everything in Feign is synchronous, so there would be no other realistic way to add the delay except here (unless of course we wanted the retry loop in the caller - but we're trying to abstract that away with this in the first place).
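For reference, a minimal sketch of a Feign Retryer that sleeps between attempts, assuming Feign's standard Retryer contract (continueOrPropagate returns to allow another attempt or rethrows to give up); this is not the PR's MultiServerFeignTarget, just the shape of the interface:

```scala
import feign.{RetryableException, Retryer}

// Minimal sketch: retry up to maxAttempts with a fixed delay between tries.
// Feign invokes continueOrPropagate after each failed attempt; returning
// normally means "retry", throwing means "give up".
class FixedDelayRetryer(maxAttempts: Int, delayMillis: Long) extends Retryer {
  private var attempt = 1

  override def continueOrPropagate(e: RetryableException): Unit = {
    if (attempt >= maxAttempts) {
      throw e
    }
    attempt += 1
    Thread.sleep(delayMillis)
  }

  // Feign clones the retryer per request, so attempt counts don't leak
  // across requests.
  override def clone(): Retryer = new FixedDelayRetryer(maxAttempts, delayMillis)
}
```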
LGTM -- will merge when build is green
Hold off a second -- debugging something.
Need to dedupe the list that we shuffle through
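A quick sketch of deduping plus shuffling the candidate URIs, assuming they arrive as a plain Array[String]:

```scala
import scala.util.Random

// Deduplicate first so a node listed twice isn't favored, then randomize
// the order so no single server is consistently tried first.
def shuffledUniqueUris(uris: Array[String]): Seq[String] =
  Random.shuffle(uris.distinct.toSeq)
```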
- need to dedupe the list
- need to shuffle the list better as my master got picked first (it's first alphabetically) which would be uncommon
- need to reduce initial connect timeout to more like 2sec from 20sec
We probably need #90 to select only the external IPs - can follow up separately there.
Much better! I think we should drop the retries down from 10 to 3 though. At the point where we're retrying, we've already gotten a Watch event back from k8s that things are all good to go (including service readiness), so I think 10 attempts is overkill and takes quite a while.
This is performing well in tests now -- failing over is working properly, and it happens reasonably quickly. Still need to do more proactive filtering of the candidate host list for failover, but I think we can do that in a followup PR.
* Use readiness probe instead of client-side ping. Keep one ping() just as a sanity check, but otherwise set up the readiness probe to report the container as ready only when the ping endpoint can be reached. Also add a liveliness probe for convenience and symmetry.
* Extract common HTTP get action
* Remove some code
* Add delay to liveliness check
* Fix merge conflicts.
* Fix more merge conflicts
* Fix more merge conflicts
* Revamp readiness check logic
* Add addresses ready condition to endpoints watch
* Rearrange the logic some more.
* Remove liveness probe, retry against servers
* Fix compiler error
* Fix another compiler error
* Delay between retries. Remove unintended test modification
* Fix another compiler error
* Extract method
* Address comments
* Deduplicate node addresses, use lower initial connect timeout
* Drop maxRetriesPerServer from 10 to 3
Keep one ping just as a sanity check, but otherwise set up the readiness probe to report the container as ready only when the ping endpoint can be reached.
Also add a liveness probe for convenience and symmetry.
Closes #72