This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Use readiness probe instead of client-side ping. #75

Merged: ash211 merged 22 commits from readiness-probe into k8s-support-alternate-incremental on Feb 9, 2017

Conversation


@mccheah mccheah commented Feb 2, 2017

Keep one ping just as a sanity check, but otherwise set up the readiness probe to report the container as ready only when the ping endpoint can be reached.

Also add a liveness probe for convenience and symmetry.

Closes #72
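For reference, a minimal sketch of the probe wiring using the fabric8 builders this repo uses; the ping path and port are illustrative placeholders (not necessarily the PR's exact values), while `driverSubmitSslOptions` and the two constants come from the surrounding Client code and the diffs below:

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, HTTPGetActionBuilder, IntOrString}

// Probe the driver submission server's ping endpoint over HTTP(S).
// The path and port here are illustrative, not the PR's exact values.
val probePingHttpGet = new HTTPGetActionBuilder()
  .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")
  .withPath("/v1/submissions/ping")
  .withPort(new IntOrString(8001))
  .build()

// Readiness gates the service endpoints; liveness (with an initial delay so
// the check can't fire before the server is listening) restarts a hung server.
val driverContainer = new ContainerBuilder()
  .withName(DRIVER_CONTAINER_NAME)
  .withNewReadinessProbe()
    .withHttpGet(probePingHttpGet)
    .endReadinessProbe()
  .withNewLivenessProbe()
    .withHttpGet(probePingHttpGet)
    .withInitialDelaySeconds(DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS)
    .endLivenessProbe()
  .build()
```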


@ash211 ash211 left a comment


Definitely prefer this over the client-side ping. Main question is around changing the watch to the service vs. the driver pod.

@@ -127,6 +121,11 @@ private[spark] class Client(
.pods()
.withLabels(driverKubernetesSelectors)
.watch(podWatcher)) { _ =>
val probePingHttpGet = new HTTPGetActionBuilder()
  .withScheme(if (driverSubmitSslOptions.enabled) "HTTPS" else "HTTP")

these need to be all-caps? looks kinda weird to me

Member

Uppercasing is done within the library anyway.

mccheah (Author)

It does need to be capitalized; the Kubernetes API will throw an error otherwise.

@@ -162,6 +161,8 @@ private[spark] class Client(
.endEnv()
.addToEnv(sslEnvs: _*)
.withPorts(containerPorts.asJava)
.withNewReadinessProbe().withHttpGet(probePingHttpGet).endReadinessProbe()
.withNewLivenessProbe().withHttpGet(probePingHttpGet).endLivenessProbe()

do we need initialDelaySeconds / timeoutSeconds / periodSeconds set here?

https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_probe

I'm also not sure that having both is necessary -- we only rely on the readiness probe in this PR?

Member

The liveness probe could be useful to restart our server if it fails for some reason, but we would need an initialDelaySeconds, since otherwise it will fail early on, before the server starts listening.

@@ -127,6 +121,11 @@ private[spark] class Client(
.pods()
.withLabels(driverKubernetesSelectors)

Currently our PodWatcher watches the driver pod to determine its liveness before uploading files to it. What we more directly depend on, though, is the service: when the readiness probe succeeds and the pod gets added to the service, that seems like the more specific trigger.

Proposal: what if we change the DriverPodWatcher to watch for the service gaining a backing pod, instead of the driver pod being ready? That more directly monitors for the condition we need to submit the job.

mccheah (Author)

We would need an extra layer of nesting to do that, since we want to delay the service creation for as long as possible. Because we only want to make the service after we make the pod, we'd need one Watch for the pod being created to trigger creating the service, and another Watch for the Service gaining the backing pod. It's doable; I'm just not sure how to write it in a way that makes the nesting easy to reason about.

@mccheah mccheah commented Feb 2, 2017

It's worth discussing what the ramifications could be of creating the service up front and perhaps having the node ports open for the duration of the launch - in the worst case, if the launch times out. If we created the service outside of the watch, it would be easier to design this without two layers of nesting and futures to listen for, but there is the aforementioned tradeoff to consider.

In the meantime - @ash211 @foxish how about we merge this PR as is and follow up on that point moving forward?


I'm good with making this PR's change be to watch on the readiness probe of the driver pod, and potentially in a subsequent PR we change that to be a watch on the driver service directly if that makes more sense.

Filed #76 as a followup for discussion


I also agree to let this go in for now; perhaps we could table this fine-grained approach at the next meeting.

driverSubmitSslOptions)
val ping = Retry.retry(5, 5.seconds) {
  driverSubmitter.ping()

Comment that this ping is a final check of the service liveness before submitting the job (in addition to the k8s checks).

Might also be worth try/catching it and logging on failure that even though the k8s service is active, we are unable to connect to the driver's REST service from the submitter.
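For context, a minimal sketch of a retry helper compatible with the `Retry.retry(5, 5.seconds) { ... }` call in the hunk above; this is an assumption about its shape (the project's actual helper may, for instance, return a Future rather than blocking):

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Sketch only: runs `f` up to `times` attempts, sleeping `interval` between
// failures; the final failure propagates to the caller.
object Retry {
  def retry[T](times: Int, interval: Duration)(f: => T): T = {
    @tailrec
    def attempt(remaining: Int): T = {
      val result =
        try Right(f)
        catch { case e: Exception if remaining > 1 => Left(e) }
      result match {
        case Right(value) => value
        case Left(_) =>
          Thread.sleep(interval.toMillis)
          attempt(remaining - 1)
      }
    }
    attempt(times)
  }
}
```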

@ash211 ash211 left a comment

(meant to click request changes)


ash211 commented Feb 2, 2017

@mccheah you've got merge conflicts


mccheah commented Feb 3, 2017

I found that this patch fails integration tests, as the Client reports that it can't reach the Minikube host at the NodePort. I had thought that using the readiness probe would mean we would not have to retry the submission, since when the pod is marked as "ready", any Services that proxy to the pod should be available immediately over their node ports. This isn't necessarily the case, though, and watching the Service doesn't give any indication as to whether or not a Service is "ready".

Thus I can only conclude that we still need retry logic client-side here. @foxish do you have any thoughts?


foxish commented Feb 3, 2017

I was thinking more about this. The pod enters Ready, and then the endpoints need to be set up in the service; both of these happen asynchronously and in any order. We could do some complex logic for watching endpoints and pod readiness, but it doesn't seem worth the complexity anymore. The retry loop is a cleaner and more obvious solution to this. I'd say we should stick with that.


mccheah commented Feb 3, 2017

Hm, I still prefer trying to get a reactive approach to work. The main reason is that pinging has inherent latency: if the first attempt doesn't work, there's a delay before the subsequent attempt, and if the components become ready between the two attempts, we waste a bit of time. Of course we could reduce the time between pings, but only up to a certain frequency; we wouldn't want to flood the proxy.

We could make a Watch-based approach pretty sane by making them only set signals and not actually do any complex logic. The previous iteration of this was intractable primarily because we were doing the heavy lifting in the watch.

What about this:

  • Create two watches, one for the pod and one for the endpoints
  • Create two CountDownLatches, one each for pod readiness and endpoint readiness
  • Each watch will count down each appropriate latch and has no other logic
  • Create the service and the pod immediately
  • Block on each latch in sequence
  • Submit the data

An alternate version of this, which delays the endpoint creation until after the pod is successfully launched, has the pod watch build the service. That part of the logic is fairly simple anyway.

@foxish thoughts?
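A rough sketch of the two-latch idea, assuming the fabric8 `Watcher` API; the class and latch names here are illustrative, not taken from the PR:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.{Endpoints, Pod}
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

// Counts down once the driver pod reports the Ready condition; no other logic.
class DriverPodReadyWatcher(podReadyLatch: CountDownLatch) extends Watcher[Pod] {
  override def eventReceived(action: Action, pod: Pod): Unit = {
    val ready = Option(pod.getStatus)
      .map(_.getConditions.asScala)
      .getOrElse(Seq.empty)
      .exists(c => c.getType == "Ready" && c.getStatus == "True")
    if ((action == Action.ADDED || action == Action.MODIFIED) && ready) {
      podReadyLatch.countDown()
    }
  }
  override def onClose(cause: KubernetesClientException): Unit = {}
}

// Counts down once the service's endpoints have at least one ready address.
class EndpointsReadyWatcher(endpointsReadyLatch: CountDownLatch) extends Watcher[Endpoints] {
  override def eventReceived(action: Action, endpoints: Endpoints): Unit = {
    val hasAddress = endpoints.getSubsets.asScala
      .exists(_.getAddresses.asScala.nonEmpty)
    if ((action == Action.ADDED || action == Action.MODIFIED) && hasAddress) {
      endpointsReadyLatch.countDown()
    }
  }
  override def onClose(cause: KubernetesClientException): Unit = {}
}

// Usage: open both watches, create the service and pod, then block on each
// latch in sequence (with a timeout) before submitting the application data:
//   podReadyLatch.await(timeoutSecs, TimeUnit.SECONDS)
//   endpointsReadyLatch.await(timeoutSecs, TimeUnit.SECONDS)
```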


ash211 commented Feb 4, 2017

@mccheah I think the double watch with two CountDownLatches would work well, and could even be added as a single waitForServiceReady function rather than right inside the main flow of the run method.

Given how much the retry logic is costing in my testing, I'd rather do the work to make this fully event-driven with no polling.


foxish commented Feb 4, 2017

Let's pick this up on Monday? I will perform a couple of experiments and see how we can shorten the time taken in the retry loop and fail quicker. If the event based approach is the only way we can achieve decent latency, then we can go with the two latches as Matt suggested.


mccheah commented Feb 4, 2017

I'm attempting the two-latch approach, and unfortunately even when the Endpoint-creation event comes into the EndpointsWatcher, indicating the endpoint is ready, the client still can't reach the driver pod through the service immediately afterward. If I set a breakpoint and wait a while, the endpoint becomes reachable. I'm not sure what event I should be looking for; perhaps the Watch's listener is being triggered prematurely?


ash211 commented Feb 4, 2017

@mccheah when watching for the service to be ready, what method in the fabric8 api are you checking?


mccheah commented Feb 4, 2017

I'll post a patch with what I have so far and we can work from there


mccheah commented Feb 4, 2017

This works on my machine. It has the caveat of creating the service and the NodePort endpoint a little earlier than we actually need them, but in practice I found that we needed to create the service earlier on so that it would be available by the time we try to query it. The pod launch time thus serves as a "buffer" of sorts to allow the service to become ready... but this seems severely brittle, and I would like to investigate why the endpoint is being reported as ADDED when it's not open yet.

Utils.tryWithResource(kubernetesClient
  .endpoints()
  .withName(kubernetesAppId)
  .watch(endpointsReadyWatcher)) { _ =>

nit: Can we simplify the three calls to Utils.tryWithResource with a helper function?

mccheah (Author)

@lins05 the latest patch rearranges the logic a little bit - how does it look now? I don't think extracting the three tryWithResource calls is the real problem, so much as that having this method do both the Kubernetes component creation and the adjustment of owner references feels like a bit of an overload.

I think the idea of having a tryWithResource that takes multiple Closeable factories is intriguing - the tricky part would be having the closure accept a variable-length argument list, if that makes sense.
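One possible shape for such a helper, sketched for the two-resource case (a fully variadic version runs into exactly the closure-arity problem mentioned above); `tryWithResources` is a hypothetical name, not an existing utility:

```scala
import java.io.Closeable

// Hypothetical two-resource variant of Utils.tryWithResource: both resources
// are closed in reverse order of creation, even if the body throws.
def tryWithResources[A <: Closeable, B <: Closeable, T](
    createA: => A, createB: => B)(f: (A, B) => T): T = {
  val a = createA
  try {
    val b = createB
    try {
      f(a, b)
    } finally {
      b.close()
    }
  } finally {
    a.close()
  }
}
```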

override def eventReceived(action: Action, endpoints: Endpoints): Unit = {
  if ((action == Action.ADDED) || (action == Action.MODIFIED)
    && endpoints.getSubsets.asScala.nonEmpty
    && endpoints.getSubsets.asScala.exists(_.getAddresses.asScala.nonEmpty)

If the service endpoints have an address, it means the pod must have been ready. This makes the pod status watcher unnecessary. Should we remove it?


On second thought, we could still keep the pod watcher, so that we can tell the user the exact stage at which the submission fails (failed when creating the driver pod vs. failed after the driver pod has been created).


I like more granular logging of failures (it makes debugging much easier), so agreed on keeping the pod watcher. It's not just for better logging, though: I think Matt has also seen instances where a Minikube cluster would send events to Watches that showed the service as having ready endpoints even when the endpoints themselves weren't actually ready. (Please correct me if this telling is off.)
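For illustration, a hedged sketch of what that granular pod-side logging can look like; the class name here is illustrative, and the log format mirrors the LoggingPodStatusWatcher output quoted later in this thread:

```scala
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action
import org.apache.spark.internal.Logging

// Keeping a dedicated pod watcher lets a failure be pinned to a stage:
// "failed creating the driver pod" vs. "failed after the pod was created".
class LoggingPodWatcher(appId: String) extends Watcher[Pod] with Logging {
  override def eventReceived(action: Action, pod: Pod): Unit = {
    logInfo(s"Application status for $appId (phase: ${pod.getStatus.getPhase})")
  }
  override def onClose(cause: KubernetesClientException): Unit = {}
}
```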

@ash211 ash211 left a comment

My only remaining concern is about liveness vs readiness probes, and the liveness probe timeout being too short.

}
Utils.tryLogNonFatalError {
  kubernetesClient.pods().delete(driverPod)
}

the 6 lines ending here are almost the same as lines 140-146 -- worth extracting out to a method with service and pod parameters?

.endReadinessProbe()
.withNewLivenessProbe()
.withHttpGet(probePingHttpGet)
.withInitialDelaySeconds(DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS)

what happens when readinessProbe and livenessProbe are both failing? does k8s wait for the pod to become ready, or immediately restart?

If this check delay is too short (e.g. a kubelet can't pull the docker image in the 10s window) then I'm worried the driver pod would never get spun up anywhere.

We might need to bump this value to the longest we can imagine a docker image pull taking, or otherwise make it configurable.

I'd probably prefer putting in just the readiness probe and no liveness probe until we get this bit figured out



mccheah commented Feb 8, 2017

@ash211 @foxish @lins05 I adjusted the implementation. The overall approach is now:

  • Watch on all 3 components - endpoint, service, and pod
  • MultiServerFeignTarget retries each server with a pause between each attempt

@@ -67,4 +67,5 @@ package object constants {
// Miscellaneous
private[spark] val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
private[spark] val KUBERNETES_SUBMIT_SSL_NAMESPACE = "kubernetes.submit"
private[spark] val DRIVER_INITIAL_LIVELINESS_CHECK_DELAY_SECONDS = 10

don't use this anymore?

@@ -33,6 +33,7 @@ private[spark] object HttpClientUtil {

def createClient[T: ClassTag](
uris: Array[String],
maxRetriesPerServer: Int = 1,

does this param need to get passed through to the MultiServerFeignTarget constructor?

mccheah (Author)

Done

@@ -33,6 +33,7 @@ private[spark] object HttpClientUtil {

def createClient[T: ClassTag](
uris: Array[String],
maxRetriesPerServer: Int = 1,
sslSocketFactory: SSLSocketFactory = SSLContext.getDefault.getSocketFactory,
trustContext: X509TrustManager = null,
readTimeoutMillis: Int = 20000,

I saw a cute trick recently for status checking we could use below. Instead of:

if (response.status() >= 200 && response.status() < 300) {

use

if (response.status() / 100 == 2) {

if (threadLocalCurrentAttempt.get < maxRetriesPerServer) {
  logWarning(s"Attempt $currentAttempt of $maxRetriesPerServer failed for" +
    s" server ${url()}. Retrying request...", e)
  Thread.sleep(delayBetweenRetriesMillis)

does the feign API expect Retryers to sleep in this method?

mccheah (Author)

Yep

mccheah (Author)

Or rather - everything in Feign is synchronous, so there would be no other realistic way to add the delay except here (unless of course we wanted the retry loop in the caller - but we're trying to abstract that away with this in the first place).
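Concretely, a hedged sketch of a Retryer that sleeps between attempts, in the spirit of MultiServerFeignTarget's retry handling (the class and field names are illustrative):

```scala
import feign.{RetryableException, Retryer}

// Sketch only: Feign calls continueOrPropagate on each failure; returning
// normally means "retry the request", throwing propagates to the caller.
class FixedDelayRetryer(
    maxRetriesPerServer: Int,
    delayBetweenRetriesMillis: Long) extends Retryer {

  private var currentAttempt = 0

  override def continueOrPropagate(e: RetryableException): Unit = {
    currentAttempt += 1
    if (currentAttempt < maxRetriesPerServer) {
      // Feign's retry loop is synchronous, so the sleep has to happen here.
      Thread.sleep(delayBetweenRetriesMillis)
    } else {
      throw e
    }
  }

  // Feign clones the Retryer per request so attempt counts don't leak.
  override def clone(): Retryer =
    new FixedDelayRetryer(maxRetriesPerServer, delayBetweenRetriesMillis)
}
```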

@ash211 ash211 left a comment

LGTM -- will merge when build is green


ash211 commented Feb 8, 2017

Hold off a second -- debugging something...


ash211 commented Feb 8, 2017

  • takes 20sec to fail over from one attempt to the next -- I think we need to lower the initial connect timeout if we intend this to fail fast:
2017-02-08 17:04:08 WARN  MultiServerFeignTarget:87 - Attempt 7 of 10 failed for server http://10.0.20.108:32113. Retrying request...
feign.RetryableException: connect timed out executing POST http://10.0.20.108:32113/v1/submissions/create
	at feign.FeignException.errorExecuting(FeignException.java:67)
	at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:102)
	at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
	at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:103)
	at com.sun.proxy.$Proxy31.submitApplication(Unknown Source)
	at org.apache.spark.deploy.kubernetes.Client.org$apache$spark$deploy$kubernetes$Client$$submitApplicationToDriverServer(Client.scala:213)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$6.apply(Client.scala:145)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6$$anonfun$apply$6.apply(Client.scala:121)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:121)
	at org.apache.spark.deploy.kubernetes.Client$$anonfun$run$6.apply(Client.scala:99)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2530)
	at org.apache.spark.deploy.kubernetes.Client.run(Client.scala:99)
	at org.apache.spark.deploy.kubernetes.Client$.main(Client.scala:823)
	at org.apache.spark.deploy.kubernetes.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:117)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:124)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:187)
	at okhttp3.internal.connection.RealConnection.buildConnection(RealConnection.java:173)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:114)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:193)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:129)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:98)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:109)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:124)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:170)
	at okhttp3.RealCall.execute(RealCall.java:60)
	at feign.okhttp.OkHttpClient.execute(OkHttpClient.java:153)
	at org.apache.spark.deploy.rest.kubernetes.HttpClientUtil$$anon$1.execute(HttpClientUtil.scala:53)
	at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97)
	... 22 more
2017-02-08 17:04:08 INFO  LoggingPodStatusWatcher:54 - Application status for org-apache-spark-examples-sparkpi-1486591289178 (phase: Running)
[... identical status lines logged every second from 17:04:09 through 17:04:27 ...]
2017-02-08 17:04:28 INFO  LoggingPodStatusWatcher:54 - Application status for org-apache-spark-examples-sparkpi-1486591289178 (phase: Running)
2017-02-08 17:04:29 WARN  MultiServerFeignTarget:87 - Attempt 8 of 10 failed for server http://10.0.20.108:32113. Retrying request...
feign.RetryableException: connect timed out executing POST http://10.0.20.108:32113/v1/submissions/create
  • after 10x of failing to hit that (20sec each) it fails over to ... the same URL!
2017-02-08 17:05:11 WARN  MultiServerFeignTarget:87 - Failed request to http://10.0.20.108:32113 10 times. Trying to access http://10.0.20.108:32113 instead.

Need to dedupe the list that we shuffle through

  • after 10x of 20sec each of that failing, it fails over to the hostname, also of the master (these all go to my master node, which is not marked Unschedulable, the kubeadm default)

  • after 10x of 20sec each of failing on the master's hostname, it fails over to a different hostname (one of the workers), which then finally works

@ash211 ash211 left a comment

  • need to dedupe the list
  • need to shuffle the list better as my master got picked first (it's first alphabetically) which would be uncommon
  • need to reduce initial connect timeout to more like 2sec from 20sec
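A sketch of the first two fixes, with illustrative names (the actual node-address plumbing lives elsewhere in the Client code); the comment at the end covers the third:

```scala
import java.util.concurrent.TimeUnit
import scala.util.Random

// Deduplicate node addresses before building the failover URI list, and
// shuffle so the alphabetically-first node (often the master) isn't always
// tried first. Parameter names are illustrative.
def candidateServerUris(
    nodeAddresses: Seq[String],
    scheme: String,
    nodePort: Int): Seq[String] = {
  Random.shuffle(nodeAddresses.distinct.map(addr => s"$scheme://$addr:$nodePort"))
}

// For the connect timeout, the OkHttp client backing Feign can use a short
// timeout so failover to the next URI happens in seconds rather than 20s, e.g.:
//   new okhttp3.OkHttpClient.Builder()
//     .connectTimeout(2, TimeUnit.SECONDS)
//     .build()
```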


mccheah commented Feb 8, 2017

We probably need #90 to select only the external IPs - can follow up separately there.


ash211 commented Feb 8, 2017

Much better! I think we should drop the retries down from 10 to 3, though. At the point where we're retrying, we've already gotten a Watch event back from k8s that things are all good to go (including service readiness), so 10 attempts is overkill and takes quite a while.


ash211 commented Feb 9, 2017

This is performing well in tests now -- failing over is working properly, and it happens reasonably quickly.

Still need to do more proactive filtering of the candidate host list for failover, but I think we can do that in a followup PR.

@ash211 ash211 merged commit b1466e6 into k8s-support-alternate-incremental Feb 9, 2017
@ash211 ash211 deleted the readiness-probe branch February 9, 2017 01:59
ash211 pushed a commit that referenced this pull request Mar 8, 2017
* Use readiness probe instead of client-side ping.

Keep one ping() just as a sanity check, but otherwise set up the
readiness probe to report the container as ready only when the ping
endpoint can be reached.

Also add a liveliness probe for convenience and symmetry.

* Extract common HTTP get action

* Remove some code

* Add delay to liveliness check

* Fix merge conflicts.

* Fix more merge conflicts

* Fix more merge conflicts

* Revamp readiness check logic

* Add addresses ready condition to endpoints watch

* Rearrange the logic some more.

* Remove liveness probe, retry against servers

* Fix compiler error

* Fix another compiler error

* Delay between retries. Remove unintended test modification

* FIx another compiler error

* Extract method

* Address comments

* Deduplicate node addresses, use lower initial connect timeout

* Drop maxRetriesPerServer from 10 to 3
foxish pushed a commit that referenced this pull request Jul 24, 2017
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019