[MRG] Serve pod usage information in `/health` handler #912
@choldgraf do you have thoughts on naming and what information to return? I am trying to think of a word to replace |
Since the docs say "Serve statistics about", how about I suggest |
Does it make sense to include this info in I'd also be happy with /stats |
I started a new endpoint because I wasn't sure how I'd integrate it with I don't think that "over quota" should mean "sick", because at least right now it is a "soft quota". I then pondered if we should have an intermediate state "so-so" for In the end I decided to punt on all that confusion and made a new endpoint so that we could prototype the actual code. Especially if someone can help resolve my confusion over how to reflect "kinda sick" in |
I added the pod quota information to the health handler. Right now it ignores the quota information when determining the overall health of the hub. WDYT?
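To make the "reported but ignored" behaviour concrete, here is a minimal sketch of how a health summary could carry pod quota information without letting it affect the overall status. The function names, the `informational` flag, and the field names are assumptions for illustration only, not the handler code from this PR.

```python
def pod_quota_check(build_pods, user_pods, quota):
    """Build a quota entry for the health report (hypothetical field names)."""
    total = build_pods + user_pods
    return {
        "service": "Pod quota",
        "build_pods": build_pods,
        "user_pods": user_pods,
        "total_pods": total,
        "quota": quota,
        # The quota is "soft": being over it is reported here ...
        "ok": quota is None or total <= quota,
        # ... but this flag keeps it out of the overall verdict.
        "informational": True,
    }


def summarize_health(checks):
    """Combine per-service checks, skipping informational ones for the verdict."""
    overall = all(c["ok"] for c in checks if not c.get("informational", False))
    return {"ok": overall, "checks": checks}


report = summarize_health([
    {"service": "hub", "ok": True},
    pod_quota_check(build_pods=5, user_pods=10, quota=12),
])
```

In this sketch the report stays `ok` overall even though the quota entry itself is marked as not ok, which mirrors the idea that "over quota" should not mean "sick".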
If the usage/load/stats info returned relates to health, it makes sense to me to include it in the health endpoint. Or hmmm, wait a second... There is health and readiness in k8s. Health saying "not ok" means the pod needs to restart. Readiness saying no means "don't restart, but also don't pass more traffic to me", I think. So including it in a readiness endpoint may be suitable, but perhaps not in a health one? Hmmmm, I'm not confident about what makes sense, but it is worth understanding the potential confusion that could arise from a mismatch with how health/readiness endpoints are used on pods in k8s.
Maybe this should be in To be super explicit the The goal of the existing |
I think this endpoint could be a good readiness endpoint for the main entrypoint to a binderhub, in both the kubernetes sense and the cluster federation sense. A kubernetes service would be able to target multiple binderhub entrypoints, for example, and delegate traffic but exclude those that are not in a ready state. Is it about both the health and the readiness of the whole binderhub, or only one of these? Health example to me: critical pods in the system are not healthy, or similar. Readiness example to me: the hub is ready for more incoming traffic. Hmmm... new vs old traffic may be relevant to consider... Yikes, gtg for now.
Merged this so we can get it deployed and start using it.
This adds a new endpoint `/_usage` that returns the number of build pods and user pods currently running, plus their sum.

It introduces a new config variable called `max_pod_capacity` that is served to indicate the maximum capacity the cluster can handle. Currently this value isn't used to enforce an upper limit during launches, so it is only indicative.

The idea is to use information from this endpoint as part of the decision to send users to a cluster from the federation redirect. We could implement a strategy that sends users to the least full cluster (`total_pods/max_pod_capacity`) or considers clusters where `total_pods > max_pod_capacity` to be unavailable for more launches.

I am not super happy with the naming. Currently it is `/_usage` and the handler is `UsageHandler`, but that somehow suggests that you get historical usage information as well. So maybe "load" is a better name, but then is that the load on the system or an endpoint where you can load something?

I used the `_` prefix in `_usage` to signal that this is somehow an "internal" endpoint.

closes #874

WDYT?
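As a rough illustration of the redirect strategy described above, the sketch below picks the least full cluster by `total_pods/max_pod_capacity` and treats clusters with `total_pods > max_pod_capacity` as unavailable. The function names, the shape of the per-cluster dictionaries, and the cluster names are assumptions; the actual response format of the endpoint may differ.

```python
def fill_ratio(usage):
    """Fraction of the advertised capacity currently in use."""
    return usage["total_pods"] / usage["max_pod_capacity"]


def pick_cluster(usages):
    """Return the name of the least full available cluster, or None.

    `usages` maps a cluster name to a dict with `total_pods` and
    `max_pod_capacity` keys (assumed shape, for illustration only).
    """
    available = {
        name: usage
        for name, usage in usages.items()
        # Skip clusters without a usable capacity and those over capacity.
        if usage["max_pod_capacity"] > 0
        and usage["total_pods"] <= usage["max_pod_capacity"]
    }
    if not available:
        return None
    return min(available, key=lambda name: fill_ratio(available[name]))


# Hypothetical federation members and usage numbers.
usages = {
    "cluster-a": {"total_pods": 80, "max_pod_capacity": 100},
    "cluster-b": {"total_pods": 30, "max_pod_capacity": 50},
    "cluster-c": {"total_pods": 60, "max_pod_capacity": 50},
}
choice = pick_cluster(usages)
```

With these made-up numbers, `cluster-c` is excluded for being over capacity and `cluster-b` is chosen because its ratio (0.6) is lower than that of `cluster-a` (0.8). Since the capacity is not enforced during launches, a redirector built this way would still need to decide what to do when every cluster is over capacity.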