feat: get information about global resource availability #222
Comments
@lorenzo-cavazzi can we clarify what the |
The initial idea was to get information about resource availability, to prevent stalling when starting a new environment that requires currently unavailable resources (e.g. when asking for 1 GPU and no GPUs are available). A few things have changed in the meantime, and I guess it's not easy to have something like this in our environment; with different nodes, it may not be obvious where the pod for a new interactive environment ends up. A better solution could be to fail the launch with a reasonable error message when no resources are available. That seems easier to implement, and the UX won't be much worse than the original proposal. |
I believe it should be possible to query the Kubernetes API about resources and provide constraints on the types of nodes you want to consider. IMHO, preventing a launch is better than recovering from a failed launch. |
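As a rough illustration of the "prevent the launch" idea, here is a minimal sketch that works on node and pod objects shaped like the JSON returned by the Kubernetes API (`status.allocatable`, `metadata.labels`, `spec.nodeName`, container `resources`). The function names, the `required_labels` parameter, and the use of `nvidia.com/gpu` as the GPU resource name are assumptions made for this example, not something decided in this thread.

```python
# Assumed GPU resource name; it depends on the device plugin installed in the cluster.
GPU = "nvidia.com/gpu"


def free_gpus_per_node(nodes, pods, required_labels=None):
    """Return {node_name: unrequested GPUs} for nodes matching required_labels.

    nodes and pods are lists of dicts shaped like Kubernetes API objects.
    """
    required_labels = required_labels or {}
    free = {}
    for node in nodes:
        labels = node["metadata"].get("labels", {})
        # Skip nodes that do not carry every required label.
        if any(labels.get(k) != v for k, v in required_labels.items()):
            continue
        allocatable = node.get("status", {}).get("allocatable", {})
        free[node["metadata"]["name"]] = int(allocatable.get(GPU, 0))

    for pod in pods:
        node_name = pod["spec"].get("nodeName")
        if node_name not in free:
            continue  # unscheduled pod, or pod on a node we filtered out
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            continue  # finished pods no longer hold resources
        for container in pod["spec"].get("containers", []):
            resources = container.get("resources", {})
            requests = resources.get("requests", {})
            limits = resources.get("limits", {})
            # Fall back to the limit when no explicit request is present.
            free[node_name] -= int(requests.get(GPU, limits.get(GPU, 0)))
    return free


def can_launch(nodes, pods, gpus_requested, required_labels=None):
    """True if at least one matching node has enough unrequested GPUs."""
    free = free_gpus_per_node(nodes, pods, required_labels)
    return any(count >= gpus_requested for count in free.values())
```

This only compares requests against allocatable capacity on each node; it ignores taints, affinities, and anything else the scheduler takes into account, so a positive answer is a hint rather than a guarantee.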
Ok, then for the UI the idea would be to get extra information about the current availability of the resources. If that is feasible, I can quickly formalize the endpoint proposal by sketching it in SwaggerHub. |
Apparently this is not easily solvable (see this discussion) but some tools exist: https://github.com/davidB/kubectl-view-allocations |
I am sorry for being late to this. My GitHub notifications are not set up right, so I get way too many emails about random things and miss the ones where I am needed. @rokroskar that is a very good link you posted. In there I found a command like this, which will give you the CPU requested by every pod in a namespace (or all namespaces).
So if we use this command on the backend and combine it with something like Some other questions (that I think can be resolved) if we pursue this are:
|
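For reference, the per-pod CPU totals mentioned above can also be computed on the backend from a pod list in the JSON shape the Kubernetes API returns. This is only a sketch: the helper names are made up for this example, and the quantity parser handles just plain core counts (`"2"`, `"0.5"`) and the millicore suffix (`"500m"`).

```python
from decimal import Decimal


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "500m", "2" or "0.5" to millicores.

    Only plain numbers and the "m" suffix are handled in this sketch.
    """
    quantity = str(quantity)
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def cpu_requests_by_pod(pods, namespace=None):
    """Return {"namespace/name": requested millicores} for the given pods.

    If namespace is given, pods from other namespaces are ignored.
    """
    totals = {}
    for pod in pods:
        meta = pod["metadata"]
        pod_namespace = meta.get("namespace", "default")
        if namespace is not None and pod_namespace != namespace:
            continue
        total = 0
        for container in pod["spec"].get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            total += cpu_to_millicores(requests.get("cpu", "0"))
        totals[f"{pod_namespace}/{meta['name']}"] = total
    return totals
```

Summing these values per node and subtracting them from the node's allocatable CPU would give an estimate of what is still free.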
Sometimes users end up with notebooks in a pending state without knowing that there aren't enough resources available in the cluster. This is especially true for GPUs that are very limited in some deployments.
We could get this information from kubernetes and use it for one (or both) of the following:
/stats
API