Updating pod scheduling #56
From the output you pasted, I think k8s is happy because zero instances are desired and zero instances are running. To k8s this means "all is good". So the question is: why does it think that zero instances are desired? Do the nodes you want the daemonset to schedule on have a
Hmmm, if the daemonset does not desire any pods, it seems it does not consider any node viable for scheduling at all. Viable nodes are those that satisfy its nodeSelector and affinity.nodeAffinity.requiredDuring... for
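To make the "zero desired" reasoning concrete, here is a rough Python sketch of how taints and tolerations could decide whether a node counts as viable for a DaemonSet. It is a simplification under stated assumptions, not the actual Kubernetes scheduler code: it ignores nodeSelector and node affinity, the node name is hypothetical, and it uses the Kubernetes API spelling `NoSchedule` for the effect that appears as `NO_SCHEDULE` in this thread.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified model)."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: key and value must both match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def desired_count(nodes, tolerations):
    """Count nodes whose NoSchedule/NoExecute taints are all tolerated."""
    count = 0
    for taints in nodes.values():
        blocking = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in blocking):
            count += 1
    return count


# Assumption: the taint from this thread, written in Kubernetes API form.
user_taint = {"key": "hub.jupyter.org_dedicated", "value": "user",
              "effect": "NoSchedule"}
nodes = {"user-pool-node": [user_taint]}  # hypothetical node name

matching = [{"key": "hub.jupyter.org_dedicated", "operator": "Equal",
             "value": "user", "effect": "NoSchedule"}]

print(desired_count(nodes, []))        # 0: nothing tolerates the taint
print(desired_count(nodes, matching))  # 1: the taint is tolerated
```

In this model, a DaemonSet whose tolerations are missing or not applied ends up with zero viable nodes, so "0 desired, 0 running" is reported as healthy.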
Unfortunately, I'm not going to have much time to work on this today but here's a summary of the
AFAICT, the labels and taints look correct. Next is an example build pod spec:
Here's the summary of the DinD Daemonset:
Finally, I turned DinD off in our production cluster and things are mostly working for now, but this isn't a long-term solution.
Looking at the daemonset config, I realized the tolerations are not a list, which has me thinking jupyterhub/binderhub#856 may not have been successful. I have an idea how to fix this and will open a PR shortly.
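For illustration, the shape problem described above can be caught with a small check on a parsed pod spec. This is a sketch only: `check_tolerations` is a hypothetical helper, not part of BinderHub or the Kubernetes client, and it assumes the spec has already been loaded into plain Python dicts and lists.

```python
def check_tolerations(pod_spec):
    """Return the tolerations of a parsed pod spec, or raise if malformed."""
    tolerations = pod_spec.get("tolerations", [])
    # Kubernetes expects a list of toleration objects, not a single mapping.
    if not isinstance(tolerations, list):
        raise TypeError(
            f"tolerations must be a list, got {type(tolerations).__name__}"
        )
    for item in tolerations:
        if not isinstance(item, dict):
            raise TypeError("each toleration must be a mapping")
    return tolerations


# A single mapping where a list is expected (the suspected mistake).
bad_spec = {"tolerations": {"key": "hub.jupyter.org_dedicated",
                            "operator": "Equal", "value": "user",
                            "effect": "NoSchedule"}}

# The same toleration wrapped in a list.
good_spec = {"tolerations": [bad_spec["tolerations"]]}

print(len(check_tolerations(good_spec)))  # 1
try:
    check_tolerations(bad_spec)
except TypeError as err:
    print(err)  # tolerations must be a list, got dict
```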
jupyterhub/binderhub#857 was the key. All seems well here. Thanks @betatim and @consideRatio for chiming in!
I've been working today on getting the final pieces of the k8s scheduling implemented in our binder deployment. This has resulted in a few merged PRs upstream (jupyterhub/binderhub#853, jupyterhub/binderhub#856). I've deployed these changes manually and I'm opening this PR to ask for some help debugging a few things.
Problem 1: DinD DaemonSet is not scheduling
I'm trying to get the DinD DaemonSet to tolerate the hub.jupyter.org_dedicated=user:NO_SCHEDULE taint that is set on the user-pool (jupyterhub/binderhub#856). However, new DinD pods are not coming up when a new user-pool node comes live. Then build tasks fail because they can't mount the docker socket.

Asking for help from @yuvipanda @consideRatio @betatim