GPU Resource Naming #2255
Since an admin has a curated list of instance types one wants to expose, and peer-pods are heavily tied to the instance type, we could expose that curated list as extended resources.
If we do not care about the GPU type and just need any GPU instance, we need a common name, and peer-pods could then allocate any GPU instance: CSP GPU == cgpu?
This way we would have distinct names for traditional containers, Kata bare-metal, and peer-pods, e.g. as sketched below.
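As an illustration only: nvidia.com/gpu and nvidia.com/pgpu exist today, while nvidia.com/cgpu is just the name proposed above for peer-pods, not something that exists yet.

```yaml
# Illustrative sketch: one extended-resource name per runtime flavor.
resources:
  limits:
    # nvidia.com/gpu: 1    # traditional container runtime (runc)
    # nvidia.com/pgpu: 1   # Kata bare-metal (VFIO)
    nvidia.com/cgpu: 1     # peer-pods, any CSP GPU instance (proposed name)
```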
A bit off-topic, but how are these resources advertised on a node and mapped to a podvm? A device plugin with CDI devices that are podvm-specific?
Since we're in peer-pod land, I will answer this question in that context. I talked to @bpradipt, who told me that one (admin, operator) will usually have a curated list of VM instance types that can be used in a specific cluster. We can create NFD rules or a device plugin (which is unnecessary, since it can only add env vars or mounts to the container; we cannot add annotations depending on the request) to expose this list as an extended resource. Since we added CDI support to the kata-agent, what we can now do is the following: the pod requests the GPU as an extended resource and carries a CDI annotation, e.g. as sketched below.
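A minimal sketch of such a request; the annotation key, device name, runtime class name, resource name, and image are assumptions here, the general annotation form is cdi.k8s.io/<key>: <vendor>/<class>=<device>:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    # CDI device request; the kata-agent resolves this against the CDI spec
    # inside the podvm (annotation key and device name are illustrative).
    cdi.k8s.io/gpu: "nvidia.com/gpu=gpu0"
spec:
  runtimeClassName: kata-remote          # assumed peer-pods runtime class
  containers:
  - name: app
    image: my-gpu-app:latest             # placeholder image
    resources:
      limits:
        nvidia.com/pgpu: 1               # extended resource for the curated list (name assumed)
```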
The kata-agent will read this annotation, and the corresponding CDI device will be injected. If we need multiple GPUs, the annotation simply references multiple CDI devices; a sketch follows.
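For multiple GPUs, the annotation value is a comma-separated list of CDI device names (device names illustrative):

```yaml
metadata:
  annotations:
    # Two devices requested via one CDI annotation (names illustrative).
    cdi.k8s.io/gpu: "nvidia.com/gpu=gpu0,nvidia.com/gpu=gpu1"
```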
For the instance type, we have another annotation which is not related to CDI but obviously needs to select a GPU instance. If we use a CPU instance type but have added the CDI annotations, the kata-agent will fail and time out, since the CDI specs for GPUs cannot be created.
@zvonkok In my understanding, Cloud API Adaptor, which sits between the container runtime for the remote hypervisor and the kata-agent (and resides outside of a pod VM), currently handles the GPU resource request annotations to determine an appropriate instance type. Are you suggesting the kata-agent can handle this annotation by using a CDI spec, inside the pod VM?
Currently we have the following mechanism for using GPUs with peer-pods. The user provides a pod manifest (the same as for regular Kata or runc, except that the runtimeClass changes), e.g. as sketched below.
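A minimal sketch; the kata-remote runtime class name and the image are assumptions here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: kata-remote   # the only change vs. regular Kata/runc (name assumed)
  containers:
  - name: app
    image: my-gpu-app:latest      # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```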
The webhook mutates the pod manifest to something like the sketch below (note the removal of the resources and the addition of annotations).
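Roughly along these lines; the kata.peerpods.io/vm extended resource and the GPU-count annotation key are best-effort guesses, not verified against the current webhook:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    # Guessed annotation key carrying the GPU requirement for CAA.
    io.katacontainers.config.hypervisor.default_gpus: "1"
spec:
  runtimeClassName: kata-remote
  containers:
  - name: app
    image: my-gpu-app:latest
    resources:
      limits:
        kata.peerpods.io/vm: 1    # GPU resource removed; peer-pods VM resource added (guessed name)
```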
Then CAA finds the suitable GPU instance type from the pre-configured instance type list, creates the VM, and runs the pod. An alternative mechanism is to simply use a pod manifest specific to peer-pods, like the sketch below (note the machine_type annotation to select the specific GPU instance type).
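For example (the instance type value is CSP-specific and only an illustration here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    # Pick the GPU instance type directly (illustrative Azure value).
    io.katacontainers.config.hypervisor.machine_type: "Standard_NC4as_T4_v3"
spec:
  runtimeClassName: kata-remote
  containers:
  - name: app
    image: my-gpu-app:latest
```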
Now with CDI, we can start with the most basic implementation, like the manifest below:
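A sketch of such a minimal manifest, combining the instance-type annotation with a CDI annotation (annotation keys, device name, and instance type are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-cdi-test
  annotations:
    io.katacontainers.config.hypervisor.machine_type: "Standard_NC4as_T4_v3"   # GPU instance type (illustrative value)
    cdi.k8s.io/gpu: "nvidia.com/gpu=gpu0"                                      # CDI device for the kata-agent to inject
spec:
  runtimeClassName: kata-remote
  containers:
  - name: app
    image: my-gpu-app:latest
```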
There are two places we can add the CDI annotation: either in the webhook or in CAA.
I think we would want the actual manifest to end up with the proper CDI annotation added, indicating the number of pGPUs. That's not possible with the webhook today. CAA already has this info, so it should be able to modify the OCI spec to add it.
Does this make sense?
IIUC, eventually we'll need some sort of translation between the instance size and a matching CDI annotation (type), no? Which cannot be done ATM in the webhook, AFAIU. Having said that, starting with attaching a default CDI annotation in the webhook/CAA according to the GPU request looks like a good option to me (assuming I understand the workflow right).
Yes, that's my understanding.
Is there anything needed on the pod VM side, or is the CDI annotation in the spec enough?
AFAIU the agent's CDI-related bits are all in place; the podvm needs to have the CDI specification in place and that's it (I've been experimenting with the injection in the CAA and it worked).
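For reference, a minimal CDI specification on the podvm could look roughly like this; the file location, device name, and device-node paths are illustrative assumptions:

```yaml
# /etc/cdi/nvidia.yaml (illustrative): what the kata-agent resolves a
# cdi.k8s.io/* annotation such as nvidia.com/gpu=gpu0 against inside the podvm.
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
- name: gpu0
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
```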
Actually, adding the CDI annotation in the webhook (or manually) will fail ATM, as the (Go) shim cannot add the specified CDI device (should it simply pass the annotation along and do nothing else when it's a remote hypervisor? IDK).
I believe the idea is that kata-agent knows about the CDI devices and writes the …
Would peer-pods simply work if the …
For the Kata BM use-case, we have VFIOs advertised as `nvidia.com/pgpu: 1`. We cannot use `nvidia.com/gpu: 1` for the peer-pods use-case, since this is reserved for GPUs that are used with traditional container runtimes and would clash in a cluster where some nodes run GPUs without Kata/peer-pods and other nodes run with Kata/peer-pods. We need to come up with a new naming scheme that we use for peer-pods.

In the bare-metal use-case we also have, e.g., the SKU name exposed in the cluster: `nvidia.com/GH100_H800: 8`.
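For comparison, this is roughly how those resources show up on a bare-metal Kata GPU node (a sketch; the counts are illustrative):

```yaml
# Sketch of node status on a bare-metal Kata GPU node (counts illustrative).
status:
  allocatable:
    nvidia.com/pgpu: "8"
    nvidia.com/GH100_H800: "8"
```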