Pod cannot detect Arc A380 GPU #1798
Comments
Hi @SeanOMik, weird. I've seen issues with access rights where the render device cannot be accessed. This can be fixed with a securityContext addition:
But render accessibility shouldn't cause issues with the demo Pod. You could add:
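The securityContext snippet referenced above didn't survive in this thread. A minimal sketch of the usual fix (the GID 110, the image, and the pod name here are assumptions — check your node's actual render group GID) could look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: intelgpu-demo
spec:
  securityContext:
    # GID of the host's "render" group; verify on the node with: getent group render
    supplementalGroups: [110]
  containers:
  - name: clinfo
    image: ubuntu:24.04        # assumed image with clinfo installed
    command: ["clinfo"]
    resources:
      limits:
        gpu.intel.com/i915: 1  # resource name registered by the Intel GPU plugin
```

Note that `supplementalGroups` lives in the pod-level securityContext, not the container-level one.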
@tkatila I tried that fix, but it didn't work. My host is Ubuntu, so I thought the name of the render group is
Here's the result of running that in the container:
Something I forgot to mention is that the k3s host, which is Ubuntu, is running as a VM in proxmox. The GPU is passed through fine; the docker container was able to use it. I, of course, blacklisted the drivers on the VM host to pass it through. Just to make sure: I don't need to blacklist the drivers on the kubernetes node as well, right?
The warning is fine afaik. As long as the gid matches the one on the host, it should be ok.
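One way to do that comparison (a sketch; the device path and the example GID/group entry below are assumptions, substitute your node's actual values):

```shell
# On the host node, the GID that owns the render device comes from:
#   stat -c '%g' /dev/dri/renderD128
# In the container, look up the render group entry, e.g.:
#   getent group render    ->  "render:x:110:"
# The GID is the third colon-separated field; extract and compare:
host_gid=110                       # example value from the host (assumption)
group_entry="render:x:110:"        # example output of `getent group render`
container_gid=$(printf '%s' "$group_entry" | awk -F: '{print $3}')
if [ "$container_gid" = "$host_gid" ]; then
  echo "GIDs match"
else
  echo "GID mismatch: host=$host_gid container=$container_gid"
fi
```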
Thanks. There's nothing wrong in that trace. You mentioned in the original post that you can access and use the GPU in docker. What container is used in that case? What you could also do is to run the
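The exact command suggested above was cut off in the thread; a typical way to trace clinfo inside the pod (an assumption, not necessarily the original suggestion) and filter for failures:

```shell
# Inside the pod, trace the file-open and ioctl syscalls clinfo makes:
#   strace -f -e trace=openat,ioctl -o /tmp/clinfo.strace clinfo
# A failed device open in the resulting log looks like the first sample
# line below; filtering the trace for error codes surfaces such lines:
printf '%s\n' \
  'openat(AT_FDCWD, "/dev/dri/renderD128", O_RDWR|O_CLOEXEC) = -1 EACCES (Permission denied)' \
  'openat(AT_FDCWD, "/dev/dri/renderD128", O_RDWR|O_CLOEXEC) = 4' \
  | grep -E 'ENOENT|EACCES|EPERM'
```

If nothing matches, the failure is likely past the device-open stage, in the user-space driver.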
I tested a similar scenario but with a different GPU (integrated Tiger Lake): a VM host with 24.04 + 6.8.0-39-generic and k3s. With that, the opencl demo Pod works ok: clinfo provides device details.
So I actually noticed that
However, even after a system reboot, the pod does not see the card:
It was the official plex container:
I compared the strace output from the pod and the host, and I don't see any errors in the pod's trace, but I also don't really know what I'm looking for. The host trace is a lot longer and has some errors about missing
Hmm... I've talked with some people on Discord who have gotten this working with integrated GPUs as well, but my setup differs slightly in that I have a dedicated GPU, an Arc A380.
Thanks. Sadly the container is built from a prebuilt binary, so it's hard to see what ingredients it includes.
The communication with the GPU stops after some ioctls, whereas on the host the communication continues. My hunch is that something in the user-space libraries doesn't like the A380 hardware and won't use it. I don't have access to an A380 at the moment.
Hm, okay... Well, I'll just go back to using the docker container for plex. I don't know enough about these GPUs and drivers to help much, sorry about that. Thanks for your time though!
I tried running
Not sure if that gives any more information. This is when I run the pod as the
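For reference, running a pod with elevated privileges (a sketch of what "run the pod as root" typically looks like; as a later comment notes, this shouldn't actually be required for plain GPU use):

```yaml
securityContext:
  runAsUser: 0        # run as root (assumption: what was being tried here)
  privileged: true    # grants all capabilities, incl. PERFMON; overkill for normal GPU access
```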
I'll try to reproduce the scenario with a 750 card I have access to. But it might take some time, so don't hold your breath.
I think the kernel requires the root user and the PERFMON capability from processes accessing the PMU. Those are not required for normal GPU (write) access though, just a user or group matching the GPU device file.
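A container securityContext granting just that capability (a sketch; CAP_PERFMON assumes kernel 5.8 or newer) might look like:

```yaml
securityContext:
  capabilities:
    add: ["PERFMON"]   # needed for PMU access (e.g. intel_gpu_top), not for normal GPU use
```

This is much narrower than `privileged: true` and is only needed by metrics tooling.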
You're right about that. I added it. Here's a strace of
You don't need that. (If you're using an ancient Docker version, you may need to use
You do not need privileged mode, or any capabilities for normal GPU usage. Elevated privileges are needed only for some of the metrics (power & perf) used by i-g-t.
Try the same driver version in your pod as you have on the host. I think the driver version in the pod is not compatible with your kernel version, possibly due to: intel/compute-runtime#710
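A quick way to compare versions (a sketch: run the same commands on the host node and inside the pod, then diff the output; the package-name patterns are assumptions for Ubuntu/Debian-based images):

```shell
# Kernel version seen by both host and pod (pods share the host kernel):
echo "kernel: $(uname -r)"
# User-space Intel compute packages installed in this environment:
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | grep -Ei 'intel-opencl|level-zero' || echo "no Intel compute packages found"
fi
```

The kernel line should be identical in both places; the package versions are what need to be compatible with it.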
It seems it was the kernel version I was on! After reading that issue you sent, I noticed I was on the same kernel version, 6.8.0. I updated my system. I also upgraded the demo pod to match the same Ubuntu version I'm on, 24.04. At first the demo pod was still not recognizing any devices, but the output was a tiny bit longer, so I tried going through the steps listed in the docs for installing the drivers in the pod. After I followed those steps, I attached the GPU to the jellyfin pod, enabled hardware transcoding, and it worked! Thanks for the help!!
Describe the support request
I'm using k3s on an Ubuntu host. I installed the Intel device plugins through helm on the cluster. The install looks good; all pods are up:
The issue I'm running into is that when I try to give a GPU to a pod and use it, the pod can't detect it. I tried following the verification steps on the docs, which didn't work. This is the output of the intelgpu-demo pod (the one that runs clinfo):
I also tried to give the GPU to jellyfin, but jellyfin fails to start ffmpeg for encoding. I exec'd into the jellyfin pod and was able to see the card:

I was also able to see that same result in the demo pod by changing the command to infinitely sleep and executing into it.
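One more sanity check worth doing (a sketch; the node output below is an invented example, and the kubectl command is shown for reference rather than run here): confirm the device plugin is actually advertising the GPU resource on the node.

```shell
# On a working cluster you would run:
#   kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
# and expect the i915 resource in the output, e.g. the sample below:
allocatable='{"cpu":"8","gpu.intel.com/i915":"1","memory":"16Gi"}'  # example output
case "$allocatable" in
  *'"gpu.intel.com/i915"'*) echo "GPU resource advertised" ;;
  *) echo "GPU resource missing: check the plugin pods and node labels" ;;
esac
```

If the resource is missing, the pod will never be granted the device regardless of its own configuration.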
On the node I'm able to run `clinfo` and it outputs a bunch of stuff. I can use `intel_gpu_top` to see the card's usage without any issue. I was able to give the card to a docker container on this same node in the past and it worked great; not sure why it's not working in kubernetes. Any help appreciated!

System (please complete the following information if applicable):