In WSL2's docker container : GPU access blocked by the operating system #9962
Comments
same error |
This issue occurs after updating wsl from 1.1.6 to 1.2.0. |
It is not resolved with 1.2.1 pre released version. |
wow, Thanks, this helps a lot. |
Yeah, very cool! Do you mind telling me how to downgrade to a previous Version of WSL (1.1.6) please? I have not been able to find that out yet. |
Same error here. Did you fix your problem after downgrading? |
Yep downgrading fixed it. Thank you very much @jucysoft <3 <3 <3 Detectron2 finally is working again. |
Thanks for reporting. Sounds like something regressed between 1.1.6 and 1.2? I'm looking through the change history and nothing is really jumping out to me... Does it work with 1.1.7? |
I just tried it, but it doesn't work. |
@nakashimn - thanks. How about 1.2? I'm a bit confused why things would be working inside the WSL environment but not with Docker... |
Maybe an strace of nvidia-smi would indicate why things are failing? |
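A minimal sketch of what such a trace could look like, assuming `strace` is installed where `nvidia-smi` fails. The helper names `trace_nvidia_smi` and `failed_syscalls` and the default output file name are hypothetical and only serve the illustration; `-f` (follow child processes) and `-o` (write the trace to a file) are standard `strace` options.

```shell
# trace_nvidia_smi: hypothetical helper. Records the syscalls made by
# nvidia-smi and any child processes into a file (default name is an assumption).
trace_nvidia_smi() {
    out="${1:-nvidia-smi.strace}"
    strace -f -o "$out" nvidia-smi
}

# failed_syscalls: hypothetical helper. Prints only the trace lines where a
# syscall returned -1 together with an errno name such as ENOENT or EPERM.
failed_syscalls() {
    grep -E '= -1 E[A-Z]+' "$1"
}
```

Running `trace_nvidia_smi` once in the WSL shell and once inside the container, then comparing the `failed_syscalls` output of the two traces, might narrow down which call is being refused.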
I checked 1.2.0 through 1.2.2. None of them work. |
Ok thanks, I'm setting up an environment to see what might have happened. Might have been the change to cgroups. |
Unfortunately I am not able to repro on 1.2.1, what could I be missing? I followed the repro steps exactly.
|
On version 1.1.6 (where the GPU works in the container), the terminal logs some error messages when the docker service starts, but the service starts OK and the GPU works well.
By the way, at this version, GPU works well:
error version is:
WSLg version also changed. |
@lpdink - interesting. What version of docker are you running? That might explain a difference. |
@benhillis Docker version 23.0.3, build 3e7cbfd |
@lpdink - sorry missed that in the issue. Sidenote - Thanks for filing such a complete issue! I'm thinking the best thing to do is undo our change around cgroup while I sort out what's going on here. |
whew, what a bug to run into on my first install of WSL. Running a wsl 2 service.
When attempting to start docker, the cgroup outputs aren't present on the newer versions I tried, but they are on 1.1.6.
On versions newer than 1.1.6 I got:
Using a ryzen 3900x w/ ecc ram on a pcie4x16 link for main hardware points in case this ends up being hardware related. |
I have the same problem you had. Do I need to downgrade to WSL 1? @lpdink |
@l1377687647 I noticed that the latest version, 1.2.3, fixes this issue, but it has not been pushed to the Microsoft Store yet. You can download it from the releases page and install it from PowerShell (admin mode) with the Add-AppxPackage command. You may need to run wsl --shutdown first and terminate the WSL process via Task Manager. |
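I deploy deep learning services with nvidia-docker. Do I need to install the driver and configure CUDA, cuDNN, and TensorRT inside the Linux subsystem? |
No, just make sure your image contains the environment you need. This problem is caused only by WSL; upgrade or downgrade to a suitable version, then start your service the way you did before. @l1377687647 |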
@l1377687647 |
Thank you very much. I successfully resolved the problem using WSL 1.2.4. I can run deep learning services now. @lpdink |
Oh, I have been obsessed with this very bug for a couple of days! Thank you all for letting me know what the problem is! I tried WSL 1.2.3 just now and it works very well! :) |
I am using WSL 1.2.5 and facing this issue. Did WSL 1.2.5 break the GPU driver again? |
Hello same problem with 1.2.5.0 |
@maurange Well, just download the 1.2.3 or 1.2.4 .msixbundle from the releases page. Open PowerShell in admin mode:
If nothing goes wrong, this will work :) |
thank you @lpdink seems working |
After running add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle, I get: add-appxpackage : Cannot find path "C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle" because it does not exist.
What's wrong? Thanks. |
Looks like just a file location error. Make sure you have downloaded the file Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle and placed it at the path C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle. This file is not there by default. @2019211753 |
thank you for your prompt reply! |
Same error on 1.2.5.0 |
Also the same on 1.2.5.0 |
Just tested @lpdink steps on v1.3.10 and it worked: Setup:
I also tested Docker Desktop based on their doc: https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/ Hope this helps and hopefully more people will be able to confirm it works now too (so we can get back to cgroup2 😇) |
@nunix Hey, I tested version 1.3.11 and everything works well. Thank you, team, for your nice work. Maybe I can close this issue soon? |
@lpdink Hi, I am using 2.0.9 and I have the same issue. Do you have any idea? Do I need to downgrade to 1.2.23? |
I had the same issue. I simply followed the WSL 2 kernel update steps, which are: "It has recently become simpler to update the WSL 2 kernel: turn on 'Receive updates for other Microsoft products when you update Windows' or 'Receive updates for other Microsoft products' (Windows 10/Windows 11) in the Advanced options for Windows Update. The WSL 2 kernel will get updated automatically." And I can see it now. |
@cccc11231 I have no idea about version 2.x.x. I'm still using version 1.3.11.0 because of this issue, and I have disabled auto-update for the same reason. Maybe you could downgrade to 1.3.11.0? |
I am having the same issues, anyone else got this working on WSL 2.x ?
Even though running the
Hope someone is able to help me out, thanks a lot! Edit: even after downgrading WSL to
|
Inside the file /etc/nvidia-container-runtime/config.toml change no-cgroups from true to false
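The edit above can be sketched as a small shell function. This is only an illustration: `enable_cgroups` is a hypothetical helper name, it assumes the setting appears on its own line as `no-cgroups = true`, and it keeps a `.bak` copy next to the file before rewriting it.

```shell
# enable_cgroups: hypothetical helper. Flips "no-cgroups = true" to
# "no-cgroups = false" in the given config file (default: the path quoted above).
enable_cgroups() {
    config="${1:-/etc/nvidia-container-runtime/config.toml}"
    # -i.bak edits in place and leaves a backup at "$config.bak".
    # Only a line that sets no-cgroups to true is touched; indentation is preserved.
    sed -i.bak -E 's/^([[:space:]]*no-cgroups[[:space:]]*=[[:space:]]*)true/\1false/' "$config"
}
```

On the real file this would need root (for example by running the `sed` line under `sudo`). Whether the Docker service has to be restarted afterwards is not stated in this thread.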
|
This helped me solve it on WSL 2 (version 2.1.5), thanks. |
thank you so much! This helped solve my issue. |
This worked for me, thanks! I actually was even able to remove
|
This just happened to me out of the blue (screen) after previously working without it. After a blue screen and reboot I started getting this error. I can only guess there was a software update or change in the background. But this solved it for reasons I still don't fully understand. Thanks. |
Windows Version
Microsoft Windows [Version 10.0.22621.1555]
WSL Version
1.2.0.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.90.1
Distro Version
Ubuntu-20.04
Other Software
Docker version 23.0.3, build 3e7cbfd
NVIDIA Container Runtime version 1.13.0
commit: b7079454b5b8fed1390ce78ca5a3343748f62657
spec: 1.0.2-dev
runc version 1.1.5
commit: v1.1.5-0-gf19387a
spec: 1.0.2-dev
go: go1.19.7
libseccomp: 2.5.1
NVIDIA GeForce Game Ready driver version : 531.41 (installed in windows11)
Repro Steps
nvidia-smi works well in wsl2, but it doesn't work properly in the docker container started in wsl2, with error "Failed to initialize NVML: GPU access blocked by the operating system".
I use the official image provided by PyTorch and am confident that docker-ce and nvidia-container-toolkit have been installed correctly. In fact, when I use the same installation script on a native Ubuntu system, the GPU in the container works well.
Here are the commands I used to install docker-ce and nvidia-container-toolkit.
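A rough sketch of how the symptom could be checked from the WSL 2 shell, not the reporter's own commands. The helper names `has_nvml_block` and `check_container_gpu` are hypothetical, and the default image name `pytorch/pytorch` is an assumption; substitute whichever image you actually run. `--gpus all` is the standard `docker run` flag for exposing GPUs to a container.

```shell
# has_nvml_block: hypothetical helper. Succeeds when the given text contains
# the error message reported in this issue.
has_nvml_block() {
    printf '%s\n' "$1" | grep -qF 'GPU access blocked by the operating system'
}

# check_container_gpu: hypothetical helper. Runs nvidia-smi inside a container
# and reports whether the error appeared. The default image is an assumption.
check_container_gpu() {
    image="${1:-pytorch/pytorch}"
    output=$(docker run --rm --gpus all "$image" nvidia-smi 2>&1)
    if has_nvml_block "$output"; then
        echo "blocked: container cannot reach the GPU"
    else
        # Print whatever nvidia-smi (or docker) returned for manual inspection.
        printf '%s\n' "$output"
    fi
}
```

Running plain `nvidia-smi` in the WSL 2 shell first and then `check_container_gpu` shows the contrast described above: the first succeeds while the second reports the block.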
Expected Behavior
Actual Behavior
Diagnostic Logs
No response