
In WSL2's docker container: GPU access blocked by the operating system #9962

Closed
lpdink opened this issue Apr 13, 2023 · 48 comments

@lpdink

lpdink commented Apr 13, 2023

Windows Version

Microsoft Windows [Version 10.0.22621.1555]

WSL Version

1.2.0.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.90.1

Distro Version

Ubuntu-20.04

Other Software

  • Docker version 23.0.3, build 3e7cbfd

  • NVIDIA Container Runtime version 1.13.0
    commit: b7079454b5b8fed1390ce78ca5a3343748f62657
    spec: 1.0.2-dev
    runc version 1.1.5
    commit: v1.1.5-0-gf19387a
    spec: 1.0.2-dev
    go: go1.19.7
    libseccomp: 2.5.1

  • NVIDIA GeForce Game Ready driver version: 531.41 (installed on Windows 11)

Repro Steps

nvidia-smi works well in WSL2, but it does not work properly in a Docker container started inside WSL2, failing with the error "Failed to initialize NVML: GPU access blocked by the operating system".
I use the official image provided by PyTorch and am confident that docker-ce and nvidia-container-toolkit have been installed correctly. In fact, when I use the same installation script on a native Ubuntu system, the GPU works well in the container.
Here are the commands I used to install docker-ce and nvidia-container-toolkit.

  • Install docker-ce in wsl2:
apt-get update

apt-get install -y apt-transport-https ca-certificates curl software-properties-common 

# using the Aliyun mirror here
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce

service docker start
  • install nvidia-container-toolkit in wsl2:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt-get update
apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
service docker restart
  • pull an NVIDIA-GPU-enabled image and try to use the GPU in a container
docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
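
For anyone reproducing this: a quick way to double-check that nvidia-ctk actually registered the runtime (assuming the default /etc/docker/daemon.json location) is:

# confirm the nvidia runtime was written to Docker's daemon config
cat /etc/docker/daemon.json
# confirm Docker picked it up after the restart
docker info | grep -i runtime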

Expected Behavior

[screenshot: nvidia-smi output inside the container]

Actual Behavior

[screenshot: "Failed to initialize NVML: GPU access blocked by the operating system"]

Diagnostic Logs

No response

@jucysoft

same error

@Vkhark

Vkhark commented Apr 13, 2023

Same error. Containers that previously worked (last week) are not working right now. Might it have been caused by some update?

I'm quite new to Docker, so I wouldn't even really know where to look.

[screenshot]

@jucysoft

This issue occurs after updating wsl from 1.1.6 to 1.2.0.

@jucysoft

It is not resolved with the 1.2.1 pre-release version.

@lpdink
Author

lpdink commented Apr 13, 2023

This issue occurs after updating wsl from 1.1.6 to 1.2.0.

Wow, thanks, this helps a lot.

@Vkhark

Vkhark commented Apr 13, 2023

Yeah, very cool! Do you mind telling me how to downgrade to a previous version of WSL (1.1.6), please? I have not been able to find that out yet.

@Vkhark

Vkhark commented Apr 13, 2023

It's fine, I just figured it out. Go to the GitHub repo, download the msixbundle for the previous version, and double-click it.
[screenshot]

@amarese

amarese commented Apr 13, 2023

Same error here. Did downgrading fix your problem?

@Vkhark

Vkhark commented Apr 13, 2023

Yep, downgrading fixed it. Thank you very much @jucysoft <3 <3 <3

Detectron2 is finally working again.

I can also run nvidia-smi inside the docker container:
[screenshot]

@benhillis benhillis added the GPU label Apr 13, 2023
@benhillis
Member

Thanks for reporting. Sounds like something regressed between 1.1.6 and 1.2? I'm looking through the change history and nothing is really jumping out at me...

Does it work with 1.1.7?

@nakashimn

Does it work with 1.1.7?

I just tried it, but it doesn't work.
It works with 1.1.6.

@benhillis
Member

@nakashimn - thanks. How about 1.2? I'm a bit confused why things would be working inside the WSL environment but not with Docker...

@benhillis
Member

Maybe an strace of nvidia-smi would indicate why things are failing?
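
For example, something along these lines might show which device node or ioctl is being rejected (illustrative only; it assumes strace can be installed inside the PyTorch image):

docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime \
    bash -c "apt-get update && apt-get install -y strace && strace -f -e trace=openat,ioctl,access nvidia-smi"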

@nakashimn

@benhillis

How about 1.2?

I checked 1.2.0 through 1.2.2. None of them work.

@benhillis
Member

OK thanks, I'm setting up an environment to see what might have happened. It might have been the change to cgroups.

@benhillis
Member

benhillis commented Apr 13, 2023

Unfortunately I am not able to repro on 1.2.1; what could I be missing? I followed the repro steps exactly.

ben@BENHILL-DEV:~$ wsl.exe --version
WSL version: 1.2.1.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.25342.1001
ben@BENHILL-DEV:~$ docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Thu Apr 13 16:14:02 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.07    Driver Version: 527.27       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:73:00.0 Off |                  N/A |
| 30%   35C    P8     1W / 125W |   2650MiB /  8192MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        33      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

@lpdink
Author

lpdink commented Apr 13, 2023

On version 1.1.6 (where the GPU works well in the container), starting the docker service logs some error messages, but the service starts OK and the GPU works.
On version 1.2.0 or 1.2.1 there is no error log, but the GPU can't be used. The error log looks related to cgroups, as you mentioned; maybe this can help? @benhillis

root@darkMaster:/mnt/c/Users/32649# service docker start
mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
 * Starting Docker: docker                                                                                       [ OK ]
 root@darkMaster:/mnt/c/Users/32649#  docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Thu Apr 13 17:17:10 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 531.41       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:01:00.0  On |                  N/A |
| 32%   30C    P8                8W / 170W|   1829MiB / 12288MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

By the way, the GPU works well on this version:

wsl --version
WSL version: 1.1.6.0
Kernel version: 5.15.90.1
WSLg version: 1.0.50
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.1555

The broken version is:

WSL version: 1.2.0.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.1555

WSLg version also changed.

@benhillis
Member

@lpdink - interesting. What version of docker are you running? That might explain a difference.

@lpdink
Author

lpdink commented Apr 13, 2023

@lpdink - interesting. What version of docker are you running? That might explain a difference.

@benhillis Docker version 23.0.3, build 3e7cbfd
You can see the versions of the other software (e.g. nvidia-container-toolkit) in the issue description. :)

@benhillis
Member

@lpdink - sorry, I missed that in the issue. Side note - thanks for filing such a complete issue!

I'm thinking the best thing to do is undo our change around cgroups while I sort out what's going on here.

@gosselind1

gosselind1 commented Apr 14, 2023

Whew, what a bug to run into on my first install of WSL.
I reinstalled my WSL guest and sanity-checked my package installations pretty hard, but only downgrading to 1.1.6 fixed this issue for me as well.

Running a WSL 2 service.
Guest is a Debian system with kernel: 5.15.90.1-microsoft-standard-WSL2
NVIDIA driver was freshly updated to: 531.61 (the previous driver was dated February, though I don't remember the version off the top of my head)
Docker version is: Docker version 23.0.3, build 3e7cbfd
nvidia container package: nvidia-container-toolkit/buster,now 1.13.0-1 amd64 [installed]

WSL version: 1.1.6.0
Kernel version: 5.15.90.1
WSLg version: 1.0.50
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2846

The cgroup mount messages aren't present when starting docker on the newer versions I tried, but they are on 1.1.6:

$ sudo service docker start
mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
Starting Docker: docker.

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Fri Apr 14 07:14:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.46                 Driver Version: 531.61       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti      On | 00000000:2A:00.0  On |                  N/A |
| 49%   49C    P0              119W / 350W|    880MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

On versions newer than 1.1.6 I got:

$ sudo service docker start
Starting Docker: docker.

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

Main hardware: a Ryzen 3900X with ECC RAM and the GPU on a PCIe 4.0 x16 link, in case this ends up being hardware-related.

@teymur-git

I have the same problem you had. Do I need to downgrade to WSL 1? @lpdink

@lpdink
Author

lpdink commented Apr 18, 2023

@l1377687647 I noticed that the latest version, 1.2.3, has fixed this issue, but it has not been pushed to the Microsoft Store yet. You can download it from the releases page and install it in PowerShell (admin mode) with the Add-AppxPackage command. You may need to run wsl --shutdown first and terminate the WSL process via Task Manager.
Or downgrade to 1.1.6 as above; please note that 1.1.6 is still a WSL 2 release, not WSL 1.
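
Roughly, in an elevated PowerShell window (the bundle file name below is illustrative; use whichever release you downloaded):

wsl --shutdown
Add-AppxPackage .\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
wsl --version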

@teymur-git

I deploy my deep learning service with nvidia-docker. Do I need to install the driver and configure CUDA, cuDNN and TensorRT inside the Linux subsystem?
I had started the server successfully and it ran for a while, but recently starting the service suddenly fails with: GPU access blocked by the operating system @lpdink

@lpdink
Author

lpdink commented Apr 18, 2023

No need; just make sure your image contains the required environment. This problem is only caused by WSL, so upgrade or downgrade to a suitable version and then start your service as you did before. @l1377687647

@lpdink
Author

lpdink commented Apr 18, 2023

I noticed that the latest version, 1.2.3, has fixed this issue, but it has not been pushed to the Microsoft Store yet […]

@l1377687647

@teymur-git

Thank you very much. I successfully resolved the problem with WSL 1.2.4 and can run my deep learning services again. @lpdink

@Zoe-Wan

Zoe-Wan commented Apr 19, 2023

Oh, this very bug has been bothering me for a couple of days! Thank you guys for letting me know what the problem is! I just tried WSL 1.2.3 and it works very well! :)

@ArchiMickey

I am using WSL 1.2.5 and facing this issue. Did WSL 1.2.5 break GPU support again?

@maurange

maurange commented May 5, 2023

Hello, same problem with 1.2.5.0. Can you give some tips on how to downgrade to 1.2.3 or 1.2.4?

@lpdink
Author

lpdink commented May 5, 2023

Hello, same problem with 1.2.5.0. Can you give some tips on how to downgrade to 1.2.3 or 1.2.4?

@maurange Well, just download the 1.2.3 or 1.2.4 .msixbundle from the releases page. Open PowerShell in admin mode:

# find the WSL package full name
get-appxpackage *linux*
# uninstall WSL; you may need to stop the WSL service in Task Manager first
remove-appxpackage your_full_name
# install the 1.2.3/1.2.4 package
add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
# run wsl again
wsl

If nothing goes wrong, this will work :)

@maurange

maurange commented May 9, 2023

Thank you @lpdink, it seems to be working.

@2019211753

After running add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle,
it reports:

add-appxpackage : Cannot find path 'C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle' because it does not exist.
At line:1 char:1
+ add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
    + CategoryInfo          : ObjectNotFound: (C:\Users\22705\...RM64.msixbundle:String) [Add-AppxPackage], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.Windows.Appx.PackageManager.Commands.AddAppxPackageCommand

What's wrong? Thanks

@lpdink
Author

lpdink commented May 18, 2023

Hmm... looks like just a file location error. Make sure you have downloaded the file Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle and placed it at the path C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle; this file is not there by default. @2019211753
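
For example (the download folder below is illustrative):

# run Add-AppxPackage from the folder the bundle was actually saved to
cd $env:USERPROFILE\Downloads
Add-AppxPackage .\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle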

@2019211753

Hmm... looks like just a file location error. […]

thank you for your prompt reply!

@Dounx

Dounx commented May 30, 2023

Same error on 1.2.5.0.

@githubbabrova

Also the same on 1.2.5.0.

@nunix

nunix commented Jun 14, 2023

Just tested @lpdink's steps on v1.3.10 and it worked:
[screenshot]

Setup:

  • enable cgroupv2 in .wslconfig with kernelCommandLine
  • on Ubuntu: add a [boot] command to umount the remaining cgroup mounts and mount /sys/fs/cgroup as cgroup2 (see the sketch after this list)
  • install Docker and Nvidia runtime as described by @lpdink
  • run a container with nvidia gpu
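
A rough sketch of that setup (the kernelCommandLine value is the one others in this thread used; the [boot] command shown is illustrative and may need adjusting for your distro):

# %UserProfile%\.wslconfig (Windows side); run wsl --shutdown after editing
[wsl2]
kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1

# /etc/wsl.conf (inside the Ubuntu distro)
[boot]
command = umount -R /sys/fs/cgroup 2>/dev/null; mount -t cgroup2 cgroup2 /sys/fs/cgroup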

I also tested Docker Desktop based on their doc: https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/
Same results, all good too
[screenshot]

Hope this helps and hopefully more people will be able to confirm it works now too (so we can get back to cgroup2 😇)

@lpdink
Author

lpdink commented Jul 2, 2023

Just tested @lpdink's steps on v1.3.10 and it worked […]

@nunix Hey, I tested version 1.3.11 and everything works well. Thank you, team, for your nice work. Maybe I should close this issue soon?

@lpdink lpdink closed this as completed Aug 4, 2023
@cccc11231

@lpdink Hi, I am using 2.0.9, and I have the same issue.

[screenshot]

[screenshot]

Do you have any idea? Do I need to downgrade to 1.2.3?

@abulus

abulus commented Nov 22, 2023

@lpdink Hi, I am using 2.0.9, and I have the same issue. […]

I had the same issue. I simply followed the WSL 2 kernel update steps, which are: "It has recently become simpler to update the WSL 2 kernel: turn on 'Receive updates for other Microsoft products when you update Windows' or 'Receive updates for other Microsoft products' (Windows 10/Windows 11) in the Advanced options for Windows Update. The WSL 2 kernel will then get updated automatically."

And I can see it now.
docker run --rm --gpus all ubuntu nvidia-smi
[screenshot: nvidia-smi output]
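
Alternatively, WSL and its kernel can usually also be updated from an elevated PowerShell prompt:

wsl --update
wsl --shutdown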

@lpdink
Author

lpdink commented Nov 22, 2023

@lpdink Hi, I am using 2.0.9, and I have the same issue. […]

@cccc11231 I have no idea about version 2.x.x; I'm still using version 1.3.11.0 because of this issue, and I have disabled auto-update for the same reason. Maybe you could downgrade to 1.3.11.0?

@flamed0g

flamed0g commented Mar 5, 2024

I am having the same issue; has anyone else got this working on WSL 2.x?

> wsl --version
WSL version: 2.0.14.0
Kernel version: 5.15.133.1-1
> docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

Even though running nvidia-smi directly in WSL works fine and outputs all the correct GPU information:

 nvidia-smi
Tue Mar  5 13:33:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
...

Hope someone is able to help me out, thanks a lot!

Edit: even after downgrading WSL to 1.2.3.0 (even tried to downgrade to 1.1.6.0), the issue persists for me:

> wsl --version
WSL version: 1.2.3.0
Kernel version: 5.15.90.1
> docker run --rm --gpus all ubuntu nvidia-smi

Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

@ivtavares

I am having the same issue; has anyone else got this working on WSL 2.x? […]

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false.

wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
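
For reference, the relevant excerpt typically looks like this after the change, followed by a Docker restart inside WSL:

# /etc/nvidia-container-runtime/config.toml
[nvidia-container-cli]
no-cgroups = false

# then restart the daemon
sudo service docker restart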

@Mougrouff

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This helped me solve it for WSL2 2.1.5, thanks!

@alifim

alifim commented Jun 7, 2024

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

Thank you so much! This helped solve my issue.

@ben-cha

ben-cha commented Aug 8, 2024

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This worked for me, thanks! I was actually even able to remove kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1 from .wslconfig.

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2


Thu Aug  8 01:16:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+

@keyboarderror

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This just happened to me out of the blue (screen) after it previously worked without this change. After a blue screen and reboot I started getting this error; I can only guess there was a software update or change in the background. But this solved it, for reasons I still don't fully understand. Thanks.
