
In WSL2's docker container: GPU access blocked by the operating system #9962

Closed
lpdink opened this issue Apr 13, 2023 · 48 comments

@lpdink

lpdink commented Apr 13, 2023

Windows Version

Microsoft Windows [Version 10.0.22621.1555]

WSL Version

1.2.0.0

Are you using WSL 1 or WSL 2?

  • [x] WSL 2
  • [ ] WSL 1

Kernel Version

5.15.90.1

Distro Version

Ubuntu-20.04

Other Software

  • Docker version 23.0.3, build 3e7cbfd

  • NVIDIA Container Runtime version 1.13.0
    commit: b7079454b5b8fed1390ce78ca5a3343748f62657
    spec: 1.0.2-dev
    runc version 1.1.5
    commit: v1.1.5-0-gf19387a
    spec: 1.0.2-dev
    go: go1.19.7
    libseccomp: 2.5.1

  • NVIDIA GeForce Game Ready driver version: 531.41 (installed on Windows 11)

Repro Steps

nvidia-smi works well in WSL2, but it does not work properly in a Docker container started inside WSL2, failing with the error "Failed to initialize NVML: GPU access blocked by the operating system".
I use the official image provided by PyTorch and am confident that docker-ce and nvidia-container-toolkit have been installed correctly. In fact, when I use the same installation script on a native Ubuntu system, the GPU works well in the container.
Here are the commands I used to install docker-ce and nvidia-container-toolkit.

  • Install docker-ce in wsl2:
apt-get update

apt-get install -y apt-transport-https ca-certificates curl software-properties-common 

# using the Aliyun mirror here
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce

service docker start
  • install nvidia-container-toolkit in wsl2:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt-get update
apt-get install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
service docker restart
  • pull an NVIDIA-GPU-enabled image and try to use the GPU in a container
docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
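
For anyone reproducing this: a quick way to double-check that nvidia-ctk actually registered the runtime (assuming the default /etc/docker/daemon.json location) is:

# confirm the nvidia runtime was written to Docker's daemon config
cat /etc/docker/daemon.json
# confirm Docker picked it up after the restart
docker info | grep -i runtime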

Expected Behavior

[screenshot: nvidia-smi output inside the container]

Actual Behavior

[screenshot: "Failed to initialize NVML: GPU access blocked by the operating system"]

Diagnostic Logs

No response

@jucysoft

same error

@Vkhark

Vkhark commented Apr 13, 2023

Same error. Containers that previously worked (last week) are not working right now. Might it have been caused by some update?

I'm quite new to Docker, so I wouldn't even really know where to look.

[screenshot]

@jucysoft

This issue occurs after updating wsl from 1.1.6 to 1.2.0.

@jucysoft

It is not resolved with the 1.2.1 pre-release version.

@lpdink
Author

lpdink commented Apr 13, 2023

This issue occurs after updating wsl from 1.1.6 to 1.2.0.

Wow, thanks, this helps a lot.

@Vkhark

Vkhark commented Apr 13, 2023

Yeah, very cool! Do you mind telling me how to downgrade to a previous version of WSL (1.1.6), please? I have not been able to find that out yet.

@Vkhark

Vkhark commented Apr 13, 2023

It's fine, I just figured it out. Go to the GitHub repo, download the msixbundle for the previous version, and double-click it.
[screenshot]

@amarese

amarese commented Apr 13, 2023

Same error here. Did downgrading fix your problem?

@Vkhark

Vkhark commented Apr 13, 2023

Yep, downgrading fixed it. Thank you very much @jucysoft <3 <3 <3

Detectron2 is finally working again.

I can also run nvidia-smi inside the docker container:
[screenshot]

@benhillis benhillis added the GPU label Apr 13, 2023
@benhillis
Member

Thanks for reporting. Sounds like something regressed between 1.1.6 and 1.2? I'm looking through the change history and nothing is really jumping out at me...

Does it work with 1.1.7?

@nakashimn

Does it work with 1.1.7?

I just tried it, but it doesn't work.
It works with 1.1.6.

@benhillis
Member

@nakashimn - thanks. How about 1.2? I'm a bit confused why things would be working inside the WSL environment but not with Docker...

@benhillis
Member

Maybe an strace of nvidia-smi would indicate why things are failing?
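
For example, something along these lines might show which device node or ioctl is being rejected (illustrative only; it assumes strace can be installed inside the PyTorch image):

docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime \
    bash -c "apt-get update && apt-get install -y strace && strace -f -e trace=openat,ioctl,access nvidia-smi"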

@nakashimn

@benhillis

How about 1.2?

I checked 1.2.0 through 1.2.2. None of them work.

@benhillis
Member

OK thanks, I'm setting up an environment to see what might have happened. It might have been the change to cgroups.

@benhillis
Member

benhillis commented Apr 13, 2023

Unfortunately I am not able to repro on 1.2.1; what could I be missing? I followed the repro steps exactly.

ben@BENHILL-DEV:~$ wsl.exe --version
WSL version: 1.2.1.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.25342.1001
ben@BENHILL-DEV:~$ docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Thu Apr 13 16:14:02 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.07    Driver Version: 527.27       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:73:00.0 Off |                  N/A |
| 30%   35C    P8     1W / 125W |   2650MiB /  8192MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        33      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

@lpdink
Author

lpdink commented Apr 13, 2023

On version 1.1.6 (where the GPU works well in the container), starting the docker service logs some error messages, but the service starts OK and the GPU works.
On version 1.2.0 or 1.2.1 there is no error log, but the GPU can't be used. The error log looks related to cgroups, as you mentioned; maybe this can help? @benhillis

root@darkMaster:/mnt/c/Users/32649# service docker start
mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
 * Starting Docker: docker                                                                                       [ OK ]
 root@darkMaster:/mnt/c/Users/32649#  docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Thu Apr 13 17:17:10 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 531.41       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:01:00.0  On |                  N/A |
| 32%   30C    P8                8W / 170W|   1829MiB / 12288MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

By the way, the GPU works well on this version:

wsl --version
WSL version: 1.1.6.0
Kernel version: 5.15.90.1
WSLg version: 1.0.50
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.1555

The broken version is:

WSL version: 1.2.0.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.1555

WSLg version also changed.

@benhillis
Member

@lpdink - interesting. What version of docker are you running? That might explain a difference.

@lpdink
Author

lpdink commented Apr 13, 2023

@lpdink - interesting. What version of docker are you running? That might explain a difference.

@benhillis Docker version 23.0.3, build 3e7cbfd
You can see the versions of the other software (e.g. nvidia-container-toolkit) in the issue description. :)

@benhillis
Member

@lpdink - sorry, I missed that in the issue. Side note - thanks for filing such a complete issue!

I'm thinking the best thing to do is undo our change around cgroups while I sort out what's going on here.

@gosselind1

gosselind1 commented Apr 14, 2023

Whew, what a bug to run into on my first install of WSL.
I reinstalled my WSL guest and sanity-checked my package installations pretty hard, but only downgrading to 1.1.6 fixed this issue for me as well.

Running a WSL 2 service.
Guest is a Debian system with kernel: 5.15.90.1-microsoft-standard-WSL2
NVIDIA driver was freshly updated to: 531.61 (the previous driver was dated February, though I don't remember the version off the top of my head)
Docker version is: Docker version 23.0.3, build 3e7cbfd
nvidia container package: nvidia-container-toolkit/buster,now 1.13.0-1 amd64 [installed]

WSL version: 1.1.6.0
Kernel version: 5.15.90.1
WSLg version: 1.0.50
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.2846

The cgroup mount messages aren't present when starting docker on the newer versions I tried, but they are on 1.1.6:

$ sudo service docker start
mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
Starting Docker: docker.

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Fri Apr 14 07:14:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.46                 Driver Version: 531.61       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti      On | 00000000:2A:00.0  On |                  N/A |
| 49%   49C    P0              119W / 350W|    880MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

On versions newer than 1.1.6 I got:

$ sudo service docker start
Starting Docker: docker.

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

Main hardware: a Ryzen 3900X with ECC RAM and the GPU on a PCIe 4.0 x16 link, in case this ends up being hardware-related.

@teymur-git

I have the same problem you had. Do I need to downgrade to WSL 1? @lpdink

@lpdink
Author

lpdink commented Apr 18, 2023

@l1377687647 I noticed that the latest version, 1.2.3, has fixed this issue, but it has not been pushed to the Microsoft Store yet. You can download it from the releases page and install it in PowerShell (admin mode) with the Add-AppxPackage command. You may need to run wsl --shutdown first and terminate the WSL process via Task Manager.
Or downgrade to 1.1.6 as above; please note that 1.1.6 is still a WSL 2 release, not WSL 1.
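
Roughly, in an elevated PowerShell window (the bundle file name below is illustrative; use whichever release you downloaded):

wsl --shutdown
Add-AppxPackage .\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
wsl --version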

@teymur-git

I deploy my deep learning service with nvidia-docker. Do I need to install the driver and configure CUDA, cuDNN and TensorRT inside the Linux subsystem?
I had started the server successfully and it ran for a while, but recently starting the service suddenly fails with: GPU access blocked by the operating system @lpdink

@lpdink
Author

lpdink commented Apr 18, 2023

No need; just make sure your image contains the required environment. This problem is only caused by WSL, so upgrade or downgrade to a suitable version and then start your service as you did before. @l1377687647

@lpdink
Author

lpdink commented Apr 18, 2023

I noticed that the latest version, 1.2.3, has fixed this issue, but it has not been pushed to the Microsoft Store yet […]

@l1377687647

@teymur-git

Thank you very much. I successfully resolved the problem with WSL 1.2.4 and can run my deep learning services again. @lpdink

@Zoe-Wan

Zoe-Wan commented Apr 19, 2023

Oh, this very bug has been bothering me for a couple of days! Thank you guys for letting me know what the problem is! I just tried WSL 1.2.3 and it works very well! :)

@ArchiMickey

I am using WSL 1.2.5 and facing this issue. Did WSL 1.2.5 break GPU support again?

@maurange

maurange commented May 5, 2023

Hello, same problem with 1.2.5.0. Can you give some tips on how to downgrade to 1.2.3 or 1.2.4?

@lpdink
Author

lpdink commented May 5, 2023

Hello, same problem with 1.2.5.0. Can you give some tips on how to downgrade to 1.2.3 or 1.2.4?

@maurange Well, just download the 1.2.3 or 1.2.4 .msixbundle from the releases page. Open PowerShell in admin mode:

# find the WSL package full name
get-appxpackage *linux*
# uninstall WSL; you may need to stop the WSL service in Task Manager first
remove-appxpackage your_full_name
# install the 1.2.3/1.2.4 package
add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
# run wsl again
wsl

If nothing goes wrong, this will work :)

@maurange

maurange commented May 9, 2023

Thank you @lpdink, it seems to be working.

@2019211753

After running add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle,
it reports:

add-appxpackage : Cannot find path 'C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle' because it does not exist.
At line:1 char:1
+ add-appxpackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle
    + CategoryInfo          : ObjectNotFound: (C:\Users\22705\...RM64.msixbundle:String) [Add-AppxPackage], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.Windows.Appx.PackageManager.Commands.AddAppxPackageCommand

What's wrong? Thanks

@lpdink
Author

lpdink commented May 18, 2023

Hmm... looks like just a file location error. Make sure you have downloaded the file Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle and placed it at the path C:\Users\22705\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle; this file is not there by default. @2019211753
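
For example (the download folder below is illustrative):

# run Add-AppxPackage from the folder the bundle was actually saved to
cd $env:USERPROFILE\Downloads
Add-AppxPackage .\Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle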

@2019211753

Hmm... looks like just a file location error. […]

thank you for your prompt reply!

@Dounx

Dounx commented May 30, 2023

Same error on 1.2.5.0.

@githubbabrova

Also the same on 1.2.5.0.

@nunix

nunix commented Jun 14, 2023

Just tested @lpdink's steps on v1.3.10 and it worked:
[screenshot]

Setup:

  • enable cgroupv2 in .wslconfig with kernelCommandLine
  • on Ubuntu: add a [boot] command to umount the remaining cgroup mounts and mount /sys/fs/cgroup as cgroup2 (see the sketch after this list)
  • install Docker and Nvidia runtime as described by @lpdink
  • run a container with nvidia gpu
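
A rough sketch of that setup (the kernelCommandLine value is the one others in this thread used; the [boot] command shown is illustrative and may need adjusting for your distro):

# %UserProfile%\.wslconfig (Windows side); run wsl --shutdown after editing
[wsl2]
kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1

# /etc/wsl.conf (inside the Ubuntu distro)
[boot]
command = umount -R /sys/fs/cgroup 2>/dev/null; mount -t cgroup2 cgroup2 /sys/fs/cgroup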

I also tested Docker Desktop based on their doc: https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/
Same results, all good too
[screenshot]

Hope this helps and hopefully more people will be able to confirm it works now too (so we can get back to cgroup2 😇)

@lpdink
Author

lpdink commented Jul 2, 2023

Just tested @lpdink's steps on v1.3.10 and it worked […]

@nunix Hey, I tested version 1.3.11 and everything works well. Thank you, team, for your nice work. Maybe I should close this issue soon?

@lpdink lpdink closed this as completed Aug 4, 2023
@cccc11231

@lpdink Hi, I am using 2.0.9, and I have the same issue.

[screenshot]

[screenshot]

Do you have any idea? Do I need to downgrade to 1.2.3?

@abulus

abulus commented Nov 22, 2023

@lpdink Hi, I am using 2.0.9, and I have the same issue. […]

I had the same issue. I simply followed the WSL 2 kernel update steps, which are: "It has recently become simpler to update the WSL 2 kernel: turn on 'Receive updates for other Microsoft products when you update Windows' or 'Receive updates for other Microsoft products' (Windows 10/Windows 11) in the Advanced options for Windows Update. The WSL 2 kernel will then get updated automatically."

And I can see it now.
docker run --rm --gpus all ubuntu nvidia-smi
[screenshot: nvidia-smi output]
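
Alternatively, WSL and its kernel can usually also be updated from an elevated PowerShell prompt:

wsl --update
wsl --shutdown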

@lpdink
Author

lpdink commented Nov 22, 2023

@lpdink Hi, I am using 2.0.9, and I have the same issue. […]

@cccc11231 I have no idea about version 2.x.x; I'm still using version 1.3.11.0 because of this issue, and I have disabled auto-update for the same reason. Maybe you could downgrade to 1.3.11.0?

@flamed0g

flamed0g commented Mar 5, 2024

I am having the same issue; has anyone else got this working on WSL 2.x?

> wsl --version
WSL version: 2.0.14.0
Kernel version: 5.15.133.1-1
> docker run --rm --runtime nvidia --gpus all pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

Even though running nvidia-smi directly in WSL works fine and outputs all the correct GPU information:

 nvidia-smi
Tue Mar  5 13:33:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
...

Hope someone is able to help me out, thanks a lot!

Edit: even after downgrading WSL to 1.2.3.0 (even tried to downgrade to 1.1.6.0), the issue persists for me:

> wsl --version
WSL version: 1.2.3.0
Kernel version: 5.15.90.1
> docker run --rm --gpus all ubuntu nvidia-smi

Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

@ivtavares

I am having the same issue; has anyone else got this working on WSL 2.x? […]

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false.

wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
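
For reference, the relevant excerpt typically looks like this after the change, followed by a Docker restart inside WSL:

# /etc/nvidia-container-runtime/config.toml
[nvidia-container-cli]
no-cgroups = false

# then restart the daemon
sudo service docker restart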

@Mougrouff

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This helped me solve it for WSL2 2.1.5, thanks!

@alifim

alifim commented Jun 7, 2024

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

Thank you so much! This helped solve my issue.

@ben-cha

ben-cha commented Aug 8, 2024

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This worked for me, thanks! I was actually even able to remove kernelCommandLine = cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1 from .wslconfig.

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2


Thu Aug  8 01:16:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+

@keyboarderror

Inside the file /etc/nvidia-container-runtime/config.toml, change no-cgroups from true to false. […]

This just happened to me out of the blue (screen) after it previously worked without this change. After a blue screen and reboot I started getting this error; I can only guess there was a software update or change in the background. But this solved it, for reasons I still don't fully understand. Thanks.
