How does sysbox k8s in docker schedule tensorflow/tensorflow:2.9.1-gpu? #643

zhongcloudtian · 2023-02-23T09:10:04Z

NVIDIA driver version: NVIDIA-Linux-x86_64-525.85.12.run; OS: Ubuntu 20.04
docker run --detach --interactive --runtime=sysbox-runc --name k8s-worker01 --hostname=k8s-worker01 \
  --mount type=tmpfs,destination=/proc/driver/nvidia \
  --mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi \
  --mount type=bind,source=/usr/bin/nvidia-debugdump,target=/usr/bin/nvidia-debugdump \
  --mount type=bind,source=/usr/bin/nvidia-persistenced,target=/usr/bin/nvidia-persistenced \
  --mount type=bind,source=/usr/bin/nvidia-cuda-mps-control,target=/usr/bin/nvidia-cuda-mps-control \
  --mount type=bind,source=/usr/bin/nvidia-cuda-mps-server,target=/usr/bin/nvidia-cuda-mps-server \
  -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
  --mount type=bind,source=/run/nvidia-persistenced/socket,target=/run/nvidia-persistenced/socket \
  --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  --device /dev/nvidia0:/dev/nvidia0 \
  nestybox/k8s-node:v1.20.2
With Sysbox, I then run tensorflow/tensorflow:2.9.1-gpu as follows:
docker run --gpus all --mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --name test10 tensorflow/tensorflow:2.9.1-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Error message:
E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: aaf4ecde1157
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: aaf4ecde1157
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 525.85.12
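
The log above reports two separate symptoms: `cuInit` fails, and `libcuda.so` is not found as a loaded library. A minimal sketch for narrowing this down inside the container, assuming only the Python standard library, is to check the device nodes passed with `--device` and then try loading the driver library directly. The helper names `check_device_nodes` and `try_cuinit` are hypothetical and made up for this illustration; `libcuda.so.1` is assumed to be the driver library's soname on the image.

```python
import ctypes
import os

# Device nodes passed with --device in the commands above.
DEVICE_NODES = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"]


def check_device_nodes(paths=DEVICE_NODES):
    """Return {path: bool} saying whether each device node exists."""
    return {path: os.path.exists(path) for path in paths}


def try_cuinit(libname="libcuda.so.1"):
    """Try to load the CUDA driver library and call cuInit(0).

    Returns (loaded, result). `loaded` is False when the dynamic loader
    cannot find or open the library. `result` is the integer CUresult
    returned by cuInit (0 means success), or None if nothing was loaded.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return False, None
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return True, lib.cuInit(0)


if __name__ == "__main__":
    for path, present in check_device_nodes().items():
        print(f"{path}: {'present' if present else 'MISSING'}")
    loaded, result = try_cuinit()
    if not loaded:
        print("libcuda.so.1 could not be loaded by the dynamic loader")
    else:
        print(f"cuInit returned {result} (0 means success)")
```

If the library fails to load, the problem is likely in how the driver libraries are mounted into the container; if it loads but `cuInit` returns a nonzero code, the device nodes or their permissions are the more likely place to look.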

ctalledo · 2023-03-29T02:58:18Z

Hi @zhongcloudtian, apologies for the belated reply.

Sysbox does not formally support exposing Nvidia GPUs into the container yet, though other folks have had some success with it; see:

#50
#452

Hope that helps.

ctalledo · 2023-03-29T02:58:40Z

Duplicate of Issue #50.

ctalledo closed this as completed Mar 29, 2023

ctalledo added the duplicate This issue or pull request already exists label Mar 29, 2023
