How does sysbox k8s in docker schedule tensorflow/tensorflow:2.9.1-gpu? #643

Closed
zhongcloudtian opened this issue Feb 23, 2023 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

zhongcloudtian commented Feb 23, 2023

  1. NVIDIA driver version: NVIDIA-Linux-x86_64-525.85.12.run; OS: Ubuntu 20.04
  2. docker run --detach --interactive --runtime=sysbox-runc --name k8s-worker01 --hostname=k8s-worker01 \
    --mount type=tmpfs,destination=/proc/driver/nvidia \
    --mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi \
    --mount type=bind,source=/usr/bin/nvidia-debugdump,target=/usr/bin/nvidia-debugdump \
    --mount type=bind,source=/usr/bin/nvidia-persistenced,target=/usr/bin/nvidia-persistenced \
    --mount type=bind,source=/usr/bin/nvidia-cuda-mps-control,target=/usr/bin/nvidia-cuda-mps-control \
    --mount type=bind,source=/usr/bin/nvidia-cuda-mps-server,target=/usr/bin/nvidia-cuda-mps-server \
    -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
    --mount type=bind,source=/run/nvidia-persistenced/socket,target=/run/nvidia-persistenced/socket \
    --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
    --device /dev/nvidia0:/dev/nvidia0 \
    nestybox/k8s-node:v1.20.2
  3. With Sysbox, run tensorflow/tensorflow:2.9.1-gpu as follows:
    docker run --gpus all --mount type=bind,source=/usr/bin/nvidia-smi,target=/usr/bin/nvidia-smi -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --name test10 tensorflow/tensorflow:2.9.1-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    error message:
    E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
    I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: aaf4ecde1157
    I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: aaf4ecde1157
    I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
    I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 525.85.12
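
The log above reports that libcuda.so was not loaded into the process, while the kernel driver version was still readable. One way to narrow this down is to check, from inside the container, whether the dynamic loader can open the CUDA driver library and whether the device nodes passed with --device are present. The following is only a minimal sketch, not a Sysbox or TensorFlow tool: the helper names `find_loadable` and `missing_devices` are hypothetical, and the library names tried (`libcuda.so.1`, `libcuda.so`) are the usual driver sonames, assumed rather than taken from this report.

```python
import ctypes
import os

# Device nodes taken from the docker run command in this report.
DEVICE_NODES = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"]

# Usual sonames of the CUDA driver library (an assumption, not from the report).
LIBCUDA_NAMES = ["libcuda.so.1", "libcuda.so"]


def find_loadable(names, loader=ctypes.CDLL):
    """Return the first name the dynamic loader can open, or None if none load."""
    for name in names:
        try:
            loader(name)
            return name
        except OSError:
            continue
    return None


def missing_devices(paths, exists=os.path.exists):
    """Return the subset of paths that do not exist in this container."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    lib = find_loadable(LIBCUDA_NAMES)
    print("libcuda loadable as:", lib if lib else "not found")
    print("missing device nodes:", missing_devices(DEVICE_NODES) or "none")
```

Running this with the same image and flags as step 3 (in place of the TensorFlow one-liner) would show whether the failure is at the library-loading level or at the device-node level, before TensorFlow is involved at all.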
@ctalledo
Member

Hi @zhongcloudtian, apologies for the belated reply.

Sysbox does not yet formally support exposing NVIDIA GPUs inside the container, though other folks have had some success with it; see:

#50
#452

Hope that helps.

@ctalledo
Member

Duplicate of Issue #50.

ctalledo added the duplicate label Mar 29, 2023