tensorflow-gpu, NVIDIA GPU, CUDA, and cuDNN Environment Setup
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get purge 'nvidia*'
$ sudo vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
Disable nouveau, the open-source driver that ships with Ubuntu: write the two lines above to the file, save it, and reboot.
$ sudo reboot
$ lsmod | grep nouveau
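The blacklist step and the follow-up check can also be scripted. The sketch below is a minimal illustration, not part of the original procedure: the function names `write_nouveau_blacklist` and `nouveau_loaded` are hypothetical, and the check assumes the usual `lsmod` layout in which the module name is the first column.

```shell
# Hypothetical helper: write the two blacklist lines to the file given as $1.
# The path is a parameter so the function can be tried on a scratch file first.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Hypothetical helper: read lsmod-style text on stdin and succeed (exit 0)
# only if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    grep -Eq '^nouveau([[:space:]]|$)'
}
```

After the reboot, `lsmod | nouveau_loaded && echo "nouveau is still loaded"` prints nothing when the blacklist took effect, which matches the empty output expected from `lsmod | grep nouveau`.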
GTX 1080 / RTX 2080 - Download CUDA: CUDA Toolkit 10.0 (Sept 2018)
[CUDA Link]
Ctrl + Alt + F1: enter the tty1 virtual console to install CUDA
Ctrl + Alt + F7: return to the desktop GUI
$ sudo service lightdm stop
Note: confirm which CUDA version you downloaded, then run the commands below with the matching file name.
$ sudo chmod +x cuda_10.0.130_410.48_linux.run
$ sudo sh cuda_10.0.130_410.48_linux.run --no-opengl-libs
...
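During the CUDA installation, agree to install the NVIDIA driver:
[accept]  # accept the license agreement
[y]       # install the driver; the NVIDIA driver matching this CUDA version is installed automatically
[y]       # install the CUDA Toolkit
<Enter>   # install to the default directory
[y]       # create the symbolic link to the install directory
[n]       # do not copy the Samples, because /samples already exists under the install directory
When the installation finishes, it reports that CUDA and the NVIDIA driver were installed successfully. Then add the following two lines to ~/.bashrc: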
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
$ source ~/.bashrc
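Sourcing the file repeatedly prepends the same directories again each time, and when `LD_LIBRARY_PATH` starts out empty the export leaves a trailing colon, which the dynamic loader treats as the current directory. A minimal sketch of a guard against both, assuming a hypothetical helper named `prepend_once` that is not part of the original setup:

```shell
# Hypothetical helper: print the colon-separated list $2 with directory $1
# prepended, unless $1 is already one of its entries.
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*)
            # Already present: return the list unchanged.
            printf '%s\n' "$list"
            ;;
        *)
            if [ -n "$list" ]; then
                printf '%s\n' "$dir:$list"
            else
                # Empty list: avoid emitting a trailing colon.
                printf '%s\n' "$dir"
            fi
            ;;
    esac
}
```

In `~/.bashrc` this could be used as `export PATH="$(prepend_once /usr/local/cuda/bin "$PATH")"` and `export LD_LIBRARY_PATH="$(prepend_once /usr/local/cuda/lib64 "$LD_LIBRARY_PATH")"`.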
$ nvidia-smi
$ nvcc -V
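To confirm that the toolkit on `PATH` is the version that was just installed, the release number can be pulled out of the `nvcc -V` output. This is a sketch that assumes the output contains a line of the form `Cuda compilation tools, release 10.0, V10.0.130`; the function name `cuda_release` is hypothetical.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the
# major.minor release number found after the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

For example, `nvcc -V | cuda_release` should print the release that matches the downloaded installer (10.0 for `cuda_10.0.130_410.48_linux.run`).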
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery
$ cd ../bandwidthTest
$ sudo make
$ ./bandwidthTest
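Both samples end their output with a `Result = PASS` line when the GPU and driver work. A small sketch for checking that automatically, assuming that line format; `sample_passed` is a hypothetical name:

```shell
# Hypothetical helper: read the output of a CUDA sample on stdin and
# succeed (exit 0) only if it contains the line "Result = PASS".
sample_passed() {
    grep -q 'Result = PASS'
}
```

It can be used as `./deviceQuery | sample_passed && echo "deviceQuery OK"`, and likewise for `./bandwidthTest`.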
GTX 1080 - cuDNN v7.1.4 (May 16, 2018), for CUDA 9.0
[cuDNN Link]
RTX 2080 - cuDNN v7.5.0 (Feb 21, 2019), for CUDA 10.0
[cuDNN Link]
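For the RTX 2080 Ti, install the cuDNN version specified above and change the file names in the commands below to match. As written, the commands are for the GTX 1080 Ti.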
$ tar -zxvf cudnn-9.0-linux-x64-v7.1.tgz
$ cd cuda
$ sudo cp lib64/lib* /usr/local/cuda/lib64/
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ cd /usr/local/cuda/lib64/
$ sudo chmod +r libcudnn.so.7.1.4
$ sudo ln -sf libcudnn.so.7.1.4 libcudnn.so.7
$ sudo ln -sf libcudnn.so.7 libcudnn.so
$ sudo ldconfig
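After copying the files, it helps to confirm which cuDNN version the installed header reports. The sketch below assumes the cuDNN 7 layout, where `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`; the function name `cudnn_version` is hypothetical.

```shell
# Hypothetical helper: read a cudnn.h header on stdin and print the
# version as major.minor.patch; exit 1 if no CUDNN_MAJOR define is found.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    '
}
```

Running `cudnn_version < /usr/local/cuda/include/cudnn.h` should print the version that matches the `libcudnn.so.7.1.4` file name used above, or the version you substituted for another card.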
GTX 1080 / RTX 2080
$ pip install --user tensorflow-gpu==1.13.1
$ sudo apt-get remove 'nvidia-*'
$ sudo apt-get autoremove
$ sudo nvidia-uninstall
Issue: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
GTX 1080 Ti
$ uname -a
# the kernel version currently in use is 4.15
Linux CAI 4.15.0-50-generic #54~16.04.1-Ubuntu SMP Wed May 8 15:55:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
If the kernel version is higher than 4.10, it must be downgraded. Download and install the 4.10 kernel as follows:
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/linux-headers-4.10.0-041000_4.10.0-041000.201702191831_all.deb
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/linux-headers-4.10.0-041000-generic_4.10.0-041000.201702191831_amd64.deb
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/linux-image-4.10.0-041000-generic_4.10.0-041000.201702191831_amd64.deb
$ sudo dpkg -i *.deb
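After the kernel is installed, if nvidia-smi shows the GPU usage table, the driver does not need to be reinstalled. If the table does not appear, reinstall the driver by following the steps above.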
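The comparison against 4.10 can be done on the output of `uname -r` instead of by eye. This is a minimal sketch that assumes release strings shaped like `4.15.0-50-generic`; the function name `kernel_newer_than` is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the kernel release string $1 has a
# major.minor version strictly greater than major $2 and minor $3.
kernel_newer_than() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}        # text before the first dot, e.g. "4"
    rest=${release#*.}          # text after the first dot, e.g. "15.0-50-generic"
    minor=${rest%%[!0-9]*}      # leading digits of the rest, e.g. "15"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -gt "$want_minor" ]; }
}
```

For example, `kernel_newer_than "$(uname -r)" 4 10 && echo "kernel is newer than 4.10"` prints the message on the 4.15 kernel shown above and stays silent once 4.10 is running.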
$ lspci | grep 'VGA'
# if the card is found, its information is displayed
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
a. After powering on, enter the BIOS setup screen (on an Acer computer, press Del or F2 to enter the BIOS).
b. Change the setting to disabled, then reboot.
$ sudo gedit /etc/X11/xorg.conf
Section "Monitor"
Identifier "Configured Monitor"
Modeline "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120 -hsync +vsync
Option "PreferredMode" "1920x1080_60.00"
EndSection
Section "Screen"
Identifier "Default Screen"
Monitor "Configured Monitor"
Device "Configured Video Device"
EndSection
Section "Device"
Identifier "Configured Video Device"
EndSection
$ cvt 1920 1080
# 1920x1080 59.96 Hz (CVT 2.07M9) hsync: 67.16 kHz; pclk: 173.00 MHz
Modeline "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120 -hsync +vsync
$ sudo xrandr --newmode "1920x1080_60.00" 173.00 1920 2048 2248 2576 1080 1083 1088 1120 -hsync +vsync
$ sudo xrandr --addmode [THE NAME OF YOUR DISPLAY] "1920x1080_60.00"
$ sudo xrandr --output [THE NAME OF YOUR DISPLAY] --mode "1920x1080_60.00"
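Copying the Modeline by hand and looking up the display name are the two error-prone parts of this step. The sketch below extracts both from command output. It assumes that `cvt` prints a line starting with `Modeline` and that `xrandr` lists each output as `NAME connected ...`; the function names `modeline_args` and `connected_outputs` are hypothetical.

```shell
# Hypothetical helper: read `cvt` output on stdin and print the Modeline
# arguments with the leading keyword and the double quotes removed, so the
# result can be word-split directly into xrandr arguments.
modeline_args() {
    sed -n 's/^[[:space:]]*Modeline[[:space:]]*//p' | tr -d '"'
}

# Hypothetical helper: read `xrandr` output on stdin and print the name of
# every output whose state is exactly "connected".
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}
```

A possible use is `xrandr --newmode $(cvt 1920 1080 | modeline_args)`, left unquoted on purpose so the arguments are split, with `xrandr | connected_outputs` supplying the value for [THE NAME OF YOUR DISPLAY].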
$ sudo apt-get purge 'nvidia*'
$ cd /usr/local/cuda/bin
$ sudo ./uninstall_cuda_10.0.pl
$ sudo rm -rf /usr/local/cuda/include/cudnn.h
$ sudo rm -rf /usr/local/cuda/lib64/libcudnn*