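My GPU is an NVIDIA GeForce RTX 3090 Ti and my operating system is Ubuntu 18.04. Because my code would not run, I checked the versions of Python, PyTorch, CUDA, and cuDNN:
- Python: 3.6
- torch.__version__: 1.4.0
- torch.version.cuda: 10.1 (nvidia-smi shows CUDA Version 11.3)
- cuDNN: 7.6.3
These are not compatible with the 3090 Ti. I successfully upgraded Python to 3.9 and PyTorch to 1.12.1+cu102. However, "pip3 install cuda-python" and "pip install nvidia-cudnn" did not work for me, so I followed the steps on the NVIDIA website: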
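For reference, the versions listed above can be gathered in one go. This is a minimal sketch rather than the exact commands used in this post: `collect_versions` is a hypothetical helper name, and it assumes PyTorch may or may not be importable in the current environment.

```python
import platform


def collect_versions():
    """Return a dict with the versions that matter for GPU support."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    # CUDA toolkit the wheel was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    # cuDNN version as an integer such as 7603 for 7.6.3, or None if unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    # Older PyTorch releases do not provide get_arch_list, so guard the call.
    if hasattr(torch.cuda, "get_arch_list"):
        info["arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Note that the "CUDA Version" printed by nvidia-smi is the highest version the installed driver supports, while `torch.version.cuda` is the toolkit the PyTorch wheel was built with, so the two numbers can legitimately differ.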
- For CUDA (tried version 11.8): https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=18.04&target_type=deb_local
- For cuDNN (tried version 8.6.0, tar file installation): Installation Guide :: NVIDIA Deep Learning cuDNN Documentation
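One thing worth checking independently of the toolkit install is whether the PyTorch wheel itself contains code for the GPU. The RTX 3090 Ti has compute capability 8.6, which as far as I know requires a build targeting CUDA 11.1 or newer, so a `+cu102` wheel would not be expected to list `sm_86`. The sketch below uses a hypothetical helper, `wheel_supports`, and a made-up arch list for illustration; it only looks for a native `sm_XY` entry and ignores any PTX fallback.

```python
def wheel_supports(arch_list, major, minor):
    """True if the build's arch list has a native entry for sm_<major><minor>."""
    return f"sm_{major}{minor}" in arch_list


# Hypothetical arch list, for illustration only. Read the real one from
# torch.cuda.get_arch_list() on a machine where the driver loads.
example_arch_list = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(wheel_supports(example_arch_list, 8, 6))  # 8.6 is the RTX 3090 Ti

try:
    import torch

    if torch.cuda.is_available():
        # Ask the driver for the capability of GPU 0 and compare with the wheel.
        major, minor = torch.cuda.get_device_capability(0)
        print(wheel_supports(torch.cuda.get_arch_list(), major, minor))
except ImportError:
    pass  # PyTorch is not installed here; only the example above runs
```

If the second check prints `False`, reinstalling PyTorch from a CUDA 11.x wheel is likely needed regardless of which system-wide CUDA toolkit is installed, since the pip wheels bundle their own CUDA runtime.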
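After the installation steps finished, nvidia-smi reported "Failed to initialize NVML: Driver/library version mismatch". I read that a reboot fixes this, but the system hangs during the reboot.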
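That NVML message usually means the kernel module that is currently loaded and the user-space driver libraries on disk come from different driver versions. A rough way to confirm this is sketched below. It assumes the loaded module reports itself in `/proc/driver/nvidia/version` and that the libraries live in `/usr/lib/x86_64-linux-gnu`, as in the listing further down; the function names are hypothetical.

```python
import glob
import os
import re

VERSION = r"(\d+\.\d+(?:\.\d+)?)"


def kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"NVRM version:.*?\s" + VERSION + r"\b", proc_text)
    return match.group(1) if match else None


def library_version(filenames):
    """Extract the driver version from a libcuda.so.<version> file name."""
    for name in filenames:
        match = re.search(r"libcuda\.so\." + VERSION + r"$", name)
        if match:
            return match.group(1)
    return None


def read_versions(proc_path="/proc/driver/nvidia/version",
                  lib_glob="/usr/lib/x86_64-linux-gnu/libcuda.so.*"):
    """Return (loaded kernel module version, installed library version)."""
    proc_text = ""
    if os.path.exists(proc_path):
        with open(proc_path) as handle:
            proc_text = handle.read()
    return kernel_module_version(proc_text), library_version(sorted(glob.glob(lib_glob)))


if __name__ == "__main__":
    loaded, installed = read_versions()
    print(f"kernel module: {loaded}, libcuda: {installed}")
    if loaded and installed and loaded != installed:
        print("mismatch: the loaded module differs from the installed libraries")
```

A reboot normally resolves the mismatch because it loads the newly installed module, which only works if that module was actually built and the new driver boots cleanly.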
dpkg -l |grep nvidia
iU libnvidia-cfg1-520:amd64 520.61.05-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-465 465.19.01-0ubuntu1 all Shared files used by the NVIDIA libraries
iU libnvidia-common-520 520.61.05-0ubuntu1 all Shared files used by the NVIDIA libraries
rc libnvidia-compute-465:amd64 465.19.01-0ubuntu1 amd64 NVIDIA libcompute package
iU libnvidia-compute-520:amd64 520.61.05-0ubuntu1 amd64 NVIDIA libcompute package
iU libnvidia-compute-520:i386 520.61.05-0ubuntu1 i386 NVIDIA libcompute package
ii libnvidia-container-tools 1.11.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.11.0-1 amd64 NVIDIA container runtime library
iU libnvidia-decode-520:amd64 520.61.05-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
iU libnvidia-decode-520:i386 520.61.05-0ubuntu1 i386 NVIDIA Video Decoding runtime libraries
iU libnvidia-encode-520:amd64 520.61.05-0ubuntu1 amd64 NVENC Video Encoding runtime library
iU libnvidia-encode-520:i386 520.61.05-0ubuntu1 i386 NVENC Video Encoding runtime library
iU libnvidia-extra-520:amd64 520.61.05-0ubuntu1 amd64 Extra libraries for the NVIDIA driver
iU libnvidia-fbc1-520:amd64 520.61.05-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
iU libnvidia-fbc1-520:i386 520.61.05-0ubuntu1 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
iU libnvidia-gl-520:amd64 520.61.05-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
iU libnvidia-gl-520:i386 520.61.05-0ubuntu1 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
rc nvidia-compute-utils-465 465.19.01-0ubuntu1 amd64 NVIDIA compute utilities
iU nvidia-compute-utils-520 520.61.05-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-container-toolkit 1.11.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.11.0-1 amd64 NVIDIA Container Toolkit Base
rc nvidia-dkms-465 465.19.01-0ubuntu1 amd64 NVIDIA DKMS package
iU nvidia-dkms-520 520.61.05-0ubuntu1 amd64 NVIDIA DKMS package
iU nvidia-driver-520 520.61.05-0ubuntu1 amd64 NVIDIA driver metapackage
rc nvidia-kernel-common-465 465.19.01-0ubuntu1 amd64 Shared files used with the kernel module
iU nvidia-kernel-common-520 520.61.05-0ubuntu1 amd64 Shared files used with the kernel module
iU nvidia-kernel-source-520 520.61.05-0ubuntu1 amd64 NVIDIA kernel source package
iU nvidia-modprobe 520.61.05-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-opencl-dev:amd64 9.1.85-3ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-prime 0.8.16~0.18.04.1 all Tools to enable NVIDIA’s Prime
iU nvidia-settings 520.61.05-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
iU nvidia-utils-520 520.61.05-0ubuntu1 amd64 NVIDIA driver support binaries
iU xserver-xorg-video-nvidia-520 520.61.05-0ubuntu1 amd64 NVIDIA binary Xorg driver
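The first column of this listing is the part that matters. In `dpkg -l` output, `ii` means installed, `iU` means unpacked but not yet configured, and `rc` means removed with configuration files left behind. The sketch below groups packages by those flags so the half-installed ones stand out; `group_by_status` is a hypothetical helper, and the sample lines are copied from the listing above.

```python
import re
from collections import defaultdict


def group_by_status(dpkg_lines):
    """Group package names by the status flags in the first column of `dpkg -l`."""
    groups = defaultdict(list)
    for line in dpkg_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        flags, name = fields[0], fields[1]
        if not re.fullmatch(r"[a-zA-Z]{2,3}", flags):
            continue  # header or separator line, not a package row
        groups[flags].append(name)
    return dict(groups)


sample = [
    "ii libnvidia-common-465 465.19.01-0ubuntu1 all Shared files used by the NVIDIA libraries",
    "rc nvidia-dkms-465 465.19.01-0ubuntu1 amd64 NVIDIA DKMS package",
    "iU nvidia-dkms-520 520.61.05-0ubuntu1 amd64 NVIDIA DKMS package",
    "iU nvidia-driver-520 520.61.05-0ubuntu1 amd64 NVIDIA driver metapackage",
]

for flags, names in group_by_status(sample).items():
    print(flags, names)
```

Every 520 package here is in state `iU`, which suggests the installation was interrupted before the configure step ran.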
ls -l /usr/lib/x86_64-linux-gnu/libcuda*
lrwxrwxrwx 1 root root 28 Sep 29 05:22 /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1 -> libcudadebugger.so.520.61.05
-rw-r--r-- 1 root root 10934360 Sep 29 01:20 /usr/lib/x86_64-linux-gnu/libcudadebugger.so.520.61.05
lrwxrwxrwx 1 root root 12 Sep 29 05:22 /usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20 Sep 29 05:22 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.520.61.05
-rw-r--r-- 1 root root 26284256 Sep 29 01:56 /usr/lib/x86_64-linux-gnu/libcuda.so.520.61.05
dkms status
virtualbox, 5.2.42, 5.4.0-126-generic, x86_64: installed
virtualbox, 5.2.42, 5.4.0-72-generic, x86_64: installed
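The `dkms status` output lists only virtualbox. Together with `nvidia-dkms-520` being in state `iU` in the dpkg listing, this suggests the 520 kernel module was never built for the running kernel. A small sketch for checking this programmatically follows; `dkms_modules` is a hypothetical helper, and the sample lines are the two shown above.

```python
def dkms_modules(status_lines):
    """Return the set of module names listed by `dkms status`."""
    modules = set()
    for line in status_lines:
        head = line.split(",", 1)[0].strip()
        if head:
            # Some dkms versions print "name/version" in the first field,
            # so keep only the part before the slash.
            modules.add(head.split("/", 1)[0])
    return modules


sample = [
    "virtualbox, 5.2.42, 5.4.0-126-generic, x86_64: installed",
    "virtualbox, 5.2.42, 5.4.0-72-generic, x86_64: installed",
]

modules = dkms_modules(sample)
print(sorted(modules))
print("nvidia module registered with dkms:", "nvidia" in modules)
```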
Answer 1
The current driver seems to cause a black screen and freeze the machine at boot.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
Ubuntu 22.04
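I ran into this problem on bare-metal Ubuntu after upgrading the driver/CUDA packages. However, virtual machines with a similar RTX 3090 passed through work fine with the same driver and OS version, perhaps because they use the GPU only for compute and not for display.
Some people say that switching from HDMI to DP may help; I have not tested this. According to an NVIDIA representative, the fix will ship in the next release, so you can either downgrade to the previous version or wait for the fix.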