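I purged everything related to cuda/nvidia and installed the NVIDIA driver for the 1650 Ti (laptop) from scratch from the NVIDIA website. It again installed CUDA 11.2 for me, whereas I actually need CUDA 11.1 for the GPU build of PyTorch 1.8. How can I install CUDA 11.1 on Ubuntu 20.04 for an NVIDIA GeForce 1650 Ti (laptop) GPU?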
$ sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
$ sudo apt-get --purge remove "*nvidia*"
$ sudo rm -rf /usr/local/cuda*
$ sudo apt-get autoclean
$ sudo apt-get autoremove
$ sudo apt-get purge nvidia*
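To confirm that the purge commands above really left nothing behind, a small check along these lines might help. It is only a sketch: `leftover_gpu_packages` is a hypothetical helper name, and it assumes the usual `dpkg -l` listing format, in which installed packages carry the status `ii` in the first column.

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the names of
# still-installed packages (status "ii") that look NVIDIA/CUDA related.
leftover_gpu_packages() {
  awk '$1 == "ii" && $2 ~ /(nvidia|cuda|cublas|nsight)/ { print $2 }'
}

# Usage:
#   dpkg -l | leftover_gpu_packages
```

An empty result would suggest the packages are gone; anything printed is a candidate for another purge.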
Download CUDA 11.1 from the NVIDIA website:
$ wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
Download the NVIDIA driver for the 1650 Ti (laptop), Linux 64-bit, from the NVIDIA website, then:
$ sudo init 3
Then install the NVIDIA driver:
$ ./NVIDIA-Linux-x86_64-460.56.run
$ reboot
Then, in a terminal in the Ubuntu GUI:
mona@goku:~$ sudo sh cuda_11.1.0_455.23.05_linux.run
[sudo] password for mona:
Installation failed. See log at /var/log/cuda-installer.log for details.
The log at /var/log/nvidia-installer.log reads:
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
After installing the NVIDIA driver, I see that CUDA version 11.2 is installed, even though I have not installed the CUDA 11.1 that I downloaded:
$ nvidia-smi
Fri Mar 5 20:16:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8     3W /  N/A |     10MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1173      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1703      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
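Note that the `CUDA Version` field in the `nvidia-smi` header reports the highest CUDA version the installed driver supports, not a toolkit that is present on disk, so it may be worth comparing it with what `nvcc` reports. The following is a rough sketch; `driver_cuda_version` and `toolkit_cuda_version` are made-up helper names, and the parsing assumes the output formats shown in the comments.

```shell
# Hypothetical helpers; both read command output on stdin so they can be tried
# on saved text as well as on live output.

# Pull the "CUDA Version: X.Y" field out of the nvidia-smi header line.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Pull "release X.Y" out of `nvcc --version` output, which is assumed to
# contain a line like "Cuda compilation tools, release 11.1, V11.1.74".
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Usage:
#   nvidia-smi | driver_cuda_version
#   nvcc --version | toolkit_cuda_version   # no version printed if no nvcc is on PATH
```

If the second command prints nothing, no toolkit is visible on the PATH, whatever the `nvidia-smi` header says.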
The log reads:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Fri Mar 5 20:30:23 2021
installer version: 455.23.05
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
nvidia-installer command line:
./nvidia-installer
--ui=none
--no-questions
--accept-license
--disable-nouveau
--no-cc-version-check
--install-libglvnd
Using built-in stream user interface
-> Detected 12 CPUs online; setting concurrency level to 12.
ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
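The error above says the `nvidia-drm` module is already loaded, which is consistent with the runfile's bundled 455.23.05 driver installer being started while driver 460.56 is in use. One possible approach, sketched below under assumptions, is to check for the loaded module and then ask the runfile to install only the toolkit. `--silent` and `--toolkit` are options the CUDA runfile is assumed to accept here (worth confirming with the runfile's `--help`), and both function names are hypothetical.

```shell
# Hypothetical helper: reads /proc/modules-style text on stdin and succeeds if
# the named module is listed. The kernel lists modules with underscores, so
# "nvidia-drm" is looked up as "nvidia_drm".
module_loaded() {
  name=$(printf '%s' "$1" | tr '-' '_')
  grep -q "^${name} "
}

# Hypothetical wrapper: install only the CUDA toolkit from the runfile and skip
# its bundled driver. Assumes the runfile accepts --silent and --toolkit.
install_toolkit_only() {
  sudo sh "$1" --silent --toolkit
}

# Usage:
#   module_loaded nvidia-drm < /proc/modules && echo "driver already loaded"
#   install_toolkit_only cuda_11.1.0_455.23.05_linux.run
```

Keeping the already installed 460.56 driver and adding just the 11.1 toolkit avoids unloading the kernel module that the X server is using.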