I am following the official Nvidia instructions to install CUDA 11 on Ubuntu 16.04.7 LTS x64, but the installation fails:
apt-get update && apt-get upgrade
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
sudo sh cuda_11.0.2_450.51.05_linux.run
I kept the default settings during installation and received the following error message:
Installation failed. See log at /var/log/cuda-installer.log for details.
/var/log/cuda-installer.log contains:
nano /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 450.51.05
[INFO]: Executing NVIDIA-Linux-x86_64-450.51.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 450.51.05 failed, quitting
I also tried the other official Nvidia instructions:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
However, this also gives me an error:
Processing triggers for ca-certificates (20210119~16.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
done.
Errors were encountered while processing:
nvidia-450
nvidia-450-dev
cuda-drivers-450
cuda-drivers
cuda-runtime-11-0
cuda-demo-suite-11-0
cuda-11-0
cuda
E: Sub-process /usr/bin/dpkg returned an error code (1)
I am using Ubuntu 16.04.7 LTS x64. Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    23W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
What could be the problem?
Answer 1
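Run the driver file on its own in a terminal:
NVIDIA-Linux-x86_64-450.51.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check
and see whether the driver installs by itself. If it does, then the remaining option, --install-libglvnd, is what is broken. Either way, Nvidia needs to be contacted to fix the broken instructions in the packages they provide: either their driver is broken, or your card is incompatible and causes the driver installation to fail, or the final libglvnd part of the installation is broken. Strictly speaking, contacting Nvidia is only necessary if the problem is on their side rather than a compatibility issue with your card.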
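The standalone check described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name try_driver_install is hypothetical, the driver runfile is assumed to be available as a separate file already, and /var/log/nvidia-installer.log is the driver installer's usual default log location. The flags are the ones from the log in the question, minus --install-libglvnd.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tool): run a driver runfile
# on its own and report whether it succeeded.
# $1 = path to the driver runfile, e.g. NVIDIA-Linux-x86_64-450.51.05.run
try_driver_install() {
    runfile="$1"
    if [ ! -f "$runfile" ]; then
        echo "runfile not found: $runfile" >&2
        return 2
    fi
    # Same flags as in the CUDA installer log, without --install-libglvnd.
    status=0
    sh "$runfile" --ui=none --no-questions --accept-license \
        --disable-nouveau --no-cc-version-check || status=$?
    if [ "$status" -eq 0 ]; then
        echo "driver installed on its own; suspect the --install-libglvnd step"
    else
        echo "driver install failed (exit $status); see /var/log/nvidia-installer.log"
    fi
    return "$status"
}

# Example usage (needs root):
#   try_driver_install ./NVIDIA-Linux-x86_64-450.51.05.run
```

If the function reports a failure, the driver installer's own log is usually more detailed than /var/log/cuda-installer.log.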
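Answer 2
The trick is to install from Ubuntu recovery mode.
Boot to the GRUB menu and select recovery -> select "root" in the recovery menu -> run cuda_11.3.1_465.19.01_linux.run from the terminal (your version may differ).
Make sure your card is supported by the driver.
After installation, I had to relink libGL.so before I could run CUDA programs: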
$ ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1.7.0 /usr/lib/x86_64-linux-gnu/mesa/libGL.so
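A slightly more defensive variant of the relink step, as a sketch: it refuses to create a dangling link and uses ln -sfn so that a stale existing link is replaced. The function name relink is hypothetical, and the paths in the usage comment are simply the ones from the command above; they may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: make symlink $2 point at the existing file $1.
relink() {
    target="$1"
    link="$2"
    if [ ! -e "$target" ]; then
        echo "target does not exist: $target" >&2
        return 1
    fi
    # -s symbolic link, -f replace an existing link,
    # -n do not follow an existing link that points to a directory
    ln -sfn "$target" "$link" && readlink "$link"
}

# Example usage (needs root):
#   relink /usr/lib/x86_64-linux-gnu/libGL.so.1.7.0 \
#          /usr/lib/x86_64-linux-gnu/mesa/libGL.so
```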
Driver status:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T400         Off  | 00000000:01:00.0  On |                  N/A |
| 38%   40C    P8    N/A /  31W |    181MiB /  1867MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
My system is Ubuntu 16 with a custom 4.15.18 kernel:
$ uname -a
Linux np1 4.15.18-cma #1 SMP Tue May 17 13:07:31 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial