Why don't Nvidia's official instructions for installing CUDA 11 on Ubuntu 16.04 work?


I am following Nvidia's official instructions to install CUDA 11 on Ubuntu 16.04.7 LTS x64, but they don't work:

sudo apt-get update && sudo apt-get upgrade
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
sudo sh cuda_11.0.2_450.51.05_linux.run

During installation I kept the default settings.


I get the following error message:

Installation failed. See log at /var/log/cuda-installer.log for details.

/var/log/cuda-installer.log contains:

nano /var/log/cuda-installer.log

[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 450.51.05
[INFO]: Executing NVIDIA-Linux-x86_64-450.51.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 450.51.05 failed, quitting
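The log stops at "Finished with code: 256" without saying why the driver step failed. If that number is a raw wait status (the form C's system() returns; this is an assumption about how the installer reports it), it decodes to exit status 1, meaning the driver installer itself reported failure. That installer normally keeps its own, more detailed log at /var/log/nvidia-installer.log. A minimal sketch with two hypothetical helper functions:

```shell
#!/bin/sh
# Hypothetical helper: decode a raw wait status such as the "code: 256"
# printed in cuda-installer.log. Assumes the usual POSIX layout:
# exit status in bits 8-15, terminating signal in bits 0-6.
decode_wait_status() {
  status=$1
  sig=$((status & 127))
  if [ "$sig" -ne 0 ]; then
    echo "killed by signal $sig"
  else
    echo "exit status $(( (status >> 8) & 255 ))"
  fi
}

# Hypothetical helper: show the tail of the driver installer's own log.
# /var/log/nvidia-installer.log is the .run driver installer's default log.
show_driver_log() {
  log=${1:-/var/log/nvidia-installer.log}
  if [ -r "$log" ]; then
    tail -n 40 "$log"
  else
    echo "no readable log at $log" >&2
    return 1
  fi
}
```

For example, decode_wait_status 256 prints "exit status 1". Reading the driver log may need root, e.g. sudo tail -n 40 /var/log/nvidia-installer.log.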

I also tried another set of official Nvidia instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-0-local_11.0.2-450.51.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

However, this also gave me an error:

Processing triggers for ca-certificates (20210119~16.04.1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...

done.
done.
Errors were encountered while processing:
 nvidia-450
 nvidia-450-dev
 cuda-drivers-450
 cuda-drivers
 cuda-runtime-11-0
 cuda-demo-suite-11-0
 cuda-11-0
 cuda
E: Sub-process /usr/bin/dpkg returned an error code (1)
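nvidia-450 is the driver package itself, and the remaining entries (nvidia-450-dev and the cuda-* meta-packages) most likely depend on it. A plausible reading, although apt does not state it, is that nvidia-450 failed to configure and the others failed as a consequence. The hypothetical helper below pulls the first package listed under "Errors were encountered while processing:" out of saved apt output; sudo dpkg --configure -a then retries configuring the pending packages and prints the underlying error again.

```shell
#!/bin/sh
# Hypothetical helper: print the first package listed under apt/dpkg's
# "Errors were encountered while processing:" header.
# Reads saved apt output on stdin.
first_failed_package() {
  awk '
    /^Errors were encountered while processing:/ { grab = 1; next }
    grab && NF > 0 { print $1; exit }
  '
}
```

Usage: sudo apt-get -y install cuda 2>&1 | tee apt.log, then first_failed_package < apt.log, which should print nvidia-450 for output like the one above.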

I am using Ubuntu 16.04.7 LTS x64. Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    23W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
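The nvidia-smi header shows driver 440.64.00 with "CUDA Version: 10.2", which is the newest CUDA version that driver supports. The CUDA 11.0.2 runfile bundles driver 450.51.05, so installing with the default selection means replacing the 440 driver, and the log shows that driver step is the one that fails. A sketch for comparing dotted driver versions, assuming GNU sort with -V (both function names are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Relies on GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the installed driver version via nvidia-smi.
installed_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}
```

Usage: version_ge "$(installed_driver_version)" 450.51.05 || echo "driver is older than 450.51.05". On the machine above, version_ge 440.64.00 450.51.05 returns false.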

What could be the problem?

Answer 1

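Run the driver file by itself in a terminal, using the same command the CUDA installer ran but without --install-libglvnd:

sudo sh ./NVIDIA-Linux-x86_64-450.51.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check

and see whether the driver installs on its own. If it does, the remaining --install-libglvnd step is what is broken. In either case, you would need to contact Nvidia to fix the broken instructions in the package they provide for this. Either their driver is broken, or your graphics card is not compatible and the driver install fails, or the last part of the installation (libglvnd) is broken. Realistically, you only need to contact Nvidia if the problem is on their side and not a compatibility issue with your card.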
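The command above assumes the driver .run file is available on its own. One way to get it is the CUDA runfile's --extract option, which unpacks the bundled driver runfile and the toolkit into a directory (the path must be absolute). A sketch under these assumptions: the 11.0.2 runfile from the question is in the current directory, $HOME/cuda_extract is a hypothetical extraction directory, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: retry only the driver step from cuda-installer.log,
# without --install-libglvnd. EXTRACT_DIR is a hypothetical location;
# --extract expects an absolute path.
CUDA_RUN=cuda_11.0.2_450.51.05_linux.run
DRIVER_RUN=NVIDIA-Linux-x86_64-450.51.05.run
EXTRACT_DIR=${EXTRACT_DIR:-$HOME/cuda_extract}

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

retry_driver_without_libglvnd() {
  run sh "$CUDA_RUN" --extract="$EXTRACT_DIR"
  run sudo sh "$EXTRACT_DIR/$DRIVER_RUN" --ui=none --no-questions \
    --accept-license --disable-nouveau --no-cc-version-check
}
```

With DRY_RUN=0 exported, the function actually extracts and runs the driver. If the driver still fails, libglvnd is not the culprit and /var/log/nvidia-installer.log should say why.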

Answer 2

The trick is to install from Ubuntu's recovery mode.

Boot to the GRUB menu and choose recovery mode -> in the recovery menu select "root" -> from the terminal, run cuda_11.3.1_465.19.01_linux.run (your version may differ).
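In the recovery-mode root shell, / is typically still mounted read-only, so it needs to be remounted read-write before the installer can write anything. A sketch of that sequence, assuming the runfile named in this answer is in the current directory (override CUDA_RUN for another version); it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch for the recovery-mode "root" shell (already running as root).
# CUDA_RUN defaults to the runfile named in this answer; override as needed.
CUDA_RUN=${CUDA_RUN:-cuda_11.3.1_465.19.01_linux.run}

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

install_from_recovery() {
  # Recovery mode usually mounts / read-only; the installer needs it writable.
  run mount -o remount,rw /
  run sh "$CUDA_RUN"
}
```

With DRY_RUN=0 exported, install_from_recovery remounts / and starts the interactive installer.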

Make sure your card is supported by the driver.

After installation, I had to re-link a library before CUDA programs would run:

$ ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1.7.0 /usr/lib/x86_64-linux-gnu/mesa/libGL.so
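A plain ln -s fails with "File exists" if mesa/libGL.so is already present. A slightly more defensive sketch (relink is a hypothetical helper name) checks that the target exists, refuses to overwrite a regular file, and replaces an existing symlink with ln -sfn. The paths are arguments, so it can be tried on scratch files before running it as root on /usr/lib/x86_64-linux-gnu.

```shell
#!/bin/sh
# Hypothetical helper: point LINK at TARGET, as in the ln -s step above.
# Refuses to clobber a regular file; replaces an existing symlink.
relink() {
  target=$1
  link=$2
  if [ ! -e "$target" ]; then
    echo "missing target: $target" >&2
    return 1
  fi
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "refusing to replace regular file: $link" >&2
    return 1
  fi
  mkdir -p "$(dirname "$link")"
  ln -sfn "$target" "$link"
}
```

Usage (as root): relink /usr/lib/x86_64-linux-gnu/libGL.so.1.7.0 /usr/lib/x86_64-linux-gnu/mesa/libGL.so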

Driver status:

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T400         Off  | 00000000:01:00.0  On |                  N/A |
| 38%   40C    P8    N/A /  31W |    181MiB /  1867MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

My system is Ubuntu 16.04 with a custom-built 4.15.18 kernel:

$ uname -a
Linux np1 4.15.18-cma #1 SMP Tue May 17 13:07:31 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.7 LTS
Release:    16.04
Codename:   xenial
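The NVIDIA driver installer builds its kernel module against the running kernel, so with a custom kernel such as 4.15.18-cma the matching kernel build tree (conventionally reachable as /lib/modules/<release>/build) has to be present. A minimal check, assuming that conventional layout; has_kernel_build_tree and MODULES_ROOT are hypothetical names, the latter there only to make the function testable.

```shell
#!/bin/sh
# Hypothetical check: does the given (default: running) kernel have a
# build tree that an out-of-tree module such as nvidia.ko can compile against?
# MODULES_ROOT defaults to /lib/modules; override it for testing.
has_kernel_build_tree() {
  release=${1:-$(uname -r)}
  root=${MODULES_ROOT:-/lib/modules}
  if [ -f "$root/$release/build/Makefile" ]; then
    echo "build tree found for $release"
  else
    echo "no build tree for $release under $root" >&2
    return 1
  fi
}
```

For stock Ubuntu kernels the tree comes from the linux-headers-$(uname -r) package; a custom kernel needs its own headers installed.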
