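I am trying to install the NVIDIA driver in place of nouveau on a remote server running Ubuntu 18.04, which I access through a remote desktop (running KDE Plasma). I followed the instructions in other posts to install the proprietary driver, then purged and reinstalled it several times. I am not sure whether the driver is installed correctly.
What I did (following this post: Install Nvidia driver on Ubuntu Server 18.04):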
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get install nvidia-driver-430
sudo reboot
This all appeared to go smoothly.
Then I checked nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 33%   37C    P8     7W / 200W |     54MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1273      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      1339      G   /usr/bin/sddm-greeter                         42MiB |
+-----------------------------------------------------------------------------+
This makes me think the driver is installed correctly, but sudo nvidia-settings fails to run:
ERROR: Unable to load info from any available system
(nvidia-settings:2278): GLib-GObject-CRITICAL **: 13:28:34.205: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
** Message: 13:28:34.206: PRIME: No offloading required. Abort
** Message: 13:28:34.206: PRIME: is it supported? no
In addition, glxinfo | grep nvidia returns nothing.
These last two facts make me think the driver is not installed correctly.
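The glxinfo check can be made a little more explicit than a bare grep. The following is only a minimal sketch: `classify_renderer` is a hypothetical helper name, and the patterns it matches (llvmpipe, softpipe, GeForce and so on) are assumptions about what the "OpenGL renderer string" line usually contains.

```shell
#!/bin/sh
# Classify the "OpenGL renderer string" line read from stdin.
# Prints "software" (Mesa llvmpipe/softpipe), "nvidia", or "other".
# The helper name and the matched patterns are illustrative assumptions.
classify_renderer() {
    renderer=$(grep -i 'OpenGL renderer string' | head -n 1)
    case "$renderer" in
        *llvmpipe*|*softpipe*) echo "software" ;;
        *NVIDIA*|*GeForce*|*Quadro*|*Tesla*) echo "nvidia" ;;
        *) echo "other" ;;
    esac
}

# Usage (needs an X session, e.g. run it inside the remote desktop):
#   glxinfo | classify_renderer
```

It only reports what the current X session is rendering with; it says nothing about whether the kernel module itself is loaded.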
Finally, I am not sure exactly what this means, but here is the output of dmesg | grep nvidia:
[ 1.257269] nvidia: loading out-of-tree module taints kernel.
[ 1.257467] nvidia: module license 'NVIDIA' taints kernel.
[ 1.261066] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 1.267769] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[ 1.268292] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 1.389997] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 430.40 Sun Jul 21 04:57:42 CDT 2019
[ 1.390918] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 2.091441] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[ 4.258679] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
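Where I think the problem may be:
In some posts, for example:
How to install Nvidia drivers with Secure Boot on Ubuntu 18.04?
Unable to install Nvidia drivers on Ubuntu 18.04.1
I found that during installation you are asked to set a password, which you then have to enter again on reboot. I never got this step. In fact, I am not sure how to handle any advice that requires action before the machine boots (such as entering the BIOS or disabling Secure Boot), because, as I said, this is a remote server that I access via ssh or remote desktop.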
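Whether Secure Boot is active can at least be queried over ssh, without touching the BIOS, using `mokutil --sb-state`. A minimal sketch follows, assuming mokutil is installed; `secure_boot_state` is a hypothetical wrapper name, and the strings it matches are the ones mokutil normally prints.

```shell
#!/bin/sh
# Interpret the text printed by `mokutil --sb-state` (read from stdin).
# Prints "enabled", "disabled", or "unknown" (e.g. on a non-EFI system).
# The function name is an illustrative assumption, not part of mokutil.
secure_boot_state() {
    state=$(cat)
    case "$state" in
        *"SecureBoot enabled"*) echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage over ssh:
#   mokutil --sb-state 2>&1 | secure_boot_state
```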
Additional information:
Running glxinfo | grep OpenGL:
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 8.0, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.0.2
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 19.0.2
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.0.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Running sudo nvidia-settings:
The errors listed above are printed, but a small blank window does appear.
Contents of /etc/X11/Xorg.conf:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 430.40
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0"
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    Option "DPMS"
EndSection
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
Terminal output when running sudo apt-get install nvidia-driver-430:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libxnvctrl0 nvidia-compute-utils-430 nvidia-dkms-430 nvidia-kernel-common-430 nvidia-kernel-source-430
nvidia-prime nvidia-settings nvidia-utils-430 screen-resolution-extra
The following NEW packages will be installed:
libxnvctrl0 nvidia-compute-utils-430 nvidia-dkms-430 nvidia-driver-430 nvidia-kernel-common-430
nvidia-kernel-source-430 nvidia-prime nvidia-settings nvidia-utils-430 screen-resolution-extra
0 upgraded, 10 newly installed, 0 to remove and 162 not upgraded.
Need to get 0 B/13.8 MB of archives.
After this operation, 39.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Selecting previously unselected package libxnvctrl0:amd64.
(Reading database ... 250513 files and directories currently installed.)
Preparing to unpack .../0-libxnvctrl0_418.56-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking libxnvctrl0:amd64 (418.56-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-compute-utils-430.
Preparing to unpack .../1-nvidia-compute-utils-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-compute-utils-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-kernel-source-430.
Preparing to unpack .../2-nvidia-kernel-source-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-kernel-source-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-kernel-common-430.
Preparing to unpack .../3-nvidia-kernel-common-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-kernel-common-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-dkms-430.
Preparing to unpack .../4-nvidia-dkms-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-dkms-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-utils-430.
Preparing to unpack .../5-nvidia-utils-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-utils-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-driver-430.
Preparing to unpack .../6-nvidia-driver-430_430.40-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-driver-430 (430.40-0ubuntu0~gpu18.04.1) ...
Selecting previously unselected package nvidia-prime.
Preparing to unpack .../7-nvidia-prime_0.8.8.2_all.deb ...
Unpacking nvidia-prime (0.8.8.2) ...
Selecting previously unselected package screen-resolution-extra.
Preparing to unpack .../8-screen-resolution-extra_0.17.3_all.deb ...
Unpacking screen-resolution-extra (0.17.3) ...
Selecting previously unselected package nvidia-settings.
Preparing to unpack .../9-nvidia-settings_418.56-0ubuntu0~gpu18.04.1_amd64.deb ...
Unpacking nvidia-settings (418.56-0ubuntu0~gpu18.04.1) ...
Setting up nvidia-prime (0.8.8.2) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Setting up nvidia-utils-430 (430.40-0ubuntu0~gpu18.04.1) ...
Setting up nvidia-kernel-common-430 (430.40-0ubuntu0~gpu18.04.1) ...
update-initramfs: deferring update (trigger activated)
Setting up nvidia-compute-utils-430 (430.40-0ubuntu0~gpu18.04.1) ...
Warning: The home dir /nonexistent you specified can't be accessed: No such file or directory
Adding system user `nvidia-persistenced' (UID 127) ...
Adding new group `nvidia-persistenced' (GID 135) ...
Adding new user `nvidia-persistenced' (UID 127) with group `nvidia-persistenced' ...
Not creating home directory `/nonexistent'.
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up nvidia-kernel-source-430 (430.40-0ubuntu0~gpu18.04.1) ...
Setting up screen-resolution-extra (0.17.3) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for dbus (1.12.2-1ubuntu1.1) ...
Setting up libxnvctrl0:amd64 (418.56-0ubuntu0~gpu18.04.1) ...
Setting up nvidia-dkms-430 (430.40-0ubuntu0~gpu18.04.1) ...
update-initramfs: deferring update (trigger activated)
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
Loading new nvidia-430.40 DKMS files...
Building for 4.15.0-58-generic
Building for architecture x86_64
Building initial module for 4.15.0-58-generic
Done.
nvidia:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-58-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-58-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-58-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-58-generic/updates/dkms/
depmod...
DKMS: install completed.
Setting up nvidia-driver-430 (430.40-0ubuntu0~gpu18.04.1) ...
Setting up nvidia-settings (418.56-0ubuntu0~gpu18.04.1) ...
Processing triggers for initramfs-tools (0.130ubuntu3.8) ...
update-initramfs: Generating /boot/initrd.img-4.15.0-58-generic
I: The initramfs will attempt to resume from /dev/dm-1
I: (/dev/mapper/vg0-swap)
I: Set the RESUME variable to override this.
Processing triggers for libc-bin (2.27-3ubuntu1) ...
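Answer 1
I recently ran into a problem on my server, which has four A100 GPUs. The driver was working fine, but I had to install a newer driver to get access to a newer CUDA (upgrading from 450 to 510). The following went smoothly on Ubuntu 20.04, but it also works on Ubuntu 18.04:
It is best to avoid using apt-get to install the Nvidia driver! For me it was essential to avoid anything X11/Xorg, so I avoided it. For you, though, it may be fine.
Purge everything (trust me, this looks risky but it works):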
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo nvidia-uninstall
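To confirm that the purge really removed everything, the remaining package names can be filtered with the same prefixes used above. This is a minimal sketch; `leftover_gpu_packages` is a hypothetical helper name, and the dpkg-query listing may still include packages that were removed but not purged.

```shell
#!/bin/sh
# Filter a list of package names (one per line on stdin) down to those
# matching the prefixes targeted by the purge commands.
# The helper name is an illustrative assumption.
leftover_gpu_packages() {
    # grep exits non-zero when nothing matches; treat that as "nothing left".
    grep -E '^(nvidia-|libnvidia-|cuda-)' || true
}

# Usage:
#   dpkg-query -W -f='${Package}\n' | leftover_gpu_packages
```

Empty output suggests nothing matching those prefixes is left in the package database.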
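Go here: the page says CUDA Toolkit, but it contains both the Nvidia driver and CUDA. Choose the version you want to install based on Table 3. Note: the CUDA version listed there really means the maximum available CUDA. In other words, if you install CUDA Toolkit 11.6 (which includes Nvidia driver v510), you can use older CUDA versions (such as 11.5 and earlier) without changing the driver version.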
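Since the driver version sets the ceiling for the usable CUDA version, it can help to check the installed driver against the major version you need. A minimal sketch, assuming the 510 threshold from the example above; `driver_at_least` is a hypothetical helper name, and the version strings in the usage lines are example values only.

```shell
#!/bin/sh
# Succeed if the driver version string $1 (e.g. "510.47.03") has a major
# version of at least $2 (e.g. 510). Helper name is an illustrative assumption.
driver_at_least() {
    major=${1%%.*}          # strip everything after the first dot
    [ "$major" -ge "$2" ]
}

# Usage (on a machine where nvidia-smi works):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$ver" 510 && echo "driver is new enough"
```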
From there, click CUDA Toolkit XX.X.X (Month 20XX) and open its Versioned Online Documentation.
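4.1 Install the required kernel and tools: in the Versioned Online Documentation, go to the Installation Guide Linux and look at Table 1. Basically, it lists the kernel version required for the Nvidia driver. In my case I was installing 11.6.1 on Ubuntu 20.04.3, so I installed the matching version of the kernel image and its headers, then booted with that kernel. Check your kernel version with uname -r; if it is the same, go to the next step. Otherwise, run the following (replace $(uname -r) with the exact version listed in Table 1):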
sudo apt install linux-image-$(uname -r)
4.2 Install the kernel headers (replace $(uname -r) with the version from Table 1):
sudo apt install linux-headers-$(uname -r)
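Steps 4.1 and 4.2 use the same release string for both packages, so the two names can be generated together. A minimal sketch; `kernel_packages` is a hypothetical helper name, and it assumes the usual Ubuntu naming of linux-image-RELEASE and linux-headers-RELEASE.

```shell
#!/bin/sh
# Print the kernel image and headers package names for a release string.
# With no argument, the running kernel (uname -r) is used.
# The helper name is an illustrative assumption.
kernel_packages() {
    release=${1:-$(uname -r)}
    printf 'linux-image-%s\nlinux-headers-%s\n' "$release" "$release"
}

# Usage:
#   sudo apt install $(kernel_packages)                  # running kernel
#   sudo apt install $(kernel_packages "RELEASE")        # RELEASE = version from Table 1
```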
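4.3 Install the required driver + CUDA (this is what is called the CUDA Toolkit): follow the first link, CUDA Toolkit XX.X.X, and click Linux >> x86_64 >> Ubuntu >> your Ubuntu version >> runfile (local). (Any installer type works; I prefer this one.)
Then run the two commands it provides in a terminal; the rest is easy.
Happy training!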