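I ran into a problem while trying to install a GPU driver.
I installed the NVIDIA 495 driver, which is the one Ubuntu recommends.
For some reason, nvidia-smi cannot find the driver I installed, even though dkms status reports it as installed: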
user@server:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
user@server:~$ dkms status
nvidia, 495.29.05, 5.11.0-41-generic, x86_64: installed
user@server:~$ nvidia-debugdump -l
Error: nvmlInit(): Driver Not Loaded
user@server:~$ lspci -v | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1) (prog-if 00 [VGA controller])
I also get this:
user@server:~$ systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2021-12-06 14:50:17 EST; 38min ago
Process: 1109 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=1/FAILURE)
Process: 1116 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced/* (code=exited, status=0/SUCCESS)
Dec 06 14:50:17 server systemd[1]: nvidia-persistenced.service: Scheduled restart job, restart counter is at 5.
Dec 06 14:50:17 server systemd[1]: Stopped NVIDIA Persistence Daemon.
Dec 06 14:50:17 server systemd[1]: nvidia-persistenced.service: Start request repeated too quickly.
Dec 06 14:50:17 server systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Dec 06 14:50:17 server systemd[1]: Failed to start NVIDIA Persistence Daemon.
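My GPU is detected. Am I missing a step needed to link nvidia-smi to the Ubuntu NVIDIA driver?
I have an xorg.conf file that lets me set the screen resolution, but it does not connect the driver to nvidia-smi.
Let me know if you need any more information about this issue.
Thanks in advance.
(Edit):
Here is the output of sudo lshw -c video: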
user@server:~$ sudo lshw -c video
*-display UNCLAIMED
description: VGA compatible controller
product: TU106 [GeForce RTX 2060 Rev. A]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list
configuration: latency=0
resources: memory:f6000000-f6ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:e000(size=128) memory:c0000-dffff
Answer 1
As heynnema suggested, I disabled Secure Boot in my UEFI BIOS. nvidia-smi now works fine!
user@server:~$ nvidia-smi
Mon Dec 6 18:14:06 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 0% 49C P8 17W / 190W | 179MiB / 5931MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1152 G /usr/lib/xorg/Xorg 35MiB |
| 0 N/A N/A 1889 G /usr/lib/xorg/Xorg 48MiB |
| 0 N/A N/A 2016 G /usr/bin/gnome-shell 84MiB |
+-----------------------------------------------------------------------------+
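For anyone who sees the same symptom (dkms status says "installed" but nvidia-smi cannot talk to the driver), it may help to check the Secure Boot state and whether the nvidia kernel module is actually loaded before changing anything in the firmware. With Secure Boot enabled, the kernel can refuse to load a DKMS-built module that is not signed with an enrolled key. The following is only a rough sketch: the helper names classify_sb and nvidia_loaded are made up for this example, and it assumes mokutil is installed and prints "SecureBoot enabled" or "SecureBoot disabled".

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by `mokutil --sb-state` (read from stdin).
classify_sb() {
    state=unknown
    while IFS= read -r line; do
        case "$line" in
            *"SecureBoot enabled"*)  state=enabled ;;
            *"SecureBoot disabled"*) state=disabled ;;
        esac
    done
    echo "$state"
}

# Hypothetical helper: read /proc/modules-style text from stdin and
# print "yes" if a module named exactly "nvidia" is listed, else "no".
nvidia_loaded() {
    while read -r name _rest; do
        if [ "$name" = "nvidia" ]; then
            echo yes
            return 0
        fi
    done
    echo no
}

# Report the Secure Boot state, if mokutil is available.
if command -v mokutil >/dev/null 2>&1; then
    echo "Secure Boot: $(mokutil --sb-state 2>/dev/null | classify_sb)"
else
    echo "Secure Boot: unknown (mokutil not installed)"
fi

# Report whether the nvidia kernel module is currently loaded.
if [ -r /proc/modules ]; then
    echo "nvidia module loaded: $(nvidia_loaded < /proc/modules)"
fi
```

If this reports Secure Boot as enabled and the nvidia module as not loaded, that matches the situation in the question, where disabling Secure Boot was what got the driver loading.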
Thanks for your help!