Nvidia drivers in Ubuntu 18.04 cause freezing

I'm running a Dell Precision 5540 with Ubuntu 18.04 and an NVIDIA Quadro T2000.

lspci

01:00.0 3D controller: NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q] (rev a1)

ubuntu-drivers devices

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FB8sv00001028sd00000906bc03sc02i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-430 - distro non-free
driver   : nvidia-driver-435 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

My kernel version, from uname -r:

4.15.0-1079-oem

My Ubuntu version information, from lsb_release -a:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:    18.04
Codename:   bionic

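When nvidia is selected through nvidia-prime (i.e., when the nvidia driver is loaded), I can log in, but after a while the X server seems to crash, freezing the GUI. I say "freezing the GUI" because if I leave the machine alone, the dmesg log keeps growing in the background. The following dmesg errors coincide with the freeze: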

Apr 18 15:08:17 artificer kernel: NVRM: GPU at PCI:0000:01:00: GPU-54fd0cf3-82c7-1f02-af0a-5555977a0327
Apr 18 15:08:17 artificer kernel: NVRM: GPU Board Serial Number: 
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 62, pid=617, 203c(3090) 00000000 00000000
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000000
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000001
Apr 18 15:08:17 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
Apr 18 15:08:17 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0):     recover...
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000008
Apr 18 15:08:19 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3247, Ch 00000009
Apr 18 15:08:19 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3247, Ch 0000000a
Apr 18 15:08:19 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3247, Ch 00000010
Apr 18 15:08:20 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000011
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000012
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000013
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000014
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000015
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=3445, Ch 00000018
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=4331, Ch 00000019
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=4331, Ch 0000001a
Apr 18 15:08:21 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=4331, Ch 0000001b
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(GPU-0): Failed to allocate notification memory.
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0): Failed to allocate push buffer
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0): Error recovery failed.
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0):  *** Aborting ***
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE)
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: Fatal server error:
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) Failed to recover from error!
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE)
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: Please consult the The X.Org Foundation support
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]:          at http://wiki.x.org
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]:  for help.
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) Please also check the log file at "/home/brad/.local/share/xorg/Xorg.0.log" for additional information.
Apr 18 15:08:21 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE)
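
When reading a log like the one above, it can help to see which Xid code appears first and how often each one repeats, since the later events may just be fallout from the first. The following is a minimal sketch, not part of the original post: the function name summarize_xids and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above. As far as I can tell, NVIDIA's Xid documentation lists 62 as an internal micro-controller halt and 45 as preemptive cleanup after earlier errors, but check that reference rather than relying on this note.

```python
import re
from collections import Counter

# Assumed pattern for NVRM Xid lines such as:
#   NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000000
XID_RE = re.compile(
    r"NVRM: Xid \((?P<bus>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)


def summarize_xids(lines):
    """Return (Counter of events per Xid code, first Xid code seen or None)."""
    counts = Counter()
    first = None
    for line in lines:
        match = XID_RE.search(line)
        if match is None:
            continue  # not an Xid line (e.g. X server messages)
        xid = int(match.group("xid"))
        counts[xid] += 1
        if first is None:
            first = xid
    return counts, first


# A few lines taken from the log in the question.
SAMPLE = """\
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 62, pid=617, 203c(3090) 00000000 00000000
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000000
Apr 18 15:08:17 artificer kernel: NVRM: Xid (PCI:0000:01:00): 45, pid=617, Ch 00000001
Apr 18 15:08:17 artificer /usr/lib/gdm3/gdm-x-session[3245]: (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
""".splitlines()

if __name__ == "__main__":
    counts, first = summarize_xids(SAMPLE)
    print("first Xid:", first)
    for xid, n in sorted(counts.items()):
        print(f"Xid {xid}: {n} event(s)")
```

To run it against a full log, pass the lines of a saved dmesg or syslog dump to summarize_xids instead of SAMPLE.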

The Xorg.0.log log doesn't provide much more information.

[   139.284] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[   139.284] (EE) NVIDIA(0):     recover...
[   143.875] (EE) NVIDIA(GPU-0): Failed to allocate notification memory.
[   143.875] (EE) NVIDIA(0): Failed to allocate push buffer
[   143.875] (EE) NVIDIA(0): Error recovery failed.
[   143.875] (EE) NVIDIA(0):  *** Aborting ***
[   143.875] (EE)
Fatal server error:
[   143.875] (EE) Failed to recover from error!
[   143.875] (EE)
[   143.875] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[   143.875] (EE) Please also check the log file at "/home/brad/.local/share/xorg/Xorg.0.log" for additional information.
[   143.875] (EE)

I've tried a number of different solutions, including:

  • Installing the 440 driver
  • Installing the 430 driver
  • Installing the 435 driver
  • Installing the driver via sudo ubuntu-drivers autoinstall
  • Running apt --purge remove 'nvidia-*' before each installation

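Every time I do one of these, I get the same "freeze" within about two minutes of logging in. I usually use that window to run prime-select intel before the inevitable crash, because the system does not crash when the intel GPU is selected.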

Does anyone have any other ideas?

Answer 1

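After ruling out a range of potential software causes, this behavior turned out to be caused by a physical fault in the GPU. The freezing persisted even after a full factory reset. The hardware was replaced.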
