PCIe devices cause Ubuntu 16.04 to crash

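I am using a Dell Precision 5820 Tower running Ubuntu 16.04 LTS. It has 4 PCIe DAQ cards and an AMD RV710 graphics card, so there are 5 PCIe devices in the tower in total. I wrote some software that lets me start and stop any or all of the 4 PCIe DAQ cards by pressing one or more buttons. The problem is that my system freezes when I run the PCIe devices. More detail on how the problem shows up: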

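  • After I completely wipe and reinstall Ubuntu 16.04, the system freezes as soon as I start the 4th PCIe DAQ card while the other 3 are running. With 3 of the 4 cards running at once the system is fully functional, but the moment I press the run button for the 4th card the system freezes and stops responding to the mouse and keyboard, forcing me to do a hard shutdown. The PCIe devices do not stop when the system freezes: all 4 DAQ boards keep running normally, but I can no longer interact with them because the system ignores my keyboard and mouse.

  • Once the system has frozen through the sequence described in the previous bullet, it boots and runs normally, but it freezes as soon as I run any single PCIe DAQ device. Before that first freeze I could run up to 3 devices at once; afterwards I cannot run even one without the system freezing.

  • If I completely wipe and reinstall Ubuntu 16.04 LTS, I can again use up to 3 PCIe DAQ boards at the same time.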

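When the system freezes, neither the mouse nor the keyboard responds. On one occasion the system froze while a YouTube video was playing in the background. Playback was not affected: neither the audio nor the video was interrupted, but the system did not respond to my keyboard or mouse.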
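
The symptom above (input stops while audio and video keep playing) leaves several explanations open. One thing that may be worth ruling out, purely as an assumption and not a diagnosis, is whether the DAQ cards share an interrupt line with another device such as the USB controller. The sketch below parses text in the layout of /proc/interrupts and reports IRQ numbers that have more than one handler attached. The function name shared_irqs is made up for this example, and the parsing assumes the usual column layout (per-CPU counts, interrupt chip, trigger field, then handler names).

```python
def shared_irqs(text):
    """Return {irq_number: [handler names]} for IRQ lines with more than one handler.

    `text` is expected to look like the contents of /proc/interrupts.
    """
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())      # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():         # skip NMI, LOC, ERR and similar rows
            continue
        fields = rest.split()
        # Assumed layout: per-CPU counts, chip name, trigger field, handler names.
        names = " ".join(fields[n_cpus + 2:])
        handlers = [h.strip() for h in names.split(",") if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared
```

On the affected machine this could be called as shared_irqs(open("/proc/interrupts").read()), once before starting the boards and once with 3 boards running, to compare which handlers sit on the same line.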

Here is more information about my machine:

OS: Ubuntu 16.04 LTS

Memory: 15.4 GiB

Processor: Intel® Xeon(R) W-2123 CPU @ 3.60GHz × 8

Graphics: AMD RV710 (DRM 2.48.0 / 4.9.0-040900-generic, LLVM 5.0.0)

OS type: 64-bit

Disk: 967.8 GB

Output of lspci:

username@TOWER:~$ lspci
00:00.0 Host bridge: Intel Corporation Device 2020 (rev 04)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 04)
00:05.2 System peripheral: Intel Corporation Device 2025 (rev 04)
00:05.4 PIC: Intel Corporation Device 2026 (rev 04)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:14.0 USB controller: Intel Corporation Device a2af
00:14.2 Signal processing controller: Intel Corporation Device a2b1
00:16.0 Communication controller: Intel Corporation Device a2ba
00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller
00:1c.0 PCI bridge: Intel Corporation Device a290 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Device a295 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device a296 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Device a298 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a2d3
00:1f.2 Memory controller: Intel Corporation Device a2a1
00:1f.3 Audio device: Intel Corporation Device a2f0
00:1f.4 SMBus: Intel Corporation Device a2a3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (5) I219-LM
02:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
03:00.0 Unassigned class [ffff]: Measurement Computing PCIe-DAS1602/16
04:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
06:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
07:00.0 Unassigned class [ffff]: Measurement Computing PCIe-DAS1602/16
16:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port 1A (rev 04)
16:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
16:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
16:05.4 PIC: Intel Corporation Device 2036 (rev 04)
16:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
17:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
18:00.0 Unassigned class [ffff]: Measurement Computing PCIe-DAS1602/16
64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port 1A (rev 04)
64:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
64:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
64:05.4 PIC: Intel Corporation Device 2036 (rev 04)
64:08.0 System peripheral: Intel Corporation Device 2066 (rev 04)
64:09.0 System peripheral: Intel Corporation Device 2066 (rev 04)
64:0a.0 System peripheral: Intel Corporation Device 2040 (rev 04)
64:0a.1 System peripheral: Intel Corporation Device 2041 (rev 04)
64:0a.2 System peripheral: Intel Corporation Device 2042 (rev 04)
64:0a.3 System peripheral: Intel Corporation Device 2043 (rev 04)
64:0a.4 System peripheral: Intel Corporation Device 2044 (rev 04)
64:0a.5 System peripheral: Intel Corporation Device 2045 (rev 04)
64:0a.6 System peripheral: Intel Corporation Device 2046 (rev 04)
64:0a.7 System peripheral: Intel Corporation Device 2047 (rev 04)
64:0b.0 System peripheral: Intel Corporation Device 2048 (rev 04)
64:0b.1 System peripheral: Intel Corporation Device 2049 (rev 04)
64:0b.2 System peripheral: Intel Corporation Device 204a (rev 04)
64:0b.3 System peripheral: Intel Corporation Device 204b (rev 04)
64:0c.0 System peripheral: Intel Corporation Device 2040 (rev 04)
64:0c.1 System peripheral: Intel Corporation Device 2041 (rev 04)
64:0c.2 System peripheral: Intel Corporation Device 2042 (rev 04)
64:0c.3 System peripheral: Intel Corporation Device 2043 (rev 04)
64:0c.4 System peripheral: Intel Corporation Device 2044 (rev 04)
64:0c.5 System peripheral: Intel Corporation Device 2045 (rev 04)
64:0c.6 System peripheral: Intel Corporation Device 2046 (rev 04)
64:0c.7 System peripheral: Intel Corporation Device 2047 (rev 04)
64:0d.0 System peripheral: Intel Corporation Device 2048 (rev 04)
64:0d.1 System peripheral: Intel Corporation Device 2049 (rev 04)
64:0d.2 System peripheral: Intel Corporation Device 204a (rev 04)
64:0d.3 System peripheral: Intel Corporation Device 204b (rev 04)
65:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
66:00.0 Unassigned class [ffff]: Measurement Computing PCIe-DAS1602/16
b2:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port 1A (rev 04)
b2:02.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port 1C (rev 04)
b2:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port 1D (rev 04)
b2:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
b2:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
b2:05.4 PIC: Intel Corporation Device 2036 (rev 04)
b2:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b3:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV710/M92 [Mobility Radeon HD 4330]
b3:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] RV710/730 HDMI Audio [Radeon HD 4000 series]
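
In the listing above, the four Measurement Computing PCIe-DAS1602/16 boards appear at 03:00.0, 07:00.0, 18:00.0 and 66:00.0, each listed directly after a PLX PEX8112 PCI Express-to-PCI bridge. A long listing like this is easier to compare against another machine when it is reduced to the devices of interest. The following is a minimal sketch of that, assuming plain lspci output without a PCI domain prefix; parse_lspci and find are hypothetical helper names, not part of any tool.

```python
import re

# bus:device.function, then the class name, then the device description
LINE_RE = re.compile(r"^([0-9a-f]{2}):([0-9a-f]{2})\.([0-7]) ([^:]+): (.+)$")

def parse_lspci(text):
    """Turn plain `lspci` output into a list of dicts, skipping non-device lines."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            bus, dev, fn, cls, desc = m.groups()
            devices.append({
                "bus": bus,
                "slot": f"{bus}:{dev}.{fn}",
                "class": cls,
                "desc": desc,
            })
    return devices

def find(devices, needle):
    """Return the devices whose description contains `needle`."""
    return [d for d in devices if needle in d["desc"]]
```

For example, find(parse_lspci(listing), "PCIe-DAS1602/16") returns only the DAQ boards, and find(parse_lspci(listing), "Express-to-PCI Bridge") returns the bridges, so the same two calls can be run on a listing from a second tower for comparison.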

Output of uname -r:

username@TOWER:~$ uname -r
4.9.0-040900-generic

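Things I have tried:

- Using different graphics cards. I tried several NVIDIA cards (NVS 310 and 315, Quadro K620) to no avail. I then switched to an AMD card because the NVIDIA and Xorg drivers were suspected to be the problem. Changing the graphics card turned out to make no difference.

- Using different graphics drivers. I tried both the NVIDIA and the Xorg (nouveau) drivers, but the system's behavior did not change.

- Using different kernels. I tried the following kernels without success: 4.9.0-040900-generic, 4.10.0-28-generic, 4.13.0-43-generic, 4.6.2. Interestingly, with the 4.6.2 kernel the system still responds to the keyboard when it freezes, but killing the process from the command line does not recover it.

- Asking Dell about the problem. Nobody at Dell understood what I was describing.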

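I have a Dell Precision 5810 running Ubuntu 16.04 with the 4.6.2 kernel that can run all 4 of the same DAQ boards at once without any problem. That computer was sold to a customer, so my access to it is limited, but I know it has an NVIDIA K620 graphics card using the Xorg driver and an Intel Xeon E5-1620 ×8 CPU. I may have a printout of lspci from that tower somewhere.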

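Can you explain why my system freezes when I use the 4th PCIe device, and why, after the initial freeze, it starts freezing when I use any PCIe device? Any input would be very helpful. I have been dealing with this problem for about 4 months.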

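Edit 1

I tried swapping the DAQ boards to see whether the problem when starting the 4th board was tied to a particular board, but the problem persisted exactly as originally described. I did not mention this before, but the 4 DAQ boards are identical.

Edit 2

I tried switching to a tty (Ctrl+Alt+F1) while the tower was frozen to see whether I could run some commands in that console, but the tty got stuck in a login loop: it asks for my login ID, but after I submit the login name it never goes on to ask for the password.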
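
Since the console is not usable during the freeze, one alternative is to look at the kernel log after the next reboot. The sketch below filters log lines for a handful of kernel messages that commonly accompany interrupt or lockup problems. The pattern list is an assumption about what might be relevant here, not a claim about what this system logs, and suspect_lines is a made-up helper name.

```python
import re

# Kernel message fragments that may point at interrupt or lockup trouble.
# This selection is a guess at what could be relevant, not an exhaustive list.
PATTERNS = [
    r"nobody cared",                       # unhandled interrupt on an IRQ line
    r"Disabling IRQ",                      # kernel gave up on that IRQ line
    r"soft lockup",                        # a CPU stuck in kernel code
    r"blocked for more than \d+ seconds",  # hung task warning
    r"\bAER\b",                            # PCIe Advanced Error Reporting
]
SUSPECT_RE = re.compile("|".join(PATTERNS))

def suspect_lines(lines):
    """Return the log lines that match any of the patterns above."""
    return [ln.rstrip("\n") for ln in lines if SUSPECT_RE.search(ln)]
```

Assuming kernel messages are written to /var/log/kern.log (the usual rsyslog location on Ubuntu), the function can be applied with suspect_lines(open("/var/log/kern.log")). An empty result does not rule anything out, since a hard power-off can lose the last messages before they reach disk.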
