Only 4 of 6 GPUs use the NVIDIA kernel driver

I have 6 GPUs connected to one machine through PCI slots. All of them are detected:

$ lspci -v | grep 'VGA'
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
02:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
04:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
05:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
07:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
08:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])

However, the NVIDIA kernel driver is in use on only 4 of them:

$ lspci -v | grep -A 10 'VGA' | grep 'Kernel driver in use:'
Kernel driver in use: nvidia
Kernel driver in use: nvidia
Kernel driver in use: nvidia
Kernel driver in use: nvidia
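Because `grep -A 10` drops the slot address, it does not say which cards are unbound. A minimal sketch, assuming bash and awk (the function name `vga_drivers` is hypothetical), that reads `lspci -k` output and prints each VGA controller's slot next to its bound driver, or `none` when there is no "Kernel driver in use:" line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pair each VGA controller with the driver bound to it.
# Input: `lspci -k` (or `lspci -v`) output on stdin.
# Output: one "<slot> <driver>" line per VGA controller; "none" if unbound.
vga_drivers() {
  awk '
    # A line starting with a PCI address (e.g. "07:00.0") begins a new device.
    /^[0-9a-f]+:[0-9a-f]+\.[0-9a-f]/ {
      if (slot != "") print slot, drv   # flush the previous VGA device
      slot = ""; drv = ""
      if ($0 ~ /VGA compatible controller/) { slot = $1; drv = "none" }
      next
    }
    # Detail line naming the driver currently bound to this device.
    slot != "" && /Kernel driver in use:/ { drv = $NF }
    END { if (slot != "") print slot, drv }
  '
}

# Usage: lspci -k | vga_drivers
```

Unbound controllers then stand out directly as `<slot> none` lines.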

The other two don't even list a kernel driver:

$ lspci -v | ...
07:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GP104 [GeForce GTX 1070]
Flags: fast devsel, IRQ 18
Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
Memory at <ignored> (64-bit, prefetchable)
Memory at <ignored> (64-bit, prefetchable)
I/O ports at a000 [size=128]
Expansion ROM at d7000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

For comparison, here is the full listing for a GPU that does have a kernel driver:

05:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GP104 [GeForce GTX 1070]
Flags: bus master, fast devsel, latency 0, IRQ 325
Memory at d8000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at b8000000 (64-bit, prefetchable) [size=32M]
I/O ports at b000 [size=128]
[virtual] Expansion ROM at d9000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

The 5th and 6th GPUs are also not recognized by nvidia-settings.

How can I get the 5th and 6th GPUs to use the NVIDIA kernel driver?

Setup: Ubuntu 17.04, NVIDIA driver version 375.66.

Answer 1

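A BIOS update solved this for me. It looks like the system had a problem allocating address space. Check the NVIDIA messages in the dmesg output to see whether any problems were reported at boot.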
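One way to narrow down the dmesg output is a minimal sketch, assuming bash and GNU grep (the function name `gpu_boot_issues` is hypothetical, and the patterns are assumptions rather than an exhaustive list). It keeps only NVIDIA driver lines (prefixed `NVRM:`) and kernel PCI messages about BARs that could not be assigned address space:

```shell
#!/usr/bin/env bash
# Hypothetical filter for boot-time GPU problems in the kernel log.
# Keeps NVIDIA driver messages ("NVRM:") and PCI BAR assignment failures
# ("BAR <n>: no space for ..." / "BAR <n>: failed to assign ...").
# The patterns are assumptions; extend them if your kernel words things differently.
gpu_boot_issues() {
  grep -E -i 'NVRM:|BAR [0-9]+: (no space|failed to assign)'
}

# Usage (reading the kernel ring buffer may require root):
#   sudo dmesg | gpu_boot_issues
```

BAR failures reported for the slots of the unbound cards would be consistent with the "Memory at <ignored>" lines in the lspci output above.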