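I am currently running a dual-GPU setup with a GTX 1080 Ti and a Tesla K80, but I cannot start the X server. My plan is to use the 1080 Ti as the display GPU and the Tesla K80 for CUDA computation. Here is the output of lspci: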
00:00.0 Host bridge: Intel Corporation Device 9b33 (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:14.0 USB controller: Intel Corporation Device 43ed (rev 11)
00:14.2 RAM memory: Intel Corporation Device 43ef (rev 11)
00:14.3 Network controller: Intel Corporation Device 43f0 (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Device 43e8 (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Device 43e9 (rev 11)
00:16.0 Communication controller: Intel Corporation Device 43e0 (rev 11)
00:17.0 SATA controller: Intel Corporation Device 43d2 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 43c0 (rev 11)
00:1b.4 PCI bridge: Intel Corporation Device 43c4 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 43b8 (rev 11)
00:1c.7 PCI bridge: Intel Corporation Device 43bf (rev 11)
00:1d.0 PCI bridge: Intel Corporation Device 43b0 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Device 4385 (rev 11)
00:1f.3 Audio device: Intel Corporation Device f0c8 (rev 11)
00:1f.4 SMBus: Intel Corporation Device 43a3 (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device 43a4 (rev 11)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ff)
02:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ff)
02:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ff)
03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev ff)
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev ff)
05:00.0 Non-Volatile memory controller: Sandisk Corp Device 5006
06:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
08:00.0 Ethernet controller: Intel Corporation Device 15f3 (rev 03)
09:00.0 Non-Volatile memory controller: Sandisk Corp Device 5011 (rev 01)
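My current concern is that my NVIDIA driver supports only GTX-series devices. Is it possible to use the display driver on the GTX card alone and start the X server that way?
The NVIDIA driver itself seems to be working, since nvidia-smi finds both cards. I also noticed in the lspci output that the K80 is identified as a "3D controller". Could that be the problem?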
Answer 1
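OK, I found a solution to my problem. In short, the X server failed to start because its automatic configuration could not find the correct display device, i.e. the GPU that drives the monitor. This can be fixed by setting the device manually in the configuration file. Specifically:
- Run
sudo nvidia-xconfig
This generates the configuration file /etc/X11/xorg.conf.
- Next, run
lspci
and find the BusID of the GPU that the monitor is connected to. In my case it is this line: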
06:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
The BusID is 6:0:0.
- Next, open xorg.conf and find the following:
Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:6:0:0" # make sure the ID matches the VGA device ID
EndSection
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
and make sure that the BusID entry in the Device section matches your GPU's BusID from step 2.
- Reboot.
Now you are good to go ;).
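
One detail worth checking in step 2: lspci prints the bus, device and function numbers in hexadecimal, while the BusID in xorg.conf is, as far as I know, read as decimal. For 06:00.0 the two agree, but a card at 0a:00.0 would need PCI:10:0:0. Below is a minimal Python sketch of that conversion. The function names lspci_to_xorg_busid and find_vga_busids are made up for illustration and are not part of any NVIDIA or Xorg tool, and the sketch assumes the plain lspci line format shown above.

```python
import re

# Matches an lspci address such as "06:00.0", optionally prefixed with a
# PCI domain ("0000:06:00.0"). The domain is ignored in this sketch.
_ADDRESS = re.compile(
    r"(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def lspci_to_xorg_busid(address):
    """Convert an lspci address (hexadecimal) to an xorg.conf BusID (decimal)."""
    match = _ADDRESS.fullmatch(address.strip())
    if match is None:
        raise ValueError(f"unrecognized lspci address: {address!r}")
    bus, device, function = (int(part, 16) for part in match.groups())
    return f"PCI:{bus}:{device}:{function}"


def find_vga_busids(lspci_output):
    """Return the BusID of every 'VGA compatible controller' line in lspci output."""
    busids = []
    for line in lspci_output.splitlines():
        # The address is the first whitespace-separated field of each line.
        address, _, description = line.strip().partition(" ")
        if description.startswith("VGA compatible controller"):
            busids.append(lspci_to_xorg_busid(address))
    return busids


if __name__ == "__main__":
    sample = (
        "04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev ff)\n"
        "06:00.0 VGA compatible controller: NVIDIA Corporation GP102 "
        "[GeForce GTX 1080 Ti] (rev a1)\n"
        "06:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)\n"
    )
    for busid in find_vga_busids(sample):
        print(busid)
```

Filtering on "VGA compatible controller" picks out the 1080 Ti and skips the K80, which lspci lists as a "3D controller". The resulting string is what goes into the BusID line of the Device section.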