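I have two GTX 1080 Ti cards, both Founders Edition, installed in an Ubuntu 18.04 machine. I mainly use them to train neural networks.
At the moment I have two main problems:
1. Setting Coolbits (even with --enable-all-gpus) lets me set the fan speed and clocks only for the GPU that is connected to the monitor.
2. I don't want to set the fan speed statically. Instead, I'd like a dynamic profile that maps temperature to fan speed (%). Note that in automatic mode, under load, one of the 1080 Ti cards typically reaches 89-90 °C, despite throttling and despite a spacious case... (the other 1080 Ti runs cooler... I guess not all GPUs are created equal).
Information about my configuration: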
inxi -b
System: Host: nimrod Kernel: 4.15.0-46-generic x86_64 bits: 64
Desktop: Xfce 4.12.3 Distro: Ubuntu 18.04.2 LTS
Machine: Device: desktop Mobo: FUJITSU model: D3128-B2 v: S26361-D3128-B2 serial: N/A
UEFI: FUJITSU // American Megatrends v: V4.6.5.4 R1.8.0 for D3128-B2x date: 06/28/2018
CPU: 10 core Intel Xeon E5-2680 v2 (-MT-MCP-) speed/max: 2269/3600 MHz
Graphics: Card-1: Advanced Micro Devices [AMD/ATI] Park [Mobility Radeon HD 5430]
Card-2: NVIDIA GP102 [GeForce GTX 1080 Ti]
Card-3: NVIDIA GP102 [GeForce GTX 1080 Ti]
Display Server: x11 (X.Org 1.19.6 )
drivers: modesetting,nvidia,ati,radeon,nouveau (unloaded: fbdev,vesa)
Resolution: [email protected]
OpenGL: renderer: GeForce GTX 1080 Ti/PCIe/SSE2
version: 4.6.0 NVIDIA 415.27
Network: Card: Intel 82579LM Gigabit Network Connection (Lewisville)
driver: e1000e
Drives: HDD Total Size: 2262.5GB (9.5% used)
Info: Processes: 413 Uptime: 10 min Memory: 3677.2/96560.4MB
Client: Shell (bash) inxi: 2.3.56
Nvidia-smi:
Mon Mar 25 04:19:30 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27 Driver Version: 415.27 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 23% 39C P8 10W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:04:00.0 On | N/A |
| 31% 57C P0 69W / 250W | 204MiB / 11176MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 1465 G /usr/lib/xorg/Xorg 201MiB |
+-----------------------------------------------------------------------------+
And finally, my xorg.conf:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 415.27
Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0"
Screen 1 "Screen1" RightOf "Screen0"
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
Section "InputDevice"
# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/psaux"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
# generated from default
Identifier "Keyboard0"
Driver "kbd"
EndSection
Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection
Section "Monitor"
Identifier "Monitor1"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1080 Ti"
BusID "PCI:3:0:0"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1080 Ti"
BusID "PCI:4:0:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "31"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen1"
Device "Device1"
Monitor "Monitor1"
DefaultDepth 24
Option "AllowEmptyInitialConfiguration" "True"
Option "Coolbits" "31"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Note that Coolbits is set for both of them.
Can you help me?
Thanks! :)
Answer 1
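I ran into the same thing last week. It is a driver problem. Try driver version 390 or 430; I have confirmed that both work on Arch with two 1080 Ti cards.
The problem was hard to pin down. At first I thought my motherboard didn't support SLI, so I swapped in a different board and enabled SLI, and then I could adjust the fan speed on both cards. But with SLI enabled the two cards mirror each other's memory, which forces a smaller batch size, and that was unacceptable. So I turned SLI off, and once again I couldn't adjust the fans on both cards. Then I switched NVIDIA driver versions, and everything worked. Damn NVIDIA: while swapping boards I damaged the LGA socket on the first one, and the damaged socket then burned out an i5-9400F. I know I was careless, but if the NVIDIA driver hadn't been buggy I wouldn't have gone through all this. (Just a rant.)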
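Once fan control works on both cards, the temperature-dependent profile asked for in the question could be approximated with a small polling script. This is only a minimal sketch, not a tested tool. The curve points in `FAN_CURVE` are made-up example values, the helper names are hypothetical, and it assumes one fan per GPU, that `nvidia-smi` and `nvidia-settings` number the GPUs in the same order, and that it runs inside the X session (so `DISPLAY` is set) with Coolbits enabled.

```python
import subprocess
import time

# Hypothetical curve: (temperature in °C, fan speed in %), sorted by temperature.
FAN_CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]


def fan_speed_for(temp_c, curve=FAN_CURVE):
    """Linearly interpolate the fan speed (%) for a given temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return round(s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0))


def read_temperatures():
    """Return one temperature (°C) per GPU, in nvidia-smi order."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return [int(value) for value in result.stdout.split()]


def fan_command(gpu_index, speed):
    """Build the nvidia-settings call for one GPU.

    Assumption: fan N belongs to GPU N (one fan per card).
    """
    return [
        "nvidia-settings",
        "-a", "[gpu:{}]/GPUFanControlState=1".format(gpu_index),
        "-a", "[fan:{}]/GPUTargetFanSpeed={}".format(gpu_index, speed),
    ]


def main(interval_s=5):
    """Poll temperatures and update every fan until interrupted."""
    while True:
        for gpu_index, temp in enumerate(read_temperatures()):
            subprocess.run(fan_command(gpu_index, fan_speed_for(temp)), check=True)
        time.sleep(interval_s)

# Call main() to start the loop (needs a running X server and DISPLAY set).
```

Setting `GPUFanControlState` back to 0 should return the card to automatic fan control, which is worth doing when the script exits.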