How to adjust NVIDIA GPU fan speed on a headless node?

Answer 1

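The following is a simple method that requires no scripting, fake monitors, or fiddling, and can be run over SSH to control the fans of multiple NVIDIA GPUs. It has been tested on Arch Linux.

Create xorg.conf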

sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7

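This creates `/etc/X11/xorg.conf` with an entry for each GPU, similar to the manual method.

Note: Some distributions (Fedora, CentOS, Manjaro) ship additional config files (e.g. in `/etc/X11/xorg.conf.d/` or `/usr/share/X11/xorg.conf.d/`) that override xorg.conf and set `AllowNVIDIAGPUScreens`. This option is not compatible with this guide. The additional config files should be modified or removed. The X11 log file shows which config files have been loaded.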
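As a rough way to locate the extra config files mentioned in the note, the sketch below searches the two directories named above for the `AllowNVIDIAGPUScreens` option. The function name `find_conflicting_confs` is made up for illustration, and other distributions may keep such files elsewhere.

```shell
# Hypothetical helper: list config files that mention AllowNVIDIAGPUScreens.
# Arguments are the directories to search; missing directories are skipped.
find_conflicting_confs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -r: recurse into the directory, -l: print only matching file names
        grep -rl 'AllowNVIDIAGPUScreens' "$dir" 2>/dev/null
    done
    return 0
}

# Prints nothing if no file in these directories sets the option.
find_conflicting_confs /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d
```

Any file it prints is a candidate for editing or removal before starting X.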

Alternative: create xorg.conf manually

Identify your cards' PCI IDs:

nvidia-xconfig --query-gpu-info

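Find the `PCI BusID` fields. Note that these are not the same as the bus IDs reported by the kernel.

Alternatively, run `sudo startx`, open `/var/log/Xorg.0.log` (or whatever location startx lists in its output under the "Log file:" line), and look for the line `NVIDIA(0): Valid display device(s) on GPU-<GPU number> at PCI:<PCI ID>`.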
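One reason the IDs differ from what the kernel shows is that tools such as `lspci` print bus, device, and function in hexadecimal, while the `BusID` in xorg.conf is written in decimal. A minimal sketch of the conversion, assuming an lspci-style address such as `0a:00.0` (the function name `to_xorg_busid` is hypothetical):

```shell
# Hypothetical helper: convert an lspci-style address (hex, "bus:dev.func",
# optionally prefixed with a 4-digit PCI domain such as "0000:") into the
# decimal "PCI:bus:dev:func" form used by the BusID option in xorg.conf.
to_xorg_busid() {
    addr=${1#????:}          # drop a leading 4-digit domain if present
    bus=${addr%%:*}          # text before the first colon
    rest=${addr#*:}          # "dev.func"
    dev=${rest%%.*}
    func=${rest#*.}
    # printf parses the 0x-prefixed values as hex and prints them in decimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$func"
}

to_xorg_busid 0a:00.0    # prints PCI:10:0:0
```

The values reported by `nvidia-xconfig --query-gpu-info` are already in the xorg.conf form and need no conversion.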

Edit `/etc/X11/xorg.conf`

Here is an example xorg.conf for a three-GPU machine:

Section "ServerLayout"
        Identifier "dual"
        Screen 0 "Screen0"
        Screen 1 "Screen1" RightOf "Screen0"
        Screen 2 "Screen2" RightOf "Screen1"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:5:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:6:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:9:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
EndSection

Section "Screen"
        Identifier     "Screen1"
        Device         "Device1"
EndSection

Section "Screen"
        Identifier     "Screen2"
        Device         "Device2"
EndSection

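Each `BusID` must match one of the bus IDs identified in the previous step. The option `AllowEmptyInitialConfiguration` allows X to start even if no monitor is connected. The option `Coolbits` allows the fans to be controlled. It can also allow overclocking.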
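For machines with a different number of GPUs, writing the sections by hand gets repetitive. The sketch below prints a config with the same layout as the example for any list of bus IDs. The function name `gen_xorg_conf` and the layout identifier `headless` are made up for illustration; review the output before copying it to `/etc/X11/xorg.conf`.

```shell
# Hypothetical helper: print a headless xorg.conf for the given BusIDs.
# Usage: gen_xorg_conf PCI:5:0:0 PCI:6:0:0 PCI:9:0:0 > xorg.conf.new
gen_xorg_conf() {
    # ServerLayout: one Screen line per GPU, numbered from 0.
    echo 'Section "ServerLayout"'
    echo '    Identifier "headless"'
    i=0
    for busid in "$@"; do
        if [ "$i" -eq 0 ]; then
            echo "    Screen $i \"Screen$i\""
        else
            echo "    Screen $i \"Screen$i\" RightOf \"Screen$((i - 1))\""
        fi
        i=$((i + 1))
    done
    echo 'EndSection'

    # One Device section and one Screen section per GPU.
    i=0
    for busid in "$@"; do
        cat <<EOF

Section "Device"
    Identifier     "Device$i"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "$busid"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Screen"
    Identifier     "Screen$i"
    Device         "Device$i"
EndSection
EOF
        i=$((i + 1))
    done
}

gen_xorg_conf PCI:5:0:0 PCI:6:0:0 PCI:9:0:0
```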

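Note: Some distributions (Fedora, CentOS, Manjaro) ship additional config files (e.g. in `/etc/X11/xorg.conf.d/` or `/usr/share/X11/xorg.conf.d/`) that override xorg.conf and set `AllowNVIDIAGPUScreens`. This option is not compatible with this guide. The additional config files should be modified or removed. The X11 log file shows which config files have been loaded.

Edit `/root/.xinitrc`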

nvidia-settings -q fans
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=75
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:1]/GPUTargetFanSpeed=75
nvidia-settings -a [gpu:2]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=75

I use .xinitrc to run nvidia-settings for convenience, although there may be other ways. The first line prints every GPU fan in the system. Here, I set the fans to 75%.
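With more GPUs, the per-GPU lines can be generated rather than typed. The sketch below prints one nvidia-settings command per GPU for review. It assumes, like the lines above, that fan N belongs to GPU N, which does not hold for cards with several fans; the function name `fan_commands` is hypothetical.

```shell
# Hypothetical helper: print one nvidia-settings command per GPU.
# Assumes fan N belongs to GPU N, as in the .xinitrc lines above.
# Usage: fan_commands <number of GPUs> <speed in percent>
fan_commands() {
    i=0
    while [ "$i" -lt "$1" ]; do
        # Single quotes keep the shell from treating [gpu:N] as a glob pattern.
        echo "nvidia-settings -a '[gpu:$i]/GPUFanControlState=1' -a '[fan:$i]/GPUTargetFanSpeed=$2'"
        i=$((i + 1))
    done
}

# Print the commands for three GPUs at 75%; review them, then add them to .xinitrc.
fan_commands 3 75
```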

Launch X

sudo startx -- :0

You can run this command over SSH. The output will be:

Current version of pixman: 0.34.0
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat May 27 02:22:08 2017
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"

  Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.


  Attribute 'GPUFanControlState' (pushistik:0[gpu:1]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:1]) assigned value 75.


  Attribute 'GPUFanControlState' (pushistik:0[gpu:2]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:2]) assigned value 75.

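Monitor temperatures and clock speeds

`nvidia-smi` and `nvtop` can be used to observe temperatures and power draw. Lower temperatures allow the card to clock higher, which increases its power draw. You can use `sudo nvidia-smi -pl 150` to limit power draw and keep the cards cool, or `sudo nvidia-smi -pl 300` to let them overclock. My 1080 Ti runs at 1480 MHz when given 150 W and at over 1800 MHz when given 300 W, but this depends on the workload. You can monitor the clock speeds with `nvidia-smi -q` or, more specifically, `watch 'nvidia-smi -q | grep -E "Utilization| Graphics|Power Draw"'`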
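For scripted checks, the CSV query mode of `nvidia-smi` is easier to parse than the `-q` report. The sketch below reads such output and prints a line for every GPU at or above a temperature limit. The function name `warn_hot_gpus` and the limit of 80 are illustrative choices, not values from this answer.

```shell
# Hypothetical helper: read "index, temperature, clock, power" CSV lines on
# stdin and print a warning for every GPU at or above the given temperature.
warn_hot_gpus() {
    limit=$1
    while IFS=', ' read -r idx temp clock power; do
        [ -n "$temp" ] || continue          # skip empty or malformed lines
        if [ "$temp" -ge "$limit" ]; then
            echo "GPU $idx: $temp C at $clock MHz, $power W"
        fi
    done
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,temperature.gpu,clocks.gr,power.draw \
               --format=csv,noheader,nounits | warn_hot_gpus 80
fi
```

Wrapping the last command in `watch` gives a rolling view similar to the `grep` pipeline above.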

Returning to automatic fan management

Reboot. I haven't found another way to return the fans to automatic control.

Answer 2

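I wrote a pip-installable Python script that does something similar to what @AlexsandrDubinsky suggested.

When you run fans.py, it sets up a temporary X server for each GPU with a fake display attached. It then loops over the GPUs every few seconds and sets each fan speed according to the GPU's temperature. When the script terminates, it returns control of the fans to the driver and cleans up the X servers.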
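The source of fans.py is not shown here, so the following is not its implementation. It is only a sketch of what setting the fan speed according to temperature can look like: a linear curve between two made-up points (30% at 40 C, 100% at 80 C). The function name `fan_speed_for_temp` and all thresholds are assumptions.

```shell
# Hypothetical fan curve: 30% at or below 40 C, 100% at or above 80 C,
# linear in between. Integer arithmetic only, so results round down.
fan_speed_for_temp() {
    temp=$1
    min_temp=40; max_temp=80
    min_speed=30; max_speed=100
    if [ "$temp" -le "$min_temp" ]; then
        echo "$min_speed"
    elif [ "$temp" -ge "$max_temp" ]; then
        echo "$max_speed"
    else
        echo $(( min_speed + (temp - min_temp) * (max_speed - min_speed) / (max_temp - min_temp) ))
    fi
}

fan_speed_for_temp 60    # prints 65
```

The result could be passed as the `GPUTargetFanSpeed` value in an nvidia-settings call inside a polling loop.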

Answer 3

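Based on the answers to this question and to similar StackExchange questions, I wrote a shell script that sets the fan speed to 100 (or whatever value you want) for all of the fans on all of the GPUs in your machine.

The script assumes that your machine has X11 installed, but that you are not using it to provide a GUI to users.

/bin/set-gpu-fan-speed.sh: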

#!/bin/bash
set -Eeuxo pipefail

# Kill any existing X servers.
killall Xorg || true
sleep 5

# Create a NVIDIA-friendly Xorg config.
nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration --enable-all-gpus

# Start a new X server for nvidia-settings to use.
export XDG_SESSION_TYPE=x11
export DISPLAY=:0
startx -- $DISPLAY &
sleep 5

# Determine the number of GPUs and fans on this machine.
NUM_GPUS=$(nvidia-settings -q gpus | grep -c 'gpu:')
NUM_FANS=$(nvidia-settings -q fans | grep -c 'fan:')

# For each GPU, enable fan control.
for ((i=0; i < NUM_GPUS; i++))
do
    nvidia-settings --verbose=all -a "[gpu:$i]/GPUFanControlState=1"
done

# For each fan, set fan speed to 100%.
for ((i=0; i < NUM_FANS; i++))
do
    nvidia-settings --verbose=all -a "[fan:$i]/GPUTargetFanSpeed=100"
done

# Kill the X server that we started.
killall Xorg || true
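The fixed `sleep 5` after `startx` in the script above may be too short on a slow boot and longer than needed otherwise. One possible alternative, sketched here with a hypothetical helper named `wait_for`, is to poll until a probe command succeeds.

```shell
# Hypothetical helper: run a probe command once per second until it succeeds
# or the given number of attempts is used up.
# Usage: wait_for <attempts> <command> [args...]
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt will follow.
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}

# Demonstration with a probe that always succeeds.
wait_for 3 true && echo "ready"
```

In the script, `wait_for 30 nvidia-settings -q gpus || exit 1` could stand in for the `sleep 5` that follows `startx`; the limit of 30 attempts is an arbitrary choice.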

These fan speed changes do not persist across reboots, so I wrote a systemd unit file to run the above script on every boot.

/etc/systemd/system/set-gpu-fan-speed.service:

[Unit]
Description=Sets the GPU fan speed

[Service]
Type=oneshot
User=root
ExecStart=/bin/set-gpu-fan-speed.sh

[Install]
WantedBy=multi-user.target

After creating the above files, run the following commands as root so that the script runs on every boot.

systemctl enable set-gpu-fan-speed.service
systemctl start set-gpu-fan-speed.service
