I built a system with a GeForce GTX 960 and a Quadro M4000, and I usually pass the Quadro through to a virtual machine. The GTX 960 is for host use only.
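Normally the host cannot use the Quadro card, because the vfio-pci kernel driver prevents it. However, when I am not using it in the virtual machine, I would like to access it from the host, e.g. to run some computations.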
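But there is this very strange behavior of the power consumption and fan speed... How can I lower the power consumption and fan speed without having to keep nvidia-settings open all the time?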
From my notes:
Reusing a passthrough-ready device on the host
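Suppose a secondary graphics card that has been prepared for passthrough to a guest should be used on the host instead. Normally the device cannot be used on the host because the wrong driver is loaded. Here the Quadro M4000 is bound to the vfio-pci driver, but it should use the nvidia driver.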
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
# Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
# Kernel driver in use: nvidia
# Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
# Subsystem: Hewlett-Packard Company Device [103c:1153]
# Kernel driver in use: vfio-pci
# Kernel modules: nouveau, nvidia_drm, nvidia
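Instead of grepping the lspci output, the driver bound to a device can also be read from its "driver" symlink in sysfs. The following is a minimal sketch, not part of the original notes: the function name bound_driver and the SYSFS_ROOT override (there only so the function can be tried against a fake directory tree) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes.
# Prints the kernel driver bound to a PCI device (e.g. 0000:0c:00.0),
# or "none" when no driver is bound. SYSFS_ROOT defaults to /sys and is
# only overridable so the function can be exercised against a fake tree.
bound_driver() {
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev/driver" ]; then
        # The "driver" symlink points at .../bus/pci/drivers/<name>.
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}
```

Going by the lspci output in these notes, bound_driver 0000:0c:00.0 would be expected to print vfio-pci at this point, none after vfio-pci is unloaded, and nvidia after the rescan. Reading the symlink does not require root.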
Unload the vfio-pci driver and check the device status again. No kernel driver should be in use anymore, so the line "Kernel driver in use: ..." disappears.
sudo modprobe -r vfio-pci
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
# Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
# Kernel driver in use: nvidia
# Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
# Subsystem: Hewlett-Packard Company Device [103c:1153]
# Kernel modules: nouveau, nvidia_drm, nvidia
# 0c:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
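Also check the output of nvidia-smi, the tool that comes with the nvidia driver. It should list only one graphics card (the GTX 960, which is not passed through).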
sudo nvidia-smi
# Tue Sep 28 18:19:36 2021
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# | | | MIG M. |
# |===============================+======================+======================|
# | 0 NVIDIA GeForce ... Off | 00000000:0B:00.0 On | N/A |
# | 0% 51C P8 19W / 160W | 477MiB / 4040MiB | 0% Default |
# | | | N/A |
# +-------------------------------+----------------------+----------------------+
# ...
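To check this without reading the table, nvidia-smi can print just the bus IDs of the GPUs it manages (--query-gpu=pci.bus_id --format=csv,noheader), which a small filter can then search. A minimal sketch; the function name gpu_listed is hypothetical and not from the original notes.

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes.
# Reads PCI bus IDs (one per line, as printed by
#   nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader)
# from stdin and succeeds only if the given bus ID is among them.
# Spaces and carriage returns are stripped; the match ignores case.
gpu_listed() {
    tr -d ' \r' | grep -qixF "$1"
}
```

For example, nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader | gpu_listed 00000000:0C:00.0 should fail at this point and succeed once the Quadro is bound to the nvidia driver.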
Remove all associated PCI devices from the system; in this case they are 0c:00.0 and 0c:00.1. Then check that they are really gone.
echo 1 | sudo tee /sys/bus/pci/devices/0000\:0c\:00.0/remove
echo 1 | sudo tee /sys/bus/pci/devices/0000\:0c\:00.1/remove
sudo ls /sys/bus/pci/devices/ | grep 0c:00.
# nothing...
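When a card exposes several functions (here the GPU at 0c:00.0 and its audio controller at 0c:00.1), the removal can be done in one loop over the slot. A minimal sketch, assuming the same sysfs "remove" files used above; the function name remove_slot and the SYSFS_ROOT override (for testing against a fake tree) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes.
# Removes every function of a PCI slot (e.g. 0000:0c:00 -> 00.0 and 00.1)
# by writing 1 to each function's "remove" file. On a real system this
# must run as root. Returns non-zero if no function of the slot was found.
remove_slot() {
    found=0
    for dev in "${SYSFS_ROOT:-/sys}"/bus/pci/devices/"$1".*; do
        # Skips an unmatched glob or a function that is already gone.
        [ -e "$dev/remove" ] || continue
        echo 1 > "$dev/remove"
        found=1
    done
    [ "$found" -eq 1 ]
}
```

Because sudo does not see shell functions, one way to use it is from a root shell (sudo -s), after sourcing a file containing the function (e.g. a hypothetical pci-helpers.sh), with remove_slot 0000:0c:00.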
Then trigger a rescan for PCI devices and check that the devices are present and enabled again. Also check which kernel driver is in use and what nvidia-smi reports.
echo 1 | sudo tee /sys/bus/pci/rescan
sudo ls /sys/bus/pci/devices/ | grep 0c:00.
sudo cat /sys/bus/pci/devices/0000\:0c\:00.?/enable
# 1
# 1
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
# Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
# Kernel driver in use: nvidia
# Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
# Subsystem: Hewlett-Packard Company Device [103c:1153]
# Kernel driver in use: nvidia # <-- here!
# Kernel modules: nouveau, nvidia_drm, nvidia
sudo nvidia-smi
# Tue Sep 28 18:26:16 2021
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# | | | MIG M. |
# |===============================+======================+======================|
# | 0 NVIDIA GeForce ... Off | 00000000:0B:00.0 On | N/A |
# | 0% 47C P8 19W / 160W | 479MiB / 4040MiB | 0% Default |
# | | | N/A |
# +-------------------------------+----------------------+----------------------+
# | 1 Quadro M4000 Off | 00000000:0C:00.0 Off | N/A |
# | 45% 37C P0 42W / 120W | 0MiB / 8127MiB | 2% Default |
# | | | N/A |
# +-------------------------------+----------------------+----------------------+
# ...
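The rescan and the "enable" check performed above can be combined into one function that fails if any function of the slot did not come back enabled. A minimal sketch under the same assumptions as the removal step (root on a real system, sysfs paths as used in these notes); the function name rescan_and_check and the SYSFS_ROOT override are hypothetical. It does not check the bound driver; that still needs lspci or nvidia-smi as shown.

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes.
# Triggers a PCI bus rescan, then checks that every function of the given
# slot exists again and that its "enable" file reads 1. Must run as root
# on a real system. Returns non-zero if any check fails.
rescan_and_check() {
    root="${SYSFS_ROOT:-/sys}"
    echo 1 > "$root/bus/pci/rescan" || return 1
    for dev in "$root"/bus/pci/devices/"$1".*; do
        # An unmatched glob leaves the literal pattern, which has no "enable".
        [ -e "$dev/enable" ] || return 1
        [ "$(cat "$dev/enable")" = 1 ] || return 1
    done
}
```

Usage, from a root shell after sourcing the function: rescan_and_check 0000:0c:00 && echo "slot is back".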
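Interestingly, the Quadro M4000 draws about 42 W while completely idle, staying in performance state P0 with the fan at 45%. I guess this is due to a driver issue...
However, as soon as a graphical program such as nvidia-settings is started, the power draw drops to about 12 W.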
# Terminal A
watch -d -n 1 sudo nvidia-smi
# Terminal B
nvidia-settings
Watch nvidia-smi and listen to the fan noise while the magic happens...
watch -d -n 1 sudo nvidia-smi
# ...
# +-------------------------------+----------------------+----------------------+
# | 1 Quadro M4000 Off | 00000000:0C:00.0 Off | N/A |
# | 46% 38C P0 10W / 120W | 0MiB / 8127MiB | 0% Default |
# | | | N/A |
# +-------------------------------+----------------------+----------------------+
# ...
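As a more compact alternative to watching the full nvidia-smi table, the query interface can print just the relevant columns (nvidia-smi --query-gpu=index,pstate,power.draw,fan.speed --format=csv,noheader,nounits), and a small awk filter can flag GPUs sitting in P0 above some power level. A minimal sketch; the function name flag_p0 and its 20 W default threshold are arbitrary, hypothetical choices, not from the original notes.

```shell
#!/bin/sh
# Hypothetical helper, not from the original notes.
# Reads lines produced by
#   nvidia-smi --query-gpu=index,pstate,power.draw,fan.speed --format=csv,noheader,nounits
# and reports every GPU that is in P0 while drawing more than LIMIT watts
# (first argument; the 20 W default is an arbitrary assumption).
flag_p0() {
    awk -F', *' -v limit="${1:-20}" '
        $2 == "P0" && $3 + 0 > limit + 0 {
            printf "GPU %s: P0 at %s W, fan %s %%\n", $1, $3, $4
        }'
}
```

Going by the readings above, piping that nvidia-smi query into flag_p0 20 would be expected to flag the Quadro before nvidia-settings is started (P0 at about 42 W) and to print nothing afterwards (P0 at about 10 W).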