KVM GPU passthrough: group 15 is not viable. Please ensure all devices within the iommu_group are bound to their vfio bus driver.

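I'm following https://mathiashueber.com/windows-virtual-machine-gpu-passthrough-ubuntu/. However, there is one thing I did not follow: I kept nouveau instead of the official driver, because if I do what the guide says, I only get a black screen after rebooting. I also want to use nouveau on the host rather than the proprietary and possibly insecure driver.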

I have a Ryzen 7 2700X on a Gigabyte B450m motherboard. I have a GTX 1060 that I want to put in the VM, and a GT 750 that I want to use on the host.

AMD-Vi works:

lz@z:~$ dmesg |grep AMD-Vi
[    0.327637] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.330500] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.330501] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
[    0.330504] AMD-Vi: Interrupt remapping enabled
[    0.330505] AMD-Vi: Virtual APIC enabled
[    0.330572] AMD-Vi: Lazy IO/TLB flushing enabled

Here are my IOMMU groups:

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 11 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 11 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 12 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 12 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 12 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 12 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 12 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 12 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 12 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 13 01:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. Device [2646:2263] (rev 03)
IOMMU Group 14 02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
IOMMU Group 14 02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
IOMMU Group 14 02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
IOMMU Group 14 03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 14 03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 14 03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 14 05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
IOMMU Group 14 06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 730] [10de:0f02] (rev a1)
IOMMU Group 14 06:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
>>>>>>>>>>>>>>> IOMMU Group 15 07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:2184] (rev a1)
>>>>>>>>>>>>>>> IOMMU Group 15 07:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aeb] (rev a1)
>>>>>>>>>>>>>>> IOMMU Group 15 07:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1)
>>>>>>>>>>>>>>> IOMMU Group 15 07:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
IOMMU Group 16 08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU Group 17 08:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU Group 18 08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
IOMMU Group 19 09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 20 09:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 21 09:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
IOMMU Group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 9 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]

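You can see that my GTX 1060 is in group 15, along with other things that I don't care about and that can also go into the VM, for example the USB controller.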
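For reference, a listing like the one above can be produced by walking /sys/kernel/iommu_groups. The following is only a minimal sketch: the function name list_iommu_groups and its optional sysroot argument are assumptions of mine (the argument just lets the function run against a copy of the sysfs tree), and it prints PCI addresses only.

```shell
#!/bin/bash
# Print one "IOMMU Group N <PCI address>" line per device.
# The optional first argument (default /sys) is a hypothetical hook that
# lets the function run against a scratch copy of the sysfs tree.
list_iommu_groups() {
    local sysroot="${1:-/sys}" dev group
    for dev in "$sysroot"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no IOMMU groups
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        printf 'IOMMU Group %s %s\n' "$group" "${dev##*/}"
    done
}
```

Calling list_iommu_groups with no argument reads the live /sys. Passing each printed address to lspci -nns should add the device description and the [vendor:device] ID.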

So I have 10de:2184 (GTX 1060) and 10de:1aeb (GTX audio). Do I need to save the IDs of the other things in group 15? I'm going to try with all of them, so I also saved 10de:1aec (USB) and 10de:1aed (serial bus).

lz@z:~$ cat /etc/initramfs-tools/modules 
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
#
# Syntax:  module_name [args ...]
#
# You must run update-initramfs(8) to effect this change.
#
# Examples:
#
# raid1
# sd_mod
vfio vfio_iommu_type1 vfio_virqfd vfio_pci ids=10de:2184,10de:1aeb,10de:1aec,10de:1aed

lz@z:~$ cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

vfio vfio_iommu_type1 vfio_pci ids=10de:2184,10de:1aeb,10de:1aec,10de:1aed

lz@z:~$ cat /etc/modprobe.d/vfio.conf 
options vfio-pci ids=10de:2184,10de:1aeb,10de:1aec,10de:1aed

lz@z:~$ cat /etc/modprobe.d/kvm.conf 
options kvm ignore_msrs=1

Now look at my lspci output after rebooting:

lz@z:~$ lspci -nnv
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
    Flags: fast devsel

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
    Flags: fast devsel, IRQ 25
    Capabilities: <access denied>

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 26
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: f7600000-f76fffff [size=1M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 27
    Bus: primary=00, secondary=02, subordinate=06, sec-latency=0
    I/O behind bridge: 0000d000-0000efff [size=8K]
    Memory behind bridge: f4000000-f53fffff [size=20M]
    Prefetchable memory behind bridge: 00000000e8000000-00000000f21fffff [size=162M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 28
    Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
    I/O behind bridge: 0000f000-0000ffff [size=4K]
    Memory behind bridge: f6000000-f70fffff [size=17M]
    Prefetchable memory behind bridge: 00000000d0000000-00000000e20fffff [size=289M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 29
    Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: f7200000-f74fffff [size=3M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    Flags: fast devsel

00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 31
    Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: f7500000-f75fffff [size=1M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
    Subsystem: Gigabyte Technology Co., Ltd FCH SMBus Controller [1458:5001]
    Flags: 66MHz, medium devsel
    Kernel modules: i2c_piix4, sp5100_tco

00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
    Subsystem: Gigabyte Technology Co., Ltd FCH LPC Bridge [1458:5001]
    Flags: bus master, 66MHz, medium devsel, latency 0

00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
    Flags: fast devsel

00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
    Flags: fast devsel

00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
    Flags: fast devsel

00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
    Flags: fast devsel
    Kernel driver in use: k10temp
    Kernel modules: k10temp

00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
    Flags: fast devsel

00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
    Flags: fast devsel

00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
    Flags: fast devsel

00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
    Flags: fast devsel

01:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. Device [2646:2263] (rev 03) (prog-if 02 [NVM Express])
    Subsystem: Kingston Technology Company, Inc. Device [2646:2263]
    Flags: bus master, fast devsel, latency 0, IRQ 60, NUMA node 0
    Memory at f7600000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: nvme
    Kernel modules: nvme

02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01) (prog-if 30 [XHCI])
    Subsystem: ASMedia Technology Inc. 400 Series Chipset USB 3.1 XHCI Controller [1b21:1142]
    Flags: bus master, fast devsel, latency 0, IRQ 30
    Memory at f53a0000 (64-bit, non-prefetchable) [size=32K]
    Capabilities: <access denied>
    Kernel driver in use: xhci_hcd

02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01) (prog-if 01 [AHCI 1.0])
    Subsystem: ASMedia Technology Inc. 400 Series Chipset SATA Controller [1b21:1062]
    Flags: bus master, fast devsel, latency 0, IRQ 59
    Memory at f5380000 (32-bit, non-prefetchable) [size=128K]
    Expansion ROM at f5300000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel driver in use: ahci
    Kernel modules: ahci

02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 33
    Bus: primary=02, secondary=03, subordinate=06, sec-latency=0
    I/O behind bridge: 0000d000-0000efff [size=8K]
    Memory behind bridge: f4000000-f52fffff [size=19M]
    Prefetchable memory behind bridge: 00000000e8000000-00000000f21fffff [size=162M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport

03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01) (prog-if 00 [Normal decode])
    DeviceName: Broadcom 5762
    Flags: bus master, fast devsel, latency 0, IRQ 34
    Bus: primary=03, secondary=04, subordinate=04, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: None
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport

03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 36
    Bus: primary=03, secondary=05, subordinate=05, sec-latency=0
    I/O behind bridge: 0000e000-0000efff [size=4K]
    Memory behind bridge: f5200000-f52fffff [size=1M]
    Prefetchable memory behind bridge: 00000000f2100000-00000000f21fffff [size=1M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport

03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 37
    Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff [size=4K]
    Memory behind bridge: f4000000-f50fffff [size=17M]
    Prefetchable memory behind bridge: 00000000e8000000-00000000f1ffffff [size=160M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport

05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
    Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet [1458:e000]
    Flags: bus master, fast devsel, latency 0, IRQ 35
    I/O ports at e000 [size=256]
    Memory at f5200000 (64-bit, non-prefetchable) [size=4K]
    Memory at f2100000 (64-bit, prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: r8169
    Kernel modules: r8169

06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 730] [10de:0f02] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation GF108 [GeForce GT 730] [10de:0825]
    Flags: bus master, fast devsel, latency 0, IRQ 86
    Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e8000000 (64-bit, prefetchable) [size=128M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at d000 [size=128]
    Expansion ROM at f5000000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel driver in use: nouveau
    Kernel modules: nvidiafb, nouveau

06:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
    Subsystem: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0825]
    Flags: bus master, fast devsel, latency 0, IRQ 35
    Memory at f5080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

>>>>>>>>>>>>>>>> 07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:2184] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:1324]
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at e0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at f000 [size=128]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

>>>>>>>>>>>>>>>> 07:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aeb] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: bus master, fast devsel, latency 0, IRQ 83
    Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

>>>>>>>>>>>>>>>> 07:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1) (prog-if 30 [XHCI])
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: fast devsel, IRQ 47
    Memory at e2000000 (64-bit, prefetchable) [size=256K]
    Memory at e2040000 (64-bit, prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: xhci_hcd

>>>>>>>>>>>>>>>> 07:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: bus master, fast devsel, latency 0, IRQ 58
    Memory at f7084000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: nvidia-gpu
    Kernel modules: i2c_nvidia_gpu

08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
    Flags: fast devsel
    Capabilities: <access denied>

08:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
    Flags: bus master, fast devsel, latency 0, IRQ 80
    Memory at f7300000 (32-bit, non-prefetchable) [size=1M]
    Memory at f7400000 (32-bit, non-prefetchable) [size=8K]
    Capabilities: <access denied>
    Kernel driver in use: ccp
    Kernel modules: ccp

08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f] (prog-if 30 [XHCI])
    Subsystem: Gigabyte Technology Co., Ltd Zeppelin USB 3.0 Host controller [1458:5007]
    Flags: bus master, fast devsel, latency 0, IRQ 48
    Memory at f7200000 (64-bit, non-prefetchable) [size=1M]
    Capabilities: <access denied>
    Kernel driver in use: xhci_hcd

09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
    Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
    Flags: fast devsel
    Capabilities: <access denied>

09:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51) (prog-if 01 [AHCI 1.0])
    Subsystem: Gigabyte Technology Co., Ltd FCH SATA Controller [AHCI mode] [1458:b002]
    Flags: bus master, fast devsel, latency 0, IRQ 63
    Memory at f7508000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: ahci
    Kernel modules: ahci

09:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
    Subsystem: Gigabyte Technology Co., Ltd Family 17h (Models 00h-0fh) HD Audio Controller [1458:a182]
    Flags: bus master, fast devsel, latency 0, IRQ 85
    Memory at f7500000 (32-bit, non-prefetchable) [size=32K]
    Capabilities: <access denied>
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

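I highlighted the devices in group 15. Only the NVIDIA GTX 1060 itself is being used by vfio-pci; the others are being used by other kernel modules. Is this the source of the problem? To pass through the GTX I have to pass through all of group 15, but these other functions are being used by other drivers, not by vfio-pci.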
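The check described above can also be scripted rather than read off lspci. This is a rough sketch under a few assumptions: check_group and its sysroot argument are names I made up, and treating an unbound device as acceptable reflects my reading of the kernel's VFIO documentation rather than anything stated in this post.

```shell
#!/bin/bash
# Print "<PCI address> <driver>" for every device in an IOMMU group and
# return non-zero if any device is held by a driver other than vfio-pci.
# The sysroot argument (default /sys) is a hypothetical hook for dry runs.
check_group() {
    local group="$1" sysroot="${2:-/sys}" dev driver status=0
    for dev in "$sysroot/kernel/iommu_groups/$group"/devices/*; do
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # The "driver" symlink points at /sys/bus/pci/drivers/<name>.
            driver="$(basename "$(readlink "$dev/driver")")"
        else
            driver="(none)"
        fi
        printf '%s %s\n' "${dev##*/}" "$driver"
        case "$driver" in
            vfio-pci|"(none)") ;;   # assumed acceptable for passthrough
            *) status=1 ;;          # still owned by a host driver
        esac
    done
    return "$status"
}
```

On the system shown above, check_group 15 would be expected to report snd_hda_intel, xhci_hcd and nvidia-gpu next to vfio-pci and to return a non-zero status.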

Unable to complete install: 'internal error: qemu unexpectedly closed the monitor: 2020-02-19T22:48:02.001713Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
2020-02-19T22:48:02.002255Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
2020-02-19T22:48:02.002845Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
2020-02-19T22:48:02.003340Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
2020-02-19T22:48:02.003842Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
2020-02-19T22:48:02.024485Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.4,addr=0x0: vfio 0000:07:00.0: group 15 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.'

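Look at this part:

Please ensure all devices within the iommu_group are bound to their vfio bus driver.

This confirms what I thought: not all of the devices are being taken by vfio-pci, even though I explicitly told it to take them.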

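I think the guide's author handles this in the following part, but for the nvidia driver:

In order to alter the load order in favour of vfio_pci before the nvidia driver, create a file in the modprobe.d folder via sudo nano /etc/modprobe.d/nvidia.conf and add the following lines:

softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia* pre: vfio-pci

Is there a way to do the same for nouveau?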

Answer 1

I discovered that there is a way to manually unbind a kernel module from a specific PCI device, so I wrote a small script:

echo -n "0000:07:00.1" > /sys/bus/pci/drivers/snd_hda_intel/unbind
echo -n "0000:07:00.1" > /sys/bus/pci/drivers/vfio-pci/bind

echo -n "0000:07:00.2" > /sys/bus/pci/drivers/xhci_hcd/unbind
echo -n "0000:07:00.2" > /sys/bus/pci/drivers/vfio-pci/bind

echo -n "0000:07:00.3" > /sys/bus/pci/drivers/nvidia-gpu/unbind
echo -n "0000:07:00.3" > /sys/bus/pci/drivers/vfio-pci/bind

It hangs for a while (about 2 minutes) on the line echo -n "0000:07:00.3" > /sys/bus/pci/drivers/nvidia-gpu/unbind, but once it finishes, this is the output of lspci -nnv:

07:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:2184] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:1324]
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at e0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at f000 [size=128]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau

07:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aeb] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: fast devsel, IRQ 83
    Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

07:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1) (prog-if 30 [XHCI])
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: fast devsel, IRQ 46
    Memory at e2000000 (64-bit, prefetchable) [size=256K]
    Memory at e2040000 (64-bit, prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci

07:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:1324]
    Flags: fast devsel, IRQ 58
    Memory at f7084000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: i2c_nvidia_gpu

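As you can see, they are all using vfio-pci now. I then simply added the GPU in virt-manager and it worked. However, I'm still investigating why the whole Ubuntu host froze for good during the Windows 10 installation.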
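The same rebinding can probably also be expressed with the kernel's driver_override attribute, which avoids hard-coding the name of whichever driver currently holds each function. Treat this as an untested sketch: rebind_to_vfio and its sysroot argument are hypothetical names, it has to run as root on a real system, vfio-pci must already be loaded, and it still detaches a host driver that has already initialised the device.

```shell
#!/bin/bash
# Hand one PCI function over to vfio-pci via driver_override.
# sysroot (default /sys) is a hypothetical hook so the steps can be
# dry-run against a scratch directory instead of the live sysfs.
rebind_to_vfio() {
    local bdf="$1" sysroot="${2:-/sys}"
    local dev="$sysroot/bus/pci/devices/$bdf"
    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }
    # 1. From now on only vfio-pci may claim this device.
    echo vfio-pci > "$dev/driver_override"
    # 2. Detach whatever driver currently holds it, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # 3. Ask the PCI core to probe the device again.
    echo "$bdf" > "$sysroot/bus/pci/drivers_probe"
}
```

For the card in this thread that would mean calling rebind_to_vfio 0000:07:00.1, then 0000:07:00.2 and 0000:07:00.3.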

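Update:

Manually unbinding does release the GPU, but if you have to unbind it, that means the GPU's Linux driver has already touched the GPU, so the GPU now knows it is running under Linux. When you bind it to the VM and start the VM, the GPU's Windows driver reads the GPU state, sees that someone (Linux) has messed with it before, and therefore refuses to work, because NVIDIA sucks.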

Don't unbind manually. You can at least try it, but it will probably not work. Instead, make sure the Linux drivers never touch the GPU.
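One way to keep the host drivers away is to extend the softdep idea from the guide to every module that grabbed a function of the card. The snippet below is only a sketch: vfio_softdeps is a made-up helper name, and the module list is taken from the "Kernel modules:" lines in the lspci output in the question.

```shell
#!/bin/bash
# Emit one modprobe.d softdep line per host module so that vfio-pci is
# loaded, and can claim the IDs listed in vfio.conf, before that module.
vfio_softdeps() {
    local module
    for module in "$@"; do
        printf 'softdep %s pre: vfio-pci\n' "$module"
    done
}

# Modules named on the "Kernel modules:" lines for 07:00.0, 07:00.1 and 07:00.3.
vfio_softdeps nouveau snd_hda_intel i2c_nvidia_gpu
```

The output would go into a file under /etc/modprobe.d/, followed by update-initramfs and a reboot. As far as I know, softdep only affects loadable modules. The USB function at 07:00.2 shows no "Kernel modules:" line, which suggests xhci_hcd is built into that kernel, so it would likely need a different approach.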

Answer 2

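I highlighted the devices in group 15. Only the NVIDIA GTX 1060 is being used by vfio-pci; the others are being used by other kernel modules. Is this the source of the problem? To pass through the GTX I have to pass through everything in group 15, but these other things are being used by other drivers, not by vfio-pci.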

Possibly yes, but at least three of them should be taken by vfio-pci.

On my machine, which has a gtx2070 installed, these are:

  1. VGA compatible controller,
  2. Audio device,
  3. Serial bus controller

lspci -knn

GPU slot 1 GT 710
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 710B] [10de:128b] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] GK208B [GeForce GT 710] [1462:8c93]
        Kernel driver in use: nvidia
        Kernel modules: nvidia
0b:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] GK208 HDMI/DP Audio Controller [1462:8c93]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

GPU slot 2 gtx2070
0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e84] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:4008]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidia
0c:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f8] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:4008]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
0c:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad8] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:4008]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
0c:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad9] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:4008]
        Kernel driver in use: vfio-pci

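I also followed the instructions at https://mathiashueber.com/ when setting up my machine.

  • I installed two GPUs in my machine. In the first slot (the one used by my Linux host) I put a low-power nvidia card. In the second slot I installed the gtx2070 that should be passed through to the VM.

  • I installed the virtualization software [and other tools such as firmware-linux or a newer kernel from debian buster backports]: sudo apt install ovmf virt-manager qemu-kvm

  • Activate IOMMU (vt-x/vt-d etc. in the BIOS) and add the following line to Grub: GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 kvm.ignore_msrs=1 video=vesafb:off,efifb:off disable_idle_d3=1"

  • Make sure the GPU is in its own group: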

IOMMU Group 29 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e84] (rev a1)
IOMMU Group 29 0c:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f8] (rev a1)
IOMMU Group 29 0c:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad8] (rev a1)
IOMMU Group 29 0c:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad9] (rev a1)
  • Then I added the IDs that I want vfio-pci to bind at boot:
sudo nano /etc/initramfs-tools/modules
vfio_pci ids=10de:1e84,10de:10f8,10de:1ad8,10de:1ad9

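sudo update-initramfs -u -k all
  • After that, reboot and check again with lspci -knn whether it worked. As you can see from the listing above, it did not work for the ID 10de:1ad8. Luckily that is not a problem: my win10 VM runs fine even though the 0c:00.2 USB controller is not taken by vfio-pci.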
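After the reboot it may also be worth confirming that the Grub parameters from the earlier step actually reached the running kernel. Here is a small sketch of that check; missing_kernel_params is a hypothetical helper name, and taking the command-line file as an argument is just my way of keeping it easy to try out.

```shell
#!/bin/bash
# Print every requested parameter that is absent from a kernel command line.
# The first argument is the file to read, normally /proc/cmdline.
missing_kernel_params() {
    local cmdline_file="$1"; shift
    local cmdline param
    # Pad with spaces so each parameter can be matched as a whole word.
    cmdline=" $(cat "$cmdline_file") "
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                  # present
            *) printf '%s\n' "$param" ;;      # missing
        esac
    done
}
```

Running missing_kernel_params /proc/cmdline amd_iommu=on iommu=pt prints nothing when both parameters are active. Anything it does print most likely means Grub was not regenerated (for example with update-grub) before rebooting.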

The software versions and kernel I use are:

qemu-system-x86_64 --version
QEMU emulator version 5.0.0 (Debian 1:5.0-14~bpo10+1)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
uname -r
5.7.0-0.bpo.2-amd64

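The whole topic of how to pass through a GPU successfully is quite complex, and many different problems can occur.

Here are some of my experiences:

  1. I remember that with Lubuntu 16.04 on a Gigabyte ga-p55-ud7 I could not bind my gtx970 to pci-stub at boot, so I had to do it manually with bind/unbind commands, as you did. (See "Blacklisting Nvidia gpu for qemu/kvm passthrough".)

  2. With my new machine, a ROG STRIX X570-F GAMING running debian buster (as shown above), I was able to boot with my primary card (gt710) taken by the nvidia driver and my gtx2070 taken by vfio-pci during boot.

  3. With another machine, an ASRockRack EPYC3251D4I-2T also running debian buster, I ran into big problems when trying to pass my gtx970 through to a Windows guest. To work around them I had to copy a script and run it in the background (see https://www.reddit.com/r/Amd/comments/7gp1z7/threadripper_kvm_gpu_passthru_testers_needed/).

I can't tell you why you are running into these problems. Maybe it is outdated software? The distribution you use and how it handles module loading? BIOS/motherboard firmware that the manufacturer did not program properly? Perhaps a BIOS update is available?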
