Unable to load the vfio-pci driver for an NVIDIA GPU

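Well, I'm not getting any further, so I'm asking for help. I have tried everything I could think of or find online. I'm trying to get GPU passthrough working so that I can use the card in a VM with virt-manager/KVM.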

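I mostly followed the guide linked below to set up all the files, update the kernel, and set the grub line. I can't get any output from dmesg | grep vfio, as suggested in the other question linked below, so maybe that is a clue. One answer there says the vfio modules are built into the kernel, so lsmod won't show them, and my kernel config file does show the vfio entries. I have used softdep pre: lines to try to load vfio-pci before the nvidia driver. I was able to block the nvidia driver with blocklist.conf, but my display card is also an NVIDIA card, and after that I could not get to a shell even in recovery mode.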

https://github.com/NVIDIA/deepops/blob/master/virtual/README.md#bootloader-changes

https://askubuntu.com/questions/1247058/how-do-i-confirm-that-vfio-is-working-in-20-04

---
lspci -nn | grep NVIDIA
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108GL [Quadro 600] [10de:0df8] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GF108 High Definition Audio Controller [10de:0bea] (rev a1)
08:00.0 3D controller [0302]: NVIDIA Corporation GF110GL [Tesla M2090] [10de:1091] (rev a1)
08:00.1 Audio device [0403]: NVIDIA Corporation GF110 High Definition Audio Controller [10de:0e09] (rev a1)
---
lspci -nnk -d 10de:1091
08:00.0 3D controller [0302]: NVIDIA Corporation GF110GL [Tesla M2090] [10de:1091] (rev a1)
        Subsystem: NVIDIA Corporation GF110GL [Tesla M2090] [10de:0887]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia
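The "Kernel driver in use" line above can also be read straight from sysfs, which is handy for re-checking after each reboot. A minimal sketch, assuming the standard sysfs layout; the function name pci_driver and its optional second argument (an alternative sysfs root, only there so the function can be exercised without real hardware) are hypothetical:

```shell
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: full PCI address including the domain, e.g. 0000:08:00.0
# $2: sysfs root (defaults to /sys; hypothetical override for testing)
pci_driver() {
    dev="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink to the driver's sysfs directory.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Calling pci_driver 0000:08:00.0 on the machine above would be expected to print nvidia for now, and vfio-pci once the binding works.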

"linux /boot/vmlinuz root=UUID=$uuid acpi=noirq intel_iommu=on iommu=pt vfio-pci ids=10de:1091,10de:0e09  vfio_iommu_type1 allow_unsafe_interrupts=1"

I tried both vfio_iommu_type1 allow_unsafe_interrupts=1 and vfio_iommu_type1.allow_unsafe_interrupts=1
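For comparison: parameters for modules that are built into the kernel are normally given on the kernel command line as module.parameter=value, with a dot and no space. Written as vfio-pci ids=..., the kernel sees two unrelated tokens. The sketch below only assembles such a string for the IDs in this post. The helper name vfio_cmdline is hypothetical, and where the string ends up (for example GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub) is an assumption about the setup, not something shown above.

```shell
# Build kernel command-line arguments for built-in VFIO modules.
# $1: comma-separated vendor:device IDs to hand to vfio-pci
vfio_cmdline() {
    printf 'intel_iommu=on iommu=pt vfio-pci.ids=%s vfio_iommu_type1.allow_unsafe_interrupts=1\n' "$1"
}

# IDs of the Tesla M2090 and its audio function, taken from the lspci output.
vfio_cmdline 10de:1091,10de:0e09
```

After changing the boot entry, the configuration would need to be regenerated (update-grub on Ubuntu) and the machine rebooted before checking the driver binding again.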


CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=y
CONFIG_VFIO=y
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI=y
CONFIG_VFIO_PCI_VGA=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI_IGD=y
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m

grep -oE 'svm|vmx' /proc/cpuinfo | uniq
vmx

cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

bonding
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel

cat /etc/modules-load.d/vfio-pci.conf
vfio-pci

cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1091,10de:0e09
options vfio_iommu_type1 allow_unsafe_interrupts=1
---
cat /etc/modprobe.d/nvidia.conf 
softdep nvidia_384 pre: vfio-pci
#softdep radeon pre: vfio-pci
#softdep amdgpu pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia* pre: vfio-pci
#softdep drm pre: vfio-pci
#softdep xhci_hdc pre: vfio-pci
#options kvm_amd avic=1

modprobe -c | grep vfio
options vfio_pci ids=10de:1091,10de:0e09
options vfio_iommu_type1 allow_unsafe_interrupts=1
softdep mdev post: vfio_mdev
softdep nvidia_384 pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia* pre: vfio-pci

cat /etc/initramfs-tools/modules
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
#
# Syntax:  module_name [args ...]
#
# You must run update-initramfs(8) to effect this change.
#
# Examples:
#
# raid1
# sd_mod

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
vhost-net

journalctl -b | grep vfio
Dec 10 19:35:17 osboxes kernel: Command line: BOOT_IMAGE=/boot/vmlinuz root=UUID=ef2ecb3b-8e9a-4b20-bf15-47e0c7c98a1f acpi=noirq intel_iommu=on iommu=pt vfio-pci ids=10de:1091,10de:0e09 vfio_iommu_type1 allow_unsafe_interrupts=1
Dec 10 19:35:17 osboxes kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz root=UUID=ef2ecb3b-8e9a-4b20-bf15-47e0c7c98a1f acpi=noirq intel_iommu=on iommu=pt vfio-pci ids=10de:1091,10de:0e09 vfio_iommu_type1 allow_unsafe_interrupts=1
Dec 10 19:35:17 osboxes systemd-modules-load[518]: Module 'vfio' is built in
Dec 10 19:35:17 osboxes systemd-modules-load[518]: Module 'vfio_iommu_type1' is built in
Dec 10 19:35:17 osboxes systemd-modules-load[518]: Module 'vfio_pci' is built in
Dec 10 19:35:17 osboxes systemd-modules-load[518]: Module 'vfio_pci' is built in
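The "is built in" messages explain why lsmod shows nothing for vfio. Built-in modules are not loaded by modprobe, so options lines under /etc/modprobe.d are not applied to them. One way to confirm the built-in status independently is the modules.builtin file shipped with each kernel. A minimal sketch; the function name is_builtin and its optional file argument are hypothetical:

```shell
# Return 0 if a module is listed as built in, 1 if not, 2 if the list is unreadable.
# $1: module name (dashes and underscores are treated alike)
# $2: path to modules.builtin (defaults to the running kernel's copy)
is_builtin() {
    # Turn every dash or underscore into a character class matching both.
    pat=$(printf '%s' "$1" | sed 's/[-_]/[-_]/g')
    file="${2:-/lib/modules/$(uname -r)/modules.builtin}"
    [ -r "$file" ] || return 2
    # Entries look like kernel/drivers/vfio/pci/vfio-pci.ko
    grep -Eq "(^|/)${pat}\.ko\$" "$file"
}
```

For example, is_builtin vfio_pci && echo "built in" would be expected to print "built in" on the kernel described here.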

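Edit: Yes, after blacklisting only nouveau, I still ended up with no driver loaded. I removed all settings except blacklist nouveau, and not even the nvidia driver showed up. I took that blacklist out and everything is fine again.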
