Unbinding a GPU from the radeon driver without restarting lightdm


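I use a bash script to rebind my GPU for KVM PCI passthrough, but I have to stop lightdm before I can unbind it from (or bind it back to) the radeon driver. If I don't stop lightdm, the whole system hangs after a few seconds, and I can't even SSH in to see what happened. There must be some way to detach the driver safely. I'm on kernel 4.1.6 because 4.2 currently breaks PCI passthrough.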

I tried removing the radeon driver before unbinding, but it had no effect.

modprobe --remove-dependencies radeon

I suspect this is because it is still in use by the following, but for some reason it is not removed:

lsmod | grep radeon
radeon               1589248  0
ttm                    94208  1 radeon
i2c_algo_bit           16384  2 i915,radeon
drm_kms_helper        126976  2 i915,radeon
drm                   352256  7 ttm,i915,drm_kms_helper,radeon
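
When reading this output, it may help to keep the lsmod columns in mind: the third column is the module's use count, and the fourth lists the modules that depend on it. A row such as `ttm 94208 1 radeon` therefore says that radeon uses ttm, not the other way round. The sketch below pulls those two fields out for a single module; `module_usage` is a hypothetical helper name, and it reads lsmod-style text from stdin.

```shell
#!/bin/bash
# Hypothetical helper: print the use count and the list of dependent modules
# for one module, reading lsmod-style text on stdin.
module_usage() {
    local module="$1"
    awk -v m="$module" '$1 == m { print "refcount=" $3, "used_by=" ($4 == "" ? "-" : $4) }'
}

# Usage against the live system (commented out so the sketch has no side effects):
# lsmod | module_usage radeon
```

For the output shown above, this reports a use count of 0 for radeon itself.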

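There are a lot of stack traces like the one below. Some come from sysfs/group.c, the rest from drm. It looks like a memory management problem. I'm not sure how to unbind correctly.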

WARNING: CPU: 3 PID: 10935 at /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_object.c:83 radeon_ttm_bo_destroy+0xea/0xf0 [radeon]()
Modules linked in: pci_stub joydev binfmt_misc arc4 nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap ath9k ath9k_common intel_rapl iosf_mbi amdkfd x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek amd_iommu_v2 snd_hda_co$
CPU: 3 PID: 10935 Comm: echo Tainted: G        W       4.1.6-040106-generic #201508170230
Hardware name: ASUS All Series/Z97-E/USB 3.1, BIOS 0403 04/07/2015
ffffffffc08d62a0 ffff88010656fa38 ffffffff817d1363 0000000000000000
0000000000000000 ffff88010656fa78 ffffffff81079c3a ffff88012d9d1ec0
ffff880220a6f868 ffff880220a6f800 0000000000002480 ffff880220a6f868
Call Trace:
[<ffffffff817d1363>] dump_stack+0x45/0x57
[<ffffffff81079c3a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81079d2a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc07bf5ba>] radeon_ttm_bo_destroy+0xea/0xf0 [radeon]
[<ffffffffc042e4d9>] ttm_bo_release_list+0xa9/0x180 [ttm]
[<ffffffffc04351e0>] ? ttm_bo_man_put_node+0x40/0x50 [ttm]
[<ffffffffc042e6cd>] ttm_bo_release+0x11d/0x2b0 [ttm]
[<ffffffff81507816>] ? __dev_printk+0x46/0xa0
[<ffffffffc042e889>] ttm_bo_unref+0x29/0x30 [ttm]
[<ffffffffc07bfada>] radeon_bo_unref+0x2a/0x50 [radeon]
[<ffffffffc07d4cdb>] radeon_gem_object_free+0x4b/0x50 [radeon]
[<ffffffffc00254a7>] drm_gem_object_free+0x27/0x30 [drm]
[<ffffffffc07bff78>] radeon_bo_force_delete+0x128/0x130 [radeon]
[<ffffffffc07d4ebe>] radeon_gem_fini+0xe/0x10 [radeon]
[<ffffffffc083ebad>] si_fini+0xbd/0x110 [radeon]
[<ffffffffc07a1612>] radeon_device_fini+0x42/0x140 [radeon]
[<ffffffffc07a3d40>] radeon_driver_unload_kms+0x50/0x70 [radeon]
[<ffffffffc002a8cd>] drm_dev_unregister+0x2d/0xc0 [drm]
[<ffffffffc002af87>] drm_put_dev+0x27/0x80 [drm]
[<ffffffffc079f295>] radeon_pci_remove+0x15/0x20 [radeon]
[<ffffffff8140193f>] pci_device_remove+0x3f/0xc0
[<ffffffff8150b297>] __device_release_driver+0x87/0x120
[<ffffffff8150b353>] device_release_driver+0x23/0x30
[<ffffffff8150a04d>] unbind_store+0xbd/0xe0
[<ffffffff81509484>] drv_attr_store+0x24/0x40
[<ffffffff8127478d>] sysfs_kf_write+0x3d/0x50
[<ffffffff81273c3a>] kernfs_fop_write+0x12a/0x180
[<ffffffff811f8d98>] __vfs_write+0x28/0x100
[<ffffffff811fba19>] ? __sb_start_write+0x49/0xf0
[<ffffffff81320993>] ? security_file_permission+0x23/0xa0
[<ffffffff811f9499>] vfs_write+0xa9/0x1b0
[<ffffffff817d6f66>] ? mutex_lock+0x16/0x37
[<ffffffff811fa2a6>] SyS_write+0x46/0xb0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8f32>] system_call_fastpath+0x16/0x75

For those interested, here is my current script (run from a tty console outside the X session).

#!/bin/bash

read -n3 -rsp "Restart lightdm to unbind the GPU? [yes] " res
test "$res" != 'yes' && exit 1
echo

sudo service lightdm stop
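# The redirection in "sudo echo ... > file" is performed by the calling shell,
# not by root, so the writes are piped through "sudo tee" instead.
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/devices/0000:01:00.1/driver/unbind > /dev/null
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind > /dev/null
echo "1002 683d" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null
echo "1002 aab0" | sudo tee /sys/bus/pci/drivers/vfio-pci/remove_id > /dev/null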
sudo service lightdm start

echo "Rebind Audio"
sudo modprobe pci_stub
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/new_id > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/bind > /dev/null
echo "8086 8ca0" | sudo tee /sys/bus/pci/drivers/pci-stub/remove_id > /dev/null

# Check if VM drive is mounted
if ! grep -qs '/media/ljosalfur/VM' /proc/mounts; then
echo "Attempting to mount VM drive. I don't know how though."
#sudo mkdir /media/ljosalfur/VM
#sudo mount /dev/disk/by-id/0BD253F0-EF7F-6F40-BDD8-FABF85161762 /media/ljosalfur/VM
fi

sudo kvm -monitor stdio -vnc :0 \
-m 6G -mem-path /dev/hugepages \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-cpu host -smp 6,sockets=1,cores=6,threads=1 \
-device vfio-pci,host=01:00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=01:00.1 \
-device pci-assign,host=00:1b.0 \
-drive file=/media/ljosalfur/VM/vm7.img,format=raw,cache=writethrough \
-smb /media/ljosalfur \
-usb -usbdevice host:046d:c24a -show-cursor \
-usb -usbdevice host:1b1c:1b08

echo
echo "Re-Rebind Audio"
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/pci-stub/unbind > /dev/null
echo "0000:00:1b.0" | sudo tee /sys/bus/pci/drivers/snd_hda_intel/bind > /dev/null

echo "Unbind GPU from vfio-pci"
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null
echo "0000:01:00.1" | sudo tee /sys/bus/pci/drivers/vfio-pci/unbind > /dev/null

read -n3 -rsp "Restart lightdm to rebind the GPU? [yes] " ress
# "(exit 1)" only leaves a subshell; a plain "exit 1" actually stops the script.
test "$ress" != 'yes' && exit 1
echo
echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/radeon/bind > /dev/null

Answer 1

I finally found a way to do this, based on the information in this thread: https://www.reddit.com/r/VFIO/comments/41pm1q/my_bash_script_for_rebinding_a_secondary_nvidia/
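
As far as I can tell, the step that matters is to write the device address to `unbind` and then wait until the device's `driver` link has disappeared before doing anything else with the device. A minimal sketch of that step with a timeout added follows. The `unbind_and_wait` name, the `SYSFS_PCI` variable and the 10-second default are assumptions of this sketch, not part of the thread or of any tool.

```shell
#!/bin/bash
# Sketch: unbind a PCI device and wait until the kernel has dropped the
# driver link. SYSFS_PCI is a hypothetical override that lets the function
# be tried against a scratch directory instead of the real sysfs tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

unbind_and_wait() {
    local dev="$1" timeout="${2:-10}" waited=0
    local link="$SYSFS_PCI/devices/$dev/driver"

    # Nothing to do if no driver is currently bound.
    [ -e "$link" ] || return 0

    # Ask the current driver to release the device (needs root on real sysfs).
    echo "$dev" > "$link/unbind" || return 1

    # Poll once per second until the link is gone or the timeout expires.
    while [ -e "$link" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage as root (commented out so the sketch has no side effects):
# unbind_and_wait 0000:01:00.0 || echo "device did not unbind in time"
```

Giving up after a timeout, rather than looping forever, leaves the caller free to report the failure instead of hanging at the console.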

The working script is:

#!/bin/bash
# Which device and which related HDMI audio device. They're usually in pairs.
export VGA_DEVICE=0000:01:00.0
export AUDIO_DEVICE=0000:01:00.1
 
export VGA_DRIVER=radeon
export AUDIO_DRIVER=snd_hda_intel
 
# Passing through USB devices. Querying bus address and feeding that to QEMU
# instead of the device ID, so you can yank and replug the keyboard to regain
# control.
export KEYBOARD="1b1c:1b08"
export MOUSE="046d:c24a"
 
# Unbinds a device and loads the driver specified.
flipdriver() {
    dev="$1"
    driver="$2"
 
    # Both arguments are required; "||" is the logical OR ("|" would be a pipe).
    if [ -z "$driver" ] || [ -z "$dev" ];
    then
        return 1
    fi
 
    vendor=$(cat /sys/bus/pci/devices/${dev}/vendor)
    device=$(cat /sys/bus/pci/devices/${dev}/device)
 
    echo -n Unbinding $vendor:$device ...
 
    if [ -e /sys/bus/pci/devices/${dev}/driver ]; then
        echo ${dev} > /sys/bus/pci/devices/${dev}/driver/unbind
        while [ -e /sys/bus/pci/devices/${dev}/driver ]; do
            sleep 0.5
            echo -n .
        done
    fi
    echo " OK!"
 
    echo -n Binding \'$driver\' to $vendor:$device ...
    echo ${vendor} ${device} > /sys/bus/pci/drivers/${driver}/new_id
 
    echo " OK!"
 
    return 0
}
 
# Common error message
fliperror()
{
    echo "Couldn't perform required driver switch'n'bait!"
    exit 1
}
 
# Xorg shouldn't run.
if [ -n "$( ps -C xinit | grep xinit )" ];
then
    echo "Don't run this inside Xorg!"
    exit 1
fi
 
# Unbind specified graphics card and audio device.
echo "Pulling the plug on the specified passthrough devices..."
flipdriver $VGA_DEVICE vfio-pci
flipdriver $AUDIO_DEVICE vfio-pci
 
export QEMU_PA_SAMPLES=128
export QEMU_AUDIO_DRV=alsa
 
# Get the bus addresses for keyboard and mouse.
export QEMU_KEYB=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$KEYBOARD'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2/p' )
export QEMU_MOUS=$( lsusb | sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '$MOUSE'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2 -show-cursor/p' )
 
# Check if VM drive is mounted
if ! grep -qs '/media/user/VM' /proc/mounts; then
echo "Attempting to mount VM drive."
sudo mount /dev/sdc1
fi
 
#network stuff
tunctl -t vmtap10
ip link set dev vmtap10 address 42:42:42:42:42:10
ifconfig vmtap10 192.168.42.1 up
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -A FORWARD -i vmtap10 -j ACCEPT
iptables -A FORWARD -o vmtap10 -m state --state RELATED,ESTABLISHED -j ACCEPT
 
#pactl set-sink-volume 2 50%
 
echo Starting virtual machine...
sleep 0.2
 
# QEMU stuff
sudo kvm -monitor stdio -vnc :1 -vga none \
-drive if=pflash,format=raw,file=./OVMF.fd -rtc base=localtime \
-m 4G -mem-path /dev/hugepages \
-cpu host -smp sockets=1,cores=6,threads=1 \
-soundhw ac97 \
-device vfio-pci,host=01:00.0 \
-usb -usbdevice host:046d:c215 \
-usb -device nec-usb-xhci,id=xhci \
$QEMU_KEYB \
$QEMU_MOUS \
-net nic,macaddr=42:42:42:42:42:42 -net tap,ifname=vmtap10,script=no,downscript=no,vhost=on \
-drive file=/media/user/VM/vm10.img,format=qcow2,cache=writeback \
-smb /media/ljosalfur \
-cdrom /home/user/Downloads/virtio-win-0.1.105.iso
 
# Rebind the devices for the host.
echo Adios vfio, reloading the host drivers for the passedthrough devices...
flipdriver $AUDIO_DEVICE $AUDIO_DRIVER
flipdriver $VGA_DEVICE $VGA_DRIVER
 
iptables -F
iptables -t nat -F POSTROUTING
ip link delete vmtap10
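
To see what the two `sed` expressions in the script produce, here is a small sketch that applies the same substitution to a single lsusb line. `usb_host_args` is a hypothetical helper name, and the sample line (bus 003, device 007, "Example Keyboard") is made up for illustration.

```shell
#!/bin/bash
# Sketch: turn an lsusb line for a given vendor:product ID into the QEMU
# usb-host arguments used in the script. Reads lsusb-style text on stdin.
usb_host_args() {
    local id="$1"
    sed -n 's/Bus \([0-9]*\) Device \([0-9]*\): ID '"$id"'.*/-device usb-host,bus=xhci.0,hostbus=\1,hostaddr=\2/p'
}

# Made-up sample line; on a real system the input would come from `lsusb`.
echo 'Bus 003 Device 007: ID 1b1c:1b08 Example Keyboard' | usb_host_args 1b1c:1b08
# prints: -device usb-host,bus=xhci.0,hostbus=003,hostaddr=007
```

If the device is not plugged in, nothing matches and the helper prints nothing, so the corresponding QEMU argument is simply left out.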
