Nested QEMU-KVM hangs with the kernel message "watchdog: BUG: soft lockup - CPU#3 stuck for 134s!"


I have a problem with QEMU-KVM when running one QEMU-KVM virtual machine inside another on Ubuntu Server 22.04.

The first VM is started in privileged mode through the libvirt-based Virtual Machine Manager. Its command line, as shown by ps -A -o pid,tty,etime,user,cmd | grep qemu, is:

1578678 ?              02:28 root     /usr/lib/qemu/virtiofsd --fd=38 -o source=/home/[email protected]/work/109Server
1578694 ?              02:27 libvirt+ /usr/bin/qemu-system-x86_64 -name guest=obmchost-session,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-obmchost-session/master-key.aes"} -machine pc-q35-6.2,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram -accel kvm -cpu host,migratable=on -m 8192 -object {"qom-type":"memory-backend-memfd","id":"pc.ram","share":true,"x-use-canonical-path-for-ramblock-id":false,"size":8589934592} -overcommit mem-lock=off -smp 8,sockets=8,cores=1,threads=1 -uuid 3bcc155f-f5a6-4f79-8025-ace56672b355 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=37,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 -device pcie-root-port,port=24,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 -device pcie-root-port,port=25,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 -device pcie-root-port,port=26,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 -device pcie-root-port,port=27,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 -device pcie-root-port,port=28,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 -device pcie-root-port,port=29,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 -device 
qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 -blockdev {"driver":"file","filename":"/home/[email protected]/work/109Server/158/vm/host/ubuntu22.04-host.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -chardev socket,id=chr-vu-fs0,path=/var/lib/libvirt/qemu/domain-2-obmchost-session/fs0-fs.sock -device vhost-user-fs-pci,id=fs0,chardev=chr-vu-fs0,tag=/mnt/host-work-sources,bus=pci.1,addr=0x0 -netdev tap,fd=38,id=hostnet0,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3b:01:02,bus=pci.2,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=36,server=on,wait=off -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -audiodev {"id":"audio1","driver":"spice"} -spice port=5900,addr=127.0.0.1,disable-ticketing=on,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0,audiodev=audio1 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -object 
{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
1578716 ?              02:27 root     /usr/lib/qemu/virtiofsd --fd=38 -o source=/home/[email protected]/work/109Server

Inside that VM, I try to start another VM manually with the following script:

#!/usr/bin/env bash
 
SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
 
export LC_ALL=C
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
export HOME="$SCRIPTPATH/../vm/bmc"
export XDG_DATA_HOME="$HOME/.local/share"
export XDG_CACHE_HOME="$HOME/.cache"
export XDG_CONFIG_HOME="$HOME/.config"
export QEMU_AUDIO_DRV=none
 
_boot_once=""
 
# shellcheck disable=2154
while getopts ":b" _opt; do
    case "$_opt" in
        "b")
            _boot_once="-boot once=d"
            ;;
        "?")
            printf "Invalid option: -%s\n" "$OPTARG" >&2
            ;;
    esac
done
 
_uuid="$(uuid)"
 
for _direction in "in" "out"; do
    [ ! -p "$HOME/ipmi-$_direction.pipe" ] && mkfifo "$HOME/ipmi-$_direction.pipe"
done
 
#    -chardev socket,path="$HOME/i2c-0.sock0",id=virt-i2c-0 \
#    -device vhost-user-i2c-pci,chardev=virt-i2c-0,id=i2c-0 \
#    \
# Example is here: https://developers.redhat.com/blog/2020/03/06/configure-and-run-a-qemu-based-vm-outside-of-libvirt#create_a_boot_script_from_the_qemu_command
# TODO: networks... 
# shellcheck disable=2140
qemu-system-x86_64 \
    -name BMC-Ubuntu-22.04.3,debug-threads=on \
    -machine pc-q35-6.2,accel=kvm,dump-guest-core=off,vmport=off,hmat=on \
    -cpu host,hypervisor=on \
    -m 6144M \
    -object memory-backend-file,id=mem,size=6G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem \
    -overcommit mem-lock=off \
    -smp 2,sockets=1,cores=2,threads=1,maxcpus=2 \
    -device i6300esb \
    -uuid "$_uuid" \
    -no-user-config \
    -nodefaults \
    -chardev socket,id=charmonitor,path="$HOME/monitor.sock",server=on,telnet=on,mux=on,wait=off \
    -mon chardev=charmonitor,id=monitor,mode=readline \
    -chardev socket,id=qmpmonitor,path="$HOME/monitor-qmp.sock",server=on,wait=off \
    -mon chardev=qmpmonitor,id=qmp_chardev,mode=control,pretty=on \
    -rtc base=utc,driftfix=slew \
    -global kvm-pit.lost_tick_policy=discard \
    -no-shutdown \
    -global ICH9-LPC.disable_s3=1 \
    -global ICH9-LPC.disable_s4=1 \
    \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.1,addr=0x0 \
    -serial mon:telnet::4444,server=on,wait=off \
    \
    -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
    -device virtio-serial-pci,id=virtio-serial1,bus=pci.2,addr=0x0 \
    -serial mon:telnet::4445,server=on,wait=off \
    \
    -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
    -netdev bridge,id=bridgenet0,br=qemu-br0 \
    -device virtio-net-pci,netdev=bridgenet0,id=net0,mac=52:54:00:c9:2d:4f,bus=pci.3,addr=0x0 \
    \
    -drive file="$SCRIPTPATH/../images/ubuntu-22.04.3-live-server-amd64.iso",index=0,media=cdrom \
    \
    -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
    -drive file="$HOME/bmc-x86_64.qcow2",format=qcow2,if=none,id=drive-virtio-disk0,index=1,aio=threads \
    -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0 \
    \
    -chardev socket,id="chr-vu-host-fs0",path="$HOME/host-fs0.sock" \
    -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
    -device vhost-user-fs-pci,id="host-fs0",chardev=chr-vu-host-fs0,tag="/host/work-sources",bus=pci.5,addr=0x0 \
    \
    -chardev pipe,id=pipe_ch0,path="$HOME/ipmi-in.pipe" \
    -device virtserialport,chardev=pipe_ch0,name=serial0 \
    \
    -chardev pipe,id=pipe_ch1,path="$HOME/ipmi-out.pipe" \
    -device virtserialport,chardev=pipe_ch1,name=serial1 \
    \
    -msg timestamp=on \
    $_boot_once \
    -display none \
    -nographic

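When I try to install an OS on it, or sometimes when I compile software on a virtiofs-based filesystem mounted inside the nested VM, it hangs with the following kernel message: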

watchdog: BUG: soft lockup - CPU#3 stuck for 134s!

Only a reboot helps.

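I used the softlockup_panic=1 kernel option to turn the lockup into a panic with a stack trace, and got the following trace (I could only capture this part from tmux, because the terminal output was mixed up with the Ubuntu installer UI):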

[   88.672579] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
[   88.672579] RDX: ffffe8bf47846200 RSI: ffff99ad984c6500 RDI: ffff99ae61188000
[   88.672579] RBP: ffffbccc40b5ba38 R08: ffffe8bf47846240 R09: 0000000000000001
[   88.672579] R10: 0000000000000293 R11: ffff99ae79d366e0 R12: ffff99ae7ffd6b80
[   88.672579] R13: 0000000000000901 R14: ffff99ae7ffd7cc0 R15: 0000000000000000
[   88.672579] FS:  00007f99a54be740(0000) GS:ffff99ae79d00000(0000) knlGS:0000000000000000
[   88.672579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.672579] CR2: 0000562f966e7580 CR3: 000000011430a000 CR4: 0000000000750ee0
[   88.672579] PKRU: 55555554
[   88.672579] Call Trace:
[   88.672579]  <TASK>
[   88.672579]  ? kernel_init_free_pages.part.0+0x4a/0x70
[   88.672579]  get_page_from_freelist+0x353/0x540
[   88.672579]  ? _copy_to_iter+0xd7/0x710
[   88.672579]  __alloc_pages+0x17e/0x330
[   88.672579]  alloc_pages+0x9e/0x1e0
[   88.672579]  __page_cache_alloc+0x7e/0x90
[   88.672579]  pagecache_get_page+0x152/0x590
[   88.672579]  grab_cache_page_write_begin+0x21/0x40
[   88.672579]  ext4_da_write_begin+0xec/0x2c0
[   88.672579]  generic_perform_write+0xc6/0x200
[   88.672579]  ? file_update_time+0x66/0x140
[   88.672579]  ext4_buffered_write_iter+0xac/0x180
[   88.672579]  ext4_file_write_iter+0x43/0x60
[   88.672579]  new_sync_write+0x111/0x1a0
[   88.672579]  vfs_write+0x1d5/0x270
[   88.672579]  ksys_write+0x67/0xf0
[   88.672579]  __x64_sys_write+0x19/0x20
[   88.672579]  do_syscall_64+0x59/0xc0
[   88.672579]  ? ksys_lseek+0x85/0xc0
[   88.672579]  ? exit_to_user_mode_prepare+0x37/0xb0
[   88.672579]  ? syscall_exit_to_user_mode+0x27/0x50
[   88.672579]  ? __x64_sys_lseek+0x18/0x20
[   88.672579]  ? do_syscall_64+0x69/0xc0
[   88.672579]  ? do_syscall_64+0x69/0xc0
[   88.672579]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   88.672579] RIP: 0033:0x7f99a55d5a37
[   88.672579] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   88.672579] RSP: 002b:00007ffdcf317098 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   88.672579] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f99a55d5a37
[   88.672579] RDX: 0000000000000400 RSI: 000055ce0b672e80 RDI: 0000000000000001
[   88.672579] RBP: 0000000000000400 R08: 0000000000000400 R09: 00000000000ffffe
[   88.672579] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000400
[   88.672579] R13: 000055ce0b672e80 R14: 0000000000008000 R15: 000055ce0b672e80
[   88.672579]  </TASK>
[   88.668575] Shutting down cpus with NMI
[   88.668575] Kernel Offset: 0xec00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   88.668575] ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
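For anyone reproducing this, the watchdog settings behind the trace above can be inspected from inside the guest. The following is only a minimal sketch: it assumes a Linux guest that exposes kernel.softlockup_panic and kernel.watchdog_thresh under /proc/sys, and the helper names read_sysctl and soft_lockup_threshold are made up for illustration. The kernel documentation gives the soft-lockup threshold as twice watchdog_thresh.

```shell
#!/usr/bin/env bash
# Sketch: inspect the soft-lockup watchdog settings of the running kernel.
# read_sysctl and soft_lockup_threshold are hypothetical helper names.

# Print the value of a sysctl by reading /proc/sys directly,
# or "n/a" when the key is not readable on this system.
read_sysctl() {
    local path
    path="/proc/sys/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "n/a"
    fi
}

# The kernel documents the soft-lockup threshold as 2 * watchdog_thresh.
soft_lockup_threshold() {
    echo $(( 2 * $1 ))
}

thresh="$(read_sysctl kernel.watchdog_thresh)"
echo "kernel.softlockup_panic = $(read_sysctl kernel.softlockup_panic)"
echo "kernel.watchdog_thresh  = $thresh"

# Only do the arithmetic when a numeric value was actually read.
case "$thresh" in
    ''|*[!0-9]*) ;;
    *) echo "soft lockup threshold  = $(soft_lockup_threshold "$thresh")s" ;;
esac
```

If the panic is wanted at runtime rather than from the kernel command line, running sysctl -w kernel.softlockup_panic=1 as root should have the same effect, assuming the kernel was built with the lockup detector.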

What am I doing wrong, and how can I get a stable nested VM inside the main VM?

My bare-metal host system is Ubuntu Desktop 22.04.3.

uname -a

Linux georgyodisharia-workstation 6.2.0-1003-nvidia #3~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed May 31 15:37:08 UTC 20 x86_64 x86_64 x86_64 GNU/Linux

lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          24
On-line CPU(s) list:             0-23
Vendor ID:                       GenuineIntel
Model name:                      12th Gen Intel(R) Core(TM) i9-12900K
CPU family:                      6
Model:                           151
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
Stepping:                        2
CPU max MHz:                     5200,0000
CPU min MHz:                     800,0000
BogoMIPS:                        6374.40
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization:                  VT-x
L1d cache:                       640 KiB (16 instances)
L1i cache:                       768 KiB (16 instances)
L2 cache:                        14 MiB (10 instances)
L3 cache:                        30 MiB (1 instance)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-23
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

lsmem

RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x0000000037ffffff  896M online       yes    0-6
0x0000000100000000-0x00000008bfffffff   31G online       yes 32-279

Memory block size:       128M
Total online memory:    31,9G
Total offline memory:      0B

Answer 1

Perhaps you are overcommitting the CPU: if the CPUs on your host are busy or blocked by other VMs, your VM's vCPUs may not respond for several seconds.
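One way to test the overcommit theory is to watch steal time inside the guest while the workload runs. The sketch below is only an illustration under a few assumptions: it runs inside a Linux guest, reads the aggregate "cpu" line of /proc/stat (where the eighth counter is steal time), and uses a hypothetical helper name, steal_percent. A consistently high percentage would suggest the vCPUs are waiting for physical CPU time. A value near zero does not rule out other causes.

```shell
#!/usr/bin/env bash
# Sketch: estimate the share of CPU time stolen from this guest.
# steal_percent is a hypothetical helper name.

# steal_percent "<cpu line at t0>" "<cpu line at t1>"
# Each argument is the first line of /proc/stat:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
# Guest time is already counted in user time, so only fields 2..9 are summed.
steal_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0; for (i = 2; i <= 9; i++) total += $i; steal = $9 }
        NR == 1 { t0 = total; s0 = steal }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%d\n", (steal - s0) * 100 / dt
            else print 0
        }'
}

# Take two samples one second apart when /proc/stat is available.
if [ -r /proc/stat ]; then
    before="$(head -n 1 /proc/stat)"
    sleep 1
    after="$(head -n 1 /proc/stat)"
    echo "online CPUs: $(nproc)"
    echo "steal time over 1s: $(steal_percent "$before" "$after")%"
fi
```

Running it in both the first VM and the nested VM during a compile should show at which level, if any, the vCPUs are being starved.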
