Guest VM running FreeNAS drops into kdb mode on a host with an AMD Ryzen 9 5950X

Problem description:

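I used to run Ubuntu 20.04 on a host with an AMD Ryzen 3 3100, with KVM installed and a FreeNAS VM running, and everything worked fine. But once I replaced the CPU, the FreeNAS guest stopped working, while other guests running Ubuntu still run.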

Console log from the FreeNAS guest:

db> reboot
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.3-RELEASE-p14 #0 r325575+c936002dbe2(HEAD): Mon Sep 28 10:48:27 EDT 2020
    [email protected]:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64-DEBUG amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: AMD EPYC-Milan Processor (3400.05-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0xa00f11  Family=0x19  Model=0x1  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0xfff83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0xc003f7<LAHF,CMP,SVM,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology,PCXC>
  Structured Extended Features=0x211c07ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLWB,SHA>
  Structured Extended Features2=0x40060c<UMIP,PKU,RDPID>
  Structured Extended Features3=0xac000010<IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x69<RDCL_NO,SKIP_L1DFL_VME>
  AMD Extended Feature Extensions ID EBX=0x300d205<CLZERO,XSaveErPtr>
  SVM: NP,NRIP,NAsids=16
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 8489271296 (8096 MB)
avail memory = 8143572992 (7766 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s)
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
mlx5en: Mellanox Ethernet driver 3.5.1 (April 2019)
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
padlock0: No ACE support.
acpi0: <BOCHS BXPCRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc1a0-0xc1af at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0xc100-0xc11f mem 0xf4000000-0xf7ffffff,0xf8000000-0xfbffffff,0xfc094000-0xfc095fff irq 10 at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfc096000-0xfc096fff,0xfebf0000-0xfebf3fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 52:54:00:9b:85:3a
pci0: <multimedia, HDA> at device 4.0 (no driver attached)
uhci0: <Intel 82801I (ICH9) USB controller> port 0xc140-0xc15f irq 10 at device 5.0 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
uhci1: <Intel 82801I (ICH9) USB controller> port 0xc160-0xc17f irq 10 at device 5.1 on pci0
usbus1 on uhci1
usbus1: 12Mbps Full Speed USB v1.0
uhci2: <Intel 82801I (ICH9) USB controller> port 0xc180-0xc19f irq 11 at device 5.2 on pci0
usbus2 on uhci2
usbus2: 12Mbps Full Speed USB v1.0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc097000-0xfc097fff irq 11 at device 5.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
usbus3: 480Mbps High Speed USB v2.0
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfc098000-0xfc098fff,0xfebf4000-0xfebf7fff irq 10 at device 6.0 on pci0
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0ff mem 0xfebf8000-0xfebfbfff irq 11 at device 7.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfc099000-0xfc099fff,0xfebfc000-0xfebfffff irq 11 at device 8.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
vtblk0: 5723166MB (11721045168 512 byte sectors)
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
freenas_sysctl: adding account.
freenas_sysctl: adding directoryservice.
freenas_sysctl: adding middlewared.
freenas_sysctl: adding network.
freenas_sysctl: adding services.
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
ugen2.1: <Intel UHCI root HUB> at usbus2
ugen3.1: <Intel EHCI root HUB> at usbus3
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel UHCI root HUB> at usbus1
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 device
ada0: Serial Number QM00001
ada0: 16.700MB/s transfers (WDMA2, PIO 8192bytes)
ada0: 61440MB (125829120 512 byte sectors)
cd0 at ata0 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00002
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:freenas-boot/ROOT/default []...
Root mount waiting for: usbus3 usbus2 usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub1: 6 ports with 6 removable, self powered
Root mount waiting for: usbus3
ugen3.2: <QEMU QEMU USB Tablet> at usbus3
Starting devd.
warning: KLD '/boot/kernel-debug/uhid.ko' is newer than the linker.hints file
lo0: link state changed to UP


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311f30c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xfffffe02311c60c0
stack pointer           = 0x28:0xfffffe02311f1eb0
frame pointer           = 0x28:0xfffffe02311f1eb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02311f1b70
vpanic() at vpanic+0x17e/frame 0xfffffe02311f1bd0
panic() at panic+0x43/frame 0xfffffe02311f1c30
trap_fatal() at trap_fatal+0x369/frame 0xfffffe02311f1c80
trap_pfault() at trap_pfault+0x62/frame 0xfffffe02311f1cd0
trap() at trap+0x2b3/frame 0xfffffe02311f1de0
calltrap() at calltrap+0x8/frame 0xfffffe02311f1de0
--- trap 0xc, rip = 0xffffffff81016d09, rsp = 0xfffffe02311f1eb0, rbp = 0xfffffe02311f1eb0 ---
bcopy() at bcopy+0x19/frame 0xfffffe02311f1eb0
fpugetregs() at fpugetregs+0x192/frame 0xfffffe02311f1f00
get_mcontext() at get_mcontext+0x1b4/frame 0xfffffe02311f1f50
sys_getcontext() at sys_getcontext+0x56/frame 0xfffffe02311f2300
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe02311f2430
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02311f2430
--- syscall (421, FreeBSD ELF64, sys_getcontext), rip = 0x801c26280, rsp = 0x7fffffffd188, rbp = 0x7fffffffdcf0 ---
KDB: enter: panic
[ thread pid 99 tid 100490 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

The CPU information reported by the host BIOS (via dmidecode) is as follows:

dmidecode | grep "Processor Information" -A 54
Processor Information
    Socket Designation: AM4
    Type: Central Processor
    Family: Zen
    Manufacturer: Advanced Micro Devices, Inc.
    ID: 10 0F A2 00 FF FB 8B 17
    Signature: Family 25, Model 33, Stepping 0
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        HTT (Multi-threading)
    Version: AMD Ryzen 9 5950X 16-Core Processor
    Voltage: 1.1 V
    External Clock: 100 MHz
    Max Speed: 5050 MHz
    Current Speed: 3400 MHz
    Status: Populated, Enabled
    Upgrade: Socket AM4
    L1 Cache Handle: 0x0013
    L2 Cache Handle: 0x0014
    L3 Cache Handle: 0x0015
    Serial Number: Unknown
    Asset Tag: Unknown
    Part Number: Unknown
    Core Count: 16
    Core Enabled: 16
    Thread Count: 32
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control
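
The guest boot log reports `Id=0xa00f11`, while the host's dmidecode output reports `ID: 10 0F A2 00 ...`. A minimal sketch for decoding both values, assuming they are the CPUID leaf 1 EAX signature (with the dmidecode bytes read in little-endian order, i.e. `0x00a20f10`) and using the AMD rule that the extended fields apply when the base family is 0xF. The function name `decode_cpuid` is made up for this illustration:

```shell
# Decode a CPUID leaf 1 EAX signature into family/model/stepping.
# Assumption: AMD-style decoding, where the extended family and extended
# model fields are only used when the base family is 0xF.
decode_cpuid() {
    local eax=$(( $1 ))
    local stepping=$(( eax & 0xf ))
    local base_model=$(( (eax >> 4) & 0xf ))
    local base_family=$(( (eax >> 8) & 0xf ))
    local ext_model=$(( (eax >> 16) & 0xf ))
    local ext_family=$(( (eax >> 20) & 0xff ))
    local family=$base_family
    local model=$base_model
    if [ "$base_family" -eq 15 ]; then
        family=$(( base_family + ext_family ))
        model=$(( (ext_model << 4) | base_model ))
    fi
    printf 'family=%d model=%d stepping=%d\n' "$family" "$model" "$stepping"
}

decode_cpuid 0xa00f11   # guest (from the boot log): family=25 model=1 stepping=1
decode_cpuid 0xa20f10   # host (from dmidecode):     family=25 model=33 stepping=0
```

Under these assumptions the results agree with the logs above: the guest is presented with a family 25, model 1 CPU (which it names "EPYC-Milan"), whereas the physical host is family 25, model 33. So the guest is not seeing the host CPU as-is but a named CPU model chosen by the hypervisor.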

After rebooting from kdb, I found the following information:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311d00c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09
stack pointer           = 0x28:0xfffffe02311ceeb0
frame pointer           = 0x28:0xfffffe02311ceeb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:

Things I have tried:

  1. Reinstalling the guest: this failed with the same problem, and the guest still ends up in kdb mode
  2. Rebooting the host: this did not fix it

Questions:

  1. What can I do to collect more detailed information from kdb?
  2. How can I fix the problem?
  3. Does FreeNAS not support the AMD Ryzen 9 5950X 16-core processor?

Answer 1

With Wu's help, I was able to create a test VM from the FreeNAS OS image with the following command:

virt-install \
--name test \
--memory 8096 \
--vcpus 2 \
--cpu host-model-only \
--cdrom /var/lib/libvirt/isos/TrueNAS-12.0-U5.1.iso \
--disk size=30,bus=virtio \
--network type=direct,source=enp42s0,source_mode=bridge \
--os-type=linux  \
--os-variant freebsd11.3 \
--graphics vnc,listen=0.0.0.0,port=20012 \
--video vga --input tablet,bus=usb

After comparing the XML of the FreeNAS VM and the test VM, I changed the cpu element of the FreeNAS VM to the following:

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-Rome</model>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='spec-ctrl'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='virt-ssbd'/>
  </cpu>

Then I ran the following commands:

virsh destroy freeNas
virsh start freeNas

After that, the guest came back up.

For now, I do not know why this works, because the fix came from trial and error rather than from theory.
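
For anyone repeating the comparison step, here is a rough sketch of how the cpu elements of two domains could be pulled out and compared. The helper name `cpu_block` is made up for this example, and it assumes the XML layout that `virsh dumpxml` normally prints, with the opening and closing cpu tags on separate lines:

```shell
# Print the <cpu>...</cpu> element of a libvirt domain XML read from stdin.
# Assumption: the opening and closing tags are on different lines; a
# self-closing <cpu .../> element is not handled by this sketch.
# The [ >] after "<cpu" keeps <cputune> from matching.
cpu_block() {
    sed -n '/<cpu[ >]/,/<\/cpu>/p'
}

# Possible usage on the host (domain names taken from this post, bash syntax):
#   diff <(virsh dumpxml freeNas | cpu_block) <(virsh dumpxml test | cpu_block)
#   virsh edit freeNas      # replace the <cpu> element by hand
#   virsh destroy freeNas   # hard power-off, so the next start uses the new definition
#   virsh start freeNas
```

The full power-off and start matters here: a reboot issued from inside the guest keeps the same QEMU process, so an edited CPU definition would not take effect until the domain is actually stopped and started again.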
