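XEN: cannot start domain after upgrading from 18.04 to 20.04
I started upgrading my server from Ubuntu 18.04 to 20.04. After reinstalling xen/qemu in dom0, dom0 and all domU instances worked as expected.
Next, I updated the domU from 18.04 to 20.04. The update itself went smoothly, but the domU crashes on boot.
The domU configuration is as follows: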
#
# Hostname
#
name = 'smarthome'
#
# Kernel + memory size
#
kernel = '/usr/lib/grub-xen/grub-x86_64-xen.bin'
root = ''
extra = 'iommu=soft console=hvc0 earlyprintk=xen (xen/xvdb)/boot/grub/grub.cfg'
vcpus = '2'
cpus = '6-11'
memory = '4096'
localtime = 0
#
# Disk device(s).
#
disk = [
'/dev/pulsar02-vg/Xsmarthome-disk,,xvdb',
'/dev/pulsar02-vg/Xsmarthome-swap,,xvda',
]
#
# Networking
#
#vif = [ 'ip=192.168.20.250 ,mac=00:16:3E:2D:F8:6B,bridge=xbrlan' ]
pci = [
"0000:02:12.1,permissive=1" # SRV (VF)
]
#
# Behaviour
#
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'destroy'
The kernel messages when starting with xl create -c /etc/xen/Xsmarthome.cfg are:
(early) [ 1.652005] PM: Registered nosave memory: [mem 0xfed92000-0xfedfffff]
(early) [ 1.652008] PM: Registered nosave memory: [mem 0xfee00000-0xfeefffff]
(early) [ 1.652012] PM: Registered nosave memory: [mem 0xfef00000-0xfeffffff]
(early) [ 1.652015] PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
(early) [ 1.652019] [mem 0x90000000-0xdfffffff] available for PCI devices
(early) [ 1.652023] Booting paravirtualized kernel on Xen
(early) [ 1.652026] Xen version: 4.11.4-pre (preserve-AD)
(early) [ 1.652032] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
(early) [ 1.652040] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
(early) [ 1.652133] percpu: Embedded 54 pages/cpu s184320 r8192 d28672 u1048576
(early) [ 1.652174] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
(early) [ 1.652180] Built 1 zonelists, mobility grouping on. Total pages: 1032073
(early) [ 1.652184] Policy zone: Normal
(early) [ 1.652188] Kernel command line: root=UUID=024e1144-d306-4e5f-b67e-01db80827787 ro mitigations=off iommu=soft console=hvc0 earlyprintk=xen
(early) [ 1.652374] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
(early) [ 1.652439] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
(early) [ 1.652660] mem auto-init: stack:off, heap alloc:on, heap free:off
(early) [ 1.698831] software IO TLB: mapped [mem 0x178200000-0x17c200000] (64MB)
(early) [ 1.706256] Memory: 3960888K/4193916K available (14339K kernel code, 2398K rwdata, 4956K rodata, 2716K init, 4988K bss, 233028K reserved, 0K cma-reserved)
(early) [ 1.706269] random: get_random_u64 called from kmem_cache_open+0x2d/0x410 with crng_init=0
(early) [ 1.706471] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
(early) [ 1.706906] ftrace: allocating 44527 entries in 174 pages
(early) [ 1.718159] rcu: Hierarchical RCU implementation.
(early) [ 1.718164] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2.
(early) [ 1.718168] Tasks RCU enabled.
(early) [ 1.718171] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
(early) [ 1.718174] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
(early) [ 1.721252] Using NULL legacy PIC
(early) [ 1.721257] NR_IRQS: 524544, nr_irqs: 48, preallocated irqs: 0
(early) [ 1.721304] xen:events: Using FIFO-based ABI
(early) [ 1.721478] random: crng done (trusting CPU's manufacturer)
(early) [ 1.721508] Console: colour dummy device 80x25
(early) [ 1.721597] printk: console [tty0] enabled
[ 1.721604] printk: console [hvc0] enabled
(early) [ 1.721604] printk: console [hvc0] enabled
[ 1.721610] printk: bootconsole [xenboot0] disabled
(early) [ 1.721610] printk: bootconsole [xenboot0] disabled
[ 1.721636] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 1.721650] installing Xen timer for CPU 0
[ 1.721677] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2fc04a0142b, max_idle_ns: 440795346615 ns
[ 1.721687] Calibrating delay loop (skipped), value calculated using timer frequency.. 6625.46 BogoMIPS (lpj=13250936)
[ 1.721694] pid_max: default: 32768 minimum: 301
[ 1.721752] LSM: Security Framework initializing
[ 1.721772] Yama: becoming mindful.
[ 1.721818] AppArmor: AppArmor initialized
[ 1.721900] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 1.721910] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 1.721943] *** VALIDATE tmpfs ***
[ 1.722115] *** VALIDATE proc ***
[ 1.722214] *** VALIDATE cgroup1 ***
[ 1.722218] *** VALIDATE cgroup2 ***
(early) Poking(early) KASLR using(early) RDRAND(early) RDTSC(early) ...
[ 1.722386] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 1.722391] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[ 1.722398] Speculative Store Bypass: Vulnerable
[ 1.722402] SRBDS: Unknown: Dependent on hypervisor status
[ 1.762961] cpu 0 spinlock event irq 1
[ 1.762967] VPMU disabled by hypervisor.
[ 1.763121] Performance Events: unsupported p6 CPU model 158 no PMU driver, software events only.
[ 1.763173] rcu: Hierarchical SRCU implementation.
[ 1.763631] NMI watchdog: Perf NMI watchdog permanently disabled
[ 1.763674] smp: Bringing up secondary CPUs ...
[ 1.763790] installing Xen timer for CPU 1
[ 1.763811] SMP alternatives: switching to SMP code
[ 1.803767] cpu 1 spinlock event irq 13
[ 1.803767] smp: Brought up 1 node, 2 CPUs
[ 1.803767] smpboot: Max logical packages: 1
[ 1.803767] devtmpfs: initialized
[ 1.803767] x86/mm: Memory block size: 128MB
[ 1.803767] PM: Registering ACPI NVS region [mem 0x836a9000-0x836a9fff] (4096 bytes)
[ 1.803767] PM: Registering ACPI NVS region [mem 0x8ccc5000-0x8cda6fff] (925696 bytes)
[ 1.803767] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 1.803767] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 1.803767] pinctrl core: initialized pinctrl subsystem
[ 1.822672] PM: RTC time: 165:165:165, date: 2065-165-165
[ 1.822823] NET: Registered protocol family 16
[ 1.822837] xen:grant_table: Grant tables using version 1 layout
[ 1.849698] Grant table initialized
[ 1.849738] audit: initializing netlink subsys (disabled)
[ 1.849751] audit: type=2000 audit(1603042631.327:1): state=initialized audit_enabled=0 res=1
[ 1.849751] EISA bus registered
[ 1.850622] PCI: setting up Xen PCI frontend stub
[ 1.853869] fbcon: Taking over console
[ 1.853869] ACPI: Interpreter disabled.
[ 1.853869] xen:balloon: Initialising balloon driver
[ 1.853869] iommu: Default domain type: Translated
[ 1.853869] SCSI subsystem initialized
[ 1.853869] vgaarb: loaded
[ 1.853869] usbcore: registered new interface driver usbfs
[ 1.853869] usbcore: registered new interface driver hub
[ 1.853869] usbcore: registered new device driver usb
[ 1.853869] pps_core: LinuxPPS API ver. 1 registered
[ 1.853869] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 1.853869] PTP clock support registered
[ 1.853869] EDAC MC: Ver: 3.0.0
[ 1.853869] PCI: System does not support PCI
[ 1.853869] NetLabel: Initializing
[ 1.853869] NetLabel: domain hash size = 128
[ 1.853869] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 1.853869] NetLabel: unlabeled traffic allowed by default
[ 1.858197] clocksource: Switched to clocksource xen
[ 1.864969] *** VALIDATE bpf ***
[ 1.865005] VFS: Disk quotas dquot_6.6.0
[ 1.865017] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.865033] *** VALIDATE ramfs ***
[ 1.865036] hugetlbfs: disabling because there are no supported hugepage sizes
[ 1.865085] AppArmor: AppArmor Filesystem Enabled
[ 1.865099] pnp: PnP ACPI: disabled
[ 1.866763] thermal_sys: Registered thermal governor 'fair_share'
[ 1.866763] thermal_sys: Registered thermal governor 'bang_bang'
[ 1.866785] thermal_sys: Registered thermal governor 'step_wise'
[ 1.866790] thermal_sys: Registered thermal governor 'user_space'
[ 1.866794] thermal_sys: Registered thermal governor 'power_allocator'
[ 1.866838] NET: Registered protocol family 2
[ 1.866947] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[ 1.866968] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[ 1.867010] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[ 1.867047] TCP: Hash tables configured (established 32768 bind 32768)
[ 1.867066] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[ 1.867077] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[ 1.867105] NET: Registered protocol family 1
[ 1.867112] NET: Registered protocol family 44
[ 1.867117] PCI: CLS 0 bytes, default 64
[ 1.867138] Trying to unpack rootfs image as initramfs...
[ 2.218397] Freeing initrd memory: 43228K
[ 2.218495] check: Scanning for low memory corruption every 60 seconds
[ 2.218936] Initialise system trusted keyrings
[ 2.218950] Key type blacklist registered
[ 2.219018] workingset: timestamp_bits=36 max_order=20 bucket_order=0
[ 2.219708] zbud: loaded
[ 2.219896] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 2.220102] fuse: init (API version 7.31)
[ 2.220121] *** VALIDATE fuse ***
[ 2.220125] *** VALIDATE fuse ***
[ 2.220207] Platform Keyring initialized
[ 2.222191] Key type asymmetric registered
[ 2.222195] Asymmetric key parser 'x509' registered
[ 2.222204] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[ 2.222244] io scheduler mq-deadline registered
[ 2.222316] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 2.222711] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 2.223862] Linux agpgart interface v0.103
[ 2.224871] loop: module loaded
[ 2.224876] Invalid max_queues (4), will use default max: 2.
[ 2.225814] libphy: Fixed MDIO Bus: probed
[ 2.225821] tun: Universal TUN/TAP device driver, 1.6
[ 2.225857] PPP generic driver version 2.4.2
[ 2.225890] xen_netfront: Initialising Xen virtual ethernet driver
[ 2.225922] VFIO - User Level meta-driver version: 0.3
[ 2.225977] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 2.225983] ehci-pci: EHCI PCI platform driver
[ 2.225991] ehci-platform: EHCI generic platform driver
[ 2.226002] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 2.226011] ohci-pci: OHCI PCI platform driver
[ 2.226018] ohci-platform: OHCI generic platform driver
[ 2.226023] uhci_hcd: USB Universal Host Controller Interface driver
[ 2.226060] i8042: PNP: No PS/2 controller found.
[ 2.226064] i8042: Probing ports directly.
[ 3.238830] i8042: No controller found
[ 3.238859] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2fc04a0142b, max_idle_ns: 440795346615 ns
[ 3.238984] mousedev: PS/2 mouse device common for all mice
[ 3.239072] i2c /dev entries driver
[ 3.239103] device-mapper: uevent: version 1.0.3
[ 3.239168] device-mapper: ioctl: 4.41.0-ioctl (2019-09-16) initialised: dm-devel@redhat.com
[ 3.239189] platform eisa.0: Probing EISA bus 0
[ 3.239213] platform eisa.0: EISA: Detected 0 cards
[ 3.239222] intel_pstate: CPU model not supported
[ 3.239263] ledtrig-cpu: registered to indicate activity on CPUs
[ 3.239301] BUG: unable to handle page fault for address: ffffc900401d3818
[ 3.239308] #PF: supervisor read access in kernel mode
[ 3.239312] #PF: error_code(0x0000) - not-present page
[ 3.239315] PGD 7ec72067 P4D 7ec72067 PUD 177d50067 PMD 177d51067 PTE 0
[ 3.239321] Oops: 0000 [#1] SMP NOPTI
[ 3.239325] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-51-generic #56-Ubuntu
[ 3.239333] RIP: e030:pmc_core_probe+0x134/0x17f
[ 3.239337] Code: 82 48 c7 c7 68 70 d7 82 e8 a9 e5 80 ff 48 8b 05 82 2c 48 01 48 c7 83 88 00 00 00 40 70 d7 82 48 63 40 44 48 03 05 64 2c 48 01 <8b> 00 48 8b 15 63 2c 48 01 48 c7 c7 60 db 14 82 8b 4a 48 ba 01 00
[ 3.239349] RSP: e02b:ffffc9004000bbc8 EFLAGS: 00010286
[ 3.239353] RAX: ffffc900401d3818 RBX: ffffffff827dbe80 RCX: 80000000fe001073
[ 3.239358] RDX: ffffffff82d77020 RSI: ffffffff8242e365 RDI: ffffffff82d77068
[ 3.239363] RBP: ffffc9004000bbe0 R08: 0000000000000000 R09: ffffc9004000ba80
[ 3.239368] R10: 0000000000007ff0 R11: ffff888177f96900 R12: ffffffff827dbe90
[ 3.239373] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.239384] FS: 0000000000000000(0000) GS:ffff88817c600000(0000) knlGS:0000000000000000
[ 3.239389] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.239394] CR2: ffffc900401d3818 CR3: 000000000260a000 CR4: 0000000000040660
[ 3.239402] Call Trace:
[ 3.239408] platform_drv_probe+0x3b/0x80
[ 3.239412] really_probe+0x2b3/0x3e0
[ 3.239415] driver_probe_device+0xbc/0x100
[ 3.239419] __device_attach_driver+0x71/0xd0
[ 3.239423] ? driver_allows_async_probing+0x50/0x50
[ 3.239427] bus_for_each_drv+0x84/0xd0
[ 3.239431] __device_attach+0xed/0x170
[ 3.239434] device_initial_probe+0x13/0x20
[ 3.239437] bus_probe_device+0x8f/0xa0
[ 3.239441] device_add+0x3c7/0x6b0
[ 3.239444] platform_device_add+0xf9/0x240
[ 3.239448] platform_device_register+0x6b/0x70
[ 3.239455] ? pmc_core_driver_init+0x19/0x19
[ 3.239459] pmc_core_platform_init+0x43/0x45
[ 3.239464] do_one_initcall+0x4a/0x1fa
[ 3.239469] kernel_init_freeable+0x1b2/0x255
[ 3.239474] ? rest_init+0xb0/0xb0
[ 3.239478] kernel_init+0xe/0x100
[ 3.239481] ret_from_fork+0x1f/0x40
[ 3.239485] Modules linked in:
[ 3.239488] CR2: ffffc900401d3818
[ 3.239494] ---[ end trace c5dde7a582f9e4e9 ]---
[ 3.239499] RIP: e030:pmc_core_probe+0x134/0x17f
[ 3.239502] Code: 82 48 c7 c7 68 70 d7 82 e8 a9 e5 80 ff 48 8b 05 82 2c 48 01 48 c7 83 88 00 00 00 40 70 d7 82 48 63 40 44 48 03 05 64 2c 48 01 <8b> 00 48 8b 15 63 2c 48 01 48 c7 c7 60 db 14 82 8b 4a 48 ba 01 00
[ 3.239513] RSP: e02b:ffffc9004000bbc8 EFLAGS: 00010286
[ 3.239517] RAX: ffffc900401d3818 RBX: ffffffff827dbe80 RCX: 80000000fe001073
[ 3.239522] RDX: ffffffff82d77020 RSI: ffffffff8242e365 RDI: ffffffff82d77068
[ 3.239527] RBP: ffffc9004000bbe0 R08: 0000000000000000 R09: ffffc9004000ba80
[ 3.239532] R10: 0000000000007ff0 R11: ffff888177f96900 R12: ffffffff827dbe90
[ 3.239538] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.239547] FS: 0000000000000000(0000) GS:ffff88817c600000(0000) knlGS:0000000000000000
[ 3.239552] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.239556] CR2: ffffc900401d3818 CR3: 000000000260a000 CR4: 0000000000040660
[ 3.239565] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 3.239574] Kernel Offset: disabled
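On further attempts to start the domU, I found that the crash only occurs when PCI passthrough is used for the network card (i350 4-port, SR-IOV VF). With the pci line disabled, the domU boots with the Ubuntu 20.04 kernel 5.4.x.
Another attempt showed that the domU with PCI passthrough runs under Ubuntu 20.04 with an older kernel, e.g. 4.15.x or 5.0.x.
Are there new parameters for the domU boot command line with kernel 5.4.x, or do I have to load any Xen-related modules in the domU via the initramfs?
Why does the domU crash with kernel 5.4.x, and how can I fix it? Has anyone else had a similar experience? Is this a bug in Intel's pmc_core_probe?
Kind regards
Torsten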
Answer 1
Try adding swiotlb=force to your domU grub.cfg or to the extra section of xen.cfg.
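A minimal sketch of the second option, assuming the rest of the domU configuration from the question stays unchanged and only swiotlb=force is added to the existing extra line. Whether this resolves the pmc_core_probe crash on this setup is not verified here.

```
# /etc/xen/Xsmarthome.cfg (fragment)
# Assumption: only swiotlb=force is new; all other options are copied from the question.
extra = 'iommu=soft swiotlb=force console=hvc0 earlyprintk=xen (xen/xvdb)/boot/grub/grub.cfg'
```

After recreating the domU with xl create -c /etc/xen/Xsmarthome.cfg, the "Kernel command line:" entry in the boot messages should show whether the parameter reached the guest kernel.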