How to make the kernel print a stack trace when it receives a hardware NMI


I have Qemu VMs running FreeBSD, Windows, and Linux, and I can send each of them a hardware NMI through the Qemu monitor.

    qm monitor 100
    Entering Qemu Monitor for VM 100 - type 'help' for help
    qm> help nmi
    nmi -- inject an NMI

When I send the NMI to a Windows VM, I get a message saying that a crash dump is being saved, and then the VM reboots.

On Linux I get this message:

    [26731.911302] Uhhuh. NMI received for unknown reason 31 on CPU 0.
    [26731.911303] Do you have a strange power saving mode enabled?
    [26731.911304] Dazed and confused, but trying to continue

How can I make the kernel print a stack trace on the console instead of just this message?

I need this to debug VMs that hang because of very slow IO.

Answer 1

It turns out that the way to do this on Linux is through sysctl:

    sysctl kernel.unknown_nmi_panic=1

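With this parameter set, an injected NMI makes the kernel panic and print a stack trace on the console (a serial console in my case, but I don't think that matters here):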

    [ 253.697690] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.13.4-1-pve #1
    [ 253.697691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
    [ 253.697691] Call Trace:
    [ 253.697692] <NMI>
    [ 253.697697] dump_stack+0x63/0x8b
    [ 253.697699] panic+0xe4/0x23d
    [ 253.697700] nmi_panic+0x39/0x40
    [ 253.697703] unknown_nmi_error+0x77/0x90
    [ 253.697704] default_do_nmi+0xe7/0x110
    [ 253.697705] do_nmi+0x119/0x180
    [ 253.697707] end_repeat_nmi+0x1a/0x1e
    [ 253.697710] RIP: 0010:native_safe_halt+0x6/0x10
    [ 253.697711] RSP: 0018:ffffffffb0803de0 EFLAGS: 00000246
    [ 253.697712] RAX: 0000000000000000 RBX: ffffffffb0810480 RCX: 0000000000000000
    [ 253.697712] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [ 253.697713] RBP: ffffffffb0803de0 R08: 0000006e031e1414 R09: ffffb9a70831fd00
    [ 253.697714] R10: 0000000000000000 R11: 8000003b10fbadff R12: 0000000000000000
    [ 253.697714] R13: ffffffffb0810480 R14: 0000000000000000 R15: 0000000000000000
    [ 253.697717] ? native_safe_halt+0x6/0x10
    [ 253.697718] ? native_safe_halt+0x6/0x10
    [ 253.697718] </NMI>
    [ 253.697720] default_idle+0x20/0x100
    [ 253.697721] arch_cpu_idle+0xf/0x20
    [ 253.697723] default_idle_call+0x23/0x30
    [ 253.697725] do_idle+0x17c/0x200
    [ 253.697726] cpu_startup_entry+0x73/0x80
    [ 253.697727] rest_init+0xbc/0xc0
    [ 253.697733] start_kernel+0x4d2/0x4f3
    [ 253.697745] ? early_idt_handler_array+0x120/0x120
    [ 253.697746] x86_64_start_reservations+0x24/0x26
    [ 253.697747] x86_64_start_kernel+0x14f/0x172
    [ 253.697748] secondary_startup_64+0x9f/0x9f
    [ 253.697875] Kernel Offset: 0x2ea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
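
The sysctl command above only changes the running kernel, so the setting is lost on reboot. If it needs to survive a reboot, one option is a drop-in file, roughly as sketched below. This is a minimal sketch, not something from the original answer: it assumes a distribution that reads /etc/sysctl.d, and both the helper name write_nmi_dropin and the file name 90-unknown-nmi-panic.conf are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that enables
# kernel.unknown_nmi_panic at boot. The target directory defaults to
# /etc/sysctl.d (an assumption about the distribution) and can be
# overridden with the first argument.
write_nmi_dropin() {
    dir="${1:-/etc/sysctl.d}"
    # The file name is an arbitrary, illustrative choice.
    printf 'kernel.unknown_nmi_panic = 1\n' > "$dir/90-unknown-nmi-panic.conf"
}

# Usage, as root:
#   write_nmi_dropin        # writes /etc/sysctl.d/90-unknown-nmi-panic.conf
#   sysctl --system         # reload all sysctl configuration files
#   cat /proc/sys/kernel/unknown_nmi_panic   # should now print 1
```

Keep in mind that with this setting an unknown NMI panics the guest rather than letting it continue, so it is probably best enabled only on VMs that are being debugged.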
