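I have a serious problem on two Hyper-V hosts running Windows Server 2016 Datacenter.
They crash without any apparent external trigger, taking all guest VMs down with them. The STOP code is UNEXPECTED_KERNEL_MODE_TRAP 0x0000007f, caused by vmswitch.sys. A simple reboot does not fix it: the host always crashes 8 to 10 times in a row before it runs normally again.
The problem occurs on two servers with completely different hardware. I have tried several versions of the NIC driver, but none of them helped. VMQ is disabled both on the NICs and in the VM settings.
I updated all drivers and the BIOS to the latest versions. On one server I also replaced the Ethernet adapter, but the problem persists.
You can download the most recent minidump here: https://dl.dropboxusercontent.com/u/76615769/110416-20546-01.dmp
I analyzed the most recent memory.dmp with WinDbg and found that the fault is raised in vmswitch!VmsPktParseIPv4Packet.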
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault). The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: fffff80234218e70
Arg3: fffff80234202fc0
Arg4: fffff80d94e0fa0a
Debugging Details:
------------------
DUMP_CLASS: 1
DUMP_QUALIFIER: 402
BUILD_VERSION_STRING: 14393.447.amd64fre.rs1_release_inmarket.161102-0100
SYSTEM_MANUFACTURER: LENOVO
SYSTEM_PRODUCT_NAME: Lenovo ThinkServer TS430
SYSTEM_SKU: OEM_String
SYSTEM_VERSION: 04411GG
BIOS_VENDOR: LENOVO
BIOS_VERSION: 4.25
BIOS_DATE: 08/08/2016
BASEBOARD_MANUFACTURER: LENOVO
BASEBOARD_PRODUCT: GA-6UASV2
BASEBOARD_VERSION: N/A
DUMP_TYPE: 0
BUGCHECK_P1: 8
BUGCHECK_P2: fffff80234218e70
BUGCHECK_P3: fffff80234202fc0
BUGCHECK_P4: fffff80d94e0fa0a
BUGCHECK_STR: 0x7f_8
TRAP_FRAME: fffff80234218e70 -- (.trap 0xfffff80234218e70)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffff622415d5af2e rbx=0000000000000000 rcx=ffffd982943bfd10
rdx=0000000000000065 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80d94e0fa0a rsp=fffff80234202fc0 rbp=0000000000000000
r8=0000000000000366 r9=fffff80234206cb0 r10=0000000000000000
r11=fffff80d94e57f83 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
vmswitch!VmsPktParseIPv4Packet+0x1a:
fffff80d`94e0fa0a 4c89442438 mov qword ptr [rsp+38h],r8 ss:0018:fffff802`34202ff8=????????????????
Resetting default scope
CPU_COUNT: 8
CPU_MHZ: d40
CPU_VENDOR: GenuineIntel
CPU_FAMILY: 6
CPU_MODEL: 3a
CPU_STEPPING: 9
CPU_MICROCODE: 6,3a,9,0 (F,M,S,R) SIG: 1B'00000000 (cache) 1B'00000000 (init)
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: 2
ANALYSIS_SESSION_HOST: FELIX-WIN10
ANALYSIS_SESSION_TIME: 12-03-2016 16:29:37.0725
ANALYSIS_VERSION: 10.0.14321.1024 amd64fre
STACK_OVERFLOW: Stack Limit: fffff80234203000. Use (kF) and (!stackusage) to investigate stack usage.
STACKUSAGE_FUNCTION: The function at address 0xFFFFF80D94E7411C was blamed for the stack overflow. It is using 15264 bytes of stack total in 106 instances (likely recursion).
FOLLOWUP_IP:
vmswitch!VmsPktParseIPv4Packet+6472c
fffff80d`94e7411c 90 nop
STACK_COMMAND: .trap 0xfffff80234218e70 ; kb
THREAD_SHA1_HASH_MOD_FUNC: 71a30400073d671566ecf9d703e6e5093d8f333d
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: b894bbd73b89626ade2941eb152107226b6892b2
THREAD_SHA1_HASH_MOD: 6c44de534055635e7718f6cd495814d6f86d55df
FAULT_INSTR_CODE: ba45e990
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: vmswitch!VmsPktParseIPv4Packet+6472c
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: vmswitch
IMAGE_NAME: vmswitch.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 5819bf9a
BUCKET_ID_FUNC_OFFSET: 6472c
FAILURE_BUCKET_ID: 0x7f_8_STACK_USAGE_RECURSION_vmswitch!VmsPktParseIPv4Packet
BUCKET_ID: 0x7f_8_STACK_USAGE_RECURSION_vmswitch!VmsPktParseIPv4Packet
PRIMARY_PROBLEM_CLASS: 0x7f_8_STACK_USAGE_RECURSION_vmswitch!VmsPktParseIPv4Packet
TARGET_TIME: 2016-12-03T14:54:56.000Z
OSBUILD: 14393
OSSERVICEPACK: 0
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 400
PRODUCT_TYPE: 3
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
OSEDITION: Windows 10 Server TerminalServer DataCenter SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2016-11-02 11:17:03
BUILDDATESTAMP_STR: 161102-0100
BUILDLAB_STR: rs1_release_inmarket
BUILDOSVER_STR: 10.0.14393.447.amd64fre.rs1_release_inmarket.161102-0100
ANALYSIS_SESSION_ELAPSED_TIME: 5c3
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x7f_8_stack_usage_recursion_vmswitch!vmspktparseipv4packet
FAILURE_ID_HASH: {54616c76-71d3-7277-08ba-c2d2fa4a114c}
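As a sanity check on the numbers in the dump above, here is a small Python sketch of the arithmetic. All constants are copied from the analysis output; the function and variable names are my own, and it assumes the usual downward-growing kernel stack where "Stack Limit" is the lowest valid address.

```python
# Values copied from the WinDbg output above; names are illustrative only.
RSP = 0xFFFFF80234202FC0          # rsp in the trap frame
STACK_LIMIT = 0xFFFFF80234203000  # "Stack Limit" from the STACK_OVERFLOW line
STORE_OFFSET = 0x38               # mov qword ptr [rsp+38h],r8
TOTAL_STACK_BYTES = 15264         # from the STACKUSAGE_FUNCTION line
INSTANCES = 106                   # from the STACKUSAGE_FUNCTION line


def check_overflow(rsp, limit, offset):
    """Return (store target, True if the store lands below the stack limit)."""
    target = rsp + offset
    return target, target < limit


target, overflowed = check_overflow(RSP, STACK_LIMIT, STORE_OFFSET)
bytes_per_frame = TOTAL_STACK_BYTES / INSTANCES

print(f"store target:    {target:#018x}")
print(f"below limit:     {overflowed} (by {STACK_LIMIT - target} bytes)")
print(f"bytes per frame: {bytes_per_frame:g}")
```

If I read this correctly, the faulting store targets fffff80234202ff8, which is 8 bytes below the stack limit, and each of the 106 recursive frames uses 144 bytes. That matches the double fault being reported as a stack overflow.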
Here is the call stack:
nt!KeBugCheckEx
nt!KiBugCheckDispatch+0x69
nt!KiDoubleFaultAbort+0xb3
vmswitch!VmsPktParseIPv4Packet+0x1a
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParseIPv4Packet+0x6472c
vmswitch!VmsPktParsePacket+0x156
vmswitch!VmsNblGenerateRssHash+0xd3
vmswitch!VmsVmNicPvtPacketForward+0x234
vmswitch!VmsRouterDeliverNetBufferLists+0x81a
vmswitch!VmsExtPtReceiveNetBufferLists+0x193
NDIS!ndisMIndicateNetBufferListsToOpen+0x11e
NDIS!ndisMTopReceiveNetBufferLists+0x265fc
NDIS!ndisCallReceiveHandler+0x47
NDIS!NdisMIndicateReceiveNetBufferLists+0x735
vmswitch!VmsExtMpIndicatePackets+0x7b4
vmswitch!VmsExtMpSendNetBufferLists+0x5a8
NDIS!ndisMSendNBLToMiniportInternal+0xee
NDIS!NdisSendNetBufferLists+0x36c
vmswitch!VmsExtPtRouteNetBufferLists+0x3fe
vmswitch!VmsPtNicReceiveNetBufferLists+0x8a0
NDIS!ndisMIndicateNetBufferListsToOpen+0x11e
NDIS!NdisMIndicateReceiveNetBufferLists+0x26ca3
e1r65x64!DriverEntry+0x12c1b
e1r65x64!DriverEntry+0x13eaf
e1r65x64!DriverEntry+0x1accd
e1r65x64!DriverEntry+0x1afdb
e1r65x64!DriverEntry+0x1a76c
NDIS!ndisInterruptDpc+0x1c9
nt!KiExecuteAllDpcs+0x2b1
nt!KiRetireDpcList+0x5df
nt!KiIdleLoop+0x5a
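Because the stack above is dominated by one repeated frame, it is easier to read once consecutive duplicates are collapsed. The following is a minimal Python sketch of how that could be done; `collapse_frames` is a helper name I made up, and the sample list only reproduces a few frames from the stack above rather than the whole trace.

```python
from itertools import groupby


def collapse_frames(frames):
    """Collapse runs of identical consecutive frames into (frame, count) pairs."""
    return [(frame, sum(1 for _ in run)) for frame, run in groupby(frames)]


# Shortened excerpt of the stack above: the recursive frame appears 106 times.
sample = (
    ["nt!KiDoubleFaultAbort+0xb3", "vmswitch!VmsPktParseIPv4Packet+0x1a"]
    + ["vmswitch!VmsPktParseIPv4Packet+0x6472c"] * 106
    + ["vmswitch!VmsPktParsePacket+0x156"]
)

for frame, count in collapse_frames(sample):
    suffix = f"  (x{count})" if count > 1 else ""
    print(frame + suffix)
```

The count of 106 for vmswitch!VmsPktParseIPv4Packet+0x6472c agrees with the "106 instances (likely recursion)" reported by the analysis.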
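Another interesting point: one server ran fine as long as it hosted only 2008 R2 guests. After a Server 2016 guest was added, that server started bluescreening as well. The event logs on the host and on the guests contain nothing useful.
I have already discussed this issue on the TechNet forums, but the suggestions there did not help. Maybe someone here can help me.
Thanks for your efforts, FelR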