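We currently have 3 systems running CentOS with identical hardware and software configurations, and they are experiencing random system hangs. A hang can happen at any time: sometimes within 20 minutes of booting, sometimes only after one or two weeks. We booted a separate live Ubuntu image and ran stress tests on it continuously without any problems. We suspect a driver or some software installed on our systems, but we are not sure how to find out which one is responsible.
How can we determine what is causing our systems to hang?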
KERNEL: /lib/debug/lib/modules/3.10.0-1062.12.1.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2020-08-28-19:02:49/vmcore [PARTIAL DUMP]
CPUS: 72
DATE: Fri Aug 28 19:02:35 2020
UPTIME: 6 days, 13:03:56
LOAD AVERAGE: 7.87, 7.35, 7.45
TASKS: 5679
NODENAME: zagreb
RELEASE: 3.10.0-1062.12.1.el7.x86_64
VERSION: #1 SMP Tue Feb 4 23:02:59 UTC 2020
MACHINE: x86_64 (3000 Mhz)
MEMORY: 1023.4 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
PID: 19718
COMMAND: "9_scheduler"
TASK: ffff8a8bc9ab1070 [THREAD_INFO: ffff8a8be0618000]
CPU: 34
STATE: TASK_RUNNING (PANIC)
crash>
Here is the backtrace:
crash> bt
PID: 19718 TASK: ffff8a8bc9ab1070 CPU: 34 COMMAND: "9_scheduler"
#0 [ffff8a8be061ba90] machine_kexec at ffffffff90665b34
#1 [ffff8a8be061baf0] __crash_kexec at ffffffff90722352
#2 [ffff8a8be061bbc0] crash_kexec at ffffffff90722440
#3 [ffff8a8be061bbd8] oops_end at ffffffff90d85798
#4 [ffff8a8be061bc00] no_context at ffffffff90675bb4
#5 [ffff8a8be061bc50] __bad_area_nosemaphore at ffffffff90675e82
#6 [ffff8a8be061bca0] bad_area_nosemaphore at ffffffff90675fa4
#7 [ffff8a8be061bcb0] __do_page_fault at ffffffff90d88750
#8 [ffff8a8be061bd20] do_page_fault at ffffffff90d88975
#9 [ffff8a8be061bd50] page_fault at ffffffff90d84778
[exception RIP: anon_vma_clone+117]
RIP: ffffffff908008e5 RSP: ffff8a8be061be08 RFLAGS: 00010286
RAX: ffff8a90d42e95f0 RBX: 0000000000000000 RCX: 0000000000ea39f5
RDX: 0000000000000040 RSI: 0000000000000200 RDI: ffff8a0f7fc07b00
RBP: ffff8a8be061be48 R8: 000000000001f0a0 R9: ffffffff908008d4
R10: ffff8ad35135e0c0 R11: 0000000000000000 R12: ffff8a90d42e9d18
R13: ffff8b0bea29d410 R14: ffff8a90d42e9cb0 R15: ffff8a90d42e95f0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8a8be061be50] __split_vma at ffffffff907f962e
#11 [ffff8a8be061be90] do_munmap at ffffffff907f992a
#12 [ffff8a8be061bee0] vm_munmap at ffffffff907f9cb5
#13 [ffff8a8be061bf30] sys_munmap at ffffffff907faf52
#14 [ffff8a8be061bf50] system_call_fastpath at ffffffff90d8dede
RIP: 00007f1ef3f82dd7 RSP: 00007f1e53ffebc0 RFLAGS: 00000246
RAX: 000000000000000b RBX: 0000000000040000 RCX: 00007f1ef3f6d727
RDX: 0000000000000003 RSI: 0000000000040000 RDI: 00007f1d2af40000
RBP: 0000000000922a40 R8: ffffffffffffffff R9: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 00007f1e53ffea58
R13: 00007f1d2af00000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 000000000000000b CS: 0033 SS: 002b
crash>
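One way to narrow this down is to capture a vmcore from every hang on all three machines and check whether they fail in the same place. A recurring signature (same faulting function, same call path) makes a software cause such as the kernel or a loaded driver more likely, while unrelated signatures across dumps can point toward hardware. The Python sketch below is a hypothetical helper, not part of the crash utility. It assumes the text of crash's `sys` and `bt` output, as pasted above, has been saved to a file, and it extracts the panic string, the function containing the exception RIP, and the call chain below the page-fault handling frames. The script name `crash_sig.py` and all function names are made up for illustration.

```python
import re
import sys
from pathlib import Path
from typing import List, NamedTuple, Optional

# Backtrace frame, e.g. "#10 [ffff8a8be061be50] __split_vma at ffffffff907f962e"
FRAME_RE = re.compile(r"^#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# Faulting location, e.g. "[exception RIP: anon_vma_clone+117]"
EXC_RE = re.compile(r"\[exception RIP:\s*([^\]+\s]+)(?:\+\w+)?\]")
# Panic line from the `sys` header
PANIC_RE = re.compile(r'^PANIC:\s*"(.*)"')

# Excerpt of the output above, used when no file is given.
SAMPLE = """\
PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
 #8 [ffff8a8be061bd20] do_page_fault at ffffffff90d88975
 #9 [ffff8a8be061bd50] page_fault at ffffffff90d84778
    [exception RIP: anon_vma_clone+117]
#10 [ffff8a8be061be50] __split_vma at ffffffff907f962e
#11 [ffff8a8be061be90] do_munmap at ffffffff907f992a
#12 [ffff8a8be061bee0] vm_munmap at ffffffff907f9cb5
#13 [ffff8a8be061bf30] sys_munmap at ffffffff907faf52
#14 [ffff8a8be061bf50] system_call_fastpath at ffffffff90d8dede
"""


class CrashSignature(NamedTuple):
    panic: Optional[str]           # panic string from the `sys` header
    fault_function: Optional[str]  # function containing the exception RIP
    callers: List[str]             # frames below page_fault, innermost first


def parse_crash_output(text: str) -> CrashSignature:
    """Build a comparable signature from pasted crash `sys`/`bt` output."""
    panic: Optional[str] = None
    fault_function: Optional[str] = None
    frames: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        m = PANIC_RE.match(line)
        if m:
            panic = m.group(1)
            continue
        m = EXC_RE.search(line)
        if m:
            fault_function = m.group(1)
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(m.group(1))
    # Frames up to and including page_fault are the oops/kexec handling path;
    # only the frames after it describe the code that actually faulted.
    if "page_fault" in frames:
        frames = frames[frames.index("page_fault") + 1:]
    return CrashSignature(panic, fault_function, frames)


if __name__ == "__main__":
    # Usage: python crash_sig.py bt_output.txt  (falls back to SAMPLE)
    text = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else SAMPLE
    sig = parse_crash_output(text)
    print(sig.panic)
    # For SAMPLE this prints:
    # anon_vma_clone <- __split_vma <- do_munmap <- vm_munmap <- sys_munmap <- system_call_fastpath
    print(" <- ".join([sig.fault_function or "?"] + sig.callers))
```

Running this over the `bt` output from each saved vmcore and grouping the results by `(fault_function, callers)` shows at a glance whether every crash lands in `anon_vma_clone` via `munmap`, as this one does, or somewhere different each time.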