Ubuntu crashes with nothing in the logs, and I cannot find the dmesg log in the crash dump

2024-6-7

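We are running an old version of Ubuntu (14) with kernel 3.16.0-30-generic. We use a peripheral that only has 32-bit drivers, hence the old version (16 is the last 32-bit Ubuntu, but we are still on 14).

We have started getting random crashes at runtime. This never happened before, and we are using the same hardware and the same Linux image as before. Given the chip shortage and supply chain issues, we suspect that some component on the PC or on a peripheral has gone bad, and we are trying to find out which one.

The crashes appear to be random, and the OS dies completely. The screen goes blank (purple/black) and we have to power cycle the machine. Nothing shows up in any log under /var/log (syslog, kern.log, or dmesg); we just get the blank screen and have to reboot. So my conclusion is that the kernel is crashing.

I have gone through a lot of debugging steps trying to dump the kernel, with some success. What I most want is the live dmesg from the moment of the crash, because I hope it will point to the specific peripheral or driver that causes it. But I cannot find it in any of the kernel dump logs or files.

Below are my settings, followed by the three things I have tried so far.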

/etc/default/kdump-tools

# kdump-tools configuration
# ---------------------------------------------------------------------------
# USE_KDUMP - controls kdump will be configured
#     0 - kdump kernel will not be loaded
#     1 - kdump kernel will be loaded and kdump is configured
# KDUMP_SYSCTL - controls when a panic occurs, using the sysctl
#     interface.  The contents of this variable should be the
#     "variable=value ..." portion of the 'sysctl -w ' command.
#     If not set, the default value "kernel.panic_on_oops=1" will
#     be used.  Disable this feature by setting KDUMP_SYSCTL=" "
#     Example - also panic on oom:
#         KDUMP_SYSCTL="kernel.panic_on_oops=1 vm.panic_on_oom=1"
#
USE_KDUMP=1
#KDUMP_SYSCTL="kernel.panic_on_oops=1"


# ---------------------------------------------------------------------------
# Kdump Kernel:
# KDUMP_KERNEL - A full pathname to a kdump kernel.
# KDUMP_INITRD - A full pathname to the kdump initrd (if used).
#     If these are not set, kdump-config will try to use the current kernel
#     and initrd if it is relocatable.  Otherwise, you will need to specify
#     these manually.
#KDUMP_KERNEL=
#KDUMP_INITRD=


# ---------------------------------------------------------------------------
# vmcore Handling:
# KDUMP_COREDIR - local path to save the vmcore to.
# KDUMP_FAIL_CMD - This variable can be used to cause a reboot or
#     start a shell if saving the vmcore fails.  If not set, "reboot -f"
#     is the default.
#     Example - start a shell if the vmcore copy fails:
#         KDUMP_FAIL_CMD="echo 'makedumpfile FAILED.'; /bin/bash; reboot -f"
KDUMP_COREDIR="/var/crash"
#KDUMP_FAIL_CMD="reboot -f"


# ---------------------------------------------------------------------------
# Makedumpfile options:
# DEBUG_KERNEL - a debug version of the running kernel.  If not set,
#     kdump-config will use /usr/lib/debug/vmlinux-$(uname -r) if it is
#     available.  If it is not available, makedumpfile will be limited to
#     dumping all pages in memory.
# MAKEDUMP_ARGS - extra arguments passed to makedumpfile (8).  The default,
#     if unset, is to pass '-c -d 31' telling makedumpfile to use compression
#     and reduce the corefile to in-use kernel pages only.
#DEBUG_KERNEL=
MAKEDUMP_ARGS="-c -d 31 --dump-dmesg /proc/vmcore dmesgfile"


# ---------------------------------------------------------------------------
# Kexec/Kdump args
# KDUMP_KEXEC_ARGS - Additional arguments to the kexec command used to load
#     the kdump kernel
#     Example - Use this option on x86 systems with PAE and more than
#     4 gig of memory:
#         KDUMP_KEXEC_ARGS="--elf64-core-headers"
# KDUMP_CMDLINE - The default is to use the contents of /proc/cmdline.
#     Set this variable to override /proc/cmdline.
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
#     for the kdump kernel.  If unset, it defaults to "irqpoll maxcpus=1 nousb"
#KDUMP_KEXEC_ARGS=""
#KDUMP_CMDLINE=""
#KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb"

# ---------------------------------------------------------------------------
# Architecture specific Overrides:

kdump-config show

USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x18000000
current state:    ready to kdump

kernel link:


kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-3.16.0-30-generic root=UUID=673608e6-1369-49d0-8f4e-6ee7c70a217b ro memmap=128M$256M vmalloc=280M quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.16.0-30-generic /boot/vmlinuz-3.16.0-30-generic

But when I force a crash with: echo c > /proc/sysrq-trigger

I get the following output in the /var/crash folder:

(1KB) linux-image-3.16.0-30-generic 3.16.0-30.40~14.04.1-202208090959.crash
A folder named after the date, containing:
(3.8GB) vmcore.CurrentDate

From here I tried three things.

First thing I tried

First, I opened the .crash file to see whether it contained the dmesg. In it I found:

ProblemType: KernelCrash
Architecture: i386
Date: Tue Aug  9 09:59:44 2022
DistroRelease: Ubuntu 14.04
Package: linux-image-3.16.0-30-generic 3.16.0-30.40~14.04.1
Uname: Linux 3.16.0-30-generic i686
VmCoreDmesg: base64
 H4sICAAAAAAC/1ZtQ29yZURtZXNnAA==

There is no dmesg output in this.
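
To double-check that the VmCoreDmesg field really is empty and not just encoded in a way I am misreading, it can be decoded by hand. This is a minimal sketch, assuming GNU coreutils (base64, od, tail, tr) are available; the variable names are my own.

```shell
#!/bin/sh
# VmCoreDmesg value copied verbatim from the .crash file above.
field='H4sICAAAAAAC/1ZtQ29yZURtZXNnAA=='

# Total size of the decoded data in bytes.
size=$(printf '%s' "$field" | base64 -d | wc -c | tr -d ' ')
echo "decoded size: $size bytes"

# A gzip stream starts with the two magic bytes 1f 8b.
magic=$(printf '%s' "$field" | base64 -d | od -An -tx1 -N2 | tr -d ' \n')
echo "magic: $magic"

# After the fixed 10-byte gzip header comes the NUL-terminated
# original file name (the FNAME flag is set in the header).
name=$(printf '%s' "$field" | base64 -d | tail -c +11 | tr -d '\000')
echo "embedded file name: $name"
```

The decoded data is only 22 bytes: a 10-byte gzip header plus the 12-byte name field "VmCoreDmesg" with its terminating NUL. Nothing follows it, so as far as I can tell the field holds a bare gzip header with no compressed log data at all.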

Second thing

The second thing I tried was opening the vmcore.CurrentDate file with crash, using the following command:

sudo crash ./System.map-3.16.0-30-generic ./vmlinux-3.16.0-30-generic ./202208101500/vmcore.20220810150

But this throws an error:

crash: ./vmlinux-3.16.0-30-generic: no debugging data available

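I tried to find debug symbols for this kernel version, to no avail. I extracted the vmlinux file from the vmlinuz file in the /boot directory, but I doubt it has debug symbols. So I am stuck here.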
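
My doubt about the extracted vmlinux can at least be checked directly. This is a rough sketch, assuming readelf from binutils is installed; has_debug_info is a hypothetical helper name of my own, and the default path is simply the file from the crash command above.

```shell
#!/bin/sh
# has_debug_info is a hypothetical helper, not part of any tool.
# It succeeds only if the ELF file has a .debug_info section,
# i.e. the DWARF data that crash complains is missing.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Default to the extracted kernel used in the crash command above.
vmlinux=${1:-./vmlinux-3.16.0-30-generic}

if has_debug_info "$vmlinux"; then
    echo "$vmlinux: debug info present"
else
    echo "$vmlinux: no debug info (or not a readable ELF file)"
fi
```

If this reports no debug info, that would match the error from crash: a kernel unpacked from /boot/vmlinuz is normally stripped, so crash would need a matching unstripped vmlinux, which is exactly what I have not been able to find.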

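Third thing...

Another thing I keep reading is that kdump should output a vmcore-dmesg.txt file, but I have not found any such file. That is really the direction I would like to be pointed in, but the file seems to be missing.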
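
One thing I could still try is pulling the log out of the saved vmcore after the fact. The sketch below is untested on my side and rests on two assumptions: that makedumpfile's --dump-dmesg mode (which, as I read the man page, takes the dump and an output file and overrides the normal filtering behaviour) can read the vmcore that kdump saved, and that the vmcore contains the VMCOREINFO it needs. dump_dmesg is a hypothetical wrapper name and the output file name is arbitrary.

```shell
#!/bin/sh
# dump_dmesg is a hypothetical wrapper; the only real tool used is makedumpfile.
# Usage: dump_dmesg VMCORE LOGFILE
dump_dmesg() {
    vmcore=$1
    logfile=$2
    if [ ! -r "$vmcore" ]; then
        echo "cannot read vmcore: $vmcore" >&2
        return 1
    fi
    # --dump-dmesg is run on its own here, not alongside -c -d 31:
    # per the man page it replaces the usual compress/filter mode.
    makedumpfile --dump-dmesg "$vmcore" "$logfile"
}

# Dump path taken from the crash command above; log name is arbitrary.
dump_dmesg ./202208101500/vmcore.20220810150 ./dmesg.20220810150 \
    || echo "dmesg extraction failed" >&2
```

If makedumpfile reports that VMCOREINFO is missing, the man page describes passing the kernel with -x VMLINUX, which brings me back to the debug symbol problem from the second thing I tried.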

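I have scoured the internet for everything I can find on this problem, and I am hoping someone can point me in the right direction and help me find what is causing these kernel crashes.

Thanks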
