I created a CentOS Stream 8 machine on AWS and configured crashkernel in GRUB:
$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-4.18.0-552.el8.x86_64 root=UUID=e52ef623-609b-4202-9b2c-ac7aba5c3bee ro console=ttyS0,115200n8 no_timer_check net.ifnames=0 nvme_core.io_timeout=4294967295 nvme_core.max_retries=10 crashkernel=128M
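Having crashkernel= on the command line does not by itself show that the kernel reserved the memory. A minimal sketch of how that could be cross-checked, assuming the standard sysfs files /sys/kernel/kexec_crash_size and /sys/kernel/kexec_crash_loaded are present on this kernel (the helper name crashkernel_arg is made up for illustration):

```shell
# Hypothetical helper: print the value of crashkernel= from a kernel
# command-line string passed as the first argument.
crashkernel_arg() {
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}

# What was requested on the running kernel's command line.
echo "requested: $(crashkernel_arg "$(cat /proc/cmdline 2>/dev/null)")"

# What the kernel reports: reserved size in bytes, and whether a
# crash kernel is currently loaded (1) or not (0).
for f in /sys/kernel/kexec_crash_size /sys/kernel/kexec_crash_loaded; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    else
        echo "$f: not readable on this system"
    fi
done
```

A non-zero kexec_crash_size together with kexec_crash_loaded reporting 1 would suggest the reservation and the kexec load both succeeded; it says nothing about whether the capture kernel can boot within that much memory.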
Then I installed kdump and verified that it is running:
# systemctl status kdump
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
Active: active (exited) since Sat 2024-04-13 08:42:55 UTC; 40s ago
Process: 808 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 808 (code=exited, status=0/SUCCESS)
Apr 13 08:42:43 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Install squash loader ***
Apr 13 08:42:43 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Stripping files ***
Apr 13 08:42:44 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Stripping files done ***
Apr 13 08:42:44 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Squashing the files inside the initramfs ***
Apr 13 08:42:54 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Squashing the files inside the initramfs done ***
Apr 13 08:42:54 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Creating image file '/boot/initramfs-4.18.0-552.el8.x86_64kdump.img' ***
Apr 13 08:42:55 ip-172-31-2-139.ap-east-1.compute.internal dracut[1101]: *** Creating initramfs image file '/boot/initramfs-4.18.0-552.el8.x86_64kdump.img' done ***
Apr 13 08:42:55 ip-172-31-2-139.ap-east-1.compute.internal kdumpctl[814]: kdump: kexec: loaded kdump kernel
Apr 13 08:42:55 ip-172-31-2-139.ap-east-1.compute.internal kdumpctl[814]: kdump: Starting kdump: [OK]
Apr 13 08:42:55 ip-172-31-2-139.ap-east-1.compute.internal systemd[1]: Started Crash recovery kernel arming.
Now I need to simulate a kernel crash, so I use sysrq to trigger one:
echo c > /proc/sysrq-trigger
However, after the reboot, I don't see a dmesg file in /var/crash the way I did on CentOS 7.x:
# ls /var/crash
- nothing, empty folder -
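Before concluding that no dump was written, it may be worth confirming where kdump is configured to save it. A rough sketch, assuming the stock /etc/kdump.conf format in which a `path` line sets the dump directory and /var/crash is the default when none is set (kdump_path is a hypothetical helper, not part of kexec-tools):

```shell
# Hypothetical helper: print the dump directory from a kdump.conf-style
# file ($1). Uses the last uncommented "path" line, falling back to
# /var/crash when the file is missing or sets no path.
kdump_path() {
    p=$(awk '$1 == "path" { v = $2 } END { if (v != "") print v }' "$1" 2>/dev/null)
    printf '%s\n' "${p:-/var/crash}"
}

target=$(kdump_path /etc/kdump.conf)
echo "dump directory: $target"

# A successful capture normally leaves a per-crash subdirectory here.
ls -la "$target" 2>/dev/null || echo "cannot list $target"
```

If the directory is the expected one and still empty, the capture kernel most likely never got as far as writing the dump, in which case the instance's serial console output is the next place to look.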
The only thing I can see is /var/log/kdump.log:
+ 2024-04-13 08:35:56 /usr/bin/kdumpctl@679: ret=0
+ 2024-04-13 08:35:56 /usr/bin/kdumpctl@680: set +x
+ 2024-04-13 08:42:55 /usr/bin/kdumpctl@675: /sbin/kexec -s -d -p '--command-line=BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-4.18.0-552.el8.x86_64 ro console=ttyS0,115200n8 no_timer_check net.ifnames=0 nvme_core.io_timeout=4294967295 nvme_core.max_retries=10 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable disable_cpu_apicid=0' --initrd=/boot/initramfs-4.18.0-552.el8.x86_64kdump.img /boot/vmlinuz-4.18.0-552.el8.x86_64
Try gzip decompression.
Try LZMA decompression.
+ 2024-04-13 08:42:55 /usr/bin/kdumpctl@679: ret=0
+ 2024-04-13 08:42:55 /usr/bin/kdumpctl@680: set +x
+ 2024-04-13 08:44:31 /usr/bin/kdumpctl@675: /sbin/kexec -s -d -p '--command-line=BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-4.18.0-552.el8.x86_64 ro console=ttyS0,115200n8 no_timer_check net.ifnames=0 nvme_core.io_timeout=4294967295 nvme_core.max_retries=10 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable disable_cpu_apicid=0' --initrd=/boot/initramfs-4.18.0-552.el8.x86_64kdump.img /boot/vmlinuz-4.18.0-552.el8.x86_64
Try gzip decompression.
Try LZMA decompression.
+ 2024-04-13 08:44:31 /usr/bin/kdumpctl@679: ret=0
+ 2024-04-13 08:44:31 /usr/bin/kdumpctl@680: set +x
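I need the dmesg output to understand what caused the crash. How can I get it?
P.S. I have a kernel module that triggers a crash, but I don't know why. To demonstrate the kdump problem I am using sysrq here, so that anyone who wants to help can reproduce it easily.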