Server crashes and reboots unexpectedly


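I am running a physical server (Debian 11 bullseye) that worked fine for the past year. In the last few days it has started behaving very strangely: it crashes and reboots at random, and I can't find the cause by checking the system logs.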

REBOOT CRASH 1

Nov 26 07:04:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:04:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:04:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:05:01 testing CRON[320608]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 26 07:05:02 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:05:02 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:05:20 testing smartd[1136]: Device: /dev/bus/0 [megaraid_disk_04], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5
Nov 26 07:05:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:06:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:06:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:28:51 testing systemd-random-seed[456]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:28:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:28:51 testing systemd[1]: Finished Create System Users.
Nov 26 07:28:51 testing systemd[1]: Starting Create Static Device Nodes in /dev...
Nov 26 07:28:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:28:51 testing kernel: [    0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:28:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:28:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:28:51 testing kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-8f99-5441121afaf2412 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:28:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:28:51 testing kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-provided physical RAM map:
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:28:51 testing systemd[1]: Finished Set the console keyboard layout.
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:28:51 testing apparmor.systemd[962]: Restarting AppArmor
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable
Nov 26 07:28:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bd369000-0x00000000bf38efff] reserved

------------------------------------------------------------------------------------------------------------------------------------------------------

REBOOT CRASH 2

Nov 26 07:45:41 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:45:41 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:46:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:46:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:46:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:47:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:47:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:47:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:48:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:48:43 testing ddclient[2198]: CONNECT:  checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: CONNECTED:  using HTTP
Nov 26 07:48:43 testing ddclient[2198]: SENDING:  GET / HTTP/1.0
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   Host: checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   User-Agent: ddclient/3.9.1
Nov 26 07:48:43 testing ddclient[2198]: SENDING:   Connection: close
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  HTTP/1.1 200 OK#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Date: Fri, 26 Nov 2021 06:48:43 GMT#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Content-Type: text/html#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Content-Length: 104#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Connection: close#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Cache-Control: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  Pragma: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  #015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE:  <html><head><title>Current IP Check</title></head><body>Current IP Address: 123.456.78.90</body></html>#015
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  database.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  jenkins.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS:  monitors.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:48:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:49:11 testing kernel: [  356.406208] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Nov 26 07:56:51 testing systemd-random-seed[448]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:56:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:56:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:56:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:56:51 testing kernel: [    0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:56:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:56:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:56:51 testing kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-1234-123456789 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:56:51 testing systemd[1]: Finished Create Static Device Nodes in /dev.
Nov 26 07:56:51 testing kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-provided physical RAM map:
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:56:51 testing systemd[1]: Starting Rule-based Manager for Device Events and Files...
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:56:51 testing kernel: [    0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable

Nov 23 11:05:13 myserver kernel: [    2.352549] ata_piix 0000:00:1f.2: version 2.13
Nov 23 11:05:13 myserver kernel: [    3.576528] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 23 11:05:13 myserver kernel: [    3.723306] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 23 11:05:13 myserver kernel: [    4.093233] PM: Image not found (code -22)
Nov 23 11:05:13 myserver kernel: [   10.167638] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 23 12:37:25 myserver PackageKit: daemon start
Nov 26 07:28:51 myserver kernel: [    0.002793] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.002798] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:28:51 myserver kernel: [    0.002814] MTRR default type: uncachable
Nov 26 07:28:51 myserver kernel: [    0.002815] MTRR fixed ranges enabled:
Nov 26 07:28:51 myserver kernel: [    0.002817]   00000-9FFFF write-back
Nov 26 07:28:51 myserver kernel: [    0.002819]   A0000-BFFFF uncachable
Nov 26 07:28:51 myserver kernel: [    0.002820]   C0000-CBFFF write-protect
Nov 26 07:28:51 myserver kernel: [    0.002822]   CC000-D3FFF write-back
Nov 26 07:28:51 myserver kernel: [    0.002823]   D4000-EBFFF uncachable
Nov 26 07:28:51 myserver kernel: [    0.002825]   EC000-FFFFF write-protect
Nov 26 07:28:51 myserver kernel: [    0.002826] MTRR variable ranges enabled:
Nov 26 07:28:51 myserver kernel: [    0.002829]   0 base 0000000000 mask FF80000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002831]   1 base 0080000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002833]   2 base 0100000000 mask FF00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002834]   3 base 0200000000 mask FE00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002836]   4 base 0400000000 mask FC00000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002838]   5 base 0800000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002840]   6 base 1000000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002842]   7 base 1800000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [    0.002843]   8 disabled
Nov 26 07:28:51 myserver kernel: [    0.002844]   9 disabled
Nov 26 07:28:51 myserver kernel: [    0.004625] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.021048] e820: update [mem 0xba378000-0xba37afff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [    0.022384] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [    0.023393] On node 0 totalpages: 12582912
Nov 26 07:28:51 myserver kernel: [    0.023395]   Normal zone: 196608 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023396]   Normal zone: 12582912 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.023401] On node 1 totalpages: 12569668
Nov 26 07:28:51 myserver kernel: [    0.023402]   DMA zone: 64 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023404]   DMA zone: 3984 pages, LIFO batch:0
Nov 26 07:28:51 myserver kernel: [    0.023405]   DMA32 zone: 12019 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023407]   DMA32 zone: 769204 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.023408]   Normal zone: 184320 pages used for memmap
Nov 26 07:28:51 myserver kernel: [    0.023410]   Normal zone: 11796480 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [    0.040393] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [    0.040434] ACPI: IRQ0 used by override.
Nov 26 07:28:51 myserver kernel: [    0.040436] ACPI: IRQ9 used by override.
Nov 26 07:28:51 myserver kernel: [    0.049931] pcpu-alloc: s184152 r8192 d28840 u262144 alloc=1*2097152
Nov 26 07:28:51 myserver kernel: [    0.049933] pcpu-alloc: [0] 00 02 04 06 08 10 12 14 [0] 16 18 20 22 -- -- -- --
Nov 26 07:28:51 myserver kernel: [    0.049950] pcpu-alloc: [1] 01 03 05 07 09 11 13 15 [1] 17 19 21 23 -- -- -- --
Nov 26 07:28:51 myserver kernel: [    0.950514] PCI: root bus fe: using default resources
Nov 26 07:28:51 myserver kernel: [    0.950516] PCI: Probing PCI hardware (bus fe)
Nov 26 07:28:51 myserver kernel: [    0.952145] PCI: root bus ff: using default resources
Nov 26 07:28:51 myserver kernel: [    0.952146] PCI: Probing PCI hardware (bus ff)
Nov 26 07:28:51 myserver kernel: [    0.953705] PCI: pci_cache_line_size set to 64 bytes
Nov 26 07:28:51 myserver kernel: [    0.953817] e820: reserve RAM buffer [mem 0xba378000-0xbbffffff]
Nov 26 07:28:51 myserver kernel: [    0.953820] e820: reserve RAM buffer [mem 0xbc768000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953823] e820: reserve RAM buffer [mem 0xbca67000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953826] e820: reserve RAM buffer [mem 0xbcf12000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.953828] e820: reserve RAM buffer [mem 0xbcf69000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [    0.974640] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.974699] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
Nov 26 07:28:51 myserver kernel: [    0.975062] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [    0.975413] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [    0.976584] system 00:04: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.976656] pnp 00:05: [irq 0 disabled]
Nov 26 07:28:51 myserver kernel: [    0.976725] system 00:05: Plug and Play ACPI device, IDs IPI0001 PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [    0.977664] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [    0.977799] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [    1.847200] intel_idle: MWAIT substates: 0x1120
Nov 26 07:28:51 myserver kernel: [    1.847270] Monitor-Mwait will be used to enter C-1 state
Nov 26 07:28:51 myserver kernel: [    1.847287] Monitor-Mwait will be used to enter C-3 state
Nov 26 07:28:51 myserver kernel: [    1.847390] intel_idle: v0.5.1 model 0x2C
Nov 26 07:28:51 myserver kernel: [    1.848984] intel_idle: Local APIC timer is reliable in all C-states
Nov 26 07:28:51 myserver kernel: [    2.168571]   with arguments:
Nov 26 07:28:51 myserver kernel: [    2.168572]     /init
Nov 26 07:28:51 myserver kernel: [    2.168573]   with environment:
Nov 26 07:28:51 myserver kernel: [    2.168575]     HOME=/
Nov 26 07:28:51 myserver kernel: [    2.168576]     TERM=linux
Nov 26 07:28:51 myserver kernel: [    2.168577]     BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64
Nov 26 07:28:51 myserver kernel: [    2.168579]     crashkernel=384M-:128M
Nov 26 07:28:51 myserver kernel: [    2.336535] megaraid_sas 0000:04:00.0: BAR:0x1  BAR's base_addr(phys):0x00000000df1bc000  mapped virt_addr:0x(____ptrval____)
Nov 26 07:28:51 myserver kernel: [    2.348825] libata version 3.00 loaded.
Nov 26 07:28:51 myserver kernel: [    2.353143] ata_piix 0000:00:1f.2: version 2.13
Nov 26 07:28:51 myserver kernel: [    3.577545] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 26 07:28:51 myserver kernel: [    3.697853] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 26 07:28:51 myserver kernel: [    4.107713] PM: Image not found (code -22)
Nov 26 07:28:51 myserver kernel: [   12.677156] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 26 07:43:30 myserver kernel: [    0.002794] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:43:30 myserver kernel: [    0.002799] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:43:30 myserver kernel: [    0.002814] MTRR default type: uncachable
Nov 26 07:43:30 myserver kernel: [    0.002815] MTRR fixed ranges enabled:
Nov 26 07:43:30 myserver kernel: [    0.002817]   00000-9FFFF write-back
Nov 26 07:43:30 myserver kernel: [    0.002819]   A0000-BFFFF uncachable
Nov 26 07:43:30 myserver kernel: [    0.002821]   C0000-CBFFF write-protect
Nov 26 07:43:30 myserver kernel: [    0.002822]   CC000-D3FFF write-back
Nov 26 07:43:30 myserver kernel: [    0.002824]   D4000-EBFFF uncachable
Nov 26 07:43:30 myserver kernel: [    0.002825]   EC000-FFFFF write-protect
Nov 26 07:43:30 myserver kernel: [    0.002827] MTRR variable ranges enabled:
Nov 26 07:43:30 myserver kernel: [    0.002829]   0 base 0000000000 mask FF80000000 write-back
Nov 26 07:43:30 myserver kernel: [    0.002831]   1 base 0080000000 mask FFC0000000 write-back

I have been monitoring CPU and RAM usage: the CPU temperature never went above 34 °C, and the load never exceeded 30%. About 70 GB of RAM is available.

I'm not sure what is causing the reboots. Any help would be greatly appreciated!

Answer 1

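The machine may be crashing because of a kernel panic. In that case you won't see anything in the logs: once a panic happens, the kernel has effectively crashed and can no longer write anything to them. Anything the kernel had not synced to disk before the crash is lost.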
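
You can see this pattern in the logs you already have: before each new kernel banner (`Linux version ...` at `[    0.000000]`), the previous boot simply stops, with no shutdown messages. In the excerpts above that silence runs from 07:06:01 to 07:28:51 and from 07:49:11 to 07:56:51. The following is a minimal sketch that finds these gaps in a classic syslog file. The function names are hypothetical, and it assumes the year 2021 (taken from the ddclient `Date:` header), because syslog timestamps omit the year. A long silent gap is consistent with a panic, but also with a hard reset (watchdog, power), so it tells you *that* logging stopped, not *why*.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Nov 26 07:04:01 testing systemd[1]: ..."
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ ")
# First kernel message of a fresh boot, as it appears in the logs above
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")


def parse_ts(stamp, year=2021):
    # Syslog omits the year; we assume one so strptime is unambiguous
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_silent_reboots(lines, year=2021):
    """Return (last_line_before_boot, boot_line, gap_seconds) for each boot."""
    parsed = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:
            parsed.append((parse_ts(m.group(1), year), line.rstrip("\n")))

    results = []
    for i, (ts, line) in enumerate(parsed):
        if not BOOT_MARKER.search(line):
            continue
        # Walk back past lines stamped at or after the boot (flushed by the new boot)
        j = i - 1
        while j >= 0 and parsed[j][0] >= ts:
            j -= 1
        if j >= 0:
            prev_ts, prev_line = parsed[j]
            gap = int((ts - prev_ts).total_seconds())
            results.append((prev_line, line, gap))
    return results
```

To use it, pass an open file such as `/var/log/syslog` (an assumed path; adjust it for your logging setup) and print each `(last line, boot line, gap)` tuple.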

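You should enable kernel core dumps with kdump. Once a panic is triggered, the kdump kernel writes a memory dump to a file on the local disk. After the machine has booted again, you can analyze that file with crash.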
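
kdump only works if memory is reserved for the capture kernel via the `crashkernel=` boot parameter and that kernel is actually loaded. Both command lines in the question already carry two values, `crashkernel=2000M crashkernel=384M-:128M`. As we understand it, the kernel's parser uses the last occurrence, so check which value is really in effect. The following minimal sketch assumes the syntax documented in the kernel's kdump guide (`size[@offset]`, or ranges `start-[end]:size,...` where `end` is exclusive). It also assumes `/sys/kernel/kexec_crash_loaded` reads `1` when a capture kernel is loaded. The helper names are hypothetical.

```python
import re

# Binary multipliers: a subset of the suffixes the kernel's memparse() accepts
MULT = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def to_bytes(size):
    m = re.fullmatch(r"(\d+)([KMGT]?)", size.strip().upper())
    if not m:
        raise ValueError(f"unsupported size: {size!r}")
    return int(m.group(1)) * MULT[m.group(2)]


def crashkernel_args(cmdline):
    """All crashkernel= values on a kernel command line, in order."""
    return [tok.split("=", 1)[1] for tok in cmdline.split()
            if tok.startswith("crashkernel=")]


def reserved_bytes(spec, total_ram):
    """Bytes a single crashkernel= value would reserve given total_ram bytes of RAM."""
    spec = spec.split("@", 1)[0]             # ignore an optional @offset
    if ":" not in spec:                      # plain form: "2000M" or "256M,high"
        return to_bytes(spec.split(",", 1)[0])
    for entry in spec.split(","):            # range form: "start-[end]:size,..."
        rng, size = entry.split(":", 1)
        start, end = rng.split("-", 1)
        lo = to_bytes(start)
        hi = to_bytes(end) if end else None  # empty end means no upper bound
        if total_ram >= lo and (hi is None or total_ram < hi):
            return to_bytes(size)
    return 0                                 # no range matched: nothing reserved


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """True if the sysfs file reports a loaded capture kernel ("1")."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False
```

On the server, you would read `/proc/cmdline`, take `crashkernel_args(...)[-1]`, and confirm both that it reserves a non-zero size for the installed RAM and that `crash_kernel_loaded()` returns `True`.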

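You can read here how to enable kernel core dumps. If those instructions don't fit your distribution, you can probably find other articles that explain how to do it. Once your machine has crashed and produced a core dump, you need to analyze it with crash. Some good tutorials can be found at Dedoimedo. It isn't necessarily easy, but it is the only way to find clues about the crash. The core dump also lets you read the log messages that were not synced to disk before the crash.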
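
Inside crash, the `log` command prints the kernel message buffer stored in the dump, including the lines that never reached disk. The most useful part is usually the last few messages before the crash. The following is a minimal sketch, assuming you have saved that output to a text file. It picks out a few well-known crash signatures. The pattern list is illustrative and far from exhaustive, and the function name is hypothetical.

```python
import re

# Illustrative, non-exhaustive kernel messages that often precede a crash
SIGNATURES = {
    "panic": re.compile(r"Kernel panic - not syncing"),
    "oops": re.compile(r"\bBUG: |general protection fault|Oops:"),
    "lockup": re.compile(r"soft lockup|hard LOCKUP"),
    "hardware": re.compile(r"Machine Check|\[Hardware Error\]"),
}


def crash_clues(log_text, tail=200):
    """Scan the last `tail` lines of a kernel log and return (category, line) hits."""
    hits = []
    for line in log_text.splitlines()[-tail:]:
        for category, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((category, line.strip()))
                break  # report each line under its first matching category only
    return hits
```

A `hardware` hit (machine check) points toward CPU, memory, or board problems, while `panic`, `oops`, or `lockup` hits usually come with a stack trace that crash's `bt` command can expand.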
