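I am running a physical server (Debian 11 bullseye) that worked well all of last year. Over the last few days it has started behaving very strangely: it crashes and reboots at random, and I cannot work out what is wrong by checking the syslog...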
REBOOT CRASH 1
Nov 26 07:04:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:04:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:04:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:05:01 testing CRON[320608]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 26 07:05:02 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:05:02 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:05:20 testing smartd[1136]: Device: /dev/bus/0 [megaraid_disk_04], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5
Nov 26 07:05:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:06:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:06:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:28:51 testing systemd-random-seed[456]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:28:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:28:51 testing systemd[1]: Finished Create System Users.
Nov 26 07:28:51 testing systemd[1]: Starting Create Static Device Nodes in /dev...
Nov 26 07:28:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:28:51 testing kernel: [ 0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:28:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:28:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:28:51 testing kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-8f99-5441121afaf2412 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:28:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:28:51 testing kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-provided physical RAM map:
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:28:51 testing systemd[1]: Finished Set the console keyboard layout.
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:28:51 testing apparmor.systemd[962]: Restarting AppArmor
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable
Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bd369000-0x00000000bf38efff] reserved
------------------------------------------------------------------------------------------------------------------------------------------------------
REBOOT CRASH 2
Nov 26 07:45:41 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:45:41 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:46:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:46:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:46:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:47:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:47:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:47:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:48:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter...
Nov 26 07:48:43 testing ddclient[2198]: CONNECT: checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: CONNECTED: using HTTP
Nov 26 07:48:43 testing ddclient[2198]: SENDING: GET / HTTP/1.0
Nov 26 07:48:43 testing ddclient[2198]: SENDING: Host: checkip.dyndns.org
Nov 26 07:48:43 testing ddclient[2198]: SENDING: User-Agent: ddclient/3.9.1
Nov 26 07:48:43 testing ddclient[2198]: SENDING: Connection: close
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: SENDING:
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: HTTP/1.1 200 OK#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Date: Fri, 26 Nov 2021 06:48:43 GMT#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Content-Type: text/html#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Content-Length: 104#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Connection: close#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Cache-Control: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Pragma: no-cache#015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: #015
Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: <html><head><title>Current IP Check</title></head><body>Current IP Address: 123.456.78.90</body></html>#015
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: database.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: jenkins.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: monitors.testing.com: skipped: IP address was already set to 123.456.78.90.
Nov 26 07:48:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded.
Nov 26 07:48:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.
Nov 26 07:49:11 testing kernel: [ 356.406208] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Nov 26 07:56:51 testing systemd-random-seed[448]: Kernel entropy pool is not initialized yet, waiting until it is.
Nov 26 07:56:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...
Nov 26 07:56:51 testing systemd[1]: [email protected]: Succeeded.
Nov 26 07:56:51 testing systemd[1]: Finished Load Kernel Module drm.
Nov 26 07:56:51 testing kernel: [ 0.000000] Linux version 5.10.0-9-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30)
Nov 26 07:56:51 testing systemd[1]: Finished Coldplug All udev Devices.
Nov 26 07:56:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Nov 26 07:56:51 testing kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-1234-123456789 ro quiet crashkernel=2000M crashkernel=384M-:128M
Nov 26 07:56:51 testing systemd[1]: Finished Create Static Device Nodes in /dev.
Nov 26 07:56:51 testing kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-provided physical RAM map:
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20
Nov 26 07:56:51 testing systemd[1]: Starting Rule-based Manager for Device Events and Files...
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable
Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable
Nov 23 11:05:13 myserver kernel: [ 2.352549] ata_piix 0000:00:1f.2: version 2.13
Nov 23 11:05:13 myserver kernel: [ 3.576528] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 23 11:05:13 myserver kernel: [ 3.723306] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 23 11:05:13 myserver kernel: [ 4.093233] PM: Image not found (code -22)
Nov 23 11:05:13 myserver kernel: [ 10.167638] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 23 12:37:25 myserver PackageKit: daemon start
Nov 26 07:28:51 myserver kernel: [ 0.002793] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [ 0.002798] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:28:51 myserver kernel: [ 0.002814] MTRR default type: uncachable
Nov 26 07:28:51 myserver kernel: [ 0.002815] MTRR fixed ranges enabled:
Nov 26 07:28:51 myserver kernel: [ 0.002817] 00000-9FFFF write-back
Nov 26 07:28:51 myserver kernel: [ 0.002819] A0000-BFFFF uncachable
Nov 26 07:28:51 myserver kernel: [ 0.002820] C0000-CBFFF write-protect
Nov 26 07:28:51 myserver kernel: [ 0.002822] CC000-D3FFF write-back
Nov 26 07:28:51 myserver kernel: [ 0.002823] D4000-EBFFF uncachable
Nov 26 07:28:51 myserver kernel: [ 0.002825] EC000-FFFFF write-protect
Nov 26 07:28:51 myserver kernel: [ 0.002826] MTRR variable ranges enabled:
Nov 26 07:28:51 myserver kernel: [ 0.002829] 0 base 0000000000 mask FF80000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002831] 1 base 0080000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002833] 2 base 0100000000 mask FF00000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002834] 3 base 0200000000 mask FE00000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002836] 4 base 0400000000 mask FC00000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002838] 5 base 0800000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002840] 6 base 1000000000 mask F800000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002842] 7 base 1800000000 mask FFC0000000 write-back
Nov 26 07:28:51 myserver kernel: [ 0.002843] 8 disabled
Nov 26 07:28:51 myserver kernel: [ 0.002844] 9 disabled
Nov 26 07:28:51 myserver kernel: [ 0.004625] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [ 0.021048] e820: update [mem 0xba378000-0xba37afff] usable ==> reserved
Nov 26 07:28:51 myserver kernel: [ 0.022384] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [ 0.023393] On node 0 totalpages: 12582912
Nov 26 07:28:51 myserver kernel: [ 0.023395] Normal zone: 196608 pages used for memmap
Nov 26 07:28:51 myserver kernel: [ 0.023396] Normal zone: 12582912 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [ 0.023401] On node 1 totalpages: 12569668
Nov 26 07:28:51 myserver kernel: [ 0.023402] DMA zone: 64 pages used for memmap
Nov 26 07:28:51 myserver kernel: [ 0.023404] DMA zone: 3984 pages, LIFO batch:0
Nov 26 07:28:51 myserver kernel: [ 0.023405] DMA32 zone: 12019 pages used for memmap
Nov 26 07:28:51 myserver kernel: [ 0.023407] DMA32 zone: 769204 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [ 0.023408] Normal zone: 184320 pages used for memmap
Nov 26 07:28:51 myserver kernel: [ 0.023410] Normal zone: 11796480 pages, LIFO batch:63
Nov 26 07:28:51 myserver kernel: [ 0.040393] ACPI: Local APIC address 0xfee00000
Nov 26 07:28:51 myserver kernel: [ 0.040434] ACPI: IRQ0 used by override.
Nov 26 07:28:51 myserver kernel: [ 0.040436] ACPI: IRQ9 used by override.
Nov 26 07:28:51 myserver kernel: [ 0.049931] pcpu-alloc: s184152 r8192 d28840 u262144 alloc=1*2097152
Nov 26 07:28:51 myserver kernel: [ 0.049933] pcpu-alloc: [0] 00 02 04 06 08 10 12 14 [0] 16 18 20 22 -- -- -- --
Nov 26 07:28:51 myserver kernel: [ 0.049950] pcpu-alloc: [1] 01 03 05 07 09 11 13 15 [1] 17 19 21 23 -- -- -- --
Nov 26 07:28:51 myserver kernel: [ 0.950514] PCI: root bus fe: using default resources
Nov 26 07:28:51 myserver kernel: [ 0.950516] PCI: Probing PCI hardware (bus fe)
Nov 26 07:28:51 myserver kernel: [ 0.952145] PCI: root bus ff: using default resources
Nov 26 07:28:51 myserver kernel: [ 0.952146] PCI: Probing PCI hardware (bus ff)
Nov 26 07:28:51 myserver kernel: [ 0.953705] PCI: pci_cache_line_size set to 64 bytes
Nov 26 07:28:51 myserver kernel: [ 0.953817] e820: reserve RAM buffer [mem 0xba378000-0xbbffffff]
Nov 26 07:28:51 myserver kernel: [ 0.953820] e820: reserve RAM buffer [mem 0xbc768000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [ 0.953823] e820: reserve RAM buffer [mem 0xbca67000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [ 0.953826] e820: reserve RAM buffer [mem 0xbcf12000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [ 0.953828] e820: reserve RAM buffer [mem 0xbcf69000-0xbfffffff]
Nov 26 07:28:51 myserver kernel: [ 0.974640] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [ 0.974699] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
Nov 26 07:28:51 myserver kernel: [ 0.975062] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [ 0.975413] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
Nov 26 07:28:51 myserver kernel: [ 0.976584] system 00:04: Plug and Play ACPI device, IDs PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [ 0.976656] pnp 00:05: [irq 0 disabled]
Nov 26 07:28:51 myserver kernel: [ 0.976725] system 00:05: Plug and Play ACPI device, IDs IPI0001 PNP0c01 (active)
Nov 26 07:28:51 myserver kernel: [ 0.977664] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [ 0.977799] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
Nov 26 07:28:51 myserver kernel: [ 1.847200] intel_idle: MWAIT substates: 0x1120
Nov 26 07:28:51 myserver kernel: [ 1.847270] Monitor-Mwait will be used to enter C-1 state
Nov 26 07:28:51 myserver kernel: [ 1.847287] Monitor-Mwait will be used to enter C-3 state
Nov 26 07:28:51 myserver kernel: [ 1.847390] intel_idle: v0.5.1 model 0x2C
Nov 26 07:28:51 myserver kernel: [ 1.848984] intel_idle: Local APIC timer is reliable in all C-states
Nov 26 07:28:51 myserver kernel: [ 2.168571] with arguments:
Nov 26 07:28:51 myserver kernel: [ 2.168572] /init
Nov 26 07:28:51 myserver kernel: [ 2.168573] with environment:
Nov 26 07:28:51 myserver kernel: [ 2.168575] HOME=/
Nov 26 07:28:51 myserver kernel: [ 2.168576] TERM=linux
Nov 26 07:28:51 myserver kernel: [ 2.168577] BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64
Nov 26 07:28:51 myserver kernel: [ 2.168579] crashkernel=384M-:128M
Nov 26 07:28:51 myserver kernel: [ 2.336535] megaraid_sas 0000:04:00.0: BAR:0x1 BAR's base_addr(phys):0x00000000df1bc000 mapped virt_addr:0x(____ptrval____)
Nov 26 07:28:51 myserver kernel: [ 2.348825] libata version 3.00 loaded.
Nov 26 07:28:51 myserver kernel: [ 2.353143] ata_piix 0000:00:1f.2: version 2.13
Nov 26 07:28:51 myserver kernel: [ 3.577545] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Nov 26 07:28:51 myserver kernel: [ 3.697853] sr 1:0:0:0: Attached scsi CD-ROM sr0
Nov 26 07:28:51 myserver kernel: [ 4.107713] PM: Image not found (code -22)
Nov 26 07:28:51 myserver kernel: [ 12.677156] checking generic (d5800000 130000) vs hw (d5800000 800000)
Nov 26 07:43:30 myserver kernel: [ 0.002794] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 26 07:43:30 myserver kernel: [ 0.002799] e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 26 07:43:30 myserver kernel: [ 0.002814] MTRR default type: uncachable
Nov 26 07:43:30 myserver kernel: [ 0.002815] MTRR fixed ranges enabled:
Nov 26 07:43:30 myserver kernel: [ 0.002817] 00000-9FFFF write-back
Nov 26 07:43:30 myserver kernel: [ 0.002819] A0000-BFFFF uncachable
Nov 26 07:43:30 myserver kernel: [ 0.002821] C0000-CBFFF write-protect
Nov 26 07:43:30 myserver kernel: [ 0.002822] CC000-D3FFF write-back
Nov 26 07:43:30 myserver kernel: [ 0.002824] D4000-EBFFF uncachable
Nov 26 07:43:30 myserver kernel: [ 0.002825] EC000-FFFFF write-protect
Nov 26 07:43:30 myserver kernel: [ 0.002827] MTRR variable ranges enabled:
Nov 26 07:43:30 myserver kernel: [ 0.002829] 0 base 0000000000 mask FF80000000 write-back
Nov 26 07:43:30 myserver kernel: [ 0.002831] 1 base 0080000000 mask FFC0000000 write-back
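I have been monitoring CPU and RAM usage the whole time: the CPU temperature has never gone above 34 degrees Celsius, and the CPU load has never gone above 30%. About 70GB of RAM is free to use...
I am not sure what is causing the reboots. Any help would be greatly appreciated!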
Answer 1
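The machine is probably crashing because of a kernel panic, which is why you see nothing in the logs: as soon as a panic occurs, the kernel stops and can no longer write anything to the logs. Anything the kernel had not synced to disk before the crash is lost.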
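To make the missing log window concrete, here is a minimal Python sketch that scans syslog lines for the first kernel message of each boot and reports the last line written before it, together with the length of the silence. The function names `parse_time` and `find_reboots` are illustrative, not part of any tool. The year is an assumption passed in by the caller because the syslog format shown in the question does not record it, and month names are parsed with the default English locale.

```python
import re
from datetime import datetime

# First kernel message of a boot, as it appears in the logs in the question.
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")


def parse_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None for non-log lines."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_reboots(lines, year=2021):
    """Return (last_line_before_boot, silent_gap) for each boot marker found."""
    reboots = []
    history = []  # (timestamp, line) pairs seen so far
    for line in lines:
        stamp = parse_time(line, year)
        if stamp is None:
            continue  # separators, headings, blank lines
        if BOOT_MARKER.search(line):
            # Early boot messages share the marker's timestamp and can precede
            # it in the file, so walk back to the last strictly older line.
            for old_stamp, old_line in reversed(history):
                if old_stamp < stamp:
                    reboots.append((old_line, stamp - old_stamp))
                    break
        history.append((stamp, line.rstrip("\n")))
    return reboots


if __name__ == "__main__":
    import sys

    # Usage: python find_reboots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for last_line, gap in find_reboots(handle):
            print(f"silent for {gap} before boot, last entry: {last_line}")
```

A long silence ending in an ordinary message, with no shutdown sequence in between, is consistent with a panic or a hardware reset, but it does not by itself prove either.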
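You should enable kernel core dumps using kdump, which writes a memory dump to a file on the local disk as soon as a panic is triggered. The dump can be analyzed later with crash, once the machine has booted again.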
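You can read the instructions on how to enable kernel core dumps here. If they do not fit your distribution, you can probably find another article that explains how to do it. Once your machine has crashed and produced a core dump, you need to analyze it with crash. Some good tutorials can be found on Dedoimedo. This is not necessarily easy, but it is the only way to find clues about the crash. In the core dump you can also read the log messages that had not been synced to disk before the crash.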
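Before waiting for the next crash, it may be worth confirming that a crash kernel is actually reserved and loaded. The boot command line in the question carries two crashkernel= parameters, and the log alone does not settle which one takes effect. The following is a small Python sketch under that assumption: it lists the crashkernel= values on a command line and reads the kernel's `/sys/kernel/kexec_crash_loaded` and `/sys/kernel/kexec_crash_size` files. The helper names `crashkernel_args` and `read_sysfs` are made up for illustration.

```python
from pathlib import Path


def crashkernel_args(cmdline):
    """Return the value of every crashkernel= parameter on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("crashkernel=")
    ]


def read_sysfs(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_sysfs("/proc/cmdline") or ""
    print("crashkernel parameters:", crashkernel_args(cmdline))
    # "1" means a crash kernel is loaded and would be started on a panic.
    print("crash kernel loaded:", read_sysfs("/sys/kernel/kexec_crash_loaded"))
    # Number of bytes reserved for the crash kernel; "0" means nothing is reserved.
    print("reserved bytes:", read_sysfs("/sys/kernel/kexec_crash_size"))
```

If the loaded flag is not 1 or the reserved size is 0, a panic will not produce a dump, and the kdump setup needs to be revisited first.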