Debian server keeps rebooting unexpectedly

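My lab's server runs Debian-Wheezy-7.8-Stable and keeps rebooting, several times so far, after running for a few hours and without any notification. The server is set up for fairly heavy numerical computation as well as parallel computing. I have printed the output of last reboot and the log from /var/log/messages, but I find these log messages hard to understand. I tried to look at the entries in /var/log/messages from just before each reboot, and at the entries from the same time, but /var/log/messages only seems to show the log messages written after the reboot had taken place.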

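I browsed around and found that other people have had the same problem, but the causes seem to vary, and /var/log/messages appears to be the key to solving it. What does /var/log/messages actually tell me about these unwanted reboots? How does a beginner start learning to read this log? I mean, are there any important keywords to look for, or something similar?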

Thanks for any help.

last reboot

reboot   system boot  3.2.0-4-amd64    Wed May 20 03:29 - 12:43  (09:14)
reboot   system boot  3.2.0-4-amd64    Tue May 19 16:01 - 12:43  (20:42)

/var/log/messages

May 18 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 16:01:19 labserver kernel: imklog 5.8.11, log source = /proc/kmsg started.
May 19 16:01:19 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2401" x-info="http://www.rsyslog.com"] start
May 19 16:01:19 labserver kernel: [    0.000000] Initializing cgroup subsys cpuset
May 19 16:01:19 labserver kernel: [    0.000000] Initializing cgroup subsys cpu
May 19 16:01:19 labserver kernel: [    0.000000] Linux version 3.2.0-4-amd64 ([email protected]) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.65-1+deb7u2
May 19 16:01:19 labserver kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [    0.000000] BIOS-provided physical RAM map:
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 000000007df71000 (usable)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007df71000 - 000000007e0f1000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007e0f1000 - 000000007e2ec000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007e2ec000 - 000000007f367000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007f367000 - 000000007f800000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000080000000 - 0000000090000000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed40000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000100000000 - 0000000880000000 (usable)
May 19 16:01:19 labserver kernel: [    0.000000] NX (Execute Disable) protection: active
May 19 16:01:19 labserver kernel: [    0.000000] SMBIOS 2.7 present.
May 19 16:01:19 labserver kernel: [    0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [    0.000000] last_pfn = 0x880000 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
May 19 16:01:19 labserver kernel: [    0.000000] last_pfn = 0x7df71 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [    0.000000] found SMP MP-table at [ffff8800000fd900] fd900
May 19 16:01:19 labserver kernel: [    0.000000] Using GB pages for direct mapping
May 19 16:01:19 labserver kernel: [    0.000000] init_memory_mapping: 0000000000000000-000000007df71000
May 19 16:01:19 labserver kernel: [    0.000000] init_memory_mapping: 0000000100000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000] RAMDISK: 36bea000 - 375ed000
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: RSDP 00000000000f04a0 00024 (v02 ALASKA)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: XSDT 000000007e204088 0008C (v01 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FACP 000000007e211040 0010C (v05 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 268 to 244 (20110623/tbfadt-288)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: DSDT 000000007e2041a8 0CE96 (v02 ALASKA    A M I 00000015 INTL 20051117)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FACS 000000007e2e3080 00040
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: APIC 000000007e211150 00100 (v03 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FPDT 000000007e211250 00044 (v01 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: MCFG 000000007e211298 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HPET 000000007e2112d8 00038 (v01 ALASKA    A M I 01072009 AMI. 00000005)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: PRAD 000000007e211310 000BE (v02 PRADID  PRADTID 00000001 MSFT 03000001)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: SPMI 000000007e2113d0 00040 (v05 A M I   OEMSPMI 00000000 AMI. 00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: SSDT 000000007e211410 D0CB0 (v02  INTEL    CpuPm 00004000 INTL 20051117)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: EINJ 000000007e2e20c0 00130 (v01    AMI AMI EINJ 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: ERST 000000007e2e21f0 00230 (v01  AMIER AMI ERST 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HEST 000000007e2e2420 000A8 (v01    AMI AMI HEST 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: BERT 000000007e2e24c8 00030 (v01    AMI AMI BERT 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: DMAR 000000007e2e24f8 000C4 (v01 A M I   OEMDMAR 00000001 INTL 00000001)
May 19 16:01:19 labserver kernel: [    0.000000] No NUMA configuration found
May 19 16:01:19 labserver kernel: [    0.000000] Faking a node at 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000] Initmem setup node 0 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000]   NODE_DATA [000000087fffb000 - 000000087fffffff]
May 19 16:01:19 labserver kernel: [    0.000000] Zone PFN ranges:
May 19 16:01:19 labserver kernel: [    0.000000]   DMA      0x00000010 -> 0x00001000
May 19 16:01:19 labserver kernel: [    0.000000]   DMA32    0x00001000 -> 0x00100000
May 19 16:01:19 labserver kernel: [    0.000000]   Normal   0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [    0.000000] Movable zone start PFN for each node
May 19 16:01:19 labserver kernel: [    0.000000] early_node_map[3] active PFN ranges
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00000010 -> 0x0000009a
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00000100 -> 0x0007df71
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: PM-Timer IO Port: 0x408
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
May 19 16:01:19 labserver kernel: [    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24])
May 19 16:01:19 labserver kernel: [    0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 19 16:01:19 labserver kernel: [    0.000000] Using ACPI (MADT) for SMP configuration information
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
May 19 16:01:19 labserver kernel: [    0.000000] SMP: Allowing 12 CPUs, 0 hotplug CPUs
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000000009a000 - 000000000009b000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000000009b000 - 00000000000a0000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007df71000 - 000000007e0f1000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007e0f1000 - 000000007e2ec000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007e2ec000 - 000000007f367000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007f367000 - 000000007f800000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed40000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000fed40000 - 00000000ff000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
May 19 16:01:19 labserver kernel: [    0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
May 19 16:01:19 labserver kernel: [    0.000000] Booting paravirtualized kernel on bare hardware
May 19 16:01:19 labserver kernel: [    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:12 nr_node_ids:1
May 19 16:01:19 labserver kernel: [    0.000000] PERCPU: Embedded 27 pages/cpu @ffff88087fc00000 s78848 r8192 d23552 u131072
May 19 16:01:19 labserver kernel: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8258294
May 19 16:01:19 labserver kernel: [    0.000000] Policy zone: Normal
May 19 16:01:19 labserver kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 19 16:01:19 labserver kernel: [    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
May 19 16:01:19 labserver kernel: [    0.000000] Checking aperture...
May 19 16:01:19 labserver kernel: [    0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [    0.000000] Memory: 32975732k/35651584k available (3434k kernel code, 2130964k absent, 544888k reserved, 3305k data, 576k init)
May 19 16:01:19 labserver kernel: [    0.000000] Hierarchical RCU implementation.
May 19 16:01:19 labserver kernel: [    0.000000]    RCU dyntick-idle grace-period acceleration is enabled.
May 19 16:01:19 labserver kernel: [    0.000000] NR_IRQS:33024 nr_irqs:1184 16
May 19 16:01:19 labserver kernel: [    0.000000] Extended CMOS year: 2000
May 19 16:01:19 labserver kernel: [    0.000000] Console: colour VGA+ 80x25
May 19 16:01:19 labserver kernel: [    0.000000] console [tty0] enabled
May 19 16:01:19 labserver kernel: [    0.000000] Fast TSC calibration using PIT
May 19 16:01:19 labserver kernel: [    0.004000] Detected 2100.074 MHz processor.
May 19 16:01:19 labserver kernel: [    0.000003] Calibrating delay loop (skipped), value calculated using timer frequency.. 4200.14 BogoMIPS (lpj=8400296)
May 19 16:01:19 labserver kernel: [    0.000144] pid_max: default: 32768 minimum: 301
May 19 16:01:19 labserver kernel: [    0.000253] Security Framework initialized
May 19 16:01:19 labserver kernel: [    0.000324] AppArmor: AppArmor disabled by boot time parameter
May 19 16:01:19 labserver kernel: [    0.002355] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
May 19 16:01:19 labserver kernel: [    0.011585] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
May 19 16:01:19 labserver kernel: [    0.015724] Mount-cache hash table entries: 256
May 19 16:01:19 labserver kernel: [    0.015915] Initializing cgroup subsys cpuacct
May 19 16:01:19 labserver kernel: [    0.015986] Initializing cgroup subsys memory
May 19 16:01:19 labserver kernel: [    0.016063] Initializing cgroup subsys devices
May 19 16:01:19 labserver kernel: [    0.016133] Initializing cgroup subsys freezer
May 19 16:01:19 labserver kernel: [    0.016201] Initializing cgroup subsys net_cls
May 19 16:01:19 labserver kernel: [    0.016270] Initializing cgroup subsys blkio
May 19 16:01:19 labserver kernel: [    0.016344] Initializing cgroup subsys perf_event
May 19 16:01:19 labserver kernel: [    0.016441] CPU: Physical Processor ID: 0
May 19 16:01:19 labserver kernel: [    0.016509] CPU: Processor Core ID: 0
May 19 16:01:19 labserver kernel: [    0.017564] mce: CPU supports 23 MCE banks
May 19 16:01:19 labserver kernel: [    0.017670] CPU0: Thermal monitoring enabled (TM1)
May 19 16:01:19 labserver kernel: [    0.017768] using mwait in idle threads.
May 19 16:01:19 labserver kernel: [    0.018315] ACPI: Core revision 20110623
May 19 16:01:19 labserver kernel: [    0.049889] DMAR: Host address width 46
May 19 16:01:19 labserver kernel: [    0.049958] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
May 19 16:01:19 labserver kernel: [    0.050034] IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
May 19 16:01:19 labserver kernel: [    0.050122] DMAR: RMRR base: 0x0000007f239000 end: 0x0000007f247fff
May 19 16:01:19 labserver kernel: [    0.050195] DMAR: ATSR flags: 0x0
May 19 16:01:19 labserver kernel: [    0.050261] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
May 19 16:01:19 labserver kernel: [    0.050427] IOAPIC id 0 under DRHD base  0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [    0.050497] IOAPIC id 2 under DRHD base  0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [    0.050568] HPET id 0 under DRHD base 0xfbffc000
May 19 16:01:19 labserver kernel: [    0.050741] Enabled IRQ remapping in x2apic mode
May 19 16:01:19 labserver kernel: [    0.050810] Enabling x2apic
May 19 16:01:19 labserver kernel: [    0.050875] Enabled x2apic
May 19 16:01:19 labserver kernel: [    0.050943] Switched APIC routing to cluster x2apic.
May 19 16:01:19 labserver kernel: [    0.051552] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
May 19 16:01:19 labserver kernel: [    0.091256] CPU0: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping 04
May 19 16:01:19 labserver kernel: [    0.195570] Performance Events: PEBS fmt1+, generic architected perfmon, Intel PMU driver.
May 19 16:01:19 labserver kernel: [    0.195802] ... version:                3
May 19 16:01:19 labserver kernel: [    0.195869] ... bit width:              48
May 19 16:01:19 labserver kernel: [    0.195936] ... generic registers:      4
May 19 16:01:19 labserver kernel: [    0.196003] ... value mask:             0000ffffffffffff
May 19 16:01:19 labserver kernel: [    0.196073] ... max period:             000000007fffffff
May 19 16:01:19 labserver kernel: [    0.196143] ... fixed-purpose events:   3
May 19 16:01:19 labserver kernel: [    0.196210] ... event mask:             000000070000000f
May 19 16:01:19 labserver kernel: [    0.196468] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.196637] Booting Node   0, Processors  #1
May 19 16:01:19 labserver kernel: [    0.312587] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.312765]  #2
May 19 16:01:19 labserver kernel: [    0.424400] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.424578]  #3
May 19 16:01:19 labserver kernel: [    0.536316] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.536489]  #4
May 19 16:01:19 labserver kernel: [    0.648124] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.648303]  #5
May 19 16:01:19 labserver kernel: [    0.759941] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.760115]  #6
May 19 16:01:19 labserver kernel: [    0.871864] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.872050]  #7
May 19 16:01:19 labserver kernel: [    0.983690] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.983866]  #8
May 19 16:01:19 labserver kernel: [    1.095600] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.095774]  #9
May 19 16:01:19 labserver kernel: [    1.207414] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.207589]  #10
May 19 16:01:19 labserver kernel: [    1.319223] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.319400]  #11 Ok.
May 19 16:01:19 labserver kernel: [    1.431095] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.431192] Brought up 12 CPUs
May 19 16:01:19 labserver kernel: [    1.431260] Total of 12 processors activated (50398.84 BogoMIPS).
May 19 16:01:19 labserver kernel: [    1.450786] devtmpfs: initialized
May 19 16:01:19 labserver kernel: [    1.455360] PM: Registering ACPI NVS region at 7e0f1000 (2076672 bytes)
May 19 16:01:19 labserver kernel: [    1.455494] PM: Registering ACPI NVS region at 7f367000 (4820992 bytes)
May 19 16:01:19 labserver kernel: [    1.455843] print_constraints: dummy: 
May 19 16:01:19 labserver kernel: [    1.455977] NET: Registered protocol family 16
May 19 16:01:19 labserver kernel: [    1.456140] ACPI: bus type pci registered
May 19 16:01:19 labserver kernel: [    1.456268] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
May 19 16:01:19 labserver kernel: [    1.456361] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
May 19 16:01:19 labserver kernel: [    1.466673] PCI: Using configuration type 1 for base access
May 19 16:01:19 labserver kernel: [    1.468173] bio: create slab <bio-0> at 0
May 19 16:01:19 labserver kernel: [    1.468353] ACPI: Added _OSI(Module Device)
May 19 16:01:19 labserver kernel: [    1.468422] ACPI: Added _OSI(Processor Device)
May 19 16:01:19 labserver kernel: [    1.468491] ACPI: Added _OSI(3.0 _SCP Extensions)
May 19 16:01:19 labserver kernel: [    1.468560] ACPI: Added _OSI(Processor Aggregator Device)
May 19 16:01:19 labserver kernel: [    1.484562] ACPI: Executed 1 blocks of module-level executable AML code
May 19 16:01:19 labserver kernel: [    1.727818] ACPI: Interpreter enabled
May 19 16:01:19 labserver kernel: [    1.727891] ACPI: (supports S0 S1 S4 S5)
May 19 16:01:19 labserver kernel: [    1.728159] ACPI: Using IOAPIC for interrupt routing
May 19 16:01:19 labserver kernel: [    1.736531] ACPI: No dock devices found.
May 19 16:01:19 labserver kernel: [    1.736630] HEST: Table parsing has been initialized.
May 19 16:01:19 labserver kernel: [    1.736704] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
May 19 16:01:19 labserver kernel: [    1.737041] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
May 19 16:01:19 labserver kernel: [    1.737361] pci_root PNP0A08:00: host bridge window [io  0x0000-0x03af]
May 19 16:01:19 labserver kernel: [    1.737435] pci_root PNP0A08:00: host bridge window [io  0x03e0-0x0cf7]
May 19 16:01:19 labserver kernel: [    1.737508] pci_root PNP0A08:00: host bridge window [io  0x03b0-0x03df]
May 19 16:01:19 labserver kernel: [    1.737586] pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
May 19 16:01:19 labserver kernel: [    1.737659] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
May 19 16:01:19 labserver kernel: [    1.737747] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff]
May 19 16:01:19 labserver kernel: [    1.737834] pci_root PNP0A08:00: host bridge window [mem 0xfed0e000-0xfed0ffff]
May 19 16:01:19 labserver kernel: [    1.737922] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xfbffffff]
May 19 16:01:19 labserver kernel: [    1.740791] pci 0000:00:01.0: PCI bridge to [bus 01-01]
May 19 16:01:19 labserver kernel: [    1.745575] pci 0000:00:01.1: PCI bridge to [bus 02-03]
May 19 16:01:19 labserver kernel: [    1.745700] pci 0000:00:02.0: PCI bridge to [bus 04-04]
May 19 16:01:19 labserver kernel: [    1.745816] pci 0000:00:03.0: PCI bridge to [bus 05-05]
May 19 16:01:19 labserver kernel: [    1.745933] pci 0000:00:03.2: PCI bridge to [bus 06-06]
May 19 16:01:19 labserver kernel: [    1.746285] pci 0000:00:11.0: PCI bridge to [bus 07-07]
May 19 16:01:19 labserver kernel: [    1.746541] pci 0000:00:1e.0: PCI bridge to [bus 08-08] (subtractive decode)
May 19 16:01:19 labserver kernel: [    1.747170]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [    1.747465]  pci0000:00: ACPI _OSC control (0x15) granted
May 19 16:01:19 labserver kernel: [    1.756901] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus ff])
May 19 16:01:19 labserver kernel: [    1.758443]  pci0000:ff: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [    1.758528]  pci0000:ff: ACPI _OSC control (0x1d) granted
May 19 16:01:19 labserver kernel: [    1.759439] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.760105] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.760768] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 10 11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.761383] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.762006] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.762729] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.763450] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.764170] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15)

Answer 1

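You need to provide more information, especially the log entries from just before the system rebooted. As far as I can tell, though, they may not reveal much more. Check the other logs as well, for example syslog.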
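One way to pull out the entries written just before each reboot is sketched below in Python. It is only a rough illustration: it assumes that every boot shows up as the line where rsyslog's kernel log module starts ("imklog ... started", as in the excerpt in the question), and the function name lines_before_boots and the default of 5 context lines are arbitrary choices rather than anything standard.

```python
import sys
from collections import deque

# Assumption: each boot is marked by the line where rsyslog's kernel log
# module starts, as in the question's excerpt ("imklog 5.8.11, ... started.").
BOOT_MARKER = "imklog"


def lines_before_boots(lines, context=5):
    """Return one list per boot marker, holding up to `context` lines before it."""
    recent = deque(maxlen=context)
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line and "started" in line:
            result.append(list(recent))
            recent.clear()  # start collecting afresh for the next boot
        else:
            recent.append(line)
    return result


# Only runs when a log file is given, e.g.:
#   python before_boot.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for i, block in enumerate(lines_before_boots(fh), start=1):
            print(f"--- last entries before boot #{i} ---")
            print("\n".join(block) if block else "(nothing logged before this boot)")
```

If the last entry before a boot marker is hours old and no shutdown message precedes it, as with the 07:35:01 entry that comes right before the 16:01:19 boot in the question, the machine most likely went down without the operating system getting a chance to log anything.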

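In my experience, sudden reboots that leave no indication of what actually went wrong are most often hardware related. Otherwise the kernel usually gets a chance to write something to the log that gives a clue.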
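As for keywords to look for: when the kernel does get to write something, it tends to use a handful of recognizable phrases. The Python sketch below scans log lines for some of them. The keyword list is illustrative and far from exhaustive, and the function name suspicious_lines is made up for this example.

```python
import re
import sys

# Illustrative, non-exhaustive substrings that Linux kernels commonly log
# when something is going wrong; adjust the list for your own system.
KEYWORDS = [
    "panic",                        # "Kernel panic - not syncing: ..."
    "Oops",                         # kernel oops reports
    "BUG:",                         # e.g. "BUG: soft lockup"
    "Machine check",                # CPU/memory errors reported via MCE
    "Hardware Error",
    "Out of memory",                # OOM killer activity
    "soft lockup",
    "temperature above threshold",  # CPU thermal throttling
    "I/O error",                    # failing disks
]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines containing any keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


# Only runs when a log file is given, e.g.:
#   python scan_log.py /var/log/syslog
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for number, text in suspicious_lines(fh):
            print(f"{number}: {text}")
```

An empty result does not prove the hardware is healthy. It only means the kernel logged none of these phrases, which is what one would expect if the machine lost power or reset abruptly.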

Some common causes of sudden reboots:

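  • Overheating, probably the main cause. Find out what the temperatures are and try to log them. Check whether the server has a display that shows the temperature and whether the room is cooled properly. It may also help to replace the thermal compound under the heat sink covering the CPU.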

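  • Bad hardware or drivers. Use "lspci" to get a list of devices. A bad DIMM, for example, can make the system suddenly hang and/or reboot (reseat the DIMMs, CPU, and cards). I recall one server that rebooted occasionally because of a problem with an Intel Ethernet card. A bad disk can sometimes cause this kind of problem too, although it usually just makes the machine hang rather than reboot.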

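  • Bad UPS. I remember a battery-backed UPS that was slowly failing, and one of the signs was that it power-cycled the servers connected to it about once a week. It is also possible that you simply have a misconfigured power-cycle schedule.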
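To follow up on the overheating suggestion, temperatures can be logged to a file so that the last reading before a reboot survives. The Python sketch below assumes the kernel exposes thermal zones under /sys/class/thermal, which is not guaranteed on every server (some only report temperatures through lm-sensors or IPMI). The function names, the one-minute interval, and the log file name are all illustrative.

```python
import glob
import os
import sys
import time


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        try:
            with open(path) as fh:
                millidegrees = int(fh.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # zone unreadable or not numeric right now; skip it
        temps[os.path.basename(os.path.dirname(path))] = millidegrees / 1000.0
    return temps


def format_entry(temps, now=None):
    """Build one log line: a timestamp followed by zone=value pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{zone}={value:.1f}C" for zone, value in sorted(temps.items()))
    return f"{stamp} {readings}"


# Only runs when an output file is given, e.g. (hypothetical file names):
#   python templog.py /var/tmp/temps.log
if __name__ == "__main__" and len(sys.argv) > 1:
    while True:
        with open(sys.argv[1], "a") as log:
            log.write(format_entry(read_temps()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the line to disk before a possible reset
        time.sleep(60)
```

After the next unexpected reboot, the last few lines of the file show whether the temperature was climbing shortly before the machine went down.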
