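I have a Proxmox cluster with two physical servers, pve and pve2. They are identical Dell R710s with 96GB of RAM and 1TB of storage (RAID-10). For some reason I have not yet determined, pve2 powers itself off. I have checked the hardware logs through iDRAC, and there are no alerts or errors.
I am still fairly new to Proxmox, so I don't know where to look for error logs beyond the usual Linux locations such as syslog and messages.
Here is a snippet of my syslog from around the time of the reboot (at Dec 30 16:54:01):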
Dec 30 16:50:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:50:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:50:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:51:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:51:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:51:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:52:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:52:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:52:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:53:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:53:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:53:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:54:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:54:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:54:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:57:42 pve2 dmeventd[492]: dmeventd ready for processing.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) ()
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.73-1-pve root=/dev/mapper/pve-root ro quiet
Dec 30 16:57:42 pve2 kernel: [ 0.000000] KERNEL supported cpus:
Dec 30 16:57:42 pve2 systemd-modules-load[483]: Inserted module 'iscsi_tcp'
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Intel GenuineIntel
Dec 30 16:57:42 pve2 kernel: [ 0.000000] AMD AuthenticAMD
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Hygon HygonGenuine
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Centaur CentaurHauls
Dec 30 16:57:42 pve2 kernel: [ 0.000000] zhaoxin Shanghai
Dec 30 16:57:42 pve2 systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 30 16:57:42 pve2 kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-provided physical RAM map:
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf378fff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf379000-0x00000000bf38efff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Started udev Coldplug all Devices.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf38f000-0x00000000bf3cdfff] ACPI data
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf3ce000-0x00000000bfffffff] reserved
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000ffffffff] reserved
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000183fffffff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] NX (Execute Disable) protection: active
Dec 30 16:57:42 pve2 kernel: [ 0.000000] SMBIOS 2.6 present.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] DMI: Dell Inc. PowerEdge R710/0Y7JM4, BIOS 6.3.0 07/24/2012
Any suggestions on what this could be, or where else I should check?
Answer 1
This is probably a hardware fault, and the most common cause is bad RAM.
Run a memory test on the server to find out.
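
While the memory test is pending, it can help to confirm from the log that the restart really was unclean. In the excerpt above, the entries jump from 16:54:01 straight to the kernel boot messages at 16:57:42 with no shutdown messages in between. Below is a minimal Python sketch that looks for that pattern. Everything in it is an assumption for illustration, not a Proxmox tool: the function names, the year 2020 (classic syslog timestamps omit the year), the "Linux version" boot marker, and the shutdown hint strings, which are only a heuristic.

```python
from datetime import datetime

# First kernel message of a fresh boot, as seen in the excerpt above.
BOOT_MARKER = "Linux version"
# Heuristic strings that systemd usually logs during an orderly shutdown (assumption).
SHUTDOWN_HINTS = ("Reached target Shutdown", "systemd-shutdown")


def parse_time(line, year=2020):
    """Parse the 15-character syslog timestamp; return None if the line has none."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_boots(lines, year=2020, lookback=20):
    """Return (last_entry_time, boot_time, gap_seconds, clean_shutdown_seen) per boot."""
    results = []
    for i, line in enumerate(lines):
        if " kernel: " not in line or BOOT_MARKER not in line:
            continue
        boot_time = parse_time(line, year)
        if boot_time is None:
            continue
        # Walk back to the last entry that is strictly older than the boot line.
        # Entries from the same second (such as dmeventd above) belong to the new boot.
        j = i - 1
        while j >= 0:
            t = parse_time(lines[j], year)
            if t is not None and t < boot_time:
                break
            j -= 1
        if j < 0:
            continue
        last_time = parse_time(lines[j], year)
        recent = lines[max(0, j - lookback + 1): j + 1]
        clean = any(hint in entry for entry in recent for hint in SHUTDOWN_HINTS)
        gap = (boot_time - last_time).total_seconds()
        results.append((last_time, boot_time, gap, clean))
    return results


# Four lines copied from the syslog excerpt in the question.
SAMPLE = [
    "Dec 30 16:54:01 pve2 systemd[1]: pvesr.service: Succeeded.",
    "Dec 30 16:54:01 pve2 systemd[1]: Started Proxmox VE replication runner.",
    "Dec 30 16:57:42 pve2 dmeventd[492]: dmeventd ready for processing.",
    "Dec 30 16:57:42 pve2 kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) "
    "(gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 "
    "(Mon, 16 Nov 2020 10:52:16 +0100) ()",
]

for last_time, boot_time, gap, clean in find_boots(SAMPLE):
    kind = "clean shutdown logged" if clean else "no shutdown messages found"
    print(f"last entry {last_time}, boot {boot_time}, gap {gap:.0f}s, {kind}")
```

To run it against a full log, read the file into a list of lines (for example `open("/var/log/syslog").read().splitlines()`) and pass that to `find_boots`. A boot that follows a gap with no shutdown messages points to a sudden power loss or hard reset rather than an OS-initiated shutdown, which fits a hardware cause. The sketch does not handle logs that span a year boundary.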