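We have migrated our services to a new dedicated server, and everything runs fine except for one very strange problem. Every 3-4 weeks the server goes offline. Our investigation shows that the server itself keeps running, but because we have no VNC or other way to connect to it, the only way to regain control is to reboot it. After the reboot everything is back to normal. In the syslog file we can see that eth0 fails, but we do not know enough to understand why.
We would greatly appreciate any help!
The full syslog, covering the failure and the boot after the restart, is here
Here is the part of the syslog that covers the failure itself: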
Feb 9 15:13:40 m23430 kernel: [1122633.562497] igb 0000:04:00.0 eth0: PCIe link lost
Feb 9 15:13:40 m23430 kernel: [1122633.565241] ------------[ cut here ]------------
Feb 9 15:13:40 m23430 kernel: [1122633.565242] igb: Failed to read reg 0xc030!
Feb 9 15:13:40 m23430 kernel: [1122633.565271] WARNING: CPU: 0 PID: 3998508 at drivers/net/ethernet/intel/igb/igb_main.c:747 igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565283] Modules linked in: cpuid tls xt_recent ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt xt_LOG nf_log_syslog xt_comment ipt_REJECT nf_reject_ipv4 nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter intel_rapl_msr intel_rapl_common nf_tables edac_mce_amd binfmt_misc libcrc32c nfnetlink kvm_amd ipmi_ssif nls_iso8859_1 kvm snd_hda_codec_hdmi crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd drm_vram_helper snd_intel_sdw_acpi cryptd drm_ttm_helper snd_hda_codec rapl ttm snd_hda_core joydev input_leds snd_pci_acp6x snd_hwdep drm_kms_helper wmi_bmof snd_pcm snd_pci_acp5x cec snd_timer rc_core snd snd_rn_pci_acp3x fb_sys_fops ccp soundcore snd_pci_acp3x syscopyarea sysfillrect sysimgblt acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel efi_pstore drm ip_tables x_tables autofs4 hid_generic usbhid cdc_ether usbnet hid mii igb nvme i2c_algo_bit ahci
Feb 9 15:13:40 m23430 kernel: [1122633.565315] crc32_pclmul xhci_pci i2c_piix4 nvme_core libahci xhci_pci_renesas dca wmi video
Feb 9 15:13:40 m23430 kernel: [1122633.565321] CPU: 0 PID: 3998508 Comm: kworker/0:0 Not tainted 5.15.0-87-generic #97-Ubuntu
Feb 9 15:13:40 m23430 kernel: [1122633.565323] Hardware name: primeLine Solutions B650D4U/B650D4U, BIOS 1.10.PL01 09/05/2023
Feb 9 15:13:40 m23430 kernel: [1122633.565325] Workqueue: events igb_watchdog_task [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565331] RIP: 0010:igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565337] Code: c7 c6 fc 95 12 c0 e8 ad 64 a8 de 48 8b bb 30 ff ff ff e8 6b 76 3b de 84 c0 74 16 44 89 ee 48 c7 c7 98 a2 12 c0 e8 17 00 9f de <0f> 0b e9 99 01 fe ff e9 b4 01 fe ff 0f b6 d0 be 00 00 04 00 48 c7
Feb 9 15:13:40 m23430 kernel: [1122633.565338] RSP: 0018:ffffc19163677db8 EFLAGS: 00010286
Feb 9 15:13:40 m23430 kernel: [1122633.565339] RAX: 0000000000000000 RBX: ffff9d9b132f8ed0 RCX: 0000000000000027
Feb 9 15:13:40 m23430 kernel: [1122633.565340] RDX: ffff9db9a8020588 RSI: 0000000000000001 RDI: ffff9db9a8020580
Feb 9 15:13:40 m23430 kernel: [1122633.565341] RBP: ffffc19163677dd0 R08: 0000000000000003 R09: 000000000194d238
Feb 9 15:13:40 m23430 kernel: [1122633.565342] R10: 0000000000ffff10 R11: 000000000000000f R12: 00000000ffffffff
Feb 9 15:13:40 m23430 kernel: [1122633.565343] R13: 000000000000c030 R14: 0000000000000000 R15: ffff9d9b1329a340
Feb 9 15:13:40 m23430 kernel: [1122633.565344] FS: 0000000000000000(0000) GS:ffff9db9a8000000(0000) knlGS:0000000000000000
Feb 9 15:13:40 m23430 kernel: [1122633.565345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 9 15:13:40 m23430 kernel: [1122633.565346] CR2: 000055fc969fd000 CR3: 0000001d00e10000 CR4: 0000000000750ef0
Feb 9 15:13:40 m23430 kernel: [1122633.565347] PKRU: 55555554
Feb 9 15:13:40 m23430 kernel: [1122633.565347] Call Trace:
Feb 9 15:13:40 m23430 kernel: [1122633.565349] <TASK>
Feb 9 15:13:40 m23430 kernel: [1122633.565351] ? show_trace_log_lvl+0x1d6/0x2ea
Feb 9 15:13:40 m23430 kernel: [1122633.565355] ? show_trace_log_lvl+0x1d6/0x2ea
Feb 9 15:13:40 m23430 kernel: [1122633.565358] ? igb_update_stats+0x84/0x880 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565362] ? show_regs.part.0+0x23/0x29
Feb 9 15:13:40 m23430 kernel: [1122633.565364] ? show_regs.cold+0x8/0xd
Feb 9 15:13:40 m23430 kernel: [1122633.565366] ? igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565370] ? __warn+0x8c/0x100
Feb 9 15:13:40 m23430 kernel: [1122633.565373] ? igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565377] ? report_bug+0xa4/0xd0
Feb 9 15:13:40 m23430 kernel: [1122633.565380] ? handle_bug+0x39/0x90
Feb 9 15:13:40 m23430 kernel: [1122633.565383] ? exc_invalid_op+0x19/0x70
Feb 9 15:13:40 m23430 kernel: [1122633.565384] ? asm_exc_invalid_op+0x1b/0x20
Feb 9 15:13:40 m23430 kernel: [1122633.565387] ? igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565391] ? igb_rd32.cold+0x3a/0x46 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565395] igb_update_stats+0x84/0x880 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565399] igb_watchdog_task+0xa8/0x480 [igb]
Feb 9 15:13:40 m23430 kernel: [1122633.565403] process_one_work+0x22b/0x3d0
Feb 9 15:13:40 m23430 kernel: [1122633.565406] worker_thread+0x53/0x420
Feb 9 15:13:40 m23430 kernel: [1122633.565407] ? process_one_work+0x3d0/0x3d0
Feb 9 15:13:40 m23430 kernel: [1122633.565408] kthread+0x12a/0x150
Feb 9 15:13:40 m23430 kernel: [1122633.565410] ? set_kthread_struct+0x50/0x50
Feb 9 15:13:40 m23430 kernel: [1122633.565412] ret_from_fork+0x22/0x30
Feb 9 15:13:40 m23430 kernel: [1122633.565416] </TASK>
Feb 9 15:13:40 m23430 kernel: [1122633.565416] ---[ end trace e9520e7cbe879ab3 ]---
Feb 9 15:13:45 m23430 kernel: [1122637.786639] ------------[ cut here ]------------
Feb 9 15:13:45 m23430 kernel: [1122637.786649] NETDEV WATCHDOG: eth0 (igb): transmit queue 1 timed out
Feb 9 15:13:45 m23430 kernel: [1122637.786661] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280
Feb 9 15:13:45 m23430 kernel: [1122637.786668] Modules linked in: cpuid tls xt_recent ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt xt_LOG nf_log_syslog xt_comment ipt_REJECT nf_reject_ipv4 nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter intel_rapl_msr intel_rapl_common nf_tables edac_mce_amd binfmt_misc libcrc32c nfnetlink kvm_amd ipmi_ssif nls_iso8859_1 kvm snd_hda_codec_hdmi crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd drm_vram_helper snd_intel_sdw_acpi cryptd drm_ttm_helper snd_hda_codec rapl ttm snd_hda_core joydev input_leds snd_pci_acp6x snd_hwdep drm_kms_helper wmi_bmof snd_pcm snd_pci_acp5x cec snd_timer rc_core snd snd_rn_pci_acp3x fb_sys_fops ccp soundcore snd_pci_acp3x syscopyarea sysfillrect sysimgblt acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel efi_pstore drm ip_tables x_tables autofs4 hid_generic usbhid cdc_ether usbnet hid mii igb nvme i2c_algo_bit ahci
Feb 9 15:13:45 m23430 kernel: [1122637.786783] crc32_pclmul xhci_pci i2c_piix4 nvme_core libahci xhci_pci_renesas dca wmi video
Feb 9 15:13:45 m23430 kernel: [1122637.786795] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.15.0-87-generic #97-Ubuntu
Feb 9 15:13:45 m23430 kernel: [1122637.786798] Hardware name: primeLine Solutions B650D4U/B650D4U, BIOS 1.10.PL01 09/05/2023
Feb 9 15:13:45 m23430 kernel: [1122637.786799] RIP: 0010:dev_watchdog+0x277/0x280
Feb 9 15:13:45 m23430 kernel: [1122637.786804] Code: eb 97 48 8b 5d d0 c6 05 0d d8 67 01 01 48 89 df e8 8e 5f f9 ff 44 89 e1 48 89 de 48 c7 c7 80 fc 4d 9f 48 89 c2 e8 b8 d5 19 00 <0f> 0b eb 80 e9 30 6a 23 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56
Feb 9 15:13:45 m23430 kernel: [1122637.786806] RSP: 0018:ffffc19140003e70 EFLAGS: 00010282
Feb 9 15:13:45 m23430 kernel: [1122637.786810] RAX: 0000000000000000 RBX: ffff9d9b132f8000 RCX: 0000000000000000
Feb 9 15:13:45 m23430 kernel: [1122637.786812] RDX: ffff9db9a802cb40 RSI: ffff9db9a8020580 RDI: 0000000000000300
Feb 9 15:13:45 m23430 kernel: [1122637.786813] RBP: ffffc19140003ea8 R08: 0000000000000003 R09: 000000000194e050
Feb 9 15:13:45 m23430 kernel: [1122637.786815] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000001
Feb 9 15:13:45 m23430 kernel: [1122637.786816] R13: ffff9d9b021d3940 R14: 0000000000000008 R15: ffff9d9b132f84c0
Feb 9 15:13:45 m23430 kernel: [1122637.786818] FS: 0000000000000000(0000) GS:ffff9db9a8000000(0000) knlGS:0000000000000000
Feb 9 15:13:45 m23430 kernel: [1122637.786820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 9 15:13:45 m23430 kernel: [1122637.786822] CR2: 000055ffe83dd000 CR3: 0000001d00e10000 CR4: 0000000000750ef0
Feb 9 15:13:45 m23430 kernel: [1122637.786824] PKRU: 55555554
Feb 9 15:13:45 m23430 kernel: [1122637.786825] Call Trace:
Feb 9 15:13:45 m23430 kernel: [1122637.786827] <IRQ>
Feb 9 15:13:45 m23430 kernel: [1122637.786830] ? show_trace_log_lvl+0x1d6/0x2ea
Feb 9 15:13:45 m23430 kernel: [1122637.786834] ? show_trace_log_lvl+0x1d6/0x2ea
Feb 9 15:13:45 m23430 kernel: [1122637.786838] ? call_timer_fn+0x2c/0x120
Feb 9 15:13:45 m23430 kernel: [1122637.786842] ? show_regs.part.0+0x23/0x29
Feb 9 15:13:45 m23430 kernel: [1122637.786845] ? show_regs.cold+0x8/0xd
Feb 9 15:13:45 m23430 kernel: [1122637.786848] ? dev_watchdog+0x277/0x280
Feb 9 15:13:45 m23430 kernel: [1122637.786850] ? __warn+0x8c/0x100
Feb 9 15:13:45 m23430 kernel: [1122637.786853] ? dev_watchdog+0x277/0x280
Feb 9 15:13:45 m23430 kernel: [1122637.786856] ? report_bug+0xa4/0xd0
Feb 9 15:13:45 m23430 kernel: [1122637.786860] ? arch_irq_work_raise+0x3a/0x50
Feb 9 15:13:45 m23430 kernel: [1122637.786864] ? handle_bug+0x39/0x90
Feb 9 15:13:45 m23430 kernel: [1122637.786867] ? exc_invalid_op+0x19/0x70
Feb 9 15:13:45 m23430 kernel: [1122637.786870] ? asm_exc_invalid_op+0x1b/0x20
Feb 9 15:13:45 m23430 kernel: [1122637.786873] ? dev_watchdog+0x277/0x280
Feb 9 15:13:45 m23430 kernel: [1122637.786875] ? pfifo_fast_enqueue+0x160/0x160
Feb 9 15:13:45 m23430 kernel: [1122637.786878] call_timer_fn+0x2c/0x120
Feb 9 15:13:45 m23430 kernel: [1122637.786881] __run_timers.part.0+0x1e3/0x270
Feb 9 15:13:45 m23430 kernel: [1122637.786883] ? ktime_get+0x46/0xc0
Feb 9 15:13:45 m23430 kernel: [1122637.786886] ? native_x2apic_icr_read+0x20/0x20
Feb 9 15:13:45 m23430 kernel: [1122637.786889] ? lapic_next_event+0x20/0x30
Feb 9 15:13:45 m23430 kernel: [1122637.786892] ? clockevents_program_event+0xad/0x130
Feb 9 15:13:45 m23430 kernel: [1122637.786897] run_timer_softirq+0x2a/0x60
Feb 9 15:13:45 m23430 kernel: [1122637.786899] __do_softirq+0xd9/0x2e7
Feb 9 15:13:45 m23430 kernel: [1122637.786902] irq_exit_rcu+0x94/0xc0
Feb 9 15:13:45 m23430 kernel: [1122637.786905] sysvec_apic_timer_interrupt+0x80/0x90
Feb 9 15:13:45 m23430 kernel: [1122637.786908] </IRQ>
Feb 9 15:13:45 m23430 kernel: [1122637.786909] <TASK>
Feb 9 15:13:45 m23430 kernel: [1122637.786910] asm_sysvec_apic_timer_interrupt+0x1b/0x20
Feb 9 15:13:45 m23430 kernel: [1122637.786912] RIP: 0010:cpuidle_enter_state+0xd9/0x620
Feb 9 15:13:45 m23430 kernel: [1122637.786916] Code: 3d cc 69 78 61 e8 97 66 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 d8 73 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
Feb 9 15:13:45 m23430 kernel: [1122637.786918] RSP: 0018:ffffffff9fc03db8 EFLAGS: 00000246
Feb 9 15:13:45 m23430 kernel: [1122637.786921] RAX: ffff9db9a80314c0 RBX: ffff9d9b12981800 RCX: 0000000000000000
Feb 9 15:13:45 m23430 kernel: [1122637.786922] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
Feb 9 15:13:45 m23430 kernel: [1122637.786924] RBP: ffffffff9fc03e08 R08: 0003fd087a7bd0c4 R09: 0000000000000000
Feb 9 15:13:45 m23430 kernel: [1122637.786925] R10: 0000000000000000 R11: ffffffff9fc03ca8 R12: ffffffff9fee96a0
Feb 9 15:13:45 m23430 kernel: [1122637.786927] R13: 0000000000000003 R14: 0000000000000003 R15: 0003fd087a7bd0c4
Feb 9 15:13:45 m23430 kernel: [1122637.786929] ? cpuidle_enter_state+0xc8/0x620
Feb 9 15:13:45 m23430 kernel: [1122637.786931] ? tick_nohz_stop_tick+0x16a/0x1d0
Feb 9 15:13:45 m23430 kernel: [1122637.786934] cpuidle_enter+0x2e/0x50
Feb 9 15:13:45 m23430 kernel: [1122637.786936] cpuidle_idle_call+0x142/0x1e0
Feb 9 15:13:45 m23430 kernel: [1122637.786939] do_idle+0x83/0xf0
Feb 9 15:13:45 m23430 kernel: [1122637.786941] cpu_startup_entry+0x20/0x30
Feb 9 15:13:45 m23430 kernel: [1122637.786944] rest_init+0xd3/0x100
Feb 9 15:13:45 m23430 kernel: [1122637.786946] ? acpi_enable_subsystem+0x21d/0x229
Feb 9 15:13:45 m23430 kernel: [1122637.786951] arch_call_rest_init+0xe/0x23
Feb 9 15:13:45 m23430 kernel: [1122637.786955] start_kernel+0x4a9/0x4ca
Feb 9 15:13:45 m23430 kernel: [1122637.786957] x86_64_start_reservations+0x24/0x2a
Feb 9 15:13:45 m23430 kernel: [1122637.786960] x86_64_start_kernel+0xfb/0x106
Feb 9 15:13:45 m23430 kernel: [1122637.786963] secondary_startup_64_no_verify+0xc2/0xcb
Feb 9 15:13:45 m23430 kernel: [1122637.786967] </TASK>
Feb 9 15:13:45 m23430 kernel: [1122637.786968] ---[ end trace e9520e7cbe879ab4 ]---
Feb 9 15:13:45 m23430 kernel: [1122637.786985] igb 0000:04:00.0 eth0: Reset adapter
Feb 9 15:13:46 m23430 systemd-networkd[74560]: eth0: Lost carrier
Feb 9 15:13:46 m23430 kernel: [1122638.619616] igb 0000:04:00.0 eth0: Reset adapter
Feb 9 15:13:46 m23430 systemd-networkd[74560]: eth0: DHCPv6 lease lost
Feb 9 15:13:46 m23430 systemd-timesyncd[74660]: No network connectivity, watching for changes.
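Since the two kernel traces bury the actual sequence of events, here is a minimal sketch of how the key lines could be pulled out of the log. The function name `key_events` and the pattern list are my own choices for illustration, picked from the messages visible in the excerpt above. They are not part of any tool, and the list may well be incomplete.

```python
import re

# Hypothetical pattern list, taken from the messages in the excerpt above.
KEY_PATTERNS = [
    r"PCIe link lost",
    r"Failed to read reg",
    r"NETDEV WATCHDOG",
    r"Reset adapter",
    r"Lost carrier",
]
KEY_RE = re.compile("|".join(KEY_PATTERNS))


def key_events(lines):
    """Return the lines matching any key pattern, in their original order."""
    return [line.rstrip("\n") for line in lines if KEY_RE.search(line)]
```

Run over the excerpt, for example with `key_events(open("/var/log/syslog", errors="replace"))` (the path is an assumption), this should leave only the "PCIe link lost", "Failed to read reg", "NETDEV WATCHDOG", "Reset adapter" and "Lost carrier" lines, which makes it easier to compare the order of events across the outages.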
Here is a suspicious-looking part of the syslog from the boot. I am not sure whether it is related to the problem, though:
Feb 9 15:52:08 m23430 networkd-dispatcher[731]: No valid path found for iwconfig
Feb 9 15:52:08 m23430 dbus-daemon[721]: [system] Activating via systemd: service name='org.freedesktop.network1' unit='dbus-org.freedesktop.network1.service' requested by ':1.4' (uid=0 pid=771 comm="/usr/bin/networkctl list --no-pager --no-legend " label="unconfined")
Feb 9 15:52:08 m23430 dbus-daemon[721]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.network1.service': Unit dbus-org.freedesktop.network1.service not found.
Feb 9 15:52:08 m23430 networkd-dispatcher[771]: WARNING: systemd-networkd is not running, output will be incomplete.
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: ERROR:Unknown state for interface NetworkctlListState(idx=1, name='lo', type='loopback', operational='n/a', administrative='unmanaged'): n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: Traceback (most recent call last):
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: self.handle_state(iface_name,
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: raise UnknownState(operational_state)
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: UnknownState: n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: ERROR:Unknown state for interface NetworkctlListState(idx=2, name='eth0', type='ether', operational='n/a', administrative='unmanaged'): n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: Traceback (most recent call last):
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: self.handle_state(iface_name,
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: raise UnknownState(operational_state)
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: UnknownState: n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: ERROR:Unknown state for interface NetworkctlListState(idx=3, name='eth1', type='ether', operational='n/a', administrative='unmanaged'): n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: Traceback (most recent call last):
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: self.handle_state(iface_name,
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: raise UnknownState(operational_state)
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: UnknownState: n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: ERROR:Unknown state for interface NetworkctlListState(idx=4, name='usb0', type='ether', operational='n/a', administrative='unmanaged'): n/a
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: Traceback (most recent call last):
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 298, in trigger_all
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: self.handle_state(iface_name,
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: File "/usr/bin/networkd-dispatcher", line 348, in handle_state
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: raise UnknownState(operational_state)
Feb 9 15:52:09 m23430 networkd-dispatcher[731]: UnknownState: n/a