Network disconnects after soft lockup

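I run Ubuntu Server 20.04 on a system that had an AMD Athlon 3200G, and I recently switched it to a Ryzen 7 1800X. With the 3200G everything worked fine, with no problems at all.

However, since I switched to the new CPU, the network starts disconnecting after every few hours of uptime.

So I reinstalled Ubuntu Server 20.04 and set up all the services again (mostly a few Docker containers and a reverse proxy running inside Docker). That did not fix the problem; the same thing happened again.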

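Looking at journalctl, I noticed that around the time the system starts failing/disconnecting, there are several error messages mentioning a soft lockup and a CPU being stuck for roughly 20 seconds (logs attached at the end). These messages are logged about every 30 seconds.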
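For anyone who wants to pull the same kind of lines out of a long journal, here is a minimal Python sketch that counts lockup reports per CPU. The two patterns are taken from the messages quoted in this post; the function name `summarize_lockups` and the `sample` list are made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns based on the two watchdog messages quoted in this post.
SOFT = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")
HARD = re.compile(r"hard LOCKUP on cpu (\d+)")


def summarize_lockups(lines):
    """Count soft and hard lockup reports per CPU in an iterable of log lines."""
    soft, hard = Counter(), Counter()
    for line in lines:
        match = SOFT.search(line)
        if match:
            soft[int(match.group(1))] += 1
            continue
        match = HARD.search(line)
        if match:
            hard[int(match.group(1))] += 1
    return soft, hard


# Illustrative input: two lines copied from the logs in this post plus one unrelated line.
sample = [
    "Jul 19 17:21:44 ld-nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [(imesyncd):25166]",
    "Jul 21 00:04:29 ld-nas kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 12",
    "Jul 21 00:02:36 ld-nas kernel: usb 3-2: USB disconnect, device number 2",
]

soft, hard = summarize_lockups(sample)
print("soft lockups per CPU:", dict(soft))
print("hard lockups per CPU:", dict(hard))
```

To run it over a real journal, the lines of `journalctl -k` output (kernel messages only) could be passed in place of `sample`.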

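I don't consider myself a beginner in everyday Linux use, but my knowledge of the kernel is fairly limited, and unfortunately I can't make sense of the error messages.

Maybe someone knows what is going on or can help me decipher these messages. I would be very glad of any help. Thanks in advance!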

Jul 19 17:21:16 ld-nas kernel: Modules linked in: veth xt_nat xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat aufs quota_v2 quota_tree nls_iso8859_1 dm_multipath scsi_dh_rd>
Jul 19 17:21:16 ld-nas kernel:  glue_helper r8169 i2c_piix4 realtek ahci libahci wmi gpio_amdpt gpio_generic
Jul 19 17:21:16 ld-nas kernel: CPU: 7 PID: 25166 Comm: (imesyncd) Tainted: G             L    5.4.0-77-generic #86-Ubuntu
Jul 19 17:21:16 ld-nas kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M S2H/B450M S2H, BIOS F50 11/27/2019
Jul 19 17:21:16 ld-nas kernel: RIP: 0010:smp_call_function_many+0x208/0x270
Jul 19 17:21:16 ld-nas kernel: Code: 92 00 3b 05 de d2 70 01 89 c7 0f 83 9b fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 89 64 98 8b 41 18 a8 01 74 0a f3 90 8b 51 18 <83> e2 01 75 f6 eb c8 89 cf 48 c7 c2 20 b8 a>
Jul 19 17:21:16 ld-nas kernel: RSP: 0018:ffff9e2643fe7b60 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Jul 19 17:21:16 ld-nas kernel: RAX: 0000000000000003 RBX: ffff892ffe9ebd40 RCX: ffff892ffe8323e0
Jul 19 17:21:16 ld-nas kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
Jul 19 17:21:16 ld-nas kernel: RBP: ffff9e2643fe7ba0 R08: ffff892ffcc38538 R09: ffff892ffcc38ec0
Jul 19 17:21:16 ld-nas kernel: R10: ffff892ffcc38538 R11: 0000000000000000 R12: ffffffff97281930
Jul 19 17:21:16 ld-nas kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000020
Jul 19 17:21:16 ld-nas kernel: FS:  00007fea61f31980(0000) GS:ffff892ffe9c0000(0000) knlGS:0000000000000000
Jul 19 17:21:16 ld-nas kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 19 17:21:16 ld-nas kernel: CR2: 000055bd11646d18 CR3: 00000003e8150000 CR4: 00000000003406e0
Jul 19 17:21:16 ld-nas kernel: Call Trace:
Jul 19 17:21:16 ld-nas kernel:  ? load_new_mm_cr3+0xf0/0xf0
Jul 19 17:21:16 ld-nas kernel:  on_each_cpu+0x2d/0x60
Jul 19 17:21:16 ld-nas kernel:  flush_tlb_kernel_range+0x38/0x90
Jul 19 17:21:16 ld-nas kernel:  __purge_vmap_area_lazy+0x70/0x6d0
Jul 19 17:21:16 ld-nas kernel:  _vm_unmap_aliases+0xf5/0x130
Jul 19 17:21:16 ld-nas kernel:  vm_unmap_aliases+0x19/0x20
Jul 19 17:21:16 ld-nas kernel:  change_page_attr_set_clr+0xcf/0x200
Jul 19 17:21:16 ld-nas kernel:  set_memory_ro+0x29/0x30
Jul 19 17:21:16 ld-nas kernel:  bpf_int_jit_compile+0x2d1/0x340
Jul 19 17:21:16 ld-nas kernel:  bpf_prog_select_runtime+0xa7/0x130
Jul 19 17:21:16 ld-nas kernel:  bpf_prepare_filter+0x44c/0x4b0
Jul 19 17:21:16 ld-nas kernel:  ? hardlockup_detector_perf_cleanup+0xa0/0xa0
Jul 19 17:21:16 ld-nas kernel:  bpf_prog_create_from_user+0xc7/0x120
Jul 19 17:21:16 ld-nas kernel:  seccomp_set_mode_filter+0x11c/0x740
Jul 19 17:21:16 ld-nas kernel:  do_seccomp+0x39/0x200
Jul 19 17:21:16 ld-nas kernel:  __x64_sys_seccomp+0x1a/0x20
Jul 19 17:21:16 ld-nas kernel:  do_syscall_64+0x57/0x190
Jul 19 17:21:16 ld-nas kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 19 17:21:16 ld-nas kernel: RIP: 0033:0x7fea62dfe89d
Jul 19 17:21:16 ld-nas kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0>
Jul 19 17:21:16 ld-nas kernel: RSP: 002b:00007ffec565caa8 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
Jul 19 17:21:16 ld-nas kernel: RAX: ffffffffffffffda RBX: 000055bd117d1f20 RCX: 00007fea62dfe89d
Jul 19 17:21:16 ld-nas kernel: RDX: 000055bd11794d60 RSI: 0000000000000000 RDI: 0000000000000001
Jul 19 17:21:16 ld-nas kernel: RBP: 000055bd11794d60 R08: 000055bd117d1f20 R09: 00007fea62c73350
Jul 19 17:21:16 ld-nas kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Jul 19 17:21:16 ld-nas kernel: R13: 00007ffec565cad0 R14: 00007fea62c73dd0 R15: 00007ffec565cf50
Jul 19 17:21:43 ld-nas kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 242419 jiffies s: 5425 root: 0x1/.
Jul 19 17:21:43 ld-nas kernel: rcu: blocking rcu_node structures: l=1:0-15:0x80/.
Jul 19 17:21:43 ld-nas kernel: Task dump for CPU 7:
Jul 19 17:21:43 ld-nas kernel: (imesyncd)      R  running task        0 25166      1 0x8000000c
Jul 19 17:21:43 ld-nas kernel: Call Trace:
Jul 19 17:21:43 ld-nas kernel:  ? bpf_int_jit_compile+0x2d1/0x340
Jul 19 17:21:43 ld-nas kernel:  ? bpf_prog_select_runtime+0xa7/0x130
Jul 19 17:21:43 ld-nas kernel:  ? bpf_prepare_filter+0x44c/0x4b0
Jul 19 17:21:43 ld-nas kernel:  ? hardlockup_detector_perf_cleanup+0xa0/0xa0
Jul 19 17:21:43 ld-nas kernel:  ? bpf_prog_create_from_user+0xc7/0x120
Jul 19 17:21:43 ld-nas kernel:  ? seccomp_set_mode_filter+0x11c/0x740
Jul 19 17:21:43 ld-nas kernel:  ? do_seccomp+0x39/0x200
Jul 19 17:21:43 ld-nas kernel:  ? __x64_sys_seccomp+0x1a/0x20
Jul 19 17:21:43 ld-nas kernel:  ? do_syscall_64+0x57/0x190
Jul 19 17:21:43 ld-nas kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 19 17:21:44 ld-nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [(imesyncd):25166]
-- Reboot --

If you need more information (about the system, or more logs), I'm happy to provide it.


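Edit: After updating my BIOS to the latest version, the system seems to run more stably and for longer without failing. However, there now seems to be a new problem that causes a hard lockup (on a different CPU).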

Jul 21 00:02:36 ld-nas kernel: xhci_hcd 0000:0a:00.3: xHCI host not responding to stop endpoint command.
Jul 21 00:02:36 ld-nas kernel: xhci_hcd 0000:0a:00.3: Host halt failed, -110
Jul 21 00:02:36 ld-nas kernel: xhci_hcd 0000:0a:00.3: xHCI host controller not responding, assume dead
Jul 21 00:02:36 ld-nas kernel: xhci_hcd 0000:0a:00.3: HC died; cleaning up
Jul 21 00:02:36 ld-nas kernel: usb 3-2: USB disconnect, device number 2
Jul 21 00:02:36 ld-nas kernel: usb 4-3: USB disconnect, device number 2
Jul 21 00:02:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16457 PROTO=2
Jul 21 00:03:36 ld-nas systemd-udevd[617]: sdd: Worker [49959] processing SEQNUM=12062 is taking a long time
Jul 21 00:03:36 ld-nas systemd-udevd[617]: hiddev0: Worker [49962] processing SEQNUM=12069 is taking a long time
Jul 21 00:03:36 ld-nas systemd-udevd[617]: 0003:046D:C52B.0001: Worker [49960] processing SEQNUM=12063 is taking a long time
Jul 21 00:03:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16459 PROTO=2
Jul 21 00:04:15 ld-nas systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Jul 21 00:04:15 ld-nas systemd[1]: systemd-logind.service: Killing process 1134 (systemd-logind) with signal SIGABRT.
Jul 21 00:04:31 ld-nas systemd[1]: systemd-resolved.service: Watchdog timeout (limit 3min)!
Jul 21 00:04:31 ld-nas systemd[1]: systemd-resolved.service: Killing process 1073 (systemd-resolve) with signal SIGABRT.
Jul 21 00:04:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16461 PROTO=2
Jul 21 00:05:36 ld-nas systemd-udevd[617]: sdd: Worker [49959] processing SEQNUM=12062 killed
Jul 21 00:05:36 ld-nas systemd-udevd[617]: hiddev0: Worker [49962] processing SEQNUM=12069 killed
Jul 21 00:05:36 ld-nas systemd-udevd[617]: 0003:046D:C52B.0001: Worker [49960] processing SEQNUM=12063 killed
Jul 21 00:05:45 ld-nas systemd[1]: systemd-logind.service: State 'stop-watchdog' timed out. Terminating.
Jul 21 00:05:49 ld-nas systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Jul 21 00:05:49 ld-nas systemd[1]: snapd.service: Killing process 1123 (snapd) with signal SIGABRT.
Jul 21 00:05:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16463 PROTO=2
Jul 21 00:06:02 ld-nas systemd[1]: systemd-resolved.service: State 'stop-watchdog' timed out. Terminating.
Jul 21 00:06:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16465 PROTO=2
Jul 21 00:07:15 ld-nas systemd[1]: systemd-logind.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:07:15 ld-nas systemd[1]: systemd-logind.service: Killing process 1134 (systemd-logind) with signal SIGKILL.
Jul 21 00:07:19 ld-nas systemd[1]: snapd.service: State 'stop-watchdog' timed out. Terminating.
Jul 21 00:07:32 ld-nas systemd[1]: systemd-resolved.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:07:32 ld-nas systemd[1]: systemd-resolved.service: Killing process 1073 (systemd-resolve) with signal SIGKILL.
Jul 21 00:07:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16467 PROTO=2
Jul 21 00:08:46 ld-nas systemd[1]: systemd-logind.service: Processes still around after SIGKILL. Ignoring.
Jul 21 00:08:50 ld-nas systemd[1]: snapd.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:08:50 ld-nas systemd[1]: snapd.service: Killing process 1123 (snapd) with signal SIGKILL.
Jul 21 00:08:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16469 PROTO=2
Jul 21 00:09:02 ld-nas systemd[1]: systemd-resolved.service: Processes still around after SIGKILL. Ignoring.
Jul 21 00:09:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16471 PROTO=2
Jul 21 00:10:16 ld-nas systemd[1]: systemd-logind.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 00:10:16 ld-nas systemd[1]: systemd-logind.service: Killing process 1134 (systemd-logind) with signal SIGKILL.
Jul 21 00:10:20 ld-nas systemd[1]: snapd.service: Processes still around after SIGKILL. Ignoring.
Jul 21 00:10:32 ld-nas systemd[1]: systemd-resolved.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 00:10:32 ld-nas systemd[1]: systemd-resolved.service: Killing process 1073 (systemd-resolve) with signal SIGKILL.
Jul 21 00:10:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16474 PROTO=2
Jul 21 00:11:46 ld-nas systemd[1]: systemd-logind.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 21 00:11:46 ld-nas systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Jul 21 00:11:46 ld-nas systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
Jul 21 00:11:46 ld-nas systemd[1]: Stopped Login Service.
Jul 21 00:11:46 ld-nas systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jul 21 00:11:46 ld-nas systemd[1]: systemd-logind.service: Found left-over process 1134 (systemd-logind) in control group while starting unit. Ignoring.
Jul 21 00:11:46 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:11:46 ld-nas systemd[1]: Starting Login Service...
Jul 21 00:11:50 ld-nas systemd[1]: snapd.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 00:11:50 ld-nas systemd[1]: snapd.service: Killing process 1123 (snapd) with signal SIGKILL.
Jul 21 00:11:53 ld-nas kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=01:00:5e:00:00:01:0c:8e:29:0e:d8:60:08:00 SRC=192.168.2.1 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0xC0 TTL=1 ID=16475 PROTO=2
Jul 21 00:12:03 ld-nas systemd[1]: systemd-resolved.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 21 00:12:03 ld-nas systemd[1]: systemd-resolved.service: Failed with result 'watchdog'.
Jul 21 00:12:03 ld-nas systemd[1]: systemd-resolved.service: Scheduled restart job, restart counter is at 1.
Jul 21 00:12:03 ld-nas systemd[1]: Stopped Network Name Resolution.
Jul 21 00:12:03 ld-nas systemd[1]: systemd-resolved.service: Found left-over process 1073 (systemd-resolve) in control group while starting unit. Ignoring.
Jul 21 00:12:03 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:12:03 ld-nas systemd[1]: Starting Network Name Resolution...
Jul 21 00:13:17 ld-nas systemd[1]: systemd-logind.service: start operation timed out. Terminating.
Jul 21 00:13:20 ld-nas systemd[1]: snapd.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 21 00:13:20 ld-nas systemd[1]: snapd.service: Failed with result 'watchdog'.
Jul 21 00:13:21 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.
Jul 21 00:13:21 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:13:21 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:13:21 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:13:21 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:13:21 ld-nas systemd-udevd[617]: Worker [49959] terminated by signal 9 (KILL)
Jul 21 00:13:21 ld-nas systemd-udevd[617]: sdd: Worker [49959] failed
Jul 21 00:13:33 ld-nas systemd[1]: systemd-resolved.service: start operation timed out. Terminating.
Jul 21 00:14:47 ld-nas systemd[1]: systemd-logind.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:14:47 ld-nas systemd[1]: systemd-logind.service: Killing process 1134 (systemd-logind) with signal SIGKILL.
Jul 21 00:14:51 ld-nas systemd[1]: snapd.service: start operation timed out. Terminating.
Jul 21 00:14:51 ld-nas systemd[1]: snapd.service: Failed with result 'timeout'.
Jul 21 00:14:51 ld-nas systemd[1]: Failed to start Snap Daemon.
Jul 21 00:14:51 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 2.
Jul 21 00:14:51 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:14:51 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:14:51 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:14:51 ld-nas systemd[1]: snapd.service: Found left-over process 50071 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:14:51 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:14:51 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:15:03 ld-nas systemd[1]: systemd-resolved.service: State 'stop-sigterm' timed out. Killing.
Jul 21 00:15:03 ld-nas systemd[1]: systemd-resolved.service: Killing process 1073 (systemd-resolve) with signal SIGKILL.
Jul 21 00:16:17 ld-nas systemd[1]: systemd-logind.service: Processes still around after SIGKILL. Ignoring.
Jul 21 00:16:21 ld-nas systemd[1]: snapd.service: start operation timed out. Terminating.
Jul 21 00:16:21 ld-nas systemd[1]: snapd.service: Failed with result 'timeout'.
Jul 21 00:16:21 ld-nas systemd[1]: Failed to start Snap Daemon.
Jul 21 00:16:22 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 3.
Jul 21 00:16:22 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:16:22 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:16:22 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:16:22 ld-nas systemd[1]: snapd.service: Found left-over process 50079 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:16:22 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:16:22 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:16:34 ld-nas systemd[1]: systemd-resolved.service: Processes still around after SIGKILL. Ignoring.
Jul 21 00:17:47 ld-nas systemd[1]: systemd-logind.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 00:17:47 ld-nas systemd[1]: systemd-logind.service: Killing process 1134 (systemd-logind) with signal SIGKILL.
Jul 21 00:17:52 ld-nas systemd[1]: snapd.service: start operation timed out. Terminating.
Jul 21 00:17:52 ld-nas systemd[1]: snapd.service: Failed with result 'timeout'.
Jul 21 00:17:52 ld-nas systemd[1]: Failed to start Snap Daemon.
Jul 21 00:17:52 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 4.
Jul 21 00:17:52 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:17:52 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:17:52 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:17:52 ld-nas systemd[1]: snapd.service: Found left-over process 50087 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:17:52 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:17:52 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:18:04 ld-nas systemd[1]: systemd-resolved.service: State 'stop-final-sigterm' timed out. Killing.
Jul 21 00:18:04 ld-nas systemd[1]: systemd-resolved.service: Killing process 1073 (systemd-resolve) with signal SIGKILL.
Jul 21 00:19:18 ld-nas systemd[1]: systemd-logind.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 21 00:19:18 ld-nas systemd[1]: systemd-logind.service: Failed with result 'timeout'.
Jul 21 00:19:18 ld-nas systemd[1]: Failed to start Login Service.
Jul 21 00:19:18 ld-nas systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 2.
Jul 21 00:19:18 ld-nas systemd[1]: Stopped Login Service.
Jul 21 00:19:18 ld-nas systemd[1]: Condition check resulted in Load Kernel Module drm being skipped.
Jul 21 00:19:18 ld-nas systemd[1]: systemd-logind.service: Found left-over process 1134 (systemd-logind) in control group while starting unit. Ignoring.
Jul 21 00:19:18 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:19:18 ld-nas systemd[1]: Starting Login Service...
Jul 21 00:19:22 ld-nas systemd[1]: snapd.service: start operation timed out. Terminating.
Jul 21 00:19:22 ld-nas systemd[1]: snapd.service: Failed with result 'timeout'.
Jul 21 00:19:22 ld-nas systemd[1]: Failed to start Snap Daemon.
Jul 21 00:19:23 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 5.
Jul 21 00:19:23 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:19:23 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:19:23 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:19:23 ld-nas systemd[1]: snapd.service: Found left-over process 50095 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:19:23 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:19:23 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:19:23 ld-nas snapd[50100]: AppArmor status: apparmor is enabled and all features are available
Jul 21 00:19:34 ld-nas systemd[1]: systemd-resolved.service: Processes still around after final SIGKILL. Entering failed mode.
Jul 21 00:19:34 ld-nas systemd[1]: systemd-resolved.service: Failed with result 'timeout'.
Jul 21 00:19:34 ld-nas systemd[1]: Failed to start Network Name Resolution.
Jul 21 00:19:34 ld-nas systemd[1]: systemd-resolved.service: Scheduled restart job, restart counter is at 2.
Jul 21 00:19:34 ld-nas systemd[1]: Stopped Network Name Resolution.
Jul 21 00:19:34 ld-nas systemd[1]: systemd-resolved.service: Found left-over process 1073 (systemd-resolve) in control group while starting unit. Ignoring.
Jul 21 00:19:34 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:19:34 ld-nas systemd[1]: Starting Network Name Resolution...
Jul 21 00:20:48 ld-nas systemd[1]: systemd-logind.service: start operation timed out. Terminating.
Jul 21 00:20:53 ld-nas systemd[1]: snapd.service: start operation timed out. Terminating.
Jul 21 00:20:53 ld-nas systemd[1]: snapd.service: Failed with result 'timeout'.
Jul 21 00:20:53 ld-nas systemd[1]: Failed to start Snap Daemon.
Jul 21 00:20:53 ld-nas systemd[1]: snapd.service: Scheduled restart job, restart counter is at 6.
Jul 21 00:20:53 ld-nas systemd[1]: Stopped Snap Daemon.
Jul 21 00:20:53 ld-nas systemd[1]: snapd.service: Found left-over process 1123 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:20:53 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:20:53 ld-nas systemd[1]: snapd.service: Found left-over process 50109 (snapd) in control group while starting unit. Ignoring.
Jul 21 00:20:53 ld-nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 00:20:53 ld-nas systemd[1]: Starting Snap Daemon...
Jul 21 00:20:53 ld-nas snapd[50115]: AppArmor status: apparmor is enabled and all features are available
Jul 21 00:04:29 ld-nas kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
Jul 21 00:04:29 ld-nas kernel: Modules linked in: veth xt_nat xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat aufs quota_v2 quota_tree nls_iso8859_1 dm_multipath scsi_dh_rd>
Jul 21 00:04:29 ld-nas kernel:  hid_generic usbhid hid nouveau crct10dif_pclmul mxm_wmi crc32_pclmul video ghash_clmulni_intel i2c_algo_bit ttm drm_kms_helper aesni_intel syscopyarea sysfillrect crypto_simd s>
Jul 21 00:04:29 ld-nas kernel: CPU: 12 PID: 50109 Comm: systemd-detect- Not tainted 5.4.0-77-generic #86-Ubuntu
Jul 21 00:04:29 ld-nas kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M S2H/B450M S2H, BIOS F61c 05/10/2021
Jul 21 00:04:29 ld-nas kernel: RIP: 0010:smp_call_function_single+0x9b/0x110
Jul 21 00:04:29 ld-nas kernel: Code: 65 8b 05 90 81 6d 64 a9 00 01 1f 00 75 79 85 c9 75 40 48 c7 c6 c0 bc 02 00 65 48 03 35 46 19 6d 64 8b 46 18 a8 01 74 09 f3 90 <8b> 46 18 a8 01 75 f7 83 4e 18 01 4c 89 c9 4>
Jul 21 00:04:29 ld-nas kernel: RSP: 0018:ffffb4c60448fba0 EFLAGS: 00000202
Jul 21 00:04:29 ld-nas kernel: RAX: 0000000000000001 RBX: 0000010da42e2b19 RCX: 0000000000000000
Jul 21 00:04:29 ld-nas kernel: RDX: 0000000000000000 RSI: ffff8d02eeb2bcc0 RDI: 0000000000000001
Jul 21 00:04:29 ld-nas kernel: RBP: ffffb4c60448fbe8 R08: ffffffff9b846090 R09: 0000000000000000
Jul 21 00:04:29 ld-nas kernel: R10: 0000000000000001 R11: 006f666e69757063 R12: 0000000000000001
Jul 21 00:04:29 ld-nas kernel: R13: 00002b8a84bbd593 R14: 0000000000000001 R15: ffff8d02e3fd5f00
Jul 21 00:04:29 ld-nas kernel: FS:  00007fde6214c980(0000) GS:ffff8d02eeb00000(0000) knlGS:0000000000000000
Jul 21 00:04:29 ld-nas kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 21 00:04:29 ld-nas kernel: CR2: 0000562bbe2c2d98 CR3: 00000002577a6000 CR4: 00000000003406e0
Jul 21 00:04:29 ld-nas kernel: Call Trace:
Jul 21 00:04:29 ld-nas kernel:  ? ktime_get+0x3e/0xa0
Jul 21 00:04:29 ld-nas kernel:  aperfmperf_snapshot_cpu+0x42/0x50
Jul 21 00:04:29 ld-nas kernel:  arch_freq_prepare_all+0x67/0xa0
Jul 21 00:04:29 ld-nas kernel:  cpuinfo_open+0x13/0x30
Jul 21 00:04:29 ld-nas kernel:  proc_reg_open+0x77/0x130
Jul 21 00:04:29 ld-nas kernel:  ? proc_put_link+0x10/0x10
Jul 21 00:04:29 ld-nas kernel:  do_dentry_open+0x143/0x3a0
Jul 21 00:04:29 ld-nas kernel:  vfs_open+0x2d/0x30
Jul 21 00:04:29 ld-nas kernel:  do_last+0x194/0x900
Jul 21 00:04:29 ld-nas kernel:  path_openat+0x8d/0x290
Jul 21 00:04:29 ld-nas kernel:  ? putname+0x4a/0x50
Jul 21 00:04:29 ld-nas kernel:  do_filp_open+0x91/0x100
Jul 21 00:04:29 ld-nas kernel:  ? __alloc_fd+0x46/0x150
Jul 21 00:04:29 ld-nas kernel:  do_sys_open+0x17e/0x290
Jul 21 00:04:29 ld-nas kernel:  __x64_sys_openat+0x20/0x30
Jul 21 00:04:29 ld-nas kernel:  do_syscall_64+0x57/0x190
Jul 21 00:04:29 ld-nas kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 21 00:04:29 ld-nas kernel: RIP: 0033:0x7fde62ff9eab
Jul 21 00:04:29 ld-nas kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4>
Jul 21 00:04:29 ld-nas kernel: RSP: 002b:00007ffeaa449770 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Jul 21 00:04:29 ld-nas kernel: RAX: ffffffffffffffda RBX: 0000562bbe2c12d0 RCX: 00007fde62ff9eab
Jul 21 00:04:29 ld-nas kernel: RDX: 0000000000080000 RSI: 00007fde62e6b227 RDI: 00000000ffffff9c
Jul 21 00:04:29 ld-nas kernel: RBP: 00007fde62e6b227 R08: 0000000000000008 R09: 0000000000000001
Jul 21 00:04:29 ld-nas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080000
Jul 21 00:04:29 ld-nas kernel: R13: 00007fde62e92e21 R14: 00007fde62e6b869 R15: 00007fde62e6b88c

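Answer 1

There are known issues with Ryzen and Linux. Have you connected to this machine via ssh? If you google "ryzen linux soft lockup", you will find hundreds of threads about systems freezing and needing a reboot, but none of them mention intermittent network connectivity as a symptom.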

This thread explains that adding

processor.max_cstate=5 rcu_nocbs=0-15

to the kernel's boot options may resolve the issue.
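On Ubuntu, kernel boot options are normally set through the `GRUB_CMDLINE_LINUX_DEFAULT` line in `/etc/default/grub`, then applied with `sudo update-grub` and a reboot. As a rough sketch of that edit, the Python snippet below appends the two options to the line if they are not already there. The helper name `add_kernel_params` and the sample file contents are made up for illustration, and it only works on text in memory; it assumes the value is wrapped in double quotes, as in the stock file.

```python
import re

# The two options suggested above.
PARAMS = ["processor.max_cstate=5", "rcu_nocbs=0-15"]


def add_kernel_params(grub_text, params=PARAMS):
    """Return grub_text with params appended to GRUB_CMDLINE_LINUX_DEFAULT."""

    def patch(match):
        current = match.group(2).split()
        # Compare on the name before "=" so an existing setting is not duplicated.
        present = {item.split("=", 1)[0] for item in current}
        extra = [p for p in params if p.split("=", 1)[0] not in present]
        joined = " ".join(current + extra)
        return match.group(1) + '"' + joined + '"'

    pattern = r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"'
    return re.sub(pattern, patch, grub_text, flags=re.MULTILINE)


# Illustrative file contents, not taken from the asker's machine.
example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_params(example))
```

After the real file has been changed and the machine rebooted, `cat /proc/cmdline` shows whether the running kernel actually received the options.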

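The bug report looks identical to your problem: same CPU and same kernel. Either way, updating your BIOS is a good idea (yours is from 2019). Some people claim this is a power problem that occurs when the CPU is idle, and suggest that experimenting with the power settings in the BIOS may resolve the lockups.

If all else fails, the last thing to try is contacting AMD to see whether they will give you an RMA.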
