Has anyone else run into freezes on Threadripper 2?
ASUS ROG motherboard with memory from its approved list. 2990WX CPU. Latest BIOS. Latest 4.18 kernel patch release.
No overclocking. C6 state disabled. rcu_nocbs=0-63 processor.max_cstate=1 CONFIG_RCU_NOCB_CPU=y CONFIG_RCU_NOCB_CPU_ALL=y ... none of these fixes the problem.
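Since none of the boot parameters seemed to help, it may be worth confirming that they actually reached the running kernel. The following is only a minimal sketch: the helper names `parse_cmdline` and `missing_params` are made up for illustration, and the parser splits on whitespace only, so it assumes no parameter value contains quoted spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    Assumption: values do not contain quoted spaces.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the names in `expected` whose value is absent or different."""
    params = parse_cmdline(cmdline)
    return [name for name, value in expected.items() if params.get(name) != value]
```

On a running system the string would typically come from `/proc/cmdline`, for example `missing_params(open("/proc/cmdline").read(), {"rcu_nocbs": "0-63", "processor.max_cstate": "1"})`. An empty list means both parameters were passed to the kernel as written.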
I'm not sure which logs I should post here to help. faillog doesn't seem to contain anything.
This does show up repeatedly in my syslog:
[ 846.975579] Not tainted 4.18.0-041800-generic #201808122131
[ 846.975582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 846.975585] systemd-udevd D 0 1050 984 0x80000124
[ 846.975589] Call Trace:
[ 846.975600] __schedule+0x29e/0x840
[ 846.975603] schedule+0x2c/0x80
[ 846.975612] __sev_do_cmd_locked+0x21f/0x290 [ccp]
[ 846.975617] ? wait_woken+0x80/0x80
[ 846.975622] sev_do_cmd+0x2f/0x50 [ccp]
[ 846.975624] ? 0xffffffffc0aba000
[ 846.975629] sev_get_api_version+0x36/0xa0 [ccp]
[ 846.975634] ? sp_get_psp_master_device+0x68/0x80 [ccp]
[ 846.975638] psp_pci_init+0x45/0x230 [ccp]
[ 846.975641] ? kobject_uevent+0xb/0x10
[ 846.975645] ? driver_register+0x9e/0xc0
[ 846.975646] ? 0xffffffffc0aba000
[ 846.975650] sp_mod_init+0x1a/0x1000 [ccp]
[ 846.975654] do_one_initcall+0x4a/0x1c4
[ 846.975656] ? _cond_resched+0x19/0x30
[ 846.975660] ? kmem_cache_alloc_trace+0xb8/0x1d0
[ 846.975663] ? do_init_module+0x27/0x220
[ 846.975665] do_init_module+0x60/0x220
[ 846.975666] load_module+0x149b/0x1830
[ 846.975670] __do_sys_finit_module+0xbd/0x120
[ 846.975671] ? __do_sys_finit_module+0xbd/0x120
[ 846.975674] __x64_sys_finit_module+0x1a/0x20
[ 846.975676] do_syscall_64+0x5a/0x110
[ 846.975677] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 846.975679] RIP: 0033:0x7fa477ba1839
[ 846.975680] Code: Bad RIP value.
[ 846.975688] RSP: 002b:00007ffd8947f588 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 846.975691] RAX: ffffffffffffffda RBX: 0000560a3694b4b0 RCX: 00007fa477ba1839
[ 846.975692] RDX: 0000000000000000 RSI: 00007fa4778800e5 RDI: 0000000000000007
[ 846.975693] RBP: 00007fa4778800e5 R08: 0000000000000000 R09: 00007ffd8947f6a0
[ 846.975694] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000000
[ 846.975694] R13: 0000560a3693b7c0 R14: 0000000000020000 R15: 0000560a3694b4b0
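For anyone reading a trace like this, a quick way to see which kernel module the blocked task was executing in is to pull the bracketed module names out of the call-trace frames. This is a rough sketch, not a standard tool: the function names and the regular expression are my own, and the pattern assumes the dmesg-style frame format shown above (`[ timestamp] func+0xOFF/0xSIZE [module]`). Frames prefixed with `?` are stack guesses that the kernel marks as unreliable, so they are left out of the count.

```python
import re
from collections import Counter

# One call-trace frame: optional "? " marker, symbol+offset/size, optional [module].
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(log_text):
    """Return (function, module_or_None, reliable) for each frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            reliable = match.group("guess") is None
            frames.append((match.group("func"), match.group("module"), reliable))
    return frames


def modules_in_trace(log_text):
    """Count reliable frames per loadable module (built-in symbols are skipped)."""
    return Counter(
        module for _, module, reliable in parse_frames(log_text)
        if module and reliable
    )
```

Run against the trace above, every module-tagged frame belongs to `ccp`, so the task is blocked inside that module's initialization path (`sp_mod_init` down to `__sev_do_cmd_locked`).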
Answer 1
I've been running smoothly for 2 days after using zenstates.py to turn off C-state 6.