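This is (K)Ubuntu 20.04, KDE with Plasma, Nvidia 2070.
I can still move the mouse, but apart from that, clicks have no effect and the window contents no longer update.
The system is still running. I can access it via SSH.
dmesg: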
[27014.327443] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[27014.327446] NVRM: GPU Board Serial Number:
[27014.327450] NVRM: Xid (PCI:0000:09:00): 61, pid=1872, 0cec(3098) 00000000 00000000
[27037.447808] NVRM: Xid (PCI:0000:09:00): 8, pid=1872, Channel 00000013
[27051.943743] show_signal_msg: 14 callbacks suppressed
[27051.943745] GpuWatchdog[4265]: segfault at 0 ip 0000555bbc5e54e0 sp 00007f59eb1b54a0 error 6 in chrome[555bb8279000+734a000]
[27051.943750] Code: 3d 00 58 fb fa be 01 00 00 00 ba 07 00 00 00 e8 16 fa 71 fe 48 8d 3d e8 95 fc fa be 01 00 00 00 ba 03 00 00 00 e8 00 fa 71 fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 e6 bf 96 03 01 80 7d 87 00
[27430.418701] INFO: task CJobMgr::m_Work:9447 blocked for more than 120 seconds.
[27430.418705] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27430.418706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27430.418708] CJobMgr::m_Work D 0 9447 9196 0xa0024082
[27430.418710] Call Trace:
[27430.418716] __schedule+0x2e3/0x740
[27430.418719] ? try_to_wake_up+0x224/0x6a0
[27430.418721] schedule+0x42/0xb0
[27430.418722] schedule_timeout+0x203/0x2f0
[27430.418724] __down+0x82/0xd0
[27430.418726] down+0x47/0x60
[27430.418865] os_acquire_semaphore+0x35/0x40 [nvidia]
[27430.419071] _nv033291rm+0xc/0x20 [nvidia]
[27430.419285] ? _nv034166rm+0xb6/0x170 [nvidia]
[27430.419464] ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27430.419595] ? nvidia_close_callback+0x35/0x190 [nvidia]
[27430.419725] ? nvidia_close+0xe0/0x2e0 [nvidia]
[27430.419856] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27430.419858] ? __fput+0xcc/0x260
[27430.419860] ? ____fput+0xe/0x10
[27430.419863] ? task_work_run+0x8f/0xb0
[27430.419865] ? do_exit+0x351/0xac0
[27430.419867] ? timerqueue_del+0x24/0x50
[27430.419868] ? do_group_exit+0x47/0xb0
[27430.419871] ? get_signal+0x169/0x890
[27430.419872] ? hrtimer_cancel+0x15/0x20
[27430.419875] ? do_signal+0x34/0x6c0
[27430.419878] ? exit_to_usermode_loop+0xbf/0x160
[27430.419880] ? do_int80_syscall_32+0x106/0x130
[27430.419881] ? entry_INT80_compat+0x85/0x90
[27430.419897] INFO: task slack:10856 blocked for more than 120 seconds.
[27430.419899] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27430.419899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27430.419900] slack D 0 10856 10855 0x000003a0
[27430.419902] Call Trace:
[27430.419904] __schedule+0x2e3/0x740
[27430.419905] schedule+0x42/0xb0
[27430.419907] schedule_timeout+0x203/0x2f0
[27430.419908] __down+0x82/0xd0
[27430.419910] down+0x47/0x60
[27430.420042] os_acquire_semaphore+0x35/0x40 [nvidia]
[27430.420243] _nv033291rm+0x15/0x20 [nvidia]
[27430.420460] ? _nv034166rm+0xb6/0x170 [nvidia]
[27430.420636] ? _nv034114rm+0x22/0xd0 [nvidia]
[27430.420812] ? _nv000909rm+0x1c9/0x940 [nvidia]
[27430.420983] ? rm_ioctl+0x54/0xb0 [nvidia]
[27430.420986] ? __check_object_size+0x61/0x150
[27430.421117] ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27430.421247] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27430.421249] ? do_vfs_ioctl+0x407/0x670
[27430.421251] ? __audit_syscall_entry+0xdb/0x120
[27430.421252] ? ksys_ioctl+0x67/0x90
[27430.421253] ? __x64_sys_ioctl+0x1a/0x20
[27430.421255] ? do_syscall_64+0x57/0x190
[27430.421257] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27551.250805] INFO: task CJobMgr::m_Work:9447 blocked for more than 241 seconds.
[27551.250808] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27551.250809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27551.250810] CJobMgr::m_Work D 0 9447 9196 0xa0024082
[27551.250813] Call Trace:
[27551.250819] __schedule+0x2e3/0x740
[27551.250822] ? try_to_wake_up+0x224/0x6a0
[27551.250824] schedule+0x42/0xb0
[27551.250825] schedule_timeout+0x203/0x2f0
[27551.250827] __down+0x82/0xd0
[27551.250829] down+0x47/0x60
[27551.250968] os_acquire_semaphore+0x35/0x40 [nvidia]
[27551.251176] _nv033291rm+0xc/0x20 [nvidia]
[27551.251390] ? _nv034166rm+0xb6/0x170 [nvidia]
[27551.251570] ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27551.251700] ? nvidia_close_callback+0x35/0x190 [nvidia]
[27551.251831] ? nvidia_close+0xe0/0x2e0 [nvidia]
[27551.251961] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27551.251964] ? __fput+0xcc/0x260
[27551.251966] ? ____fput+0xe/0x10
[27551.251969] ? task_work_run+0x8f/0xb0
[27551.251970] ? do_exit+0x351/0xac0
[27551.251973] ? timerqueue_del+0x24/0x50
[27551.251974] ? do_group_exit+0x47/0xb0
[27551.251977] ? get_signal+0x169/0x890
[27551.251978] ? hrtimer_cancel+0x15/0x20
[27551.251981] ? do_signal+0x34/0x6c0
[27551.251984] ? exit_to_usermode_loop+0xbf/0x160
[27551.251986] ? do_int80_syscall_32+0x106/0x130
[27551.251988] ? entry_INT80_compat+0x85/0x90
[27551.252002] INFO: task slack:10856 blocked for more than 241 seconds.
[27551.252003] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27551.252004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27551.252005] slack D 0 10856 10855 0x000003a0
[27551.252007] Call Trace:
[27551.252009] __schedule+0x2e3/0x740
[27551.252010] schedule+0x42/0xb0
[27551.252012] schedule_timeout+0x203/0x2f0
[27551.252013] __down+0x82/0xd0
[27551.252015] down+0x47/0x60
[27551.252147] os_acquire_semaphore+0x35/0x40 [nvidia]
[27551.252348] _nv033291rm+0x15/0x20 [nvidia]
[27551.252561] ? _nv034166rm+0xb6/0x170 [nvidia]
[27551.252738] ? _nv034114rm+0x22/0xd0 [nvidia]
[27551.252914] ? _nv000909rm+0x1c9/0x940 [nvidia]
[27551.253085] ? rm_ioctl+0x54/0xb0 [nvidia]
[27551.253088] ? __check_object_size+0x61/0x150
[27551.253219] ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27551.253349] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27551.253351] ? do_vfs_ioctl+0x407/0x670
[27551.253353] ? __audit_syscall_entry+0xdb/0x120
[27551.253354] ? ksys_ioctl+0x67/0x90
[27551.253356] ? __x64_sys_ioctl+0x1a/0x20
[27551.253357] ? do_syscall_64+0x57/0x190
[27551.253359] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27672.082839] INFO: task CJobMgr::m_Work:9447 blocked for more than 362 seconds.
[27672.082843] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27672.082844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27672.082845] CJobMgr::m_Work D 0 9447 9196 0xa0024082
[27672.082847] Call Trace:
[27672.082853] __schedule+0x2e3/0x740
[27672.082856] ? try_to_wake_up+0x224/0x6a0
[27672.082858] schedule+0x42/0xb0
[27672.082859] schedule_timeout+0x203/0x2f0
[27672.082861] __down+0x82/0xd0
[27672.082863] down+0x47/0x60
[27672.083002] os_acquire_semaphore+0x35/0x40 [nvidia]
[27672.083208] _nv033291rm+0xc/0x20 [nvidia]
[27672.083422] ? _nv034166rm+0xb6/0x170 [nvidia]
[27672.083601] ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27672.083732] ? nvidia_close_callback+0x35/0x190 [nvidia]
[27672.083862] ? nvidia_close+0xe0/0x2e0 [nvidia]
[27672.083992] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27672.083995] ? __fput+0xcc/0x260
[27672.083997] ? ____fput+0xe/0x10
[27672.083999] ? task_work_run+0x8f/0xb0
[27672.084001] ? do_exit+0x351/0xac0
[27672.084003] ? timerqueue_del+0x24/0x50
[27672.084005] ? do_group_exit+0x47/0xb0
[27672.084007] ? get_signal+0x169/0x890
[27672.084009] ? hrtimer_cancel+0x15/0x20
[27672.084012] ? do_signal+0x34/0x6c0
[27672.084014] ? exit_to_usermode_loop+0xbf/0x160
[27672.084016] ? do_int80_syscall_32+0x106/0x130
[27672.084018] ? entry_INT80_compat+0x85/0x90
[27672.084032] INFO: task slack:10856 blocked for more than 362 seconds.
[27672.084034] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27672.084035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27672.084036] slack D 0 10856 10855 0x000003a0
[27672.084037] Call Trace:
[27672.084039] __schedule+0x2e3/0x740
[27672.084041] schedule+0x42/0xb0
[27672.084042] schedule_timeout+0x203/0x2f0
[27672.084044] __down+0x82/0xd0
[27672.084045] down+0x47/0x60
[27672.084177] os_acquire_semaphore+0x35/0x40 [nvidia]
[27672.084378] _nv033291rm+0x15/0x20 [nvidia]
[27672.084597] ? _nv034166rm+0xb6/0x170 [nvidia]
[27672.084774] ? _nv034114rm+0x22/0xd0 [nvidia]
[27672.084950] ? _nv000909rm+0x1c9/0x940 [nvidia]
[27672.085121] ? rm_ioctl+0x54/0xb0 [nvidia]
[27672.085124] ? __check_object_size+0x61/0x150
[27672.085254] ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27672.085384] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27672.085386] ? do_vfs_ioctl+0x407/0x670
[27672.085388] ? __audit_syscall_entry+0xdb/0x120
[27672.085390] ? ksys_ioctl+0x67/0x90
[27672.085391] ? __x64_sys_ioctl+0x1a/0x20
[27672.085393] ? do_syscall_64+0x57/0x190
[27672.085395] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27792.917541] INFO: task CJobMgr::m_Work:9447 blocked for more than 483 seconds.
[27792.917545] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27792.917546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27792.917547] CJobMgr::m_Work D 0 9447 9196 0xa0024082
[27792.917549] Call Trace:
[27792.917556] __schedule+0x2e3/0x740
[27792.917559] ? try_to_wake_up+0x224/0x6a0
[27792.917560] schedule+0x42/0xb0
[27792.917562] schedule_timeout+0x203/0x2f0
[27792.917564] __down+0x82/0xd0
[27792.917566] down+0x47/0x60
[27792.917704] os_acquire_semaphore+0x35/0x40 [nvidia]
[27792.917912] _nv033291rm+0xc/0x20 [nvidia]
[27792.918126] ? _nv034166rm+0xb6/0x170 [nvidia]
[27792.918305] ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27792.918436] ? nvidia_close_callback+0x35/0x190 [nvidia]
[27792.918567] ? nvidia_close+0xe0/0x2e0 [nvidia]
[27792.918697] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27792.918700] ? __fput+0xcc/0x260
[27792.918702] ? ____fput+0xe/0x10
[27792.918704] ? task_work_run+0x8f/0xb0
[27792.918706] ? do_exit+0x351/0xac0
[27792.918709] ? timerqueue_del+0x24/0x50
[27792.918710] ? do_group_exit+0x47/0xb0
[27792.918712] ? get_signal+0x169/0x890
[27792.918714] ? hrtimer_cancel+0x15/0x20
[27792.918717] ? do_signal+0x34/0x6c0
[27792.918720] ? exit_to_usermode_loop+0xbf/0x160
[27792.918722] ? do_int80_syscall_32+0x106/0x130
[27792.918724] ? entry_INT80_compat+0x85/0x90
[27792.918738] INFO: task slack:10856 blocked for more than 483 seconds.
[27792.918740] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27792.918741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27792.918742] slack D 0 10856 10855 0x000003a0
[27792.918743] Call Trace:
[27792.918745] __schedule+0x2e3/0x740
[27792.918746] schedule+0x42/0xb0
[27792.918748] schedule_timeout+0x203/0x2f0
[27792.918749] __down+0x82/0xd0
[27792.918751] down+0x47/0x60
[27792.918883] os_acquire_semaphore+0x35/0x40 [nvidia]
[27792.919084] _nv033291rm+0x15/0x20 [nvidia]
[27792.919296] ? _nv034166rm+0xb6/0x170 [nvidia]
[27792.919472] ? _nv034114rm+0x22/0xd0 [nvidia]
[27792.919648] ? _nv000909rm+0x1c9/0x940 [nvidia]
[27792.919820] ? rm_ioctl+0x54/0xb0 [nvidia]
[27792.919823] ? __check_object_size+0x61/0x150
[27792.919953] ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27792.920083] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27792.920085] ? do_vfs_ioctl+0x407/0x670
[27792.920087] ? __audit_syscall_entry+0xdb/0x120
[27792.920089] ? ksys_ioctl+0x67/0x90
[27792.920090] ? __x64_sys_ioctl+0x1a/0x20
[27792.920092] ? do_syscall_64+0x57/0x190
[27792.920093] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27913.752229] INFO: task CJobMgr::m_Work:9447 blocked for more than 604 seconds.
[27913.752233] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27913.752234] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27913.752235] CJobMgr::m_Work D 0 9447 9196 0xa0024082
[27913.752237] Call Trace:
[27913.752243] __schedule+0x2e3/0x740
[27913.752246] ? try_to_wake_up+0x224/0x6a0
[27913.752248] schedule+0x42/0xb0
[27913.752249] schedule_timeout+0x203/0x2f0
[27913.752251] __down+0x82/0xd0
[27913.752253] down+0x47/0x60
[27913.752397] os_acquire_semaphore+0x35/0x40 [nvidia]
[27913.752606] _nv033291rm+0xc/0x20 [nvidia]
[27913.752831] ? _nv034166rm+0xb6/0x170 [nvidia]
[27913.753011] ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27913.753143] ? nvidia_close_callback+0x35/0x190 [nvidia]
[27913.753274] ? nvidia_close+0xe0/0x2e0 [nvidia]
[27913.753404] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27913.753407] ? __fput+0xcc/0x260
[27913.753409] ? ____fput+0xe/0x10
[27913.753411] ? task_work_run+0x8f/0xb0
[27913.753413] ? do_exit+0x351/0xac0
[27913.753415] ? timerqueue_del+0x24/0x50
[27913.753417] ? do_group_exit+0x47/0xb0
[27913.753419] ? get_signal+0x169/0x890
[27913.753421] ? hrtimer_cancel+0x15/0x20
[27913.753424] ? do_signal+0x34/0x6c0
[27913.753427] ? exit_to_usermode_loop+0xbf/0x160
[27913.753429] ? do_int80_syscall_32+0x106/0x130
[27913.753430] ? entry_INT80_compat+0x85/0x90
[27913.753446] INFO: task slack:10856 blocked for more than 604 seconds.
[27913.753447] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[27913.753448] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27913.753449] slack D 0 10856 10855 0x000003a0
[27913.753451] Call Trace:
[27913.753452] __schedule+0x2e3/0x740
[27913.753454] schedule+0x42/0xb0
[27913.753455] schedule_timeout+0x203/0x2f0
[27913.753457] __down+0x82/0xd0
[27913.753459] down+0x47/0x60
[27913.753590] os_acquire_semaphore+0x35/0x40 [nvidia]
[27913.753791] _nv033291rm+0x15/0x20 [nvidia]
[27913.754003] ? _nv034166rm+0xb6/0x170 [nvidia]
[27913.754180] ? _nv034114rm+0x22/0xd0 [nvidia]
[27913.754355] ? _nv000909rm+0x1c9/0x940 [nvidia]
[27913.754526] ? rm_ioctl+0x54/0xb0 [nvidia]
[27913.754530] ? __check_object_size+0x61/0x150
[27913.754660] ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27913.754790] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27913.754792] ? do_vfs_ioctl+0x407/0x670
[27913.754794] ? __audit_syscall_entry+0xdb/0x120
[27913.754796] ? ksys_ioctl+0x67/0x90
[27913.754797] ? __x64_sys_ioctl+0x1a/0x20
[27913.754799] ? do_syscall_64+0x57/0x190
[27913.754800] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
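The hung-task reports above repeat roughly every two minutes for the same two tasks. A minimal Python sketch to condense such a log follows; it assumes the `INFO: task NAME:PID blocked for more than N seconds.` line format seen above, and the function name `summarize_hung_tasks` is just an illustrative choice.

```python
import re
from collections import defaultdict

# Matches e.g. "[27430.419897] INFO: task slack:10856 blocked for more than 120 seconds."
# The task name may itself contain colons (e.g. "CJobMgr::m_Work"), so the name
# group is greedy and the PID is the last ":<digits>" before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def summarize_hung_tasks(dmesg_text):
    """Return {(task_name, pid): longest reported blocked time in seconds}."""
    longest = defaultdict(int)
    for line in dmesg_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            key = (m.group("name"), int(m.group("pid")))
            longest[key] = max(longest[key], int(m.group("secs")))
    return dict(longest)

# A few header lines copied from the dmesg output above.
sample = """\
[27430.418701] INFO: task CJobMgr::m_Work:9447 blocked for more than 120 seconds.
[27430.419897] INFO: task slack:10856 blocked for more than 120 seconds.
[27913.752229] INFO: task CJobMgr::m_Work:9447 blocked for more than 604 seconds.
[27913.753446] INFO: task slack:10856 blocked for more than 604 seconds.
"""

for (name, pid), secs in summarize_hung_tasks(sample).items():
    print(f"{name} (pid {pid}): blocked for more than {secs} s")
```

On a live system the input could come from `dmesg` output saved to a file. Both tasks in my log are stuck in `os_acquire_semaphore` inside the nvidia module, so the summary mostly shows how long the driver lock has been held.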
This might be related to a kernel update I did recently (this is the first time I have seen this freeze). From /var/log/apt/history.log:
...
Start-Date: 2020-05-01 16:49:34
Requested-By: kubuntu (999)
Install: ... linux-image-5.4.0-28-generic:amd64 (5.4.0-28.32, automatic), ...
Upgrade: linux-headers-generic:amd64 (5.4.0.26.32, 5.4.0.28.33), linux-image-generic:amd64 (5.4.0.26.32, 5.4.0.28.33), linux-modules-nvidia-440-generic-hwe-20.04:amd64 (5.4.0-26.30+2, 5.4.0-28.32), linux-generic:amd64 (5.4.0.26.32, 5.4.0.28.33)
End-Date: 2020-05-01 16:50:23
...
Start-Date: 2020-05-05 11:24:27
Commandline: packagekit role='update-packages'
Requested-By: az (1000)
Install: linux-image-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-extra-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-headers-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-nvidia-440-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-headers-5.4.0-29:amd64 (5.4.0-29.33)
Upgrade: update-manager-core:amd64 (1:20.04.9, 1:20.04.10), linux-headers-generic:amd64 (5.4.0.28.33, 5.4.0.29.34), linux-libc-dev:amd64 (5.4.0-28.32, 5.4.0-29.33), linux-image-generic:amd64 (5.4.0.28.33, 5.4.0.29.34), python3-update-manager:amd64 (1:20.04.9, 1:20.04.10), linux-modules-nvidia-440-generic-hwe-20.04:amd64 (5.4.0-28.32, 5.4.0-29.33), linux-generic:amd64 (5.4.0.28.33, 5.4.0.29.34)
End-Date: 2020-05-05 11:25:08
...
Start-Date: 2020-05-06 09:18:30
Commandline: /usr/bin/unattended-upgrade
Remove: linux-modules-nvidia-440-5.4.0-26-generic:amd64 (5.4.0-26.30+2), linux-image-5.4.0-26-generic:amd64 (5.4.0-26.30)
End-Date: 2020-05-06 09:18:36
...
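To see at a glance which kernel images were installed, upgraded, or removed and when, the history log can be filtered. This is only a rough Python sketch under the assumption that each stanza has `Start-Date:`, `Install:`, `Upgrade:` and `Remove:` lines with `name:arch (versions)` entries as in the excerpt above; `kernel_events` is a made-up helper name.

```python
import re

# One package entry looks like "name:arch (version[, new_version][, automatic])".
PKG_RE = re.compile(r"([^\s,()]+):(\w+) \(([^)]*)\)")

def kernel_events(history_text):
    """Yield (start_date, action, package, versions) for linux-image packages."""
    start = None
    for line in history_text.splitlines():
        key, _, value = line.partition(": ")
        if key == "Start-Date":
            start = value.strip()
        elif key in ("Install", "Upgrade", "Remove"):
            for name, _arch, versions in PKG_RE.findall(value):
                if name.startswith("linux-image"):
                    yield start, key, name, versions

# Shortened stanzas based on the excerpt above.
sample = """\
Start-Date: 2020-05-05 11:24:27
Commandline: packagekit role='update-packages'
Install: linux-image-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-5.4.0-29-generic:amd64 (5.4.0-29.33)
Upgrade: linux-image-generic:amd64 (5.4.0.28.33, 5.4.0.29.34)
End-Date: 2020-05-05 11:25:08

Start-Date: 2020-05-06 09:18:30
Commandline: /usr/bin/unattended-upgrade
Remove: linux-image-5.4.0-26-generic:amd64 (5.4.0-26.30)
End-Date: 2020-05-06 09:18:36
"""

for event in kernel_events(sample):
    print(*event, sep=" | ")
```

Comparing these dates with the first occurrence of the freeze is what makes me suspect the 5.4.0-29 update, although that is only a correlation.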
The currently running kernel is 5.4.0-29-generic. When I look at htop, I see Xorg at the top.
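Launchpad bug #1861294 (Gpu watchdog segfault and video+kbd+mouse freeze on optiplex 7060 intel gpu) looks similar, although the first report there is for an Intel GPU. The similarity is the GpuWatchdog segfault. Interestingly, it also mentions Slack, which even shows up in my kernel stack trace, but I am not sure whether that is relevant.
Unix SE 562458 (Segfault in Google Chrome - related to Nvidia card? How do I find out?) also looks quite related.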
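Now, with a newer kernel (Linux az-Desktop2020 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux), I got another strange freeze (or rather not a complete freeze: it still responded, but too slowly to be usable). Xorg was at 99% CPU usage. I killed Xorg, and Ubuntu automatically started a new Xorg instance, which again was at 99% CPU usage, but now the screen stayed black (maybe I would just have to wait a few hours and it would very slowly get back to the desktop...).
dmesg tells me this, which might be related, but I am not sure (these messages are old; today is Tuesday...):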
[So Jun 7 23:55:23 2020] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[So Jun 7 23:55:23 2020] NVRM: GPU Board Serial Number:
[So Jun 7 23:55:23 2020] NVRM: Xid (PCI:0000:09:00): 61, pid=1979, 0cec(3098) 00000000 00000000
[So Jun 7 23:57:27 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[So Jun 7 23:57:52 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[So Jun 7 23:58:18 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun 7 23:58:18 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun 7 23:58:23 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun 7 23:58:23 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun 7 23:58:29 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun 7 23:58:29 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun 7 23:58:34 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun 7 23:58:34 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun 7 23:58:40 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
...
[Mo Jun 8 00:00:02 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[Mo Jun 8 00:00:02 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
I got it again.
[Fri Jul 31 23:09:41 2020] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[Fri Jul 31 23:09:41 2020] NVRM: GPU Board Serial Number:
[Fri Jul 31 23:09:41 2020] NVRM: Xid (PCI:0000:09:00): 61, pid=1976, 0cec(3098) 00000000 00000000
[Fri Jul 31 23:10:45 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Fri Jul 31 23:10:49 2020] show_signal_msg: 23 callbacks suppressed
[Fri Jul 31 23:10:49 2020] GpuWatchdog[3313]: segfault at 0 ip 000055c61b34120c sp 00007f86a6987450 error 6 in chrome[55c616bd8000+7a52000]
[Fri Jul 31 23:10:49 2020] Code: 89 de e8 57 83 a4 fe 80 7d c7 00 79 09 48 8b 7d b0 e8 f8 de 6d fe 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 54 f6 ae fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e
[Fri Jul 31 23:11:09 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Fri Jul 31 23:11:23 2020] chrome[419865]: segfault at 4 ip 00007f86a4dcece7 sp 00007ffde03243b8 error 6 in libnvidia-glcore.so.440.100[7f86a3d61000+12e0000]
[Fri Jul 31 23:11:23 2020] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00
[Fri Jul 31 23:11:31 2020] chrome[419912]: segfault at 4 ip 00007f86a4dcece7 sp 00007ffde03243b8 error 6 in libnvidia-glcore.so.440.100[7f86a3d61000+12e0000]
[Fri Jul 31 23:11:31 2020] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00
uname: Linux az-Desktop2020 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Again (2021-01-04):
[639970.703933] Xorg: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[639970.703937] CPU: 9 PID: 1823 Comm: Xorg Tainted: P OE 5.4.0-58-generic #64-Ubuntu
[639970.703938] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 1407 04/01/2020
[639970.703938] Call Trace:
...
[639970.703961] nvkms_alloc+0x24/0x60 [nvidia_modeset]
[639970.703969] _nv002653kms+0x16/0x30 [nvidia_modeset]
[639970.703971] WARNING: kernel stack frame pointer at 0000000020ca1a81 in Xorg:1823 has bad value 00000000d5eee3ed ...
[639970.704373] ? _nv037019rm+0xa1/0x190 [nvidia]
[639970.704380] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704386] ? _nv000673kms+0x31/0xe0 [nvidia_modeset]
[639970.704392] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704398] ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[639970.704404] ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[639970.704410] ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
[639970.704477] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[639970.704477] ? do_vfs_ioctl+0x407/0x670
...
[639970.704513] BUG: unable to handle page fault for address: 0000000000007980
[639970.704515] #PF: supervisor read access in kernel mode
[639970.704515] #PF: error_code(0x0000) - not-present page
[639970.704516] PGD 0 P4D 0
[639970.704517] Oops: 0000 [#1] SMP NOPTI
...
[639970.704528] RIP: 0010:_nv002606kms+0x60/0x100 [nvidia_modeset]
...
[639970.704535] Call Trace:
[639970.704543] ? _nv002759kms+0x3ca/0x1470 [nvidia_modeset]
[639970.704544] ? kmalloc_order+0x63/0x80
[639970.704545] ? kmalloc_order_trace+0x24/0xa0
[639970.704614] ? _nv037019rm+0xa1/0x190 [nvidia]
[639970.704621] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704627] ? _nv000673kms+0x31/0xe0 [nvidia_modeset]
[639970.704633] ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704639] ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[639970.704645] ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[639970.704651] ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
[639970.704718] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[639970.704719] ? do_vfs_ioctl+0x407/0x670
...
[640674.801878] GpuWatchdog[525174]: segfault at 0 ip 000055f5be4d1ad9 sp 00007f0b525e7680 error 6 in code[55f5baea1000+57ee000]
[640674.801887] Code: 00 79 09 48 8b 7d c0 e8 25 41 c0 fe c7 45 c0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 c0 48 8d 7d c0 e8 b7 47 9d fc <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
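(More details here.)
Again. The desktop is mostly (but not completely) frozen, i.e. I can barely move the mouse, and there is even some reaction to clicks, but it is extremely slow (really unusable). I can still access the machine via SSH (basically everything works fine over SSH). This happened after I woke the PC up from sleep mode.
Relevant part of the log: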
[Fri Jan 8 01:43:00 2021] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[Fri Jan 8 01:43:00 2021] NVRM: GPU Board Serial Number:
[Fri Jan 8 01:43:00 2021] NVRM: Xid (PCI:0000:09:00): 31, pid=1924, Ch 00000009, intr 00000000. MMU Fault: ENGINE CE0 HUBCLIENT_HSCE0 faulted @ 0x1_05044000. Fault is of type FAULT_PTE ACCESS_TYPE_VIRT_READ
[Fri Jan 8 01:43:00 2021] spotify[65829]: segfault at 10 ip 00007f0c7a338ca0 sp 00007ffd220aeec0 error 4 in libnvidia-glcore.so.450.80.02[7f0c7934a000+133e000]
[Fri Jan 8 01:43:00 2021] Code: 89 d5 41 54 49 89 fc 55 48 89 f5 53 48 83 ec 08 83 e0 0f 0f 85 f9 00 00 00 40 f6 c5 0f 74 3e e9 a9 00 00 00 66 0f 1f 44 00 00 <0f> 28 4d 10 49 83 ed 40 0f 28 55 20 0f 28 5d 30 0f 28 45 00 48 83
Full log here. This time, the crash seems to happen in spotify. But that might be caused by the preceding NVRM error.
I just learned about Nvidia Xid errors.
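For reference, the Xid lines can be pulled out of dmesg with a few lines of Python. This is just a sketch: the regex assumes the `NVRM: Xid (PCI:...): CODE, pid=PID, ...` format seen in my logs, and the short descriptions in `XID_HINTS` are my own paraphrase of NVIDIA's Xid documentation, so they should be treated as an assumption and checked against the official table.

```python
import re

# Matches e.g. "NVRM: Xid (PCI:0000:09:00): 61, pid=1872, 0cec(3098) 00000000 00000000"
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+), pid=(?P<pid>\d+)"
)

# ASSUMPTION: paraphrased from NVIDIA's Xid documentation, not authoritative.
XID_HINTS = {
    8: "GPU stopped processing",
    31: "GPU memory page fault",
    61: "internal micro-controller breakpoint/warning",
}

def find_xids(dmesg_text):
    """Return a list of (xid_code, pid, hint) for every Xid line found."""
    events = []
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            code = int(m.group("code"))
            events.append((code, int(m.group("pid")), XID_HINTS.get(code, "unknown")))
    return events

# Xid lines copied from the logs in this post.
sample = """\
[27014.327450] NVRM: Xid (PCI:0000:09:00): 61, pid=1872, 0cec(3098) 00000000 00000000
[27037.447808] NVRM: Xid (PCI:0000:09:00): 8, pid=1872, Channel 00000013
[Fri Jan 8 01:43:00 2021] NVRM: Xid (PCI:0000:09:00): 31, pid=1924, Ch 00000009, intr 00000000.
"""

for code, pid, hint in find_xids(sample):
    print(f"Xid {code} (pid {pid}): {hint}")
```

In my logs, Xid 61 shows up first in most of the freezes, which suggests the user-space segfaults are a consequence rather than the cause.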
This bug report (nvidia-graphics-drivers-435 package: desktop freezes for 30 seconds after Nvidia driver crash) sounds related.
Pid 1924 (from the Xid error 31):
root 1924 1.6 0.6 327484 213620 tty1 Rsl+ Jan07 27:23 /usr/lib/xorg/Xorg -nolisten tcp -auth /var/run/sddm/{0cae4c35-d824-4286-8e07-30b9d710224b} -background none -noreset -displayfd 17 -seat seat0 vt1
(Restarting Xorg, i.e. sudo kill 1924, seems to have recovered the system.)
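To check which process a pid from an Xid line belongs to while logged in over SSH, the command line can be read from procfs. A small sketch, assuming a Linux system with `/proc` mounted; `cmdline_of` and its `proc_root` parameter are hypothetical names used only for illustration.

```python
from pathlib import Path

def cmdline_of(pid, proc_root="/proc"):
    """Return the command line of a process as a list of arguments, or None
    if the process does not exist or its cmdline cannot be read."""
    path = Path(proc_root) / str(pid) / "cmdline"
    try:
        raw = path.read_bytes()
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]

# 1924 is the pid from the Xid 31 line above; on another machine this
# will most likely print None or an unrelated process.
print(cmdline_of(1924))
```

In my case this pid is the Xorg server, which matches the observation that killing it recovered the desktop.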
uname: Linux az-Desktop2020 5.4.0-59-generic #65-Ubuntu SMP Thu Dec 10 12:01:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Nvidia version: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 450.80.02 Wed Sep 23 00:48:09 UTC 2020,
NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0
Now (2021-11-05), there are some new hangs. dmesg:
[1190321.012451] INFO: task QSGRenderThread:639492 blocked for more than 120 seconds.
[1190321.012455] Tainted: P W OE 5.4.0-89-generic #100-Ubuntu
[1190321.012456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1190321.012457] QSGRenderThread D 0 639492 2848 0x00004080
[1190321.012460] Call Trace:
[1190321.012466] __schedule+0x2e3/0x740
[1190321.012469] schedule+0x42/0xb0
[1190321.012470] schedule_timeout+0x10e/0x160
[1190321.012473] __down+0x82/0xd0
[1190321.012476] down+0x47/0x60
[1190321.012623] os_acquire_semaphore+0x35/0x40 [nvidia]
[1190321.012839] _nv035261rm+0xc/0x30 [nvidia]
[1190321.013051] ? _nv035253rm+0x15/0x20 [nvidia]
[1190321.013217] ? _nv036090rm+0x18d/0x1c0 [nvidia]
[1190321.013379] ? _nv037747rm+0x45/0xd0 [nvidia]
[1190321.013588] ? _nv037715rm+0xed/0x4e0 [nvidia]
[1190321.013750] ? _nv036080rm+0xbe/0x140 [nvidia]
[1190321.013911] ? _nv036081rm+0x42/0x70 [nvidia]
[1190321.014073] ? _nv000567rm+0x41/0x50 [nvidia]
[1190321.014267] ? _nv000724rm+0x73a/0xa90 [nvidia]
[1190321.014461] ? _nv000724rm+0x38/0xa90 [nvidia]
[1190321.014651] ? rm_ioctl+0x54/0xb0 [nvidia]
[1190321.014804] ? nvidia_ioctl+0x66f/0x880 [nvidia]
[1190321.014949] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[1190321.014952] ? do_vfs_ioctl+0x407/0x670
[1190321.014954] ? __audit_syscall_entry+0xdb/0x120
[1190321.014956] ? ksys_ioctl+0x67/0x90
[1190321.014958] ? __x64_sys_ioctl+0x1a/0x20
[1190321.014960] ? do_syscall_64+0x57/0x190
[1190321.014962] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is with NVIDIA UNIX x86_64 Kernel Module 470.74 (according to dmesg).
uname -a: Linux az-Desktop2020 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I think some Gentoo users here have run into a similar problem.