Desktop hangs/freezes. GpuWatchdog segfault. nvidia_frontend_close

This is (K)Ubuntu 20.04, KDE with Plasma, Nvidia 2070.

I can still move the mouse, but apart from that, clicks have no effect and window contents are no longer updated.

The system is still running. I can access it via SSH.

dmesg

[27014.327443] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[27014.327446] NVRM: GPU Board Serial Number: 
[27014.327450] NVRM: Xid (PCI:0000:09:00): 61, pid=1872, 0cec(3098) 00000000 00000000
[27037.447808] NVRM: Xid (PCI:0000:09:00): 8, pid=1872, Channel 00000013
[27051.943743] show_signal_msg: 14 callbacks suppressed
[27051.943745] GpuWatchdog[4265]: segfault at 0 ip 0000555bbc5e54e0 sp 00007f59eb1b54a0 error 6 in chrome[555bb8279000+734a000]
[27051.943750] Code: 3d 00 58 fb fa be 01 00 00 00 ba 07 00 00 00 e8 16 fa 71 fe 48 8d 3d e8 95 fc fa be 01 00 00 00 ba 03 00 00 00 e8 00 fa 71 fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 e6 bf 96 03 01 80 7d 87 00
[27430.418701] INFO: task CJobMgr::m_Work:9447 blocked for more than 120 seconds.
[27430.418705]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27430.418706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27430.418708] CJobMgr::m_Work D    0  9447   9196 0xa0024082
[27430.418710] Call Trace:
[27430.418716]  __schedule+0x2e3/0x740
[27430.418719]  ? try_to_wake_up+0x224/0x6a0
[27430.418721]  schedule+0x42/0xb0
[27430.418722]  schedule_timeout+0x203/0x2f0
[27430.418724]  __down+0x82/0xd0
[27430.418726]  down+0x47/0x60
[27430.418865]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27430.419071]  _nv033291rm+0xc/0x20 [nvidia]
[27430.419285]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27430.419464]  ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27430.419595]  ? nvidia_close_callback+0x35/0x190 [nvidia]
[27430.419725]  ? nvidia_close+0xe0/0x2e0 [nvidia]
[27430.419856]  ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27430.419858]  ? __fput+0xcc/0x260
[27430.419860]  ? ____fput+0xe/0x10
[27430.419863]  ? task_work_run+0x8f/0xb0
[27430.419865]  ? do_exit+0x351/0xac0
[27430.419867]  ? timerqueue_del+0x24/0x50
[27430.419868]  ? do_group_exit+0x47/0xb0
[27430.419871]  ? get_signal+0x169/0x890
[27430.419872]  ? hrtimer_cancel+0x15/0x20
[27430.419875]  ? do_signal+0x34/0x6c0
[27430.419878]  ? exit_to_usermode_loop+0xbf/0x160
[27430.419880]  ? do_int80_syscall_32+0x106/0x130
[27430.419881]  ? entry_INT80_compat+0x85/0x90
[27430.419897] INFO: task slack:10856 blocked for more than 120 seconds.
[27430.419899]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27430.419899] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27430.419900] slack           D    0 10856  10855 0x000003a0
[27430.419902] Call Trace:
[27430.419904]  __schedule+0x2e3/0x740
[27430.419905]  schedule+0x42/0xb0
[27430.419907]  schedule_timeout+0x203/0x2f0
[27430.419908]  __down+0x82/0xd0
[27430.419910]  down+0x47/0x60
[27430.420042]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27430.420243]  _nv033291rm+0x15/0x20 [nvidia]
[27430.420460]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27430.420636]  ? _nv034114rm+0x22/0xd0 [nvidia]
[27430.420812]  ? _nv000909rm+0x1c9/0x940 [nvidia]
[27430.420983]  ? rm_ioctl+0x54/0xb0 [nvidia]
[27430.420986]  ? __check_object_size+0x61/0x150
[27430.421117]  ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27430.421247]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27430.421249]  ? do_vfs_ioctl+0x407/0x670
[27430.421251]  ? __audit_syscall_entry+0xdb/0x120
[27430.421252]  ? ksys_ioctl+0x67/0x90
[27430.421253]  ? __x64_sys_ioctl+0x1a/0x20
[27430.421255]  ? do_syscall_64+0x57/0x190
[27430.421257]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27551.250805] INFO: task CJobMgr::m_Work:9447 blocked for more than 241 seconds.
[27551.250808]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27551.250809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27551.250810] CJobMgr::m_Work D    0  9447   9196 0xa0024082
[27551.250813] Call Trace:
[27551.250819]  __schedule+0x2e3/0x740
[27551.250822]  ? try_to_wake_up+0x224/0x6a0
[27551.250824]  schedule+0x42/0xb0
[27551.250825]  schedule_timeout+0x203/0x2f0
[27551.250827]  __down+0x82/0xd0
[27551.250829]  down+0x47/0x60
[27551.250968]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27551.251176]  _nv033291rm+0xc/0x20 [nvidia]
[27551.251390]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27551.251570]  ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27551.251700]  ? nvidia_close_callback+0x35/0x190 [nvidia]
[27551.251831]  ? nvidia_close+0xe0/0x2e0 [nvidia]
[27551.251961]  ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27551.251964]  ? __fput+0xcc/0x260
[27551.251966]  ? ____fput+0xe/0x10
[27551.251969]  ? task_work_run+0x8f/0xb0
[27551.251970]  ? do_exit+0x351/0xac0
[27551.251973]  ? timerqueue_del+0x24/0x50
[27551.251974]  ? do_group_exit+0x47/0xb0
[27551.251977]  ? get_signal+0x169/0x890
[27551.251978]  ? hrtimer_cancel+0x15/0x20
[27551.251981]  ? do_signal+0x34/0x6c0
[27551.251984]  ? exit_to_usermode_loop+0xbf/0x160
[27551.251986]  ? do_int80_syscall_32+0x106/0x130
[27551.251988]  ? entry_INT80_compat+0x85/0x90
[27551.252002] INFO: task slack:10856 blocked for more than 241 seconds.
[27551.252003]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27551.252004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27551.252005] slack           D    0 10856  10855 0x000003a0
[27551.252007] Call Trace:
[27551.252009]  __schedule+0x2e3/0x740
[27551.252010]  schedule+0x42/0xb0
[27551.252012]  schedule_timeout+0x203/0x2f0
[27551.252013]  __down+0x82/0xd0
[27551.252015]  down+0x47/0x60
[27551.252147]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27551.252348]  _nv033291rm+0x15/0x20 [nvidia]
[27551.252561]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27551.252738]  ? _nv034114rm+0x22/0xd0 [nvidia]
[27551.252914]  ? _nv000909rm+0x1c9/0x940 [nvidia]
[27551.253085]  ? rm_ioctl+0x54/0xb0 [nvidia]
[27551.253088]  ? __check_object_size+0x61/0x150
[27551.253219]  ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27551.253349]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27551.253351]  ? do_vfs_ioctl+0x407/0x670
[27551.253353]  ? __audit_syscall_entry+0xdb/0x120
[27551.253354]  ? ksys_ioctl+0x67/0x90
[27551.253356]  ? __x64_sys_ioctl+0x1a/0x20
[27551.253357]  ? do_syscall_64+0x57/0x190
[27551.253359]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27672.082839] INFO: task CJobMgr::m_Work:9447 blocked for more than 362 seconds.
[27672.082843]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27672.082844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27672.082845] CJobMgr::m_Work D    0  9447   9196 0xa0024082
[27672.082847] Call Trace:
[27672.082853]  __schedule+0x2e3/0x740
[27672.082856]  ? try_to_wake_up+0x224/0x6a0
[27672.082858]  schedule+0x42/0xb0
[27672.082859]  schedule_timeout+0x203/0x2f0
[27672.082861]  __down+0x82/0xd0
[27672.082863]  down+0x47/0x60
[27672.083002]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27672.083208]  _nv033291rm+0xc/0x20 [nvidia]
[27672.083422]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27672.083601]  ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27672.083732]  ? nvidia_close_callback+0x35/0x190 [nvidia]
[27672.083862]  ? nvidia_close+0xe0/0x2e0 [nvidia]
[27672.083992]  ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27672.083995]  ? __fput+0xcc/0x260
[27672.083997]  ? ____fput+0xe/0x10
[27672.083999]  ? task_work_run+0x8f/0xb0
[27672.084001]  ? do_exit+0x351/0xac0
[27672.084003]  ? timerqueue_del+0x24/0x50
[27672.084005]  ? do_group_exit+0x47/0xb0
[27672.084007]  ? get_signal+0x169/0x890
[27672.084009]  ? hrtimer_cancel+0x15/0x20
[27672.084012]  ? do_signal+0x34/0x6c0
[27672.084014]  ? exit_to_usermode_loop+0xbf/0x160
[27672.084016]  ? do_int80_syscall_32+0x106/0x130
[27672.084018]  ? entry_INT80_compat+0x85/0x90
[27672.084032] INFO: task slack:10856 blocked for more than 362 seconds.
[27672.084034]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27672.084035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27672.084036] slack           D    0 10856  10855 0x000003a0
[27672.084037] Call Trace:
[27672.084039]  __schedule+0x2e3/0x740
[27672.084041]  schedule+0x42/0xb0
[27672.084042]  schedule_timeout+0x203/0x2f0
[27672.084044]  __down+0x82/0xd0
[27672.084045]  down+0x47/0x60
[27672.084177]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27672.084378]  _nv033291rm+0x15/0x20 [nvidia]
[27672.084597]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27672.084774]  ? _nv034114rm+0x22/0xd0 [nvidia]
[27672.084950]  ? _nv000909rm+0x1c9/0x940 [nvidia]
[27672.085121]  ? rm_ioctl+0x54/0xb0 [nvidia]
[27672.085124]  ? __check_object_size+0x61/0x150
[27672.085254]  ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27672.085384]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27672.085386]  ? do_vfs_ioctl+0x407/0x670
[27672.085388]  ? __audit_syscall_entry+0xdb/0x120
[27672.085390]  ? ksys_ioctl+0x67/0x90
[27672.085391]  ? __x64_sys_ioctl+0x1a/0x20
[27672.085393]  ? do_syscall_64+0x57/0x190
[27672.085395]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27792.917541] INFO: task CJobMgr::m_Work:9447 blocked for more than 483 seconds.
[27792.917545]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27792.917546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27792.917547] CJobMgr::m_Work D    0  9447   9196 0xa0024082
[27792.917549] Call Trace:
[27792.917556]  __schedule+0x2e3/0x740
[27792.917559]  ? try_to_wake_up+0x224/0x6a0
[27792.917560]  schedule+0x42/0xb0
[27792.917562]  schedule_timeout+0x203/0x2f0
[27792.917564]  __down+0x82/0xd0
[27792.917566]  down+0x47/0x60
[27792.917704]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27792.917912]  _nv033291rm+0xc/0x20 [nvidia]
[27792.918126]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27792.918305]  ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27792.918436]  ? nvidia_close_callback+0x35/0x190 [nvidia]
[27792.918567]  ? nvidia_close+0xe0/0x2e0 [nvidia]
[27792.918697]  ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27792.918700]  ? __fput+0xcc/0x260
[27792.918702]  ? ____fput+0xe/0x10
[27792.918704]  ? task_work_run+0x8f/0xb0
[27792.918706]  ? do_exit+0x351/0xac0
[27792.918709]  ? timerqueue_del+0x24/0x50
[27792.918710]  ? do_group_exit+0x47/0xb0
[27792.918712]  ? get_signal+0x169/0x890
[27792.918714]  ? hrtimer_cancel+0x15/0x20
[27792.918717]  ? do_signal+0x34/0x6c0
[27792.918720]  ? exit_to_usermode_loop+0xbf/0x160
[27792.918722]  ? do_int80_syscall_32+0x106/0x130
[27792.918724]  ? entry_INT80_compat+0x85/0x90
[27792.918738] INFO: task slack:10856 blocked for more than 483 seconds.
[27792.918740]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27792.918741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27792.918742] slack           D    0 10856  10855 0x000003a0
[27792.918743] Call Trace:
[27792.918745]  __schedule+0x2e3/0x740
[27792.918746]  schedule+0x42/0xb0
[27792.918748]  schedule_timeout+0x203/0x2f0
[27792.918749]  __down+0x82/0xd0
[27792.918751]  down+0x47/0x60
[27792.918883]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27792.919084]  _nv033291rm+0x15/0x20 [nvidia]
[27792.919296]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27792.919472]  ? _nv034114rm+0x22/0xd0 [nvidia]
[27792.919648]  ? _nv000909rm+0x1c9/0x940 [nvidia]
[27792.919820]  ? rm_ioctl+0x54/0xb0 [nvidia]
[27792.919823]  ? __check_object_size+0x61/0x150
[27792.919953]  ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27792.920083]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27792.920085]  ? do_vfs_ioctl+0x407/0x670
[27792.920087]  ? __audit_syscall_entry+0xdb/0x120
[27792.920089]  ? ksys_ioctl+0x67/0x90
[27792.920090]  ? __x64_sys_ioctl+0x1a/0x20
[27792.920092]  ? do_syscall_64+0x57/0x190
[27792.920093]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[27913.752229] INFO: task CJobMgr::m_Work:9447 blocked for more than 604 seconds.
[27913.752233]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27913.752234] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27913.752235] CJobMgr::m_Work D    0  9447   9196 0xa0024082
[27913.752237] Call Trace:
[27913.752243]  __schedule+0x2e3/0x740
[27913.752246]  ? try_to_wake_up+0x224/0x6a0
[27913.752248]  schedule+0x42/0xb0
[27913.752249]  schedule_timeout+0x203/0x2f0
[27913.752251]  __down+0x82/0xd0
[27913.752253]  down+0x47/0x60
[27913.752397]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27913.752606]  _nv033291rm+0xc/0x20 [nvidia]
[27913.752831]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27913.753011]  ? rm_free_unused_clients+0x6f/0xe0 [nvidia]
[27913.753143]  ? nvidia_close_callback+0x35/0x190 [nvidia]
[27913.753274]  ? nvidia_close+0xe0/0x2e0 [nvidia]
[27913.753404]  ? nvidia_frontend_close+0x2f/0x50 [nvidia]
[27913.753407]  ? __fput+0xcc/0x260
[27913.753409]  ? ____fput+0xe/0x10
[27913.753411]  ? task_work_run+0x8f/0xb0
[27913.753413]  ? do_exit+0x351/0xac0
[27913.753415]  ? timerqueue_del+0x24/0x50
[27913.753417]  ? do_group_exit+0x47/0xb0
[27913.753419]  ? get_signal+0x169/0x890
[27913.753421]  ? hrtimer_cancel+0x15/0x20
[27913.753424]  ? do_signal+0x34/0x6c0
[27913.753427]  ? exit_to_usermode_loop+0xbf/0x160
[27913.753429]  ? do_int80_syscall_32+0x106/0x130
[27913.753430]  ? entry_INT80_compat+0x85/0x90
[27913.753446] INFO: task slack:10856 blocked for more than 604 seconds.
[27913.753447]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[27913.753448] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27913.753449] slack           D    0 10856  10855 0x000003a0
[27913.753451] Call Trace:
[27913.753452]  __schedule+0x2e3/0x740
[27913.753454]  schedule+0x42/0xb0
[27913.753455]  schedule_timeout+0x203/0x2f0
[27913.753457]  __down+0x82/0xd0
[27913.753459]  down+0x47/0x60
[27913.753590]  os_acquire_semaphore+0x35/0x40 [nvidia]
[27913.753791]  _nv033291rm+0x15/0x20 [nvidia]
[27913.754003]  ? _nv034166rm+0xb6/0x170 [nvidia]
[27913.754180]  ? _nv034114rm+0x22/0xd0 [nvidia]
[27913.754355]  ? _nv000909rm+0x1c9/0x940 [nvidia]
[27913.754526]  ? rm_ioctl+0x54/0xb0 [nvidia]
[27913.754530]  ? __check_object_size+0x61/0x150
[27913.754660]  ? nvidia_ioctl+0x5b1/0x8a0 [nvidia]
[27913.754790]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[27913.754792]  ? do_vfs_ioctl+0x407/0x670
[27913.754794]  ? __audit_syscall_entry+0xdb/0x120
[27913.754796]  ? ksys_ioctl+0x67/0x90
[27913.754797]  ? __x64_sys_ioctl+0x1a/0x20
[27913.754799]  ? do_syscall_64+0x57/0x190
[27913.754800]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
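
The repeated hung-task reports above can be condensed into one line per task. The following is a minimal Python sketch, assuming the `INFO: task NAME:PID blocked for more than N seconds.` format exactly as it appears in this log; the function name `summarize_hung_tasks` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [27430.418701] INFO: task CJobMgr::m_Work:9447 blocked for more than 120 seconds.
# The task name may itself contain colons, so the greedy ".+" is used and the
# PID is taken from the last ":<digits>" before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(dmesg_text):
    """Return {(task name, pid): longest reported blocked time in seconds}."""
    longest = defaultdict(int)
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            key = (match.group("name"), int(match.group("pid")))
            longest[key] = max(longest[key], int(match.group("secs")))
    return dict(longest)
```

Feeding it the saved output of `dmesg` (for example `summarize_hung_tasks(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file name) should show which tasks were stuck and for how long, without the repeated call traces.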

This might be related to a recent kernel update (this is the first time I have seen such a freeze). From /var/log/apt/history.log:

...

Start-Date: 2020-05-01  16:49:34
Requested-By: kubuntu (999)
Install: ... linux-image-5.4.0-28-generic:amd64 (5.4.0-28.32, automatic), ...
Upgrade: linux-headers-generic:amd64 (5.4.0.26.32, 5.4.0.28.33), linux-image-generic:amd64 (5.4.0.26.32, 5.4.0.28.33), linux-modules-nvidia-440-generic-hwe-20.04:amd64 (5.4.0-26.30+2, 5.4.0-28.32), linux-generic:amd64 (5.4.0.26.32, 5.4.0.28.33)
End-Date: 2020-05-01  16:50:23

...

Start-Date: 2020-05-05  11:24:27
Commandline: packagekit role='update-packages'
Requested-By: az (1000)
Install: linux-image-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-extra-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-headers-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-nvidia-440-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-modules-5.4.0-29-generic:amd64 (5.4.0-29.33), linux-headers-5.4.0-29:amd64 (5.4.0-29.33)
Upgrade: update-manager-core:amd64 (1:20.04.9, 1:20.04.10), linux-headers-generic:amd64 (5.4.0.28.33, 5.4.0.29.34), linux-libc-dev:amd64 (5.4.0-28.32, 5.4.0-29.33), linux-image-generic:amd64 (5.4.0.28.33, 5.4.0.29.34), python3-update-manager:amd64 (1:20.04.9, 1:20.04.10), linux-modules-nvidia-440-generic-hwe-20.04:amd64 (5.4.0-28.32, 5.4.0-29.33), linux-generic:amd64 (5.4.0.28.33, 5.4.0.29.34)
End-Date: 2020-05-05  11:25:08

...

Start-Date: 2020-05-06  09:18:30
Commandline: /usr/bin/unattended-upgrade
Remove: linux-modules-nvidia-440-5.4.0-26-generic:amd64 (5.4.0-26.30+2), linux-image-5.4.0-26-generic:amd64 (5.4.0-26.30)
End-Date: 2020-05-06  09:18:36

...

The currently running kernel is 5.4.0-29-generic. When I look at htop, I see Xorg at the top.
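
To see at a glance which kernel and Nvidia packages changed before the first freeze, the history file can be filtered. This is a rough Python sketch, assuming the stanza format shown in the excerpts above (`Start-Date:`, `Install:`, `Upgrade:`, `Remove:`) and unwrapped lines as they appear in the actual file; `kernel_related_changes` and its `keywords` parameter are hypothetical names chosen for illustration.

```python
import re

# One package entry looks like "name:arch (version)" or "name:arch (old, new)".
PKG_RE = re.compile(
    r"(?P<name>[^\s,():]+):(?P<arch>[^\s,():]+) \((?P<versions>[^)]*)\)"
)
ACTIONS = ("Install", "Upgrade", "Remove")


def kernel_related_changes(history_text, keywords=("linux-image", "nvidia")):
    """Yield (start_date, action, package, versions) for matching packages."""
    start_date = None
    for line in history_text.splitlines():
        # Split only on the first ": " so epochs such as "1:20.04.9" survive.
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        if key == "Start-Date":
            start_date = value.strip()
        elif key in ACTIONS:
            for match in PKG_RE.finditer(value):
                if any(word in match.group("name") for word in keywords):
                    yield (start_date, key, match.group("name"),
                           match.group("versions"))
```

Calling `list(kernel_related_changes(open("/var/log/apt/history.log").read()))` would then list only the kernel image and Nvidia module changes with the date of each transaction.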


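Launchpad bug #1861294 (Gpu watchdog segfaults and video+kbd+mouse freeze on optiplex 7060 intel gpu) looks similar, although the first report there is for an Intel GPU. The common element is the GpuWatchdog segfault. Interestingly, Slack is also mentioned there, and it also appears in my kernel stack traces, but I am not sure whether that is relevant.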

Unix SE 562458 (Segfault in Google Chrome - related to Nvidia card? How do I find out?) also looks quite related.


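Now, with a newer kernel (Linux az-Desktop2020 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux), I got another strange freeze (or rather not a complete freeze: the desktop still responded, but too slowly to be usable). Xorg was at 99% CPU usage. I killed Xorg, and Ubuntu automatically started a new Xorg instance, which again used 99% CPU, but now the screen stays black (maybe I would just need to wait a few hours and it would very slowly get back to the desktop...).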

dmesg shows me this, which might be related, but I am not sure (these messages are old; today is Tuesday...):

[So Jun  7 23:55:23 2020] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[So Jun  7 23:55:23 2020] NVRM: GPU Board Serial Number:                                                                    
[So Jun  7 23:55:23 2020] NVRM: Xid (PCI:0000:09:00): 61, pid=1979, 0cec(3098) 00000000 00000000
[So Jun  7 23:57:27 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[So Jun  7 23:57:52 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[So Jun  7 23:58:18 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun  7 23:58:18 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun  7 23:58:23 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun  7 23:58:23 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun  7 23:58:29 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun  7 23:58:29 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun  7 23:58:34 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[So Jun  7 23:58:34 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0
[So Jun  7 23:58:40 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
...
[Mo Jun  8 00:00:02 2020] NVRM: GPU 0000:09:00.0: RmInitAdapter failed! (0x24:0x65:1185)
[Mo Jun  8 00:00:02 2020] NVRM: GPU 0000:09:00.0: rm_init_adapter failed, device minor number 0

I got it again.

[Fri Jul 31 23:09:41 2020] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[Fri Jul 31 23:09:41 2020] NVRM: GPU Board Serial Number: 
[Fri Jul 31 23:09:41 2020] NVRM: Xid (PCI:0000:09:00): 61, pid=1976, 0cec(3098) 00000000 00000000
[Fri Jul 31 23:10:45 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Fri Jul 31 23:10:49 2020] show_signal_msg: 23 callbacks suppressed
[Fri Jul 31 23:10:49 2020] GpuWatchdog[3313]: segfault at 0 ip 000055c61b34120c sp 00007f86a6987450 error 6 in chrome[55c616bd8000+7a52000]
[Fri Jul 31 23:10:49 2020] Code: 89 de e8 57 83 a4 fe 80 7d c7 00 79 09 48 8b 7d b0 e8 f8 de 6d fe 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 54 f6 ae fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e
[Fri Jul 31 23:11:09 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Fri Jul 31 23:11:23 2020] chrome[419865]: segfault at 4 ip 00007f86a4dcece7 sp 00007ffde03243b8 error 6 in libnvidia-glcore.so.440.100[7f86a3d61000+12e0000]
[Fri Jul 31 23:11:23 2020] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00
[Fri Jul 31 23:11:31 2020] chrome[419912]: segfault at 4 ip 00007f86a4dcece7 sp 00007ffde03243b8 error 6 in libnvidia-glcore.so.440.100[7f86a3d61000+12e0000]
[Fri Jul 31 23:11:31 2020] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00

uname: Linux az-Desktop2020 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


Again (2021-01-04):

[639970.703933] Xorg: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[639970.703937] CPU: 9 PID: 1823 Comm: Xorg Tainted: P           OE     5.4.0-58-generic #64-Ubuntu
[639970.703938] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 1407 04/01/2020
[639970.703938] Call Trace:
...
[639970.703961]  nvkms_alloc+0x24/0x60 [nvidia_modeset]
[639970.703969]  _nv002653kms+0x16/0x30 [nvidia_modeset]
[639970.703971] WARNING: kernel stack frame pointer at 0000000020ca1a81 in Xorg:1823 has bad value 00000000d5eee3ed
...
[639970.704373]  ? _nv037019rm+0xa1/0x190 [nvidia]
[639970.704380]  ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704386]  ? _nv000673kms+0x31/0xe0 [nvidia_modeset]
[639970.704392]  ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704398]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[639970.704404]  ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[639970.704410]  ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
[639970.704477]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[639970.704477]  ? do_vfs_ioctl+0x407/0x670
...
[639970.704513] BUG: unable to handle page fault for address: 0000000000007980
[639970.704515] #PF: supervisor read access in kernel mode
[639970.704515] #PF: error_code(0x0000) - not-present page
[639970.704516] PGD 0 P4D 0 
[639970.704517] Oops: 0000 [#1] SMP NOPTI
...
[639970.704528] RIP: 0010:_nv002606kms+0x60/0x100 [nvidia_modeset]
...
[639970.704535] Call Trace:
[639970.704543]  ? _nv002759kms+0x3ca/0x1470 [nvidia_modeset]
[639970.704544]  ? kmalloc_order+0x63/0x80
[639970.704545]  ? kmalloc_order_trace+0x24/0xa0
[639970.704614]  ? _nv037019rm+0xa1/0x190 [nvidia]
[639970.704621]  ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704627]  ? _nv000673kms+0x31/0xe0 [nvidia_modeset]
[639970.704633]  ? _nv000531kms+0x50/0x50 [nvidia_modeset]
[639970.704639]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
[639970.704645]  ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
[639970.704651]  ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
[639970.704718]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[639970.704719]  ? do_vfs_ioctl+0x407/0x670
...
[640674.801878] GpuWatchdog[525174]: segfault at 0 ip 000055f5be4d1ad9 sp 00007f0b525e7680 error 6 in code[55f5baea1000+57ee000]
[640674.801887] Code: 00 79 09 48 8b 7d c0 e8 25 41 c0 fe c7 45 c0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 c0 48 8d 7d c0 e8 b7 47 9d fc <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e

(More details here.)


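Again. The desktop is mostly (but not completely) frozen: I can barely move the mouse, and clicks do get some reaction, but everything is extremely slow (really unusable). I can still access the machine via SSH, where basically everything works fine. This happened after I woke the PC up from sleep mode.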

The relevant part of the log:

[Fri Jan  8 01:43:00 2021] NVRM: GPU at PCI:0000:09:00: GPU-00ebafba-eb00-ee69-eba0-3c804c97f796
[Fri Jan  8 01:43:00 2021] NVRM: GPU Board Serial Number: 
[Fri Jan  8 01:43:00 2021] NVRM: Xid (PCI:0000:09:00): 31, pid=1924, Ch 00000009, intr 00000000. MMU Fault: ENGINE CE0 HUBCLIENT_HSCE0 faulted @ 0x1_05044000. Fault is of type FAULT_PTE ACCESS_TYPE_VIRT_READ
[Fri Jan  8 01:43:00 2021] spotify[65829]: segfault at 10 ip 00007f0c7a338ca0 sp 00007ffd220aeec0 error 4 in libnvidia-glcore.so.450.80.02[7f0c7934a000+133e000]
[Fri Jan  8 01:43:00 2021] Code: 89 d5 41 54 49 89 fc 55 48 89 f5 53 48 83 ec 08 83 e0 0f 0f 85 f9 00 00 00 40 f6 c5 0f 74 3e e9 a9 00 00 00 66 0f 1f 44 00 00 <0f> 28 4d 10 49 83 ed 40 0f 28 55 20 0f 28 5d 30 0f 28 45 00 48 83

The full log is here. This time, the crash seems to have happened in spotify, but it might have been caused by the preceding NVRM error.

I just learned about Nvidia Xid errors.
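
Since several different Xid codes (61, 8, 31) show up across these incidents, it may help to pull them out of the log together with the reporting pid. Below is a small Python sketch, assuming the `NVRM: Xid (DEVICE): CODE, pid=PID, ...` layout seen in the logs of this post; other driver versions may format the line differently. The name `extract_xid_events` is hypothetical, and the short descriptions are paraphrased from memory of NVIDIA's Xid documentation, so they should be checked against it.

```python
import re

# Matches lines such as:
# NVRM: Xid (PCI:0000:09:00): 61, pid=1872, 0cec(3098) 00000000 00000000
XID_RE = re.compile(
    r"NVRM: Xid \((?P<device>[^)]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
)

# Assumption: short descriptions for the codes seen in this post only,
# paraphrased from NVIDIA's Xid documentation; verify before relying on them.
XID_HINTS = {
    8: "GPU stopped processing",
    31: "GPU memory page fault",
    61: "internal micro-controller breakpoint/warning",
}


def extract_xid_events(dmesg_text):
    """Return a list of (device, xid code, pid) tuples in log order."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("device"), int(match.group("xid")), int(match.group("pid")))
            )
    return events
```

The pid in each event can then be looked up with `ps` to see which process the driver associated with the fault, as done for pid 1924 in this post.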

This bug report (nvidia-graphics-drivers-435 package: Desktop freezes for 30 seconds after Nvidia driver crash) sounds related.

Pid 1924 (from the Xid error 31):

root        1924  1.6  0.6 327484 213620 tty1    Rsl+ Jan07  27:23 /usr/lib/xorg/Xorg -nolisten tcp -auth /var/run/sddm/{0cae4c35-d824-4286-8e07-30b9d710224b} -background none -noreset -displayfd 17 -seat seat0 vt1

(Restarting Xorg, i.e. sudo kill 1924, seems to have recovered the system.)

uname: Linux az-Desktop2020 5.4.0-59-generic #65-Ubuntu SMP Thu Dec 10 12:01:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Nvidia version: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 450.80.02 Wed Sep 23 00:48:09 UTC 2020; NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0


Now (2021-11-05) I am seeing some new hangs. dmesg:

[1190321.012451] INFO: task QSGRenderThread:639492 blocked for more than 120 seconds.
[1190321.012455]       Tainted: P        W  OE     5.4.0-89-generic #100-Ubuntu
[1190321.012456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1190321.012457] QSGRenderThread D    0 639492   2848 0x00004080
[1190321.012460] Call Trace:
[1190321.012466]  __schedule+0x2e3/0x740
[1190321.012469]  schedule+0x42/0xb0
[1190321.012470]  schedule_timeout+0x10e/0x160
[1190321.012473]  __down+0x82/0xd0
[1190321.012476]  down+0x47/0x60
[1190321.012623]  os_acquire_semaphore+0x35/0x40 [nvidia]
[1190321.012839]  _nv035261rm+0xc/0x30 [nvidia]
[1190321.013051]  ? _nv035253rm+0x15/0x20 [nvidia]
[1190321.013217]  ? _nv036090rm+0x18d/0x1c0 [nvidia]
[1190321.013379]  ? _nv037747rm+0x45/0xd0 [nvidia]
[1190321.013588]  ? _nv037715rm+0xed/0x4e0 [nvidia]
[1190321.013750]  ? _nv036080rm+0xbe/0x140 [nvidia]
[1190321.013911]  ? _nv036081rm+0x42/0x70 [nvidia]
[1190321.014073]  ? _nv000567rm+0x41/0x50 [nvidia]
[1190321.014267]  ? _nv000724rm+0x73a/0xa90 [nvidia]
[1190321.014461]  ? _nv000724rm+0x38/0xa90 [nvidia]
[1190321.014651]  ? rm_ioctl+0x54/0xb0 [nvidia]
[1190321.014804]  ? nvidia_ioctl+0x66f/0x880 [nvidia]
[1190321.014949]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
[1190321.014952]  ? do_vfs_ioctl+0x407/0x670
[1190321.014954]  ? __audit_syscall_entry+0xdb/0x120
[1190321.014956]  ? ksys_ioctl+0x67/0x90
[1190321.014958]  ? __x64_sys_ioctl+0x1a/0x20
[1190321.014960]  ? do_syscall_64+0x57/0x190
[1190321.014962]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using NVIDIA UNIX x86_64 Kernel Module 470.74 (via dmesg). uname -a: Linux az-Desktop2020 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


I think some Gentoo users have run into similar problems here.
