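The system's screen goes black several times a week. I can still connect over SSH, but the display stays blank until I do a hard reboot.
The dmesg logs contain AMD-related entries: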
[ 9.574643] amdgpu 0000:2f:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 9.589995] amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 9.589998] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 9.589999] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 9.589999] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 9.590000] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 9.590001] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 9.590001] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 9.590002] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 9.590002] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 9.590003] amdgpu 0000:2f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 9.590003] amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 9.590004] amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 9.590004] amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 9.590005] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 9.590006] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 9.590006] amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 9.591822] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:2f:00.0 on minor 0
[405336.815466] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[405339.384808] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[405342.936641] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[405343.207409] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle:
[405382.220639] amdgpu 0000:2f:00.0: amdgpu: Failed to disable gfxoff!
[405397.798055] amdgpu 0000:2f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[405397.798141] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[405398.078473] amdgpu 0000:2f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[405398.078540] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[405403.035730] amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command!
[405403.035735] amdgpu 0000:2f:00.0: amdgpu: Failed to disable smu features.
[405403.035738] amdgpu 0000:2f:00.0: amdgpu: Fail to disable dpm features!
[405403.035739] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[405403.048480] [drm] free PSP TMR buffer
[405404.144031] [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x80000306)
[405404.165017] amdgpu 0000:2f:00.0: amdgpu: MODE1 reset
[405404.165020] amdgpu 0000:2f:00.0: amdgpu: GPU mode1 reset
[405404.165099] amdgpu 0000:2f:00.0: amdgpu: GPU smu mode1 reset
[405409.277640] amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command!
[405409.277644] amdgpu 0000:2f:00.0: amdgpu: GPU mode1 reset failed
[405409.277749] amdgpu 0000:2f:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:2f:00.0
[405420.341892] amdgpu 0000:2f:00.0: amdgpu: GPU reset succeeded, trying to resume
[405420.342197] [drm] PCIE GART of 512M enabled (table at 0x0000008001FA4000).
[405420.342234] [drm] VRAM is lost due to GPU reset!
[405420.343561] [drm] PSP is resuming...
[405421.458583] [drm] failed to load ucode SMC(0x18)
[405421.458601] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x80000306)
[405421.458606] [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR
[405421.695431] amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
[405421.708606] amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[405421.708609] amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
[405421.708613] amdgpu 0000:2f:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw version = 0x003b2900 (59.41.0)
[405421.708615] amdgpu 0000:2f:00.0: amdgpu: SMU driver if version not matched
[405426.947166] amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command!
[405426.947171] amdgpu 0000:2f:00.0: amdgpu: Failed to SetDriverDramAddr!
[405426.947172] amdgpu 0000:2f:00.0: amdgpu: Failed to setup smc hw!
[405426.947173] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[405426.947276] amdgpu 0000:2f:00.0: amdgpu: GPU reset(2) failed
[405426.965191] snd_hda_intel 0000:2f:00.1: refused to change power state from D3hot to D0
[405427.069838] snd_hda_intel 0000:2f:00.1: CORB reset timeout#2, CORBRP = 65535
[405427.069851] amdgpu 0000:2f:00.0: amdgpu: GPU reset end with ret = -62
[405437.169457] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=244959, emitted seq=244959
[405437.169575] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[405437.169662] amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
[405629.935225] INFO: task VizCompositorTh:5424 blocked for more than 120 seconds.
[405629.935230] Not tainted 5.15.0-47-generic #51-Ubuntu
[405629.935231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[405629.935232] task:VizCompositorTh state:D stack: 0 pid: 5424 ppid: 5332 flags:0x00004002
[405629.935235] Call Trace:
[405629.935236] <TASK>
[405629.935238] __schedule+0x23d/0x590
[405629.935242] schedule+0x4e/0xc0
[405629.935243] schedule_timeout+0x103/0x140
[405629.935245] ? kmem_cache_free+0x26c/0x290
[405629.935247] dma_fence_default_wait+0x1c8/0x1f0
[405629.935250] ? dma_fence_free+0x30/0x30
[405629.935251] dma_fence_wait_timeout+0xbf/0xe0
[405629.935254] drm_sched_entity_fini+0xd7/0x250 [gpu_sched]
[405629.935257] drm_sched_entity_destroy+0x20/0x30 [gpu_sched]
[405629.935259] amdgpu_vm_fini+0x2d6/0x4c0 [amdgpu]
[405629.935352] ? idr_destroy+0x81/0xd0
[405629.935354] amdgpu_driver_postclose_kms+0x179/0x240 [amdgpu]
[405629.935421] ? idr_destroy+0x81/0xd0
[405629.935424] drm_file_free.part.0+0x1da/0x230 [drm]
[405629.935435] drm_close_helper.isra.0+0x65/0x70 [drm]
[405629.935445] drm_release+0x6a/0x120 [drm]
[405629.935454] __fput+0x9f/0x260
[405629.935457] ____fput+0xe/0x20
[405629.935458] task_work_run+0x6d/0xb0
[405629.935460] do_exit+0x21b/0x3c0
[405629.935462] do_group_exit+0x3b/0xb0
[405629.935463] get_signal+0x150/0x900
[405629.935465] arch_do_signal_or_restart+0xde/0x100
[405629.935467] ? schedule+0x4e/0xc0
[405629.935468] exit_to_user_mode_loop+0xc4/0x160
[405629.935470] exit_to_user_mode_prepare+0xa0/0xb0
[405629.935472] irqentry_exit_to_user_mode+0x9/0x20
[405629.935474] irqentry_exit+0x1d/0x30
[405629.935475] common_interrupt+0x55/0xa0
[405629.935476] asm_common_interrupt+0x26/0x40
[405629.935477] RIP: 0033:0x55c8c9266d64
[405629.935479] RSP: 002b:00007f4b1320bca0 EFLAGS: 00000206
[405629.935480] RAX: 0000000000000000 RBX: 0000288c00ad1000 RCX: 00007f4b1320cc28
[405629.935481] RDX: 000000000720c851 RSI: 00007f4b1320cc28 RDI: 0000288c00ad1000
[405629.935482] RBP: 00007f4b1320c4d0 R08: 0000000000000000 R09: 000055c8c91501e0
[405629.935482] R10: 0000000000000000 R11: aaaaaaaaaaaaaaaa R12: 0000288c00ad1000
[405629.935483] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000720c851
[405629.935485] </TASK>
[405629.935561] INFO: task kworker/14:1:544425 blocked for more than 120 seconds.
[405629.935563] Not tainted 5.15.0-47-generic #51-Ubuntu
[405629.935563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[405629.935564] task:kworker/14:1 state:D stack: 0 pid:544425 ppid: 2 flags:0x00004000
[405629.935566] Workqueue: events drm_sched_job_timedout [gpu_sched]
[405629.935569] Call Trace:
[405629.935569] <TASK>
[405629.935570] __schedule+0x23d/0x590
[405629.935571] schedule+0x4e/0xc0
[405629.935572] schedule_timeout+0x103/0x140
[405629.935573] ? task_rq_lock+0x5f/0x160
[405629.935575] dma_fence_default_wait+0x1c8/0x1f0
[405629.935576] ? dma_fence_free+0x30/0x30
[405629.935578] dma_fence_wait_timeout+0xbf/0xe0
[405629.935579] drm_sched_stop+0xfc/0x170 [gpu_sched]
[405629.935581] amdgpu_device_gpu_recover.cold+0x861/0x8ff [amdgpu]
[405629.935698] amdgpu_job_timedout+0x153/0x180 [amdgpu]
[405629.935794] drm_sched_job_timedout+0x6f/0x120 [gpu_sched]
[405629.935796] process_one_work+0x22b/0x3d0
[405629.935798] worker_thread+0x53/0x420
[405629.935799] ? process_one_work+0x3d0/0x3d0
[405629.935800] kthread+0x12a/0x150
[405629.935801] ? set_kthread_struct+0x50/0x50
[405629.935802] ret_from_fork+0x22/0x30
[405629.935805] </TASK>