Ubuntu 22.04 freezes and shows amdgpu errors in the logs

I have been running Ubuntu 22.04 for a while, and everything worked fine until today. I am using an AMD Ryzen Lenovo ThinkPad (T14 gen3). Today my system froze twice.

Before the freeze, the last messages under the Important tab of my logs were:

09:02:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process msedge pid 3949 thread msedge:cs0 pid 3971
09:02:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process msedge pid 3949 thread msedge:cs0 pid 3971
09:02:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=172936, emitted seq=172937
09:01:46 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
09:01:46 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled 

The System tab shows the following logs:

09:02:10 kernel: amdgpu 0000:04:00.0: amdgpu: GPU recovery disabled.
09:02:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process msedge pid 3949 thread msedge:cs0 pid 3971
09:01:46 kernel: amdgpu 0000:04:00.0: amdgpu: GPU recovery disabled.
09:01:46 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
09:01:46 kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!

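I don't know why this is happening, because I did not install any updates today. I also have not installed any GPU-related drivers or any other drivers. It is just a "default" Ubuntu installation.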

Thanks for any help.

Answer 1

This should be resolved in Ubuntu 23.04 with kernel 6.2+ and libdrm-amdgpu1 2.4.114-1; see https://gitlab.freedesktop.org/drm/amd/-/issues/2282#note_1901512

More information: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980831

I have the same problem on a Lenovo ThinkPad T14 Gen 3 AMD (Ryzen 7 PRO 6850U, 21CGS1ES00, BIOS R23ET65W - 1.35) running Ubuntu 22.04.2 (5.19.0-42-generic).

libdrm-amdgpu (apt search amdgpu*):

libdrm-amdgpu1/jammy-updates,now 2.4.113-2~ubuntu0.22.04.1 amd64
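
To compare what is installed against the versions named above (kernel 6.2+ and libdrm-amdgpu1 2.4.114-1), a rough Python sketch could look like this. The helper names `kernel_version` and `upstream_version` are made up for illustration, and the package comparison only looks at the upstream part of the version string (no epoch, no Debian revision), so treat it as an approximation; `dpkg --compare-versions` is the proper tool for full Debian version ordering.

```python
import re

# Thresholds taken from the answer text: "Kernel 6.2+" and "2.4.114-1".
FIXED_KERNEL = (6, 2)
FIXED_LIBDRM = (2, 4, 114)


def kernel_version(release):
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)))


def upstream_version(package_version):
    """Return the upstream part of a package version as a tuple of ints.

    Simplification: everything after the first '-' is dropped and an
    epoch prefix ("1:...") is assumed to be absent.
    """
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


# Values reported in this thread.
print(kernel_version("5.19.0-42-generic") >= FIXED_KERNEL)
print(upstream_version("2.4.113-2~ubuntu0.22.04.1") >= FIXED_LIBDRM)
```

Both checks print `False` for the versions reported here, which matches the setup still being affected. On a live system, `platform.release()` from the standard library returns the running kernel's release string and can be passed to `kernel_version`.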

System log:

May 22 23:12:16 P09-ThinkPad-T14-Gen-3 kernel: [ 4089.681396] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
May 22 23:12:21 P09-ThinkPad-T14-Gen-3 kernel: [ 4089.685379] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
May 22 23:12:21 P09-ThinkPad-T14-Gen-3 kernel: [ 4094.801723] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=19205, emitted seq=19206
May 22 23:12:21 P09-ThinkPad-T14-Gen-3 kernel: [ 4094.802071] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 22 23:12:21 P09-ThinkPad-T14-Gen-3 kernel: [ 4094.802351] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4095.955280] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4095.955394] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.150614] [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.163949] [drm] free PSP TMR buffer
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196421] CPU: 6 PID: 24115 Comm: kworker/u32:3 Tainted: G           OE     5.19.0-41-generic #42~22.04.1-Ubuntu
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196425] Hardware name: LENOVO 21CGS1ES00/21CGS1ES00, BIOS R23ET65W (1.35 ) 03/21/2023
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196427] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196435] Call Trace:
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196437]  <TASK>
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196440]  show_stack+0x52/0x69
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196445]  dump_stack_lvl+0x49/0x6d
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196450]  dump_stack+0x10/0x18
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196453]  amdgpu_do_asic_reset+0x2b/0x441 [amdgpu]
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196678]  amdgpu_device_gpu_recover_imp.cold+0x4f6/0x805 [amdgpu]
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.196879]  amdgpu_job_timedout+0x15e/0x190 [amdgpu]
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197059]  ? finish_task_switch.isra.0+0x84/0x290
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197064]  drm_sched_job_timedout+0x6d/0x120 [gpu_sched]
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197068]  process_one_work+0x21f/0x400
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197072]  worker_thread+0x50/0x3f0
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197074]  ? rescuer_thread+0x3a0/0x3a0
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197076]  kthread+0xee/0x120
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197078]  ? kthread_complete_and_exit+0x20/0x20
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197081]  ret_from_fork+0x22/0x30
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197086]  </TASK>
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.197088] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.206082] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.206251] [drm] PCIE GART of 512M enabled (table at 0x000000F4008C9000).
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.206264] [drm] PSP is resuming...
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.228241] [drm] reserve 0xa00000 from 0xf43f400000 for PSP TMR
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.564430] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.576531] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.576534] amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.576542] amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.576933] amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.578769] [drm] DMUB hardware initialized: version=0x04000022
May 22 23:12:23 P09-ThinkPad-T14-Gen-3 kernel: [ 4096.748970] [drm:check_syncd_pipes_for_disabled_master_pipe [amdgpu]] *ERROR* DC: Failure: pipe_idx[2] syncd with disabled master pipe_idx[1]
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.370891] [drm] kiq ring mec 2 pipe 1 q 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.376593] [drm] VCN decode and encode initialized successfully(under DPG Mode).
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377031] [drm] JPEG decode initialized successfully.
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377039] amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377045] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377047] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377049] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377050] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377052] amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377053] amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377054] amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377056] amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377058] amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377059] amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377061] amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377063] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377064] amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.377065] amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.384067] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.384073] amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
May 22 23:12:24 P09-ThinkPad-T14-Gen-3 kernel: [ 4097.384166] amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
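
To pick the ring timeouts out of a long log like the one above, a small Python filter can help. This is only a sketch: the regular expression is written against the `ring ... timeout, signaled seq=..., emitted seq=...` lines shown in this thread, and the function name `find_ring_timeouts` is made up for illustration. Other kernel versions may format these messages differently.

```python
import re

# Matches lines such as:
#   ... *ERROR* ring sdma0 timeout, signaled seq=19205, emitted seq=19206
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return (ring, signaled_seq, emitted_seq) for every timeout line."""
    hits = []
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            hits.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
    return hits


# Sample lines copied from the logs in this thread.
sample = [
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, "
    "signaled seq=19205, emitted seq=19206",
    "amdgpu 0000:04:00.0: amdgpu: GPU reset begin!",
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, "
    "signaled seq=172936, emitted seq=172937",
]

for ring, signaled, emitted in find_ring_timeouts(sample):
    # The gap between the two sequence numbers is the number of fences
    # that were emitted but never signaled.
    print(f"{ring}: {emitted - signaled} fence(s) not signaled")
```

For real use, the same function can be fed `sys.stdin` with the kernel log piped in, for example from `journalctl -k`.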

I am using a Lenovo ThinkPad Universal USB-C Dock (40AY) with two monitors connected via DP, plus the internal display.
