Kernel 6.0 and AMD-GPU/Notion application issue


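I recently installed the new Linux kernel 6.0.2 on my Manjaro laptop. After using my usual applications for a while, the laptop froze, and only a forced power-off brought it back. Checking the journalctl log, I found the following entries, emitted right before the laptop stopped responding: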

Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process notion-snap pid 5993 thread no>
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800108620000 from IH client 0x12 (VMC)
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140051
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10149, emitted seq=10151
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232258c0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232258e0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225900 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225920 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225960 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225940 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225980 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232259a0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123240000 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: [drm] free PSP TMR buffer
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 17 12:53:18 manjaro-bl kernel: [drm] PCIE GART of 1024M enabled.
Oct 17 12:53:18 manjaro-bl kernel: [drm] PTB located at 0x000000F400A00000
Oct 17 12:53:18 manjaro-bl kernel: [drm] PSP is resuming...
Oct 17 12:53:18 manjaro-bl kernel: [drm] reserve 0x400000 from 0xf439000000 for PSP TMR
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
Oct 17 12:53:20 manjaro-bl kernel: [drm] kiq ring mec 2 pipe 1 q 0
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Oct 17 12:53:20 manjaro-bl kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -110
Oct 17 12:53:20 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Oct 17 12:53:30 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10151, emitted seq=10151
Oct 17 12:53:30 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023
Oct 17 12:53:30 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 17 12:53:30 manjaro-bl kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
Oct 17 12:53:30 manjaro-bl kernel: #PF: supervisor read access in kernel mode
Oct 17 12:53:30 manjaro-bl kernel: #PF: error_code(0x0000) - not-present page
Oct 17 12:53:30 manjaro-bl kernel: PGD 0 P4D 0 
Oct 17 12:53:30 manjaro-bl kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G           OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  amd_iommu_int_thread+0x61e/0x780
Oct 17 12:53:30 manjaro-bl kernel:  ? __wake_up_common_lock+0x88/0xc0
Oct 17 12:53:30 manjaro-bl kernel:  ? disable_irq_nosync+0x10/0x10
Oct 17 12:53:30 manjaro-bl kernel:  irq_thread_fn+0x23/0x60
Oct 17 12:53:30 manjaro-bl kernel:  irq_thread+0xfe/0x1c0
Oct 17 12:53:30 manjaro-bl kernel:  ? irq_thread_fn+0x60/0x60
Oct 17 12:53:30 manjaro-bl kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Oct 17 12:53:30 manjaro-bl kernel:  kthread+0xde/0x110
Oct 17 12:53:30 manjaro-bl kernel:  ? kthread_complete_and_exit+0x20/0x20
Oct 17 12:53:30 manjaro-bl kernel:  ret_from_fork+0x22/0x30
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028
Oct 17 12:53:30 manjaro-bl kernel: ---[ end trace 0000000000000000 ]---
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: #PF: supervisor write access in kernel mode
Oct 17 12:53:30 manjaro-bl kernel: #PF: error_code(0x0002) - not-present page
Oct 17 12:53:30 manjaro-bl kernel: PGD 0 P4D 0 
Oct 17 12:53:30 manjaro-bl kernel: Oops: 0002 [#2] PREEMPT SMP NOPTI
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G      D    OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:mutex_lock+0x1d/0x30
Oct 17 12:53:30 manjaro-bl kernel: Code: 00 00 be 02 00 00 00 e9 51 f8 ff ff 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb 2e 2e 2e 31 c0 31 c0 65 48 8b 14 25 c0 0b 02 00 <f>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493e58 EFLAGS: 00010246
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 0000000000000a91 RCX: 00000000000001b0
Oct 17 12:53:30 manjaro-bl kernel: RDX: ffff8b6e411b0000 RSI: 0000000000001cc9 RDI: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: RBP: ffff8b6e411b0000 R08: 0000000000000000 R09: ffffad20c0493aa8
Oct 17 12:53:30 manjaro-bl kernel: R10: 0000000000000003 R11: ffffffff88acb508 R12: 0000000000000009
Oct 17 12:53:30 manjaro-bl kernel: R13: 0000000000000001 R14: 0000000000000a91 R15: 0000000000000ab1
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  perf_event_exit_task+0x41/0x2b0
Oct 17 12:53:30 manjaro-bl kernel:  do_exit+0x342/0xad0
Oct 17 12:53:30 manjaro-bl kernel:  ? task_work_run+0x60/0x90
Oct 17 12:53:30 manjaro-bl kernel:  ? do_exit+0x332/0xad0
Oct 17 12:53:30 manjaro-bl kernel:  ? make_task_dead+0x55/0x60
Oct 17 12:53:30 manjaro-bl kernel:  ? rewind_stack_and_make_dead+0x17/0x20
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: ---[ end trace 0000000000000000 ]---
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Fixing recursive fault but reboot is needed!
Oct 17 12:53:30 manjaro-bl kernel: BUG: scheduling while atomic: irq/25-AMD-Vi/84/0x00000000
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G      D    OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  dump_stack_lvl+0x48/0x60
Oct 17 12:53:30 manjaro-bl kernel:  __schedule_bug.cold+0x4b/0x57
Oct 17 12:53:30 manjaro-bl kernel:  __schedule+0xde8/0x11c0
Oct 17 12:53:30 manjaro-bl kernel:  do_task_dead+0x43/0x50
Oct 17 12:53:30 manjaro-bl kernel:  make_task_dead.cold+0x51/0xab
Oct 17 12:53:30 manjaro-bl kernel:  rewind_stack_and_make_dead+0x17/0x20
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0000:0x0
Oct 17 12:53:30 manjaro-bl kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out

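It looks like my Notion snap application is triggering the problem: both amdgpu timeout reports in the log name the notion-snap process (pid 5993). Has anyone else run into this with the new kernel?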
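In case it helps anyone check their own logs, here is a minimal sketch of how the process named in the amdgpu timeout reports can be pulled out of a saved kernel log. The function name `count_timeout_processes` and the regular expression are my own illustration, not part of any tool; the pattern simply assumes the "Process information: process NAME pid N" wording seen in the log above.

```python
import re
from collections import Counter

# Matches the "Process information" lines printed by amdgpu_job_timedout,
# assuming the wording shown in the log above.
PROC_RE = re.compile(r"Process information: process (\S+) pid (\d+)")


def count_timeout_processes(lines):
    """Count how often each (process, pid) pair is named in timeout reports."""
    counts = Counter()
    for line in lines:
        match = PROC_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Three lines copied from the log above; only two carry process information.
sample = [
    "Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10149, emitted seq=10151",
    "Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023",
    "Oct 17 12:53:30 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023",
]

result = count_timeout_processes(sample)
for (name, pid), hits in result.items():
    print(f"{name} (pid {pid}): named in {hits} timeout report(s)")
```

To run it against a real log, the lines could come from a file saved with `journalctl -k -b -1` (kernel messages from the previous boot, which requires a persistent journal). A process showing up here only tells you which submission timed out; it does not by itself prove that the application, rather than the driver, is at fault.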
