Ubuntu 20.04.4 LTS freezes randomly

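The freezes happen no matter what I am doing, whether I am using the computer or away from it. The keyboard stops responding and the screen goes into power-saving mode.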

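However, I can still reach the machine over SSH from another computer. Unfortunately, I don't know how to diagnose the problem. Any help or hints would be greatly appreciated!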

Here are some details that I hope will help (taken from an SSH session while the computer was frozen). Feel free to ask for more information.

$ free -h

              total        used        free      shared  buff/cache   available
Mem:           31Gi        13Gi       978Mi       237Mi        16Gi        16Gi
Swap:          30Gi       1,9Gi        28Gi

$ grep -i swap /etc/fstab

UUID=2cd379c8-d157-4eee-a667-12271c8607be  none            swap    sw                         0       0

$ ll /dev/disk/by-uuid/2cd379c8-d157-4eee-a667-12271c8607be

lrwxrwxrwx 1 root root 15 janv. 16 06:34 /dev/disk/by-uuid/2cd379c8-d157-4eee-a667-12271c8607be -> ../../nvme0n1p4

$ sysctl vm.swappiness

vm.swappiness = 60

$ ls -al /var/crash

Contains no files from this date.

$ sudo lshw -c video

  *-display                 
       description: VGA compatible controller
       product: Advanced Micro Devices, Inc. [AMD/ATI]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0c:00.0
       version: c7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:79 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fcc00000-fccfffff memory:fcd00000-fcd1ffff

$ sudo lsmod | grep -i amd

edac_mce_amd           36864  0
amdgpu               9809920  29
iommu_v2               24576  1 amdgpu
gpu_sched              45056  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
drm_ttm_helper         16384  1 amdgpu
ttm                    86016  2 amdgpu,drm_ttm_helper
drm_kms_helper        307200  1 amdgpu
gpio_amdpt             20480  0
drm                   618496  15 gpu_sched,drm_kms_helper,amdgpu,drm_ttm_helper,ttm
gpio_generic           20480  1 gpio_amdpt

Edit

/var/log/kern.log shows the following at the time of the freeze. Is it relevant?

Jan 16 11:28:32 benj-pc kernel: [17630.400119] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 16 11:28:38 benj-pc kernel: [17630.400121] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
Jan 16 11:28:38 benj-pc kernel: [17635.530047] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1239598, emitted seq=1239600
Jan 16 11:28:38 benj-pc kernel: [17635.530232] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2804 thread Xorg:cs0 pid 2805
Jan 16 11:28:38 benj-pc kernel: [17635.530385] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 16 11:28:38 benj-pc kernel: [17636.140112] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 16 11:28:38 benj-pc kernel: [17636.140250] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jan 16 11:28:38 benj-pc kernel: [17636.440685] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 16 11:28:38 benj-pc kernel: [17636.440820] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 16 11:28:39 benj-pc kernel: [17636.740600] [drm:gfx_v10_0_cp_gfx_enable [amdgpu]] *ERROR* failed to halt cp gfx
Jan 16 11:28:39 benj-pc kernel: [17636.754647] [drm] free PSP TMR buffer
Jan 16 11:28:39 benj-pc kernel: [17636.800093] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17400300 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800102] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17d61000 flags=0x0010]
Jan 16 11:28:39 benj-pc kernel: [17636.800107] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d16a02400 flags=0x0010]
Jan 16 11:28:39 benj-pc kernel: [17636.800112] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d16a03500 flags=0x0010]
Jan 16 11:28:39 benj-pc kernel: [17636.800116] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17401300 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800120] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17402300 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800124] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17404200 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800128] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17403300 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800132] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d17405200 flags=0x0030]
Jan 16 11:28:39 benj-pc kernel: [17636.800136] amdgpu 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d16a03500 flags=0x0010]
Jan 16 11:28:39 benj-pc kernel: [17636.800223] amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 16 11:28:39 benj-pc kernel: [17636.800227] amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 16 11:28:39 benj-pc kernel: [17636.800301] amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
Jan 16 11:28:39 benj-pc kernel: [17636.801277] AMD-Vi: IOMMU event log overflow
Jan 16 11:28:39 benj-pc kernel: [17637.312179] amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 16 11:28:39 benj-pc kernel: [17637.312406] [drm] PCIE GART of 512M enabled (table at 0x00000080008CA000).
Jan 16 11:28:39 benj-pc kernel: [17637.312434] [drm] VRAM is lost due to GPU reset!
Jan 16 11:28:39 benj-pc kernel: [17637.313819] [drm] PSP is resuming...
Jan 16 11:28:40 benj-pc kernel: [17637.511892] [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR
Jan 16 11:28:42 benj-pc kernel: [17639.666741] [drm] psp gfx command LOAD_ASD(0x4) failed and response status is (0x0)
Jan 16 11:28:42 benj-pc kernel: [17639.666747] [drm:psp_resume [amdgpu]] *ERROR* PSP load asd failed!
Jan 16 11:28:42 benj-pc kernel: [17639.666964] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Jan 16 11:28:42 benj-pc kernel: [17639.667151] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
Jan 16 11:28:42 benj-pc kernel: [17639.667270] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667272] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667292] amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) failed
Jan 16 11:28:42 benj-pc kernel: [17639.667309] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667317] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667323] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667327] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667331] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667337] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667344] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667346] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667348] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667351] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667353] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667356] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667359] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667363] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667366] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667369] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667371] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667375] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667377] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667379] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.667381] [drm] Skip scheduling IBs!
Jan 16 11:28:42 benj-pc kernel: [17639.691588] amdgpu 0000:0c:00.0: amdgpu: GPU reset end with ret = -22
Jan 16 11:28:52 benj-pc kernel: [17649.855833] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=62382, emitted seq=62384
Jan 16 11:28:52 benj-pc kernel: [17649.855833] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=41176, emitted seq=41178
Jan 16 11:28:52 benj-pc kernel: [17649.856020] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 16 11:28:52 benj-pc kernel: [17649.856025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 16 11:28:52 benj-pc kernel: [17649.856173] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 16 11:28:52 benj-pc kernel: [17649.856177] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 16 11:28:52 benj-pc kernel: [17649.856179] amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:9d99, as another already in progress
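
To cut out the repeated "Skip scheduling IBs!" noise, the GPU-related lines in the excerpt above can be isolated with a filter along these lines. This is only a minimal sketch: the function name gpu_errors is hypothetical, and the pattern is my assumption about which messages matter (amdgpu errors, GPU reset messages, and AMD-Vi IOMMU events).

```shell
# Hypothetical helper (name and pattern are assumptions, not a standard tool):
# print amdgpu error lines, GPU reset messages, and AMD-Vi IOMMU events
# from a kernel log. Defaults to /var/log/kern.log, the file quoted above.
gpu_errors() {
    grep -E 'amdgpu.*(\*ERROR\*|GPU reset)|AMD-Vi' "${1:-/var/log/kern.log}"
}
```

Running gpu_errors with no argument reads /var/log/kern.log, which may require sudo depending on the file's permissions; passing a path such as /var/log/kern.log.1 checks an older, rotated log instead.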
