MCE hardware errors cause reboots on AMD A10-9700 RADEON R7

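I have been dealing with this problem all year. I tried to find the cause and fix it, but could not find a good solution, and updates did not help (the BIOS is now the latest version, the OS is up to date, and the kernel is one of the latest). I tried searching Google for this problem and tried parsing the MCE, but could not get any useful information out of it. Maybe you can give me some ideas on how to fix this.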

So, here is what I have:

Kernel: 5.4.0-73-generic

OS:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

Partial CPU info:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 101
model name      : AMD A10-9700 RADEON R7, 10 COMPUTE CORES 4C+6G
stepping        : 1
microcode       : 0x600611a
cpu MHz         : 2169.010
cache size      : 1024 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 16
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif overflow_recov
bugs            : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 6986.87
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro acc_power [13]

Here are the errors I see in dmesg after a reboot:

[    0.257771] kernel: mce: [Hardware Error]: Machine check events logged
[    0.257773] kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 2: be0000000012010a
[    0.257776] kernel: mce: [Hardware Error]: TSC 0 ADDR f780 MISC d01a000100000000 
[    0.257778] kernel: mce: [Hardware Error]: PROCESSOR 2:660f51 TIME 1622490129 SOCKET 0 APIC 0 microcode 600611a
[    0.257780] kernel: mce: [Hardware Error]: Machine check events logged
[    0.257781] kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: b200001000020c0f
[    0.257782] kernel: mce: [Hardware Error]: TSC 0 
[    0.257783] kernel: mce: [Hardware Error]: PROCESSOR 2:660f51 TIME 1622490129 SOCKET 0 APIC 0 microcode 600611a
[    2.851627] kernel: RAS: Correctable Errors collector initialized.
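
For reference, the status words in the log above can be split into the architectural flag bits of the x86 MCi_STATUS register. The sketch below is only an illustration: decode_mci_status is a made-up helper name, it covers just the top-level flags shared by all x86 machine-check banks, and it does not try to interpret the AMD family-specific error codes held in the lower bits.

```python
# Architectural flag bits of the x86 MCi_STATUS register (bit position -> name).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(hex_status):
    """Return the flags set in a status word plus its raw low-order code fields.

    decode_mci_status is a hypothetical helper, not part of any existing tool.
    """
    status = int(hex_status, 16)
    flags = [name for bit, name in MCI_STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # Status values copied from the dmesg lines above.
    for bank, word in ((2, "be0000000012010a"), (4, "b200001000020c0f")):
        info = decode_mci_status(word)
        print(f"Bank {bank}: {' '.join(info['flags'])} "
              f"code=0x{info['mca_error_code']:04x} "
              f"ext=0x{info['model_specific_code']:04x}")
```

Both words have UC and PCC set, meaning the hardware reported uncorrected errors with possibly corrupted processor context. That is consistent with the machine resetting instead of recovering, but it does not by itself identify the faulty component.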

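These reboots are very frequent (about once a day) even though there is no load on the machine, and since I run a media server and file storage on it, this is unacceptable.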

Edit 1:

free -h

              total        used        free      shared  buff/cache   available
Mem:          7.3Gi       1.2Gi       123Mi        15Mi       6.0Gi       5.8Gi
Swap:            0B          0B          0B

sysctl vm.swappiness

vm.swappiness = 60

sudo lshw -C memory

  *-firmware                                                                                                                                                                                                                                  
       description: BIOS                                                                                                                                                                                                                      
       vendor: American Megatrends Inc.                                                                                                                                                                                                       
       physical id: 0                                                                                                                                                                                                                         
       version: F53                                                                                                                                                                                                                           
       date: 01/05/2021                                                                                                                                                                                                                       
       size: 64KiB                                                                                                                                                                                                                            
       capacity: 16MiB                  
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int14serial int17printer acpi usb biosbootspecification uefi                                                                        
  *-memory            
       description: System Memory 
       physical id: 1d        
       slot: System board or motherboard
       size: 8GiB           
     *-bank:0             
          description: [empty]                                                                                         
          product: Unknown                                                                                             
          vendor: Unknown                  
          physical id: 0      
          serial: FFFFFFFF                    
          slot: DIMM 0                                     
     *-bank:1                          
          description: [empty]                                                                                                                                                                                                                                                  
          product: Unknown                
          vendor: Unknown             
          physical id: 1                                                                                               
          serial: FFFFFFFF                    
          slot: DIMM 1                                                                                                                                                                                                                        
     *-bank:2                               
          description: [empty]                       
          product: Unknown                          
          vendor: Unknown                  
          physical id: 2                                   
          serial: FFFFFFFF                         
          slot: DIMM 0                                     
     *-bank:3                          
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2132 MHz (0.5 ns)                                                
          product: CMK8GX4M1D3000C16
          vendor: Unknown             
          physical id: 3                                                                                               
          serial: 00000000
          slot: DIMM 1                                                                                                 
          size: 8GiB         
          width: 64 bits               
          clock: 2132MHz (0.5ns)
  *-cache:0           
       description: L1 cache
       physical id: 1f
       slot: L1 - Cache
       size: 320KiB                                                                                                                                                                                                                           
       capacity: 320KiB                                                                                                
       clock: 1GHz (1.0ns)       
       capabilities: pipeline-burst internal write-back unified                                                                         
       configuration: level=1                                       
  *-cache:1                                                         
       description: L2 cache                                        
       physical id: 20                                              
       slot: L2 - Cache                                             
       size: 2MiB                                                   
       capacity: 2MiB                                               
       clock: 1GHz (1.0ns)                                          
       capabilities: pipeline-burst internal write-back unified                                                                         
       configuration: level=2

memtest86 results (I could not download the HTML report from the USB stick, so I am adding screenshots): memtest_page_1 memtest_page_2

Edit 2:

Here is the motherboard information (revision 1.1):

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
    Manufacturer: Gigabyte Technology Co., Ltd.
    Product Name: A320M-H-CF
    Version: x.x
    Serial Number: Default string
    Asset Tag: Default string
    Features:
        Board is a hosting board
        Board is replaceable
    Location In Chassis: Default string
    Chassis Handle: 0x0003
    Type: Motherboard
    Contained Object Handles: 0

More information about the installed memory (it might help):

Handle 0x001D, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: None
    Maximum Capacity: 128 GB
    Error Information Handle: 0x001C
    Number Of Devices: 4

Handle 0x0023, DMI type 17, 84 bytes
Memory Device
    Array Handle: 0x001D
    Error Information Handle: 0x0022
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: Unknown
    Set: None
    Locator: DIMM 0
    Bank Locator: CHANNEL A
    Type: Unknown
    Type Detail: None
    Speed: Unknown
    Manufacturer: Unknown
    Serial Number: FFFFFFFF
    Asset Tag: Not Specified
    Part Number: Unknown
    Rank: Unknown
    Configured Memory Speed: Unknown
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown
    Memory Technology: DRAM
    Memory Operating Mode Capability: Volatile memory
    Firmware Version: Unknown
    Module Manufacturer ID: Unknown
    Module Product ID: Unknown
    Memory Subsystem Controller Manufacturer ID: Unknown
    Memory Subsystem Controller Product ID: Unknown
    Non-Volatile Size: None
    Volatile Size: None
    Cache Size: None
    Logical Size: None

Handle 0x0025, DMI type 17, 84 bytes
Memory Device
    Array Handle: 0x001D
    Error Information Handle: 0x0024
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: Unknown
    Set: None
    Locator: DIMM 1
    Bank Locator: CHANNEL A
    Type: Unknown
    Type Detail: None
    Speed: Unknown
    Manufacturer: Unknown
    Serial Number: FFFFFFFF
    Asset Tag: Not Specified
    Part Number: Unknown
    Rank: Unknown
    Configured Memory Speed: Unknown
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown
    Memory Technology: DRAM
    Memory Operating Mode Capability: Volatile memory
    Firmware Version: Unknown
    Module Manufacturer ID: Unknown
    Module Product ID: Unknown
    Memory Subsystem Controller Manufacturer ID: Unknown
    Memory Subsystem Controller Product ID: Unknown
    Non-Volatile Size: None
    Volatile Size: None
    Cache Size: None
    Logical Size: None

Handle 0x0027, DMI type 17, 84 bytes
Memory Device
    Array Handle: 0x001D
    Error Information Handle: 0x0026
    Total Width: Unknown
    Data Width: Unknown
    Size: No Module Installed
    Form Factor: Unknown
    Set: None
    Locator: DIMM 0
    Bank Locator: CHANNEL B
    Type: Unknown
    Type Detail: None
    Speed: Unknown
    Manufacturer: Unknown
    Serial Number: FFFFFFFF
    Asset Tag: Not Specified
    Part Number: Unknown
    Rank: Unknown
    Configured Memory Speed: Unknown
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown
    Memory Technology: DRAM
    Memory Operating Mode Capability: Volatile memory
    Firmware Version: Unknown
    Module Manufacturer ID: Unknown
    Module Product ID: Unknown
    Memory Subsystem Controller Manufacturer ID: Unknown
    Memory Subsystem Controller Product ID: Unknown
    Non-Volatile Size: None
    Volatile Size: None
    Cache Size: None
    Logical Size: None

Handle 0x0029, DMI type 17, 84 bytes
Memory Device
    Array Handle: 0x001D
    Error Information Handle: 0x0028
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM 1
    Bank Locator: CHANNEL B
    Type: DDR4
    Type Detail: Synchronous Unbuffered (Unregistered)
    Speed: 2132 MT/s
    Manufacturer: Unknown
    Serial Number: 00000000
    Asset Tag: Not Specified
    Part Number: CMK8GX4M1D3000C16   
    Rank: 1
    Configured Memory Speed: 2132 MT/s
    Minimum Voltage: 1.2 V
    Maximum Voltage: 1.2 V
    Configured Voltage: 1.2 V
    Memory Technology: DRAM
    Memory Operating Mode Capability: Volatile memory
    Firmware Version: Unknown
    Module Manufacturer ID: Bank 3, Hex 0x9E
    Module Product ID: Unknown
    Memory Subsystem Controller Manufacturer ID: Unknown
    Memory Subsystem Controller Product ID: Unknown
    Non-Volatile Size: None
    Volatile Size: 8 GB
    Cache Size: None
    Logical Size: None
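
To see at a glance which slots are populated, the Memory Device records above can be reduced to a short summary. This is a rough sketch, assuming the text comes from `sudo dmidecode --type 17` in the layout shown above; populated_slots and the trimmed SAMPLE string are illustrative names and data, not part of dmidecode.

```python
# Trimmed sample in the same layout as the Memory Device records above.
SAMPLE = """\
Memory Device
    Size: No Module Installed
    Locator: DIMM 0
    Bank Locator: CHANNEL A

Memory Device
    Size: 8192 MB
    Locator: DIMM 1
    Bank Locator: CHANNEL B
    Part Number: CMK8GX4M1D3000C16
"""


def populated_slots(dmi_text):
    """Return (bank locator, locator, size) for every slot that holds a module."""
    found = []
    # Records are separated by a blank line.
    for record in dmi_text.split("\n\n"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:  # skip header lines that carry no "key: value" pair
                fields[key] = value.strip()
        size = fields.get("Size")
        if size and size != "No Module Installed":
            found.append((fields.get("Bank Locator"), fields.get("Locator"), size))
    return found


if __name__ == "__main__":
    for bank, slot, size in populated_slots(SAMPLE):
        print(f"{bank} / {slot}: {size}")
```

Applied to the full output above, this kind of filter would leave only the single 8192 MB module in CHANNEL B, DIMM 1.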

Edit 3:

After running the memory test and enabling swap, here are the results of the commands requested in the comments:

free -h

              total        used        free      shared  buff/cache   available
Mem:          7.3Gi       988Mi       3.7Gi        15Mi       2.6Gi       6.1Gi
Swap:         4.0Gi          0B       4.0Gi

swapon -s

Filename                Type        Size    Used    Priority
/swapfile                               file        4194300 0   -2
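
As a sanity check on the size reported above: swapon -s lists sizes in KiB, and a swap file gives up its first page to the swap header. Here is a small sketch of that arithmetic, assuming a 4 GiB swap file and a 4 KiB page size (typical on x86-64, but an assumption here); usable_swap_kib is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB


def usable_swap_kib(file_bytes, page_bytes=4096):
    """Usable swap in KiB: whole pages in the file minus one header page."""
    pages = file_bytes // page_bytes
    return (pages - 1) * page_bytes // KIB


if __name__ == "__main__":
    size = 4096 * MIB  # a 4 GiB swap file
    print(usable_swap_kib(size))  # prints 4194300
```

The result matches the 4194300 shown by swapon -s, so the swap file appears to be active at its full size.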

Could you suggest which direction I should look in for a solution?

Answer 1

Swap

You have a swap problem.

              total        used        free      shared  buff/cache   available
Mem:          7.3Gi       1.2Gi       123Mi        15Mi       6.0Gi       5.8Gi
Swap:            0B          0B          0B

Note: edit your question and show me the output of swapon -s and cat /etc/fstab.

Let's verify/create a 4G /swapfile...

Note: incorrect use of the rm and dd commands can cause data loss. Copy/paste is recommended.

In the terminal...

sudo swapoff -a           # turn off swap
sudo rm -i /swapfile      # remove old /swapfile

sudo dd if=/dev/zero of=/swapfile bs=1M count=4096

sudo chmod 600 /swapfile  # set proper file protections
sudo mkswap /swapfile     # init /swapfile
sudo swapon /swapfile     # turn on swap
free -h                   # confirm 8G RAM and 4G swap

Edit /etc/fstab using sudo -H gedit /etc/fstab or sudo pico /etc/fstab

Confirm that this /swapfile line is in /etc/fstab... and confirm there are no other "swap" lines... use spaces in this line... confirm there are no tabs...

/swapfile  none  swap  sw  0  0

reboot                    # reboot and verify operation
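
The /etc/fstab check described above can also be scripted. The following is only a sketch: swap_entries is a hypothetical helper, it assumes the standard whitespace-separated fstab columns, and it treats tabs and spaces alike, so it does not enforce the spaces-only advice.

```python
def swap_entries(fstab_text):
    """Return the first field of every active swap line in fstab text."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split()
        # The third column (index 2) is the filesystem type.
        if len(fields) >= 3 and fields[2] == "swap":
            entries.append(fields[0])
    return entries


if __name__ == "__main__":
    try:
        with open("/etc/fstab") as handle:
            found = swap_entries(handle.read())
    except OSError:
        found = []
    if found == ["/swapfile"]:
        print("OK: /swapfile is the only swap entry")
    else:
        print(f"Check /etc/fstab, swap entries found: {found}")
```

Reading /etc/fstab normally needs no root privileges, so this can be run as a regular user before rebooting.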

Memory

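Although your data does not show the motherboard revision, I suspect your board is a rev. 1.x. See https://gigabyte.com/Motherboard/GA-A320M-H-rev-1x#kf. If you look at the CPU and memory documentation, you will find that your 8G Corsair RAM, model # CMK8GX4M1D3000C16, does not appear to be on the supported list. See https://download.gigabyte.com/FileList/Memory/mb_memory_ga-a320m-h_bristol.pdf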

Update #1:

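Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Complete all 4/4 test passes at least once to confirm that the memory is good. This may take several hours to complete.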

If it fails, move the DIMM from slot DDR4_B1 to slot DDR4_A1 and rerun memtest.

If it fails again, replace the DIMM.
