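I have been running into this problem all year. I tried to find the cause and fix it, but could not find a good solution, and updates did not help (the BIOS is now the latest version, the OS is up to date, and the kernel is one of the latest). I searched Google and tried to decode the MCE, but could not get any useful information from it. Maybe you can give me some ideas on how to fix this.
So, here is what I have:
Kernel: 5.4.0-73-generic
OS: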
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Partial CPU information:
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 101
model name : AMD A10-9700 RADEON R7, 10 COMPUTE CORES 4C+6G
stepping : 1
microcode : 0x600611a
cpu MHz : 2169.010
cache size : 1024 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 6986.87
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro acc_power [13]
dmesg
Here are the errors I see in the log after a reboot:
[ 0.257771] kernel: mce: [Hardware Error]: Machine check events logged
[ 0.257773] kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 2: be0000000012010a
[ 0.257776] kernel: mce: [Hardware Error]: TSC 0 ADDR f780 MISC d01a000100000000
[ 0.257778] kernel: mce: [Hardware Error]: PROCESSOR 2:660f51 TIME 1622490129 SOCKET 0 APIC 0 microcode 600611a
[ 0.257780] kernel: mce: [Hardware Error]: Machine check events logged
[ 0.257781] kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: b200001000020c0f
[ 0.257782] kernel: mce: [Hardware Error]: TSC 0
[ 0.257783] kernel: mce: [Hardware Error]: PROCESSOR 2:660f51 TIME 1622490129 SOCKET 0 APIC 0 microcode 600611a
[ 2.851627] kernel: RAS: Correctable Errors collector initialized.
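These reboots are very frequent (about once a day) even though there is no load on the machine. Since I run a media server and file storage on it, this is unacceptable.
Edit 1:
free -h: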
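For reference, the flag bits of the two status words in the log above can be unpacked by hand. This is a minimal sketch, assuming the standard x86 machine-check layout of the MCi_STATUS register (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC). The function name decode_mci_status is made up for illustration, and the sketch does not interpret the lower, model-specific error-code bits.

```shell
#!/bin/sh
# Unpack the top flag bits of an x86 machine-check MCi_STATUS value.
# Argument: the 16 hex digits as printed by the kernel, without a 0x prefix.
# decode_mci_status is a hypothetical helper name, not an existing tool.
decode_mci_status() {
    # Bits 63..57 all live in the top byte, so only the first two hex
    # digits are needed; this sidesteps 64-bit overflow in shell arithmetic.
    top=$(printf '%s' "$1" | cut -c1-2)
    top=$((0x$top))
    out=""
    shift_by=7   # bit 63 of the register is bit 7 of the top byte
    for name in VAL OVER UC EN MISCV ADDRV PCC; do
        if [ $(( (top >> shift_by) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
        shift_by=$((shift_by - 1))
    done
    # Drop the leading space and print the names of the set flags.
    echo "${out# }"
}

decode_mci_status be0000000012010a   # Bank 2 status from the log
decode_mci_status b200001000020c0f   # Bank 4 status from the log
```

Under that assumption both words have UC (uncorrected error) and PCC (processor context corrupt) set. Only the Bank 2 word has ADDRV and MISCV set, which matches the log printing ADDR and MISC for Bank 2 only.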
total used free shared buff/cache available
Mem: 7.3Gi 1.2Gi 123Mi 15Mi 6.0Gi 5.8Gi
Swap: 0B 0B 0B
sysctl vm.swappiness:
vm.swappiness = 60
sudo lshw -C memory:
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: F53
date: 01/05/2021
size: 64KiB
capacity: 16MiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: 1d
slot: System board or motherboard
size: 8GiB
*-bank:0
description: [empty]
product: Unknown
vendor: Unknown
physical id: 0
serial: FFFFFFFF
slot: DIMM 0
*-bank:1
description: [empty]
product: Unknown
vendor: Unknown
physical id: 1
serial: FFFFFFFF
slot: DIMM 1
*-bank:2
description: [empty]
product: Unknown
vendor: Unknown
physical id: 2
serial: FFFFFFFF
slot: DIMM 0
*-bank:3
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2132 MHz (0.5 ns)
product: CMK8GX4M1D3000C16
vendor: Unknown
physical id: 3
serial: 00000000
slot: DIMM 1
size: 8GiB
width: 64 bits
clock: 2132MHz (0.5ns)
*-cache:0
description: L1 cache
physical id: 1f
slot: L1 - Cache
size: 320KiB
capacity: 320KiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 20
slot: L2 - Cache
size: 2MiB
capacity: 2MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=2
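memtest86 results (I could not retrieve the HTML report from the USB stick, so I am attaching screenshots instead): [screenshots: memtest_page_1, memtest_page_2]
Edit 2:
Here is the motherboard information (revision 1.1):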
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: A320M-H-CF
Version: x.x
Serial Number: Default string
Asset Tag: Default string
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Default string
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
More information about the installed memory (it may help):
Handle 0x001D, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 128 GB
Error Information Handle: 0x001C
Number Of Devices: 4
Handle 0x0023, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x001D
Error Information Handle: 0x0022
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: CHANNEL A
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Unknown
Serial Number: FFFFFFFF
Asset Tag: Not Specified
Part Number: Unknown
Rank: Unknown
Configured Memory Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Unknown
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: None
Cache Size: None
Logical Size: None
Handle 0x0025, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x001D
Error Information Handle: 0x0024
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 1
Bank Locator: CHANNEL A
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Unknown
Serial Number: FFFFFFFF
Asset Tag: Not Specified
Part Number: Unknown
Rank: Unknown
Configured Memory Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Unknown
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: None
Cache Size: None
Logical Size: None
Handle 0x0027, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x001D
Error Information Handle: 0x0026
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: DIMM 0
Bank Locator: CHANNEL B
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Unknown
Serial Number: FFFFFFFF
Asset Tag: Not Specified
Part Number: Unknown
Rank: Unknown
Configured Memory Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Unknown
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: None
Cache Size: None
Logical Size: None
Handle 0x0029, DMI type 17, 84 bytes
Memory Device
Array Handle: 0x001D
Error Information Handle: 0x0028
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: CHANNEL B
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2132 MT/s
Manufacturer: Unknown
Serial Number: 00000000
Asset Tag: Not Specified
Part Number: CMK8GX4M1D3000C16
Rank: 1
Configured Memory Speed: 2132 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Unknown
Module Manufacturer ID: Bank 3, Hex 0x9E
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 8 GB
Cache Size: None
Logical Size: None
Edit 3:
After running the memory test and enabling swap, here are the results of the commands requested in the comments:
free -h
total used free shared buff/cache available
Mem: 7.3Gi 988Mi 3.7Gi 15Mi 2.6Gi 6.1Gi
Swap: 4.0Gi 0B 4.0Gi
swapon -s
Filename Type Size Used Priority
/swapfile file 4194300 0 -2
Could you suggest in which direction I should look for a solution?
Answer 1
Swap
You have a swap problem.
total used free shared buff/cache available
Mem: 7.3Gi 1.2Gi 123Mi 15Mi 6.0Gi 5.8Gi
Swap: 0B 0B 0B
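Note: Edit your question and show me the output of swapon -s and cat /etc/fstab.
Let's confirm/create a 4G /swapfile...
Note: Incorrect use of the rm and dd commands can cause data loss. Copy/paste is recommended.
In the terminal...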
sudo swapoff -a # turn off swap
sudo rm -i /swapfile # remove old /swapfile
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
sudo chmod 600 /swapfile # set proper file protections
sudo mkswap /swapfile # init /swapfile
sudo swapon /swapfile # turn on swap
free -h # confirm 8G RAM and 4G swap
Edit /etc/fstab, using sudo -H gedit /etc/fstab or sudo pico /etc/fstab.
Confirm that this /swapfile line is in /etc/fstab, and that there are no other "swap" lines. Use spaces in this line, not tabs.
/swapfile none swap sw 0 0
reboot # reboot and verify operation
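The "no other swap lines" check above can also be done mechanically. This is a minimal sketch, assuming the usual fstab layout where the third whitespace-separated field is the filesystem type. The helper names list_swap_entries and only_swapfile are made up for illustration.

```shell
#!/bin/sh
# Print the first field of every active (non-comment) swap entry in an
# fstab-style file. Defaults to /etc/fstab when no file is given.
# list_swap_entries is a hypothetical helper name.
list_swap_entries() {
    # Field 3 of an fstab line is the filesystem type; skip comment lines.
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}

# Succeed only if /swapfile is the one and only swap entry in the file.
only_swapfile() {
    [ "$(list_swap_entries "$1")" = "/swapfile" ]
}
```

Running `list_swap_entries` with no argument lists the swap entries in /etc/fstab. `only_swapfile && echo "fstab looks right"` prints the message only when /swapfile is the sole entry. Because awk splits on any whitespace, this does not check the spaces-versus-tabs advice.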
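Memory
Although your data doesn't indicate the revision of the motherboard, I suspect that you have a rev 1.x board. See https://gigabyte.com/Motherboard/GA-A320M-H-rev-1x#kf. If you look at the CPU and memory documentation, you'll find that your 8G Corsair RAM, model # CMK8GX4M1D3000C16, does not appear to be on the supported list. See https://download.gigabyte.com/FileList/Memory/mb_memory_ga-a320m-h_bristol.pdf
Update #1:
Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Get at least one complete pass of all 4/4 tests to confirm that the memory is good. This may take a few hours to complete.
If it fails, move the DIMM from slot DDR4_B1 to slot DDR4_A1 and rerun memtest.
If it fails again, replace the DIMM.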
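To compare what is installed against the supported-memory list, the part number and the configured speed can be pulled out of dmidecode. This is a minimal sketch, assuming the field labels shown in the dmidecode output in the question (other dmidecode versions may label the speed field differently). The function name installed_dimms is made up for illustration.

```shell
#!/bin/sh
# Read `dmidecode -t memory` output on stdin and print one line per
# populated slot: bank, slot, part number and configured speed.
# installed_dimms is a hypothetical helper name.
installed_dimms() {
    awk -F': *' '
        # "Size:" opens each Memory Device record; empty slots say
        # "No Module Installed". Lines such as "Volatile Size:" do not match.
        /^[ \t]*Size:/                    { populated = ($2 != "No Module Installed") }
        /^[ \t]*Locator:/                 { slot = $2 }
        /^[ \t]*Bank Locator:/            { bank = $2 }
        /^[ \t]*Part Number:/             { part = $2 }
        /^[ \t]*Configured Memory Speed:/ {
            if (populated) printf "%s / %s: %s @ %s\n", bank, slot, part, $2
        }
    '
}
```

Typical use would be `sudo dmidecode -t memory | installed_dimms`. The part number it prints is what to look up in the supported-memory PDF.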