Bionic LTS server
I have a Ryzen processor and an AsRock motherboard, both of which can run ECC without any problems.
The problem I have is this message in syslog: "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set". The relevant output:
root@localhost:/home/one# dmesg | grep edac
[ 4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
root@localhost:/home/one# cat /var/log/syslog | grep -i edac
Oct 15 20:50:34 localhost systemd-modules-load[502]: Module 'edac_core' is builtin
Oct 15 20:50:34 localhost systemd[1]: Starting LSB: Initialize EDAC...
Oct 15 20:50:34 localhost edac[832]: * Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set
Oct 15 20:50:34 localhost edac[832]: ...done.
Oct 15 20:50:34 localhost edac[832]: * Loading DIMM labels for Memory Error Detection and Correction edac
Oct 15 20:50:34 localhost kernel: [ 0.156551] EDAC MC: Ver: 3.0.0
Oct 15 20:50:34 localhost kernel: [ 4.858684] EDAC amd64: Node 0: DRAM ECC enabled.
Oct 15 20:50:34 localhost kernel: [ 4.858685] EDAC amd64: F17h detected (node 0).
Oct 15 20:50:34 localhost kernel: [ 4.858719] EDAC MC: UMC0 chip selects:
Oct 15 20:50:34 localhost kernel: [ 4.858720] EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858720] EDAC amd64: MC: 2: 0MB 3: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858721] EDAC amd64: MC: 4: 0MB 5: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858721] EDAC amd64: MC: 6: 0MB 7: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858723] EDAC MC: UMC1 chip selects:
Oct 15 20:50:34 localhost kernel: [ 4.858723] EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858724] EDAC amd64: MC: 2: 16383MB 3: 16383MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: MC: 4: 0MB 5: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: MC: 6: 0MB 7: 0MB
Oct 15 20:50:34 localhost kernel: [ 4.858725] EDAC amd64: using x8 syndromes.
Oct 15 20:50:34 localhost kernel: [ 4.858726] EDAC amd64: MCT channel count: 1
Oct 15 20:50:34 localhost kernel: [ 4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
Oct 15 20:50:34 localhost kernel: [ 4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
Oct 15 20:50:34 localhost kernel: [ 4.858781] AMD64 EDAC driver v3.5.0
Oct 15 20:50:34 localhost edac[832]: ...done.
Oct 15 20:50:34 localhost systemd[1]: Started LSB: Initialize EDAC.
I have put edac_core in /etc/modules. I can also see that ECC is enabled in the kernel:
root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i edac
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
CONFIG_EDAC_PND2=m
root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i ecc
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_MTD_NAND_ECC=m
# CONFIG_MTD_NAND_ECC_SMC is not set
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_AMD_XGBE_HAVE_ECC=y
CONFIG_MTD_SPINAND_ONDIEECC=y
What is causing the "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set" message, and how can I fix it?
Update: output of edac-utils
root@localhost:/home/one# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.
Answer 1
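The message "* Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set" is an unnecessarily scary message from the edac init script (part of the edac-utils package). It is telling you that the script did not manually load a specific edac kernel module, because the variable $EDAC_DRIVER is not set in /etc/default/edac. You can see this in the relevant part of the init script: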
if [ -n "$EDAC_DRIVER" ]; then
    log_daemon_msg "Enabling ${DESC}" "$SERVICE"
    modprobe $EDAC_DRIVER
    STATUS=$?
    case $STATUS in
      0)
        log_end_msg 0
        ;;
      5)
        log_failure_msg "No EDAC support for this hardware."
        log_end_msg 1
        ;;
      *)
        log_failure_msg "failed with exit code $STATUS"
        log_end_msg 1
        ;;
    esac
else
    log_daemon_msg "Not enabling ${DESC} since EDAC_DRIVER is not set"
    log_end_msg 0
fi
log_daemon_msg "Loading DIMM labels for ${DESC}" "$SERVICE"
$edac_ctl --register-labels --quiet
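Given that the kernel is automatically determining which edac driver to apply, and that the $edac_ctl command (which comes right after the if-then-else block that checks whether $EDAC_DRIVER is set) is successfully registering the DIMM labels, it looks to me like everything is working fine here (though, frankly, I am not particularly knowledgeable about EDAC).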
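If you want to double-check that the driver really is reporting, one option is to read the error counters that the kernel EDAC core exposes in sysfs. The following is only a minimal sketch, not part of edac-utils: the function name edac_report is made up for illustration, and it assumes the standard sysfs layout (/sys/devices/system/edac/mc/mcN/ce_count and ue_count). On the system in the question these values should correspond to what edac-util -v printed.

```shell
#!/bin/sh
# Sketch: print corrected/uncorrected error counters per EDAC memory controller.
# edac_report is a hypothetical helper name, not something shipped by edac-utils.
# The optional first argument overrides the sysfs base directory.
edac_report() {
    base="${1:-/sys/devices/system/edac/mc}"
    found=0
    for mc in "$base"/mc[0-9]*; do
        # Skip the unexpanded glob pattern when no controller is registered.
        [ -d "$mc" ] || continue
        found=1
        # Fall back to "?" if a counter file is missing or unreadable.
        ce=$(cat "$mc/ce_count" 2>/dev/null || echo "?")
        ue=$(cat "$mc/ue_count" 2>/dev/null || echo "?")
        echo "$(basename "$mc"): corrected=$ce uncorrected=$ue"
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controllers found under $base"
        return 1
    fi
}
```

Calling edac_report with no arguments reads the live counters. If it lists a controller such as mc0, a driver has registered with the EDAC core, regardless of what the init script message says about EDAC_DRIVER.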