ECC - "EDAC_DRIVER is not set" message

Bionic LTS server

I have a Ryzen processor and an AsRock motherboard, both of which run ECC without any problems.

The problem I am seeing in syslog is: Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set

root@localhost:/home/one# dmesg | grep edac
[    4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[    4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
root@localhost:/home/one# cat /var/log/syslog | grep -i edac
Oct 15 20:50:34 localhost systemd-modules-load[502]: Module 'edac_core' is builtin
Oct 15 20:50:34 localhost systemd[1]: Starting LSB: Initialize EDAC...
Oct 15 20:50:34 localhost edac[832]:  * Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set
Oct 15 20:50:34 localhost edac[832]:    ...done.
Oct 15 20:50:34 localhost edac[832]:  * Loading DIMM labels for Memory Error Detection and Correction edac
Oct 15 20:50:34 localhost kernel: [    0.156551] EDAC MC: Ver: 3.0.0
Oct 15 20:50:34 localhost kernel: [    4.858684] EDAC amd64: Node 0: DRAM ECC enabled.
Oct 15 20:50:34 localhost kernel: [    4.858685] EDAC amd64: F17h detected (node 0).
Oct 15 20:50:34 localhost kernel: [    4.858719] EDAC MC: UMC0 chip selects:
Oct 15 20:50:34 localhost kernel: [    4.858720] EDAC amd64: MC: 0:     0MB 1:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858720] EDAC amd64: MC: 2:     0MB 3:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858721] EDAC amd64: MC: 4:     0MB 5:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858721] EDAC amd64: MC: 6:     0MB 7:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858723] EDAC MC: UMC1 chip selects:
Oct 15 20:50:34 localhost kernel: [    4.858723] EDAC amd64: MC: 0:     0MB 1:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858724] EDAC amd64: MC: 2: 16383MB 3: 16383MB
Oct 15 20:50:34 localhost kernel: [    4.858725] EDAC amd64: MC: 4:     0MB 5:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858725] EDAC amd64: MC: 6:     0MB 7:     0MB
Oct 15 20:50:34 localhost kernel: [    4.858725] EDAC amd64: using x8 syndromes.
Oct 15 20:50:34 localhost kernel: [    4.858726] EDAC amd64: MCT channel count: 1
Oct 15 20:50:34 localhost kernel: [    4.858773] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
Oct 15 20:50:34 localhost kernel: [    4.858781] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
Oct 15 20:50:34 localhost kernel: [    4.858781] AMD64 EDAC driver v3.5.0
Oct 15 20:50:34 localhost edac[832]:    ...done.
Oct 15 20:50:34 localhost systemd[1]: Started LSB: Initialize EDAC.

I have put edac_core in /etc/modules. I can also see that EDAC is enabled in the kernel config:

root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i edac
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
CONFIG_EDAC_PND2=m
root@localhost:/home/one# cat /usr/src/linux-headers-4.15.0-29-generic/.config | grep -i ecc
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_MTD_NAND_ECC=m
# CONFIG_MTD_NAND_ECC_SMC is not set
CONFIG_MTD_NAND_ECC_BCH=y
CONFIG_AMD_XGBE_HAVE_ECC=y
CONFIG_MTD_SPINAND_ONDIEECC=y

What causes "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set", and how do I fix it?

Update: output of edac-util

root@localhost:/home/one# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.
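
For cross-checking, the kernel's EDAC drivers also expose per-controller error counters in sysfs. The following is a minimal sketch, assuming the usual layout under /sys/devices/system/edac/mc with ce_count and ue_count files in each mcN directory. The function name edac_counts and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# edac_counts: illustrative helper, not part of edac-utils.
# Prints corrected (ce) and uncorrected (ue) error counts for each
# memory controller directory found under the given base directory.
# Usage: edac_counts [BASE_DIR]
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    for mc in "$base"/mc*; do
        # Skip the unexpanded glob when no controller is registered.
        [ -d "$mc" ] || continue
        ce=$(cat "$mc/ce_count" 2>/dev/null || echo '?')
        ue=$(cat "$mc/ue_count" 2>/dev/null || echo '?')
        printf '%s: ce=%s ue=%s\n' "$(basename "$mc")" "$ce" "$ue"
    done
}
```

Running edac_counts with no argument reads the live sysfs tree. If it prints nothing, no memory controller has been registered by an EDAC driver.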

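Answer 1

The message

 * Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set

is an unnecessarily alarming message from the edac init script (part of the edac-utils package). It tells you that the script did not manually load a specific edac kernel module, because the variable $EDAC_DRIVER is not set in /etc/default/edac. You can see this in the relevant part of the init script: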

   if [ -n "$EDAC_DRIVER" ]; then
     log_daemon_msg "Enabling ${DESC}" "$SERVICE"
     modprobe $EDAC_DRIVER
     STATUS=$?
     case $STATUS in
       0) log_end_msg 0 ;;
       5) log_failure_msg "No EDAC support for this hardware" ; log_end_msg 1 ;;
       *) log_failure_msg "failed with exit code $STATUS" ; log_end_msg 1 ;;
     esac
   else
     log_daemon_msg "Not enabling ${DESC} since EDAC_DRIVER is not set"
     log_end_msg 0
   fi
   log_daemon_msg "Loading DIMM labels for ${DESC}" "$SERVICE"
   $edac_ctl --register-labels --quiet

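Given that the kernel automatically determines which edac driver to apply (your dmesg output shows amd64_edac taking over the memory controller), and that the $edac_ctl command, which runs right after the if-then-else block that checks whether $EDAC_DRIVER is set, successfully registered the DIMM labels, it looks to me as though everything is working properly here (though, to be frank, I am not particularly knowledgeable about EDAC).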
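
If the message is still bothersome, the init script shown above suggests that it goes away once EDAC_DRIVER is set, because the first branch is taken instead. Here is a minimal sketch, assuming amd64_edac (the module named in the dmesg output) is the right value and that /etc/default/edac uses plain VAR=value lines. The helper set_edac_driver and its optional file argument are hypothetical, added so the change can be tried on a scratch copy first.

```shell
#!/bin/sh
# set_edac_driver: hypothetical helper, not part of edac-utils.
# Replaces any existing EDAC_DRIVER= line in the defaults file with a
# new assignment and leaves every other line untouched.
# Usage: set_edac_driver DRIVER [FILE]
set_edac_driver() {
    driver=$1
    file=${2:-/etc/default/edac}
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # grep -v exits non-zero when it outputs nothing; that is fine here.
        grep -v '^EDAC_DRIVER=' "$file" > "$tmp" || true
    fi
    printf 'EDAC_DRIVER=%s\n' "$driver" >> "$tmp"
    # Write back through cat so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Calling set_edac_driver amd64_edac as root would write to /etc/default/edac. Since the kernel has already loaded the module on its own, this should only change the wording of the log message, not the behaviour.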
