How can I stop smartd from reporting a disk it cannot find?

I have smartd monitoring my hard drives. It generally works fine, but every 24 hours the following error window pops up.

This email was generated by the smartd daemon running on:
  host name: sparhawk-XPS-17
  DNS domain: [Unknown]
  NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdc [SAT], unable to open device
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
Another email message will be sent in 24 hours if the problem persists.

No sdc is attached, but I tried sudo smartctl -a /dev/sdc anyway. The result was:

smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.5.0-26-generic] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/sdc failed: No such device

The only uncommented line in /etc/smartd.conf is:

DEVICESCAN -m root -M exec /usr/share/smartmontools/smartd-runner

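Is there a way to make smartd properly recognize that this disk has been removed, without complaining about it? If that is not possible, is there a way to make smartd monitor only sda and sdb?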

Answer 1

I ran into the same problem, so I did some research. I found this:

/etc/smartd.conf

# smartd will re-read the configuration file if it receives a HUP
# signal

# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.

# You can usually identify which hard disks are on your system by
# looking in /proc/ide and in /proc/scsi.

# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices.  DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found.  Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.

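I assume that unregistering the drive from the kernel

root@localhost# echo 1 > /sys/block/sdX/device/delete

then removing the drive's entry from /etc/smartd.conf (if it is listed there explicitly),

and then running "sudo service smartmontools restart" will solve your problem, and smartd will stop reporting the missing drive.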
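
The sequence above can be sketched as a small shell helper. This is a minimal sketch rather than a tested procedure: the function name detach_disk and the SYSFS_ROOT variable are hypothetical (the variable exists only so the function can be tried against a scratch directory), and it assumes the disk is already unmounted and that the helper runs as root.

```shell
# SYSFS_ROOT is an assumption for dry runs; on a real system it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Hypothetical helper: ask the kernel to drop a disk before it is unplugged.
# $1 - kernel name of the disk, e.g. sdc (without the /dev/ prefix)
detach_disk() {
    dev="$1"
    node="$SYSFS_ROOT/block/$dev/device/delete"
    if [ ! -e "$node" ]; then
        echo "detach_disk: $node not found; is $dev attached?" >&2
        return 1
    fi
    # Writing 1 to this file unregisters the device from the kernel.
    printf '1\n' > "$node"
}

# Usage (as root), followed by the restart mentioned in the answer:
#   detach_disk sdc && service smartmontools restart
```

Wrapping the write in a function sidesteps the usual pitfall that "sudo echo 1 > file" performs the redirection as the unprivileged user.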

Answer 2

You can have smartd monitor only a specific set of devices by listing them explicitly in /etc/smartd.conf instead of using the DEVICESCAN keyword.

So, to monitor only /dev/sda and /dev/sdb, remove this line from smartd.conf:

DEVICESCAN -m root -M exec /usr/share/smartmontools/smartd-runner

and add in its place:

/dev/sda -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sdb -m root -M exec /usr/share/smartmontools/smartd-runner

Then restart the smartd daemon.

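The main drawback of this approach is that every disk has to be listed individually in the configuration. With only two disks, at least, that is not much of a burden.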
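
If listing each disk by hand becomes tedious, the entries can be generated. The following is a minimal sketch that assumes the same directives as the DEVICESCAN line in the question; the function name smartd_entries is hypothetical, and the output is meant to be reviewed before it is pasted into /etc/smartd.conf.

```shell
# Directives copied from the DEVICESCAN line in the question.
DIRECTIVES='-m root -M exec /usr/share/smartmontools/smartd-runner'

# Hypothetical helper: print one smartd.conf entry per device argument.
smartd_entries() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$DIRECTIVES"
    done
}

# Print the two entries for the disks that should stay monitored.
smartd_entries /dev/sda /dev/sdb
```

Keeping the directives in a single variable means every generated entry stays in sync when the mail or exec options change.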

Answer 3

In my case, after replacing a failed HDD, it was enough to delete the old drive's csv and state files and restart the service:

sudo systemctl stop smartmontools
sudo killall smartd
cd /var/lib/smartmontools
sudo rm attrlog.WDC_WD5000LPLX_00ZNTT0-WD_SERIAL_NUMBER.ata.csv
sudo rm smartd.WDC_WD5000LPLX_00ZNTT0-WD_SERIAL_NUMBER.ata.state
sudo rm smartd.WDC_WD5000LPLX_00ZNTT0-WD_SERIAL_NUMBER.ata.state~
sudo systemctl start smartmontools
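
Before deleting anything, it can help to list which files in the state directory belong to the replaced drive. This is a minimal sketch: stale_smartd_files is a hypothetical name, the file-name pattern is inferred from the three files removed above, and the drive identifier (model and serial number) has to be supplied by hand.

```shell
# Hypothetical helper: print the attribute log and state files of one drive.
# $1 - state directory (/var/lib/smartmontools in the commands above)
# $2 - drive identifier as it appears in the file names (model-serial)
stale_smartd_files() {
    dir="$1"
    id="$2"
    for f in "$dir/attrlog.$id".* "$dir/smartd.$id".*; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Review the list first, then remove the files while smartd is stopped:
#   stale_smartd_files /var/lib/smartmontools WDC_WD5000LPLX_00ZNTT0-WD_SERIAL_NUMBER
```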

My smartd.conf:

DEVICESCAN -H -l error -l selftest -f -s (O/../.././14|L/../.././15|C/../.././17) -m [email protected] -M exec /usr/share/smartmontools/smartd-runner

My /etc/smartmontools/run.d/10s-nail:

#!/bin/bash -e

# Send mail if /usr/bin/s-nail exists
if ! [ -x /usr/bin/s-nail ]; then
  echo "Your system does not have /usr/bin/s-nail. Install the s-nail package" 
  exit 1
fi

# $1 - body file
# $2 - "-s"
# $3 - subject
# $4 - admin email

/usr/bin/s-nail -q "$1" -s "$3" -S smtp=smtp://192.168.1.11 -S from="SERVER_NAME S.M.A.R.Td <[email protected]>" "$4"

Answer 4

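I was getting the same error by email every 24 hours because I had removed a drive from a hot-swap bay. All I needed to do was restart the service, and the error stopped.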

sudo systemctl restart smartmontools
