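I have been running a RAID1 setup on my machine for several years, and the array's performance has recently degraded. According to mdadm, one drive appears to have failed, but the SMART data shows errors on the other drive. I'm not sure which one to trust.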
sudo mdadm --detail /dev/md0
If I'm reading the output correctly, /dev/sda1 has failed, while /dev/sdb1 is still in the array and can be trusted.
/dev/md0:
Version : 1.2
Creation Time : Sat Jan 5 01:18:40 2013
Raid Level : raid1
Array Size : 2930133824 (2794.39 GiB 3000.46 GB)
Used Dev Size : 2930133824 (2794.39 GiB 3000.46 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Thu Aug 6 20:33:11 2015
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : storm:0 (local to host storm)
UUID : 98b434f9:54d5c413:1acc4033:8ad34365
Events : 8388
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
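To make the device table easier to read, here is a minimal sketch that prints one line per RAID slot from `mdadm --detail` output. The function name `md_members` is made up for illustration, and the parsing assumes the table layout shown above (four numeric columns, then the state, then an optional device path). Note that the output above lists slot 0 as "removed" with Failed Devices at 0, so the table only shows that no device is attached to that slot.

```shell
#!/bin/sh
# md_members is a hypothetical helper name, not part of mdadm.
# It reads `mdadm --detail` output on stdin and prints one line per RAID slot.
md_members() {
    awk '
        # Rows of the device table start with four numeric columns:
        # Number, Major, Minor, RaidDevice.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            if ($NF ~ /^\//) {
                # Last field is a device path; the fields before it are the state.
                state = $5
                for (i = 6; i < NF; i++) state = state " " $i
                printf "slot %s: %s (%s)\n", $4, $NF, state
            } else {
                # No device path on this row, e.g. a "removed" slot.
                state = $5
                for (i = 6; i <= NF; i++) state = state " " $i
                printf "slot %s: no device (%s)\n", $4, state
            }
        }
    '
}

# Intended use (needs root and a real array):
#   sudo mdadm --detail /dev/md0 | md_members
```

For the table above this should report slot 0 as having no device and slot 1 as /dev/sdb1 in "active sync".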
However, after running a short SMART self-test on both drives, /dev/sda reported no problems, but /dev/sdb showed the following:
=== START OF INFORMATION SECTION ===
Device Model: ST3000DM001-1CH166
...
Local Time is: Thu Aug 6 20:45:02 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
...
Error 12 occurred at disk power-on lifetime: 21016 hours (875 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 8d+20:05:45.525 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 8d+20:05:45.525 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 8d+20:05:45.525 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 8d+20:05:45.524 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 8d+20:05:45.524 SET FEATURES [Set transfer mode]
...
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 21129 -
# 2 Short offline Completed without error 00% 18418 -
# 3 Extended offline Completed without error 00% 1860 -
# 4 Short offline Completed without error 00% 1855 -
...
The full output can be found here: http://pastebin.com/jDN0muXk
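One detail worth pulling out of the SMART log is timing: the logged error and the self-tests both carry power-on hour stamps. The sketch below compares the latest logged error with the latest self-test that completed without error. The function name `smart_timeline` is hypothetical, and the parsing assumes the line formats shown in the excerpt above. Keep in mind that the log holds only the five most recent errors and that a short self-test does not scan the whole surface, so a clean result afterwards is a hint, not proof.

```shell
#!/bin/sh
# smart_timeline is a hypothetical helper name, not part of smartmontools.
# It reads `smartctl -a` style text on stdin and compares power-on hours.
smart_timeline() {
    awk '
        BEGIN { err = -1; tst = -1 }
        # e.g. "Error 12 occurred at disk power-on lifetime: 21016 hours (...)"
        # Field 8 is the hour count.
        /^Error [0-9]+ occurred at disk power-on lifetime:/ {
            if ($8 + 0 > err) err = $8 + 0
        }
        # Self-test log rows such as
        # "# 1  Short offline  Completed without error  00%  21129  -"
        # LifeTime(hours) is the second-to-last column.
        /^# *[0-9]+ / && /Completed without error/ {
            if ($(NF-1) + 0 > tst) tst = $(NF-1) + 0
        }
        END {
            if (err < 0 || tst < 0) { print "not enough data"; exit }
            printf "latest logged error: %d h\n", err
            printf "latest clean self-test: %d h\n", tst
            # A positive difference means the clean test ran after the error.
            printf "difference: %d h\n", tst - err
        }
    '
}

# Intended use (needs root and smartmontools):
#   sudo smartctl -a /dev/sdb | smart_timeline
```

With the excerpt above, the error is stamped at 21016 hours and the most recent short test at 21129 hours, so the clean test ran 113 power-on hours after the last logged error.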
Should I trust mdadm, which says /dev/sda is bad and /dev/sdb is trustworthy, or should I trust SMART, which says /dev/sda is still healthy even though /dev/sdb has errors?
Answer 1
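Try both! The only drive you can trust is the one that actually holds your data and lets you read it back.
Honestly, I don't think SMART errors undermine a drive's credibility unless they are very severe. I would use /dev/sdb, but replace both drives as soon as possible!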
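One way to "try" a drive, as a rough sketch and not a full health check, is to read it end to end and see whether any read fails. The function name `read_check` is invented for this example. It only reads and never writes, but on a 3 TB drive a full pass will take hours, it needs root for a block device, and a clean pass shows only that the sectors were readable at that moment, not that the contents are current.

```shell
#!/bin/sh
# read_check is a hypothetical helper name.
# It reads the given device (or file) from start to end and discards the data.
# dd exits non-zero if it cannot open the path or hits a read error.
read_check() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "$1: read OK"
    else
        echo "$1: read FAILED"
        return 1
    fi
}

# Intended use (as root, read-only, slow on large drives):
#   read_check /dev/sdb1
```

Checking whether the files on each drive are actually the ones you expect is still a separate, manual step.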