I have two Kingston A400 120GB SSDs used as cache in a Synology NAS, and they do not seem to support automatic offline data collection.
# smartctl -d sat -c /dev/sdc | grep -i "Auto Offline data collection"
Auto Offline Data Collection: Disabled.
No Auto Offline data collection support.
# smartctl -d sat -o on /dev/sdc
SMART Automatic Timers not supported
SMART Enable Automatic Offline failed: scsi error aborted command
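However, when I check the attributes flagged as "Offline", the RAW_VALUE of one of them keeps changing (specifically 246 Total_Erase_Count), even though I run neither manual offline data collection nor self-tests. I checked whether smartd was running, just in case, but it is not. The same thing happens on the other, identical SSD.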
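Questions:
- What exactly does offline data collection update? Does it only update the VALUE/WORST/THRESH columns in the attribute table?
- Do short or long self-tests update the SMART attribute data?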
Output of smartctl -a:
=== START OF INFORMATION SECTION ===
Model Family: Phison Driven SSDs
Device Model: KINGSTON SA400S37120G
Serial Number: [...]
LU WWN Device Id: [...]
Firmware Version: 03070009
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Fri Apr 12 01:55:30 2019 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x35) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x00) Error logging NOT supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 1) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 710
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 5
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 65
170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/78
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 1
194 Temperature_Celsius 0x0022 024 025 000 Old_age Always - 24 (Min/Max 24/25)
196 Not_In_Use 0x0032 100 100 000 Old_age Always - 0
199 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 4
231 SSD_Life_Left 0x0000 100 100 000 Old_age Offline - 0
233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 396
241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 304
242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 228
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 2
245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 10
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 3827
SMART Error Log not supported
SMART Self-test Log not supported
Selective Self-tests/Logging not supported
Answer 1
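Short answer: SSDs wrap their internal data collection and reporting behind a complex controller and FTL firmware, so what you see at the SMART level is rarely a complete representation of their internal state. Don't worry about offline tests appearing to be disabled: the controller very probably runs its own sanity tests and updates both online and offline attributes accordingly. (Or maybe not: some firmware deliberately breaks SMART attributes, but this happens even on HDDs, and there is nothing you can do about it.)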
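To illustrate why the "Offline" label says little about when a value is actually refreshed: as far as I can tell, smartctl derives the TYPE and UPDATED columns purely from the low bits of each attribute's FLAG word, which is whatever the firmware chooses to declare. The minimal Python sketch below decodes a few FLAG values from the attribute table in the question. The function name `decode_flag` is made up for illustration, and the bit meanings (bit 0 = pre-failure, bit 1 = online collection) are my reading of smartmontools' behavior, not something the drive vendor documents.

```python
from typing import Tuple


def decode_flag(flag: int) -> Tuple[str, str]:
    """Return the (TYPE, UPDATED) labels smartctl would print for a FLAG word.

    Assumption: bit 0 marks a pre-failure attribute and bit 1 marks an
    attribute that is collected online ("Always"); everything else is
    labeled "Offline".
    """
    attr_type = "Pre-fail" if flag & 0x01 else "Old_age"
    updated = "Always" if flag & 0x02 else "Offline"
    return attr_type, updated


# FLAG values copied from the attribute table in the question.
for name, flag in [
    ("Power_On_Hours", 0x0032),
    ("SATA_Phy_Error_Count", 0x0012),
    ("Total_Erase_Count", 0x0000),
]:
    print(f"{name:22} {flag:#06x} {decode_flag(flag)}")
```

With these inputs, only Total_Erase_Count (FLAG 0x0000) comes out as "Offline", matching the table. Nothing in the flag forces the firmware to hold the raw value still between offline collections.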
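Long answer: SMART offline data collection is an ill-defined way of collecting disk data which, in principle, can degrade IO performance, because the specific tests or collections cannot really run in parallel with user data IO. Hence the word "offline": the disk firmware is free to pause user IO while it collects offline attributes. Offline collection can therefore be disabled entirely, requested explicitly by the user at scheduled times, or (if the disk supports it) run automatically on a programmed timer.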
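However, offline tests were never formally included in the ATA standard (although they appear in other storage-related standards), which leaves the door open to firmware-specific, and often undocumented, behavior.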
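On every disk I have used in the last 15 years, offline tests were really "online" tests, with no performance degradation during data collection. The only difference from online tests is that offline attributes are collected on a specific, firmware-dependent schedule (for example, every 4 hours).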
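One way to observe such a firmware-driven schedule is to save the attribute table at two points in time and compare the raw values of the attributes labeled "Offline". The Python sketch below does this on pasted text rather than by calling smartctl, so it runs anywhere. The helper names `parse_attributes` and `changed_offline` are hypothetical, the first snapshot reuses rows from the question, and the second snapshot (including the value 3901) is invented purely for illustration.

```python
import re
from typing import Dict, List, Tuple

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_attributes(text: str) -> Dict[int, dict]:
    """Parse attribute rows from `smartctl -A` style text, keyed by attribute ID."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and unrelated lines simply do not match
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "updated": m.group(8),
                "raw": m.group(10),
            }
    return attrs


def changed_offline(before: str, after: str) -> List[Tuple[int, str, str, str]]:
    """List (id, name, old_raw, new_raw) for 'Offline' attributes whose raw value changed."""
    old, new = parse_attributes(before), parse_attributes(after)
    return [
        (i, new[i]["name"], old[i]["raw"], new[i]["raw"])
        for i in sorted(old.keys() & new.keys())
        if new[i]["updated"] == "Offline" and old[i]["raw"] != new[i]["raw"]
    ]


# First snapshot: rows taken from the question.
SNAP_1 = """\
  9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 710
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 2
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 3827
"""
# Second snapshot: HYPOTHETICAL values, made up for this example only.
SNAP_2 = """\
  9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 711
244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 2
246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 3901
"""

print(changed_offline(SNAP_1, SNAP_2))
```

In practice the two snapshots would come from saved `smartctl -A` output taken some hours apart, with no self-test or offline collection requested in between. Power_On_Hours also changes in this example but is skipped because it is labeled "Always".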
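The only exception I have found concerns the Offline surface scan, a specific offline sub-test that scans the entire platter surface (or the NAND chips, for SSDs) for defects. Because it is such an intensive test, it is reported separately and can sometimes be enabled or disabled selectively. However, most HDDs (and SSDs) report surface scan as unsupported and instead implement firmware- and model-specific scanning. For example, most consumer HDDs do not scan their surface at all, while enterprise disks scan it automatically even when SMART reports surface scan as disabled. SSDs are much more complex: the controller is required to scan the flash state periodically in order to rewrite marginal pages, so a surface scan is largely meaningless for them.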
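The "No Offline surface scan supported" and "No Auto Offline data collection support" lines in the question come from a single capabilities byte (0x35 on this drive). The Python sketch below decodes that byte. The bit assignments are my understanding of how smartmontools interprets it, and the name `decode_offline_capabilities` is invented for this example, so treat it as an illustration rather than a reference.

```python
# Assumed bit layout of the "Offline data collection capabilities" byte,
# based on how smartctl prints it; not taken from vendor documentation.
OFFLINE_CAPABILITY_BITS = [
    (0x01, "SMART execute Offline immediate"),
    (0x02, "Auto Offline data collection"),
    (0x04, "Abort (rather than suspend) Offline collection upon new command"),
    (0x08, "Offline surface scan"),
    (0x10, "Self-test"),
    (0x20, "Conveyance Self-test"),
    (0x40, "Selective Self-test"),
]


def decode_offline_capabilities(cap: int) -> dict:
    """Map each capability label to True/False for the given capabilities byte."""
    return {label: bool(cap & mask) for mask, label in OFFLINE_CAPABILITY_BITS}


# 0x35 is the value reported by the Kingston A400 in the question.
for label, supported in decode_offline_capabilities(0x35).items():
    print(f"{'yes' if supported else 'no':3} {label}")
```

For 0x35 this reports surface scan, automatic offline collection, and selective self-test as unsupported, consistent with the smartctl output above. It only reflects what the firmware advertises, not what the controller does internally.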