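Running Ubuntu 12.04. This morning I started getting disk failure warnings for my root drive (EDGE™ Boost Pro Plus 7mm SSD 240GB).
The SMART data shows that the failing attribute is 231 Temperature, with a value of 1C (obviously wrong). Oddly, ID 194 is also a temperature attribute, and it appears to be correct (its value is also the one displayed as the temperature in the SMART data window).
Could this reflect a real hardware failure? Should I try to get a warranty replacement?
If not, is there an easy way to make Ubuntu ignore this one SMART attribute? I'd rather not disable SMART warnings entirely.
Edit: here is the output of smartctl: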
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.11.0-15-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: EDGE Boost Pro 7mm SSD
Serial Number: ED140408AS1326965
LU WWN Device Id: 0 000120 000000000
Firmware Version: 541ABBF0
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 revision 3
Local Time is: Thu Dec 11 09:53:48 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 25)  The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                 ( 0) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        ( 1) minutes.
Extended self-test routine
recommended polling time:        ( 48) minutes.
Conveyance self-test routine
recommended polling time:        ( 2) minutes.
SCT capabilities:              (0x0025) SCT Status supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       37836672
  5 Reallocated_Sector_Ct   0x0033   091   091   003    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       83159156789007
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
171 Unknown_Attribute       0x000a   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0030   000   000   000    Old_age   Offline      -       0
177 Wear_Leveling_Count     0x0000   000   000   000    Old_age   Offline      -       99
181 Program_Fail_Cnt_Total  0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0000   029   041   000    Old_age   Offline      -       29 (Min/Max 18/41)
194 Temperature_Celsius     0x0022   029   041   000    Old_age   Always       -       29 (Min/Max 18/41)
195 Hardware_ECC_Recovered  0x001c   120   120   000    Old_age   Offline      -       37836672
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x001c   120   120   000    Old_age   Offline      -       37836672
204 Soft_ECC_Correction     0x001c   120   120   000    Old_age   Offline      -       37836672
230 Head_Amplitude          0x0013   100   100   000    Pre-fail  Always       -       100
231 Temperature_Celsius     0x0013   001   001   010    Pre-fail  Always   FAILING_NOW 68719476737
233 Media_Wearout_Indicator 0x0032   000   000   000    Old_age   Always       -       5891
234 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   000   000   000    Old_age   Always       -       0
242 Total_LBAs_Read         0x0032   000   000   000    Old_age   Always       -       0
SMART Error Log not supported
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%        3853       -
# 2  Extended offline    Aborted by host               10%        1990       -
# 3  Conveyance offline  Aborted by host               90%        1990       -
# 4  Short offline       Aborted by host               90%        1990       -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Answer 1
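On SSDs, SMART attribute 231 does not indicate anything temperature-related. In fact, a normalized value below 2 usually means the SSD has reached the end of its life (for this attribute, higher is better). Most manufacturers use this field to indicate that the unit has been used beyond the number of cycles it was designed for.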
According to the Wikipedia article on SMART, the meaning of this attribute is ambiguous; it lists two different definitions for ID 231:
231 | 0xE7 | Temperature   | Drive Temperature
231 | 0xE7 | SSD Life Left | Indicates the approximate SSD life left, in terms of program/erase cycles or Flash blocks currently available for use.
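I suspect Ubuntu has a bug that displays this attribute as the SSD's temperature field. Boot 14.04 or 14.10 from a live image and check whether it still shows the attribute as a temperature. If it does, file a bug report.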
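The raw value smartctl reports for attribute 231, 68719476737, is easier to interpret in hexadecimal. The minimal Python sketch below (the helper name `raw_bytes` is purely illustrative) splits it into the six bytes of a SMART raw value, least significant first. It comes out as 0x1000000001, which is not a plausible temperature, while its low byte, like the normalized VALUE of 001, is 1. That would fit a tool that reads this attribute as a temperature and prints "1C", although which field Ubuntu actually reads is an assumption here, not something verified.

```python
from typing import List


def raw_bytes(raw: int, width: int = 6) -> List[int]:
    """Split a SMART raw value into bytes, least significant byte first.

    SMART attribute raw values are 6 bytes (48 bits) wide.
    """
    if not 0 <= raw < (1 << (8 * width)):
        raise ValueError("raw value does not fit in %d bytes" % width)
    return [(raw >> (8 * i)) & 0xFF for i in range(width)]


raw_231 = 68719476737  # RAW_VALUE of attribute 231 in the smartctl output
print(hex(raw_231))         # 0x1000000001
print(raw_bytes(raw_231))   # [1, 0, 0, 0, 16, 0]
print("low byte:", raw_bytes(raw_231)[0])
```

Reading the raw value byte by byte, rather than as one decimal number, is how vendor-specific SMART counters are usually inspected, since vendors often pack several fields into the 48 bits.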
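Here is how the manufacturer describes it in its documentation:
The Limited Warranty with Media Usage provides coverage for the warranty period or until the estimated life indicator reported by SMART attribute 231 reaches 1.
SSD Life Left 231
[...]
Normalized value range:
100 = best = the SSD still has its full rated life
1 = worst = not enough flash blocks remain for the SSD to operate properly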
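smartctl marks attribute 231 as FAILING_NOW because of the general SMART pass/fail rule: an attribute with a nonzero THRESH fails once its normalized VALUE is at or below that threshold. Here VALUE is 001 against a THRESH of 010, i.e. the bottom of the manufacturer's 100-to-1 scale. The minimal Python sketch below applies that rule to three rows copied from the smartctl output in the question; `parse_attribute_row` and `is_failing` are hypothetical helpers that only understand the 10-column attribute table layout shown there and ignore any vendor-specific special cases.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # vendor-defined failure threshold
    raw: str     # raw value as printed (may contain spaces)


def parse_attribute_row(line: str) -> Attribute:
    """Parse one whitespace-separated row of the smartctl attribute table.

    Hypothetical helper: assumes the columns
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    fields = line.split(maxsplit=9)  # RAW_VALUE keeps any internal spaces
    if len(fields) != 10:
        raise ValueError("unexpected attribute row: %r" % line)
    return Attribute(
        id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=fields[9],
    )


def is_failing(attr: Attribute) -> bool:
    """A threshold of 0 never trips; otherwise the attribute fails
    once its normalized value is at or below the threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


rows = [
    "5 Reallocated_Sector_Ct 0x0033 091 091 003 Pre-fail Always - 0",
    "194 Temperature_Celsius 0x0022 029 041 000 Old_age Always - 29 (Min/Max 18/41)",
    "231 Temperature_Celsius 0x0013 001 001 010 Pre-fail Always FAILING_NOW 68719476737",
]
for row in rows:
    attr = parse_attribute_row(row)
    status = "FAILING" if is_failing(attr) else "ok"
    print(f"{attr.id:>3} {attr.name:<22} value={attr.value:03d} "
          f"thresh={attr.thresh:03d} {status}")
```

On a live system the rows would come from `smartctl -A /dev/sdX` (the device name is a placeholder). Of the three rows, only attribute 231 trips, matching the WHEN_FAILED column; attribute 194 has a threshold of 000 and so can never fail by this rule.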