SSD SMART failure - two temperature values?

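I'm running Ubuntu 12.04. This morning I started getting disk failure warnings for my root drive (EDGE™ Boost Pro Plus 7mm SSD 240GB).

Looking at the SMART data, the failing attribute is 231 Temperature, with a value of 1C (obviously wrong). Oddly, ID 194 is also a temperature, and it looks correct (that value is also the one shown as the temperature in the SMART data window).

Could this reflect a genuine hardware failure? Should I try to get a warranty replacement?

If not, is there an easy way to make Ubuntu ignore this one attribute of the SMART data? I'd rather not disable SMART warnings entirely.

Edit: here is the smartctl output: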

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.11.0-15-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     EDGE Boost Pro 7mm SSD
Serial Number:    ED140408AS1326965
LU WWN Device Id: 0 000120 000000000
Firmware Version: 541ABBF0
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ACS-2 revision 3
Local Time is:    Thu Dec 11 09:53:48 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x05) Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25) The self-test routine was aborted by
                    the host.
Total time to complete Offline 
data collection:        (    0) seconds.
Offline data collection
capabilities:            (0x79) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    (  48) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x0025) SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   095   095   050    Old_age   Always       -       37836672
  5 Reallocated_Sector_Ct   0x0033   091   091   003    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       83159156789007
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
171 Unknown_Attribute       0x000a   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0030   000   000   000    Old_age   Offline      -       0
177 Wear_Leveling_Count     0x0000   000   000   000    Old_age   Offline      -       99
181 Program_Fail_Cnt_Total  0x000a   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0012   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0000   029   041   000    Old_age   Offline      -       29 (Min/Max 18/41)
194 Temperature_Celsius     0x0022   029   041   000    Old_age   Always       -       29 (Min/Max 18/41)
195 Hardware_ECC_Recovered  0x001c   120   120   000    Old_age   Offline      -       37836672
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x001c   120   120   000    Old_age   Offline      -       37836672
204 Soft_ECC_Correction     0x001c   120   120   000    Old_age   Offline      -       37836672
230 Head_Amplitude          0x0013   100   100   000    Pre-fail  Always       -       100
231 Temperature_Celsius     0x0013   001   001   010    Pre-fail  Always   FAILING_NOW 68719476737
233 Media_Wearout_Indicator 0x0032   000   000   000    Old_age   Always       -       5891
234 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   000   000   000    Old_age   Always       -       0
242 Total_LBAs_Read         0x0032   000   000   000    Old_age   Always       -       0

SMART Error Log not supported
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%      3853         -
# 2  Extended offline    Aborted by host               10%      1990         -
# 3  Conveyance offline  Aborted by host               90%      1990         -
# 4  Short offline       Aborted by host               90%      1990         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Answer 1

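On an SSD, SMART attribute 231 does not indicate anything temperature-related. In fact, a normalized value below 2 usually means the SSD has reached the end of its life (for this attribute, higher is better). Most manufacturers use this field to indicate that the unit has been used for more than its designed number of cycles.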
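To make the value-versus-threshold logic concrete, here is a minimal Python sketch that parses two attribute rows copied from the smartctl output in the question and compares VALUE against THRESH. The helper names (`parse_row`, `is_failing`) are made up for this example. The rule that a threshold of 0 never fails is my understanding of how smartctl treats it, and it is consistent with rows such as 174 in the output above, which show 000/000 without a failure flag.

```python
import re

# Two rows copied verbatim from the smartctl output in the question.
ROWS = """\
194 Temperature_Celsius     0x0022   029   041   000    Old_age   Always       -       29 (Min/Max 18/41)
231 Temperature_Celsius     0x0013   001   001   010    Pre-fail  Always   FAILING_NOW 68719476737
"""

# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def parse_row(line):
    """Split one attribute row into a dict, with numeric columns as ints."""
    match = ROW_RE.match(line)
    if match is None:
        raise ValueError(f"not an attribute row: {line!r}")
    row = match.groupdict()
    for key in ("id", "value", "worst", "thresh"):
        row[key] = int(row[key])
    return row


def is_failing(row):
    """True when the normalized value is at or below a non-zero threshold.

    Assumption: a threshold of 0 means the attribute never fails.
    """
    return row["thresh"] > 0 and row["value"] <= row["thresh"]


for line in ROWS.splitlines():
    row = parse_row(line)
    state = "FAILING" if is_failing(row) else "ok"
    print(f'{row["id"]:>3} value={row["value"]:>3} thresh={row["thresh"]:>3} {state}')
```

The point is that the failure is decided from the normalized VALUE (001) against THRESH (010), not from the label smartctl prints in the ATTRIBUTE_NAME column. Note also that the output in the question says the drive is not in the smartctl database, so the name shown for 231 is a generic one.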

According to the Wikipedia article on SMART, the meaning of this attribute is ambiguous:

231 | 0xE7 | Temperature   | Drive Temperature
231 | 0xE7 | SSD Life Left | Indicates the approximate SSD life left, in terms of
                             program/erase cycles or Flash blocks currently
                             available for use.

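I think there may be a bug in Ubuntu that displays this attribute as a temperature field for SSDs. Try 14.04/14.10 from a live image and see whether it still shows up as a temperature field. If it does, please report a bug.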

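Here is what manufacturers say in their documentation:

Seagate

The Limited Warranty with Media Usage provides coverage for the warranty period or until the estimated life indicator reaches 1, as reported by SMART attribute 231.

Kingston

SSD Life Left 231

[...]

Normalized value range:

100 = best = SSD still has its full rated life
1 = worst = not enough flash blocks remain for the SSD to function properly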
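
The scale quoted above can be turned into a small lookup, sketched here in Python. This assumes the EDGE drive follows the same 100-to-1 convention as the Kingston document, which the vendor has not confirmed here. The function name `describe_life_left` and the wording of the returned strings are hypothetical.

```python
def describe_life_left(normalized):
    """Describe a normalized attribute-231 value on the 100 (best) to 1 (worst) scale.

    Assumption: the drive reports "SSD Life Left" the way the Kingston
    documentation quoted above describes it.
    """
    if not 1 <= normalized <= 100:
        raise ValueError(f"expected a value from 1 to 100, got {normalized}")
    if normalized == 100:
        return "full rated life remaining"
    if normalized == 1:
        return "worn out: too few flash blocks left for normal operation"
    return f"partially worn (normalized value {normalized} of 100)"


# The question's drive reports VALUE 001 for attribute 231.
print(describe_life_left(1))
```

Read this way, the 001 in the question means the drive considers itself at the end of its rated life, which would be a reason to back up the data and ask about a warranty replacement rather than to silence the warning.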
