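Since yesterday, my SSD has been intermittently missing from the BIOS. This morning it disappeared from the boot menu entirely. While troubleshooting, I swapped the SATA cable and reseated the power cable, and on one of those attempts the drive was detected at boot. I got into the OS (Mint 18) without trouble.
After backing up the data I care about, I ran a smartctl short test and got the results below. Could someone familiar with this confirm whether this SSD is really on its last legs, or whether there is any hope of fixing the errors shown?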
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.15.0-45-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Crucial/Micron RealSSD m4/C400/P400
Device Model: M4-CT064M4SSD2
Serial Number: 0000000011270313DEA7
LU WWN Device Id: 5 00a075 10313dea7
Firmware Version: 070H
User Capacity: 64,023,257,088 bytes [64.0 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Mar 4 13:22:22 2019 +06
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 295) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 4) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 084 084 050 Pre-fail Always - 17
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 18432
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 9383
12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 5270
170 Grown_Failing_Block_Ct 0x0033 100 100 010 Pre-fail Always - 9
171 Program_Fail_Count 0x0032 100 100 001 Old_age Always - 483
172 Erase_Fail_Count 0x0032 100 100 001 Old_age Always - 0
173 Wear_Leveling_Count 0x0033 095 095 010 Pre-fail Always - 174
174 Unexpect_Power_Loss_Ct 0x0032 100 100 001 Old_age Always - 164
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 1555 506 1049
183 SATA_Iface_Downshift 0x0032 100 100 001 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 25
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 001 Old_age Always - 49
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 9
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 1019
202 Perc_Rated_Life_Used 0x0018 095 095 001 Old_age Offline - 5
206 Write_Error_Rate 0x000e 100 100 001 Old_age Always - 483
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 3
ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 0 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 50 08 d0 34 2c 40 at LBA = 0x002c34d0 = 2897104
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 d0 34 2c 40 00 42d+22:22:28.928 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 42d+22:22:28.928 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 42d+22:22:28.928 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 42d+22:22:28.928 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 42d+22:22:28.928 SET FEATURES [Set transfer mode]
Error -1 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 50 08 d0 34 2c 40 at LBA = 0x002c34d0 = 2897104
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 d0 34 2c 40 00 42d+22:22:28.928 READ FPDMA QUEUED
60 00 08 c8 34 2c 40 00 42d+22:22:28.928 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 42d+22:22:28.928 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 42d+22:22:28.928 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 42d+22:22:28.928 IDENTIFY DEVICE
Error -2 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 50 30 d0 34 2c 40 at LBA = 0x002c34d0 = 2897104
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 c8 34 2c 40 00 42d+22:22:28.928 READ FPDMA QUEUED
60 00 90 20 49 2d 40 00 42d+22:22:28.928 READ FPDMA QUEUED
60 03 88 48 19 2d 40 00 42d+22:22:28.928 READ FPDMA QUEUED
60 00 90 b8 38 2d 40 00 42d+22:22:28.928 READ FPDMA QUEUED
60 00 90 e8 30 2d 40 00 42d+22:22:28.928 READ FPDMA QUEUED
Error -3 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 50 80 f8 f8 b4 40 at LBA = 0x00b4f8f8 = 11860216
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 80 40 0b 7c 40 00 42d+21:42:28.928 WRITE FPDMA QUEUED
61 00 40 80 09 7c 40 00 42d+21:42:28.928 WRITE FPDMA QUEUED
61 00 40 00 09 7c 40 00 42d+21:42:28.928 WRITE FPDMA QUEUED
61 00 88 00 f8 b4 40 00 42d+21:42:28.928 WRITE FPDMA QUEUED
61 00 40 00 14 58 40 00 42d+21:42:28.928 WRITE FPDMA QUEUED
Error -4 occurred at disk power-on lifetime: 9380 hours (390 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 50 10 40 0b 7c 40 at LBA = 0x007c0b40 = 8129344
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 10 00 2e 17 40 00 42d+21:32:28.928 WRITE FPDMA QUEUED
61 00 10 88 2d 17 40 00 42d+21:32:28.928 WRITE FPDMA QUEUED
61 00 08 58 2d 17 40 00 42d+21:32:28.928 WRITE FPDMA QUEUED
61 00 10 40 2d 17 40 00 42d+21:32:28.928 WRITE FPDMA QUEUED
61 00 18 20 2d 17 40 00 42d+21:32:28.928 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 9383 2719472
# 2 Vendor (0xff) Completed without error 00% 9289 -
# 3 Vendor (0xff) Completed without error 00% 7651 -
# 4 Vendor (0xff) Completed without error 00% 6793 -
# 5 Vendor (0xff) Completed without error 00% 6785 -
# 6 Vendor (0xff) Completed without error 00% 6570 -
# 7 Vendor (0xff) Completed without error 00% 6171 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
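P.S. It may also be worth mentioning that when the symptoms first appeared, I was dropped to the initramfs command prompt. I then ran fsck on the SSD once, and at the time that seemed to resolve the problem.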
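Answer 1
To answer the title question: no, the SMART results are not worrying in themselves. Your drive does have some unreadable sectors, but it will reallocate them from its internal reserve the next time they are written. Right now, Reallocated_Event_Count tells you that so far only 9 flash blocks (corresponding to 9 * 2048 = 18432 sectors, as shown by Reallocated_Sector_Ct) have been replaced from the reserve.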
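The arithmetic can be checked directly against the attribute table in the question. A minimal Python sketch, assuming (as this answer does) that the drive reallocates in units of 2048 sectors; the constant and function names are made up for illustration:

```python
# Raw values copied from the smartctl attribute table in the question.
REALLOCATED_SECTOR_CT = 18432    # attribute 5
REALLOCATED_EVENT_COUNT = 9      # attribute 196
GROWN_FAILING_BLOCK_CT = 9       # attribute 170

# Assumption taken from this answer: one reallocated block = 2048 sectors.
SECTORS_PER_BLOCK = 2048
SECTOR_SIZE_BYTES = 512          # "Sector Size: 512 bytes logical/physical"


def reallocated_blocks(sector_count, sectors_per_block=SECTORS_PER_BLOCK):
    """Convert a reallocated-sector count into whole flash blocks."""
    blocks, remainder = divmod(sector_count, sectors_per_block)
    if remainder:
        raise ValueError("sector count is not a whole number of blocks")
    return blocks


blocks = reallocated_blocks(REALLOCATED_SECTOR_CT)
block_size_mib = SECTORS_PER_BLOCK * SECTOR_SIZE_BYTES // (1024 * 1024)
print(f"{blocks} blocks of {block_size_mib} MiB each have been replaced")
```

The result matches both Reallocated_Event_Count and Grown_Failing_Block_Ct, which is why the large raw value of Reallocated_Sector_Ct looks more alarming than it is.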
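If you don't want to wait until the currently unreadable sectors are rewritten by normal system operation, you can write to them manually with tools such as dd or hdparm, but this is definitely not for the faint of heart: if you get the write location wrong, you will lose perfectly valid data.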
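For anyone who does want to force the rewrite, here is a hedged sketch of what such commands could look like. It only prints the command lines and never runs them. `/dev/sdX` is a placeholder, the LBA is the one from the self-test log in the question, and the helper names are invented for this example. Both write commands destroy whatever was stored in that sector.

```python
import shlex

DEVICE = "/dev/sdX"   # placeholder: substitute the real device node
BAD_LBA = 2719472     # LBA_of_first_error from the self-test log
SECTOR_SIZE = 512     # logical sector size reported by smartctl


def hdparm_read_cmd(device, lba):
    """Read one sector; an I/O error suggests the sector is unreadable."""
    return ["hdparm", "--read-sector", str(lba), device]


def hdparm_write_cmd(device, lba):
    """Overwrite one sector with zeros (destroys its contents)."""
    return ["hdparm", "--yes-i-know-what-i-am-doing",
            "--write-sector", str(lba), device]


def dd_write_cmd(device, lba, sector_size=SECTOR_SIZE):
    """dd alternative: write one zeroed sector at block offset `lba`."""
    return ["dd", "if=/dev/zero", f"of={device}", f"bs={sector_size}",
            f"seek={lba}", "count=1", "oflag=direct"]


# Print the commands for review instead of executing them.
for cmd in (hdparm_read_cmd(DEVICE, BAD_LBA),
            hdparm_write_cmd(DEVICE, BAD_LBA),
            dd_write_cmd(DEVICE, BAD_LBA)):
    print(shlex.join(cmd))
```

Run the read command first and only write if it fails for that exact LBA. The self-test log reports only the first failing LBA, so more than one sector may be affected.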
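However, the other symptoms you mention (such as the drive not being detected at power-on) may well indicate that the electronics are on their way out. Most of the time, though, such problems are simply caused by PSU or cabling issues, so try plugging the drive into a different computer or swapping the PSU.
SMART tests generally won't tell you about problems with the electronics; they mainly test the actual storage medium, not the controller.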
Answer 2
The messages
Self-test execution status: ( 121) The previous self-test completed having the read element of the test failed.
and
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 9383 2719472
indicate that your drive is failing and that the drive itself cannot correctly read logical block address 2719472.
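To see where that block sits on the drive, the LBA can be converted to a byte offset using the 512-byte logical sector size that smartctl reports. A small Python sketch; the function name is just for illustration:

```python
SECTOR_SIZE = 512               # "Sector Size: 512 bytes logical/physical"
USER_CAPACITY = 64_023_257_088  # "User Capacity" from the smartctl output
FAILED_LBA = 2719472            # "LBA_of_first_error" in the self-test log


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset of the first byte of the given logical block."""
    return lba * sector_size


offset = lba_to_byte_offset(FAILED_LBA)
total_sectors = USER_CAPACITY // SECTOR_SIZE
print(f"byte offset: {offset} ({offset / 2**30:.2f} GiB into the drive)")
print(f"position:    {100 * FAILED_LBA / total_sectors:.1f}% of {total_sectors} sectors")
```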
You should also find related kernel messages in the /var/log/messages log.
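If you want to be close to 100% sure, connect the drive to a different host and repeat the SMART test. I have had cases where the BIOS failed to recognize a drive because of an aging motherboard, yet the drive worked fine in another system.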
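Repeating the test on the second host comes down to two smartctl calls: `-t short` to start the self-test and `-l selftest` to read the log once it has finished. A minimal Python wrapper, assuming smartmontools is installed and the script runs as root; `/dev/sdX` is a placeholder and the function names are made up for this sketch:

```python
import shlex
import subprocess
import time

DEVICE = "/dev/sdX"   # placeholder: substitute the real device node
POLL_MINUTES = 2      # "Short self-test routine recommended polling time"


def start_short_test_cmd(device):
    """Command that starts a short self-test."""
    return ["smartctl", "-t", "short", device]


def selftest_log_cmd(device):
    """Command that prints the self-test log."""
    return ["smartctl", "-l", "selftest", device]


def rerun_short_test(device, wait_minutes=POLL_MINUTES):
    """Start a short test, wait for it, and return the self-test log text."""
    # smartctl's exit status is a bit mask and can be non-zero even when
    # the command did its job, so it is not treated as a failure here.
    subprocess.run(start_short_test_cmd(device))
    time.sleep(wait_minutes * 60)
    result = subprocess.run(selftest_log_cmd(device),
                            capture_output=True, text=True)
    return result.stdout


# Nothing is executed at import time; only the planned commands are shown.
print(shlex.join(start_short_test_cmd(DEVICE)))
print(shlex.join(selftest_log_cmd(DEVICE)))
```

Calling `rerun_short_test("/dev/sda")` (with the real device name) on the second machine and comparing the newest log entry with the "Completed: read failure" line above should show whether the failure follows the drive or stays with the original host.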