How can I prove that my USB hard drive is really defective?

2024-6-8

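A few years ago I bought a 500 GB Essentiel B external USB hard drive from Boulanger (a French multimedia store). Since then it has failed twice, roughly once every two years: the file system (currently NTFS) appears to be lost and the drive needs reformatting, after which everything works fine until the next failure...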

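General symptom: problems copying between the hard drive and other media => I did not note the error message, because I had not touched the disk for a long time, but it must have been something like "Input/output error".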

Here is what the old file system gave me before reformatting:

user@host:~$ ls -al /media/user/USER-EHD
ls: impossible d'accéder à '/media/user/USER-EHD/anniv_fany.avi': Erreur d'entrée/sortie
total 15841832
drwxrwxrwx  1 user  user        4096 oct.   3 21:14 .
drwx---rwx+ 6 root root       4096 mars  12 19:11 ..
-?????????? ? ?    ?             ?              ? anniv_fany.avi
drwxrwxrwx  1 user  user        4096 août  23  2020 xen_build
-rwxrwxrwx  1 user  user  1569481183 août   7  2016 踔ororite.avi
drwxrwxrwx  1 user  user        4096 août  23  2020 $RECYCLE.BIN
drwxrwxrwx  1 user  user        4096 août  21  2020 System Volume Information

Two strange things:

anniv_fany.avi: no metadata??? + input/output error
踔ororite.avi: is actually named sororite.avi

and

user@host:~$ lsblk -f /dev/sdg
NAME   FSTYPE LABEL    UUID                                 MOUNTPOINT
sdg                                                        
└─sdg1 ntfs   USER-EHD 67AC02BC429C25D2             /media/user/USER-EHD

SMART check => looks OK (see below)

Note: I know SMART is not 100% reliable (it may even be useless for knowing the real health of an HDD).

user@host:~$ sudo smartctl -a -d sat -t long /dev/sdg
# ... about 2 hours later
user@host:~$ sudo smartctl -l selftest -d sat /dev/sdg
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-148-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       586         -
# 2  Extended offline    Completed without error       00%       500         -
# 3  Extended offline    Aborted by host               90%       499         -
# 4  Short offline       Completed without error       00%       499         -
# 5  Extended offline    Completed: read failure       00%       489         769438584
# 6  Extended offline    Completed: read failure       00%       488         769438584
# 7  Extended offline    Aborted by host               90%       486         -
# 8  Short offline       Completed without error       00%       486         -
# 9  Short offline       Completed without error       00%       112         -
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1

=> So it looks fine: the 2 old read failures are marked as outdated, and smartctl considers that the latest long test went well!
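To pull the failing LBAs out of a self-test log without reading it by eye, a small filter can help. This is only a sketch: the function name `failed_selftest_lbas` is made up for illustration, and it assumes the column layout shown above, where the LBA is the last field of each `read failure` row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct LBAs of failed self-tests.
# Reads `smartctl -l selftest` text on stdin.
failed_selftest_lbas() {
  awk '/^# *[0-9]+ / && /read failure/ { print $NF }' | sort -u
}

# Sample rows copied from the log above.
sample='# 1  Extended offline    Completed without error       00%       586         -
# 5  Extended offline    Completed: read failure       00%       489         769438584
# 6  Extended offline    Completed: read failure       00%       488         769438584'

printf '%s\n' "$sample" | failed_selftest_lbas
```

On a live system the input would come from `sudo smartctl -l selftest -d sat /dev/sdg` instead of the sample text. Both failed tests stop at the same LBA, so the filter prints a single value.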

Full details below:

user@host:~$ sudo smartctl -a -d sat /dev/sdg
[sudo] Mot de passe de user : 
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-148-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MQ01ABF...
Device Model:     TOSHIBA MQ01ABF050
Serial Number:    67ODTDZST
LU WWN Device Id: 5 000039 7d160c9a5
Firmware Version: AM001U
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Mar 13 16:30:13 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 115) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2054
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       491
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       588
 10 Spin_Retry_Count        0x0033   109   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       355
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       6041
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       30 (Min/Max 14/49)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       132
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       262
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 492 hours (20 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 78 b3 dc 4d  Error: UNC 8 sectors at LBA = 0x0ddcb378 = 232567672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 78 b3 dc 40 00      01:08:22.969  READ DMA EXT
  ef 03 45 78 b3 dc 00 00      01:08:22.969  SET FEATURES [Set transfer mode]
  ef 03 0c 78 b3 dc 00 00      01:08:22.969  SET FEATURES [Set transfer mode]
  ec 03 08 78 b3 dc 00 00      01:08:22.968  IDENTIFY DEVICE
  ff ff ff ff ff ff ff 0c      01:08:22.967  [VENDOR SPECIFIC]

Error 11 occurred at disk power-on lifetime: 492 hours (20 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 78 b3 dc 40  Error: UNC at LBA = 0x00dcb378 = 14463864

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 78 b3 dc 40 00      01:08:19.211  READ FPDMA QUEUED
  25 03 08 78 b3 dc 40 00      01:08:15.457  READ DMA EXT
  ef 03 45 68 b3 dc 00 00      01:08:15.457  SET FEATURES [Set transfer mode]
  ef 03 0c 68 b3 dc 00 00      01:08:15.457  SET FEATURES [Set transfer mode]
  ec 03 08 68 b3 dc 00 00      01:08:15.456  IDENTIFY DEVICE

Error 10 occurred at disk power-on lifetime: 492 hours (20 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 78 b3 dc 4d  Error: UNC 8 sectors at LBA = 0x0ddcb378 = 232567672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 08 78 b3 dc 40 00      01:08:15.457  READ DMA EXT
  ef 03 45 68 b3 dc 00 00      01:08:15.457  SET FEATURES [Set transfer mode]
  ef 03 0c 68 b3 dc 00 00      01:08:15.457  SET FEATURES [Set transfer mode]
  ec 03 08 68 b3 dc 00 00      01:08:15.456  IDENTIFY DEVICE
  ff ff ff ff ff ff ff 0c      01:08:15.455  [VENDOR SPECIFIC]

Error 9 occurred at disk power-on lifetime: 492 hours (20 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 78 b3 dc 40  Error: UNC at LBA = 0x00dcb378 = 14463864

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 78 b3 dc 40 00      01:08:11.708  READ FPDMA QUEUED
  60 08 00 70 b3 dc 40 00      01:08:11.686  READ FPDMA QUEUED
  25 03 08 68 b3 dc 40 00      01:08:11.668  READ DMA EXT
  ef 03 45 00 b3 dc 00 00      01:08:11.668  SET FEATURES [Set transfer mode]
  ef 03 0c 00 b3 dc 00 00      01:08:11.667  SET FEATURES [Set transfer mode]

Error 8 occurred at disk power-on lifetime: 492 hours (20 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 08 78 b3 dc 40  Error: UNC at LBA = 0x00dcb378 = 14463864

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 e8 b3 dc 40 00      01:08:07.918  READ FPDMA QUEUED
  60 80 08 68 b3 dc 40 00      01:08:07.918  READ FPDMA QUEUED
  60 40 00 28 b3 dc 40 00      01:08:07.918  READ FPDMA QUEUED
  60 20 00 08 b3 dc 40 00      01:08:07.907  READ FPDMA QUEUED
  25 03 08 00 b3 dc 40 00      01:08:07.890  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       586         -
# 2  Extended offline    Completed without error       00%       500         -
# 3  Extended offline    Aborted by host               90%       499         -
# 4  Short offline       Completed without error       00%       499         -
# 5  Extended offline    Completed: read failure       00%       489         769438584
# 6  Extended offline    Completed: read failure       00%       488         769438584
# 7  Extended offline    Aborted by host               90%       486         -
# 8  Short offline       Completed without error       00%       486         -
# 9  Short offline       Completed without error       00%       112         -
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

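Note: Power_On_Hours is 588.

According to this link, the attribute Current_Pending_Sector (Current Pending Sector Count) gives the number of currently unstable sectors: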

user@host:~$ sudo smartctl -a -d sat /dev/sdg | grep "Current_Pending_Sector"
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0

=> There are none!
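Rather than grepping one attribute at a time, the attributes that usually point at media problems can be listed together. The sketch below is an illustration, not part of smartmontools: `media_attrs` is a hypothetical name, the choice of IDs 5, 196, 197 and 198 is my assumption about which counters matter here, and the parsing relies on the 10-column attribute table shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ID NAME RAW" for attributes 5, 196, 197, 198.
# Reads `smartctl -A` (or `-a`) text on stdin.
media_attrs() {
  awk 'NF >= 10 && $2 ~ /^[A-Za-z]/ &&
       ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) { print $1, $2, $10 }'
}

# Sample rows copied from the attribute table above.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       30 (Min/Max 14/49)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample" | media_attrs
```

All four raw values are 0 in the output above, which is consistent with the single-attribute check. The `NF >= 10` guard keeps rows of other tables (for example the selective self-test spans) from matching by accident.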

Finally ...

user@host:~$ date ; badblocks -svn /dev/sdg ; date
samedi 13 mars 2021, 16:31:58 (UTC+0100)
Vérification des blocs défectueux dans un mode non destructif de lecture-
écriture
Du bloc 0 au bloc 488386583
Vérification des blocs défectueux (test non destructif de lecture-écriture)
Test en cours avec un motif aléatoire : complété                                             
Passe complétée, 0 blocs défectueux repérés. (0/0/0 erreurs)
dimanche 14 mars 2021, 09:48:53 (UTC+0100)

=> After about 17 hours, badblocks reports no errors!
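As a sanity check that the scan really covered the whole disk, the block range and the two timestamps can be worked out by hand. This sketch assumes badblocks ran with its default block size of 1024 bytes (no `-b` option appears in the command) and that both timestamps share the same UTC offset, as printed.

```shell
#!/usr/bin/env bash
# Values copied from the badblocks run above.
last_block=488386583          # "Du bloc 0 au bloc 488386583"
block_size=1024               # assumption: badblocks default block size in bytes

# Bytes covered by the scan (blocks are numbered from 0).
bytes=$(( (last_block + 1) * block_size ))
echo "scanned bytes: $bytes"

# Elapsed time: 13 March 16:31:58 to 14 March 09:48:53.
start=$(( 16*3600 + 31*60 + 58 ))
end=$(( 24*3600 + 9*3600 + 48*60 + 53 ))
elapsed=$(( end - start ))
printf 'elapsed: %dh %02dm %02ds\n' \
  $(( elapsed / 3600 )) $(( elapsed % 3600 / 60 )) $(( elapsed % 60 ))
```

Under that block-size assumption the scanned byte count equals the "User Capacity" reported by smartctl (500 107 862 016 bytes), so the pass covered every block of the device.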

So I ask myself: how can I know whether my drive is really out of service (HS) and only fit for recycling?

Does anyone have expertise on this?

Well,

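I tried a long format (filling with zeros) of the disk with GNOME Disks, but right at the start I saw the error "Error wiping device: Error writing 1048576 bytes to /dev/sdg: Input/output error (udisks-error-quark, 0)". So I tried formatting with gparted and immediately got "I/O error reading from /dev/sdg" (translated from French), and when I tried to add an msdos partition table I got "Error syncing/closing /dev/sdg: I/O error target host" (translated from French), and then the disk disappeared from gparted entirely...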

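Since smartctl reports that everything is fine (as you assured me), and so does badblocks, I thought the problem was not in the drive itself but somewhere between it and the connector attached to the PC. So I took apart the external drive enclosure (same steps as in this one) and tested the SATA drive with a SATA-USB dock. I ran the same tests and saw the same results.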

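I concluded that the problem was in the device itself, and I wondered whether there was a low-level utility to format the disk and clean it back to a factory state. At one point I gave up and could already see myself taking the disk to recycling. Then, since it was broken anyway and I had nothing to lose, I wanted to try everything possible, and I found this topic about the hdparm command ...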

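I skimmed topics 1 and 2. This site allowed me to find my hard drive's password (stored below in the PASS variable; for Toshiba disks it consists of 32 spaces, and yes, I was surprised too), which differs by manufacturer (adapt it to yours). From these explanations, and without really understanding them, I blindly applied the lines that seemed to work: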

user@host:~$ PASS=$( printf %32s )
user@host:~$ sudo hdparm -I /dev/sdg
...
Security: 
    Master password revision code = 65534
        supported
        enabled
        locked
    not frozen
    not expired: security count
        supported: enhanced erase
    Security level high
    108min for SECURITY ERASE UNIT. 108min for ENHANCED SECURITY ERASE UNIT. 
...

user@host:~$ sudo hdparm --user-master u --security-set-pass "$PASS" /dev/sdg
security_password: "                                "

/dev/sdg:
 Issuing SECURITY_SET_PASS command, password="                                ", user=user, mode=high

user@host:~$ sudo hdparm -I /dev/sdg
...
Security: 
    Master password revision code = 65534
        supported
        enabled
        locked
    not frozen
    not expired: security count
        supported: enhanced erase
    Security level high
    108min for SECURITY ERASE UNIT. 108min for ENHANCED SECURITY ERASE UNIT. 
...

user@host:~$ date ; sudo hdparm --user-master u --security-erase "$PASS" /dev/sdg ; date
mardi 16 mars 2021, 19:22:49 (UTC+0100)
security_password: "                                "

/dev/sdg:
 Issuing SECURITY_ERASE command, password="                                ", user=user
mardi 16 mars 2021, 20:50:47 (UTC+0100)

user@host:~$ sudo hdparm -I /dev/sdg
...
Security: 
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
        supported: enhanced erase
    108min for SECURITY ERASE UNIT. 108min for ENHANCED SECURITY ERASE UNIT. 
...

# enable -> not enabled AND locked -> not locked

user@host:~$ sudo hdparm --user-master m --security-unlock "$PASS" /dev/sdg
security_password: "                                "

/dev/sdg:
 Issuing SECURITY_UNLOCK command, password="                                ", user=master


user@host:~$ sudo hdparm --user-master m --security-disable "$PASS" /dev/sdg
security_password: "                                "

/dev/sdg:
 Issuing SECURITY_DISABLE command, password="                                ", user=master

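I was pleasantly surprised to find that these commands unlocked something (but I don't know what):

Test gparted: add an MSDOS partition table + NTFS format => OK
Test GNOME Disks (long format) => OK
Test adding/removing files and folders on Windows and Linux => OK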

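Warning: I am not very sure about hdparm (whether this command can make the hard drive physically inaccessible, a bit like a bricked smartphone). In any case, with the commands above, even though at first I used the wrong password (I used the dummy password given: "llformat"), something was changed or unlocked (I can't say what). So there is more digging to do...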
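One way to see what changed is to reduce each `hdparm -I` dump to its three ATA security flags and compare them. The sketch below is illustrative only: `security_state` is a hypothetical helper, and it assumes the `Security:` section is laid out as in the transcripts above (a bare word means the flag is set, a leading `not` means it is clear).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the enabled/locked/frozen flags found in
# `hdparm -I` text read on stdin.
security_state() {
  awk '
    /^Security:/ { in_sec = 1; next }
    !in_sec { next }
    $1 == "not" && ($2 == "enabled" || $2 == "locked" || $2 == "frozen") { print $2 "=no"; next }
    NF == 1 && ($1 == "enabled" || $1 == "locked" || $1 == "frozen") { print $1 "=yes" }
  '
}

# Security sections copied from the transcripts above (before and after the erase).
before='Security: 
    Master password revision code = 65534
        supported
        enabled
        locked
    not frozen
    not expired: security count
        supported: enhanced erase'
after='Security: 
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
        supported: enhanced erase'

echo "before: $(printf '%s\n' "$before" | security_state | tr '\n' ' ')"
echo "after:  $(printf '%s\n' "$after" | security_state | tr '\n' ' ')"
```

On the transcripts above, the only flags that differ are `enabled` and `locked`, which matches the comment written after the erase. On a live system the input would be `sudo hdparm -I /dev/sdg`.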

So everything seems to be back to normal!

Does anyone know what hdparm changed to make it work again?

Answer 1

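Judging by the SMART values, the drive looks perfectly fine (values are normalized to 100; lower is worse). Zero raw read errors, no reallocated sectors.

So whatever happened the two times you had problems was probably caused by something else. From your directory listing, it looks like data was written, but the data was incorrect. It could be that the RAM holding that sector was corrupted, or a USB transfer was interrupted, or there is some funny race condition in the firmware, or something else entirely.

Note that this does not explain the failing LBA that showed up at times. I would have expected that sector to be reallocated, but maybe overwriting it when the disk was reformatted was enough to fix it.

If you are using NTFS on Linux, my money is on a bug in the NTFS implementation. It was read-only for a long time, so implementing correct NTFS writes is apparently not trivial.

So in that case, unless you need it to be NTFS for some reason, I would change the format to ext4 or similar and see whether the errors still occur.

LBA means "Logical Block Address". Two offline tests stopped at a bad sector with the same LBA: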

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
...
# 5  Extended offline    Completed: read failure       00%       489         769438584
# 6  Extended offline    Completed: read failure       00%       488         769438584

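So that particular sector could not be read. However, it was not reallocated (the reallocated count is zero), so I assume it was written during reformatting, which fixed the read failure.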
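To relate that LBA to a position on the disk, the arithmetic can be sketched as follows. It uses the sector sizes reported by smartctl (512 bytes logical, 4096 bytes physical) and assumes that physical sectors are aligned to LBA 0, which the output does not state.

```shell
#!/usr/bin/env bash
lba=769438584        # LBA_of_first_error from the self-test log
logical=512          # bytes per logical sector (from "Sector Sizes")
physical=4096        # bytes per physical sector (from "Sector Sizes")

per_phys=$(( physical / logical ))        # logical sectors per physical sector
offset=$(( lba * logical ))               # byte offset from the start of the disk
first=$(( lba / per_phys * per_phys ))    # first LBA of the enclosing physical sector
                                          # (assumes alignment to LBA 0)
echo "byte offset:     $offset"
echo "physical sector: LBAs $first..$(( first + per_phys - 1 ))"

# The ATA error log prints LBAs in hex; printf converts them to decimal.
printf 'error-log LBA:   %d\n' 0x0ddcb378
```

The hex value on the last line is the one from "Error 12" in the error log, and the conversion reproduces the decimal figure smartctl prints next to it (232567672). That is a different address from the self-test failure at 769438584.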

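Disassembling the drive and testing it with a different SATA-USB station was a good test; it means the problem is not in the internal SATA-USB adapter.

ATA SECURE ERASE does not actually perform a low-level format; it just writes every block. As far as I know, there is no way to low-level format a modern drive.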

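It would be interesting to see the actual error messages in dmesg or /var/log/syslog when you try to read or write the disk and it fails. If this were my drive, my next step would be to use dd or the sg3_utils tools to read and write blocks directly. In particular, I would try to write a block that cannot be read, to see whether it gets reallocated.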
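A read probe of a single suspect region with dd could look like the sketch below. The helper `read_probe_cmd` is hypothetical and only prints the command instead of running it. The LBA is the one from the failed self-tests, the count of 8 covers one 4096-byte physical sector, and `/dev/sdg` is the device name used in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a dd command that reads COUNT logical
# sectors of 512 bytes starting at LBA from DEV, bypassing the page cache.
read_probe_cmd() {
  local dev=$1 lba=$2 count=${3:-8}
  printf 'dd if=%s of=/dev/null bs=512 skip=%s count=%s iflag=direct\n' \
    "$dev" "$lba" "$count"
}

read_probe_cmd /dev/sdg 769438584
```

The printed command would be run with sudo while `dmesg -w` is open in another terminal, so that any kernel error for that read shows up as it happens. The write counterpart (dd with `of=` pointing at the device and `seek=` instead of `skip=`) destroys the data in those sectors, so it is deliberately left out here.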

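But if you have already started the secure erase, I guess it is too late now. Make sure to check the SMART values again after the secure erase has completed.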
