HDD keeps locking to read-only; can't isolate the cause of the disk errors

2024-6-20

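Problem: After several months of trouble-free operation, one of my hard drives seems to randomly lock itself into read-only mode while in use. Power-cycling the drive usually fixes it.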

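It's also worth noting that sometimes, while the machine is idle, it starts some kind of scan, during which the drive makes a loud, repetitive noise of spinning up and reading intermittently.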

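Setup: two 6 TB HDDs connected to a Raspberry Pi (running Raspbian) through SATA-to-USB 3 bridges. The two drives are spanned together into a single LVM volume called nas-nas. The HDDs' power connectors are fed by an external PSU, so the problem is not caused by drawing too much current over USB.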

# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  5.5T  0 disk 
└─nas-nas   254:0    0 10.9T  0 lvm  /mnt/nas
sdb           8:16   0  5.5T  0 disk 
└─nas-nas   254:0    0 10.9T  0 lvm  /mnt/nas
mmcblk0     179:0    0 29.7G  0 disk 
├─mmcblk0p1 179:1    0  256M  0 part /boot
└─mmcblk0p2 179:2    0 29.5G  0 part /
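Because nas-nas spans both disks, an address reported against dm-0 (such as an ext4 block number in a kernel log message) only points at a physical disk once it is translated through the LVM mapping. The sketch below assumes the volume consists of plain linear segments, in the form `dmsetup table nas-nas` prints them (`<start> <length> linear <major:minor> <offset>`, all in 512-byte sectors), and a 4 KiB ext4 block size. The block size is consistent with e2fsck's figure of 2930259968 blocks × 4096 B ≈ 10.9 TiB, but can be confirmed with `tune2fs -l /dev/nas/nas`. Both function names are hypothetical helpers, not part of any tool.

```shell
# Hypothetical helpers for mapping an ext4 block on dm-0 to a physical disk.

# Convert an ext4 block number to a 512-byte sector offset within the LV.
# $1: ext4 block number; $2: block size in bytes (default 4096, an assumption).
ext4_block_to_sector() {
  echo $(( $1 * ${2:-4096} / 512 ))
}

# Map a sector on the dm device to "<major:minor> <sector>" on the underlying PV.
# stdin: output of `dmsetup table <name>`; only "linear" segments are handled.
# Returns non-zero if no segment contains the sector.
dm_to_pv() {
  awk -v s="$1" '
    $3 == "linear" && s >= $1 && s < $1 + $2 {
      # %.0f avoids 32-bit overflow of %d in some awk implementations
      printf "%s %.0f\n", $4, $5 + (s - $1)
      found = 1
    }
    END { exit !found }'
}

# Example (as root), using the block number from the EXT4-fs message in kern.log:
#   sector=$(ext4_block_to_sector 1941962784)
#   dmsetup table nas-nas | dm_to_pv "$sector"
# Then match the printed major:minor against the MAJ:MIN column of lsblk.
```

Matching the result against lsblk's MAJ:MIN column (8:0 for sda, 8:16 for sdb) tells which disk actually holds the failing block.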

Update: USB information. Both drives connect over USB 3 to a hub, which is plugged into a USB 3 port on the Pi.

# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 2: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
# lsusb
Bus 002 Device 005: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 004: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 002: ID 05e3:0626 Genesys Logic, Inc. USB3.1 Hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
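The kernel log further down names USB ports (usb 2-1.3, 2-1.2, 2-1.1) rather than disks, so it helps to know which /dev/sdX sits behind which port. One way, sketched here under the assumption that the disks appear under /sys/block as usb-storage devices normally do, is to resolve each /sys/block/sdX symlink and extract the USB port segment that precedes the interface directory (e.g. `2-1.3/2-1.3:1.0`). The names `usb_port_of` and `list_usb_disks` are hypothetical.

```shell
# Hypothetical helper: print the USB port path (e.g. "2-1.3") that a resolved
# sysfs block-device path sits under, or nothing if it is not a USB device.
# $1: a resolved sysfs path, e.g. the output of: readlink -f /sys/block/sda
usb_port_of() {
  printf '%s\n' "$1" |
    sed -nE 's#.*/([0-9]+-[0-9.]+)/[0-9]+-[0-9.]+:[0-9]+\.[0-9]+/.*#\1#p'
}

# List every sdX disk together with its USB port (prints nothing if none exist).
list_usb_disks() {
  for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue
    printf '%s usb %s\n' "${dev##*/}" "$(usb_port_of "$(readlink -f "$dev")")"
  done
}

# Example: list_usb_disks
```

Port paths reflect the physical socket, so they stay stable across the device-number changes that a USB reset causes.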

Troubleshooting so far:

Running smartmontools on /dev/sdb reports no errors.
Running smartmontools on /dev/sda reports the following error:

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 da 08 ff ff ff 4f 00      05:34:37.965  READ DMA EXT
  25 da 08 ff ff ff 4f 00      05:34:37.866  READ DMA EXT
  25 da 08 ff ff ff 4f 00      05:34:34.622  READ DMA EXT
  25 da 08 ff ff ff 4f 00      05:34:34.601  READ DMA EXT
  25 da 08 ff ff ff 4f 00      05:34:34.601  READ DMA EXT

The error above is repeated many times; the listed registers and LBA address are always the same.
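In this summary error log the failing address is carried in 28-bit LBA registers: SN holds bits 7 to 0, CL bits 15 to 8, CH bits 23 to 16, and the low nibble of DH bits 27 to 24. Decoding `ff ff ff 0f` gives 0x0fffffff, the largest value a 28-bit field can hold, whereas a 6 TB disk has roughly 6×10^12 / 512 ≈ 1.17×10^10 sectors. A plausible reading, though not a certain one, is that the real address of these 48-bit READ DMA EXT commands did not fit and the drive logged a saturated value; if the drive supports it, `smartctl -l xerror /dev/sda` shows the extended comprehensive error log, which records 48-bit LBAs. The helper names below are hypothetical.

```shell
# Hypothetical helpers for the smartctl "-l error" register dump.
# Arguments are the hex bytes SN CL CH DH exactly as smartctl prints them.
lba28_from_regs() {
  # SN = LBA bits 7..0, CL = 15..8, CH = 23..16, low nibble of DH = 27..24
  echo $(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

# Succeeds (exit 0) if the decoded LBA is the all-ones 28-bit value,
# i.e. the register field could not have represented a larger address.
lba28_saturated() {
  [ "$(lba28_from_regs "$@")" -eq $(( (1 << 28) - 1 )) ]
}

# Example with the registers from the log above (SN CL CH DH = ff ff ff 0f):
#   lba28_from_regs ff ff ff 0f
#   lba28_saturated ff ff ff 0f && echo "28-bit LBA field is saturated"
```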

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%      2726         -
# 2  Conveyance offline  Completed without error       00%      2699         -
# 3  Short offline       Completed without error       00%      2699         -

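Given these errors, I tried following the instructions here on how to fix bad-block errors, but there were no relevant entries in /var/log/messages, and reading the listed blocks on that disk produced no read errors: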

# export i=268435445
# while [ $i -lt 268435465 ]; do echo $i; dd if=/dev/sda of=/dev/null bs=512 count=1 skip=$i; let i+=1; done
268435445
1+0 records in
1+0 records out
512 bytes copied, 0.23521 s, 2.2 kB/s
268435446
1+0 records in
1+0 records out
512 bytes copied, 0.000614278 s, 833 kB/s
268435447
1+0 records in
1+0 records out
512 bytes copied, 0.000601148 s, 852 kB/s
268435448
1+0 records in
1+0 records out
512 bytes copied, 0.00667811 s, 76.7 kB/s
268435449
1+0 records in
1+0 records out
512 bytes copied, 0.000606686 s, 844 kB/s
268435450
1+0 records in
1+0 records out
512 bytes copied, 0.0005965 s, 858 kB/s
268435451
1+0 records in
1+0 records out
512 bytes copied, 0.000601019 s, 852 kB/s
268435452
1+0 records in
1+0 records out
512 bytes copied, 0.000597833 s, 856 kB/s
268435453
1+0 records in
1+0 records out
512 bytes copied, 0.000597778 s, 857 kB/s
268435454
1+0 records in
1+0 records out
512 bytes copied, 0.000447834 s, 1.1 MB/s
268435455
1+0 records in
1+0 records out
512 bytes copied, 0.000444796 s, 1.2 MB/s
268435456
1+0 records in
1+0 records out
512 bytes copied, 0.000975908 s, 525 kB/s
268435457
1+0 records in
1+0 records out
512 bytes copied, 0.000445574 s, 1.1 MB/s
268435458
1+0 records in
1+0 records out
512 bytes copied, 0.000459315 s, 1.1 MB/s
268435459
1+0 records in
1+0 records out
512 bytes copied, 0.000816092 s, 627 kB/s
268435460
1+0 records in
1+0 records out
512 bytes copied, 0.000470667 s, 1.1 MB/s
268435461
1+0 records in
1+0 records out
512 bytes copied, 0.000437908 s, 1.2 MB/s
268435462
1+0 records in
1+0 records out
512 bytes copied, 0.000448389 s, 1.1 MB/s
268435463
1+0 records in
1+0 records out
512 bytes copied, 0.000474222 s, 1.1 MB/s
268435464
1+0 records in
1+0 records out
512 bytes copied, 0.000862722 s, 593 kB/s
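The loop above reads /dev/sda through the kernel's buffered block device. Apart from the first read (0.235 s), the sub-millisecond timings suggest that most sectors were served from readahead cache rather than fetched from the platters one by one, so a clean result here is weaker evidence than it looks. The sketch below repeats the scan with `iflag=direct`, which bypasses the page cache, and prints one pass/fail line per sector. It assumes GNU coreutils dd (for `iflag=direct` and `status=none`); `scan_sectors` and the `SCAN_IFLAG` override are hypothetical names, the latter only there so the function can be exercised on an ordinary file. It accepts any start LBA and count, so it can be pointed at whichever address turns out to be the real one.

```shell
# Hypothetical helper: read sectors one at a time, bypassing the page cache,
# and report each as "ok" or "FAIL" based on dd's exit status.
# $1: device (e.g. /dev/sda), $2: first LBA, $3: number of 512-byte sectors.
# SCAN_IFLAG overrides the dd input flag (default "direct", i.e. O_DIRECT).
scan_sectors() {
  dev=$1
  lba=$2
  end=$(( $2 + $3 ))
  while [ "$lba" -lt "$end" ]; do
    if dd if="$dev" of=/dev/null bs=512 count=1 skip="$lba" \
         iflag="${SCAN_IFLAG:-direct}" status=none 2>/dev/null; then
      echo "$lba ok"
    else
      echo "$lba FAIL"
    fi
    lba=$(( lba + 1 ))
  done
}

# Example (as root), same window as the loop above:
#   scan_sectors /dev/sda 268435445 20
```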

Additionally, when the drive locks into read-only mode, the following is logged to /var/log/kern.log:

Jul 23 03:26:58 raspberrypi kernel: [109352.963651] EXT4-fs (dm-0): error count since last fsck: 69
Jul 23 03:26:58 raspberrypi kernel: [109352.963680] EXT4-fs (dm-0): initial error at time 1688029202: __ext4_find_entry:1665: inode 242745346
Jul 23 03:26:58 raspberrypi kernel: [109352.963697] EXT4-fs (dm-0): last error at time 1689762775: __ext4_get_inode_loc_noinmem:4418: inode 242745345: block 1941962784
Jul 23 03:28:32 raspberrypi kernel: [109447.265453] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:29:09 raspberrypi kernel: [109484.133832] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:30:12 raspberrypi kernel: [109547.618426] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:30:47 raspberrypi kernel: [109582.434776] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:31:57 raspberrypi kernel: [109652.071255] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:32:42 raspberrypi kernel: [109697.123879] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:33:17 raspberrypi kernel: [109731.940235] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:34:02 raspberrypi kernel: [109776.996672] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 05:04:50 raspberrypi kernel: [115225.403375] usb 2-1.1: Disable of device-initiated U1 failed.
Jul 23 05:04:55 raspberrypi kernel: [115230.523410] usb 2-1.1: Disable of device-initiated U2 failed.
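The `initial error at time` and `last error at time` values in the EXT4-fs lines are Unix timestamps (seconds since 1970-01-01 UTC), so converting them shows when ext4 first recorded a problem relative to the USB resets that follow. A small filter, assuming GNU date for the `-d @SECONDS` syntax; the function name is hypothetical:

```shell
# Hypothetical filter: read kernel log lines on stdin and print every
# "at time <epoch>" value (as used in EXT4-fs error summaries) as a UTC date.
# Requires GNU date for the "-d @SECONDS" syntax.
ext4_error_times() {
  grep -o 'at time [0-9][0-9]*' | while read -r _at _time epoch; do
    printf '%s -> %s\n' "$epoch" "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')"
  done
}

# Example:
#   grep 'EXT4-fs' /var/log/kern.log | ext4_error_times
```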

When mounting /dev/nas/nas, the following appears in dmesg:

[ 2801.524342] EXT4-fs (dm-0): warning: mounting fs with errors, running e2fsck is recommended
[ 2801.745209] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

So I unmounted the volume and ran e2fsck:

# e2fsck /dev/nas/nas
e2fsck 1.46.2 (28-Feb-2021)
/dev/nas/nas contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nas/nas: 10984163/366284800 files (0.4% non-contiguous), 921119327/2930259968 blocks
# e2fsck -p /dev/nas/nas
/dev/nas/nas: clean, 10984163/366284800 files, 921119327/2930259968 blocks

However, remounting the volume still produces the following errors in dmesg:

[ 4471.872335] usb 2-1.2: reset SuperSpeed USB device number 5 using xhci_hcd
[ 4471.894395] sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=DRIVER_OK cmd_age=30s
[ 4471.894436] sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 9f b7 48 00 00 00 08 00 00
[ 4471.894454] blk_update_request: I/O error, dev sdb, sector 10467144 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4845.700279] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
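The failed command can be cross-checked from its CDB bytes. Opcode 0x88 is the SCSI READ(16) command, whose bytes 2 to 9 hold the starting LBA (big-endian) and bytes 10 to 13 the transfer length in blocks; decoding them gives LBA 0x9fb748 = 10467144 and 8 blocks, matching `sector 10467144` in the blk_update_request line. Separately, `hostbyte=0x03` is DID_TIME_OUT in the Linux SCSI layer, which together with `cmd_age=30s` suggests the read timed out somewhere on the USB path rather than the drive returning a medium error, although the log alone cannot prove which. The helper below is a hypothetical sketch.

```shell
# Hypothetical decoder for a READ(16) CDB as printed by the kernel after
# "CDB: opcode=0x88". Takes the 16 bytes as separate hex arguments.
read16_decode() {
  if [ "$#" -ne 16 ]; then
    echo "expected 16 CDB bytes, got $#" >&2
    return 2
  fi
  if [ "$1" != 88 ]; then
    echo "opcode 0x$1 is not READ(16)" >&2
    return 1
  fi
  # Bytes 2..9: LBA (big-endian); bytes 10..13: transfer length in blocks.
  echo "lba=$(( 0x$3$4$5$6$7$8$9${10} )) blocks=$(( 0x${11}${12}${13}${14} ))"
}

# Example with the CDB from the dmesg output above:
#   read16_decode 88 00 00 00 00 00 00 9f b7 48 00 00 00 08 00 00
```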

So... at this point I'm a bit lost as to what the actual problem is. Short of replacing the drive outright, is there anything else I can do to fix this?
