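Problem: After running for several months without any issues, one of my hard drives now seems to randomly lock into read-only mode while in use. Power-cycling the drive usually fixes it.
It is also worth noting that sometimes, while idle, the machine starts some kind of scan, which produces the loud, repetitive sound of the drive spinning up and doing non-sequential reads.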
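Setup: 2x 6TB hard drives connected to a Raspberry Pi (running Raspbian) through SATA->USB3 bridges. The two drives are spanned together into a single LVM volume called nas-nas. The HDDs are powered from an external PSU, so the problem is not caused by drawing too much current over USB.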
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 5.5T 0 disk
└─nas-nas 254:0 0 10.9T 0 lvm /mnt/nas
sdb 8:16 0 5.5T 0 disk
└─nas-nas 254:0 0 10.9T 0 lvm /mnt/nas
mmcblk0 179:0 0 29.7G 0 disk
├─mmcblk0p1 179:1 0 256M 0 part /boot
└─mmcblk0p2 179:2 0 29.5G 0 part /
Update: USB information. Both drives are connected via USB3 to a hub, which is then connected to a USB3 port on the RPi.
# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 2: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
# lsusb
Bus 002 Device 005: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 004: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge
Bus 002 Device 002: ID 05e3:0626 Genesys Logic, Inc. USB3.1 Hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Troubleshooting so far:
- Running smartmontools against /dev/sdb shows no errors.
- Running smartmontools against /dev/sda produces the following errors:
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 08 ff ff ff 4f 00 05:34:37.965 READ DMA EXT
25 da 08 ff ff ff 4f 00 05:34:37.866 READ DMA EXT
25 da 08 ff ff ff 4f 00 05:34:34.622 READ DMA EXT
25 da 08 ff ff ff 4f 00 05:34:34.601 READ DMA EXT
25 da 08 ff ff ff 4f 00 05:34:34.601 READ DMA EXT
The error above is repeated several times; the registers and LBA address listed are always the same.
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 40% 2726 -
# 2 Conveyance offline Completed without error 00% 2699 -
# 3 Short offline Completed without error 00% 2699 -
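One thing I noticed about the LBA in the error log: 0x0fffffff is exactly the largest value that fits in a 28-bit LBA field, while the failing command (READ DMA EXT) is a 48-bit command. So the reported address may be a saturated placeholder rather than the real failing sector; I have not confirmed this. The arithmetic, as a quick sketch (the variable names are my own):

```shell
# Largest value representable in a 28-bit LBA field.
max_lba28=$(( (1 << 28) - 1 ))
# LBA reported in the SMART error log above.
reported_lba=$(( 0x0fffffff ))
printf 'max 28-bit LBA: %d\n' "$max_lba28"
printf 'reported LBA:   %d\n' "$reported_lba"
```

Both values come out as 268435455. If the address is indeed truncated, the extended error log (smartctl -l xerror) might show the full 48-bit LBA, assuming the USB bridge passes that request through.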
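Given these errors, I tried following the instructions written here on how to correct bad block errors, but there are no related entries in /var/log/messages, and reading the blocks listed for that disk produces no read/write errors: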
# export i=268435445
# while [ $i -lt 268435465 ]; do echo $i; dd if=/dev/sda of=/dev/null bs=512 count=1 skip=$i; let i+=1; done
268435445
1+0 records in
1+0 records out
512 bytes copied, 0.23521 s, 2.2 kB/s
268435446
1+0 records in
1+0 records out
512 bytes copied, 0.000614278 s, 833 kB/s
268435447
1+0 records in
1+0 records out
512 bytes copied, 0.000601148 s, 852 kB/s
268435448
1+0 records in
1+0 records out
512 bytes copied, 0.00667811 s, 76.7 kB/s
268435449
1+0 records in
1+0 records out
512 bytes copied, 0.000606686 s, 844 kB/s
268435450
1+0 records in
1+0 records out
512 bytes copied, 0.0005965 s, 858 kB/s
268435451
1+0 records in
1+0 records out
512 bytes copied, 0.000601019 s, 852 kB/s
268435452
1+0 records in
1+0 records out
512 bytes copied, 0.000597833 s, 856 kB/s
268435453
1+0 records in
1+0 records out
512 bytes copied, 0.000597778 s, 857 kB/s
268435454
1+0 records in
1+0 records out
512 bytes copied, 0.000447834 s, 1.1 MB/s
268435455
1+0 records in
1+0 records out
512 bytes copied, 0.000444796 s, 1.2 MB/s
268435456
1+0 records in
1+0 records out
512 bytes copied, 0.000975908 s, 525 kB/s
268435457
1+0 records in
1+0 records out
512 bytes copied, 0.000445574 s, 1.1 MB/s
268435458
1+0 records in
1+0 records out
512 bytes copied, 0.000459315 s, 1.1 MB/s
268435459
1+0 records in
1+0 records out
512 bytes copied, 0.000816092 s, 627 kB/s
268435460
1+0 records in
1+0 records out
512 bytes copied, 0.000470667 s, 1.1 MB/s
268435461
1+0 records in
1+0 records out
512 bytes copied, 0.000437908 s, 1.2 MB/s
268435462
1+0 records in
1+0 records out
512 bytes copied, 0.000448389 s, 1.1 MB/s
268435463
1+0 records in
1+0 records out
512 bytes copied, 0.000474222 s, 1.1 MB/s
268435464
1+0 records in
1+0 records out
512 bytes copied, 0.000862722 s, 593 kB/s
In addition, when the drive locks into read-only mode, the following is logged in /var/log/kern.log:
Jul 23 03:26:58 raspberrypi kernel: [109352.963651] EXT4-fs (dm-0): error count since last fsck: 69
Jul 23 03:26:58 raspberrypi kernel: [109352.963680] EXT4-fs (dm-0): initial error at time 1688029202: __ext4_find_entry:1665: inode 242745346
Jul 23 03:26:58 raspberrypi kernel: [109352.963697] EXT4-fs (dm-0): last error at time 1689762775: __ext4_get_inode_loc_noinmem:4418: inode 242745345: block 1941962784
Jul 23 03:28:32 raspberrypi kernel: [109447.265453] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:29:09 raspberrypi kernel: [109484.133832] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:30:12 raspberrypi kernel: [109547.618426] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:30:47 raspberrypi kernel: [109582.434776] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:31:57 raspberrypi kernel: [109652.071255] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:32:42 raspberrypi kernel: [109697.123879] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:33:17 raspberrypi kernel: [109731.940235] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 03:34:02 raspberrypi kernel: [109776.996672] usb 2-1.3: reset SuperSpeed USB device number 6 using xhci_hcd
Jul 23 05:04:50 raspberrypi kernel: [115225.403375] usb 2-1.1: Disable of device-initiated U1 failed.
Jul 23 05:04:55 raspberrypi kernel: [115230.523410] usb 2-1.1: Disable of device-initiated U2 failed.
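The "initial error" and "last error" times in the EXT4-fs lines are Unix epoch seconds. A minimal sketch for converting them to readable dates, assuming GNU date (which accepts -d @SECONDS, as on Raspbian); the variable names are my own:

```shell
# ext4 error timestamps copied from the kern.log lines above.
initial_ts=1688029202
last_ts=1689762775

# Convert epoch seconds to UTC calendar dates (GNU date syntax).
first_error=$(date -u -d "@$initial_ts" '+%Y-%m-%d')
last_error=$(date -u -d "@$last_ts" '+%Y-%m-%d')

echo "initial ext4 error: $first_error"
echo "last ext4 error:    $last_error"
```

These work out to 2023-06-29 and 2023-07-19, so the filesystem errors were first recorded several weeks before the USB resets logged here.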
When mounting /dev/nas/nas, the following errors appear in dmesg:
[ 2801.524342] EXT4-fs (dm-0): warning: mounting fs with errors, running e2fsck is recommended
[ 2801.745209] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
So I unmounted the drive and ran e2fsck:
# e2fsck /dev/nas/nas
e2fsck 1.46.2 (28-Feb-2021)
/dev/nas/nas contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nas/nas: 10984163/366284800 files (0.4% non-contiguous), 921119327/2930259968 blocks
# e2fsck -p /dev/nas/nas
/dev/nas/nas: clean, 10984163/366284800 files, 921119327/2930259968 blocks
However, remounting the drive still produces the following errors in dmesg:
[ 4471.872335] usb 2-1.2: reset SuperSpeed USB device number 5 using xhci_hcd
[ 4471.894395] sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=DRIVER_OK cmd_age=30s
[ 4471.894436] sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 9f b7 48 00 00 00 08 00 00
[ 4471.894454] blk_update_request: I/O error, dev sdb, sector 10467144 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 4845.700279] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
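The CDB in the I/O error can be decoded by hand to cross-check the sector the kernel reports. A sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 are the big-endian LBA, bytes 10-13 are the transfer length in logical blocks); the variable names are my own:

```shell
# CDB copied from the dmesg line above.
cdb="88 00 00 00 00 00 00 9f b7 48 00 00 00 08 00 00"

# Split into positional parameters: byte k (0-based) becomes ${k+1}.
set -- $cdb

# Bytes 2..9: 64-bit LBA, most significant byte first.
lba=$(( 0x$3$4$5$6$7$8$9${10} ))
# Bytes 10..13: number of logical blocks to transfer.
len=$(( 0x${11}${12}${13}${14} ))

echo "LBA=$lba length=$len"
```

This gives LBA 10467144 with a length of 8 sectors, matching the sector in the blk_update_request line. As far as I can tell, hostbyte=0x03 is the kernel's DID_TIME_OUT code, which would fit with cmd_age=30s and the USB reset just before it; note that this error is on sdb, not on sda where the SMART errors were logged.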
So... at this point I am a bit lost as to what the problem actually is. Short of replacing the drive entirely, what else can I do to fix this?