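I am using ddrescue to recover a failing hard drive. The utility works well on the healthy parts of the drive, but it is very slow on the damaged parts and seems to cause some kernel module to deadlock.
First, my system: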
$ uname -a
Linux 3.16.2-1-ARCH #1 SMP PREEMPT Sat Sep 6 13:12:51 CEST 2014 x86_64 GNU/Linux
Here is what is happening. I am currently in the first phase of the recovery, using ddrescue -dn /dev/sdd ddrescue.img ddrescue.log
The following messages keep repeating in my kernel log:
[ 1160.113936] end_request: critical target error, dev sdd, sector 520968448
[ 1191.145082] usb 3-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 1191.159792] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf00
[ 1191.159797] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf48
[ 1222.107631] usb 3-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 1222.122490] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf00
[ 1222.122495] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf48
[ 1346.337324] sd 17:0:0:0: [sdd] Unhandled error code
[ 1346.337329] sd 17:0:0:0: [sdd]
[ 1346.337332] Result: hostbyte=0x05 driverbyte=0x00
[ 1346.337334] sd 17:0:0:0: [sdd] CDB:
[ 1346.337336] cdb[0]=0x28: 28 00 1f 0d 59 80 00 00 01 00
[ 1346.337345] end_request: I/O error, dev sdd, sector 520968576
[ 1377.408091] usb 3-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 1377.422946] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf00
[ 1377.422951] xhci_hcd 0000:01:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff88044919cf48
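My guess is that this is caused by I/O errors at the kernel level, with the module eventually resetting its connection to the device. (Please correct me if I am wrong.)
This cycle of errors and resets goes on for a while and works fine, until I eventually end up with what looks like a deadlock: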
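As a sanity check on that guess, the CDB from the "Unhandled error code" entry can be decoded by hand. The sketch below assumes the standard 10-byte READ(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8); the variable names are my own. The decoded LBA comes out to 520968576, the same sector reported in the following "I/O error" line, so the failed command really is a one-block read at that position. As far as I can tell, hostbyte=0x05 corresponds to DID_ABORT, which would fit a command being aborted around the USB reset, but that part is my interpretation rather than something the log states.

```shell
#!/bin/sh
# CDB bytes copied from the kernel log entry at timestamp 1346.337336.
cdb="28 00 1f 0d 59 80 00 00 01 00"

# Split the bytes into positional parameters $1..$10.
set -- $cdb

opcode=$1                    # 0x28 is READ(10)
lba=$(( 0x$3$4$5$6 ))        # bytes 2-5: logical block address, big-endian
blocks=$(( 0x$8$9 ))         # bytes 7-8: number of blocks to transfer

printf 'opcode=0x%s lba=%d blocks=%d\n' "$opcode" "$lba" "$blocks"
```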
[ 4132.846802] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.866845] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.886878] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.906841] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.926928] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.946948] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.966935] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4132.986990] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4133.007033] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
[ 4133.027030] usb-storage: Error in queuecommand_lck: us->srb = ffff880446c78300
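^ These messages never stop.
Once this locks up all related I/O and killing the process no longer works, my only option is to reboot the system (sometimes with a forced reboot). As I see it, this risks corrupting the data I am trying to recover. I should not have to reboot the system several times just to recover this drive.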
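- I know this drive is failing, but why does the module eventually deadlock?
- How should I report or patch this bug?
- Is there a kernel module I can reload to recover from this error without rebooting? (My best attempt was to force-remove the uas module, which stops ddrescue, but then I cannot start it again.)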
Thanks in advance.