Do `dd` I/O read errors always indicate a hardware failure?

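I purchased two 2TB drives. One has been formatted with an HFS+ (non-journaled) filesystem and populated using `rsync`. I tried to create a backup on the second drive with a basic `dd` block copy, but the operation repeatedly fails with errors reading the source drive: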

    root@deb-server:/home/adm_user# dd if=/dev/sdb bs=32M | pv -s 2000G | dd of=/dev/sdc bs=32M
    dd: error reading ‘/dev/sdb’: Input/output error              ]  0% ETA 28:24:40
    75+1 records in
    75+1 records out
    2519728128 bytes (2.5 GB) copied2.35GiB 0:02:00 [19.9MiB/s] [>                                ]  0%
    , 120.663 s, 20.9 MB/s
    0+36998 records in
    0+36998 records out
    2519728128 bytes (2.5 GB) copied, 125.599 s, 20.1 MB/s

    root@deb-server:/home/adm_user# dd if=/dev/sdb bs=1M | pv -s 2000G | dd of=/dev/sdc bs=1M
    dd: error reading ‘/dev/sdb’: Input/output error              ]  0% ETA 26:07:44
    10333+1 records in
    10333+1 records out
    10.1GiB 0:07:57 [21.6MiB/s] [>                                ]  0%
    10835591168 bytes (11 GB) copied, 477.965 s, 22.7 MB/s
    0+152209 records in
    0+152209 records out
    10835591168 bytes (11 GB) copied, 478.852 s, 22.6 MB/s

    root@deb-server:/home/adm_user# dd if=/dev/sdb bs=1M | pv -s 2000G | dd of=/dev/sdc bs=1M
    dd: error reading ‘/dev/sdb’: Input/output error              ]  0% ETA 25:55:35
    13796+1 records in136KiB/s] [>                                ]  0% ETA 25:58:01
    13796+1 records out
    14466285568 bytes (14 GB) copied13.5GiB 0:10:34 [21.7MiB/s] [>                                ]  0%
    , 634.609 s, 22.8 MB/s
    0+202579 records in
    0+202579 records out
    14466285568 bytes (14 GB) copied, 635.957 s, 22.7 MB/s

    root@deb-server:/home/adm_user# dd if=/dev/sdb of=/dev/sdc
    dd: error reading ‘/dev/sdb’: Input/output error
    186677728+0 records in
    186677728+0 records out
    95578996736 bytes (96 GB) copied, 13782 s, 6.9 MB/s

    root@deb-server:/home/adm_user# dd if=/dev/sdb of=/dev/sdc
    dd: error reading ‘/dev/sdb’: Input/output error
    167896800+0 records in
    167896800+0 records out
    85963161600 bytes (86 GB) copied, 12391.2 s, 6.9 MB/s

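The errors never occur at exactly the same position, which suggests to me that the copy is not hitting a bad sector on the disk. As you can see, I tried a more modest block size, and also running without `pv`, which seemed to get further but still errored out eventually. I made several more attempts, and all of them hit the same error at different points.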

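I have read a dozen or so forum threads and Stack Exchange posts about similar problems, and the conclusion always seems to be "`dd` I/O error == disk failure". Others describing this problem are usually trying to recover old or known-bad disks, but a hardware failure seems unlikely in this case: these are two brand-new disks (a well-known HGST model) and USB enclosures. What may well be damaged are the files on the disk: they were consolidated from a dozen other disks of varying age and condition. As I understand it, filesystem or file errors are irrelevant to a block copy (and no partitions were cloned onto the disk).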

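I know I can tell `dd` to continue copying past errors, and my next step is a (probably much slower) filesystem-level backup with `rsync`, but first I would like to be more certain whether the disk is good or not. I have considered some other explanations and am looking for guidance on how to diagnose this error. Other possibilities: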

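  • Limited resources: the Debian system has about 6.5GB of free RAM and 2.4 GB of free disk space, which seems sufficient to me.
  • USB bandwidth: the system only has USB 2.0 ports, and both drives are connected through them (externally powered). Could `dd` simply be erroring out because it tries to read faster than the link allows?
  • I also noticed some details missing from the `hdparm` output, such as the cache size. Could some driver support for the disk be missing?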

    /dev/sdb:
    
    ATA device, with non-removable media
        Model Number:       Hitachi HUA723020ALA641                 
        Serial Number:      YGHJ32SD            
        Firmware Revision:  MK7OA840
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
    Standards:
        Used: unknown (minor revision code 0x0029)
        Supported: 8 7 6 5 
        Likely used: 8
    Configuration:
        Logical             max     current
        cylinders   16383   16383
        heads               16      16
        sectors/track       63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors: 3907029168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        device size with M = 1024*1024:     1907729 MBytes
        device size with M = 1000*1000:     2000398 MBytes (2000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 7200
        [...]
    
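  • Edit: as suggested, I checked `/var/log/messages`. It contains several sequences like the one below. Does this indicate that the USB controller is crashing or failing during the read and dropping the disk?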

    Dec 11 10:15:26 deb-server kernel: [409707.840187] usb 2-1.8: USB disconnect, device number 17
    Dec 11 10:15:26 deb-server kernel: [409707.847408] sd 19:0:0:0: [sdb] Unhandled error code
    Dec 11 10:15:26 deb-server kernel: [409707.847412] sd 19:0:0:0: [sdb]  
    Dec 11 10:15:26 deb-server kernel: [409707.847413] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    Dec 11 10:15:26 deb-server kernel: [409707.847414] sd 19:0:0:0: [sdb] CDB: 
    Dec 11 10:15:26 deb-server kernel: [409707.847415] Read(10): 28 00 00 5e 93 00 00 00 f0 00
    Dec 11 10:15:26 deb-server kernel: [409707.847423] quiet_error: 22 callbacks suppressed
    Dec 11 10:15:26 deb-server kernel: [409707.847473] sd 19:0:0:0: [sdb] Unhandled error code
    Dec 11 10:15:26 deb-server kernel: [409707.847474] sd 19:0:0:0: [sdb]  
    Dec 11 10:15:26 deb-server kernel: [409707.847475] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    Dec 11 10:15:26 deb-server kernel: [409707.847476] sd 19:0:0:0: [sdb] CDB: 
    Dec 11 10:15:26 deb-server kernel: [409707.847477] Read(10): 28 00 00 5e 93 f0 00 00 10 00
    Dec 11 10:15:27 deb-server kernel: [409708.303411] usb 2-1.8: new high-speed USB device number 18 using ehci-pci
    Dec 11 10:15:27 deb-server kernel: [409708.396916] usb 2-1.8: New USB device found, idVendor=2537, idProduct=1066
    Dec 11 10:15:27 deb-server kernel: [409708.396921] usb 2-1.8: New USB device strings: Mfr=1, Product=2, SerialNumber=3
    Dec 11 10:15:27 deb-server kernel: [409708.396924] usb 2-1.8: Product: NS1066
    Dec 11 10:15:27 deb-server kernel: [409708.396926] usb 2-1.8: Manufacturer: Norelsys
    Dec 11 10:15:27 deb-server kernel: [409708.396928] usb 2-1.8: SerialNumber: 0123456789ABCDE
    Dec 11 10:15:27 deb-server kernel: [409708.397214] usb-storage 2-1.8:1.0: USB Mass Storage device detected
    Dec 11 10:15:27 deb-server kernel: [409708.397573] scsi20 : usb-storage 2-1.8:1.0
    Dec 11 10:15:27 deb-server kernel: [409708.984090]  sdc: sdc1
    Dec 11 10:15:28 deb-server kernel: [409709.916622] scsi 20:0:0:0: Direct-Access     ATA      Hitachi HUA72302 A840 PQ: 0 ANSI: 6
    Dec 11 10:15:28 deb-server kernel: [409709.916953] sd 20:0:0:0: Attached scsi generic sg2 type 0
    Dec 11 10:15:28 deb-server kernel: [409709.917560] sd 20:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
    Dec 11 10:15:28 deb-server kernel: [409709.918568] sd 20:0:0:0: [sdb] Write Protect is off
    Dec 11 10:15:28 deb-server kernel: [409709.919565] sd 20:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    Dec 11 10:15:28 deb-server kernel: [409709.927455]  sdb: sdb1
    Dec 11 10:15:28 deb-server kernel: [409709.930559] sd 20:0:0:0: [sdb] Attached SCSI disk
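
For reading the log above: the two `Read(10)` lines are raw SCSI command blocks, and decoding them shows where on the disk the failed reads were aimed. The following is a minimal sketch, assuming the standard READ(10) layout (bytes 2 to 5 are the big-endian logical block address, bytes 7 to 8 the transfer length) and the 512-byte logical sector size reported by `hdparm`. The function name `decode_read10` is made up for illustration.

```shell
# Decode a SCSI READ(10) command block given as ten hex bytes.
# Assumes 512-byte logical sectors, as reported for this drive.
decode_read10() {
    # $1 = opcode (28), $3..$6 = LBA (big-endian), $8..$9 = block count
    lba=$(( 0x$3$4$5$6 ))
    blocks=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$blocks byte_offset=$(( lba * 512 ))"
}

decode_read10 28 00 00 5e 93 00 00 00 f0 00
# lba=6198016 blocks=240 byte_offset=3173384192
decode_read10 28 00 00 5e 93 f0 00 00 10 00
# lba=6198256 blocks=16 byte_offset=3173507072
```

The second request starts exactly where the first one ends (6198016 + 240 = 6198256), so these look like two consecutive in-flight reads that were both abandoned at the moment of the disconnect, not two separate problem areas on the disk.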
    

Answer 1

It depends on what you mean by hardware failure, but yes, this is some kind of hardware problem.

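It may be transient (power, overheating, or just a communication error), or it may be a real hardware problem with the power supply, the cable, or the hard drive (or, less commonly, with some controller chip).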
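
One rough way to narrow this down is to check whether the read errors line up with the device dropping off the USB bus. The sketch below counts the two markers that appear in the log excerpt from the question. The helper name `usb_dropouts` is hypothetical, and the log path differs between distributions.

```shell
# Summarize USB drop-outs recorded in a kernel log file.
# Usage: usb_dropouts /var/log/messages
usb_dropouts() {
    logfile=$1
    # Every unplug or reset of a USB device logs a "USB disconnect" line.
    disconnects=$(grep -c 'USB disconnect' "$logfile")
    # Requests that failed because the device was gone report DID_NO_CONNECT.
    lost_reads=$(grep -c 'hostbyte=DID_NO_CONNECT' "$logfile")
    echo "disconnects=$disconnects lost_reads=$lost_reads"
}
```

If every `dd` failure coincides with a disconnect followed by re-enumeration, as in the excerpt, that points more toward the enclosure, cable, power, or USB controller than toward unreadable sectors on the platters, though it does not rule out the disk itself.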

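Stop using `dd` and use a rescue-oriented tool such as `ddrescue` instead, to avoid further damage to the hard drive until you have ruled out a problem with the disk.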
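
As an illustration, a two-pass copy with GNU `ddrescue` might look like the sketch below. It assumes GNU `ddrescue` is installed; the wrapper name `rescue_copy` and the map file path in the example are placeholders, not something from the question. The map file records which areas have already been copied, so a run that is interrupted (for example by the USB device disconnecting) can be restarted and will continue where it stopped.

```shell
# Two-pass copy with GNU ddrescue.
# Arguments: source device, destination device, map file (all placeholders).
rescue_copy() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly and skip the slow
    # scraping phase (-n). -f is required to overwrite a device.
    ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: go back only to the areas the map file marks as bad,
    # using direct disk access (-d) and up to 3 retry passes (-r3).
    ddrescue -f -d -r3 "$src" "$dst" "$map"
}

# Example (this overwrites the destination device):
#   rescue_copy /dev/sdb /dev/sdc /root/sdb.map
```

Note that device names can change when a USB disk reconnects, so check that `/dev/sdb` and `/dev/sdc` still refer to the same drives before restarting a run.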
