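Our backup "solution" consists of attaching a USB drive to the backup server and running a custom script that rsyncs the data onto the USB drive. After a while, however, the drive becomes read-only. Here is the dmesg output: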
[2502923.708171] sdb: sdb1
[2502923.742767] sd 36:0:0:0: [sdb] Attached SCSI disk
[2502980.368020] kjournald starting. Commit interval 5 seconds
[2502980.482705] EXT3 FS on sdb1, internal journal
[2502980.482705] EXT3-fs: recovery complete.
[2502980.488709] EXT3-fs: mounted filesystem with ordered data mode.
[2590744.432168] usb 1-2: USB disconnect, address 36
[2590744.432655] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.432784] end_request: I/O error, dev sdb, sector 795108447
[2590744.432857] Buffer I/O error on device sdb1, logical block 99388548
[2590744.432925] lost page write due to I/O error on sdb1
[2590744.433002] Buffer I/O error on device sdb1, logical block 99388549
[2590744.433070] lost page write due to I/O error on sdb1
[2590744.433139] Buffer I/O error on device sdb1, logical block 99388550
[2590744.433207] lost page write due to I/O error on sdb1
[2590744.433275] Buffer I/O error on device sdb1, logical block 99388551
[2590744.433343] lost page write due to I/O error on sdb1
[2590744.433410] Buffer I/O error on device sdb1, logical block 99388552
[2590744.433478] lost page write due to I/O error on sdb1
[2590744.433545] Buffer I/O error on device sdb1, logical block 99388553
[2590744.433613] lost page write due to I/O error on sdb1
[2590744.433681] Buffer I/O error on device sdb1, logical block 99388554
[2590744.433749] lost page write due to I/O error on sdb1
[2590744.433817] Buffer I/O error on device sdb1, logical block 99388555
[2590744.433884] lost page write due to I/O error on sdb1
[2590744.433953] Buffer I/O error on device sdb1, logical block 99388556
[2590744.434021] lost page write due to I/O error on sdb1
[2590744.434089] Buffer I/O error on device sdb1, logical block 99388557
[2590744.434157] lost page write due to I/O error on sdb1
[2590744.443942] sd 36:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2590744.447945] end_request: I/O error, dev sdb, sector 795108687
[2590744.452065] Aborting journal on device sdb1.
[2590744.452065] __journal_remove_journal_head: freeing b_committed_data
[2590744.452410] EXT3-fs error (device sdb1) in ext3_ordered_writepage: IO failure
[2590744.453795] __journal_remove_journal_head: freeing b_committed_data
[2590744.454481] ext3_abort called.
[2590744.454548] EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
[2590744.454697] Remounting filesystem read-only
[2590744.457033] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #11968705 offset 0
[2590776.909451] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #122881 offset 0
[2590777.637030] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #30015490 offset 0
[2590949.026134] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591121.070802] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591211.109072] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591300.269439] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591357.322837] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591418.664452] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591572.792037] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591667.952082] EXT3-fs error (device sdb1): ext3_find_entry: reading directory #2 offset 0
[2591669.639597] __ratelimit: 3981 messages suppressed
[2591669.639658] Buffer I/O error on device sdb1, logical block 61014530
[2591669.639698] lost page write due to I/O error on sdb1
I do not unmount the drive in the script. Can anyone point out what causes this, so that I can fix it?
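Answer 1
When this has happened to me with fixed disks, it meant the disk was on its way out. That is most likely the case here. If this is a backup drive that is repeatedly connected, disconnected, and carried between locations, shock or repeated thermal changes have probably caused a defect. Most of these USB drives have no specific protection against drops, shocks, or thermal changes; they are just standard SATA drives in a plastic USB-to-SATA enclosure.
My rule of thumb with disks (especially where backups are concerned) is: when in doubt, throw it out.
To rule out the USB infrastructure, you could exercise the disk extensively on another computer, but that does not actually solve your problem, since you still have to back up your machines.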
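As a rough way to see whether the USB link dropped before the filesystem went read-only, the kernel log can be scanned for the two messages that appear in the dmesg output above. This is only a sketch: the function name `usb_disconnect_before_ro` is hypothetical, and the message strings are taken verbatim from the log in the question, so other kernel versions may word them differently.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds (exit 0)
# only if a "USB disconnect" line appears before the
# "Remounting filesystem read-only" line.
usb_disconnect_before_ro() {
    awk '
        /USB disconnect/                  { seen = 1 }
        /Remounting filesystem read-only/ { if (seen) found = 1 }
        END                               { exit !found }
    '
}
```

It could be used as `dmesg | usb_disconnect_before_ro && echo "USB link dropped first"`. A match does not say whether the drive, the enclosure, the cable, or the port is at fault; it only narrows down the order of events.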
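Answer 2
To add to David Mackintosh's answer above (which is very good): the filesystem itself can be configured to tell the kernel to remount it read-only when it encounters errors.
From the mount(8) man page:
errors=continue / errors=remount-ro / errors=panic
Define the behaviour when an error is encountered. (Either ignore errors and just mark the file system erroneous and continue, or remount the file system read-only, or panic and halt the system.) The default is set in the filesystem superblock, and can be changed using tune2fs(8).
I would wager that if you are not mounting with errors=remount-ro, then the filesystem superblock has that option set. Here is what the setting looks like in my dumpe2fs output: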
# dumpe2fs /dev/md0 | grep Error
dumpe2fs 1.41.3 (12-Oct-2008)
Errors behavior: Continue
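To check the same setting on the backup drive from a script, the "Errors behavior" line can be extracted from the dumpe2fs header. A minimal sketch follows, assuming the output format shown above; the function name `errors_behavior` is hypothetical, and `/dev/sdb1` in the comments is simply the device from the question's log.

```shell
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and prints the
# value of the "Errors behavior" field (for example Continue or Remount read-only).
errors_behavior() {
    sed -n 's/^Errors behavior:[[:space:]]*//p'
}

# Typical use, run as root (not executed here):
#   dumpe2fs -h /dev/sdb1 2>/dev/null | errors_behavior
# The superblock default can be changed with tune2fs, e.g.:
#   tune2fs -e remount-ro /dev/sdb1
```

Changing the error behaviour only changes how the filesystem reacts to I/O errors; it does nothing about whatever is causing them.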
You can find out what SMART thinks is wrong with the drive by running smartctl:
smartctl -a /dev/<your drive>
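If the check is to run before each backup, the overall health verdict can be tested in a script. The sketch below assumes the ATA-style summary line that `smartctl -H` prints ("SMART overall-health self-assessment test result: PASSED"); the function name `smart_health_passed` is hypothetical. Some USB-to-SATA bridges do not pass SMART commands through unless a device type such as `-d sat` is given, and some do not pass them at all.

```shell
# Hypothetical helper: reads smartctl output on stdin and succeeds (exit 0)
# only if the ATA overall-health line reports PASSED.
smart_health_passed() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Typical use, run as root (not executed here):
#   smartctl -H /dev/sdb | smart_health_passed || echo "SMART health check did not pass"
```

A PASSED result is not proof that the drive is healthy, so the full `smartctl -a` attribute list is still worth reading.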
I agree with David: seriously consider replacing the drive. Nothing is worse than having to restore all your data only to find that it cannot be read.