On our RHEL servers (RHEL version 7.2), we see many dmesg lines like the ones below.
Example for the sdb disk (a hard drive):
[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): error count since last fsck: 1329
[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): initial error at time 1614482941: ext4_find_entry:1312: inode 67240512
[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): last error at time 1640670898: ext4_find_entry:1312: inode 67240512
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 Sense Key : Medium Error [current]
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 CDB: Read(10) 28 00 80 41 13 38 00 00 08 00
[Thu Dec 30 13:12:19 2021] blk_update_request: critical medium error, dev sdb, sector 2151748408
[Thu Dec 30 13:14:38 2021] EXT4-fs warning (device sdb): __ext4_read_dirblock:902: error reading directory block (ino 67240512, block 0)
[Thu Dec 30 13:17:05 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:21:26 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 Sense Key : Medium Error [current]
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 CDB: Read(10) 28 00 80 41 13 38 00 00 08 00
[Thu Dec 30 13:21:59 2021] blk_update_request: critical medium error, dev sdb, sector 2151748408
[Thu Dec 30 13:21:59 2021] EXT4-fs warning (device sdb): __ext4_read_dirblock:902: error reading directory block (ino 67240512, block 0)
[Thu Dec 30 13:25:32 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:27:19 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:29:14 2021] NOHZ: local_softirq_pending 08
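Based on the messages above, our questions are:
Is the most likely cause that the hard drive is dying of old age?
If so, what should we do: replace the disk?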
Answer 1
"Dying of old age" implies that the drive is old, and we cannot determine that from the information in the logs.
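What the logs do record is when ext4 first and last noticed an error: the "initial error at time" and "last error at time" values are Unix epoch seconds. A minimal Python sketch to decode them, assuming they are plain seconds since 1970 in UTC (the constant and function names are only illustrative):

```python
from datetime import datetime, timezone

# Epoch values copied from the "initial error" and "last error" dmesg lines.
INITIAL_ERROR = 1614482941
LAST_ERROR = 1640670898


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


first = to_utc(INITIAL_ERROR)
last = to_utc(LAST_ERROR)

print("initial error:", first.isoformat())
print("last error:   ", last.isoformat())
print("days between: ", (last - first).days)
```

This tells you how long the filesystem has been recording errors on this inode, not how old the drive is.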
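However, I assume this is a professional environment; if so, I think any disk medium error should trigger a disk replacement. The "critical medium error" message indicates that this is an error on the disk itself, not a fault between the disk and the system (such as a bad cable). The logs in your question show only one failing sector, so it may well be a localized failure, but if you rely on the data stored there it is not worth the risk.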
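To check the "only one failing sector" reading yourself, you can decode the CDB line: in a READ(10) command, bytes 2 to 5 hold the big-endian logical block address and bytes 7 to 8 hold the transfer length in blocks. A small Python sketch (the function name is hypothetical, and it only handles READ(10)):

```python
def decode_read10(cdb_hex):
    """Return (lba, transfer_length) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    length = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: number of blocks
    return lba, length


# CDB copied from the dmesg output; both failures in the log show the same bytes.
lba, length = decode_read10("28 00 80 41 13 38 00 00 08 00")
print("LBA:", lba, "blocks requested:", length)
```

The decoded LBA matches the sector number in the blk_update_request lines, and both failed reads in the log carry the same CDB, so the same location is failing repeatedly rather than errors being spread across the disk.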
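If there are only one (or a few) failing sectors, you can try to remap them to keep using the drive (temporarily); see "smartctl retest bad sectors", for example.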
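Before remapping anything, it helps to know which filesystem block the bad sector belongs to. A rough Python sketch of the arithmetic, under assumptions you should verify on your system: 512-byte logical sectors, a 4096-byte ext4 block size, and a filesystem that starts at sector 0 (the log names sdb rather than a partition such as sdb1). The function and parameter names are only illustrative:

```python
def sector_to_fs_block(sector, fs_start_sector=0, sector_size=512, block_size=4096):
    """Map a device sector to (filesystem block, byte offset inside that block).

    All defaults are assumptions, not values read from the disk.
    """
    byte_offset = (sector - fs_start_sector) * sector_size
    return divmod(byte_offset, block_size)


# Sector taken from the blk_update_request lines in the dmesg output.
block, offset = sector_to_fs_block(2151748408)
print("filesystem block:", block, "offset in block:", offset)
```

With those assumptions, the resulting block number is the kind of value you would pass to a tool such as debugfs (its icheck command) to see which inode owns the block; if your sector size, block size, or partition offset differ, the result will differ too.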