How to check a hard drive for bad blocks in Linux

2024-5-30

For two days now, a MySQL server has been failing to start. Every 5 seconds I get this message in the syslog:

Dec 17 09:24:35 backup kernel: [  681.132013] ata2.00: exception Emask 0x0 SAct 0x50000 SErr 0x0 action 0x0
Dec 17 09:24:35 backup kernel: [  681.132046] ata2.00: irq_stat 0x40000008
Dec 17 09:24:35 backup kernel: [  681.132071] ata2.00: failed command: READ FPDMA QUEUED
Dec 17 09:24:35 backup kernel: [  681.132105] ata2.00: cmd 60/20:80:00:e6:4d/00:00:78:00:00/40 tag 16 ncq 16384 in
Dec 17 09:24:35 backup kernel: [  681.132105]          res 41/40:20:00:e6:4d/00:00:78:00:00/00 Emask 0x409 (media error) <F>
Dec 17 09:24:35 backup kernel: [  681.132167] ata2.00: status: { DRDY ERR }
Dec 17 09:24:35 backup kernel: [  681.132183] ata2.00: error: { UNC }
Dec 17 09:24:35 backup kernel: [  681.165698] ata2.00: configured for UDMA/133
Dec 17 09:24:35 backup kernel: [  681.165714] sd 1:0:0:0: [sdb] Unhandled sense code
Dec 17 09:24:35 backup kernel: [  681.165717] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [  681.165719] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 17 09:24:35 backup kernel: [  681.165722] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [  681.165723] Sense Key : Medium Error [current] [descriptor]
Dec 17 09:24:35 backup kernel: [  681.165727] Descriptor sense data with sense descriptors (in hex):
Dec 17 09:24:35 backup kernel: [  681.165729]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 17 09:24:35 backup kernel: [  681.165738]         78 4d e6 00
Dec 17 09:24:35 backup kernel: [  681.165742] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [  681.165744] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 17 09:24:35 backup kernel: [  681.165747] sd 1:0:0:0: [sdb] CDB:
Dec 17 09:24:35 backup kernel: [  681.165748] Read(16): 88 00 00 00 00 00 78 4d e6 00 00 00 00 20 00 00
Dec 17 09:24:35 backup kernel: [  681.165759] end_request: I/O error, dev sdb, sector 2018371072
Dec 17 09:24:35 backup kernel: [  681.165802] ata2: EH complete
Dec 17 09:24:41 backup /etc/mysql/debian-start[9912]: Upgrading MySQL tables if necessary.
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: Looking for 'mysql' as: /usr/bin/mysql
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: FATAL ERROR: Upgrade failed
Dec 17 09:24:41 backup /etc/mysql/debian-start[9930]: Checking for insecure root accounts.
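A quick way to see which sectors are failing, and how often, is to collect the kernel's "I/O error, dev X, sector N" lines. The Python sketch below is illustrative, not a standard tool: the function name `failing_sectors` is hypothetical, the sample lines are copied from the logs in this question, and the regex assumes the wording shown there (the prefix before "I/O error" differs between kernel versions, so it is deliberately not matched).

```python
import re
from collections import Counter

# Matches the block-layer error line from the logs above, e.g.
# "end_request: I/O error, dev sdb, sector 2018371072".
# The prefix before "I/O error" varies by kernel version, so it is not part of the pattern.
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_lines):
    """Count how many times each (device, sector) pair appears in I/O error lines."""
    counts = Counter()
    for line in log_lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Lines copied from the syslog and dmesg excerpts in the question.
sample = [
    "Dec 17 09:24:35 backup kernel: [  681.165759] end_request: I/O error, dev sdb, sector 2018371072",
    "Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: FATAL ERROR: Upgrade failed",
    "[  721.630919] end_request: I/O error, dev sdb, sector 2018371072",
]

for (device, sector), hits in failing_sectors(sample).items():
    print(f"/dev/{device} sector {sector}: {hits} error(s)")
```

On the affected machine the same function could be fed the lines of `dmesg` output or of the syslog file (its path, for example `/var/log/syslog`, depends on the distribution). Here both excerpts point at the same sector, 2018371072, on /dev/sdb.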

And in dmesg:

[  721.604068] ata2.00: exception Emask 0x0 SAct 0x600000 SErr 0x0 action 0x0
[  721.604102] ata2.00: irq_stat 0x40000008
[  721.604127] ata2.00: failed command: READ FPDMA QUEUED
[  721.604161] ata2.00: cmd 60/20:a8:00:e6:4d/00:00:78:00:00/40 tag 21 ncq 16384 in
[  721.604161]          res 41/40:20:00:e6:4d/00:00:78:00:00/00 Emask 0x409 (media error) <F>
[  721.604223] ata2.00: status: { DRDY ERR }
[  721.604239] ata2.00: error: { UNC }
[  721.630858] ata2.00: configured for UDMA/133
[  721.630875] sd 1:0:0:0: [sdb] Unhandled sense code
[  721.630878] sd 1:0:0:0: [sdb]
[  721.630880] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  721.630882] sd 1:0:0:0: [sdb]
[  721.630884] Sense Key : Medium Error [current] [descriptor]
[  721.630887] Descriptor sense data with sense descriptors (in hex):
[  721.630889]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[  721.630898]         78 4d e6 00
[  721.630902] sd 1:0:0:0: [sdb]
[  721.630905] Add. Sense: Unrecovered read error - auto reallocate failed
[  721.630907] sd 1:0:0:0: [sdb] CDB:
[  721.630908] Read(16): 88 00 00 00 00 00 78 4d e6 00 00 00 00 20 00 00
[  721.630919] end_request: I/O error, dev sdb, sector 2018371072
[  721.630962] ata2: EH complete
[  721.673419] init: mysql main process (10229) terminated with status 1
[  721.673442] init: mysql main process ended, respawning
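The hex dumps in the log can be cross-checked against the reported sector. The sketch below decodes the `Read(16)` CDB and the descriptor-format sense data, following the SCSI READ(16) layout (LBA in bytes 2-9, transfer length in bytes 10-13) and the descriptor sense layout (sense key, ASC, ASCQ in bytes 1-3, descriptors from byte 8). It is a minimal sketch: the function names are hypothetical and only the Information descriptor (type 0x00) is interpreted.

```python
def decode_read16(cdb: bytes):
    """Return (lba, block_count) from a 16-byte SCSI READ(16) CDB (opcode 0x88)."""
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length in blocks
    return lba, blocks


def decode_sense(sense: bytes):
    """Return (sense_key, asc, ascq, information) from descriptor-format sense data.

    Only the Information descriptor (type 0x00) is interpreted; other descriptors are skipped.
    """
    if len(sense) < 8 or sense[0] not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    sense_key = sense[1] & 0x0F
    asc, ascq = sense[2], sense[3]
    end = min(8 + sense[7], len(sense))  # byte 7: additional sense length
    information = None
    offset = 8
    while offset + 1 < end:
        desc_type, desc_len = sense[offset], sense[offset + 1]
        # Information descriptor: VALID bit in byte 2, 8-byte value in bytes 4-11.
        if desc_type == 0x00 and desc_len == 0x0A and sense[offset + 2] & 0x80:
            information = int.from_bytes(sense[offset + 4:offset + 12], "big")
        offset += 2 + desc_len
    return sense_key, asc, ascq, information


# Byte strings copied from the "CDB:" and "Descriptor sense data" lines in the log.
cdb = bytes.fromhex("88 00 00 00 00 00 78 4d e6 00 00 00 00 20 00 00")
sense = bytes.fromhex("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 78 4d e6 00")

lba, blocks = decode_read16(cdb)
key, asc, ascq, info = decode_sense(sense)
print(f"READ(16): {blocks} blocks starting at LBA {lba}")
print(f"sense key {key:#x}, ASC/ASCQ {asc:#04x}/{ascq:#04x}, information (failing LBA) {info}")
```

Both the CDB and the sense data decode to LBA 2018371072, the same sector as the `end_request` line, and the 32-block transfer matches the 16384-byte NCQ read (assuming 512-byte sectors). Sense key 0x3 is the Medium Error and ASC/ASCQ 0x11/0x04 is the "Unrecovered read error - auto reallocate failed" that the kernel printed.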

How can I check whether the HDD (a member of a software RAID 1) has a problem? I tried this:

# smartctl -H /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-35-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

It looks fine to me...
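`smartctl -H` only prints the drive's overall verdict. The per-attribute table from `smartctl -A /dev/sdb` is more telling for bad sectors, in particular the raw counters Reallocated_Sector_Ct (5), Current_Pending_Sector (197) and Offline_Uncorrectable (198). The sketch below flags those when non-zero; the helper names are hypothetical, attribute IDs and names can differ between vendors, and the sample table is made up for illustration, not taken from the asker's drive.

```python
import re
import subprocess

# Attributes that track bad or suspect sectors. IDs follow smartctl's usual labels;
# vendors can deviate, so treat this set as an assumption.
WATCHED_IDS = {5, 197, 198}  # Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable


def read_attributes(device):
    """Return the stdout of `smartctl -A <device>` (needs root; not called here)."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout


def suspicious_attributes(smartctl_output):
    """Return {name: raw_value} for watched attributes whose raw value is non-zero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        raw = re.match(r"\d+", fields[9])
        if raw and int(raw.group()) > 0:
            found[fields[1]] = int(raw.group())
    return found


# Hypothetical `smartctl -A` excerpt, NOT output from the asker's drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17532
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

print(suspicious_attributes(SAMPLE))
```

On the real machine this would be `suspicious_attributes(read_attributes("/dev/sdb"))`, run as root. Non-zero pending or uncorrectable counts can coexist with a PASSED overall verdict.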

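Answer 1

Your hard drive is failing. Replace it and restore from backup.

SMART is not always reliable: the overall self-assessment can still report PASSED while the drive is logging unrecoverable read errors, as yours is.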

Answer 2

That disk is about to die; those failed ATA commands are not a good sign. You can run a long self-test with smartctl:

smartctl --test=long /dev/sdb
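`smartctl --test=long` returns immediately; the test runs inside the drive and its result appears later in the self-test log, which `smartctl -l selftest /dev/sdb` prints. The sketch below parses that log and pulls out the LBA of the first error, if any. The function names are hypothetical, the regex assumes the usual ATA self-test log layout, and the sample log is made up for illustration, not output from the asker's drive.

```python
import re
import subprocess

# One row of the ATA self-test log, e.g.
# "# 1  Extended offline    Completed: read failure       90%     17532         123456789"
# Groups: entry number, description + status text, remaining %, lifetime hours, first-error LBA.
ROW_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def read_selftest_log(device):
    """Return the stdout of `smartctl -l selftest <device>` (needs root; not called here)."""
    result = subprocess.run(["smartctl", "-l", "selftest", device], capture_output=True, text=True)
    return result.stdout


def parse_selftest_log(text):
    """Parse self-test log rows into dicts; first_error_lba is None when the column shows '-'."""
    entries = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue
        lba = match.group(5)
        entries.append({
            "num": int(match.group(1)),
            "result": match.group(2),
            "remaining_pct": int(match.group(3)),
            "lifetime_hours": int(match.group(4)),
            "first_error_lba": int(lba) if lba.isdigit() else None,
        })
    return entries


# Hypothetical log excerpt for illustration, NOT output from the asker's drive.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     17532         123456789
# 2  Short offline       Completed without error       00%     17500         -
"""

for entry in parse_selftest_log(SAMPLE_LOG):
    print(entry["num"], entry["result"], entry["first_error_lba"])
```

A read failure with a concrete LBA in this log is a stronger signal than the overall PASSED verdict, and that LBA can be compared with the sector numbers the kernel reports.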

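But if you are running it in RAID1 with MDRAID, honestly I would consider replacing it, because it does not look good. The exception is if it is connected through a RAID card or SATA expander; in that case, try plugging it directly into the motherboard first.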
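Before pulling a RAID1 member, it is worth confirming that the other half of the mirror is healthy. `/proc/mdstat` shows each md array with a member summary such as `[2/2] [UU]`, where `U` is an active member and `_` a missing one, and marks failed members with `(F)`. The sketch below is a hypothetical helper that assumes the common `mdX : active raid1 ...` layout; the sample text is made up for illustration, and the actual `mdadm` replacement steps are not shown.

```python
import re

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(\S+)\s+(.*)$")
# Member summary, e.g. "[2/2] [UU]": U = member up, _ = member missing or failed.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def read_mdstat(path="/proc/mdstat"):
    """Return the text of /proc/mdstat (not called here)."""
    with open(path) as handle:
        return handle.read()


def mirror_status(mdstat_text):
    """Map each md array to its state, level, members, U/_ status and failed members."""
    arrays = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            members = header.group(4).split()
            arrays[current] = {
                "state": header.group(2),
                "level": header.group(3),
                "members": members,
                "failed": [m for m in members if m.endswith("(F)")],
                "status": None,
                "degraded": None,
            }
            continue
        summary = STATUS_RE.search(line)
        if current and summary:
            arrays[current]["status"] = summary.group(3)
            arrays[current]["degraded"] = "_" in summary.group(3)
            current = None
    return arrays


# Hypothetical /proc/mdstat contents, NOT taken from the asker's server.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb2[1](F) sda2[0]
      975585280 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

for name, info in mirror_status(SAMPLE_MDSTAT).items():
    print(name, info["level"], info["status"], "degraded" if info["degraded"] else "ok", info["failed"])
```

If every array still shows its surviving member as `U`, the suspect disk can be removed from the mirror and replaced without losing redundancy on the remaining one.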
