For the past two days, one of my MySQL servers has been unable to start. Every 5 seconds I get messages like this in the system log:
Dec 17 09:24:35 backup kernel: [ 681.132013] ata2.00: exception Emask 0x0 SAct 0x50000 SErr 0x0 action 0x0
Dec 17 09:24:35 backup kernel: [ 681.132046] ata2.00: irq_stat 0x40000008
Dec 17 09:24:35 backup kernel: [ 681.132071] ata2.00: failed command: READ FPDMA QUEUED
Dec 17 09:24:35 backup kernel: [ 681.132105] ata2.00: cmd 60/20:80:00:e6:4d/00:00:78:00:00/40 tag 16 ncq 16384 in
Dec 17 09:24:35 backup kernel: [ 681.132105] res 41/40:20:00:e6:4d/00:00:78:00:00/00 Emask 0x409 (media error) <F>
Dec 17 09:24:35 backup kernel: [ 681.132167] ata2.00: status: { DRDY ERR }
Dec 17 09:24:35 backup kernel: [ 681.132183] ata2.00: error: { UNC }
Dec 17 09:24:35 backup kernel: [ 681.165698] ata2.00: configured for UDMA/133
Dec 17 09:24:35 backup kernel: [ 681.165714] sd 1:0:0:0: [sdb] Unhandled sense code
Dec 17 09:24:35 backup kernel: [ 681.165717] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [ 681.165719] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 17 09:24:35 backup kernel: [ 681.165722] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [ 681.165723] Sense Key : Medium Error [current] [descriptor]
Dec 17 09:24:35 backup kernel: [ 681.165727] Descriptor sense data with sense descriptors (in hex):
Dec 17 09:24:35 backup kernel: [ 681.165729] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 17 09:24:35 backup kernel: [ 681.165738] 78 4d e6 00
Dec 17 09:24:35 backup kernel: [ 681.165742] sd 1:0:0:0: [sdb]
Dec 17 09:24:35 backup kernel: [ 681.165744] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 17 09:24:35 backup kernel: [ 681.165747] sd 1:0:0:0: [sdb] CDB:
Dec 17 09:24:35 backup kernel: [ 681.165748] Read(16): 88 00 00 00 00 00 78 4d e6 00 00 00 00 20 00 00
Dec 17 09:24:35 backup kernel: [ 681.165759] end_request: I/O error, dev sdb, sector 2018371072
Dec 17 09:24:35 backup kernel: [ 681.165802] ata2: EH complete
Dec 17 09:24:41 backup /etc/mysql/debian-start[9912]: Upgrading MySQL tables if necessary.
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: Looking for 'mysql' as: /usr/bin/mysql
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Dec 17 09:24:41 backup /etc/mysql/debian-start[9916]: FATAL ERROR: Upgrade failed
Dec 17 09:24:41 backup /etc/mysql/debian-start[9930]: Checking for insecure root accounts.
dmesg:
[ 721.604068] ata2.00: exception Emask 0x0 SAct 0x600000 SErr 0x0 action 0x0
[ 721.604102] ata2.00: irq_stat 0x40000008
[ 721.604127] ata2.00: failed command: READ FPDMA QUEUED
[ 721.604161] ata2.00: cmd 60/20:a8:00:e6:4d/00:00:78:00:00/40 tag 21 ncq 16384 in
[ 721.604161] res 41/40:20:00:e6:4d/00:00:78:00:00/00 Emask 0x409 (media error) <F>
[ 721.604223] ata2.00: status: { DRDY ERR }
[ 721.604239] ata2.00: error: { UNC }
[ 721.630858] ata2.00: configured for UDMA/133
[ 721.630875] sd 1:0:0:0: [sdb] Unhandled sense code
[ 721.630878] sd 1:0:0:0: [sdb]
[ 721.630880] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 721.630882] sd 1:0:0:0: [sdb]
[ 721.630884] Sense Key : Medium Error [current] [descriptor]
[ 721.630887] Descriptor sense data with sense descriptors (in hex):
[ 721.630889] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 721.630898] 78 4d e6 00
[ 721.630902] sd 1:0:0:0: [sdb]
[ 721.630905] Add. Sense: Unrecovered read error - auto reallocate failed
[ 721.630907] sd 1:0:0:0: [sdb] CDB:
[ 721.630908] Read(16): 88 00 00 00 00 00 78 4d e6 00 00 00 00 20 00 00
[ 721.630919] end_request: I/O error, dev sdb, sector 2018371072
[ 721.630962] ata2: EH complete
[ 721.673419] init: mysql main process (10229) terminated with status 1
[ 721.673442] init: mysql main process ended, respawning
How can I check whether the HDD (software RAID 1) is faulty? I tried this:
# smartctl -H /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-35-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
That looks fine to me...
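Answer 1
Your hard drive is failing. Replace it and restore from backup.
SMART's overall health verdict is not always reliable: a disk can report PASSED while it is already returning unrecoverable read errors, as the kernel log here shows.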
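The overall verdict from `smartctl -H` is the drive's own assessment, which typically only fails once an attribute crosses its vendor threshold. The raw counters from `smartctl -A` can be non-zero well before that. Below is a minimal sketch that flags them. The function name `flag_bad_sectors` is made up for illustration, and the attribute names are the ones smartmontools commonly prints for ATA disks, so this particular drive may report a different set.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints every
# sector-health attribute whose raw value (10th column) is non-zero.
# Exits with status 1 if anything was flagged, 0 otherwise.
flag_bad_sectors() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
            print $2 "=" $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (needs root and smartmontools; /dev/sdb is the disk from the question):
#   smartctl -A /dev/sdb | flag_bad_sectors || echo "sdb reports bad sectors"
```

Any non-zero pending or uncorrectable count, together with the `UNC` errors in the kernel log, would support replacing the disk regardless of the PASSED verdict.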
Answer 2
That disk is on its way out; seeing failed ATA commands like these is not a good sign. You can run a long self-test with smartctl:
smartctl --test=long /dev/sdb
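But if you are running it in RAID 1 under MDRAID, I would honestly just consider replacing it, because it does not look good. The exception is if it is attached through a RAID card or SATA expander, in which case try plugging it directly into the motherboard first.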
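To follow up on the long test and see whether the mirror is still intact, something along these lines might help. It is only a sketch: `degraded_arrays` is a hypothetical helper, and `/dev/md0` and `/dev/sdb1` are placeholder names, since the question does not say which array or partition is involved.

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every md array whose member map (e.g. [U_]) shows a missing member.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }
        /\[U*_[U_]*\]/ && name != "" { print name }
    '
}

# Once the long test has finished, read the self-test log:
#   smartctl -l selftest /dev/sdb
#
# Check whether any array has already dropped a member:
#   degraded_arrays < /proc/mdstat
#   mdadm --detail /dev/md0          # placeholder array name
#
# When replacing the disk, mark it failed and remove it first, then add the
# new disk's partition after the swap (placeholder device names):
#   mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
#   mdadm /dev/md0 --add /dev/sdb1
```

The commands that touch real devices are left as comments on purpose, so nothing destructive runs if the block is pasted as-is.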