MySQL crashes. Hard disk or hardware failure?

I have seen high load, and MySQL has crashed twice within one week. Could the errors below be the cause? Any ideas?

    Jan  3 09:49:19 HOST kernel: [2272100.568769]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:19 HOST kernel: [2272100.569023] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:19 HOST kernel: [2272100.569089] ata2.00: error: { UNC }
    Jan  3 09:49:19 HOST kernel: [2272100.577394] ata2.00: configured for UDMA/133
    Jan  3 09:49:19 HOST kernel: [2272100.577418] ata2: EH complete
    Jan  3 09:49:26 HOST kernel: [2272107.699341] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:26 HOST kernel: [2272107.699569] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:26 HOST kernel: [2272107.699643] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:26 HOST kernel: [2272107.699713] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:26 HOST kernel: [2272107.699715]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:26 HOST kernel: [2272107.699966] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:26 HOST kernel: [2272107.700030] ata2.00: error: { UNC }
    Jan  3 09:49:26 HOST kernel: [2272107.708509] ata2.00: configured for UDMA/133
    Jan  3 09:49:26 HOST kernel: [2272107.708534] ata2: EH complete
    Jan  3 09:49:33 HOST kernel: [2272114.833522] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:33 HOST kernel: [2272114.833603] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:33 HOST kernel: [2272114.833669] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:33 HOST kernel: [2272114.833737] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:33 HOST kernel: [2272114.833739]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:33 HOST kernel: [2272114.833992] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:33 HOST kernel: [2272114.834056] ata2.00: error: { UNC }
    Jan  3 09:49:33 HOST kernel: [2272114.842578] ata2.00: configured for UDMA/133
    Jan  3 09:49:33 HOST kernel: [2272114.842604] ata2: EH complete
    Jan  3 09:49:40 HOST kernel: [2272121.959563] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:40 HOST kernel: [2272121.959644] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:40 HOST kernel: [2272121.959708] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:40 HOST kernel: [2272121.959778] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:40 HOST kernel: [2272121.959780]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:40 HOST kernel: [2272121.961337] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:40 HOST kernel: [2272121.961400] ata2.00: error: { UNC }
    Jan  3 09:49:40 HOST kernel: [2272121.968673] ata2.00: configured for UDMA/133
    Jan  3 09:49:40 HOST kernel: [2272121.968701] sd 1:0:0:0: [sda] Unhandled sense code
    Jan  3 09:49:40 HOST kernel: [2272121.968706] sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Jan  3 09:49:40 HOST kernel: [2272121.968714] sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
    Jan  3 09:49:40 HOST kernel: [2272121.968723] Descriptor sense data with sense descriptors (in hex):
    Jan  3 09:49:40 HOST kernel: [2272121.968729]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
    Jan  3 09:49:40 HOST kernel: [2272121.968743]         35 f1 7f 78
    Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
    Jan  3 09:49:40 HOST kernel: [2272121.968759] sd 1:0:0:0: [sda] CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00
    Jan  3 09:49:40 HOST kernel: [2272121.968778] ata2: EH complete
    Jan  3 09:47:45 HOST kernel: [2272007.394223]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394232]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394241]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394253]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394262]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394266]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
    Jan  3 09:47:45 HOST kernel: [2272007.394270]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394274]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
    Jan  3 09:47:45 HOST kernel: [2272007.394278]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
    Jan  3 09:47:45 HOST kernel: [2272007.394282]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394287]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
    Jan  3 09:47:45 HOST kernel: [2272007.394295]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394299]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
    Jan  3 09:47:45 HOST kernel: [2272007.394303]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
    Jan  3 09:47:45 HOST kernel: [2272007.394307]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394311]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394315]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
    Jan  3 09:47:45 HOST kernel: [2272007.394319] INFO: task mysqld:1241 blocked for more than 120 seconds.
    Jan  3 09:47:45 HOST kernel: [2272007.394389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jan  3 09:47:45 HOST kernel: [2272007.394581] mysqld        D ffff88004dda2f40     0  1241   3454    0 0x00000000
    Jan  3 09:47:45 HOST kernel: [2272007.394585]  ffff88007df63958 0000000000000082 0000000000000000 00000000ffffffff
    Jan  3 09:47:45 HOST kernel: [2272007.394590]  ffff8800ffffffff 0000000000055c14 ffff88007df638e8 ffffffff8112806e
    Jan  3 09:47:45 HOST kernel: [2272007.394594]  000000000001b900 ffff88004dda3508 ffff88007df63fd8 000000000001e9c0
    Jan  3 09:47:45 HOST kernel: [2272007.394598] Call Trace:
    Jan  3 09:47:45 HOST kernel: [2272007.394601]  [<ffffffff8112806e>] ? find_get_page+0x1e/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394608]  [<ffffffffa006d0bd>] do_get_write_access+0x29d/0x510 [jbd2]
    Jan  3 09:47:45 HOST kernel: [2272007.394612]  [<ffffffff81097db0>] ? wake_bit_function+0x0/0x50
    Jan  3 09:47:45 HOST kernel: [2272007.394618]  [<ffffffffa006d481>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
    Jan  3 09:47:45 HOST kernel: [2272007.394629]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394643]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394653]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394664]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394677]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394683]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
    Jan  3 09:47:45 HOST kernel: [2272007.394690]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394697]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
    Jan  3 09:47:45 HOST kernel: [2272007.394704]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
    Jan  3 09:47:45 HOST kernel: [2272007.394712]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394719]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
    Jan  3 09:47:45 HOST kernel: [2272007.394730]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394738]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
    Jan  3 09:47:45 HOST kernel: [2272007.394744]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
    Jan  3 09:47:45 HOST kernel: [2272007.394751]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394757]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394764]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
    Jan  3 09:47:52 HOST kernel: [2272013.885915] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:47:52 HOST kernel: [2272013.885998] ata2.00: BMDMA stat 0x25

Answer 1

Congratulations, you have hit a classic URE (unrecoverable read error). Your error message even says so explicitly:

    Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

Have your data center replace the defective disk.
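To see which sector the drive keeps failing on, the Read(10) CDB line from the question's log can be decoded by hand. The following is a minimal sketch, not part of the original answer: the function name `decode_read10` is made up for illustration, and the 512-byte sector size is an assumption. It relies only on the standard Read(10) layout (bytes 2 to 5 hold the logical block address, bytes 7 to 8 the transfer length).

```python
import re

# Line copied from the kernel log in the question (timestamp prefix dropped).
LOG_LINE = "sd 1:0:0:0: [sda] CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00"

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_read10(line):
    """Return (lba, sector_count) parsed from a 'CDB: Read(10): ...' log line."""
    match = re.search(r"Read\(10\):((?:\s+[0-9a-fA-F]{2}){10})", line)
    if match is None:
        raise ValueError("no Read(10) CDB found in line")
    cdb = bytes.fromhex("".join(match.group(1).split()))
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


lba, count = decode_read10(LOG_LINE)
print(f"LBA {lba} (0x{lba:x}), {count} sectors, {count * SECTOR_SIZE} bytes")
```

The decoded length of 56 sectors at 512 bytes each is 28672 bytes, which agrees with the `dma 28672 in` field of the failed READ DMA EXT command, and the address 0x35f17f78 is the same one that appears in the sense data. Note that the address is relative to the whole disk (sda), not to a partition.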

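Answer 2

I see multiple "DRDY ERR" messages, which are only ever associated with hard disk failure. Have you run fsck -cc to find bad sectors and mark them?

Note: make sure you boot into another operating system first, because you really should not run fsck on a mounted partition. And backup, backup, backup!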
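Before running fsck, it can help to confirm that the partition really is not mounted. Below is a rough sketch of such a check, assuming a Linux system where mounts are listed in `/proc/mounts`; the helper name `is_mounted` and the sample text are hypothetical. It only compares the device name literally, so a filesystem mounted under another name (a UUID path or a device-mapper alias, for example) would not be detected.

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is the source of any entry in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


# Hypothetical sample in /proc/mounts format, used so the sketch runs anywhere.
# On a real system, pass open("/proc/mounts").read() instead.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""

print(is_mounted("/dev/sda1", SAMPLE_MOUNTS))
print(is_mounted("/dev/sda2", SAMPLE_MOUNTS))
```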

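Answer 3

First of all, back up your data. That is the top priority.

The hard disk is definitely bad. You would not get DRDY errors, exception Emask messages and SCSI sense key errors all at the same time otherwise. All of these point to one thing: the disk has failed.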
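The three symptoms named above can be counted in a saved kernel log with a few regular expressions. This is only an illustrative sketch: the pattern set, the dictionary keys and the function name `count_indicators` are assumptions chosen to match the wording of the log in the question, not a complete list of libata or SCSI error formats.

```python
import re
from collections import Counter

# Patterns modeled on the log lines in the question (assumed, not exhaustive).
INDICATORS = {
    "exception_emask": re.compile(r"exception Emask 0x[0-9a-f]+"),
    "drdy_err": re.compile(r"status: \{ DRDY ERR \}"),
    "sense_medium_error": re.compile(r"Sense Key : Medium Error"),
}


def count_indicators(lines):
    """Count how many lines match each disk-failure indicator."""
    counts = Counter()
    for line in lines:
        for name, pattern in INDICATORS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Lines taken from the question's log, with the timestamp prefix dropped.
SAMPLE = [
    "ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "ata2.00: status: { DRDY ERR }",
    "ata2.00: error: { UNC }",
    "sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]",
]

print(dict(count_indicators(SAMPLE)))
```

On a real system the same function could be fed the lines of a saved log file; seeing all three counters above zero for the same device is the combination this answer treats as conclusive.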

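Now look at the call trace. It shows that ext4 got the inode, fetched the data and dirtied the inode, but could not write it out. Wait too long and you risk ending up with a read-only filesystem. Do not run fsck before you have a backup.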
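The call traces in the question are emitted by the kernel's hung-task detector, and its "blocked for more than N seconds" lines show which processes were stuck waiting on the disk. A small sketch for pulling those out of a log follows; the regular expression is modeled on the single message format seen in the question and the function name `find_hung_tasks` is hypothetical, so other kernel versions may need a different pattern.

```python
import re

# Pattern modeled on the hung-task line in the question's log (an assumption).
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines):
    """Return a list of (task_name, pid, seconds) for each hung-task report."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return found


# Lines copied from the kernel log in the question.
SAMPLE = [
    "Jan  3 09:47:45 HOST kernel: [2272007.394319] INFO: task mysqld:1241 blocked for more than 120 seconds.",
    "Jan  3 09:47:45 HOST kernel: [2272007.394598] Call Trace:",
]

print(find_hung_tasks(SAMPLE))
```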

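Once you have unmounted the disk and are ready to run fsck, try running it in verbose mode:

    fsck -fyv <partition-name>

If you can note down the errors, they may be useful the next time you run into this problem.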
