I have been seeing high load, and MySQL has crashed twice in one week. Could the following be the cause? Any ideas?
Jan 3 09:49:19 HOST kernel: [2272100.568769] res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
Jan 3 09:49:19 HOST kernel: [2272100.569023] ata2.00: status: { DRDY ERR }
Jan 3 09:49:19 HOST kernel: [2272100.569089] ata2.00: error: { UNC }
Jan 3 09:49:19 HOST kernel: [2272100.577394] ata2.00: configured for UDMA/133
Jan 3 09:49:19 HOST kernel: [2272100.577418] ata2: EH complete
Jan 3 09:49:26 HOST kernel: [2272107.699341] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 09:49:26 HOST kernel: [2272107.699569] ata2.00: BMDMA stat 0x25
Jan 3 09:49:26 HOST kernel: [2272107.699643] ata2.00: failed command: READ DMA EXT
Jan 3 09:49:26 HOST kernel: [2272107.699713] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
Jan 3 09:49:26 HOST kernel: [2272107.699715] res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
Jan 3 09:49:26 HOST kernel: [2272107.699966] ata2.00: status: { DRDY ERR }
Jan 3 09:49:26 HOST kernel: [2272107.700030] ata2.00: error: { UNC }
Jan 3 09:49:26 HOST kernel: [2272107.708509] ata2.00: configured for UDMA/133
Jan 3 09:49:26 HOST kernel: [2272107.708534] ata2: EH complete
Jan 3 09:49:33 HOST kernel: [2272114.833522] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 09:49:33 HOST kernel: [2272114.833603] ata2.00: BMDMA stat 0x25
Jan 3 09:49:33 HOST kernel: [2272114.833669] ata2.00: failed command: READ DMA EXT
Jan 3 09:49:33 HOST kernel: [2272114.833737] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
Jan 3 09:49:33 HOST kernel: [2272114.833739] res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
Jan 3 09:49:33 HOST kernel: [2272114.833992] ata2.00: status: { DRDY ERR }
Jan 3 09:49:33 HOST kernel: [2272114.834056] ata2.00: error: { UNC }
Jan 3 09:49:33 HOST kernel: [2272114.842578] ata2.00: configured for UDMA/133
Jan 3 09:49:33 HOST kernel: [2272114.842604] ata2: EH complete
Jan 3 09:49:40 HOST kernel: [2272121.959563] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 09:49:40 HOST kernel: [2272121.959644] ata2.00: BMDMA stat 0x25
Jan 3 09:49:40 HOST kernel: [2272121.959708] ata2.00: failed command: READ DMA EXT
Jan 3 09:49:40 HOST kernel: [2272121.959778] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
Jan 3 09:49:40 HOST kernel: [2272121.959780] res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
Jan 3 09:49:40 HOST kernel: [2272121.961337] ata2.00: status: { DRDY ERR }
Jan 3 09:49:40 HOST kernel: [2272121.961400] ata2.00: error: { UNC }
Jan 3 09:49:40 HOST kernel: [2272121.968673] ata2.00: configured for UDMA/133
Jan 3 09:49:40 HOST kernel: [2272121.968701] sd 1:0:0:0: [sda] Unhandled sense code
Jan 3 09:49:40 HOST kernel: [2272121.968706] sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 3 09:49:40 HOST kernel: [2272121.968714] sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Jan 3 09:49:40 HOST kernel: [2272121.968723] Descriptor sense data with sense descriptors (in hex):
Jan 3 09:49:40 HOST kernel: [2272121.968729] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 3 09:49:40 HOST kernel: [2272121.968743] 35 f1 7f 78
Jan 3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Jan 3 09:49:40 HOST kernel: [2272121.968759] sd 1:0:0:0: [sda] CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00
Jan 3 09:49:40 HOST kernel: [2272121.968778] ata2: EH complete
Jan 3 09:47:45 HOST kernel: [2272007.394223] [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394232] [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394241] [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394253] [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394262] [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394266] [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
Jan 3 09:47:45 HOST kernel: [2272007.394270] [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
Jan 3 09:47:45 HOST kernel: [2272007.394274] [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
Jan 3 09:47:45 HOST kernel: [2272007.394278] [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
Jan 3 09:47:45 HOST kernel: [2272007.394282] [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
Jan 3 09:47:45 HOST kernel: [2272007.394287] [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
Jan 3 09:47:45 HOST kernel: [2272007.394295] [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394299] [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
Jan 3 09:47:45 HOST kernel: [2272007.394303] [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
Jan 3 09:47:45 HOST kernel: [2272007.394307] [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
Jan 3 09:47:45 HOST kernel: [2272007.394311] [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
Jan 3 09:47:45 HOST kernel: [2272007.394315] [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Jan 3 09:47:45 HOST kernel: [2272007.394319] INFO: task mysqld:1241 blocked for more than 120 seconds.
Jan 3 09:47:45 HOST kernel: [2272007.394389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 3 09:47:45 HOST kernel: [2272007.394581] mysqld D ffff88004dda2f40 0 1241 3454 0 0x00000000
Jan 3 09:47:45 HOST kernel: [2272007.394585] ffff88007df63958 0000000000000082 0000000000000000 00000000ffffffff
Jan 3 09:47:45 HOST kernel: [2272007.394590] ffff8800ffffffff 0000000000055c14 ffff88007df638e8 ffffffff8112806e
Jan 3 09:47:45 HOST kernel: [2272007.394594] 000000000001b900 ffff88004dda3508 ffff88007df63fd8 000000000001e9c0
Jan 3 09:47:45 HOST kernel: [2272007.394598] Call Trace:
Jan 3 09:47:45 HOST kernel: [2272007.394601] [<ffffffff8112806e>] ? find_get_page+0x1e/0xa0
Jan 3 09:47:45 HOST kernel: [2272007.394608] [<ffffffffa006d0bd>] do_get_write_access+0x29d/0x510 [jbd2]
Jan 3 09:47:45 HOST kernel: [2272007.394612] [<ffffffff81097db0>] ? wake_bit_function+0x0/0x50
Jan 3 09:47:45 HOST kernel: [2272007.394618] [<ffffffffa006d481>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan 3 09:47:45 HOST kernel: [2272007.394629] [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394643] [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394653] [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394664] [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394677] [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394683] [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
Jan 3 09:47:45 HOST kernel: [2272007.394690] [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
Jan 3 09:47:45 HOST kernel: [2272007.394697] [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
Jan 3 09:47:45 HOST kernel: [2272007.394704] [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
Jan 3 09:47:45 HOST kernel: [2272007.394712] [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
Jan 3 09:47:45 HOST kernel: [2272007.394719] [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
Jan 3 09:47:45 HOST kernel: [2272007.394730] [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
Jan 3 09:47:45 HOST kernel: [2272007.394738] [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
Jan 3 09:47:45 HOST kernel: [2272007.394744] [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
Jan 3 09:47:45 HOST kernel: [2272007.394751] [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
Jan 3 09:47:45 HOST kernel: [2272007.394757] [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
Jan 3 09:47:45 HOST kernel: [2272007.394764] [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Jan 3 09:47:52 HOST kernel: [2272013.885915] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 3 09:47:52 HOST kernel: [2272013.885998] ata2.00: BMDMA stat 0x25
Answer 1
Congratulations, you have hit a classic URE (unrecoverable read error). Your error message even says so explicitly.
Jan 3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Have your data center replace the defective disk.
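The failing location can be read straight out of the log. As a rough sketch (the function name decode_read10 is illustrative, not from the original post): in the Read(10) CDB 28 00 35 f1 7f 78 00 00 38 00, bytes 2 to 5 hold the starting LBA and bytes 7 to 8 hold the transfer length in sectors. Assuming 512-byte logical sectors, the byte count should agree with the "dma 28672" figure in the cmd lines, and the LBA should agree with the 35 f1 7f 78 shown in the sense data.

```shell
# decode_read10 "<ten hex bytes of a Read(10) CDB>"
# Prints: <starting LBA> <sector count> <byte count>
decode_read10() {
    # Split the CDB into the function's positional parameters:
    # $1 = opcode, $3..$6 = LBA (big-endian), $8..$9 = transfer length.
    set -- $1
    lba=$(( 0x$3$4$5$6 ))
    sectors=$(( 0x$8$9 ))
    bytes=$(( sectors * 512 ))   # assumption: 512-byte logical sectors
    echo "$lba $sectors $bytes"
}

# CDB copied from the "CDB: Read(10)" log line.
decode_read10 "28 00 35 f1 7f 78 00 00 38 00"
```

The same request failing at the same LBA every seven seconds is what you would expect from a sector the drive can no longer read, rather than from a transient bus problem.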
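Answer 2
I see multiple "DRDY ERR" messages, which point to a failing hard drive. Have you run fsck -cc
to find and mark the bad sectors?
Note: make sure you boot into another OS first, because you really should not run fsck on a mounted partition. And backup, backup, backup!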
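One way to avoid running fsck on a mounted partition by accident is to check the mounts table first. The sketch below is only an illustration: the function names are made up, /dev/sda1 is a hypothetical partition name (the log only shows the disk as sda), and the helper prints the command instead of executing it. On ext filesystems the -cc option is passed through to e2fsck, where giving -c twice requests a non-destructive read-write badblocks scan.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE is listed as a mount source.
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists to ease testing.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# fsck_badblocks_cmd DEVICE [MOUNTS_FILE]
# Prints the fsck command for DEVICE, or refuses if DEVICE is mounted.
fsck_badblocks_cmd() {
    if is_mounted "$1" "$2"; then
        echo "refusing: $1 is mounted" >&2
        return 1
    fi
    echo "fsck -cc $1"
}

# Hypothetical usage from a rescue system:
#   fsck_badblocks_cmd /dev/sda1
```

Note that the mounts table may name the device differently (for example /dev/root or a /dev/mapper path), so an exact string match like this one can miss a mounted filesystem; treat it as a first check, not a guarantee.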
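Answer 3
First of all, back up your data. That is the top priority.
The hard disk is definitely bad. You do not get DRDY errors, exception Emask entries, and SCSI sense key errors all at the same time for any other reason. They all point to one thing: the disk is failing.
Now look at the call trace. It shows that ext4 got the inode, got the data, and dirtied the inode, but could not write. Wait too long and you risk ending up with a read-only filesystem. Do not run fsck before you have a backup.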
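To see how often writers are getting stuck, you can count the kernel's hung-task reports per process. This is a minimal sketch: the function name hung_tasks is illustrative, and the default log path /var/log/messages is an assumption that depends on your distribution's syslog setup.

```shell
# hung_tasks [LOGFILE]
# Prints "<count> <name>:<pid>" for every task reported as blocked.
hung_tasks() {
    grep -o 'task [^ ]* blocked for more than [0-9]* seconds' "${1:-/var/log/messages}" |
        awk '{ print $2 }' |      # keep only the name:pid field
        sort | uniq -c |          # count repeats per task
        awk '{ print $1, $2 }'    # normalize uniq's leading whitespace
}

# Hypothetical usage:
#   hung_tasks /var/log/messages
```

If mysqld shows up repeatedly around the same timestamps as the ata2.00 media errors, that supports the reading that the blocked writes are a consequence of the failing disk.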
Once you have unmounted the disk and are ready to run fsck, try running it in verbose mode:
fsck -fyv <partition-name>
If you can note down the errors it reports, they may come in handy the next time you run into this problem.