什么原因会导致 mysqlcheck 错误地报告表未损坏?

什么原因会导致 mysqlcheck 错误地报告表未损坏?

我们正在为我们的一位客户管理 MySQL 服务器,该服务器有 100 多个数据库,每个数据库大约有 50 个表,其中许多是 InnoDB 表。服务器崩溃了,我正在尝试找出罪魁祸首。使用 重新启动时innodb_force_recovery = 2,我可以连接,并且在 error.log 中看不到任何错误。更重要的是,mysqlcheck --all-databases所有表都报告“正常”。但是当我删除 时innodb_force_recovery,服务器再次崩溃,将堆栈跟踪写入 error.log,并且只能使用 停止kill -9

在这种情况下,我该如何找到有问题的数据库,什么会导致 mysqlcheck 错过损坏的表?请不要告诉我忽略它并从转储中恢复所有数据库。对于一两个数据库,如果这种情况很少发生,这可能是可以接受的,但我不止一次遇到过同一台服务器的问题,每次从转储中恢复所有内容都需要太多的时间和手动工作。

服务器版本为5.5.46,并且innodb_file_per_table处于活跃状态。

根据要求摘录的 error.log(这是否The tablespace free space info is corrupt意味着存在一个不在特定表中且无法更正的错误?):

180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
180222 17:13:48 InnoDB: Using Linux native AIO
180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
180222 17:13:49 InnoDB: Completed initialization of buffer pool
180222 17:13:49 InnoDB: highest supported file format is Barracuda.
180222 17:13:49  InnoDB: Waiting for the background threads to start
180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
180222 17:13:50 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.46'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL) by Remi
InnoDB: Dump of the tablespace extent descriptor:  len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc                    n                    ;
InnoDB: Serious error! InnoDB is trying to free page 512
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180222 17:13:50  InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:13:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=512
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
/usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
[0x7bc420]
[0x7bc410]
/lib/libc.so.6(gsignal+0x50)[0x626b10]
/lib/libc.so.6(abort+0x101)[0x628421]
/usr/libexec/mysqld[0x85012e7]
/usr/libexec/mysqld[0x850147e]
/usr/libexec/mysqld[0x849c0b1]
/usr/libexec/mysqld[0x84a8a61]
/usr/libexec/mysqld[0x8561fef]
/usr/libexec/mysqld[0x85570a9]
/usr/libexec/mysqld[0x847b082]
/usr/libexec/mysqld[0x846bf04]
/usr/libexec/mysqld[0x846dad4]
/lib/libpthread.so.0[0x50d912]
/lib/libc.so.6(clone+0x5e)[0x6d347e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180222 17:13:50 mysqld_safe Number of processes running now: 0
180222 17:13:50 mysqld_safe mysqld restarted

答案1

错误消息本身会告诉您为什么验证表数据无法解决问题。它尝试释放一个页面(可能来自表或索引);但该页面已被标记为空闲。换句话说,您的一个表或索引正在使用一个页面,而该页面被认为可以提供给另一个表或索引。而且,显然,如果 InnoDB 不知道哪些页面实际上是空闲的,就会发生糟糕的事情。

转储所有表/索引数据并重新加载意味着让 InnoDB 有机会重建其可用页面集合。理想情况下,您会在全新数据库上执行此操作。为什么?因为您永远不应该假设只有一个、单一、孤立的损坏错误。因此,如果您将数据移动到全新安装,则不必担心是否存在其他未检测到的损坏问题。

答案2

按照建议,我最终选择转储并从这些转储中恢复所有数据库。幸运的是,它innodb_force_recovery = 2允许我转储所有内容而没有任何错误,这样我就不必使用备份中的转储。当然,我宁愿找出错误的真正原因,但除了错误报告中提到“表空间可用空间信息已损坏”之外,MySQL 没有任何帮助。如果不找出并消除错误原因,我预计它最终会再次发生 - 我们的客户可能会比我更生气。

可能是硬件故障导致的,但系统所有磁盘的 SMART 数据都正常,/var/log/messages崩溃时没有任何可疑信息。也没有意外断电或重启。

相关内容