我有一个 RDS MySQL 实例,其 MySQL 版本为 5.7.19,其时间点还原失败,状态为“不兼容状态”。
通过实验,我确定时间点恢复直到拍摄下一个快照才会失败,并且在拍摄快照后立即恢复。
AWS Support 告诉我
恢复操作遇到了插入/更新包含非编码二进制字符的字符串的情况,这是解码问题的已知根源。此问题是一个终极问题,因为它使我们无法了解重放中断的确切位置。
我正在尝试确定如何主动缓解此问题。是否有任何日志可以让我主动找到此类“非编码二进制字符”,而无需执行 PIT 还原?或者我是否需要定期执行还原并在发生故障时进行快照,以确保 PIT 还原涵盖大量时间段?
更新
从失败的 PITR 之前恢复快照会在对其运行任何查询之前显示此错误日志:
2018-05-31T07:59:50.880143Z 0 [Warning] 'NO_AUTO_CREATE_USER' sql mode was not set.
2018-05-31T07:59:50.881641Z 0 [Warning] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.
2018-05-31T07:59:50.885191Z 0 [Note] /rdsdbbin/mysql/bin/mysqld (mysqld 5.7.19-log) starting as process 3470 ...
2018-05-31T07:59:51.126014Z 0 [Note] InnoDB: PUNCH HOLE support available
2018-05-31T07:59:51.126054Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-05-31T07:59:51.126060Z 0 [Note] InnoDB: Uses event mutexes
2018-05-31T07:59:51.126064Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2018-05-31T07:59:51.126068Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-05-31T07:59:51.126071Z 0 [Note] InnoDB: Using Linux native AIO
2018-05-31T07:59:51.150879Z 0 [Note] InnoDB: Number of pools: 1
2018-05-31T07:59:51.182600Z 0 [Note] InnoDB: Using CPU crc32 instructions
2018-05-31T07:59:51.184624Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 8, chunk size = 128M
2018-05-31T07:59:51.245944Z 0 [Note] InnoDB: Completed initialization of buffer pool
2018-05-31T07:59:51.274809Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-05-31T07:59:51.661104Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2018-05-31T07:59:52.079538Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 205460220725
2018-05-31T07:59:53.616196Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 205460333516
2018-05-31T07:59:54.463445Z 0 [Note] InnoDB: Database was not shutdown normally!
2018-05-31T07:59:54.463472Z 0 [Note] InnoDB: Starting crash recovery.
2018-05-31T07:59:56.869876Z 0 [Note] InnoDB: Transaction 1712897897 was in the XA prepared state.
2018-05-31T07:59:57.059082Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2018-05-31T07:59:57.059107Z 0 [Note] InnoDB: Trx id counter is 1712898304
2018-05-31T07:59:57.060227Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2018-05-31T07:59:57.606094Z 0 [Note] InnoDB: Apply batch completed
2018-05-31T07:59:57.606126Z 0 [Note] InnoDB: Last MySQL binlog file position 0 7766, file name mysql-bin-changelog.196126
2018-05-31T08:01:05.616244Z 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions
2018-05-31T08:01:05.616271Z 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2018-05-31T08:01:05.617146Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-05-31T08:01:05.617158Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-05-31T08:01:05.617203Z 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-05-31T08:01:07.653103Z 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
2018-05-31T08:01:07.654148Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2018-05-31T08:01:07.654159Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2018-05-31T08:01:07.870099Z 0 [Note] InnoDB: Waiting for purge to start
2018-05-31T08:01:07.920296Z 0 [Note] InnoDB: 5.7.19 started; log sequence number 205460333516
2018-05-31T08:01:07.920583Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 76646ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2018-05-31T08:01:07.920723Z 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2018-05-31T08:01:07.924167Z 0 [Note] Plugin 'FEDERATED' is disabled.
2018-05-31T08:01:07.938375Z 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2018-05-31T08:01:07.938453Z 0 [Note] Starting crash recovery...
2018-05-31T08:01:07.938489Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2018-05-31T08:01:07.938496Z 0 [Note] InnoDB: Transaction 1712897897 in prepared state after recovery
2018-05-31T08:01:07.938500Z 0 [Note] InnoDB: Transaction contains changes to 1 rows
2018-05-31T08:01:07.938504Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2018-05-31T08:01:07.938506Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2018-05-31T08:01:07.975797Z 0 [Note] Crash recovery finished.
2018-05-31T08:01:08.009971Z 0 [Note] Salting uuid generator variables, current_pid: 3470, server_start_time: 1527753590, bytes_sent: 0,
2018-05-31T08:01:08.013725Z 0 [Note] Generated uuid: 'c65e1206-64a8-11e8-acdc-0642b77ee370', server_start_time: 976718170713733380, bytes_sent: 47057241759744
2018-05-31T08:01:08.013758Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: c65e1206-64a8-11e8-acdc-0642b77ee370.
2018-05-31T08:01:08.044320Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2018-05-31T08:01:08.044373Z 0 [Note] IPv6 is available.
2018-05-31T08:01:08.044384Z 0 [Note] - '::' resolves to '::';
2018-05-31T08:01:08.044401Z 0 [Note] Server socket created on IP: '::'.
2018-05-31T08:01:09.042028Z 0 [Note] Event Scheduler: Loaded 0 events
2018-05-31T08:01:09.042160Z 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.
Version: '5.7.19-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
2018-05-31T08:01:09.042169Z 0 [Note] Executing 'SELECT * FROM INFORMATION_SCHEMA.TABLES;' to get a list of tables using the deprecated partition engine. You may use the startup option '--disable-partition-engine-check' to skip this check.
2018-05-31T08:01:09.042171Z 0 [Note] Beginning of list of non-natively partitioned tables
2018-05-31T08:01:09.889603Z 0 [Note] InnoDB: Buffer pool(s) load completed at 180531 8:01:09
2018-05-31T08:01:14.839437Z 0 [Note] End of list of non-natively partitioned tables
----------------------- END OF LOG ----------------------
对非 InnoDB 表的检查显示以下内容:
+---------------+-------------------------+-------------------------------+------------+--------+---------+------------+------------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+-------------------------+-------------------------------+------------+--------+---------+------------+------------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------------+
| def | promosolutions_co_nz | w90h1_finder_tokens | BASE TABLE | MEMORY | 10 | Fixed | 0 | 459 | 0 | 15040512 | 0 | 0 | NULL | 2018-05-31 08:20:13 | NULL | NULL | utf8_general_ci | NULL | | |
| def | promosolutions_co_nz | w90h1_finder_tokens_aggregate | BASE TABLE | MEMORY | 10 | Fixed | 0 | 474 | 0 | 15061350 | 0 | 0 | NULL | 2018-05-31 08:20:13 | NULL | NULL | utf8_general_ci | NULL | | |
| def | vc | CORE | BASE TABLE | MEMORY | 10 | Fixed | 0 | 777 | 0 | 16627023 | 0 | 0 | NULL | 2018-05-31 08:20:23 | NULL | NULL | latin1_swedish_ci | NULL | | |
| def | vc | CORErs | BASE TABLE | MEMORY | 10 | Fixed | 0 | 786 | 0 | 16649838 | 0 | 0 | NULL | 2018-05-31 08:20:23 | NULL | NULL | latin1_swedish_ci | NULL | | |
| def | vc | vw_mygroups | BASE TABLE | MyISAM | 10 | Fixed | 0 | 0 | 0 | 1970324836974591 | 1024 | 0 | NULL | 2018-05-01 04:44:19 | 2018-05-01 04:44:19 | NULL | utf8_general_ci | NULL | | |
| def | www_automatem_co_nz | jos_finder_tokens | BASE TABLE | MEMORY | 10 | Fixed | 0 | 623 | 0 | 15553818 | 0 | 0 | NULL | 2018-05-31 08:20:27 | NULL | NULL | utf8mb4_general_ci | NULL | | |
| def | www_automatem_co_nz | jos_finder_tokens_aggregate | BASE TABLE | MEMORY | 10 | Fixed | 0 | 639 | 0 | 15582015 | 0 | 0 | NULL | 2018-05-31 08:20:27 | NULL | NULL | utf8mb4_general_ci | NULL | | |
| def | www_bayautomotive_co_nz | jos_finder_tokens | BASE TABLE | MEMORY | 10 | Fixed | 0 | 623 | 0 | 15553818 | 0 | 0 | NULL | 2018-05-31 08:20:28 | NULL | NULL | utf8mb4_general_ci | NULL | | |
| def | www_bayautomotive_co_nz | jos_finder_tokens_aggregate | BASE TABLE | MEMORY | 10 | Fixed | 0 | 639 | 0 | 15582015 | 0 | 0 | NULL | 2018-05-31 08:20:28 | NULL | NULL | utf8mb4_general_ci | NULL | | |
| def | www_xpanda_co_nz | jos_finder_tokens | BASE TABLE | MEMORY | 10 | Fixed | 0 | 623 | 0 | 15553818 | 0 | 0 | NULL | 2018-05-31 08:20:36 | NULL | NULL | utf8mb4_general_ci | NULL | | |
| def | www_xpanda_co_nz | jos_finder_tokens_aggregate | BASE TABLE | MEMORY | 10 | Fixed | 0 | 639 | 0 | 15582015 | 0 | 0 | NULL | 2018-05-31 08:20:36 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+---------------+-------------------------+-------------------------------+------------+--------+---------+------------+------------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------------+
(不用担心,这些都不再由该服务器托管)
我已经在所有这些表(全部为空)上运行了 select *,并在每个表上插入了一条记录。错误日志显示 CloudWatch 中没有其他条目。
答案1
在实例的参数组中,设置binlog_format
为ROW
。 RDS 上的默认值为 ,MIXED
尽管在大多数情况下没有令人信服的理由使用此默认值。 在我的系统中,ROW
始终使用 ,因为它比 具有显着优势MIXED
。
在这种情况下,更令人信服的原因之一是以下这部分解释:
恢复操作遇到包含非编码二进制字符的字符串的插入/更新
虽然这在我看来不是一个完整的解释(因为这实际上并不无效,没有错误),但我们可以轻松避免所描述的问题 -ROW
日志记录格式捕获更改的行图像,而不是捕获插入/更新(/删除)语句作为“字符串”,所以如果RDS基础设施或底层MySQL版本处理这些问题的方式存在问题,这应该可以消除它。
该MIXED
模式允许优化器选择是否将 DML 的结果记录为发出的查询或已更改行的图像,并且如果内部没有任何内容将该语句标记为不确定或与重放不兼容,它将倾向于记录该语句 - 例如使用函数之类的东西UUID()
(将来调用时不会生成相同的值)。 (其他函数,例如NOW()
在日志记录中内置了附加机制,使其具有确定性的重放,这与直觉相反。)
关于模式,我想到的唯一有意义的警告ROW
是,您的表必须都具有主键(无论如何,这应该是标准做法),否则恢复和复制(如果有副本)可能会更慢,因为在这种情况下会扫描整个表以查找删除和更新...但抛开这一点考虑,基于日志记录的优势是如此之大,以至于多年来ROW
我没有机会使用任何其他东西。ROW
第二个问题是确保您没有使用 MyISAM 或任何其他存储引擎,除了默认使用它们的系统表(不要尝试更改这些)。
MyISAM 不具备 InnoDB 的崩溃恢复功能,而 RDS 依靠此功能来使快照可用。它对系统表采取了预防措施,因此它们不会造成问题,但对于您的表,没有任何保护措施。当快照发生时,非 MyISAM 表未处于安全合并状态可能会阻止快照正常工作。
虽然支持人员没有将此视为您的问题,但这与您的观察一致,即在您创建新快照之前,时间点不起作用。
虽然没有任何详细记录,但根据传闻证据,我的假设是,时间点从所需恢复点之前但时间最接近所需恢复点的快照开始,然后应用所有相关的存档二进制日志以向前播放到所需的恢复点。这就是您在 RDS 之外手动执行此操作的方式,并且根据“rdsadmin”用户的可观察操作,没有理由相信 RDS 重新发明了轮子。
当然,包含损坏的 MyISAM 表的快照的问题在于,这种情况直到后你恢复快照和然后尝试访问其中一个表。
如果这确实是问题所在,支持人员可能会认为您已经意识到了以下情况,并错误地排除了这种可能性:
MyISAM 存储引擎不支持可靠恢复,并且在恢复后重新启动 MySQL 时可能会导致数据丢失或损坏,从而阻止时间点恢复或快照恢复按预期工作。
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_MySQL.html
如果您使用 MyISAM 表,则应将其转换为 InnoDB。
答案2
Amazon Web Services 进一步研究了此问题,并找到了罪魁祸首如下:
********* update from service team ***********
This is decoded DDL in the binary log that blocked point-in-time restore for restore-markovina-aws-support. It is a "create table if not exists" for the drsarahhart_com.wp_wfBlocks7 table. I had to manually skip it (as the table already existed) to push the restore to completion (there was also a similar DDL in a later binary log).
# at 375050
#180523 19:10:46 server id 153211832 end_log_pos 375115 CRC32 0x1eec997a Anonymous_GTID last_committed=301 sequence_number=302 rbr_only=no
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 375115
#180523 19:10:46 server id 153211832 end_log_pos 375785 CRC32 0xc43c076b Query thread_id=584850 exec_time=0 error_code=0
SET TIMESTAMP=1527102646/*!*/;
create table IF NOT EXISTS wp_wfBlocks7 (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`type` int(10) unsigned NOT NULL DEFAULT '0',
`IP` binary(16) NOT NULL DEFAULT '\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@',
`blockedTime` bigint(20) NOT NULL,
`reason` varchar(255) NOT NULL,
`lastAttempt` int(10) unsigned DEFAULT '0',
`blockedHits` int(10) unsigned DEFAULT '0',
`expiration` bigint(20) unsigned NOT NULL DEFAULT '0',
`parameters` text,
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `IP` (`IP`),
KEY `expiration` (`expiration`)
) DEFAULT CHARSET=utf8
/*!*/;
# at 375785
The problematic characters were in the sequence: '\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@\^@'
This corresponds to this ASCII code point sequence: 5c 27 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 5c 00 27 00
How is the customer executing the DDL in the first place to create this table? Is the DDL being auto-generated or is someone running a script? If using a script how are they encoding the script contents?
********* end update from service team ********
我很确定这是由 Wordpress WordFence 插件创建的表,我将检查它是否仍然是最新的并在此报告。