我正在尝试修复我们服务器上的 MySQL 配置。我们的应用程序的具体情况是,大量数据存储在单个表中(目前超过 3 亿行)。此表经常用于插入(它们一直出现)。
当我在该表上运行一个选择查询时,如果花费的时间超过几秒钟,那么所有插入(精确地提交)都在等待表访问,并使我们的应用程序无响应。
据我所知,在 select 运行时,InnoDB 不会对表进行任何锁定。那么 select 为何会阻塞表呢?
我尝试使用 innotop 查找原因,但我不知道如何解释其输出以及在哪里搜索。告诉我您需要什么,我会在此处发布。
+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+
| 1 | root | localhost | dbname | Query | 29 | NULL | COMMIT |
| 2 | root | localhost | dbname | Query | 30 | NULL | COMMIT |
| 4 | root | localhost | dbname | Query | 29 | NULL | COMMIT |
| 5 | root | localhost | dbname | Query | 29 | NULL | COMMIT |
| 6 | root | localhost | dbname | Query | 25 | NULL | COMMIT |
| 7 | root | localhost | dbname | Query | 0 | NULL | show full processlist |
| 13 | user | localhost | dbname | Query | 25 | NULL | COMMIT |
| 38 | user | localhost | dbname | Sleep | 0 | | NULL |
| 39 | user | localhost | dbname | Sleep | 9017 | | NULL |
| 40 | user | localhost | dbname | Query | 33 | Sorting result | SELECT * FROM `large_table` WHERE (`large_table`.`hotspot_id` = 3000064) ORDER BY discovered_at LIMIT 799000, 1000 |
| 60 | user | localhost | dbname | Sleep | 1033 | | NULL |
| 83 | root | localhost | dbname | Sleep | 3728 | | NULL |
| 112 | root | localhost | NULL | Sleep | 6 | | NULL |
+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+
=====================================
110824 12:24:24 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 19 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1521117, signal count 1471216
Mutex spin waits 0, rounds 20647617, OS waits 239914
RW-shared spins 2119697, OS waits 1037149; RW-excl spins 505734, OS waits 218177
------------
TRANSACTIONS
------------
Trx id counter 0 412917332
Purge done for trx's n:o < 0 412917135 undo n:o < 0 0
History list length 48
Total number of lock structs in row lock hash table 5
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 28363, OS thread id 1092766032
MySQL thread id 83, query id 3249941 localhost root
---TRANSACTION 0 412901582, not started, process no 28363, OS thread id 1144449360
MySQL thread id 60, query id 3677008 localhost user
---TRANSACTION 0 412917189, not started, process no 28363, OS thread id 1144314192
MySQL thread id 43, query id 3905773 localhost root
---TRANSACTION 0 412534255, not started, process no 28363, OS thread id 1092630864
MySQL thread id 39, query id 14279 localhost user
---TRANSACTION 0 412917331, not started, process no 28363, OS thread id 1144179024
MySQL thread id 38, query id 3908045 localhost user
---TRANSACTION 0 412917201, not started, process no 28363, OS thread id 1092495696
MySQL thread id 13, query id 3908257 localhost user
---TRANSACTION 0 412538821, not started, process no 28363, OS thread id 1092360528
MySQL thread id 7, query id 3908258 localhost root
show engine innodb status
---TRANSACTION 0 412917330, ACTIVE 6 sec, process no 28363, OS thread id 1144043856
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 2, query id 3907373 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917331, sees < 0 412917131
---TRANSACTION 0 412917328, ACTIVE 6 sec, process no 28363, OS thread id 1092225360
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 6, query id 3907345 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917329, sees < 0 412917131
---TRANSACTION 0 412917326, ACTIVE 6 sec, process no 28363, OS thread id 1091955024
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 4, query id 3907335 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917327, sees < 0 412917131
---TRANSACTION 0 412917324, ACTIVE 6 sec, process no 28363, OS thread id 1092090192
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 5, query id 3907328 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917325, sees < 0 412917131
---TRANSACTION 0 412917321, ACTIVE (PREPARED) 7 sec, process no 28363, OS thread id 1143908688 preparing
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 1, query id 3907125 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917322, sees < 0 412917131
---TRANSACTION 0 412917131, ACTIVE 20 sec, process no 28363, OS thread id 1074075984, thread declared inside InnoDB 111
mysql tables in use 1, locked 0
MySQL thread id 40, query id 3904958 localhost user Sorting result
SELECT * FROM `large_table` WHERE (`large_table`.`hotspot_id` = 3000064) ORDER BY discovered_at LIMIT 848000, 1000
Trx read view will not see trx with id >= 0 412917132, sees < 0 412917132
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
3510225 OS file reads, 284998 OS file writes, 202897 OS fsyncs
1.05 reads/s, 21299 avg bytes/read, 8.10 writes/s, 7.58 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 275, free list len 13392, seg size 13668,
489950 inserts, 491830 merged recs, 10986 merges
Hash table size 8850487, used cells 8127172, node heap has 32697 buffer(s)
71914.53 hash searches/s, 8701.91 non-hash searches/s
---
LOG
---
Log sequence number 157 3331524445
Log flushed up to 157 3331521939
Last checkpoint at 157 3326072846
1 pending log writes, 0 pending chkp writes
199025 log i/o's done, 7.53 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 4788954432; in additional pool allocated 1048576
Buffer pool size 262144
Free buffers 0
Database pages 229447
Modified db pages 1439
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 7453325, created 14887, written 118658
1.37 reads/s, 0.11 creates/s, 0.53 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
7 read views open inside InnoDB
Main thread process no. 28363, id 1091684688, state: flushing log
Number of rows inserted 1093064, updated 249134, deleted 1405, read 1115880534
7.89 inserts/s, 2.47 updates/s, 0.05 deletes/s, 80953.21 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
编辑:
感谢您澄清这一点。
所以我现在必须将我的问题分成两种情况。
锁定此单个表会导致我的整个应用程序“挂起”,这正常吗?DB 不应该响应对其他表的查询吗?也许某些缓冲区设置得太低了?
将此表切换为 MyISAM 有帮助吗?我根本不需要此表上的事务。在这种情况下(长选择 + 多次快速插入)不会有其他锁吗?
编辑2:
插入查询如下所示:
INSERT INTO `large_table` (`device_address`, `hotspot_id`, `minute`, `created_at`, `updated_at`, `discovered_with_hci`, `hour`, `rssi`, `day`, `device_class`, `discovered_at`) VALUES('10:40:03:90:10:40', 3000008, 1, '2011-08-22 05:01:08', '2011-08-22 05:01:08', -1, 5, -79, '2011-08-22 05:01:01', '0', '2011-08-22 05:01:01')
这就是其上定义的索引:
+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| large_table | 0 | PRIMARY | 1 | id | A | 92396334 | NULL | NULL | | BTREE | |
| large_table | 1 | index_large_table_on_discovered_with_hci | 1 | discovered_with_hci | A | 18 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_hotspot_id | 1 | hotspot_id | A | 18 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_day_and_hour_and_minute | 1 | day | A | 18 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_day_and_hour_and_minute | 2 | hour | A | 18 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_day_and_hour_and_minute | 3 | minute | A | 537187 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_created_at | 1 | created_at | A | 8399666 | NULL | NULL | YES | BTREE | |
| large_table | 1 | index_large_table_on_rssi | 1 | rssi | A | 18 | NULL | NULL | YES | BTREE | |
+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
编辑3:
为什么在进行此类查询时我的整个应用程序没有响应?难道它只影响“large_table”吗?
我的 mysql 配置可能出了问题?服务器是 4 核 Xeon 2GHz,配备 16GB RAM。它运行 MySQL + Rails App
我的配置参数:
skip-external-locking
key_buffer = 64M
max_allowed_packet = 16M
thread_stack = 128K
thread_cache_size = 8
query_cache_size = 32M
tmp_table_size = 64M
max_heap_table_size = 64M
table_cache = 256
read_rnd_buffer_size = 512K
sort_buffer_size = 2M
myisam-recover = BACKUP
max_connections = 200
query_cache_limit = 1M
long_query_time = 200
max_binlog_size = 100M
innodb_buffer_pool_size = 4G
safe-updates
max_join_size=100000000
Mysqltuner 脚本仅建议:
long_query_time (<= 10)
innodb_buffer_pool_size (>= 62G)
答案1
请仔细查看进程列表和“show engine innodb status”。你看到了什么???
进程 ID 1、2、4、5、6、13 均正在尝试运行 COMMIT。
谁在阻碍一切??? 进程 ID 40 正在针对 large_table 运行查询。
进程 ID 40 已运行 33 秒。进程 ID 1、2、4、5、6、13 已运行不到 33 秒。进程 ID 40 正在处理某事。发生了什么问题???
首先,查询正在影响 large_table 的聚集索引通过脉动循环控制电路。
在进程 ID 1、2、4、5、6、13 中,有些行具有 MVCC 数据,以保护其事务隔离性。进程 ID 40 有一个查询,该查询正在遍历数据行。如果字段 hotspot_id 上有索引,则该键 + 聚簇索引中实际行的键必须执行内部锁定。(注意:根据设计,InnoDB 中的所有非唯一索引都同时带有您的键(您要索引的列)+ 聚簇索引键)。这种独特的场景本质上是势不可挡的力量与不可移动的物体相遇。
本质上,COMMIT 必须等到可以安全地对 large_table 应用更改时才执行。您的情况并非独一无二、并非个例、也并非罕见现象。
我实际上在 DBA StackExchange 中回答了三个这样的问题。这些问题由同一个人与同一个问题有关。我的回答不是解决方案,但帮助提问者得出了如何处理他的情况的结论。
除了这些答案之外,我回答了另一个人关于 InnoDB 中 SELECT 死锁的问题。
我希望我过去关于这个主题的帖子能帮助您弄清楚发生了什么事。
更新时间:2011-08-25 08:10 EDT
以下是来自进程 ID 40 的查询
SELECT * FROM `large_table`
WHERE (`large_table`.`hotspot_id` = 3000064)
ORDER BY discovered_at LIMIT 799000, 1000;
两个观察结果:
您正在执行 'SELECT *',是否需要获取每一列?如果您只需要特定的列,则应该标记它们,因为 1000 行的临时表可能比您实际需要的要大。
WHERE 和 ORDER BY 子句通常会暴露性能问题或使表设计更加出色。您需要创建一种机制,在收集数据之前加快密钥的收集速度。
根据这两点观察,您必须做出两项重大改变:
重大变化 #1:重构查询
重新设计查询以便
- 密钥是从索引中收集的
- 只有 1000 个或他们被收集
- 重新加入主表
以下是执行以下三项操作的新查询
SELECT large_table.* FROM
large_table INNER JOIN
(
SELECT hotspot_id,discovered_at
FROM large_table
WHERE hotspot_id = 3000064
ORDER BY discovered_at
LIMIT 799000,1000
) large_table_keys
USING (hotspot_id,discovered_at);
子查询 large_table_keys 收集您需要的 1000 个键。然后,子查询的结果将 INNER JOIN 到 large_table。到目前为止,检索的是键,而不是整行。仍然需要读取 799,000 行。有一种更好的方法来获取这些键,这让我们想到了...
重大变化 #2:创建支持重构查询的索引
由于重构后的查询仅包含一个子查询,因此只需创建一个索引。该索引如下:
ALTER TABLE large_table ADD INDEX hotspot_discovered_ndx (hotspot_id,discovered_at);
为什么是这个特定的索引?看看 WHERE 子句。hotspot_id 是一个静态值。这使得所有 hotspot_id 在索引中形成一个顺序列表。现在,看看 ORDER BY 子句。discovered_at 列可能是 DATETIME 或 TIMESTAMP 字段。
索引中呈现的自然顺序如下:
- 索引包含 hostpot_ids 列表
- 每个 hotspot_id 都有一个由 discover_at 字段组成的有序列表
建立此索引还可以消除对临时表进行内部排序的需要。
请将这两个重大变化落实到位,您将看到运行时间的差异。
试一试 !!!
更新时间 2011-08-25 08:15 EDT
我查看了您的索引。您仍需要创建我建议的索引。