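Hi all,
ClickHouse server version: 22.8.21 (I also tried several 22.x and 23.x versions, restoring my data with clickhouse-backup to test each one, but none of them seemed to improve query performance).
I am seeing some very strange behaviour from ClickHouse server here (both on a dedicated machine and in a Docker image loaded with the same data).
Sometimes a query takes about 50 ms to complete, while other queries take more than 250 seconds (if you run the 200-second query often enough, it apparently populates some cache and the time drops to about 50 ms).
The dataset sits in a table with a simple index (the query does not use the primary index, so it scans the entire 3.5 GB dataset).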
CREATE TABLE `cdrs_processed` (
`Id` UInt64,
`Direction` String,
`Time` DateTime('UTC'),
`CallID` String,
`CorrelationID` String,
... <snip> ...
)
ENGINE = ReplacingMergeTree(Id)
PARTITION BY toYYYYMM(Time)
ORDER BY (Id,Time)
PRIMARY KEY Id
SETTINGS index_granularity = 8192
COMMENT 'Processed and enriched CDR records';
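The query being run looks like this:
SELECT * FROM cdrs_processed WHERE CorrelationID='<some guid>' ORDER BY Id;
Different GUIDs give different query response times. Here is the interesting part:
Looking at a PCAP of the traffic, when a fast query is executed (i.e. the records come back in about 50 ms), the server first responds with the column headers and then with the data. When the chosen CorrelationID value results in a slow read, the PCAP shows the connection and the query, then a whole lot of TCP keep-alive messages, and only after that (if it has not timed out) does the server send the column headers and the data.
The deployed clickhouse-server runs on an AWS EC2 r5a.xlarge node and receives very few queries (1-2 per second), yet for some strange reason the instance's 4 vCPUs are running at 100%. The query performance problem on this server is the same when we take the data from a backup and restore it into a local Docker image running clickhouse-server on a developer's system.
To debug the queries, I used clickhouse-client and ran set send_logs_level='trace'; before running each query. For a fast query, we get the following logs.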
[2f7418aeaf9c] 2023.09.06 05:28:27.307306 [ 287 ] {5c35e3ba-15d3-494e-991b-9a93674ae9af} <Debug> executeQuery: (from 172.28.0.1:47140) SELECT * FROM nextvoice.cdrs_processed WHERE CorrelationID='E3065F87-3CB9-4C58-A12D-66A686B8C53D'; (stage: Complete)
[2f7418aeaf9c] 2023.09.06 05:28:27.322028 [ 287 ] {5c35e3ba-15d3-494e-991b-9a93674ae9af} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "CorrelationID = 'E3065F87-3CB9-4C58-A12D-66A686B8C53D'" moved to PREWHERE
[2f7418aeaf9c] 2023.09.06 05:28:27.328348 [ 287 ] {5c35e3ba-15d3-494e-991b-9a93674ae9af} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "CorrelationID = 'E3065F87-3CB9-4C58-A12D-66A686B8C53D'" moved to PREWHERE
[2f7418aeaf9c] 2023.09.06 05:28:27.331671 [ 287 ] {5c35e3ba-15d3-494e-991b-9a93674ae9af} <Trace> ContextAccess (default): Access granted: SELECT(Id, Direction, Time, CallID, .... <snip>
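All good, but when the query is slow it gets interesting. The first log line is printed at the time its timestamp indicates; the second and subsequent lines are not printed until the query completes, yet they carry timestamps as if they had been emitted at the right time. That is, this line is printed: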
[2f7418aeaf9c] 2023.09.06 05:30:06.838300 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Debug> executeQuery: (from 172.28.0.1:47140) SELECT * FROM nextvoice.cdrs_processed WHERE CorrelationID='03FBC351-48A1-4E3A-8257-AA97ED75F7A5'; (stage: Complete)
Then we get nothing until 2023.09.06 05:31:35.309735, at which point all the remaining trace lines are printed, such as the ones below. (I recorded a video of this happening.)
[2f7418aeaf9c] 2023.09.06 05:30:06.860733 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[2f7418aeaf9c] 2023.09.06 05:30:06.864526 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Debug> nextvoice.cdrs_processed (c8be24d5-22dd-4344-a9bf-10db9eabc46d) (SelectExecutor): Key condition: unknown
[2f7418aeaf9c] 2023.09.06 05:30:06.865117 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Debug> nextvoice.cdrs_processed (c8be24d5-22dd-4344-a9bf-10db9eabc46d) (SelectExecutor): MinMax index condition: unknown
[2f7418aeaf9c] 2023.09.06 05:30:06.880895 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Debug> nextvoice.cdrs_processed (c8be24d5-22dd-4344-a9bf-10db9eabc46d) (SelectExecutor): Selected 62/62 parts by partition key, 62 parts by primary key, 1873/1873 marks by primary key, 1873 marks to read from 62 ranges
[2f7418aeaf9c] 2023.09.06 05:31:35.309735 [ 287 ] {f9d77d15-8b7b-42d7-a344-eb0ad09ee690} <Debug> nextvoice.cdrs_processed (c8be24d5-22dd-4344-a9bf-10db9eabc46d) (SelectExecutor): Reading approx. 3852688 rows with 4 streams
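To put a number on the stall, the timestamps in the slow-query trace above can be subtracted directly. The following is a minimal sketch that only does that arithmetic; the helper name `log_gap_seconds` is made up for illustration, and the three timestamps are copied from the executeQuery, "Selected 62/62 parts" and "Reading approx." lines.

```python
from datetime import datetime

# Timestamp layout used by the ClickHouse log lines quoted above.
FMT = "%Y.%m.%d %H:%M:%S.%f"


def log_gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Timestamps copied from the slow-query trace.
query_start = "2023.09.06 05:30:06.838300"     # executeQuery
parts_selected = "2023.09.06 05:30:06.880895"  # Selected 62/62 parts ...
reading_starts = "2023.09.06 05:31:35.309735"  # Reading approx. 3852688 rows

planning = log_gap_seconds(query_start, parts_selected)
stall = log_gap_seconds(parts_selected, reading_starts)

print(f"start -> parts selected: {planning:.6f} s")
print(f"parts selected -> reading: {stall:.6f} s")
```

By the log's own timestamps, part selection finishes within a few tens of milliseconds of the query start, and almost all of the delay falls between selecting the parts and the "Reading approx." line.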
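Next, server0 has been quiet for the past 6+ hours with no queries of any kind; the only thing being sent to it is Prometheus metrics scrapes. Yet the trace log keeps scrolling past: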
2023.09.06 10:51:06.000598 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Debug> MergeTask::PrepareStage: Merging 2 parts: from 202309_75514_76920_1105 to 202309_76921_76921_0 into Wide
2023.09.06 10:51:06.000762 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Debug> MergeTask::PrepareStage: Selected MergeAlgorithm: Horizontal
2023.09.06 10:51:06.000789 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Debug> MergeTreeSequentialSource: Reading 10 marks from part 202309_75514_76920_1105, total 70966 rows starting from the beginning of the part
2023.09.06 10:51:06.000924 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Debug> MergeTreeSequentialSource: Reading 2 marks from part 202309_76921_76921_0, total 34 rows starting from the beginning of the part
2023.09.06 10:51:06.029025 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Debug> MergeTask::MergeProjectionsStage: Merge sorted 71000 rows, containing 10 columns (10 merged, 0 gathered) in 0.028416545 sec., 2498544.4219203987 rows/sec., 805.83 MiB/sec.
2023.09.06 10:51:06.030461 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Trace> MergedBlockOutputStream: filled checksums 202309_75514_76921_1106 (state Temporary)
2023.09.06 10:51:06.030766 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Trace> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962): Renaming temporary part tmp_merge_202309_75514_76921_1106 to 202309_75514_76921_1106.
2023.09.06 10:51:06.030821 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Trace> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962) (MergerMutator): Merged 2 parts: from 202309_75514_76920_1105 to 202309_76921_76921_0
2023.09.06 10:51:06.030869 [ 317 ] {} <Debug> MemoryTracker: Peak memory usage to apply mutate/merge in b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106: 43.34 MiB.
2023.09.06 10:51:07.000655 [ 335 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 804.53 MiB, peak 3.11 GiB, will set to 807.54 MiB (RSS), difference: 3.01 MiB
2023.09.06 10:51:07.593604 [ 349 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushing system log, 2030 entries to flush up to offset 5643012
2023.09.06 10:51:07.594682 [ 349 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 41.29 GiB.
2023.09.06 10:51:07.595125 [ 349 ] {} <Trace> MergedBlockOutputStream: filled checksums 202309_2778_2778_0 (state Temporary)
2023.09.06 10:51:07.595304 [ 349 ] {} <Trace> system.asynchronous_metric_log (65e48ced-63b1-49f3-b373-fe52139c8fd6): Renaming temporary part tmp_insert_202309_2778_2778_0 to 202309_64623_64623_0.
2023.09.06 10:51:07.595444 [ 349 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 5643012
2023.09.06 10:51:07.769264 [ 327 ] {} <Trace> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962): Found 2 old parts to remove.
2023.09.06 10:51:07.769291 [ 327 ] {} <Debug> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962): Removing part from filesystem 202309_75514_76856_1041
2023.09.06 10:51:07.769959 [ 327 ] {} <Debug> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962): Removing part from filesystem 202309_76857_76857_0
2023.09.06 10:51:10.302387 [ 475 ] {} <Debug> DNSResolver: Updating DNS cache
2023.09.06 10:51:10.302543 [ 475 ] {} <Debug> DNSResolver: Updated DNS cache
2023.09.06 10:51:10.924813 [ 350 ] {} <Trace> SystemLog (system.metric_log): Flushing system log, 7 entries to flush up to offset 19458
2023.09.06 10:51:10.932845 [ 350 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 41.29 GiB.
2023.09.06 10:51:10.935552 [ 350 ] {} <Trace> MergedBlockOutputStream: filled checksums 202309_2590_2590_0 (state Temporary)
2023.09.06 10:51:10.935969 [ 350 ] {} <Trace> system.metric_log (8c45d18c-4f03-43de-9848-28490ac69588): Renaming temporary part tmp_insert_202309_2590_2590_0 to 202309_60226_60226_0.
2023.09.06 10:51:10.936121 [ 471 ] {} <Debug> system.metric_log (8c45d18c-4f03-43de-9848-28490ac69588) (MergerMutator): Selected 6 parts from 202309_59866_60221_71 to 202309_60226_60226_0
2023.09.06 10:51:10.936149 [ 471 ] {} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 41.29 GiB.
2023.09.06 10:51:10.936234 [ 302 ] {8c45d18c-4f03-43de-9848-28490ac69588::202309_59866_60226_72} <Debug> MergeTask::PrepareStage: Merging 6 parts: from 202309_59866_60221_71 to 202309_60226_60226_0 into Compact
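This suggests it is trying to process some form of query/merge/insert.
The one client application (which has been shut down for 6 hours) handles at most 1-2 requests per second. There may be update requests among them, but I could replace those with inserts and rely on the behaviour of ReplacingMergeTree.
Can anyone offer some ideas on where to look, or on what might need changing or tuning? The lack of indexes on the data does not seem to be a problem for some queries, since sequential scan performance is good enough at this stage. We are trying to get reliable query performance so that we can work out the right indexes/materialized views/projections we need across the data and see how they affect things, but that is hard to do while we cannot get reliable performance from ClickHouse.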
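One way to see what that background activity touches is to tally the log lines by the table they mention. The sketch below does this for four lines copied verbatim from the excerpt above; the regular expression and the assumption that only the `system` and `nextvoice` databases matter are mine, not anything ClickHouse provides.

```python
import re
from collections import Counter

# Four lines copied verbatim from the idle-server log excerpt.
SAMPLE = """\
2023.09.06 10:51:06.030821 [ 317 ] {b09755a5-5004-4d4e-adef-82f4b86b4962::202309_75514_76921_1106} <Trace> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962) (MergerMutator): Merged 2 parts: from 202309_75514_76920_1105 to 202309_76921_76921_0
2023.09.06 10:51:07.595444 [ 349 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushed system log up to offset 5643012
2023.09.06 10:51:07.769264 [ 327 ] {} <Trace> system.trace_log (b09755a5-5004-4d4e-adef-82f4b86b4962): Found 2 old parts to remove.
2023.09.06 10:51:10.936121 [ 471 ] {} <Debug> system.metric_log (8c45d18c-4f03-43de-9848-28490ac69588) (MergerMutator): Selected 6 parts from 202309_59866_60221_71 to 202309_60226_60226_0
"""

# Assumption: tables of interest live in the `system` or `nextvoice` database.
TABLE_RE = re.compile(r"\b((?:system|nextvoice)\.\w+)")


def tables_mentioned(log_text: str) -> Counter:
    """Count log lines per table, using the first table name on each line."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TABLE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = tables_mentioned(SAMPLE)
for table, n in counts.most_common():
    print(f"{n:3d}  {table}")
```

In this sample every matched line refers to a `system.*` log table (trace_log, metric_log, asynchronous_metric_log) and none to nextvoice.cdrs_processed, which would be consistent with the server flushing and merging its own log tables rather than serving user queries. Whether that also accounts for the CPU usage is not something this tally can show.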