This is SQL Server 2008 R2 running in a Hyper-V virtual machine with 4 virtual cores.
Task Manager reports that sqlservr.exe is using at least 50% CPU, and sometimes 100% for minutes at a time. I/O is very low. The top entries under "Recent Expensive Queries" in Activity Monitor are only in the range of about 100 ms/sec of CPU, and the logical reads/sec figures are also fairly low. The thread count seems fine (about 120 open connections, roughly 180 threads, very few of which are actually running).
Maintenance plans are running normally: log backups run every 15 minutes and take only a few seconds each.
Any idea what could be causing such high CPU load?
Update: I seem to have found the problem by analyzing a short, unfiltered SQL trace. A stored procedure called repeatedly by a background service issued a SELECT statement with a classic "parameter sniffing" problem. After fixing it, CPU usage dropped.
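For readers unfamiliar with the problem, here is a minimal sketch of the pattern; the procedure, table, and column names are hypothetical, not the ones from my system. One common mitigation is OPTION (RECOMPILE), which recompiles the statement with the actual parameter value on every call, so a plan compiled for a rare parameter value is never reused for a very common one:

```sql
-- Hypothetical illustration of a parameter-sniffing-prone procedure and one common fix.
CREATE PROCEDURE dbo.GetOrdersByStatus
    @Status int
AS
BEGIN
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE Status = @Status
    -- Recompile per call so the plan always matches the actual @Status value.
    -- Trades some compile-time CPU for a consistently appropriate plan.
    OPTION (RECOMPILE);
END;
```

Other options, depending on the workload, include OPTION (OPTIMIZE FOR (@Status UNKNOWN)) or copying the parameter into a local variable before using it in the WHERE clause.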
So I'd like to rephrase the question slightly: is it normal that neither the stored procedure nor the statements it executes ever showed up under "Recent Expensive Queries"?
Answer 1
So I'd like to rephrase the question slightly: is it normal that neither the stored procedure nor the statements it executes ever showed up under "Recent Expensive Queries"?
The statements were most likely there (at least if they exceeded certain thresholds [see below]), but the tool only shows the statement itself; the rest is left to the administrator to dig into. Activity Monitor is a very basic tool for taking a quick look at how things are going. Personally, I don't use it because it is so simplistic, although it can be handy when you just want a quick check.
To answer your question more directly: those statements may simply not qualify for the "Recent Expensive Queries" section. Why? Below is the code SSMS uses to gather that information, and as you can see, there are hard-coded values behind its somewhat arbitrary definition of "expensive".
CREATE PROC #am_get_querystats AS
BEGIN
-- Set @N to the number of queries to capture by each ordering metric. Since there are seven different metrics,
-- if N=10, you will get a minimum of 10 query plans (if the top 10 are the same for all metrics) and a max of 70
-- queries (if the seven sets of 10 queries are all disjoint sets). Actually, the query returns 'plan fingerprint'
-- stats, not plan stats.
DECLARE @N int;
SET @N = 20;
SET NOCOUNT ON;
-- Step 1: Capture a snapshot of (cumulative) query stats and store it in #temp_fingerprint_stats
IF OBJECT_ID ('tempdb..#temp_fingerprint_stats') IS NOT NULL
BEGIN
DROP TABLE #temp_fingerprint_stats;
END;
-- Get timestamp of current snapshot
DECLARE @current_collection_time datetime;
SET @current_collection_time = GETDATE();
-- This CTE returns a unified view of the query stats for both in-progress queries (from sys.dm_exec_requests)
-- and completed queries (from sys.dm_exec_query_stats).
WITH merged_query_stats AS
(
SELECT
[sql_handle],
statement_start_offset,
statement_end_offset,
plan_generation_num,
[plan_handle],
query_hash AS query_fingerprint,
query_plan_hash AS plan_fingerprint,
creation_time,
last_execution_time,
execution_count,
total_worker_time / 1000 AS total_worker_time_ms,
min_worker_time / 1000 AS min_worker_time_ms,
max_worker_time / 1000 AS max_worker_time_ms,
total_physical_reads,
min_physical_reads,
max_physical_reads,
total_logical_writes,
min_logical_writes,
max_logical_writes,
total_logical_reads,
min_logical_reads,
max_logical_reads,
total_clr_time,
min_clr_time,
max_clr_time,
total_elapsed_time / 1000 AS total_elapsed_time_ms,
min_elapsed_time / 1000 AS min_elapsed_time_ms,
max_elapsed_time / 1000 AS max_elapsed_time_ms,
total_elapsed_time / 1000 AS total_completed_execution_time_ms
FROM sys.dm_exec_query_stats AS q
-- To reduce the number of rows that we have to deal with in later queries, filter out any very old rows
WHERE q.last_execution_time > DATEADD (hour, -4, GETDATE())
-- The UNIONed query below is a workaround for VSTS #91422, sys.dm_exec_query_stats does not reflect stats for in-progress queries.
UNION ALL
SELECT
[sql_handle],
statement_start_offset,
statement_end_offset,
NULL AS plan_generation_num,
plan_handle,
query_hash AS query_fingerprint,
query_plan_hash AS plan_fingerprint,
start_time AS creation_time,
start_time AS last_execution_time,
0 AS execution_count,
cpu_time AS total_worker_time_ms,
NULL AS min_worker_time_ms, -- min should not be influenced by in-progress queries
cpu_time AS max_worker_time_ms,
reads AS total_physical_reads,
NULL AS min_physical_reads, -- min should not be influenced by in-progress queries
reads AS max_physical_reads,
writes AS total_logical_writes,
NULL AS min_logical_writes, -- min should not be influenced by in-progress queries
writes AS max_logical_writes,
logical_reads AS total_logical_reads,
NULL AS min_logical_reads, -- min should not be influenced by in-progress queries
logical_reads AS max_logical_reads,
NULL AS total_clr_time, -- CLR time is not available in dm_exec_requests
NULL AS min_clr_time, -- CLR time is not available in dm_exec_requests
NULL AS max_clr_time, -- CLR time is not available in dm_exec_requests
total_elapsed_time AS total_elapsed_time_ms,
NULL AS min_elapsed_time_ms, -- min should not be influenced by in-progress queries
total_elapsed_time AS max_elapsed_time_ms,
NULL AS total_completed_execution_time_ms
FROM sys.dm_exec_requests AS r
WHERE [sql_handle] IS NOT NULL
-- Don't attempt to collect stats for very brief in-progress requests; the active statement
-- will likely have changed by the time that we harvest query text, in the next query
AND DATEDIFF (second, r.start_time, @current_collection_time) > 1
)
-- Insert the fingerprint stats into a temp table. SQL isn't always able to produce a good estimate of the amount of
-- memory that the upcoming sorts (for ROW_NUMBER()) will need because of lack of accurate stats on DMVs. Staging the
-- data in a temp table allows the memory cost of the sort operations to be more accurate, which avoids unnecessary
-- spilling to tempdb.
SELECT
fingerprint_stats.*,
example_plan.sample_sql_handle,
example_plan.sample_plan_handle,
example_plan.sample_statement_start_offset,
example_plan.sample_statement_end_offset
INTO #temp_fingerprint_stats
FROM
-- Calculate plan fingerprint stats by grouping the query stats by plan fingerprint
(
SELECT
mqs.query_fingerprint,
mqs.plan_fingerprint,
-- The same plan could be returned by both dm_exec_query_stats and dm_exec_requests -- count distinct plan
-- handles only
COUNT(DISTINCT plan_handle) AS plan_count,
MIN (mqs.creation_time) AS creation_time,
MAX (mqs.last_execution_time) AS last_execution_time,
SUM (mqs.execution_count) AS execution_count,
SUM (mqs.total_worker_time_ms) AS total_worker_time_ms,
MIN (mqs.min_worker_time_ms) AS min_worker_time_ms,
MAX (mqs.max_worker_time_ms) AS max_worker_time_ms,
SUM (mqs.total_physical_reads) AS total_physical_reads,
MIN (mqs.min_physical_reads) AS min_physical_reads,
MAX (mqs.max_physical_reads) AS max_physical_reads,
SUM (mqs.total_logical_writes) AS total_logical_writes,
MIN (mqs.min_logical_writes) AS min_logical_writes,
MAX (mqs.max_logical_writes) AS max_logical_writes,
SUM (mqs.total_logical_reads) AS total_logical_reads,
MIN (mqs.min_logical_reads) AS min_logical_reads,
MAX (mqs.max_logical_reads) AS max_logical_reads,
SUM (mqs.total_clr_time) AS total_clr_time,
MIN (mqs.min_clr_time) AS min_clr_time,
MAX (mqs.max_clr_time) AS max_clr_time,
SUM (mqs.total_elapsed_time_ms) AS total_elapsed_time_ms,
MIN (mqs.min_elapsed_time_ms) AS min_elapsed_time_ms,
MAX (mqs.max_elapsed_time_ms) AS max_elapsed_time_ms,
SUM (mqs.total_completed_execution_time_ms) AS total_completed_execution_time_ms
FROM merged_query_stats AS mqs
GROUP BY
mqs.query_fingerprint,
mqs.plan_fingerprint
) AS fingerprint_stats
INNER JOIN
(
-- This query assigns a unique row identifier to each plan that has the same fingerprint -- we'll
-- select each fingerprint's 'Plan #1' (the earliest example that's still in cache) to use as a sample plan
-- for the fingerprint. Later (in the outer query's WHERE clause) we'll filter out all but the first plan,
-- and use that one to get a valid sql_handle/plan_handle.
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY plan_fingerprint
ORDER BY creation_time
) AS plan_instance_number
FROM
(
SELECT
query_hash AS query_fingerprint,
query_plan_hash AS plan_fingerprint,
qs.[sql_handle] AS sample_sql_handle,
qs.plan_handle AS sample_plan_handle,
qs.statement_start_offset AS sample_statement_start_offset,
qs.statement_end_offset AS sample_statement_end_offset,
qs.creation_time
FROM sys.dm_exec_query_stats AS qs
-- To get a sample plan for in-progress queries, we need to look in dm_exec_requests, too
UNION ALL
SELECT
query_hash AS query_fingerprint,
query_plan_hash AS plan_fingerprint,
r.[sql_handle] AS sample_sql_handle,
r.plan_handle AS sample_plan_handle,
r.statement_start_offset AS sample_statement_start_offset,
r.statement_end_offset AS sample_statement_end_offset,
r.start_time AS creation_time
FROM sys.dm_exec_requests AS r
) AS all_plans_numbered
) AS example_plan
ON example_plan.query_fingerprint = fingerprint_stats.query_fingerprint
AND example_plan.plan_fingerprint = fingerprint_stats.plan_fingerprint
-- To improve perf of the next query, filter out plan fingerprints that aren't very interesting according to any of our
-- perf metrics. Note that our most frequent allowed execution rate for this script is one execution every 15 seconds,
-- so, for example, a plan that is executed 50 times in a 15+ second time interval will qualify for further processing.
WHERE plan_instance_number = 1
AND (fingerprint_stats.total_worker_time_ms > 500 -- 500 ms cumulative CPU time
OR fingerprint_stats.execution_count > 50 -- 50 executions
OR fingerprint_stats.total_physical_reads > 50 -- 50 cumulative physical reads
OR fingerprint_stats.total_logical_reads > 5000 -- 5,000 cumulative logical reads
OR fingerprint_stats.total_logical_writes > 50 -- 50 cumulative logical writes
OR fingerprint_stats.total_elapsed_time_ms > 5000) -- 5 seconds cumulative execution time
-- SQL doesn't always have good stats on DMVs, and as a result it may select a loop join-based plan w/the
-- sys.dm_exec_query_stats DMV as the inner table. The DMVs don't have indexes that would support efficient
-- loop joins, and will commonly have a large enough number of rows that unindexed loop joins will be an
-- unattractive option. Given this, we gain much better worst-case perf with minimal cost to best-case perf
-- by prohibiting loop joins via this hint.
OPTION (HASH JOIN, MERGE JOIN);
-- Step 2: Rank the plan fingerprints by CPU use, execution count, etc and store the results in #am_fingerprint_stats_snapshots
-- Now we have the stats for all plan fingerprints. Return only the top N by each of our perf metrics.
-- The reason we need a derived table here is because SQL disallows the direct use of ROW_NUMBER()
-- in a WHERE clause, yet we need to filter based on the row number (rank).
INSERT INTO #am_fingerprint_stats_snapshots
SELECT
@current_collection_time AS collection_time,
ranked_fingerprint_stats.*,
batch_text.text AS batch_text,
plan_info.dbname
FROM
(
SELECT
*,
-- Rank the fingerprints by each of our perf metrics
ROW_NUMBER () OVER (ORDER BY plan_count DESC) AS plan_count_rank,
ROW_NUMBER () OVER (ORDER BY total_worker_time_ms DESC) AS cpu_rank,
ROW_NUMBER () OVER (ORDER BY total_physical_reads DESC) AS physical_reads_rank,
ROW_NUMBER () OVER (ORDER BY total_logical_reads DESC) AS logical_reads_rank,
ROW_NUMBER () OVER (ORDER BY total_logical_writes DESC) AS logical_writes_rank,
ROW_NUMBER () OVER (ORDER BY max_elapsed_time_ms DESC) AS max_duration_rank,
ROW_NUMBER () OVER (ORDER BY execution_count DESC) AS execution_count_rank
FROM #temp_fingerprint_stats
) AS ranked_fingerprint_stats
-- Get the query text
OUTER APPLY sys.dm_exec_sql_text (sample_sql_handle) AS batch_text
OUTER APPLY (SELECT DB_NAME(CONVERT(int, value)) AS dbname FROM sys.dm_exec_plan_attributes(sample_plan_handle) WHERE attribute='dbid') AS plan_info
WHERE
cpu_rank <= @N
OR logical_writes_rank <= @N
OR physical_reads_rank <= @N
OR max_duration_rank <= @N
OR execution_count_rank <= @N
OR plan_count_rank <= @N;
-- Step 3: Calculate the delta since the prior snapshot, and return the fingerprint stats for the just-completed time interval
-- Get timestamp of previous snapshot
DECLARE @previous_collection_time datetime;
SELECT TOP 1 @previous_collection_time = collection_time
FROM #am_fingerprint_stats_snapshots
WHERE collection_time < @current_collection_time
ORDER BY collection_time DESC;
-- Subtract prior stats from current stats to get stats for this time interval
--
-- In this query, the expression below represents 'interval duration'. If we do find the plan in the prior snapshot, we use
-- [current_snapshot_time - prior_snapshot_time] to calculate the interval duration. If we don't find the plan fingerprint in
-- the prior snapshot (either because this is the first snapshot, or because this is the first time this plan has shown up in
-- our TOP N), and if the plan has been around since before our prior snapshot, we amortize its execution cost over the plan
-- lifetime.
--
-- DATEDIFF (second,
-- CASE
-- WHEN (prev_stats.plan_fingerprint IS NULL AND cur_stats.creation_time < @previous_collection_time)
-- THEN cur_stats.creation_time
-- ELSE @previous_collection_time
-- END,
-- @current_collection_time)
--
-- The purpose of this is to avoid the assumption that all execution stats for a 'new' query plan occurred within the just-
-- completed time interval. It also allows us to show some execution stats immediately after the first snapshot, rather than
-- waiting until the second snapshot.
WITH interval_fingerprint_stats AS
(
SELECT
cur_stats.batch_text,
SUBSTRING (
cur_stats.batch_text,
(cur_stats.sample_statement_start_offset/2) + 1,
(
(
CASE cur_stats.sample_statement_end_offset
WHEN -1 THEN DATALENGTH(cur_stats.batch_text)
WHEN 0 THEN DATALENGTH(cur_stats.batch_text)
ELSE cur_stats.sample_statement_end_offset
END
- cur_stats.sample_statement_start_offset
)/2
) + 1
) AS statement_text,
-- If we don't have a prior snapshot, and if the plan has been around since before the start of the interval,
-- amortize the cost of the query over its lifetime so that we don't make the (typically incorrect) assumption
-- that the cost was all accumulated within the just-completed interval.
CASE
WHEN DATEDIFF (second, CASE WHEN (@previous_collection_time IS NULL) OR (prev_stats.plan_fingerprint IS NULL AND cur_stats.creation_time < @previous_collection_time) THEN cur_stats.creation_time ELSE @previous_collection_time END, @current_collection_time) > 0
THEN DATEDIFF (second, CASE WHEN (@previous_collection_time IS NULL) OR (prev_stats.plan_fingerprint IS NULL AND cur_stats.creation_time < @previous_collection_time) THEN cur_stats.creation_time ELSE @previous_collection_time END, @current_collection_time)
ELSE 1 -- protect from divide by zero
END AS interval_duration_sec,
cur_stats.query_fingerprint,
cur_stats.plan_fingerprint,
cur_stats.sample_sql_handle,
cur_stats.sample_plan_handle,
cur_stats.sample_statement_start_offset,
cur_stats.sample_statement_end_offset,
cur_stats.plan_count,
-- If a plan is removed from cache, then reinserted, it is possible for it to seem to have negative cost. The
-- CASE statements below handle this scenario.
CASE WHEN (cur_stats.execution_count - ISNULL (prev_stats.execution_count, 0)) < 0
THEN cur_stats.execution_count
ELSE (cur_stats.execution_count - ISNULL (prev_stats.execution_count, 0))
END AS interval_executions,
cur_stats.execution_count AS total_executions,
CASE WHEN (cur_stats.total_worker_time_ms - ISNULL (prev_stats.total_worker_time_ms, 0)) < 0
THEN cur_stats.total_worker_time_ms
ELSE (cur_stats.total_worker_time_ms - ISNULL (prev_stats.total_worker_time_ms, 0))
END AS interval_cpu_ms,
CASE WHEN (cur_stats.total_physical_reads - ISNULL (prev_stats.total_physical_reads, 0)) < 0
THEN cur_stats.total_physical_reads
ELSE (cur_stats.total_physical_reads - ISNULL (prev_stats.total_physical_reads, 0))
END AS interval_physical_reads,
CASE WHEN (cur_stats.total_logical_writes - ISNULL (prev_stats.total_logical_writes, 0)) < 0
THEN cur_stats.total_logical_writes
ELSE (cur_stats.total_logical_writes - ISNULL (prev_stats.total_logical_writes, 0))
END AS interval_logical_writes,
CASE WHEN (cur_stats.total_logical_reads - ISNULL (prev_stats.total_logical_reads, 0)) < 0
THEN cur_stats.total_logical_reads
ELSE (cur_stats.total_logical_reads - ISNULL (prev_stats.total_logical_reads, 0))
END AS interval_logical_reads,
CASE WHEN (cur_stats.total_elapsed_time_ms - ISNULL (prev_stats.total_elapsed_time_ms, 0)) < 0
THEN cur_stats.total_elapsed_time_ms
ELSE (cur_stats.total_elapsed_time_ms - ISNULL (prev_stats.total_elapsed_time_ms, 0))
END AS interval_elapsed_time_ms,
cur_stats.total_completed_execution_time_ms AS total_completed_execution_time_ms,
cur_stats.dbname
FROM #am_fingerprint_stats_snapshots AS cur_stats
LEFT OUTER JOIN #am_fingerprint_stats_snapshots AS prev_stats
ON prev_stats.collection_time = @previous_collection_time
AND prev_stats.plan_fingerprint = cur_stats.plan_fingerprint AND prev_stats.query_fingerprint = cur_stats.query_fingerprint
WHERE cur_stats.collection_time = @current_collection_time
)
SELECT
SUBSTRING (statement_text, 1, 200) AS [Query],
/* Begin hidden grid columns */
statement_text,
-- We must convert these to a hex string representation because they will be stored in a DataGridView, which can't
-- handle binary cell values (assumes anything binary is an image)
master.dbo.fn_varbintohexstr(query_fingerprint) AS query_fingerprint,
master.dbo.fn_varbintohexstr(plan_fingerprint) AS plan_fingerprint,
master.dbo.fn_varbintohexstr(sample_sql_handle) AS sample_sql_handle,
master.dbo.fn_varbintohexstr(sample_plan_handle) AS sample_plan_handle,
sample_statement_start_offset,
sample_statement_end_offset,
/* End hidden grid columns */
interval_executions * 60 / interval_duration_sec AS [Executions/min],
interval_cpu_ms / interval_duration_sec AS [CPU (ms/sec)],
interval_physical_reads / interval_duration_sec AS [Physical Reads/sec],
interval_logical_writes / interval_duration_sec AS [Logical Writes/sec],
interval_logical_reads / interval_duration_sec AS [Logical Reads/sec],
CASE total_executions
WHEN 0 THEN 0
ELSE total_completed_execution_time_ms / total_executions
END AS [Average Duration (ms)],
plan_count AS [Plan Count],
dbname AS [Database Name]
FROM interval_fingerprint_stats;
-- Step 4: Delete all but the most recent snapshot; we no longer need any older data
DELETE FROM #am_fingerprint_stats_snapshots
WHERE collection_time != @current_collection_time;
END
You can see the various assumptions and filters at work. I can't tell whether your query met those thresholds or was filtered out, but this should give you a better idea of why it may not have appeared.
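Rather than relying on Activity Monitor's hard-coded thresholds, you can also query the DMVs directly. As a sketch (assuming SQL Server 2008 or later, where sys.dm_exec_procedure_stats is available, and VIEW SERVER STATE permission), this ranks cached stored procedures by cumulative CPU, which would have surfaced a proc like yours even when its individual statements looked cheap:

```sql
-- Top cached stored procedures by cumulative CPU time.
-- Note: stats reset when a plan leaves the cache, so this reflects
-- activity since each plan was last compiled, not all time.
SELECT TOP 20
    DB_NAME(ps.database_id) AS database_name,
    OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
    ps.execution_count,
    ps.total_worker_time / 1000 AS total_cpu_ms,
    ps.total_worker_time / ps.execution_count / 1000 AS avg_cpu_ms,
    ps.last_execution_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;
```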
Going forward, I would use resources like Glenn Berry's diagnostic information queries, which are pre-written and work very well: https://sqlserverperformance.wordpress.com/2014/09/17/sql-server-diagnostic-information-queries-for-september-2014/