Is SQL Server under high CPU pressure?

Situation

Sometimes we get a phone call, a trouble ticket or just a visit from a colleague telling us

“The application xy is slow, what the heck is going on with the underlying SQL Server?”

This is the only information that we are provided with – so we have to investigate what’s really going on with SQL Server.
We even have to prove that it is SQL Server that makes the app so slow, or we can prove that it is not SQL Server.

There are plenty of ways to find out what’s going on, some people start with the amazing script by BrentOzar sp_AskBrent®, some rely on the Wait Statistics of SQL Server, you even might visit the server via Remote Desktop and have a look at the Task Manager. But how can you tell that the high green line in the CPU box is caused by SQL Server?

For me it would be handy to have a script that i can execute directly in SQL Server Management Studio that would tell me what’s going on. It would tell me from the insights of SQL Server if there is something going wrong.

In todays blog post i focus on just CPU pressure. We all know there are far more reasons why SQL Server could be slow, but let’s stick with CPU for now.

Start Investigation

Starting with a script taken from the eBook
“Troubleshooting SQL Server A Guide for the Accidental DBA” by Jonathan Kehayias and Ted Krueger
we find a script that shows us the CPU pressure via the signal wait time.

The signal wait time is the time that is measured within SQL Server’s DMVs when a task that is runnable and has all ressources available (e.g. the data pages are read from disk into the Buffer Pool) is waiting for the CPU to become available for him. So the signal wait time is pure CPU wait time.

So here’s the script from the eBook:

SELECT  SUM(signal_wait_time_ms) AS TotalSignalWaitTime ,
        ( SUM(CAST(signal_wait_time_ms AS NUMERIC(20, 2)))
          / SUM(CAST(wait_time_ms AS NUMERIC(20, 2))) * 100 ) AS PercentageSignalWaitsOfTotalTime
FROM    sys.dm_os_wait_stats
-- (Page 79, Listing 3.1: Verifying CPU Pressure via signal wait time.)

This might show you a result like this:

TotalSignalWaitTime

You might be impressed: Wow, SQL Server is waiting about 20% of the time for the CPU to become available for the next runnable task…

The interesting thing here is, that even on a SQL Server box that is sitting around bored, the results may be nearly the same. So i came across a blog post by Paul S. Randal:
Wait statistics, or please tell me where it hurts
In this blog post he clarifies:

“Bunch of waits are being filtered out of consideration, waits happen all the time and these are the benign ones we can usually ignore.” (Paul S. Randal)

So i added the WHERE clause to the upper SELECT statement and got the following script:

SELECT  SUM(signal_wait_time_ms) AS TotalSignalWaitTime ,
        ( SUM(CAST(signal_wait_time_ms AS NUMERIC(20, 2)))
          / SUM(CAST(wait_time_ms AS NUMERIC(20, 2))) * 100 ) AS PercentageSignalWaitsOfTotalTime_WithoutBenignWaits
FROM    sys.dm_os_wait_stats
WHERE	[wait_type] NOT IN (
			N'BROKER_EVENTHANDLER',             N'BROKER_RECEIVE_WAITFOR',
			N'BROKER_TASK_STOP',                N'BROKER_TO_FLUSH',
			N'BROKER_TRANSMITTER',              N'CHECKPOINT_QUEUE',
			N'CHKPT',                           N'CLR_AUTO_EVENT',
			N'CLR_MANUAL_EVENT',                N'CLR_SEMAPHORE',
			N'DBMIRROR_DBM_EVENT',              N'DBMIRROR_EVENTS_QUEUE',
			N'DBMIRROR_WORKER_QUEUE',           N'DBMIRRORING_CMD',
			N'DIRTY_PAGE_POLL',                 N'DISPATCHER_QUEUE_SEMAPHORE',
			N'EXECSYNC',                        N'FSAGENT',
			N'FT_IFTS_SCHEDULER_IDLE_WAIT',     N'FT_IFTSHC_MUTEX',
			N'HADR_CLUSAPI_CALL',               N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
			N'HADR_LOGCAPTURE_WAIT',            N'HADR_NOTIFICATION_DEQUEUE',
			N'HADR_TIMER_TASK',                 N'HADR_WORK_QUEUE',
			N'KSOURCE_WAKEUP',                  N'LAZYWRITER_SLEEP',
			N'LOGMGR_QUEUE',                    N'ONDEMAND_TASK_QUEUE',
			N'PWAIT_ALL_COMPONENTS_INITIALIZED',
			N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
			N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
			N'REQUEST_FOR_DEADLOCK_SEARCH',     N'RESOURCE_QUEUE',
			N'SERVER_IDLE_CHECK',               N'SLEEP_BPOOL_FLUSH',
			N'SLEEP_DBSTARTUP',                 N'SLEEP_DCOMSTARTUP',
			N'SLEEP_MASTERDBREADY',             N'SLEEP_MASTERMDREADY',
			N'SLEEP_MASTERUPGRADED',            N'SLEEP_MSDBSTARTUP',
			N'SLEEP_SYSTEMTASK',                N'SLEEP_TASK',
			N'SLEEP_TEMPDBSTARTUP',             N'SNI_HTTP_ACCEPT',
			N'SP_SERVER_DIAGNOSTICS_SLEEP',     N'SQLTRACE_BUFFER_FLUSH',
			N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
			N'SQLTRACE_WAIT_ENTRIES',           N'WAIT_FOR_RESULTS',
			N'WAITFOR',                         N'WAITFOR_TASKSHUTDOWN',
			N'WAIT_XTP_HOST_WAIT',              N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
			N'WAIT_XTP_CKPT_CLOSE',             N'XE_DISPATCHER_JOIN',
			N'XE_DISPATCHER_WAIT',              N'XE_TIMER_EVENT')
AND		[waiting_tasks_count] > 0

Executing that against the very same SQL Server Instance, i got the following result:

TotalSignalWaitTime_WithoutBenignWaits

Oh wait – this reduced the real wait for CPU to a bit more than 3% Signal Wait Time.
So from this point of view, this SQL Server Instance could not be under high CPU pressure.

So i executed the script without Paul’s benign waits against a server that i know is under CPU pressure:

BothSignalWaits

Now our Percent of Signal Wait Time Without Benign Waits is even higher than the Total Percent Value. That means that the “bad” waits should be taken even more into account as these Waitings take relatively longer than All Waits.
So this must be a very good indicator for high CPU pressure.
It’s just that easy: Just substract the value without the benign waits from the total value – if you get a negative value, there you are: This really must be a SQL Server under high CPU pressure.

This leads us to the following script:

DECLARE @PercentageSignalWaitsOfTotalTime FLOAT
DECLARE @PercentageSignalWaitsOfTotalTime_WithoutBenignWaits FLOAT

SELECT		@PercentageSignalWaitsOfTotalTime = (( SUM(CAST(signal_wait_time_ms AS NUMERIC(20, 2)))
            / SUM(CAST(wait_time_ms AS NUMERIC(20, 2))) * 100 ))
			, @PercentageSignalWaitsOfTotalTime_WithoutBenignWaits =
			  (SELECT
							( SUM(CAST(PaulR.signal_wait_time_ms AS NUMERIC(20, 2)))
							/ SUM(CAST(PaulR.wait_time_ms AS NUMERIC(20, 2))) * 100 )
				FROM		sys.dm_os_wait_stats PaulR
				WHERE		PaulR.wait_type NOT LIKE '%SLEEP%' -- remove eg. SLEEP_TASK and
				-- LAZYWRITER_SLEEP waits
				AND			PaulR.wait_type NOT LIKE 'XE%'	 -- remove Extended Events
				AND			PaulR.wait_type NOT IN (			 -- remove system waits
						N'BROKER_EVENTHANDLER',             N'BROKER_RECEIVE_WAITFOR',
						N'BROKER_TASK_STOP',                N'BROKER_TO_FLUSH',
						N'BROKER_TRANSMITTER',              N'CHECKPOINT_QUEUE',
						N'CHKPT',                           N'CLR_AUTO_EVENT',
						N'CLR_MANUAL_EVENT',                N'CLR_SEMAPHORE',
						N'DBMIRROR_DBM_EVENT',              N'DBMIRROR_EVENTS_QUEUE',
						N'DBMIRROR_WORKER_QUEUE',           N'DBMIRRORING_CMD',
						N'DIRTY_PAGE_POLL',                 N'DISPATCHER_QUEUE_SEMAPHORE',
						N'EXECSYNC',                        N'FSAGENT',
						N'FT_IFTS_SCHEDULER_IDLE_WAIT',     N'FT_IFTSHC_MUTEX',
						N'HADR_CLUSAPI_CALL',               N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
						N'HADR_LOGCAPTURE_WAIT',            N'HADR_NOTIFICATION_DEQUEUE',
						N'HADR_TIMER_TASK',                 N'HADR_WORK_QUEUE',
						N'KSOURCE_WAKEUP',                  N'LAZYWRITER_SLEEP',
						N'LOGMGR_QUEUE',                    N'ONDEMAND_TASK_QUEUE',
						N'PWAIT_ALL_COMPONENTS_INITIALIZED',
						N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
						N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
						N'REQUEST_FOR_DEADLOCK_SEARCH',     N'RESOURCE_QUEUE',
						N'SERVER_IDLE_CHECK',               N'SLEEP_BPOOL_FLUSH',
						N'SLEEP_DBSTARTUP',                 N'SLEEP_DCOMSTARTUP',
						N'SLEEP_MASTERDBREADY',             N'SLEEP_MASTERMDREADY',
						N'SLEEP_MASTERUPGRADED',            N'SLEEP_MSDBSTARTUP',
						N'SLEEP_SYSTEMTASK',                N'SLEEP_TASK',
						N'SLEEP_TEMPDBSTARTUP',             N'SNI_HTTP_ACCEPT',
						N'SP_SERVER_DIAGNOSTICS_SLEEP',     N'SQLTRACE_BUFFER_FLUSH',
						N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
						N'SQLTRACE_WAIT_ENTRIES',           N'WAIT_FOR_RESULTS',
						N'WAITFOR',                         N'WAITFOR_TASKSHUTDOWN',
						N'WAIT_XTP_HOST_WAIT',              N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
						N'WAIT_XTP_CKPT_CLOSE',             N'XE_DISPATCHER_JOIN',
						N'XE_DISPATCHER_WAIT',              N'XE_TIMER_EVENT'))
FROM		sys.dm_os_wait_stats

SELECT		@PercentageSignalWaitsOfTotalTime - @PercentageSignalWaitsOfTotalTime_WithoutBenignWaits AS [IfNegativeThenCPUPressure]
			, @PercentageSignalWaitsOfTotalTime AS [PercentageSignalWaitsOfTotalTime]
			, @PercentageSignalWaitsOfTotalTime_WithoutBenignWaits AS [PercentageSignalWaitsOfTotalTime_WithoutPaulRandalsBenignWaits]

Executed against the SQL Server it puts this result to the grid:

IfNegativeThenCPUPressure

Hey, minus 20% is a big deal, isn’t it?

But wait!
Let’s double check it with SQL Servers we know are just sitting around bored.

BothSignalWaitsOnBoredSQLServer

Hmmm, following my own thoughts how to determine if a SQL Server is under high CPU pressure or not, this would lead to “False Positives”. So we have to set a threshold.

The following article from the Technet Magazine SQL Server: SQL Server Delays Demystified as an excerpt from “SQL Server DMV Starter Pack,” published by Red Gate Books (2010), written by Glenn Berry, Louis Davidson and Tim Ford, states that

The key metric, with regard to potential CPU pressure, is the signal wait as a percentage of the total waits. A high-percentage signal is a sign of excessive CPU pressure. The literature tends to quote “high” as more than about 25 percent, but it depends on your system.

On our systems, we treat values greater than 10 percent to 15 percent as a worrying sign. Overall, the use of wait statistics represents a very effective means of diagnosing response times in your system. In simple terms, you either work or you wait. Response time equals service time plus the wait time.

Applying the threshold of 15 percent for the Percentage of Signal Waits of Total Time i end up with this script:

DECLARE @PercentageSignalWaitsOfTotalTime FLOAT
DECLARE @PercentageSignalWaitsOfTotalTime_WithPaulRandalsExclusions FLOAT
Declare @SubstractedValue FLOAT

SELECT		@PercentageSignalWaitsOfTotalTime = (( SUM(CAST(signal_wait_time_ms AS NUMERIC(20, 2)))
            / SUM(CAST(wait_time_ms AS NUMERIC(20, 2))) * 100 ))
			, @PercentageSignalWaitsOfTotalTime_WithPaulRandalsExclusions =
			  (SELECT
							( SUM(CAST(PaulR.signal_wait_time_ms AS NUMERIC(20, 2)))
							/ SUM(CAST(PaulR.wait_time_ms AS NUMERIC(20, 2))) * 100 )
				FROM		sys.dm_os_wait_stats PaulR
				WHERE		PaulR.wait_type NOT LIKE '%SLEEP%' -- remove eg. SLEEP_TASK and
				-- LAZYWRITER_SLEEP waits
				AND			PaulR.wait_type NOT LIKE 'XE%'	 -- remove Extended Events
				AND			PaulR.wait_type NOT IN (			 -- remove system waits
						N'BROKER_EVENTHANDLER',             N'BROKER_RECEIVE_WAITFOR',
						N'BROKER_TASK_STOP',                N'BROKER_TO_FLUSH',
						N'BROKER_TRANSMITTER',              N'CHECKPOINT_QUEUE',
						N'CHKPT',                           N'CLR_AUTO_EVENT',
						N'CLR_MANUAL_EVENT',                N'CLR_SEMAPHORE',
						N'DBMIRROR_DBM_EVENT',              N'DBMIRROR_EVENTS_QUEUE',
						N'DBMIRROR_WORKER_QUEUE',           N'DBMIRRORING_CMD',
						N'DIRTY_PAGE_POLL',                 N'DISPATCHER_QUEUE_SEMAPHORE',
						N'EXECSYNC',                        N'FSAGENT',
						N'FT_IFTS_SCHEDULER_IDLE_WAIT',     N'FT_IFTSHC_MUTEX',
						N'HADR_CLUSAPI_CALL',               N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
						N'HADR_LOGCAPTURE_WAIT',            N'HADR_NOTIFICATION_DEQUEUE',
						N'HADR_TIMER_TASK',                 N'HADR_WORK_QUEUE',
						N'KSOURCE_WAKEUP',                  N'LAZYWRITER_SLEEP',
						N'LOGMGR_QUEUE',                    N'ONDEMAND_TASK_QUEUE',
						N'PWAIT_ALL_COMPONENTS_INITIALIZED',
						N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
						N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
						N'REQUEST_FOR_DEADLOCK_SEARCH',     N'RESOURCE_QUEUE',
						N'SERVER_IDLE_CHECK',               N'SLEEP_BPOOL_FLUSH',
						N'SLEEP_DBSTARTUP',                 N'SLEEP_DCOMSTARTUP',
						N'SLEEP_MASTERDBREADY',             N'SLEEP_MASTERMDREADY',
						N'SLEEP_MASTERUPGRADED',            N'SLEEP_MSDBSTARTUP',
						N'SLEEP_SYSTEMTASK',                N'SLEEP_TASK',
						N'SLEEP_TEMPDBSTARTUP',             N'SNI_HTTP_ACCEPT',
						N'SP_SERVER_DIAGNOSTICS_SLEEP',     N'SQLTRACE_BUFFER_FLUSH',
						N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
						N'SQLTRACE_WAIT_ENTRIES',           N'WAIT_FOR_RESULTS',
						N'WAITFOR',                         N'WAITFOR_TASKSHUTDOWN',
						N'WAIT_XTP_HOST_WAIT',              N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
						N'WAIT_XTP_CKPT_CLOSE',             N'XE_DISPATCHER_JOIN',
						N'XE_DISPATCHER_WAIT',              N'XE_TIMER_EVENT'))
FROM		sys.dm_os_wait_stats;

SELECT		@SubstractedValue = @PercentageSignalWaitsOfTotalTime - @PercentageSignalWaitsOfTotalTime_WithPaulRandalsExclusions;

SELECT		CASE
				WHEN @PercentageSignalWaitsOfTotalTime < 15.0  THEN 'no'                 WHEN @PercentageSignalWaitsOfTotalTime > 15.0
				     AND @SubstractedValue > 0.0 THEN 'no'
				WHEN @PercentageSignalWaitsOfTotalTime > 15.0
				     AND @SubstractedValue < 0.0 THEN 'YES'
			END AS [IsSQLServerUnderCPUPressure?]
			, @PercentageSignalWaitsOfTotalTime AS [PercentageSignalWaitsOfTotalTime]
			, @PercentageSignalWaitsOfTotalTime_WithPaulRandalsExclusions AS [PercentageSignalWaitsOfTotalTime_WithoutPaulRandalsBenignWaits];

This is what i get to see if i run this script against a SQL Server under high CPU pressure:

FinalResult

Finally we got it! A nearly rock solid indicator that a SQL Server is under high CPU pressure.

What about the current situation?

Now we come to another important point:

Querying against the Dynamic Management Views of SQL Server means querying against data, that is collected from the last SQL Server Instance Restart. Unless you clear the cache e.g. executing this little script:

DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
GO

which you might do to ensure that you get an isolated set of data from a special given starting point.

So with a SQL Server Instance, that has been up and running for a long time, you get an average value for this period.

So besides the general information if SQL Server is or at least was under CPU pressure since the last restart, we want to know if SQL Server is right now under CPU pressure.

So again Paul Randal provides a script to create a snapshot of the wait statistics, that can be found here:

Wait statistics or please tell me where it hurts

So i took the script and adapted it to come back with a definitive answer whether the SQL Server Instance is under CPU pressure or not.

The most important parts to decide are the following ones:

CPU pressure is indicated by increasing signal-wait times and long Runnable Queues

So this is the final “snapshot” script:


-- Is SQL Server under CPU Pressure right now?

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
WHERE [name] = N'##CMSWaitStatsCPU1')
DROP TABLE [##CMSWaitStatsCPU1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
WHERE [name] = N'##CMSWaitStatsCPU2')
DROP TABLE [##CMSWaitStatsCPU2];
GO

;WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)

SELECT [wait_type], [WaitS], [ResourceS], [SignalS], [WaitCount], [Percentage]
INTO [##CMSWaitStatsCPU1]
FROM [Waits]

WAITFOR DELAY '00:00:30'
GO
;WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)
SELECT [wait_type], [WaitS], [ResourceS], [SignalS], [WaitCount], [Percentage]
INTO [##CMSWaitStatsCPU2]
FROM [Waits]

DECLARE @SignalWait_2_TotalWait_Ratio DECIMAL (16,2)
DECLARE @TotalPercentageOfSOS_SCHEDULER_YIELD DECIMAL (16,2)
DECLARE @TotalWaits DECIMAL (16,2)
DECLARE @ResourceWait DECIMAL (16,2)
DECLARE @SignalWaits DECIMAL (16,2)
DECLARE @WaitCount INT
DECLARE @TopWaitType sysname
;WITH [DiffCPUWaits] AS
(
-- Waits that weren't in the first snapshot
SELECT [CPU2].[wait_type] ,
[CPU2].[WaitS] ,
[CPU2].[ResourceS] ,
[CPU2].[SignalS] ,
[CPU2].[WaitCount],
[CPU2].[Percentage]
FROM [##CMSWaitStatsCPU2] AS [CPU2]
LEFT OUTER JOIN [##CMSWaitStatsCPU1] AS [CPU1] ON [CPU1].[wait_type] = [CPU2].[wait_type]
WHERE [CPU1].[wait_type] IS NULL

UNION
-- Diff of Waits in both snapshots
SELECT [CPU2].[wait_type] ,
[CPU2].[WaitS] - [CPU1].[WaitS],
[CPU2].[ResourceS] - [CPU1].[ResourceS],
[CPU2].[SignalS] - [CPU1].[SignalS],
[CPU2].[WaitCount] - [CPU1].[WaitCount],
[CPU2].[Percentage]
FROM [##CMSWaitStatsCPU2] AS [CPU2]
LEFT OUTER JOIN [##CMSWaitStatsCPU1] AS [CPU1] ON [CPU1].[wait_type] = [CPU2].[wait_type]
WHERE [CPU1].[wait_type] IS NOT NULL
)

-- as we run snapshots with a time difference of 30 seconds,
-- the Signal Wait Time has to be at least 15% of that --> 4.5 seconds.

-- COLLECT the results
SELECT
@SignalWait_2_TotalWait_Ratio = (CAST ( (SUM([DiffCPUWaits].[SignalS]) / SUM([DiffCPUWaits].[WaitS])) * 100.0 AS DECIMAL(16,2)))
, @TotalWaits = (CAST (SUM([DiffCPUWaits].[WaitS]) AS DECIMAL(16,2)))
, @ResourceWait = (CAST (SUM([DiffCPUWaits].[ResourceS]) AS DECIMAL(16,2)))
, @SignalWaits = (CAST (SUM([DiffCPUWaits].[SignalS]) AS DECIMAL(16,2)))
, @WaitCount = (CAST (SUM([DiffCPUWaits].[WaitCount]) AS INTEGER))
, @TopWaitType = (SELECT TOP 1 wait_type FROM [DiffCPUWaits] ORDER BY [WaitS])
, @TotalPercentageOfSOS_SCHEDULER_YIELD = (SELECT [Percentage] FROM [DiffCPUWaits] WHERE [DiffCPUWaits].[wait_type] = 'SOS_SCHEDULER_YIELD')
FROM [DiffCPUWaits]
-- INTERPRET the results
SELECT CASE

WHEN @SignalWait_2_TotalWait_Ratio < 15.0
AND @SignalWaits < 4.5 THEN 'no' WHEN @SignalWait_2_TotalWait_Ratio > 15.0
AND @SignalWaits < 4.5
THEN '1 x YES'

WHEN @SignalWait_2_TotalWait_Ratio < 15.0 AND @SignalWaits > 4.5
THEN '1 x YES'

WHEN @SignalWait_2_TotalWait_Ratio > 15.0
AND @SignalWaits > 4.5
AND @TotalPercentageOfSOS_SCHEDULER_YIELD < 10.0 THEN '2 x YES' WHEN @SignalWait_2_TotalWait_Ratio > 15.0
AND @SignalWaits > 4.5
AND @TotalPercentageOfSOS_SCHEDULER_YIELD > 10.0
THEN '3 x YES - INTERNAL CPU PRESSURE'
WHEN @SignalWait_2_TotalWait_Ratio > 15.0
AND @SignalWaits > 4.5
AND @TotalPercentageOfSOS_SCHEDULER_YIELD > 10.0
AND @TopWaitType = 'SOS_SCHEDULER_YIELD'
THEN '4 x YES - HIGH INTERNAL CPU PRESSURE'

END AS [Is SQL Server under CPU Pressure?]
, @SignalWait_2_TotalWait_Ratio AS [Signal Waits / Total Waits (%) < 15]
, @TotalWaits AS [Waits - from R.ing to R.ing (sec)]
, @ResourceWait AS [Ressource Waits - for Rescources (sec)]
, @SignalWaits AS [Signal Waits - from R.able to R.ing (sec) < 4.5]
, @TotalPercentageOfSOS_SCHEDULER_YIELD AS [SOS_SCHEDULER_YIELDS Waits Of Total (%) < 10]

-- Cleanup
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
WHERE [name] = N'##CMSWaitStatsCPU1')
DROP TABLE [##CMSWaitStatsCPU1];

IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
WHERE [name] = N'##CMSWaitStatsCPU2')
DROP TABLE [##CMSWaitStatsCPU2];

GO

So what do you do with these findings?

From a Full Service Provider’s point of view:
You might only be able to throw hardware to the VM / physical Server.

From a developers point of view:
You are the one that can fix the situation. You can have a look at the implementation of the queries, the proper indexing, is sorting really something for the second most expensive Sorting Server?

In any case:
You will have to investigate further as the information “Is my SQL Server Instance under CPU pressure” is just the tip of the iceberg 🙂

I hope you enjoyed reading this blog post, best of luck investigating further!
Thanks
Christoph

Other blog posts in that series

Advertisements
This entry was posted in SQL Server and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s