Click here to monitor SSC

SQL Monitor’s data repository: Monitoring data – part 1

Published 16 January 2013 2:48 pm

Okay, I had originally intended to blog about SQL Monitor on a fairly frequent basis. Unfortunately I got rather distracted towards the end of last year by a whole host of things, both personal and work related, and I ended up only writing two posts. That was a bit feeble, so now I’m going to try make a much more concerted effort to find time to post more frequently.

In my first post I introduced the SQL Monitor data repository, and described how the monitored objects are stored in a hierarchy in the data schema, in a series of tables with a _Keys suffix. In my second post I introduced how SQL Monitor stores alerts. This was by special request from a SQL Monitor user, and was a sidetrack from my original intention to elaborate further on how monitoring data is stored, which is what this post is about.

I’ve previously described the hierarchy of the different types of monitored object, and how instances of each type of monitored object are stored in a table with a _Keys suffix. The actual monitoring data for each monitored object can be found in tables alongside the _Keys table, in similarly named tables with a _StableSamples or _UnstableSamples suffix. For example, let’s have a look at the tables used to store information about a monitored server’s logical disks:

SELECT sch.name + '.' + obj.name AS [name]
  FROM sys.objects obj 
    JOIN sys.schemas sch
      ON sch.schema_id = obj.schema_id
  WHERE obj.type_desc = 'USER_TABLE'
    AND obj.Name LIKE 'Cluster_Machine_LogicalDisk%'
    AND sch.name = 'data'
  ORDER BY sch.name,
    obj.name;
 name
1data.Cluster_Machine_LogicalDisk_Capacity_StableSamples
2data.Cluster_Machine_LogicalDisk_Keys
3data.Cluster_Machine_LogicalDisk_Sightings
4data.Cluster_Machine_LogicalDisk_UnstableSamples
5data.Cluster_Machine_LogicalDisk_Volume_StableSamples

Let’s quickly recap how logical disk objects are stored in the _Keys tables. The entries of the data.Cluster_Machine_LogicalDisk_Keys table on my laptop are as follows:

SELECT [Id],
    [ParentId],
    [_Name]
  FROM [data].[Cluster_Machine_LogicalDisk_Keys];
 IdParentId_Name
111HarddiskVolume1
221C:
331D:

This indicates that I have three logical disks on my laptop, C:, D: and HarddiskVolume1 (this is a hidden partition that’s not mounted, so it doesn’t get a drive letter). The ParentId references an entry in the data.Cluster_Machine_Keys table, which contains the following:

SELECT [Id],
    [ParentId],
    [_Name]
  FROM [data].[Cluster_Machine_Keys];
 IdParentId_Name
112 

This represents a machine with no name, whose ParentId in turn references an entry in data.Cluster_Keys, which contains the following:

SELECT [Id],
    [_Name]
  FROM [data].[Cluster_Keys];
 Id_Name
12granger
11granger-vm

This shows two detected clusters, granger (my laptop) and granger-vm (a virtual machine I happen to be running). All of the above can be captured in a single query, to show all logical disks across all monitored servers.

SELECT cluster.[_Name] AS ClusterName,
    machine.[_Name] AS MachineName,
    logicalDisk.[_Name] AS LogicalDiskName,
    logicalDisk.[Id] AS LogicalDiskId
  FROM [data].[Cluster_Machine_LogicalDisk_Keys] logicalDisk
  JOIN data.Cluster_Machine_Keys machine ON logicalDisk.ParentId = machine.Id
  JOIN data.Cluster_Keys cluster ON machine.ParentId = cluster.Id
  ORDER BY cluster.[_Name], machine.[_Name], logicalDisk.[_Name];
 ClusterNameMachineNameLogicalDiskNameLogicalDiskId
1granger C:2
2granger D:3
3granger HarddiskVolume11

Okay, as you can probably see, I’m not monitoring very much. I have a development version of SQL Monitor on my laptop that’s only monitoring itself. If you run this query on your own data repository, however, you’ll likely see far more entries.

Okay, so where is the actual raw monitoring data stored for these logical disks? It’s in data.Cluster_Machine_LogicalDisk_UnstableSamples. As you can probably imagine, this table can contain a large number of rows, since disk monitoring data is captured every 15 seconds, and you may have months of data for many more logical disks. Here’s what I’ve collected within the last three minutes.

SELECT Id,
    CollectionDate,
    [_CumulativeIdleTime],
    [_CumulativeReadBytes],
    [_CumulativeReads],
    [_CumulativeReadTime],
    [_CumulativeWriteBytes],
    [_CumulativeWrites],
    [_CumulativeWriteTime],
    [_FreeBytes]
  FROM [data].[Cluster_Machine_LogicalDisk_UnstableSamples]
  WHERE [CollectionDate] > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY [Id], [CollectionDate];
 IdCollectionDate_CumulativeIdleTime_CumulativeReadBytes_CumulativeReads_CumulativeReadTime_CumulativeWriteBytes_CumulativeWrites_CumulativeWriteTime_FreeBytes
1163493804537503681112973845580002544645819250002949127178700063963136
2163493804552504539112975356670002544645819250002949127178700063963136
3163493804567510397412976868020002544645819250002949127178700063963136
4163493804582518255812978379640002544645819250002949127178700063963136
5163493804597523114012979891430002544645819250002949127178700063963136
6163493804612532972612981404400002544645819250002949127178700063963136
7163493804627532830512982915540002544645819250002949127178700063963136
8163493804642545689212984427550002544645819250002949127178700063963136
9163493804657552547512985939710002544645819250002949127178700063963136
10163493804672564406212987451750002544645819250002949127178700063963136
11163493804687565264212988963020002544645819250002949127178700063963136
12163493804702565122112990474430002544645819250002949127178700063963136
13263493804537503681195149608500036321630720588634885491300025545304064724638192963135200021247295488
14263493804552504539195164471800036321630720588634885491300025545756672724727192963390100021247295488
15263493804567510397495179474800036321805312588644885552400025546277888724785192963465900021247295488
16263493804582518255895193694400036322259968588709886074100025547313152724934192964014500021247295488
17263493804597523114095208119500036323034112588856886620000025548411392725078192964297200021247295488
18263493804612532972695222996600036323034112588856886620000025549129728725236192964591200021247295488
19263493804627532830595238035500036323034112588856886620000025549285376725272192964672000021247295488
20263493804642545689295252740800036323197952588859886956800025549637632725339192964778000021247295488
21263493804657552547595267845800036323197952588859886956800025550027776725370192964795800021247295488
22263493804672564406295282911000036323214336588860886958600025550510080725475192964859700021247295488
23263493804687565264295295099700036326986752589781889838100025550659072725504192964915600021247295488
24263493804702565122195305605400036332478976591122894221600025556641792725709192965537900021247295488
253634938045375036811129376531100058895462412197129158050004342272014501757686000169272672256
263634938045525045391129391642000058895462412197129158050004342272014501757686000169272672256
273634938045675103974129406755500058895462412197129158050004342272014501757686000169272672256
283634938045825182558129421871700058895462412197129158050004342272014501757686000169272672256
293634938045975231140129433036800058897920012197729553340004342272014501757686000169272672256
303634938046125329726129448166400058897920012197729553340004342272014501757686000169272672256
313634938046275328305129463277800058897920012197729553340004342272014501757686000169272672256
323634938046425456892129478397900058897920012197729553340004342272014501757686000169272672256
333634938046575525475129493519500058897920012197729553340004342272014501757686000169272672256
343634938046725644062129508639900058897920012197729553340004342272014501757686000169272672256
353634938046875652642129523752700058897920012197729553340004342272014501757686000169272672256
363634938047025651221129538866700058897920012197729553340004342272014501757686000169272672256

The values highlighted in red and blue are mentioned further below.

Well, what does all this mean? Let’s look at each of the columns in turn.

  • Id – This is the Id of the logical disk from the data.Cluster_Machine_LogicalDisk_Keys (or LogicalDiskId in my query above).
  • CollectionDate – The time at which the monitoring data was collected.
  • _CumulativeIdleTime – The cumulative time that this disk has spent idling.
  • _CumulativeReadBytes – The cumulative number of bytes read by this disk.
  • _CumulativeReads – The cumulative number of read operations performed by this disk.
  • _CumulativeReadTime – The cumulative time spent by this disk on read operations.
  • _CumulativeWriteBytes – The cumulative number of bytes writen by this disk.
  • _CumulativeWrites – The cumulative number of write operations performed by this disk.
  • _CumulativeWriteTime – The cumulative time spent by this disk on write operations.
  • _FreeBytes – The number of free bytes remaining on this disk.

Okay, there are a few questions that these results raise, which I need to cover in a little detail.

  1. Why aren’t the CollectionDate values datetimes?
  2. What are the units of the time-based properties _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime?
  3. Most of the properties are cumulative. How do I work with them?
  4. Is there an easier way to tie the raw data back to the cluster, machine and logical disk?

Why aren’t the CollectionDate values datetimes?

The CollectionDate column represents the timestamp for each data collection, but the type is bigint not datetime. This is because the native datetime SQL type can only represent timestamps to a precision of approximately 3.3 milliseconds. For various reasons, including a desire to retain the original sampling accuracy of the raw data, this isn’t sufficiently precise. There is a native datetime2 SQL type which is much more precise, but this was only introduced in SQL Server 2008 and the monitoring code in SQL Monitor predates this. Therefore, to retain the original sampling precision, timestamps are stored by SQL Monitor as bigints with a precision of 100 nanoseconds (in fact, each timestamp value is represented by the number of complete 100 nanosecond intervals that have elapsed since midnight of January 1st in the year 0001 of the Gregorian calendar).

The SQL Monitor data repository includes two scalar functions that can be used to convert between the bigint timestamp representation and a datetime – utils.DateTimeToTicks and utils.TicksToDateTime. You can see an example of utils.DateTimeToTicks in action in the WHERE clause of the above query, where it’s used to limit the results to the most recent 3 minutes. For performance reasons, it is strongly recommended that you use the raw bigint values in any queries for filtering and ordering purposes. For example, in the above query, the WHERE clause

WHERE [CollectionDate] > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))

could have been written as

WHERE utils.TicksToDateTime([CollectionDate]) > DATEADD(minute, -3, GETUTCDATE())

but this would make the query significantly more expensive for SQL Server to evaluate as it can no longer take advantage of any indexes based on the CollectionDate.

What are the units of the time-based properties _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime?

For the properties _CumulativeReadBytes, _CumulativeWriteBytes and _FreeBytes, it’s fairly obvious that the values represent a number of bytes. Likewise, _CumulativeReads and _CumulativeWrites represent the number of read and write operations that a disk has performed, which are simple unitless counts. So what are the units of the _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime properties? Well, they represent times but their units aren’t seconds. They’re hundreds of nano-seconds (1 × 10-7 seconds). If you want to convert these values to seconds, you need to divide by 10,000,000. In fact, this is the same unit as for CollectionDate.

Most of the properties are cumulative. How do I work with them?

With the exception of _FreeBytes, all data values are cumulative. For example, if I want to know how much time my C: drive spent idling for the most recent minute, I need to look at the samples at the start and end of that minute, i.e. rows 20 and 24 in the result set above (highlighted in red and blue). The time spent idle, according to the _CumulativeIdleTime values, is given by
(953,056,054,000952,527,408,000) / 10,000,000 = 52.865 seconds
The duration of that interval, according to the CollectionDate values, is given by
(634,938,047,025,651,221634,938,046,425,456,892) / 10,000,000 = 60.019 seconds
The proportion of time spent idle over that minute is thus given by
52.865 / 60.019 = 0.8808 = 88%

There are a number of different strategies for performing this calculation between pairs of samples for an entire series of samples. Here’s one that involves selecting the records of interest into a temporary table, and then performing a second query against the temporary table, that joins together adjacent pairs of records.

CREATE TABLE #tmp_Cluster_Machine_LogicalDisk_UnstableSamples
  (
   [Row] [int] NOT NULL
               PRIMARY KEY,
   [Id] [bigint] NOT NULL,
   [CollectionDate] [bigint] NOT NULL,
   [_CumulativeIdleTime] [bigint] NULL,
   [_CumulativeReadBytes] [bigint] NULL,
   [_CumulativeReads] [bigint] NULL,
   [_CumulativeReadTime] [bigint] NULL,
   [_CumulativeWriteBytes] [bigint] NULL,
   [_CumulativeWrites] [bigint] NULL,
   [_CumulativeWriteTime] [bigint] NULL,
   [_FreeBytes] [bigint] NULL
  );

INSERT INTO #tmp_Cluster_Machine_LogicalDisk_UnstableSamples
    SELECT ROW_NUMBER() OVER (ORDER BY [Id], [CollectionDate]) AS Row,
        Id,
        CollectionDate,
        [_CumulativeIdleTime],
        [_CumulativeReadBytes],
        [_CumulativeReads],
        [_CumulativeReadTime],
        [_CumulativeWriteBytes],
        [_CumulativeWrites],
        [_CumulativeWriteTime],
        [_FreeBytes]
      FROM [data].[Cluster_Machine_LogicalDisk_UnstableSamples]
      WHERE [CollectionDate] > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()));

SELECT b.Id AS Id,
    b.CollectionDate AS CollectionDate,
    (b.[_CumulativeIdleTime] - a.[_CumulativeIdleTime]) * 100.0 / (b.CollectionDate - a.CollectionDate) AS PercentageIdleTime,
    (b.[_CumulativeReadBytes] - a.[_CumulativeReadBytes]) * 10000000.0 / (b.CollectionDate - a.CollectionDate) AS ReadBytesPerSecond,
    (b.[_CumulativeReads] - a.[_CumulativeReads]) * 10000000.0 / (b.CollectionDate - a.CollectionDate) AS ReadsPerSecond,
    (b.[_CumulativeReadTime] - a.[_CumulativeReadTime]) * 100.0 / (b.CollectionDate - a.CollectionDate) AS PercentageReadTime,
    (b.[_CumulativeWriteBytes] - a.[_CumulativeWriteBytes]) * 10000000.0 / (b.CollectionDate - a.CollectionDate) AS WriteBytesPerSecond,
    (b.[_CumulativeWrites] - a.[_CumulativeWrites]) * 10000000.0 / (b.CollectionDate - a.CollectionDate) AS WritesPerSecond,
    (b.[_CumulativeWriteTime] - a.[_CumulativeWriteTime]) * 100.0 / (b.CollectionDate - a.CollectionDate) AS PercentageWriteTime,
    b.[_FreeBytes] AS FreeBytes
  FROM #tmp_Cluster_Machine_LogicalDisk_UnstableSamples a 
  INNER JOIN #tmp_Cluster_Machine_LogicalDisk_UnstableSamples b
  ON
    a.Row = b.Row - 1 AND
    a.Id = b.Id
  ORDER BY [Id],
    [CollectionDate];

DROP TABLE #tmp_Cluster_Machine_LogicalDisk_UnstableSamples;
 IdCollectionDatePercentageIdleTimeReadBytesPerSecondReadsPerSecondPercentageReadTimeWriteBytesPerSecondWritesPerSecondPercentageWriteTimeFreeBytes
11634938933823024834100.5509151543198390.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
21634938933973243426100.5641165908411650.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
31634938934123001991100.5538481221424630.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
41634938934273030573100.5561726898145310.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
51634938934423039153100.5522484113908680.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
61634938934573387752100.5549775691624500.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
71634938934723406333100.5548772655035310.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
81634938934873314907100.5546220458344160.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
91634938935023173478100.5514726281488430.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
101634938935173262063100.5526169761677740.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
111634938935324140693100.5556585448847190.00000000000.00000000000.0000000000000000.00000000000.00000000000.00000000000000063963136
12263493893382302483499.3769823032789190.00000000000.00000000000.000000000000000411180.74712793099.79943947203.39713901698156119059965952
13263493893397324342699.684065738014639272.66931113290.06656965600.003994179362298299084.150648942311.44998083853.10214597138548619059965952
142634938934123001991100.0610549386607700.00000000000.00000000000.000000000000000153710.874566673310.28321819181.12247336237496619059965952
152634938934273030573100.215570923679062273.01464463610.06665396590.00399923795853745866.46029887826.26547280170.58722144024529919059965952
16263493893442303915399.0396682643086140.00000000000.00000000000.00000000000000047510.88237752807.06626247641.75723281961605119059965952
17263493893457338775299.3690669508666320.00000000000.00000000000.00000000000000050672.63712912946.78423348662.11508455758872719059965952
18263493893472340633399.850964461528935273.03284517800.06665840940.00466608866271064981.81715236997.79903390761.96908941566378319059965952
19263493893487331490799.3745694625845757923.76291965790.06670732520.02268049057687697680.870475093710.80658668662.48684908442928619059965952
20263493893502317347891.784539971357394724924.569045837220.219063746478.796293873641701199834.282418187417.81679874686.33864311971852419059965952
21263493893517326206399.6131717811850910.00000000000.00000000000.00000000000000097324.923144554913.65860035252.02813558406190519059965952
22263493893532414069397.350433258838577734343.889522326626.84276759402.34493115426618070108.80202186356.95923604291.52506686997356719059965952
233634938933823024834100.51425058486654542869.01455903380.26665141420.0366645694532930.00000000000.00000000000.000000000000000169272672256
243634938933973243426100.54081721122775528357.60835782560.13313931200.0226336830530270.00000000000.00000000000.000000000000000169272672256
253634938934123001991100.52980943026530728444.71700166190.13354828820.0233709504361230.00000000000.00000000000.000000000000000169272672256
26363493893427303057389.72490321877467342590.28456324400.199961897910.8312694710398580.00000000000.00000000000.000000000000000169272672256
273634938934423039153100.51891698461514642869.01455903380.26665141420.0453307404149810.00000000000.00000000000.000000000000000169272672256
283634938934573387752100.24569633668485342499.63114056020.19953627900.3086161115475370.00000000000.00000000000.000000000000000169272672256
293634938934723406333100.52021489258053942593.12384777180.19997522840.0346623729229910.00000000000.00000000000.000000000000000169272672256
303634938934873314907100.52994033550075628416.25322911810.13341465040.0246817103336590.00000000000.00000000000.000000000000000169272672256
31363493893502317347899.14881678672886828699.05919495250.20018874990.0253572416622060.00000000000.00000000000.000000000000000169272672256
32363493893517326206390.94229251345130642573.25765313860.199881956310.9855123226060120.00000000000.00000000000.000000000000000169272672256
333634938935324140693100.52119375686271842350.33152143540.19883531550.0344647880220010.00000000000.00000000000.000000000000000169272672256

Notice that the calculation for most of the properties involves a multiplication factor, either to convert from "per 100 nanoseconds" to "per seconds", or to convert a bare ratio to a percentage. It’s also important that the multiplication is expressed as a float value rather than a bigint. This avoids rounding errors that otherwise would have occurred with bigint division.

There’s another calculation that’s also very useful, which is where we need to divide two cumulative changes. One classic example of this is for the Disk avg. read time and Disk avg. write time properties. For any given time interval, Disk avg. read time is calculated by dividing the total time spent reading by the number of read operations carried out. Likewise, Disk avg. write time is calculated by dividing the total time spent writing by the number of write operations carried out. These two properties can be obtained by modifying the SELECT query against the temporary table used above:


-- Warning! This query doesn't actually work!
SELECT b.Id AS Id,
    b.CollectionDate AS CollectionDate,
	((b.[_CumulativeReadTime] - a.[_CumulativeReadTime]) / 10000.0) / (b.[_CumulativeReads] - a.[_CumulativeReads]) AS AverageReadTimeInMilliseconds,
	((b.[_CumulativeWriteTime] - a.[_CumulativeWriteTime]) / 10000.0) / (b.[_CumulativeWrites] - a.[_CumulativeWrites]) AS AverageWriteTimeInMilliseconds
  FROM #tmp_Cluster_Machine_LogicalDisk_UnstableSamples a 
  INNER JOIN #tmp_Cluster_Machine_LogicalDisk_UnstableSamples b
  ON
    a.Row = b.Row - 1 AND
    a.Id = b.Id
  ORDER BY [Id],
    [CollectionDate];

Unfortunately this query fails with a "Divide by zero error encountered" error message whenever any interval is encountered in which no read or write operations occur. This can be resolved by adjusting the calculations to cope with a possible divide by zero scenario.

SELECT b.Id AS Id,
    b.CollectionDate AS CollectionDate,
	COALESCE(((b.[_CumulativeReadTime] - a.[_CumulativeReadTime]) / 10000.0) / NULLIF(b.[_CumulativeReads] - a.[_CumulativeReads], 0), 0) AS AverageReadTimeInMilliseconds,
	COALESCE(((b.[_CumulativeWriteTime] - a.[_CumulativeWriteTime]) / 10000.0) / NULLIF(b.[_CumulativeWrites] - a.[_CumulativeWrites], 0), 0) AS AverageWriteTimeInMilliseconds
  FROM #tmp_Cluster_Machine_LogicalDisk_UnstableSamples a 
  INNER JOIN #tmp_Cluster_Machine_LogicalDisk_UnstableSamples b
  ON
    a.Row = b.Row - 1 AND
    a.Id = b.Id
  ORDER BY [Id],
    [CollectionDate];
 IdCollectionDateAverageReadTimeInMillisecondsAverageWriteTimeInMilliseconds
116349390307580992030.0000000000000000000.000000000000000000
216349390309082477910.0000000000000000000.000000000000000000
316349390310583163740.0000000000000000000.000000000000000000
416349390312084149590.0000000000000000000.000000000000000000
516349390313585035440.0000000000000000000.000000000000000000
616349390315085821280.0000000000000000000.000000000000000000
716349390316586207100.0000000000000000000.000000000000000000
816349390318086992940.0000000000000000000.000000000000000000
916349390319587378750.0000000000000000000.000000000000000000
1016349390321088564620.0000000000000000000.000000000000000000
1116349390322588450410.0000000000000000000.000000000000000000
1226349390307580992030.0000000000000000001.305952380952380952
1326349390309082477910.0000000000000000002.992156862745098039
1426349390310583163740.0000000000000000001.428846153846153846
1526349390312084149590.0000000000000000002.668333333333333333
1626349390313585035440.0000000000000000002.262376237623762376
1726349390315085821280.0000000000000000003.438596491228070175
1826349390316586207100.6000000000000000001.563440860215053763
1926349390318086992941.9534031413612565441.564885496183206106
20263493903195873787543.09500000000000000029.070682730923694779
21263493903210885646282.548837209302325581138.539416058394160583
22263493903225884504152.330909090909090909113.118010752688172043
2336349390307580992030.0000000000000000000.000000000000000000
2436349390309082477910.0000000000000000000.000000000000000000
2536349390310583163740.0000000000000000000.000000000000000000
2636349390312084149590.0000000000000000000.000000000000000000
2736349390313585035440.0000000000000000000.000000000000000000
2836349390315085821280.0000000000000000000.000000000000000000
2936349390316586207100.0000000000000000000.000000000000000000
30363493903180869929438.5930635838150289010.000000000000000000
3136349390319587378751.1120780195048762190.000000000000000000
3236349390321088564620.8175030108390204730.000000000000000000
3336349390322588450410.7845192159283840090.000000000000000000

Is there an easier way to tie the raw data back to the cluster, machine and logical disk?

In the queries above, we could further join from the _UnstableSamples tables back to the different _Keys tables and also convert the bigint timestamps to datetimes, to provide more human readable output like this:

SELECT cluster._Name AS ClusterName,
    machine._Name AS MachineName,
    logicalDisk._Name AS LogicalDiskName,
    utils.TicksToDateTime(samples.CollectionDate) AS CollectionDateTime,
    samples._CumulativeIdleTime,
    samples._CumulativeReadBytes,
    samples._CumulativeReads,
    samples._CumulativeReadTime,
    samples._CumulativeWriteBytes,
    samples._CumulativeWrites,
    samples._CumulativeWriteTime,
    samples._FreeBytes
  FROM [data].Cluster_Machine_LogicalDisk_UnstableSamples samples
    JOIN [data].Cluster_Machine_LogicalDisk_Keys logicalDisk
      ON samples.Id = logicalDisk.Id
    JOIN [data].Cluster_Machine_Keys machine
      ON logicalDisk.ParentId = machine.Id 
    JOIN [data].Cluster_Keys cluster
      ON machine.ParentId = cluster.Id
  WHERE samples.CollectionDate > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY cluster._Name,
    machine._Name,
    logicalDisk._Name,
    samples.CollectionDate;
 ClusterNameMachineNameLogicalDiskNameCollectionDateTime_CumulativeIdleTime_CumulativeReadBytes_CumulativeReads_CumulativeReadTime_CumulativeWriteBytes_CumulativeWrites_CumulativeWriteTime_FreeBytes
1grangerC:2013-01-16 00:58:22.567104931305700036469454848597259936131700026231646208786208193069234600019057868800
2grangerC:2013-01-16 00:58:37.570104946233300036469458944597260936177700026232695808786341193069482600019057868800
3grangerC:2013-01-16 00:58:52.573104960895100036469463040597261936178300026233865728786497193070200300019057868800
4grangerC:2013-01-16 00:59:07.573104975867700036469475328597262936179800026235663872786596193070364700019057868800
5grangerC:2013-01-16 00:59:22.577104990717300036469475328597262936179800026236491776786711193070753800019057868800
6grangerC:2013-01-16 00:59:37.580105005536400036469475328597262936179800026245685248786809193071037700019057868800
7grangerC:2013-01-16 00:59:52.613105020572600036469475328597262936179800026248634880786957193071162800019057868800
8grangerC:2013-01-16 01:00:07.593105034804900036469692928597285936587500026253995520787173193072231300019057868800
9grangerC:2013-01-16 01:00:22.593105049777400036469692928597285936587500026254995456787319193072446800019057868800
10grangerC:2013-01-16 01:00:37.593105064734400036469791232597287936636500026255652864787396193072547100019057868800
11grangerC:2013-01-16 01:00:52.597105079718000036469791232597287936636500026256245248787473193072759900019057868800
12grangerC:2013-01-16 01:01:07.597105094287000036472880640597370937130800026257645568787590193073181000019057868800
13grangerD:2013-01-16 00:58:22.567139120522300064213708812230033899230004735180822294116500000169272672256
14grangerD:2013-01-16 00:58:37.570139135610200064213708812230033899230004735180822294116500000169272672256
15grangerD:2013-01-16 00:58:52.573139150696500064213708812230033899230004735180822294116500000169272672256
16grangerD:2013-01-16 00:59:07.573139165780200064213708812230033899230004735180822294116500000169272672256
17grangerD:2013-01-16 00:59:22.577139180865800064213708812230033899230004735180822294116500000169272672256
18grangerD:2013-01-16 00:59:37.580139195950700064213708812230033899230004735180822294116500000169272672256
19grangerD:2013-01-16 00:59:52.613139211068100064213708812230033899230004735180822294116500000169272672256
20grangerD:2013-01-16 01:00:07.593139226130000064213708812230033899230004735180822294116500000169272672256
21grangerD:2013-01-16 01:00:22.593139241214600064213708812230033899230004735180822294116500000169272672256
22grangerD:2013-01-16 01:00:37.593139256298900064213708812230033899230004735180822294116500000169272672256
23grangerD:2013-01-16 01:00:52.597139271382800064213708812230033899230004735180822294116500000169272672256
24grangerD:2013-01-16 01:01:07.597139286468200064213708812230033899230004735180822294116500000169272672256
25grangerHarddiskVolume12013-01-16 00:58:22.56713961867690002544645819250002949127178700063963136
26grangerHarddiskVolume12013-01-16 00:58:37.57013963376480002544645819250002949127178700063963136
27grangerHarddiskVolume12013-01-16 00:58:52.57313964885110002544645819250002949127178700063963136
28grangerHarddiskVolume12013-01-16 00:59:07.57313966393480002544645819250002949127178700063963136
29grangerHarddiskVolume12013-01-16 00:59:22.57713967902040002544645819250002949127178700063963136
30grangerHarddiskVolume12013-01-16 00:59:37.58013969410530002544645819250002949127178700063963136
31grangerHarddiskVolume12013-01-16 00:59:52.61313970922270002544645819250002949127178700063963136
32grangerHarddiskVolume12013-01-16 01:00:07.59313972428460002544645819250002949127178700063963136
33grangerHarddiskVolume12013-01-16 01:00:22.59313973936930002544645819250002949127178700063963136
34grangerHarddiskVolume12013-01-16 01:00:37.59313975445350002544645819250002949127178700063963136
35grangerHarddiskVolume12013-01-16 01:00:52.59713976953740002544645819250002949127178700063963136
36grangerHarddiskVolume12013-01-16 01:01:07.59713978462280002544645819250002949127178700063963136

Well, for each _UnstableSamples table, there is a corresponding view suffixed with _UnstableSamples_View that already performs these joins and date conversions for you. The above query can be rewritten as follows, whilst producing exactly the same output albeit with slightly different column names.

SELECT Cluster_Name,
    Cluster_Machine_Name,
    Cluster_Machine_LogicalDisk_Name,
    CollectionDate_DateTime,
    Cluster_Machine_LogicalDisk_CumulativeIdleTime,
    Cluster_Machine_LogicalDisk_CumulativeReadBytes,
    Cluster_Machine_LogicalDisk_CumulativeReads,
    Cluster_Machine_LogicalDisk_CumulativeReadTime,
    Cluster_Machine_LogicalDisk_CumulativeWriteBytes,
    Cluster_Machine_LogicalDisk_CumulativeWrites,
    Cluster_Machine_LogicalDisk_CumulativeWriteTime,
    Cluster_Machine_LogicalDisk_FreeBytes
  FROM [data].[Cluster_Machine_LogicalDisk_UnstableSamples_View]
  WHERE CollectionDate > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY Cluster_Name, Cluster_Machine_Name, Cluster_Machine_LogicalDisk_Name, CollectionDate;

Some more examples

Okay, I’ve covered all I wanted to for this post. I haven’t yet explored the _StableSamples or _Instances tables, but most of the interesting stuff is in the _UnstableSamples tables I’ve already described. Now it’s time for a few more examples.

CPU performance monitoring data

The raw data from which the CPU performance metrics in SQL monitor are all derived can be found in the monitoringCluster_Machine_Processors_UnstableSamples table, but it’s easier to examine the corresponding view, in this case to see what’s been going on in the most recent 3 minutes.

SELECT [Cluster_Name],
    [Cluster_Machine_Name],
    [CollectionDate_DateTime],
    [Cluster_Machine_Processors_AverageQueueLength],
    [Cluster_Machine_Processors_CumulativeAverageContextSwitches],
    [Cluster_Machine_Processors_CumulativeDpcTime],
    [Cluster_Machine_Processors_CumulativeIdleTime],
    [Cluster_Machine_Processors_CumulativeInterruptTime],
    [Cluster_Machine_Processors_CumulativePrivilegedTime],
    [Cluster_Machine_Processors_CumulativeUserTime]
  FROM [data].[Cluster_Machine_Processors_UnstableSamples_View]
  WHERE CollectionDate > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY [Cluster_Name],
    [Cluster_Machine_Name],
    [CollectionDate];
 Cluster_NameCluster_Machine_NameCollectionDate_DateTimeCluster_Machine_Processors_AverageQueueLengthCluster_Machine_Processors_CumulativeAverageContextSwitchesCluster_Machine_Processors_CumulativeDpcTimeCluster_Machine_Processors_CumulativeIdleTimeCluster_Machine_Processors_CumulativeInterruptTimeCluster_Machine_Processors_CumulativePrivilegedTimeCluster_Machine_Processors_CumulativeUserTime
1granger2013-01-16 10:17:24.7070123225982444368849130435181717523950248502305328177539212684362
2granger2013-01-16 10:17:39.7200123236935444466349130449871161723952978512305519278739214107871
3granger2013-01-16 10:17:54.7200123247119444505350130464650306423954343522305642129539215004877
4granger2013-01-16 10:18:09.7200123257612444544350130479177899523955318532305880031039217422893
5granger2013-01-16 10:18:24.7230123269326444583350130493789343223957463542306092582439219099903
6granger2013-01-16 10:18:39.7330123279756444622350130508560687923960583562306246633339220074910
7granger2013-01-16 10:18:54.7330123291363444641850130523138981323961948572306369484139222980428
8granger2013-01-16 10:19:09.7430123301483444680851130537781625223964678592306465034739225671445
9granger2013-01-16 10:19:24.7500123313007444739351130552486669523967213602306611285739227231455
10granger2013-01-16 10:19:39.7570123323596444758851130567271664223969163622306697086239228596464
11granger2013-01-16 10:19:54.7570123335291444856352130581966958423970723632306835537139230331975
12granger2013-01-16 10:20:09.7570123346296444934352130596572552123972673642306979838039232749991

All of these raw values correspond to the raw Windows Performance Counters provided by Windows. Just like the time-based properties we’ve explored in Cluster_Machine_LogicalDisk_UnstableSamples, most of the properties in Cluster_Machine_Processors_UnstableSamples are cumulative time samples, based on 100 nanosecond units.

SQL Monitor actually only displays two metrics derived from these values:

  1. Avg. CPU queue length – This corresponds directly to the Cluster_Machine_Processors_AverageQueueLength column in the above query.
  2. Machine: processor time – This is trivially derived from the Cluster_Machine_Processors_CumulativeIdleTime column in the above query (i.e. % cpu usage = 100 – % idle time).

SQL Server statistics

The Cluster_SqlServer_SqlStatistics_UnstableSamples table contains the raw monitoring data for the Batch requests/sec, Compilations/sec and Compilations/batch metrics displayed in SQL Monitor’s Analysis tab. Let’s have a look at the corresponding view, to see what’s been going on in the most recent 3 minutes.

SELECT Cluster_Name,
    Cluster_SqlServer_Name,
    CollectionDate_DateTime,
    Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests,
    Cluster_SqlServer_SqlStatistics_CumulativeCompilations,
    Cluster_SqlServer_SqlStatistics_CumulativeRecompilations
  FROM [data].[Cluster_SqlServer_SqlStatistics_UnstableSamples_View]
  WHERE [CollectionDate] > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY Cluster_Name,
    Cluster_SqlServer_Name,
    CollectionDate;
1Cluster_NameCluster_SqlServer_NameCollectionDate_DateTimeCluster_SqlServer_SqlStatistics_CumulativeBatchRequestsCluster_SqlServer_SqlStatistics_CumulativeCompilationsCluster_SqlServer_SqlStatistics_CumulativeRecompilations
2dev-chrisl22013-01-16 11:33:20.00318378727970523198
3dev-chrisl22013-01-16 11:33:35.02318381827975223199
4dev-chrisl22013-01-16 11:33:50.03718387927985023212
5dev-chrisl22013-01-16 11:34:05.04718393927994623223
6dev-chrisl22013-01-16 11:34:20.04318395527996023223
7dev-chrisl22013-01-16 11:34:35.07018398628000723224
8dev-chrisl22013-01-16 11:34:50.07718403528008923235
9dev-chrisl22013-01-16 11:35:05.08018405328010523235
10dev-chrisl22013-01-16 11:35:20.08318406728011723235
11dev-chrisl22013-01-16 11:35:35.08718409828016423236
12dev-chrisl22013-01-16 11:35:50.09018415128024923247
13dev-chrisl22013-01-16 11:36:05.08718416728026323247
14dev-chrisl2sql20052013-01-16 11:33:20.003233113336228625574
15dev-chrisl2sql20052013-01-16 11:33:35.023233167236234625575
16dev-chrisl2sql20052013-01-16 11:33:50.037233228936246425588
17dev-chrisl2sql20052013-01-16 11:34:05.047233281736253425598
18dev-chrisl2sql20052013-01-16 11:34:20.043233324036256125598
19dev-chrisl2sql20052013-01-16 11:34:35.070233372236262325599
20dev-chrisl2sql20052013-01-16 11:34:50.077233427636273025610
21dev-chrisl2sql20052013-01-16 11:35:05.080233477136276525610
22dev-chrisl2sql20052013-01-16 11:35:20.083233519336279625610
23dev-chrisl2sql20052013-01-16 11:35:35.087233566736285625611
24dev-chrisl2sql20052013-01-16 11:35:50.090233624936295925622
25dev-chrisl2sql20052013-01-16 11:36:05.087233674436298625622
26dev-chrisl2sql20082013-01-16 11:33:14.55719930032043838234
27dev-chrisl2sql20082013-01-16 11:33:29.56019946532058138259
28dev-chrisl2sql20082013-01-16 11:33:44.56319956232074138300
29dev-chrisl2sql20082013-01-16 11:33:59.57019960532079638324
30dev-chrisl2sql20082013-01-16 11:34:14.58719964532084538344
31dev-chrisl2sql20082013-01-16 11:34:29.59319970032093438369
32dev-chrisl2sql20082013-01-16 11:34:44.59019981332113138422
33dev-chrisl2sql20082013-01-16 11:34:59.61719990232120938446
34dev-chrisl2sql20082013-01-16 11:35:14.60019993832125838466
35dev-chrisl2sql20082013-01-16 11:35:29.60319999132134538491
36dev-chrisl2sql20082013-01-16 11:35:44.61320007432148738530
37dev-chrisl2sql20082013-01-16 11:35:59.61720011232154138554
38dev-chrisl2sql20122013-01-16 11:33:14.46018380828370323199
39dev-chrisl2sql20122013-01-16 11:33:29.46018383728375123200
40dev-chrisl2sql20122013-01-16 11:33:44.46318390028386623213
41dev-chrisl2sql20122013-01-16 11:33:59.45018391728388023213
42dev-chrisl2sql20122013-01-16 11:34:14.47018393528389623213
43dev-chrisl2sql20122013-01-16 11:34:29.50018396428394223214
44dev-chrisl2sql20122013-01-16 11:34:44.48318404528408823235
45dev-chrisl2sql20122013-01-16 11:34:59.48318406128410323235
46dev-chrisl2sql20122013-01-16 11:35:14.48718407728411723235
47dev-chrisl2sql20122013-01-16 11:35:29.51718410628416323236
48dev-chrisl2sql20122013-01-16 11:35:44.51318416128426623247
49dev-chrisl2sql20122013-01-16 11:35:59.50018417528427923247

The SQL Monitor metrics are derived from this table as follows:

  • Batch requests/sec is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests between successive samples.
  • Compilations/sec is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeCompilations between successive samples.
  • Compilations/batch is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeCompilations divided by the increase in Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests between successive samples.

If you want to calculate Compilations/batch in your own queries, don’t forget to handle the possible "divide by zero" error in the case where there are no cumulative batch requests within any time interval your query is concerned with.

Custom metrics

Retrieving the raw monitoring data for custom metrics is a little different to all of the other metrics, mainly because Cluster_SqlServer_Database_CustomMetric_UnstableSamples_View only contains a Cluster_SqlServer_Database_CustomMetric_MetricId property, not the actual custom metric name. You can still make use of this view, but a single join to another table is required in order to include the name of the custom metric in the results. Here are all of the custom metrics that have been running on my laptop for the past 3 minutes.

SELECT samples.Cluster_Name AS Cluster,
    samples.Cluster_SqlServer_Name AS SqlServer,
    samples.Cluster_SqlServer_Database_Name AS [Database],
    customMetrics.Name AS CustomMetric,
    samples.CollectionDate_DateTime AS CollectionDate,
    samples.Cluster_SqlServer_Database_CustomMetric_Value AS Value
  FROM [data].[Cluster_SqlServer_Database_CustomMetric_UnstableSamples_View] samples 
    JOIN [settings].[CustomMetrics] customMetrics
      ON samples.Cluster_SqlServer_Database_CustomMetric_MetricId = customMetrics.Id
  WHERE samples.CollectionDate > utils.DateTimeToTicks(DATEADD(minute, -3, GETUTCDATE()))
  ORDER BY samples.Cluster_Name,
    samples.Cluster_SqlServer_Name,
    samples.Cluster_SqlServer_Database_Name,
    customMetrics.Name;
 ClusterSqlServerDatabaseCustomMetricCollectionDateValue
1grangerDebaclePercentage of database free space2013-01-16 13:43:11.24361.71875
2grangerDebaclePercentage of database free space2013-01-16 13:44:11.32061.71875
3grangerDebaclePercentage of database free space2013-01-16 13:45:11.26361.71875
4grangerSqlMonitorDataPercentage of database free space2013-01-16 13:43:11.2372.794894
5grangerSqlMonitorDataPercentage of database free space2013-01-16 13:44:11.3132.75088
6grangerSqlMonitorDataPercentage of database free space2013-01-16 13:45:11.2572.728873
7grangermasterPercentage of database free space2013-01-16 13:43:11.21748.4375
8grangermasterPercentage of database free space2013-01-16 13:44:11.26748.4375
9grangermasterPercentage of database free space2013-01-16 13:45:11.23748.4375
10grangermodelPercentage of database free space2013-01-16 13:43:11.22753.819444
11grangermodelPercentage of database free space2013-01-16 13:44:11.29753.819444
12grangermodelPercentage of database free space2013-01-16 13:45:11.24753.819444
13grangermsdbPercentage of database free space2013-01-16 13:43:11.23313.701923
14grangermsdbPercentage of database free space2013-01-16 13:44:11.30713.701923
15grangermsdbPercentage of database free space2013-01-16 13:45:11.25313.701923
16grangertempdbCached pages in TempDB (MB)2013-01-16 13:43:11.2100
17grangertempdbCached pages in TempDB (MB)2013-01-16 13:44:11.2600
18grangertempdbCached pages in TempDB (MB)2013-01-16 13:45:11.2330
19grangertempdbPercentage of database free space2013-01-16 13:43:11.22055.190058
20grangertempdbPercentage of database free space2013-01-16 13:44:11.27355.116959
21grangertempdbPercentage of database free space2013-01-16 13:45:11.24055.116959
22grangertempdbPlan cache hit ratio2013-01-16 13:43:11.16779.1
23grangertempdbPlan cache hit ratio2013-01-16 13:44:11.19380.39
24grangertempdbPlan cache hit ratio2013-01-16 13:45:11.19079
25grangertempdbPlan cache reuse2013-01-16 13:43:11.20339.4994927291174
26grangertempdbPlan cache reuse2013-01-16 13:44:11.25339.4994927291174
27grangertempdbPlan cache reuse2013-01-16 13:45:11.22739.4994927291174

The interpretation of the raw custom metric values obviously vary from metric to metric. For any metrics imported from Red Gate’s own SQL Monitor Metrics web-site, you should consult the individual metric descriptions. For your own metrics, I hope you already know how they work!

Right, that’s it for this post. I hope you’re inspired to investigate some of the other _UnstableSamples tables to see what other raw data is available. If you have any questions, please either leave a comment or email me directly (chris.lambrou@red-gate.com) and I’ll see what I can do to help.

One Response to “SQL Monitor’s data repository: Monitoring data – part 1”

  1. Teun says:

    Great post! I was struggling with the cumulative data. I was very close to my goal. With this article I was finally able to complete my quest to get detailed CPU percentages over a longer time frame.
    Thanks!

Leave a Reply