16 January 2013

SQL Monitor's data repository: Monitoring data – part 1

Okay, I had originally intended to blog about SQL Monitor on a fairly frequent basis. Unfortunately I got rather distracted towards the end of last year by a whole host of things, both personal and work related, and I ended up only writing two posts. That was a bit feeble, so now I’m going to try make a much more concerted effort to find time to post more frequently.

In my first post I introduced the SQL Monitor data repository, and described how the monitored objects are stored in a hierarchy in the data schema, in a series of tables with a _Keys suffix. In my second post I introduced how SQL Monitor stores alerts. This was by special request from a SQL Monitor user, and was a sidetrack from my original intention to elaborate further on how monitoring data is stored, which is what this post is about.

I’ve previously described the hierarchy of the different types of monitored object, and how instances of each type of monitored object are stored in a table with a _Keys suffix. The actual monitoring data for each monitored object can be found in tables alongside the _Keys table, in similarly named tables with a _StableSamples or _UnstableSamples suffix. For example, let’s have a look at the tables used to store information about a monitored server’s logical disks:

  name
1 data.Cluster_Machine_LogicalDisk_Capacity_StableSamples
2 data.Cluster_Machine_LogicalDisk_Keys
3 data.Cluster_Machine_LogicalDisk_Sightings
4 data.Cluster_Machine_LogicalDisk_UnstableSamples
5 data.Cluster_Machine_LogicalDisk_Volume_StableSamples

Let’s quickly recap how logical disk objects are stored in the _Keys tables. The entries of the data.Cluster_Machine_LogicalDisk_Keys table on my laptop are as follows:

  Id ParentId _Name
1 1 1 HarddiskVolume1
2 2 1 C:
3 3 1 D:

This indicates that I have three logical disks on my laptop, C:, D: and HarddiskVolume1 (this is a hidden partition that’s not mounted, so it doesn’t get a drive letter). The ParentId references an entry in the data.Cluster_Machine_Keys table, which contains the following:

  Id ParentId _Name
1 1 2  

This represents a machine with no name, whose ParentId in turn references an entry in data.Cluster_Keys, which contains the following:

  Id _Name
1 2 granger
1 1 granger-vm

This shows two detected clusters, granger (my laptop) and granger-vm (a virtual machine I happen to be running). All of the above can be captured in a single query, to show all logical disks across all monitored servers.

  ClusterName MachineName LogicalDiskName LogicalDiskId
1 granger   C: 2
2 granger   D: 3
3 granger   HarddiskVolume1 1

Okay, as you can probably see, I’m not monitoring very much. I have a development version of SQL Monitor on my laptop that’s only monitoring itself. If you run this query on your own data repository, however, you’ll likely see far more entries.

So where is the actual raw monitoring data stored for these logical disks? It’s in data.Cluster_Machine_LogicalDisk_UnstableSamples. As you can probably imagine, this table can contain a large number of rows, since disk monitoring data is captured every 15 seconds, and you may have months of data for many more logical disks. Here’s what I’ve collected within the last three minutes.

  Id CollectionDate _CumulativeIdleTime _CumulativeReadBytes _CumulativeReads _CumulativeReadTime _CumulativeWriteBytes _CumulativeWrites _CumulativeWriteTime _FreeBytes
1 1 634938045375036811 1297384558000 254464 58 1925000 294912 71 787000 63963136
2 1 634938045525045391 1297535667000 254464 58 1925000 294912 71 787000 63963136
3 1 634938045675103974 1297686802000 254464 58 1925000 294912 71 787000 63963136
4 1 634938045825182558 1297837964000 254464 58 1925000 294912 71 787000 63963136
5 1 634938045975231140 1297989143000 254464 58 1925000 294912 71 787000 63963136
6 1 634938046125329726 1298140440000 254464 58 1925000 294912 71 787000 63963136
7 1 634938046275328305 1298291554000 254464 58 1925000 294912 71 787000 63963136
8 1 634938046425456892 1298442755000 254464 58 1925000 294912 71 787000 63963136
9 1 634938046575525475 1298593971000 254464 58 1925000 294912 71 787000 63963136
10 1 634938046725644062 1298745175000 254464 58 1925000 294912 71 787000 63963136
11 1 634938046875652642 1298896302000 254464 58 1925000 294912 71 787000 63963136
12 1 634938047025651221 1299047443000 254464 58 1925000 294912 71 787000 63963136
13 2 634938045375036811 951496085000 36321630720 588634 8854913000 25545304064 724638 1929631352000 21247295488
14 2 634938045525045391 951644718000 36321630720 588634 8854913000 25545756672 724727 1929633901000 21247295488
15 2 634938045675103974 951794748000 36321805312 588644 8855524000 25546277888 724785 1929634659000 21247295488
16 2 634938045825182558 951936944000 36322259968 588709 8860741000 25547313152 724934 1929640145000 21247295488
17 2 634938045975231140 952081195000 36323034112 588856 8866200000 25548411392 725078 1929642972000 21247295488
18 2 634938046125329726 952229966000 36323034112 588856 8866200000 25549129728 725236 1929645912000 21247295488
19 2 634938046275328305 952380355000 36323034112 588856 8866200000 25549285376 725272 1929646720000 21247295488
20 2 634938046425456892 952527408000 36323197952 588859 8869568000 25549637632 725339 1929647780000 21247295488
21 2 634938046575525475 952678458000 36323197952 588859 8869568000 25550027776 725370 1929647958000 21247295488
22 2 634938046725644062 952829110000 36323214336 588860 8869586000 25550510080 725475 1929648597000 21247295488
23 2 634938046875652642 952950997000 36326986752 589781 8898381000 25550659072 725504 1929649156000 21247295488
24 2 634938047025651221 953056054000 36332478976 591122 8942216000 25556641792 725709 1929655379000 21247295488
25 3 634938045375036811 1293765311000 588954624 121971 2915805000 43422720 1450 1757686000 169272672256
26 3 634938045525045391 1293916420000 588954624 121971 2915805000 43422720 1450 1757686000 169272672256
27 3 634938045675103974 1294067555000 588954624 121971 2915805000 43422720 1450 1757686000 169272672256
28 3 634938045825182558 1294218717000 588954624 121971 2915805000 43422720 1450 1757686000 169272672256
29 3 634938045975231140 1294330368000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
30 3 634938046125329726 1294481664000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
31 3 634938046275328305 1294632778000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
32 3 634938046425456892 1294783979000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
33 3 634938046575525475 1294935195000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
34 3 634938046725644062 1295086399000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
35 3 634938046875652642 1295237527000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256
36 3 634938047025651221 1295388667000 588979200 121977 2955334000 43422720 1450 1757686000 169272672256

The values highlighted in red and blue are mentioned further below.

Well, what does all this mean? Let’s look at each of the columns in turn.

  • Id – This is the Id of the logical disk from the data.Cluster_Machine_LogicalDisk_Keys (or LogicalDiskId in my query above).
  • CollectionDate – The time at which the monitoring data was collected.
  • _CumulativeIdleTime – The cumulative time that this disk has spent idling.
  • _CumulativeReadBytes – The cumulative number of bytes read by this disk.
  • _CumulativeReads – The cumulative number of read operations performed by this disk.
  • _CumulativeReadTime – The cumulative time spent by this disk on read operations.
  • _CumulativeWriteBytes – The cumulative number of bytes written by this disk.
  • _CumulativeWrites – The cumulative number of write operations performed by this disk.
  • _CumulativeWriteTime – The cumulative time spent by this disk on write operations.
  • _FreeBytes – The number of free bytes remaining on this disk.

Hmm, there are a few questions that these results raise, which I need to cover in a little detail.

  1. Why aren’t the CollectionDate values datetimes?
  2. What are the units of the time-based properties _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime?
  3. Most of the properties are cumulative. How do I work with them?
  4. Is there an easier way to tie the raw data back to the cluster, machine and logical disk?

Why aren’t the CollectionDate values datetimes?

The CollectionDate column represents the timestamp for each data collection, but the type is bigint not datetime. This is because the native datetime SQL type can only represent timestamps to a precision of approximately 3.3 milliseconds. For various reasons, including a desire to retain the original sampling accuracy of the raw data, this isn’t sufficiently precise. There is a native datetime2 SQL type which is much more precise, but this was only introduced in SQL Server 2008 and the monitoring code in SQL Monitor predates this. Therefore, to retain the original sampling precision, timestamps are stored by SQL Monitor as bigints with a precision of 100 nanoseconds (in fact, each timestamp value is represented by the number of complete 100 nanosecond intervals that have elapsed since midnight of January 1st in the year 0001 of the Gregorian calendar).

The SQL Monitor data repository includes two scalar functions that can be used to convert between the bigint timestamp representation and a datetime – utils.DateTimeToTicks and utils.TicksToDateTime. You can see an example of utils.DateTimeToTicks in action in the WHERE clause of the above query, where it’s used to limit the results to the most recent 3 minutes. For performance reasons, it is strongly recommended that you use the raw bigint values in any queries for filtering and ordering purposes. For example, in the above query, the WHERE clause

could have been written as

but this would make the query significantly more expensive for SQL Server to evaluate as it can no longer take advantage of any indexes based on the CollectionDate.

What are the units of the time-based properties _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime?

For the properties _CumulativeReadBytes, _CumulativeWriteBytes and _FreeBytes, it’s fairly obvious that the values represent a number of bytes. Likewise, _CumulativeReads and _CumulativeWrites represent the number of read and write operations that a disk has performed, which are simple unitless counts. So what are the units of the _CumulativeIdleTime, _CumulativeReadTime and _CumulativeWriteTime properties? Well, they represent times but their units aren’t seconds. They’re hundreds of nano-seconds (1 × 10-7 seconds). If you want to convert these values to seconds, you need to divide by 10,000,000. In fact, this is the same unit as for CollectionDate.

Most of the properties are cumulative. How do I work with them?

With the exception of _FreeBytes, all data values are cumulative. For example, if I want to know how much time my C: drive spent idling for the most recent minute, I need to look at the samples at the start and end of that minute, i.e. rows 20 and 24 in the result set above (highlighted in red and blue). The time spent idle, according to the _CumulativeIdleTime values, is given by
(953,056,054,000 – 952,527,408,000) / 10,000,000 = 52.865 seconds
The duration of that interval, according to the CollectionDate values, is given by
(634,938,047,025,651,221 – 634,938,046,425,456,892) / 10,000,000 = 60.019 seconds
The proportion of time spent idle over that minute is thus given by
52.865 / 60.019 = 0.8808 = 88%

There are a number of different strategies for performing this calculation between pairs of samples for an entire series of samples. Here’s one that involves selecting the records of interest into a temporary table, and then performing a second query against the temporary table, that joins together adjacent pairs of records.

  Id CollectionDate PercentageIdleTime ReadBytesPerSecond ReadsPerSecond PercentageReadTime WriteBytesPerSecond WritesPerSecond PercentageWriteTime FreeBytes
1 1 634938933823024834 100.550915154319839 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
2 1 634938933973243426 100.564116590841165 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
3 1 634938934123001991 100.553848122142463 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
4 1 634938934273030573 100.556172689814531 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
5 1 634938934423039153 100.552248411390868 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
6 1 634938934573387752 100.554977569162450 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
7 1 634938934723406333 100.554877265503531 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
8 1 634938934873314907 100.554622045834416 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
9 1 634938935023173478 100.551472628148843 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
10 1 634938935173262063 100.552616976167774 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
11 1 634938935324140693 100.555658544884719 0.0000000000 0.0000000000 0.000000000000000 0.0000000000 0.0000000000 0.000000000000000 63963136
12 2 634938933823024834 99.376982303278919 0.0000000000 0.0000000000 0.000000000000000 411180.7471279309 9.7994394720 3.397139016981561 19059965952
13 2 634938933973243426 99.684065738014639 272.6693111329 0.0665696560 0.003994179362298 299084.1506489423 11.4499808385 3.102145971385486 19059965952
14 2 634938934123001991 100.061054938660770 0.0000000000 0.0000000000 0.000000000000000 153710.8745666733 10.2832181918 1.122473362374966 19059965952
15 2 634938934273030573 100.215570923679062 273.0146446361 0.0666539659 0.003999237958537 45866.4602988782 6.2654728017 0.587221440245299 19059965952
16 2 634938934423039153 99.039668264308614 0.0000000000 0.0000000000 0.000000000000000 47510.8823775280 7.0662624764 1.757232819616051 19059965952
17 2 634938934573387752 99.369066950866632 0.0000000000 0.0000000000 0.000000000000000 50672.6371291294 6.7842334866 2.115084557588727 19059965952
18 2 634938934723406333 99.850964461528935 273.0328451780 0.0666584094 0.004666088662710 64981.8171523699 7.7990339076 1.969089415663783 19059965952
19 2 634938934873314907 99.374569462584575 7923.7629196579 0.0667073252 0.022680490576876 97680.8704750937 10.8065866866 2.486849084429286 19059965952
20 2 634938935023173478 91.784539971357394 724924.5690458372 20.2190637464 78.796293873641701 199834.2824181874 17.8167987468 6.338643119718524 19059965952
21 2 634938935173262063 99.613171781185091 0.0000000000 0.0000000000 0.000000000000000 97324.9231445549 13.6586003525 2.028135584061905 19059965952
22 2 634938935324140693 97.350433258838577 734343.8895223266 26.8427675940 2.344931154266180 70108.8020218635 6.9592360429 1.525066869973567 19059965952
23 3 634938933823024834 100.514250584866545 42869.0145590338 0.2666514142 0.036664569453293 0.0000000000 0.0000000000 0.000000000000000 169272672256
24 3 634938933973243426 100.540817211227755 28357.6083578256 0.1331393120 0.022633683053027 0.0000000000 0.0000000000 0.000000000000000 169272672256
25 3 634938934123001991 100.529809430265307 28444.7170016619 0.1335482882 0.023370950436123 0.0000000000 0.0000000000 0.000000000000000 169272672256
26 3 634938934273030573 89.724903218774673 42590.2845632440 0.1999618979 10.831269471039858 0.0000000000 0.0000000000 0.000000000000000 169272672256
27 3 634938934423039153 100.518916984615146 42869.0145590338 0.2666514142 0.045330740414981 0.0000000000 0.0000000000 0.000000000000000 169272672256
28 3 634938934573387752 100.245696336684853 42499.6311405602 0.1995362790 0.308616111547537 0.0000000000 0.0000000000 0.000000000000000 169272672256
29 3 634938934723406333 100.520214892580539 42593.1238477718 0.1999752284 0.034662372922991 0.0000000000 0.0000000000 0.000000000000000 169272672256
30 3 634938934873314907 100.529940335500756 28416.2532291181 0.1334146504 0.024681710333659 0.0000000000 0.0000000000 0.000000000000000 169272672256
31 3 634938935023173478 99.148816786728868 28699.0591949525 0.2001887499 0.025357241662206 0.0000000000 0.0000000000 0.000000000000000 169272672256
32 3 634938935173262063 90.942292513451306 42573.2576531386 0.1998819563 10.985512322606012 0.0000000000 0.0000000000 0.000000000000000 169272672256
33 3 634938935324140693 100.521193756862718 42350.3315214354 0.1988353155 0.034464788022001 0.0000000000 0.0000000000 0.000000000000000 169272672256

Notice that the calculation for most of the properties involves a multiplication factor, either to convert from “per 100 nanoseconds” to “per seconds”, or to convert a bare ratio to a percentage. It’s also important that the multiplication is expressed as a float value rather than a bigint. This avoids rounding errors that otherwise would have occurred with bigint division.

There’s another calculation that’s also very useful, which is where we need to divide two cumulative changes. One classic example of this is for the Disk avg. read time and Disk avg. write time properties. For any given time interval, Disk avg. read time is calculated by dividing the total time spent reading by the number of read operations carried out. Likewise, Disk avg. write time is calculated by dividing the total time spent writing by the number of write operations carried out. These two properties can be obtained by modifying the SELECT query against the temporary table used above:

Unfortunately this query fails with a “Divide by zero error encountered” error message whenever any interval is encountered in which no read or write operations occur. This can be resolved by adjusting the calculations to cope with a possible divide by zero scenario.

  Id CollectionDate AverageReadTimeInMilliseconds AverageWriteTimeInMilliseconds
1 1 634939030758099203 0.000000000000000000 0.000000000000000000
2 1 634939030908247791 0.000000000000000000 0.000000000000000000
3 1 634939031058316374 0.000000000000000000 0.000000000000000000
4 1 634939031208414959 0.000000000000000000 0.000000000000000000
5 1 634939031358503544 0.000000000000000000 0.000000000000000000
6 1 634939031508582128 0.000000000000000000 0.000000000000000000
7 1 634939031658620710 0.000000000000000000 0.000000000000000000
8 1 634939031808699294 0.000000000000000000 0.000000000000000000
9 1 634939031958737875 0.000000000000000000 0.000000000000000000
10 1 634939032108856462 0.000000000000000000 0.000000000000000000
11 1 634939032258845041 0.000000000000000000 0.000000000000000000
12 2 634939030758099203 0.000000000000000000 1.305952380952380952
13 2 634939030908247791 0.000000000000000000 2.992156862745098039
14 2 634939031058316374 0.000000000000000000 1.428846153846153846
15 2 634939031208414959 0.000000000000000000 2.668333333333333333
16 2 634939031358503544 0.000000000000000000 2.262376237623762376
17 2 634939031508582128 0.000000000000000000 3.438596491228070175
18 2 634939031658620710 0.600000000000000000 1.563440860215053763
19 2 634939031808699294 1.953403141361256544 1.564885496183206106
20 2 634939031958737875 43.095000000000000000 29.070682730923694779
21 2 634939032108856462 82.548837209302325581 138.539416058394160583
22 2 634939032258845041 52.330909090909090909 113.118010752688172043
23 3 634939030758099203 0.000000000000000000 0.000000000000000000
24 3 634939030908247791 0.000000000000000000 0.000000000000000000
25 3 634939031058316374 0.000000000000000000 0.000000000000000000
26 3 634939031208414959 0.000000000000000000 0.000000000000000000
27 3 634939031358503544 0.000000000000000000 0.000000000000000000
28 3 634939031508582128 0.000000000000000000 0.000000000000000000
29 3 634939031658620710 0.000000000000000000 0.000000000000000000
30 3 634939031808699294 38.593063583815028901 0.000000000000000000
31 3 634939031958737875 1.112078019504876219 0.000000000000000000
32 3 634939032108856462 0.817503010839020473 0.000000000000000000
33 3 634939032258845041 0.784519215928384009 0.000000000000000000

Is there an easier way to tie the raw data back to the cluster, machine and logical disk?

In the queries above, we could further join from the _UnstableSamples tables back to the different _Keys tables and also convert the bigint timestamps to datetimes, to provide more human readable output like this:

  ClusterName MachineName LogicalDiskName CollectionDateTime _CumulativeIdleTime _CumulativeReadBytes _CumulativeReads _CumulativeReadTime _CumulativeWriteBytes _CumulativeWrites _CumulativeWriteTime _FreeBytes
1 granger C: 2013-01-16 00:58:22.567 1049313057000 36469454848 597259 9361317000 26231646208 786208 1930692346000 19057868800
2 granger C: 2013-01-16 00:58:37.570 1049462333000 36469458944 597260 9361777000 26232695808 786341 1930694826000 19057868800
3 granger C: 2013-01-16 00:58:52.573 1049608951000 36469463040 597261 9361783000 26233865728 786497 1930702003000 19057868800
4 granger C: 2013-01-16 00:59:07.573 1049758677000 36469475328 597262 9361798000 26235663872 786596 1930703647000 19057868800
5 granger C: 2013-01-16 00:59:22.577 1049907173000 36469475328 597262 9361798000 26236491776 786711 1930707538000 19057868800
6 granger C: 2013-01-16 00:59:37.580 1050055364000 36469475328 597262 9361798000 26245685248 786809 1930710377000 19057868800
7 granger C: 2013-01-16 00:59:52.613 1050205726000 36469475328 597262 9361798000 26248634880 786957 1930711628000 19057868800
8 granger C: 2013-01-16 01:00:07.593 1050348049000 36469692928 597285 9365875000 26253995520 787173 1930722313000 19057868800
9 granger C: 2013-01-16 01:00:22.593 1050497774000 36469692928 597285 9365875000 26254995456 787319 1930724468000 19057868800
10 granger C: 2013-01-16 01:00:37.593 1050647344000 36469791232 597287 9366365000 26255652864 787396 1930725471000 19057868800
11 granger C: 2013-01-16 01:00:52.597 1050797180000 36469791232 597287 9366365000 26256245248 787473 1930727599000 19057868800
12 granger C: 2013-01-16 01:01:07.597 1050942870000 36472880640 597370 9371308000 26257645568 787590 1930731810000 19057868800
13 granger D: 2013-01-16 00:58:22.567 1391205223000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
14 granger D: 2013-01-16 00:58:37.570 1391356102000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
15 granger D: 2013-01-16 00:58:52.573 1391506965000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
16 granger D: 2013-01-16 00:59:07.573 1391657802000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
17 granger D: 2013-01-16 00:59:22.577 1391808658000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
18 granger D: 2013-01-16 00:59:37.580 1391959507000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
19 granger D: 2013-01-16 00:59:52.613 1392110681000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
20 granger D: 2013-01-16 01:00:07.593 1392261300000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
21 granger D: 2013-01-16 01:00:22.593 1392412146000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
22 granger D: 2013-01-16 01:00:37.593 1392562989000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
23 granger D: 2013-01-16 01:00:52.597 1392713828000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
24 granger D: 2013-01-16 01:01:07.597 1392864682000 642137088 122300 3389923000 47351808 2229 4116500000 169272672256
25 granger HarddiskVolume1 2013-01-16 00:58:22.567 1396186769000 254464 58 1925000 294912 71 787000 63963136
26 granger HarddiskVolume1 2013-01-16 00:58:37.570 1396337648000 254464 58 1925000 294912 71 787000 63963136
27 granger HarddiskVolume1 2013-01-16 00:58:52.573 1396488511000 254464 58 1925000 294912 71 787000 63963136
28 granger HarddiskVolume1 2013-01-16 00:59:07.573 1396639348000 254464 58 1925000 294912 71 787000 63963136
29 granger HarddiskVolume1 2013-01-16 00:59:22.577 1396790204000 254464 58 1925000 294912 71 787000 63963136
30 granger HarddiskVolume1 2013-01-16 00:59:37.580 1396941053000 254464 58 1925000 294912 71 787000 63963136
31 granger HarddiskVolume1 2013-01-16 00:59:52.613 1397092227000 254464 58 1925000 294912 71 787000 63963136
32 granger HarddiskVolume1 2013-01-16 01:00:07.593 1397242846000 254464 58 1925000 294912 71 787000 63963136
33 granger HarddiskVolume1 2013-01-16 01:00:22.593 1397393693000 254464 58 1925000 294912 71 787000 63963136
34 granger HarddiskVolume1 2013-01-16 01:00:37.593 1397544535000 254464 58 1925000 294912 71 787000 63963136
35 granger HarddiskVolume1 2013-01-16 01:00:52.597 1397695374000 254464 58 1925000 294912 71 787000 63963136
36 granger HarddiskVolume1 2013-01-16 01:01:07.597 1397846228000 254464 58 1925000 294912 71 787000 63963136

Well, for each _UnstableSamples table, there is a corresponding view suffixed with _UnstableSamples_View that already performs these joins and date conversions for you. The above query can be rewritten as follows, whilst producing exactly the same output albeit with slightly different column names.

Some more examples

Okay, I’ve covered all I wanted to for this post. I haven’t yet explored the _StableSamples or _Instances tables, but most of the interesting stuff is in the _UnstableSamples tables I’ve already described. Now it’s time for a few more examples.

CPU performance monitoring data

The raw data from which the CPU performance metrics in SQL monitor are all derived can be found in the monitoringCluster_Machine_Processors_UnstableSamples table, but it’s easier to examine the corresponding view, in this case to see what’s been going on in the most recent 3 minutes.

  Cluster_Name Cluster_Machine_Name CollectionDate_DateTime Cluster_Machine_Processors_AverageQueueLength Cluster_Machine_Processors_CumulativeAverageContextSwitches Cluster_Machine_Processors_CumulativeDpcTime Cluster_Machine_Processors_CumulativeIdleTime Cluster_Machine_Processors_CumulativeInterruptTime Cluster_Machine_Processors_CumulativePrivilegedTime Cluster_Machine_Processors_CumulativeUserTime
1 granger 2013-01-16 10:17:24.707 0 123225982 444368849 1304351817175 2395024850 23053281775 39212684362
2 granger 2013-01-16 10:17:39.720 0 123236935 444466349 1304498711617 2395297851 23055192787 39214107871
3 granger 2013-01-16 10:17:54.720 0 123247119 444505350 1304646503064 2395434352 23056421295 39215004877
4 granger 2013-01-16 10:18:09.720 0 123257612 444544350 1304791778995 2395531853 23058800310 39217422893
5 granger 2013-01-16 10:18:24.723 0 123269326 444583350 1304937893432 2395746354 23060925824 39219099903
6 granger 2013-01-16 10:18:39.733 0 123279756 444622350 1305085606879 2396058356 23062466333 39220074910
7 granger 2013-01-16 10:18:54.733 0 123291363 444641850 1305231389813 2396194857 23063694841 39222980428
8 granger 2013-01-16 10:19:09.743 0 123301483 444680851 1305377816252 2396467859 23064650347 39225671445
9 granger 2013-01-16 10:19:24.750 0 123313007 444739351 1305524866695 2396721360 23066112857 39227231455
10 granger 2013-01-16 10:19:39.757 0 123323596 444758851 1305672716642 2396916362 23066970862 39228596464
11 granger 2013-01-16 10:19:54.757 0 123335291 444856352 1305819669584 2397072363 23068355371 39230331975
12 granger 2013-01-16 10:20:09.757 0 123346296 444934352 1305965725521 2397267364 23069798380 39232749991

All of these raw values correspond to the raw Windows Performance Counters provided by Windows. Just like the time-based properties we’ve explored in Cluster_Machine_LogicalDisk_UnstableSamples, most of the properties in Cluster_Machine_Processors_UnstableSamples are cumulative time samples, based on 100 nanosecond units.

SQL Monitor actually only displays two metrics derived from these values:

  1. Avg. CPU queue length – This corresponds directly to the Cluster_Machine_Processors_AverageQueueLength column in the above query.
  2. Machine: processor time – This is trivially derived from the Cluster_Machine_Processors_CumulativeIdleTime column in the above query (i.e. % cpu usage = 100 – % idle time).

SQL Server statistics

The Cluster_SqlServer_SqlStatistics_UnstableSamples table contains the raw monitoring data for the Batch requests/sec, Compilations/sec and Compilations/batch metrics displayed in SQL Monitor’s Analysis tab. Let’s have a look at the corresponding view, to see what’s been going on in the most recent 3 minutes.

1 Cluster_Name Cluster_SqlServer_Name CollectionDate_DateTime Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests Cluster_SqlServer_SqlStatistics_CumulativeCompilations Cluster_SqlServer_SqlStatistics_CumulativeRecompilations
2 dev-chrisl2 2013-01-16 11:33:20.003 183787 279705 23198
3 dev-chrisl2 2013-01-16 11:33:35.023 183818 279752 23199
4 dev-chrisl2 2013-01-16 11:33:50.037 183879 279850 23212
5 dev-chrisl2 2013-01-16 11:34:05.047 183939 279946 23223
6 dev-chrisl2 2013-01-16 11:34:20.043 183955 279960 23223
7 dev-chrisl2 2013-01-16 11:34:35.070 183986 280007 23224
8 dev-chrisl2 2013-01-16 11:34:50.077 184035 280089 23235
9 dev-chrisl2 2013-01-16 11:35:05.080 184053 280105 23235
10 dev-chrisl2 2013-01-16 11:35:20.083 184067 280117 23235
11 dev-chrisl2 2013-01-16 11:35:35.087 184098 280164 23236
12 dev-chrisl2 2013-01-16 11:35:50.090 184151 280249 23247
13 dev-chrisl2 2013-01-16 11:36:05.087 184167 280263 23247
14 dev-chrisl2 sql2005 2013-01-16 11:33:20.003 2331133 362286 25574
15 dev-chrisl2 sql2005 2013-01-16 11:33:35.023 2331672 362346 25575
16 dev-chrisl2 sql2005 2013-01-16 11:33:50.037 2332289 362464 25588
17 dev-chrisl2 sql2005 2013-01-16 11:34:05.047 2332817 362534 25598
18 dev-chrisl2 sql2005 2013-01-16 11:34:20.043 2333240 362561 25598
19 dev-chrisl2 sql2005 2013-01-16 11:34:35.070 2333722 362623 25599
20 dev-chrisl2 sql2005 2013-01-16 11:34:50.077 2334276 362730 25610
21 dev-chrisl2 sql2005 2013-01-16 11:35:05.080 2334771 362765 25610
22 dev-chrisl2 sql2005 2013-01-16 11:35:20.083 2335193 362796 25610
23 dev-chrisl2 sql2005 2013-01-16 11:35:35.087 2335667 362856 25611
24 dev-chrisl2 sql2005 2013-01-16 11:35:50.090 2336249 362959 25622
25 dev-chrisl2 sql2005 2013-01-16 11:36:05.087 2336744 362986 25622
26 dev-chrisl2 sql2008 2013-01-16 11:33:14.557 199300 320438 38234
27 dev-chrisl2 sql2008 2013-01-16 11:33:29.560 199465 320581 38259
28 dev-chrisl2 sql2008 2013-01-16 11:33:44.563 199562 320741 38300
29 dev-chrisl2 sql2008 2013-01-16 11:33:59.570 199605 320796 38324
30 dev-chrisl2 sql2008 2013-01-16 11:34:14.587 199645 320845 38344
31 dev-chrisl2 sql2008 2013-01-16 11:34:29.593 199700 320934 38369
32 dev-chrisl2 sql2008 2013-01-16 11:34:44.590 199813 321131 38422
33 dev-chrisl2 sql2008 2013-01-16 11:34:59.617 199902 321209 38446
34 dev-chrisl2 sql2008 2013-01-16 11:35:14.600 199938 321258 38466
35 dev-chrisl2 sql2008 2013-01-16 11:35:29.603 199991 321345 38491
36 dev-chrisl2 sql2008 2013-01-16 11:35:44.613 200074 321487 38530
37 dev-chrisl2 sql2008 2013-01-16 11:35:59.617 200112 321541 38554
38 dev-chrisl2 sql2012 2013-01-16 11:33:14.460 183808 283703 23199
39 dev-chrisl2 sql2012 2013-01-16 11:33:29.460 183837 283751 23200
40 dev-chrisl2 sql2012 2013-01-16 11:33:44.463 183900 283866 23213
41 dev-chrisl2 sql2012 2013-01-16 11:33:59.450 183917 283880 23213
42 dev-chrisl2 sql2012 2013-01-16 11:34:14.470 183935 283896 23213
43 dev-chrisl2 sql2012 2013-01-16 11:34:29.500 183964 283942 23214
44 dev-chrisl2 sql2012 2013-01-16 11:34:44.483 184045 284088 23235
45 dev-chrisl2 sql2012 2013-01-16 11:34:59.483 184061 284103 23235
46 dev-chrisl2 sql2012 2013-01-16 11:35:14.487 184077 284117 23235
47 dev-chrisl2 sql2012 2013-01-16 11:35:29.517 184106 284163 23236
48 dev-chrisl2 sql2012 2013-01-16 11:35:44.513 184161 284266 23247
49 dev-chrisl2 sql2012 2013-01-16 11:35:59.500 184175 284279 23247

The SQL Monitor metrics are derived from this table as follows:

  • Batch requests/sec is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests between successive samples.
  • Compilations/sec is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeCompilations between successive samples.
  • Compilations/batch is based on the increase in Cluster_SqlServer_SqlStatistics_CumulativeCompilations divided by the increase in Cluster_SqlServer_SqlStatistics_CumulativeBatchRequests between successive samples.

If you want to calculate Compilations/batch in your own queries, don’t forget to handle the possible “divide by zero” error in the case where there are no cumulative batch requests within any time interval your query is concerned with.

Custom metrics

Retrieving the raw monitoring data for custom metrics is a little different to all of the other metrics, mainly because Cluster_SqlServer_Database_CustomMetric_UnstableSamples_View only contains a Cluster_SqlServer_Database_CustomMetric_MetricId property, not the actual custom metric name. You can still make use of this view, but a single join to another table is required in order to include the name of the custom metric in the results. Here are all of the custom metrics that have been running on my laptop for the past 3 minutes.

  Cluster SqlServer Database CustomMetric CollectionDate Value
1 granger Debacle Percentage of database free space 2013-01-16 13:43:11.243 61.71875
2 granger Debacle Percentage of database free space 2013-01-16 13:44:11.320 61.71875
3 granger Debacle Percentage of database free space 2013-01-16 13:45:11.263 61.71875
4 granger SqlMonitorData Percentage of database free space 2013-01-16 13:43:11.237 2.794894
5 granger SqlMonitorData Percentage of database free space 2013-01-16 13:44:11.313 2.75088
6 granger SqlMonitorData Percentage of database free space 2013-01-16 13:45:11.257 2.728873
7 granger master Percentage of database free space 2013-01-16 13:43:11.217 48.4375
8 granger master Percentage of database free space 2013-01-16 13:44:11.267 48.4375
9 granger master Percentage of database free space 2013-01-16 13:45:11.237 48.4375
10 granger model Percentage of database free space 2013-01-16 13:43:11.227 53.819444
11 granger model Percentage of database free space 2013-01-16 13:44:11.297 53.819444
12 granger model Percentage of database free space 2013-01-16 13:45:11.247 53.819444
13 granger msdb Percentage of database free space 2013-01-16 13:43:11.233 13.701923
14 granger msdb Percentage of database free space 2013-01-16 13:44:11.307 13.701923
15 granger msdb Percentage of database free space 2013-01-16 13:45:11.253 13.701923
16 granger tempdb Cached pages in TempDB (MB) 2013-01-16 13:43:11.210 0
17 granger tempdb Cached pages in TempDB (MB) 2013-01-16 13:44:11.260 0
18 granger tempdb Cached pages in TempDB (MB) 2013-01-16 13:45:11.233 0
19 granger tempdb Percentage of database free space 2013-01-16 13:43:11.220 55.190058
20 granger tempdb Percentage of database free space 2013-01-16 13:44:11.273 55.116959
21 granger tempdb Percentage of database free space 2013-01-16 13:45:11.240 55.116959
22 granger tempdb Plan cache hit ratio 2013-01-16 13:43:11.167 79.1
23 granger tempdb Plan cache hit ratio 2013-01-16 13:44:11.193 80.39
24 granger tempdb Plan cache hit ratio 2013-01-16 13:45:11.190 79
25 granger tempdb Plan cache reuse 2013-01-16 13:43:11.203 39.4994927291174
26 granger tempdb Plan cache reuse 2013-01-16 13:44:11.253 39.4994927291174
27 granger tempdb Plan cache reuse 2013-01-16 13:45:11.227 39.4994927291174

The interpretation of the raw custom metric values obviously vary from metric to metric. For any metrics imported from Red Gate’s own SQL Monitor Metrics web-site, you should consult the individual metric descriptions. For your own metrics, I hope you already know how they work!

Right, that’s it for this post. I hope you’re inspired to investigate some of the other _UnstableSamples tables to see what other raw data is available. If you have any questions, please either leave a comment or email me directly (chris.lambrou@red-gate.com) and I’ll see what I can do to help.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 5279 times – thanks for reading.

  • Rate
    [Total: 0    Average: 0/5]
  • Share

Chris Lambrou

View all articles by Chris Lambrou