Thomas LaRock

Using Operations Manager Reports to Validate Your Uptime

09 March 2009

Operations Manager has a number of reports to help you monitor the uptime of your applications, but reporting can be difficult to learn until you understand all the different options, the different parameters possible, and the way the Operations Manager health model is structured. First, you need a clear idea of the way that your organisation defines 'uptime'. Then you can start your reports from any of the views in the Monitoring tab, and add or remove objects to get the report you need. Thomas LaRock explains...

Introduction

I often see statements written in requirements documents, such as “we require the servers to have 99.999% uptime”. Then I see people fall over themselves setting up expensive technology to meet this criterion. The truth of the matter is that the percentage is meaningless unless you know precisely how ‘Uptime’ is defined.

Let’s examine the opposite of Uptime: its evil twin, Downtime. Downtime is simply the amount of time that something is not available; the user cannot access the resource they want. So, if the resource is a file on a network share and the user cannot access that share, then the system is “down”. Of course, the file server itself could be up and running, but that is little consolation to the end user trying to access the file.

What is this 99.999% number that gets bandied about? Here are some numbers that give the reality behind the high percentage commonly referred to as “five-nines”.

Figure 1 shows a table of the amount of Downtime represented by various percentages of Uptime, assuming a 365-day year (with a month taken as one twelfth of that year).

 

Downtime (as HH:MM:SS)

| Uptime | Per day | Per week | Per month | Per year |
|---|---|---|---|---|
| 99.999% | 00:00:00.86 | 00:00:06 | 00:00:26 | 00:05:15 |
| 99.99% | 00:00:09 | 00:01:00 | 00:04:23 | 00:52:33 |
| 99.9% | 00:01:26 | 00:10:05 | 00:43:48 | 08:45:36 |
| 99% | 00:14:24 | 01:40:48 | 07:18:00 | 87:36:00 |
| 98% | 00:28:48 | 03:21:36 | 14:36:00 | 175:12:00 |

Figure 1

So, given sixty seconds in a minute, sixty minutes in an hour, twenty-four hours in a day, and 365 days in a year (let’s keep this simple, shall we?), we arrive at a total of 31,536,000 seconds in a year. That means “five-nines” of Uptime is the same as 0.001% Downtime, which is 315.36 seconds, or approximately five minutes fifteen seconds (plus a fraction). The table in Figure 1 simply lists this in HH:MM:SS format, so you see 00:05:15. In other words, to meet 99.999% uptime your system can be unavailable for no more than about five minutes and fifteen seconds in the entire year.
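The arithmetic behind Figure 1 is easy to reproduce. Here is a minimal Python sketch under the same assumptions (a 365-day year, with a month taken as one twelfth of it); the function names are purely illustrative. It rounds to the nearest whole second, so a few entries print slightly differently from Figure 1: the 99.99% per-year value (3,153.6 seconds) prints as 00:52:34 rather than 00:52:33, and the sub-second 99.999% per-day value prints as 00:00:01.

```python
# Downtime allowed by a given uptime percentage, using the same assumptions
# as Figure 1: a 365-day year, with a month taken as one twelfth of that year.

SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY     # 604,800
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # 31,536,000
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # 2,628,000


def allowed_downtime_seconds(uptime_percent: float, period_seconds: float) -> float:
    """Seconds of downtime permitted in a period for a given uptime percentage."""
    return period_seconds * (100.0 - uptime_percent) / 100.0


def as_hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, rounded to the nearest whole second."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    periods = [SECONDS_PER_DAY, SECONDS_PER_WEEK, SECONDS_PER_MONTH, SECONDS_PER_YEAR]
    for pct in (99.999, 99.99, 99.9, 99.0, 98.0):
        cells = [as_hms(allowed_downtime_seconds(pct, p)) for p in periods]
        print(f"{pct}%: " + "  ".join(cells))
```

For five-nines, `allowed_downtime_seconds(99.999, SECONDS_PER_YEAR)` gives the 315.36 seconds worked out above.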

Impossible? Probably not. But is it practical to think that your system is truly available for all of that time? For example, when it comes to database servers, I could leave the server up and running for a year, never reboot the thing, but how do I know that the instance has been accessible throughout? How exactly do you measure periods of uptime and downtime in your shop?

Operations Manager Availability Reporting

Assuming you have been able to define what it means for your servers to be “up” or “down”, you must be able to report those metrics periodically. When I give presentations on Operations Manager I am often asked about reporting. I am usually embarrassed to tell people that I do not use the reporting functionality in Operations Manager. In my present position I have never been asked to provide regular reports. This is quite a relief as I have always found the reports in Operations Manager to be difficult to work with.

I decided to use Operations Manager to report Uptime so as to become familiar with the reporting function and to get an idea of how well the systems are performing.

To access the Operations Manager reports, you need to open up the Operations Manager console and navigate to the Reporting tab. Select the ‘Microsoft Generic Reporting Library’ on the left, then on the right double-click to open the ‘Availability’ report (Figure 2).

Figure 2

 

You will then be presented with the report parameters screen (Figure 3). Here is the first problem I always encounter when running a report in Operations Manager: what object is my target? If you click the ‘Add Object’ button, you will be presented with a myriad of objects. In this particular example we want to focus on the database engine, but you might want both the database engine and the SQL Server Agent to be represented in your availability report. You are given a lot of options to choose from, and there are advantages and drawbacks to this. Once you become familiar with the reports they become easier to understand, but until then they can be daunting.

I found that, when running reports from this library, I need either to select the database engine target(s) manually or to select a group of SQL Servers that I want to run the report against. Note that I am focusing only on the database engine and nothing else, because I am only concerned with issues affecting the engine. Against a single engine, this report generally takes a few seconds to run; if you run it against several servers, expect it to take longer.

Figure 3

Figure 4

If you do not want to select objects or groups manually for this type of report, go back and start the process from the Database State view on the Monitoring tab. Select the server name you want and, on the right-hand side, you should see a list of available reports that can be run against the database engine (Figure 4). When you run the ‘Availability’ report, you will notice that the object target is selected for you. At this point you can add or remove objects or groups if you need to. I prefer this method for running reports because it takes fewer clicks to get up and running against an individual server.

Figure 5

The time parameter selections shown in Figure 5 are sufficient, as many quick selections are available. But the results can be deceiving: the total number of minutes shown as ‘Uptime’ might be the amount of time possible rather than the actual amount of uptime. For example, if your server was built one month ago but you run a default report covering the past year, you will see the total number of minutes in the year as your possible uptime (Figure 7).

The options listed on the far right in Figure 3 are interesting. They are:

  • Warning
  • Monitoring unavailable
  • Planned maintenance
  • Unplanned maintenance
  • Monitor disabled
  • Unmonitored

Be careful here, because you are specifying your criteria for Uptime or Downtime. The default is to enable only the ‘Unplanned Maintenance’ option, which may not be your only criterion for Downtime. My own choice would be to select every option except ‘Warning’ as my set of criteria for Downtime.
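To make the effect of these checkboxes concrete, here is a minimal Python sketch that treats availability as the share of the report period not spent in any state you have chosen to count as Downtime. The state names mirror the options listed above; the `HEALTHY` catch-all and the minute counts are hypothetical, and this is a simplified model rather than how Operations Manager itself computes the report.

```python
from enum import Enum, auto


class State(Enum):
    """Time-in-state categories for one monitored object (simplified model)."""
    # The six options offered by the Availability report.
    WARNING = auto()
    MONITORING_UNAVAILABLE = auto()
    PLANNED_MAINTENANCE = auto()
    UNPLANNED_MAINTENANCE = auto()
    MONITOR_DISABLED = auto()
    UNMONITORED = auto()
    # Hypothetical catch-all for healthy time; never counted as downtime here.
    HEALTHY = auto()


# Report default: only unplanned maintenance counts as downtime.
DEFAULT_DOWNTIME = frozenset({State.UNPLANNED_MAINTENANCE})

# Stricter criteria: every report option except Warning.
STRICT_DOWNTIME = frozenset(
    s for s in State if s not in (State.WARNING, State.HEALTHY)
)


def availability_percent(minutes_in_state, downtime_states):
    """Percentage of the period not spent in any of `downtime_states`."""
    total = sum(minutes_in_state.values())
    down = sum(m for s, m in minutes_in_state.items() if s in downtime_states)
    return 100.0 * (total - down) / total


if __name__ == "__main__":
    # Hypothetical 30-day period (43,200 minutes) for one database engine.
    month = {
        State.HEALTHY: 42_000,
        State.WARNING: 600,
        State.PLANNED_MAINTENANCE: 360,
        State.UNPLANNED_MAINTENANCE: 60,
        State.UNMONITORED: 180,
    }
    print(f"default: {availability_percent(month, DEFAULT_DOWNTIME):.3f}%")  # 99.861%
    print(f"strict:  {availability_percent(month, STRICT_DOWNTIME):.3f}%")   # 98.611%
```

The same hypothetical month drops from roughly three nines to under 99% simply by changing which states count as Downtime.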

And how does Operations Manager determine whether maintenance is ‘Planned’ or ‘Unplanned’? Well, you specify that whenever you put a server into ‘Maintenance Mode’. There is a checkbox for you to enable (Figure 6), and this is how Operations Manager later reports back to you whether the maintenance was planned or unplanned.

Figure 6

The way I define Downtime may be stricter than most. I understand that the Operations Manager agent may have been unavailable while the database instance was still available to handle requests. Nonetheless, I expect more than just the instance to be available; I expect the Operations Manager agent to be up and running as well. Some people tell me that planned maintenance should not count as downtime, but I disagree.

Figure 7

The report above was run against one database engine, for a period that included the entire previous year, with the default ‘Unplanned Maintenance’ checkbox enabled. These settings could lead to the conclusion that the instance was up 100% of the time, a span of 9,528 minutes. In reality, the server was running for only a fraction of those 9,528 minutes, as the small green bar at the far right of the section labeled ‘Availability Tracker’ shows.

You need to click on the Availability Tracker and drill into the report to fully understand what it is trying to tell you. In this case, it tells me that the server has only been running since last month. Now look at the difference if I select all options other than ‘Warning’ (Figure 8).

Figure 8

That is a big difference in uptime, isn’t it? The Availability Tracker tells me that the server was not monitored for most of 2008, so if I decide to include that as downtime, my report output is very different. Essentially, Operations Manager thinks the server was up throughout 2008 but that the agent was disabled. It would be nice if Operations Manager knew when the agent was installed, in order to avoid this reporting issue.
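To see why the agent install date matters so much, here is a minimal Python sketch of the arithmetic. It treats all time before the agent install as unmonitored and compares three ways of handling it: counting it as up, counting it as down, or clipping the report period so that it starts at the install date. The dates and function names are hypothetical, and this is a simplified model, not the report's actual calculation.

```python
from datetime import datetime


def minutes_between(start, end):
    """Length of an interval in minutes (negative if end is before start)."""
    return (end - start).total_seconds() / 60


def availability_over_period(report_start, report_end, agent_installed,
                             downtime_minutes=0.0, unmonitored_is_down=False,
                             clip_to_install=False):
    """Availability % for one object over a report period (simplified model).

    Time between the period start and `agent_installed` is treated as
    unmonitored. With `clip_to_install`, the period starts at the install
    date instead, so that time drops out of the calculation entirely.
    """
    start = max(report_start, agent_installed) if clip_to_install else report_start
    total = minutes_between(start, report_end)
    unmonitored = max(0.0, minutes_between(start, agent_installed))
    down = downtime_minutes + (unmonitored if unmonitored_is_down else 0.0)
    return 100.0 * (total - down) / total


if __name__ == "__main__":
    # Hypothetical dates: a report covering all of 2008 plus January 2009,
    # for a server whose agent was installed on 5 January 2009.
    start, end = datetime(2008, 1, 1), datetime(2009, 2, 1)
    installed = datetime(2009, 1, 5)
    print(availability_over_period(start, end, installed))  # 100.0
    print(round(availability_over_period(start, end, installed,
                                         unmonitored_is_down=True), 2))  # 6.8
    print(availability_over_period(start, end, installed,
                                   clip_to_install=True))  # 100.0
```

Counting the unmonitored months as downtime drops the figure below 7%, while clipping the period to the install date gives 100%. Which one is "right" depends, again, on how you have defined Downtime.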

Figure 4 also shows a few of the other reports available and there are many more to choose from. I would encourage you to experiment with the reports and play with a lot of the options. One report that I want to point out is the Health report, as it ties into the Availability report.

Figure 9

The Health report is intended to display the ‘Entity Health’. What’s that, you say? Isn’t ‘Availability’ for your instance part of that? Yes, it is. In fact, ‘Availability’ rolls up into ‘Entity Health’, so be mindful of this (Figure 9). When you run the ‘Availability’ report, you are running a report that checks only the entities that are currently part of the Availability aggregate monitor for the database engine. ‘Entity Health’ encompasses a handful of aggregate monitors, which means the two reports could give very different results.

So, how do you decide which one to use? Well, it all depends on how you have defined Uptime (and, conversely, Downtime). If you want to account for everything that could possibly affect your instance, then the Health report may be the better choice. You might, for example, even consider your server to be ‘down’ if it was not compliant with the service pack compliance monitor.
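The difference between the two reports comes down to how health rolls up. Here is a minimal Python sketch, assuming that Entity Health takes the worst state of four aggregate monitors (Availability, Configuration, Performance and Security) and that a failed service pack compliance check would surface under Configuration. The states are hypothetical and describe a single moment, whereas the reports summarize states over time, so treat this as a simplified model rather than the product's implementation.

```python
from enum import IntEnum


class Health(IntEnum):
    """Simplified health states, ordered so that a larger value is worse."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def worst_of(states):
    """Roll child states up with a 'worst state of any member' policy."""
    return max(states, default=Health.HEALTHY)


# Hypothetical aggregate monitor states for one database engine at one instant.
aggregates = {
    "Availability": Health.HEALTHY,
    # Assumed placement: a failed service pack compliance check under Configuration.
    "Configuration": Health.CRITICAL,
    "Performance": Health.HEALTHY,
    "Security": Health.HEALTHY,
}

entity_health = worst_of(aggregates.values())

if __name__ == "__main__":
    print("Availability:", aggregates["Availability"].name)  # HEALTHY
    print("Entity Health:", entity_health.name)              # CRITICAL
```

In this example the engine would look fine in an availability view but not in a health view, which is exactly the gap between the two reports described above.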

Summary

If you go looking for your true Uptime, you should:

  1. Ensure that you have properly defined what Uptime (or Downtime) means in your enterprise.
  2. Examine the options that are available in the Operations Manager reports and decide what works best for you.
  3. Let the reports run and see how your shop fares against the mythical “five-nines”.

I don’t think that you should be discouraged if you come up short, as most servers need more than five minutes of love per year.

Operations Manager offers a myriad of reports. The information is there, but reporting can be difficult until you understand all the different options, the different parameters possible, and the way the Operations Manager health model is structured. It takes time to run reports with various options in order to understand exactly what each report is telling you. To make things easier on yourself, start your reports from the Database State view, or any other view in the Monitoring tab. This way you see only the reports that are relevant to the objects you are viewing, and the object selection in the report parameters is even pre-filled. From there, it is easier to add or remove objects because you have an example to work with.

Thomas LaRock

Author profile:

Thomas LaRock is a seasoned IT professional with over a decade of technical and management experience. Currently serving as a Senior Database Administrator manager for Confio Software, Thomas has progressed through several roles including programmer, analyst, and DBA. Prior to that, he worked at several software and consulting companies, working at customer sites in the United States and abroad. Thomas holds an MS degree in Mathematics from Washington State University and is a member of the Usability Professionals’ Association. Thomas is also a member of Quest Software’s Association of SQL Server Experts, currently serves on the Board of Directors for the Professional Association for SQL Server (PASS), and is a SQL Server MVP. Thomas can also be found blogging at http://thomaslarock.com and is the author of DBA Survivor: Become a Rock Star DBA (http://dbasurvivor.com).

Have Your Say


Subject: Excellent
Posted by: Granted
Posted on: Monday, March 16, 2009 at 8:47 AM
Message: Focused and well written. I wrestle so much with the reports in Operations Manager. It's good to see some basics laid out and clarified. This is a very useful article.

Subject: Down time
Posted by: Anonymous (not signed in)
Posted on: Wednesday, March 18, 2009 at 7:05 AM
Message: Excellent post, could you please go into more detail on examples of the down times below... this would really help my understanding further

Warning
Monitoring unavailable
Planned maintenance
Unplanned maintenance
Monitor disabled
Unmonitored

Subject: Another great
Posted by: monsterjta
Posted on: Sunday, March 22, 2009 at 12:47 PM
Message: Another great article, Thomas. Thanks for all the useful information you've been sharing with the community.

-Jonathan

 
