Use Operations Manager to Create a Custom Monitor for SQL Agent Jobs
The more I play with Systems Center Operations Manager 2007, the more I become comfortable tinkering around and configuring it to monitor my SQL Servers. Out of the box, Operations Manager gives a decent amount of functionality, but if you are willing to dig a little deeper you will find ways to make your job as a DBA a little bit easier.
Operations Manager is an enterprise level monitoring, reporting, and alerting system. It is part of the Microsoft Operations Framework, an implementation of the IT Infrastructure Library (ITIL). Operations Manager 2007 relies on the use of Management Packs for the actual work of monitoring systems. There are two SQL management packs, one for SQL 2000 and another separate pack for SQL 2005. Out of the box, Operations Manager gives enough basic information to help any DBA effectively monitor their server environment.
The one common item that all DBAs share is the diversity of the shops they administer. In some shops the DBA is more of a developer, in others it is more of a server engineer role, in others it is BI, or architecture. As such, it is difficult for anyone to build a tool that can satisfy the needs of everyone. But with Operations Manager, each shop is given the ability to take the tool and shape it to work specifically for their needs. Think of it as an open source monitoring tool, if you will, in the sense that you can make changes to how the system will collect, monitor, report, and alert you to potential issues.
One item that tends to pop up in our shop is the sudden appearance of jobs within SQL Agent. When these mysterious jobs appear, and subsequently fail, it forces us to spend time investigating what has happened. Of course the job fails at a time that is most inconvenient, which only adds to our frustration. In order to avoid this altogether, I started to think of how I could use Operations Manager to quickly alert our team to any jobs that had been recently created or modified.
In this article, I will describe how to create a customized monitor within Operations Manager 2007. The purpose of this custom monitor is to look for jobs within SQL Agent that have either been created or modified within the past day. Before you create this customized monitor, you must enable the SQL Agent Job discovery in Operations Manager. You can find out how to enable the discovery by going here, then downloading the SQL 2000/2005 Management Pack, and running the .msi package. The package unzips a handful of files to your local C: drive (C:\Program Files\System Center Management Packs\SQL Server Management Pack ), and in there you should be able to find the OM2007_MP_SQLSrvr.doc document. Once you have enabled the discovery, the rest of this article will function as expected.
Choosing the Right Monitor
It is easy to initiate a monitor in Operations Manager, but it can be harder to implement it. The basic idea is simple; just do a right-click and select ‘Create Unit Monitor’, but there is more work to be done than just a few clicks. To make things more frustrating, the help files are not very helpful, and I found myself scouring the internet looking for pieces of information that would help me put everything together.
So, I want to create a custom monitor that will raise an alert should a job be created (or modified) inside of SQL Agent within the past day. I am aware that there are two columns in the msdb..sysjobs table, named date_created and date_modified, so I plan on using these columns in my monitor. The next question is: how do I actually go about using those columns in the monitor?
I decide to open up and examine the default SQL 2005 discovery Management Pack to see how Operations Manager is actually discovering the jobs. What I find is that Operations manager executes a stored procedure (sp_help_job) to return a result set that is then used to populate a custom class. Inspired by this, I decide that what I need to do is create my own VB script that runs a SELECT statement against the sysjobs table, places that information into a property bag, and then use it in order to view the state of the SQL Agent job inside the Operations Manager console. Sounds simple, right?
Well, what monitor do I create? From the Authoring tab in Operations Manager, I can right-click on ‘Monitors’, and select ‘Unit Monitor’ (see Figure 1):
This results in the display shown in Figure 2:
So, we have eight choices for types of monitor, each with sub-choices. How are we to know which one to select? If you were to navigate through all the possibilities (as I did), it should become clear that what you really want to use is a Scripting type of monitor. The reason for this is that we are going to be querying a table in the msdb database, and the other monitor types are monitoring specific objects such as log files, services, performance counters, etc. With a scripting monitor you have more flexibility, which is just what we need for this situation.
Figure 3 shows the exact type of monitor we will be creating:
This is exactly what we want, a monitor that executes a script on a fixed schedule. This particular monitor allows for us to define a healthy, degraded, and unhealthy state, thus a “Three State” monitor. Later on you will see how we will define each of these states. Of course, we do need to put the script together, so perhaps we should look at that first before going any further with the creation of the monitor.
Creating the Monitoring Script
The script itself is very basic, with the idea being that we want to collect a few pieces of information and store them in a property bag. I have included the script with this article, should you want to get this monitor working in your shop as well. The section of code that is doing the actual work is found inside the GetJobCrMoDt function, and the actual SELECT statement being used in the script as follows:
SCRIPT_SQL = "SET NOCOUNT ON" & VbCrLf &_
"SELECT" & VbCrLf &_
"[name]" & VbCrLf &_
", [days_since_creation] = datediff(dd, date_created, getdate()) " & VbCrLf &_
", [days_since_modification] = datediff(dd, date_modified, getdate())" & VbCrLf &_
" " & VbCrLf &_
That is it, nothing more: just tell me the job name, and the date difference from today with regards to when it was created as well as modified. We are interested in a datediff result of one (1) or zero (0), which would indicate that the job has been created, or modified, within the past day. The function then populates and returns the property bag. If you prefer to be alerted if the job has been created or modified in the past hour, or the past week then simply modify the logic appropriately.
Creating the Job Monitor
Once you have your script working, go back to the creation of the Unit Monitor (Figure 3). If you have not yet customized your Operations Management environment to have a custom Management Pack, then now would be a good time to do so. For simplicity, I will use the Default Management Pack, but this is not a recommended best practice. If you rely on the default management pack for modifications, your modifications could be erased during an upgrade. By creating your own management pack you avoid this possible issue. Once you have decided which Management Pack to use (default or custom), click ‘Next’. You should see the following screen (Figure 4):
Here you will find a few items of interest. I selected a name for the monitor (SqlAgentJobCreated) and a monitor target of SQL 2005 Agent Job. I defined the Parent Monitor to be ‘Configuration’. Lastly, I have disabled the monitor. The last three items are the ones that are of most interest. Let’s examine these choices more closely.
If you are like me, you might want to hunt for the generic SQL Agent Job discovery, only to find that (1) no such discovery exists and (2) you are not able to right-click and create your own custom discoveries. Oh well, maybe in the next release, but it sure would be nice to not have to create a new monitor for every version of MS SQL. Better yet, it would be nice to create discoveries for things such as linked servers, or SSIS packages.
First, we have to select a monitor target. In this case, we select SQL 2005 Agent Jobs. Why? Well, because we know those jobs are being discovered, as we have enabled that discovery already (in our shop’s case, we also have the SQL 2000 Agent Job discovery enabled, and I have created this exact same monitor for SQL 2000 jobs).
The second item of interest is the choice of the Parent Monitor. The Parent Monitor is a reference to the aggregate rollup monitors associated with the chosen target, a SQL 2005 Agent Job. If you notice, there are four Parent Monitors to choose from: Availability, Configuration, Performance, and Security (Figure 5):
What we have done here is essentially placed our custom monitor inside the Aggregate Rollup monitor that is in the SQL 2005 Agent Job target, as shown in Figure 6. ‘Entity Health’ is an aggregate rollup that is used to define the overall health of the target, a SQL 2005 Agent job. The health of the target is defined to be the lowest state of any child monitor. What you should be seeing now is the hierarchy of monitors that are built into the health definitions. You have the overall health, then four categories (availability, configuration, performance, security), and then monitors assigned to those categories. You could define custom monitors at any level; we have simply chosen to place our custom monitor inside the configuration aggregate rollup.
This fact will be important later for two reasons. First, it will be displayed in our Database State view as a configuration item that is being monitored. This will allow for us to have a visual representation of the health of our systems that will include the availability, configuration, and performance of SQL 2005 Agent jobs. Second, it will be important when we create our own rollup monitor. This monitor will be necessary for us to create a similar visual interpretation of the health of our system.
Lastly, we need to decide whether to leave the monitor disabled upon creation, or enable the monitor to run against all SQL 2005 Agent Job targets. I recommend you leave it disabled for now as a precaution. After the monitor is in place, you can go back and enable it, or set an override for a specific server (or groups of servers) as a way of taking a small step just in case there are any errors.
Click ‘Next’ and you will see Figure 7:
Here you can specify the schedule on which to run the monitor. Every fifteen minutes might be too frequent, especially if you are going to be configuring alerts. If you have subscriptions tied to the alerts you could end up with more emails than desired. I would recommend you decide how frequently you want to be alerted, and tie that frequency to this schedule, as well as to the datediff function in the script. One other factor to consider is the frequency at which to run the SQL Agent Job Discovery itself, which is defaulted to 6000 seconds. In our shop, I have set an override to run that discovery every hour, and I also run this custom monitor every hour, checking for jobs that have been created or modified within the past day. Click ‘Next’ and you will see Figure 8, where you enter details of the script you wish to run.
One thing to note here about the script is that it expects an input parameter. In this specific case, it is looking for the SQL connection string. If you click on the ‘Parameters’ button you should see Figure 9:
From here, click on the ‘Target’ button, then select (Host=SQL 2005 Agent), as shown in Figure 10:
Then, select (Host=SQL 2005 DB Engine), as seen in Figure 11:
And finally, select Connection String (SQL DB Engine).
Your Parameters screen should now be populated with something similar to that shown in Figure 13:
Click ‘OK’, then ‘Next’, and you should see the Build Event Expression screen, shown in Figure 14:
Now we get into some of the nuances of Operations Manager. What we need to do over the next few screens is to define an event expression that describes various possible “states” of our monitor: unhealthy, degraded or healthy. With Figure 14, we are plunged straight in to defining an “unhealthy” state for our monitor and, unfortunately, it is less than intuitive. The first thing we see is ‘Parameter Name’ for the first column. Next, is the ‘Operator’ column, which is a drop down, so thankfully we are limited in our choices there (‘Greater than’, ‘Less than or equal to’ and so on), and finally we have a ‘Value’ column.
From Figure 14, click on the ‘Insert’ button and let me save you the time of trying to decipher what is needed next. It’s clear that we’re being asked to provide a parameter along with some sort of “condition” that can be tested for in order to determine the “state” of the monitor. But what parameter are they talking about?
If you examine the script that will be used, you will find that we created a property bag to which we added two properties and associated values.
Call oBag.AddValue(jobName & "-Created", CInt(dCreated))
Call oBag.AddValue(jobName & "-Modified", CInt(dModified))
Each value contains the job name along with the text “-Created” or “-Modified”. Next, the Parameter Name column wants you to input the name of the parameter being used by the property bag. Okay then, in this case, the format we want for this particular event expression is as follows:
Like I said, it is less than intuitive. It took more than one attempt at getting this format correct. I stumbled upon it while examining the ellipses found to the right of the Value column shown in Figure 15. If you click on the ellipses and select “Job Name”, then you get:
Look familiar? It should, it is embedded in the event expression above. So, we take that, append the “-Created” onto the end, stuff it into the @Name variable, enclose the entire statement inside of brackets, and put the name ‘Property’ at the beginning. Did I mention this is less than intuitive? The good news is that once you start playing with this tool it gets easier to see how all the pieces fit and work together.
Now, if we find a job that has been created within the past day, we want the monitor to reflect an unhealthy state, as shown in Figure 15:
Click ‘Next’ and you will be required to conFigure for a similar expression to define a ‘Degraded’ state, as shown in Figure 16:
For a degraded state we set the “-Created” parameter to be ‘greater than or equal to 1’, and the “-Modified” parameter to be ‘less than or equal to 1’. In this case, the monitor will raise a warning if we have a job that has been modified recently, but had been created more than one day previously. Click ‘Next’ and you will see Figure 17, where we define a “healthy” state:
Here we set the healthy state defined as both parameter values are greater than 1, which means the job has been created and modified more than one day ago. Click ‘Next’ and you will see Figure 18, where you can conFigure the health states:
We will accept the defaults and click ‘Next’. Figure 18 should give you a sense of what the preceding steps have done, and that is to define the health of the SQL Agent job. If the job has been created and modified more than one day ago, then we have a health state equal to “Healthy”. If the job has been modified within the past day, but created more than one day ago, then we have a health state of “Degraded”. Finally, if the job has been created less than one day ago, we have a health state of “Unhealthy”.
At the next screen, shown in Figure 19, we need to decide if we want to conFigure alerts to be raised for this monitor:
We will leave this unchecked for now, in order to avoid sending out a flood of emails upon initial creation of the monitor. I recommend waiting a few business days to make certain the monitor is working as expected before you enable alerts, especially if you have subscriptions.
While alerts are certainly the best way to get notified when jobs have been created or modified, it is not the only way. Once the monitor is created, and the Operations Manager agents receive their new instructions, you will start to get information that can be viewed in the SQL Agent State view inside the SQL Management Pack on the Monitoring tab, as shown in Figure 20:
However, at the moment the associated view is not populated. In order to “activate” it, we first need to enable a Dependency monitor for the unit monitor we just created. The Dependency monitor will allow for the SQL Agent State view to accurately display the health of the SQL Agent service, and we want to include the monitor we have just created.
First, go to the Authoring tab, right-click on ‘Monitors’, then ‘Create a Monitor’, and then select ‘Dependency Rollup Monitor’ (see Figure 1). You should see the screen shown in Figure 21.
In Figure 21 I have already filled in the necessary information. I gave this monitor a name (SqlAgentJobCreated, but I could have chosen any name I wish), then set the target to be ‘SQL 2005 Agent’, and set the Parent Monitor to be ‘Configuration’. I have also disabled this monitor by default, and will use an override later, but you are free to enable it here if you wish. Lastly I saved this monitor to the Default Management Pack for simplicity, but again you are free to save it to any pack or create a new one. After you have input the information, click ‘Next’ and you should see Figure 22.
Here you set the dependency for this monitor by selecting the newly created unit monitor, SqlAgentJobCreated. What you have now done is tied the health of your SQL Agents to the health of the SQL Agent jobs. So, if someone creates a new job within SQL Agent, you will be able to visually see that through the Operations Manager console.
After making your selection, click ‘Next’, and you will see Figure 23.
Here we can conFigure the rollup policy. In this case, we are going to define our health based upon the worst state of any member. However, there could be cases where the health state should be defined as the best state of any member, or the health could be based upon a percentage of health states. We will accept the defaults here and click ‘Next’, bringing us to Figure 24, where we can conFigure alerts for the rollup monitor:
Again, I would recommend you leave alerts disabled for now, until the monitors have been running and you make certain you will not be flooded with emails from any subscriptions. Click on ‘Create’ to complete the process.
If you now click on that state view from Figure 20, you should be shown the state of each of your SQL Agent jobs, as seen in Figure 25.
If you select one of the jobs, as I have in Figure 25, you should see a detail view similar to Figure 26 (I have purposely blurred the job names).
And there you have it, a custom monitor that will allow you to quickly see jobs that have been created or modified within a given period of time. Furthermore, you can conFigure alerts, which when tied to a subscription, can notify you should a SQL Agent job be created or modified within a given period of time.
Operations Manager is nothing more than a big tinker set. All the pieces are there and the more you look around the more you start to see ways that it can help make your life easier. In the above scenario we were concerned with knowing when jobs within SQL Agent were being modified or created. But you can manipulate Operations Manager in so many different ways.
Have you ever had the SQL Agent not start successfully after a server reboot? Typically when that happens, the resolution is for someone to manually start the SQL Agent service. With Operations manager you can conFigure the default monitor to issue a net start command whenever it sees the agent is stopped. In short, you are being given the building blocks for helping the system to “heal thyself”.
The possibilities are truly endless. If you are more concerned with database development, you could conFigure Operations Manager to collect specific performance data. Or you could build a view of the servers and computers (called a Distributed Application within Operations Manager) that give you a complete view of every connected point for your application. How many times will the phone ring, you answer, and the voice on the other end says the system is “slow”? With Operations manager you would have the ability to build a Distributed Application view that could allow you to drill down to such a level of detail that you could tell the voice on the other end that a network card is not functioning properly.
Can your current monitoring system, especially those that are home grown, give you that level of detail right now? If not, then Operations Manager is a tool that you should consider using.