Click here to monitor SSC

Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page. As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management. In his spare time, he enjoys running, football, contemporary fiction and real ale.

The DBA Daily Checklist

Published 13 May 2008 9:56 am

The average DBA has to perform many routine checks on his or her servers. There will be daily checks, weekly checks and, probably, monthly checks. A short while ago, we were having quite a debate about what items would be on the DBA Daily checklist. Because we all like lists with ten items in them, we compiled a list that goes something like this:

 

  1. Connectivity: Make sure each database is available and accessible both by logging in via user applications, as well as running test scripts.
  2. Backups: Check database and log backups, archiving and offsite storage.
  3. Events: check all database logs, application logs and system logs, Agent history, device logs, NIC logs etc. Investigate any job failures
  4. Processes: Check that all required processes on the server are running, including Replication
  5. Integrity: Perform all database and server integrity checks; look for objects that break rules
  6. Indexes: Check on indexes to see if they are being used, need re-creating, or if any are missing
  7. Volumetrics: Check resources on the server such as files sizes and disk space, and monitor growth
  8. Performance: Check application performance, and performance statistics, using the Perfmon tool; research and resolve any issues.
  9. Procedures: Check all Disaster Recovery Plans
  10. Security: Look for security policy violations

 

We thought we had a fairly exhaustive list, but then a Simple-Talk forum entry made us think again. Randyvol had posted in response to our DDL Trigger workbench:

 

…last week some idiot turned a host of triggers off in our ERP system, causing a cascade of posting problems on dozens of orders before we caught the root cause…

 

He wanted to know if there was some way of checking whether Triggers have been disabled. It may be unusual and obscure, but it makes perfect sense that on some production servers you will need to check to see if triggers have somehow been disabled. The consequences could be dire.

 

We added this check to our list but then began thinking: what else are we missing? There are so many different ways that SQL Server is used that there must be a host of other checks that could make all the difference.

 

As always, we’d like to hear what you think. Post your suggestions as a comment to this blog, and you’ll go into the draw to receive one of five Simple-Talk gift bags!

 

Cheers,

Tony.

12 Responses to “The DBA Daily Checklist”

  1. ChssAddct says:

    You mention checking for missing Indexes. How about checking for missing Tables!

    For Instance: A curious type sees a table named Numbers. They look inside and see it has a single column, filled only with the numbers from 1 to 10,000. What a ridiculous, useless table, right? Clearly a dust-bunny from somebody experimenting with some SQL! What use could a table with all the numbers from 1 to 10,00 possibly have? I think I’ll drop this table right now. And anyway, if it’s really needed, how hard can it be to write a loop and create a new one?

    Of course, they’ve now just dropped the “Swiss Army Knife” of SQL developers (see
    http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/). It’s used in UDFs, Stored Procs, Scripts, Scheduled Jobs, and apps. A seemingly useless looking table, yet can’t do without it!

    Note, Robin & Phil, a ‘cleverer’ way in this article of creating the table instead of using a loop: http://www.sqlservercentral.com/articles/TSQL/62867/

    Of course, checking for missing tables would also entail those tables that contain crucial data, but this is the first table that comes to mind that would simply start to wreak havoc were it to go missing. Easily created — sure. Unrecoverable loss of business data? — nope. Crucial to have around? Absolutely.

    Although, as much as I use it, I wouldn’t ever have to perform a check for it having gone missing. The cell phone would be blaring “help! things just stopped working”.

    Oh, and I

  2. Phil Factor says:

    Agreed that there are loads of clever ways of doing the numbers table, but we thought that introducing them would distract from the main point.

    Having spent a while doing a daily policing job on a database-driven website, my daily checks always included a lot of checks for intrusion. I was always amazed by the number of such attempts, some of them of great ingenuity. In the end, having done the obvious checks, I was looking for any unusual activity or changes in the pattern of usage. Snort (http://www.snort.org/) is great, but I’d love a way of doing it specifically for SQL Server that is less propellor-head.

  3. RLaubert-MCITP says:

    One of the things that is becoming almost daily are patches. I currently have patches for the OS, SQL Server, Assorted OS programs (IE, .NET, ASP etc), the applications, IIS, Visual Studio and the list goes on and on.

    Each of these patches has to be tested on development, staging and then production. So it is an almost daily process that somewhere, one or more of these are taking up some of my time.

  4. MVV says:

    - Find what it is there and shouldn’t (Unused indexes/tables/SP) left behind by previous incarnations of the apps. Check as well for unused or must delete access (fired workers or quitters.) I was amazed by the resilience of logins. Some of the logins i found were so ancients , nobody remembered the guys/gals.

  5. Rodney says:

    These may fall somewhere within the listed categories, but these are some I thought of as I read through; ones that I check regularly.

    * Long running or “hung” processes, especially those tied to a SQL Agent job.
    * Make sure that SQL Mail, SQL Agent Mail, Database Mail, SQL Backup or xp_SMTP_Sendmail is working correctly, especially if you have jobs or backup code notifications sewed to these. Issues with mail queues with MAPI clients are common (not sql 2005 or smpt thankfully)
    * SQL Agent Job modifications. In organizations that do not have a centralized scheduling system and rely on SQL Agent, any schedule modificaiton can have negative impact. I use a report that shows which jobs were modified and when, unfortuantely it does not reveal “who”, but with SSRS execution snapsots, I can see what was changed.

  6. johnc says:

    SQL Agent up and running.

    SQL Agent Jobs that fail.

    Databases that automatically grew.

    -jfc-

  7. jimbow says:

    Clearly, depending on your enterprise scope, how many hours are there in the business day to deal with all the 10 or more points – on a daily basis? Just checking the complete status of 100 database backups is not a trivial task. Most times you would rely on the engine to report success for the whole process in a single notification. Is that good enough to get on with the next item in the day’s chores ?

    Similarly if it can be automated and all the health checks can be trusted to be done that way – daily checks are then to ensure that the automation itself is working and the notifications to be dealt with etc are as reliable and minimal as possible.

    The DBA is not necessarily an individual.

  8. randyvol says:

    Randyvol here again (in the process of creating DDLtriggers for my databases!)

    Referring to the post about missing tables, (I think I’ve posted this elsewhere, but I will do so here again), what about a way to list the last time a table had a SELECT, INSERT, UPDATE or DELETE operation against it?

    Why would I want to know that? I have a legacy SQL Server 2000 system with literally hundreds, maybe thousands of tables across a dozen or so databases that I suspect are in the ‘dustbunny’ category. But we still have so many jobs running against this system that it would take more time than I have to parse through all the scripts, sprocs, triggers and what have you to collect a list of ‘keepers’.

    If I could just figure a way to know the above and run a query to find tables that say, have had no activity in the last qtr, 6 months, or year; I’d say it is a candidate to get blown away and give me back some disk!

    Regards.

  9. imassi says:

    Checking to make sure there is a pot of coffee brewing actually has proven an effective check. There’ve been a couple of times this week that someone asked me to look into something, but have it turn out that they weren’t paying attention and didn’t click the right thing. They apologized and said that they hadn’t had their coffee yet. If only I could write a script to automate this process…

  10. Ron Dameron says:

    We use log shipping extensively.

    So, checking that all the different log shipping plans are synchronized is on my top ten list.

  11. sagreene says:

    There used to be a good page here with daily/weekly/monthly tasks. Maybe it will be back up:
    It’s the sql 2000 Operations Guide.
    http://www.microsoft.com/err/technet/prodtechnol/sql/2000/maintain/sqlops2

    The trigger issue is easy to fix – take away DDL rights from people.

    It’s best to have competent people at the controls. Trying to plug up every possible mistake with monitoring systems is probably a poor idea.

    Monitoring a few critical functions and restricting access is a nice balance.

    Here is how I did a few monitoring tasks. Props to Rodney for the SSIS package which was the basis for all of this.

    Connectivity:

    Set up a SSIS which polls all servers ‘select @@servername’ every 10 minutes. Pages and emails on failures. http://www.zabbix.com – this monitoring software can monitor applications and post data to forms – then you can use regex to verify you get the correct results. Also gives you nice graphs of response times and can alert if they fall out of bounds. I was hoping to use this to monitor sql reports as a start, but it cannot authenticate to windows…and we don’t have kerberos working.

    Backups: Check database and log backups, archiving and offsite storage.

    I’ve been working on a SSIS package that retrieves the most recent backup file location and restores it on a test server. I’ve been told the only way to really test a backup is to restore and use it. This will let me restore a chunk of databases – say 50 or so and notify analysts to change their connections and test. Sweet!

    Volumetrics/Performance:

    We’ve been using software called Zabbix for monitoring this type of stuff, free, template features and nice graphing. I monitor, drive space, cpu usage and memory – handles have been useful and have alerted us to problems on a few servers.

    Security: Look for security policy violations

    You can do the same thing with an SSIS package, gathering security facts and putting them in a repository then generating sql reports that can be scheduled.

    I also do a weekly pager test from our main clusters by raising a high severity alert and making sure that sends an email and page.

    This article was great for getting that going: http://www.simple-talk.com/sql/database-administration/sql-server-alerts-soup-to-nuts/

  12. Eric Russell says:

    Regarding the previously described issues of disabled triggers and dropped tables in production, it sounds like enforcing a change management policy would make the DBA’s life a lot easier. Ideally, only the DBAs and the application or services have a login to production.
    One thing I like to do occasionally (especially when troubleshooting) is use SQL Compare to generate a report of object schema differences between a daily snapshot of production and the pre-prod QA server, just to confirm that nothing has been ad-hocly deployed since the last official release. In addition to just schemas, changes to meta data and configuration settings that can cause subtle changes in the applications behaviour that are hard to diagnose, and using SQL Data Compare on specific user and system tables can cover that base as well.

Leave a Reply