Centralizing and Analyzing SQL Backup Pro Backup and Restore Data

16 January 2013
by Rodney Landrum

The more servers that a DBA is responsible for, the more important it is to have an automated way of documenting and reporting the admin events and activity on those servers. It is particularly useful to know about the outcomes of backups and restores.

Any SQL Server DBA who counts their servers on the fingers of more than one hand needs an automated way to document those servers, and to report on core administrative activity across them. We're interested in information regarding backup successes and failures, changes in user and role permissions, SQL Agent jobs, and myriad other types of reporting data that a DBA must analyze, and react to quickly if a problem appears.

In my book, SQL Server Tacklebox (available as a free eBook), I describe my solution to this problem, which is a centralized DBA repository and reporting solution that combines SQL Server Reporting Services and SQL Server Integration Services to deliver daily reports on the status of our servers.

In this article, I focus on the topic of analyzing database backup and restore data. Our organization uses Red Gate SQL Backup Pro and the solution I present assumes its use, although I'm sure it could be adapted for those using native backups or other tools. I should note that SQL Backup Pro provides a reporting feature for multiple servers, but I chose not to use it, for a number of reasons: mainly because it relies on linked servers (which, in my experience, can sometimes make the queries painfully slow), but also because the reports themselves don't address all the issues I need to investigate.

My solution is standalone, but works off the same basic idea of a central DBA Repository, and uses SSIS, plus a built-in extended stored procedure provided by SQL Backup Pro, to pull all of the data to a central repository database, for reporting. Having centralized the data, I demonstrate a few useful queries to answer the sort of questions asked frequently of my team, such as:

  • Which are the top 10 largest databases, and what are their growth rates?
  • Which databases are restored to other environments and which ones have never been restored? This type of query addresses concerns for Disaster Recovery.
  • What are the main causes for backup or restore failures across all servers?

Setting up the DBA Repository

This solution uses SSIS to pull data from multiple servers into a central database and requires just a minimal amount of setup.

I provide instructions in a ReadMe.txt, as part of the code download that accompanies this article but, briefly, the first step is to create the central repository database, DBA_Rep (Create_Database_and_Data_Tables.sql). This database contains five tables, as follows:

  • ServerList_SSIS – contains a list of instances from which to collect data
  • RG_Backup – stores all the backup information (backup size, duration, location and so on)
  • RG_Restore – stores all the restore information for each backup
  • RG_Backup_Stage and RG_Restore_Stage – the staging tables

As we'll discuss in more detail as we progress, the SSIS package extracts the data from each specified instance into two temporary tables, performs some minor data manipulation to extract the server name and path in the required format, and then loads the data into the two staging tables. From here, the package merges the backup data, and the restore data, with the historical data in the RG_Backup and RG_Restore tables, respectively.

The first task is to populate the ServerList_SSIS table with SQL Server instance names (Server column) and connection properties (Connect column). Simply specify valid SQL Server\instance names in the Server column, and a 0 or 1 in the Connect column, the latter indicating that the SSIS package should connect to that server. Each SQL Server instance from which you wish to collect information must have the SQL Backup Pro server components installed.
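For illustration, here is a minimal sketch of the table along with a few sample rows. The column types are my guess from the article's description, and the instance names are hypothetical; the script in the code download remains the authoritative definition.

CREATE TABLE dbo.ServerList_SSIS
    (
      Server NVARCHAR(128) NOT NULL ,  -- SQL Server\instance name
      [Connect] INT NOT NULL           -- 1 = include in the next package run
    );

INSERT  INTO dbo.ServerList_SSIS ( Server, [Connect] )
VALUES  ( N'PRODSQL01', 1 ) ,         -- collected
        ( N'PRODSQL02\INST1', 1 ) ,   -- collected
        ( N'DEVSQL01', 0 ) ;          -- registered, but skipped for now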

The SSIS Package

This task complete, we can now execute the SSIS package, Redgate_SQL_Backup_Reporting.dtsx. The package contains six steps, as shown in Figure 1.

Figure 1 – Red Gate SQL Backup Reporting SSIS Package Layout

After populating an ADO variable with the list of server (instance) names, the package truncates the staging tables, loops through each server one by one, pulling backup and restore data down to the central staging tables, and then merges that data from the staging tables into the final destination tables. Let's take a quick look behind the scenes, for each step. As you will see, it's possible to keep each step very simple and yet pull a large amount of data from many servers.

In Step 1, the Execute SQL task, called Populate ADO variable for SQL Server Names, retrieves a single result set of server names using the query:

SELECT DISTINCT Server
FROM    ServerList_SSIS
WHERE   ( [Connect] = 1 )

It populates the object variable SQL_RS with the results, as you can see in Figure 2.

Figure 2 – Populate SQL_RS Object Variable.

The sole function of the Execute SQL task in Step 2, called Truncate Staging Tables…, is to clean out the staging tables. After this, we get to the heart of the package, in the form of the two Foreach Loop containers, one for backups called Backup Load and one for restores called Restore Load, which house the data flow tasks that pull the data we seek. The overall goal of each container is to extract the backup and restore data, respectively, from the specified servers, load it into temporary tables for manipulation, and then into the two staging tables. Let's look at how it works.

Each of the containers uses the SQL_RS variable we populated in Step 1. On the Collection tab of the Foreach Loop container, which you can see in Figure 3, we choose Foreach ADO Enumerator for the enumerator, and then select User::SQL_RS as the ADO object source variable.

Figure 3 – Configuring the Foreach Loop Containers

As the container enumerates through each of the server instances stored in our ServerList_SSIS table, it needs to change the data source connection on the fly. To enable this, we map the SQL_RS object variable (the server instance name) to a string variable called SRV_Conn (the expression value needs to be a string). The Variable Mappings tab of the Foreach Loop container in Figure 4 shows the mapping.

Figure 4 – Mapping the Object Variable SQL_RS to a String Variable SRV_Conn

Then, we simply assign the string variable SRV_Conn as the ServerName value in the source Connection Manager object, MultiServer. You can see the assignment in the Properties tab for the MultiServer connection in Figure 5.

Figure 5 – Mapping SRV_Conn to ServerName value via an Expression

Having rigged each Foreach Loop container to enumerate through all of the desired server instances, we can consider in more detail what tasks each container performs, for each server.

Each Foreach Loop container contains an Execute SQL task and a Data Flow task. The Execute SQL task, in each case, creates a temporary table that resides on the source server instance and populates it with the results of a query that interrogates the backup (or restore) logs. The query interrogates the logs using an extended stored procedure, master..sqbdata, implemented in a DLL that is installed with the Red Gate SQL Backup Pro server components. Information such as backup duration, compressed and uncompressed sizes, errors, times and dates, and file locations is all available, and much more besides.
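Before looking at the full query, note that sqbdata simply takes a single string of SQL and executes it against SQL Backup Pro's local log data store, returning the results as a rowset (Listing 1 calls it in exactly this way). A minimal standalone call, using only columns that appear in Listing 1, might look like this:

-- Smoke test: return a few columns straight from the backup log store
EXEC master..sqbdata
     'SELECT backup_id, entry_type, code, message FROM backuplog';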

Native backup and restore reporting

Of course, it is certainly possible to add support for native SQL Server backup and restore data to this solution. We'd just need to develop queries that pull similar information from the msdb database tables (see http://msdn.microsoft.com/en-us/library/ms188653.aspx#TsqlStatementsForBackupHistory).
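As a rough sketch (not part of the package), such a native-history query might look something like the following, shaped to match the tmp_RG_Backup columns where possible:

SELECT  CONVERT(NVARCHAR(60), SERVERPROPERTY('ServerName')) AS Server ,
        bs.backup_start_date AS Backup_Start ,
        bs.backup_finish_date AS Backup_End ,
        bs.type AS Backup_type ,   -- D = full, I = differential, L = log
        bs.database_name AS DBName ,
        DATEDIFF(SECOND, bs.backup_start_date,
                 bs.backup_finish_date) AS Duration ,
        bs.backup_size AS Size ,
        bs.compressed_backup_size AS Compressed_Size , -- NULL before SQL 2008
        bs.user_name AS UserName ,
        bmf.physical_device_name AS [Path]
FROM    msdb.dbo.backupset bs
        INNER JOIN msdb.dbo.backupmediafamily bmf
              ON bs.media_set_id = bmf.media_set_id
WHERE   bs.backup_start_date >= '01/01/2012';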

Listing 1 shows the query for the SQL Backup logs only, but the one for the restore logs is nearly identical.

 IF EXISTS ( SELECT *
             FROM   tempdb.dbo.sysobjects
             WHERE  id = OBJECT_ID(N'[tempdb].[dbo].[tmp_RG_Backup]') ) 
    DROP TABLE [tempdb].[dbo].[tmp_RG_Backup]; 

 CREATE TABLE tempdb.dbo.tmp_RG_Backup
    (
      Server NVARCHAR(60) NOT NULL ,
      Entry_Type INT NULL ,
      Code INT NULL ,
      Message NVARCHAR(1024) ,
      Backup_Start DATETIME NULL ,
      Backup_End DATETIME NULL ,
      Backup_type NVARCHAR(3) NULL ,
      DBName NVARCHAR(128) NULL ,
      Compression_Level INT NULL ,
      Duration INT NULL ,
      Size BIGINT NULL ,
      Compressed_Size BIGINT NULL ,
      Speed REAL NULL ,
      UserName NVARCHAR(128) NULL ,
      Path NVARCHAR(260) ,
      FileName NVARCHAR(100)
    )

GO
 DECLARE @sqlstatement2 VARCHAR(1000)

 SELECT @sqlstatement2 = 'SELECT
''server'' as server, 
Backuplog.entry_type,
Backuplog.code,
Backuplog.message,
Backuphistory.Backup_start,
Backuphistory.Backup_end,
Backuphistory.Backup_type,
Backuphistory.dbname,
Backuphistory.compression_level,
Backuphistory.duration,
Backupfiles.size,
Backupfiles.compressed_size,
Backuphistory.speed,
Backuphistory.user_name,
Backupfiles.name,
''FN'' as FileName

   FROM Backuplog 
   LEFT JOIN Backuphistory ON
      Backuplog.Backup_id = Backuphistory.id
   LEFT OUTER JOIN Backupfiles ON
      Backuplog.Backup_id = Backupfiles.Backup_id 
   WHERE 
 Backuphistory.Backup_start >= ''01/01/2012'' ';

 INSERT INTO [tempdb].[dbo].[tmp_RG_Backup]
        EXEC master..sqbdata @sqlstatement2

GO
 UPDATE [tempdb].[dbo].[tmp_RG_Backup]
 SET    Server = CONVERT(VARCHAR(100), SERVERPROPERTY('Servername'))

GO

 UPDATE [tempdb].[dbo].[tmp_RG_Backup]
 SET    FileName = RIGHT(Path, CHARINDEX('\', REVERSE(Path)) - 1)
            
GO

 SELECT *
 FROM   [tempdb].[dbo].[tmp_RG_Backup]

Listing 1 – Query using master..sqbdata extended stored procedure to interrogate the SQL Backup logs

We populate a temporary table, instead of just using the query directly as a source object in a data flow task, because we need to capture the server name in the result set, as well as the file name itself rather than just the entire path to the file (see the UPDATE statements at the end of Listing 1).
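To see exactly what the second UPDATE achieves, here is the file name extraction in isolation (the path is a made-up example):

DECLARE @Path NVARCHAR(260) = N'E:\SQLBackups\FULL_AdventureWorks_20120101.sqb';

-- REVERSE puts the last backslash first, CHARINDEX locates it, and
-- RIGHT keeps everything to the right of it: the bare file name.
SELECT  RIGHT(@Path, CHARINDEX('\', REVERSE(@Path)) - 1) AS [FileName];
-- Returns: FULL_AdventureWorks_20120101.sqb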

The resulting temporary tables, tmp_RG_Backup and tmp_RG_Restore, are the source objects in the subsequent Data Flow tasks, which simply push the data into the destination staging tables, RG_Backup_Stage and RG_Restore_Stage respectively.

The final two, separate Execute SQL tasks, Merge Stage Backup (/Restore) with Final, simply merge the data from the staging tables to the final tables, RG_Backup and RG_Restore. Listing 2 shows the merge code for the backup data (again, the equivalent for the restore data is very similar).

MERGE RG_Backup AS Target
    USING RG_Backup_Stage AS Source
    ON ( Source.Server = Target.Server
         AND Source.Backup_Start = Target.Backup_Start
         AND Source.DBName = Target.DBName
         AND Source.Message = Target.Message
         AND Source.FileName = Target.Filename
       )
    WHEN NOT MATCHED BY Target 
        THEN
    INSERT  (
              [Server] ,
              [Entry_Type] ,
              [Code] ,
              [Message] ,
              [Backup_Start] ,
              [Backup_End] ,
              [Backup_type] ,
              [DBName] ,
              [Compression_Level] ,
              [Duration] ,
              [Size] ,
              [Compressed_Size] ,
              [Speed] ,
              [UserName] ,
              [Path] ,
              [FileName]
            )
          VALUES
            ( Source.[Server] ,
              Source.[Entry_Type] ,
              Source.[Code] ,
              Source.[Message] ,
              Source.[Backup_Start] ,
              Source.[Backup_End] ,
              Source.[Backup_type] ,
              Source.[DBName] ,
              Source.[Compression_Level] ,
              Source.[Duration] ,
              Source.[Size] ,
              Source.[Compressed_Size] ,
              Source.[Speed] ,
              Source.[UserName] ,
              Source.[Path] ,
              Source.[FileName]
            );

Listing 2 – Merging staging data to final

The code contains only INSERTs; that is, there are no UPDATE or DELETE clauses to worry about, which is the beauty of log entries. The logic states that if a row does not already exist in the final table, insert it. Note that the code that pulls down both backup and restore data can be limited to a specific period, 2012 in this case.
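Since every run of the MERGE probes the final table on the five join key columns, a supporting index is worth considering. This is not part of the original download, just a sketch to adapt to your own workload:

-- Seek on the three narrow keys; the two wider keys become
-- residual predicates satisfied from the included columns.
CREATE NONCLUSTERED INDEX IX_RG_Backup_MergeKeys
    ON dbo.RG_Backup ( Server, DBName, Backup_Start )
    INCLUDE ( [Message], [FileName] );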

That is it! Simply load the SSIS package Redgate_SQL_Backup_Reporting.dtsx into SQL Server Data Tools (SSDT) for SQL Server 2012 in an existing solution and change the DBA_REP Connection Manager object to point to your DBA_REP database.

A quick execution and, if all goes well, you will have data to analyze. I say "quick" but, of course, that depends on the volume of backups and restores per server, multiplied by the number of servers you are querying. In my case, I rolled through 91 servers in under 14 minutes, which is certainly acceptable for a package that I may only run twice a month. Of course, I could add some logic to make this an incremental load process, pulling only the records that are new since the last execution. I may do that one day soon! In outline, it might look something like the sketch below.
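A minimal sketch of that incremental idea, assuming a hypothetical variable standing in for the enumerated server name:

-- Find the high-water mark already held for the current server; the
-- result would replace the hard-coded date in Listing 1's dynamic SQL.
DECLARE @CurrentServer NVARCHAR(60) = N'PRODSQL01'; -- hypothetical; the
                                                    -- package would supply
                                                    -- SRV_Conn here
DECLARE @LastLoad DATETIME;

SELECT  @LastLoad = ISNULL(MAX(Backup_Start), '01/01/2012')
FROM    dbo.RG_Backup
WHERE   Server = @CurrentServer;

-- @LastLoad would then be spliced into the dynamic query, e.g.:
-- ... WHERE Backuphistory.Backup_start >= '''
--     + CONVERT(VARCHAR(23), @LastLoad, 121) + '''';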

Analyzing the Data

Having collected our backup and restore data into a central DBA_REP repository, it's time to write a few queries to find answers to the three questions I listed earlier:

  • Which are the top 10 largest databases, and what are their growth rates?
  • Which databases are restored to other environments and which ones have never been restored?
  • What are the main causes for backup or restore failures across all servers?

Top Ten Largest Databases and their Growth Rates

I am going to assume that the largest databases in the environment got that way because of a heavy and steady transaction rate, and are therefore still growing the most rapidly. It may be a false assumption but, for this first query, I am going to limit the growth rate analysis to the ten largest databases.

My goal was to be able to see a list of servers and databases, ordered and grouped by backup date, with the backup size for each of those dates, so I could easily see the pattern of growth or reduction in size over time.

The query, shown in Listing 3, takes the form of a Common Table Expression and requires SQL Server 2012 (the platform where my repository database resides), since it uses the new LAG window function (illustrated briefly after the list below). This allows me to find the difference in size between the current row and the previous row, partitioned by server and database name and ordered by the backup start time. Note that it:

  • Uses the full, uncompressed backup size to gauge the size of the database
  • Only selects the top 10 largest databases by server and database name.
  • Excludes all rows with errors
  • Filters out records where there were no changes in size from day to day
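Since LAG may be new to some readers, here is a minimal, self-contained illustration of how it produces a difference column (the sample values are made up):

-- Each row sees the previous row's value within its partition;
-- the third argument (0) is the default when there is no prior row.
SELECT  v.DBName ,
        v.Size_MB ,
        v.Size_MB - LAG(v.Size_MB, 1, 0)
              OVER ( PARTITION BY v.DBName
                     ORDER BY v.BackupDay ) AS Diff_MB
FROM    ( VALUES ( 'db1', 1, 100 ) ,
                 ( 'db1', 2, 105 ) ,
                 ( 'db1', 3, 112 ) ) AS v ( DBName, BackupDay, Size_MB );
-- Diff_MB returns 100 (no prior row, default 0), then 5, then 7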

The Diff_MB column in the results records the difference in size from the previous full backup of the same database, by server.

WITH    LagGrowth_DBs ( SERVER, Backup_Start, DBName, Size_MB, Diff_MB,
                        RowID )
          AS ( SELECT   Server ,
                        Backup_Start ,
                        DBName ,
                        Size / 1048576 AS SIZE_MB ,
                        Size / 1048576
                        - LAG(SIZE / 1048576, 1, 0) 
                               OVER ( PARTITION BY Server, DBName
                                      ORDER BY backup_start ) AS DIFF_MB ,
                        ROW_NUMBER() OVER ( PARTITION BY Server, DBName
                                      ORDER BY backup_start ) AS RowID
               FROM     RG_Backup
               WHERE    MESSAGE = ''
                        AND backup_type = 'D'
                        AND size > 0
                        AND dbname IN ( SELECT TOP 10
                                                dbname
                                        FROM    rg_backup
                                        WHERE   dbname <> ''
                                                AND entry_type = 0
                                                AND Code = 0
                                                AND backup_type = 'D'
                                        GROUP BY server ,
                                                dbname
                                        ORDER BY MAX(size) DESC )
                        AND SERVER IN ( SELECT TOP 10
                                                Server
                                        FROM    rg_backup
                                        WHERE   dbname <> ''
                                                AND entry_type = 0
                                                AND Code = 0
                                                AND backup_type = 'D'
                                        GROUP BY server
                                        ORDER BY MAX(size) DESC )
             )
    SELECT  Server ,
            Backup_Start ,
            DBName ,
            Size_MB ,
            Diff_MB ,
            RowId
    FROM    LagGrowth_DBs
    WHERE   Diff_MB <> 0
    ORDER BY server ,
            dbname ,
            backup_start; 

Listing 3 – Query to Find Database Growth Over Time.

Figure 6 shows the results for successive backups of the same large database (excluding any rows where there was no change in size from the previous backup).

Figure 6: Growth Rates for a Large Database

You can see, for example, that this database grows in size steadily, starting at 931,384 MB on January 1, 2012 and ending at 977,607 MB on March 18, 2012: a growth rate of around 5GB weekly. This is exactly what I expected for such a large database.

As I began to analyze the results of this query, across all my servers and databases, I did see expected patterns for some databases, but I also uncovered patterns that really did not make much sense until I looked more closely. For example, a database whose size was 630 gigabytes one day was down to 243 gigabytes less than two months later. What would account for such a staggering decrease in size, I wondered? This sort of discovery leads to more investigation and ultimately makes the query more useful. In this case, it turned out that a data archival process had run over a weekend and freed up all of that space.

Which Backups have been Restored and to Where?

The next query will reveal those databases for which we are verifying backups, regularly, via test restores. Of course, with a slight alteration to the JOIN and WHERE clauses, it could equally well reveal those databases with no associated restores (a sketch of that variant follows Figure 7). This data can also tell us how much additional space we require to store multiple copies of the same database.

The query, shown in Listing 4, joins the two tables, RG_Backup and RG_Restore, on the file names and compressed file size, not the database names. The advantage of this is that even if the database name changes on the target restore environment, we still get to see both the source and target server in the same row. Notice also that the query considers only full database backups (backup_type = 'D') and restores (restore_type = 'D') and excludes errors (code and entry_type = 0).

SELECT DISTINCT
        RGB.[Server] AS Backup_Server ,
        RGR.[Server] AS Restore_Server ,
        RGB.[DBName] AS Backup_DBName ,
        RGR.[DBName] AS Restore_DBName ,
        RGB.[Backup_Start] ,
        RGB.[Backup_type] ,
        RGB.[Size] ,
        RGB.[Compressed_Size] ,
        RGB.[FileName] ,
        RGR.[Restore_Start] ,
        RGR.[Restore_End] ,
        RGR.[Restore_type] ,
        RGR.[Size] AS Restore_Size ,
        RGR.[Compressed_Size] AS Restore_Compressed_Size ,
        RGR.[FileName] AS Restore_FileName
FROM    rg_backup RGB
        INNER JOIN RG_Restore RGR ON RGB.filename = RGR.filename
                       AND rgb.compressed_size = rgr.compressed_size
WHERE   rgb.dbname <> ''
        AND RGB.entry_type = 0
        AND RGB.Code = 0
        AND RGB.backup_type = 'D'
        AND RGR.entry_type = 0
        AND RGR.Code = 0
        AND RGR.restore_type = 'D'
ORDER BY RGB.Server ,
        RGB.Backup_Start ,
        RGB.dbname

Listing 4 – Backup and Restore Query

Figure 7 shows some sample output, with the original database names from the source server and the new database names on the target, along with backup and restore times and so on. In our organization, there is no single schedule for test restores; we restore many databases daily, some only weekly or even monthly. In addition, there are numerous ad-hoc requests from the business. In total, across our servers, this query returns 4536 rows for a period of about one year, which is about what I would expect to see.

Figure 7 – Backup and Restores of Same Backup File
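As mentioned above, a slight alteration turns Listing 4 into a "never restored" report: switch to a LEFT JOIN, move the restore-side filters into the join condition, and keep only backups with no match. A sketch:

-- Full backups with no matching (successful, full) restore on record
SELECT  RGB.[Server] ,
        RGB.[DBName] ,
        MAX(RGB.[Backup_Start]) AS Last_Backup
FROM    RG_Backup RGB
        LEFT JOIN RG_Restore RGR
               ON RGB.[FileName] = RGR.[FileName]
              AND RGB.Compressed_Size = RGR.Compressed_Size
              AND RGR.entry_type = 0
              AND RGR.Code = 0
              AND RGR.restore_type = 'D'
WHERE   RGB.DBName <> ''
        AND RGB.entry_type = 0
        AND RGB.Code = 0
        AND RGB.backup_type = 'D'
        AND RGR.[FileName] IS NULL
GROUP BY RGB.[Server] ,
        RGB.[DBName]
ORDER BY RGB.[Server] ,
        RGB.[DBName];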

Causes of Backup Failures

The final query is one that I am mandated to run frequently to report on backup failures, especially for production systems. SQL Backup Pro has come a long way over the years in its resiliency to errors, but there will be times when fatal backup failures still occur and we need to respond quickly. Failures range from warnings to full-blown hardware problems or space issues. Resource contention is a common problem, which can generally be alleviated by moving the backup schedule to less busy times.

Listing 5 shows a basic query to find distinct errors, by server, since September 1st (roughly 3 months of data).

SELECT  COUNT(DISTINCT backup_start) Distinct_Error_Count ,
        MAX(backup_start) Last_Failure_Date ,
        Server 
        --DBName
FROM    RG_Backup rgb
WHERE   ( rgb.entry_type = 1
          OR rgb.Code >= 1
        )
        AND rgb.backup_type = 'D'
        AND RGB.Backup_Start >= '09/01/2012'
GROUP BY Server
ORDER BY Distinct_Error_Count DESC

Listing 5 – Server with Most Backup Errors

Ordering by decreasing numbers of errors, we can focus our efforts on the worst offending servers, like the top three servers in Figure 8 with 41, 26 and 21 errors, respectively.

Figure 8 – High Database Failure Rates on Servers
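Having identified a worst offender in Figure 8, a natural next step is to drill into the actual causes by grouping the failures by message text. A sketch, using a hypothetical server name:

-- Which error messages account for one server's failures?
SELECT  COUNT(*) AS Occurrences ,
        MAX(backup_start) AS Last_Seen ,
        Message
FROM    RG_Backup
WHERE   Server = 'PRODSQL01'         -- hypothetical worst offender
        AND ( entry_type = 1
              OR Code >= 1
            )
        AND backup_type = 'D'
        AND Backup_Start >= '09/01/2012'
GROUP BY Message
ORDER BY Occurrences DESC;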

Summary

In this article, I presented a means to centralize backup and restore reporting for SQL Backup backups, across all of your servers, using a simple repository and an SSIS package. I offered three queries to interrogate the resulting backup and restore data, but I've since thought of many more such queries that could be useful in analyzing the data. The next steps would be to add these queries as SSRS reports or dashboards and perhaps even tie in Key Performance Indicators to see how the situation is improving over time, in terms of reducing errors, saving disk space and so on.

It is not a complete solution, to be sure; if you decide to give this a try, you will want to add missing indexes, and alter the base tables to add more columns that I have not included, such as the encryption, verification and speed of backups. At any rate, if you use Red Gate SQL Backup Pro or native backups (or both) and manage many SQL Server instances, I hope this solution provides a solid foundation for you to start centrally logging and reporting on your backup and restore data.


© Simple-Talk.com