Feodor Georgiev

Handling Backups for Rapid Resilience

20 November 2012

The backup and restore system in SQL Server hasn't changed a great deal over the years despite a huge growth in the typical size of databases. When disaster strikes, and an important service is taken offline while a restore is performed, there is often time to reflect on whether it might be possible to design databases for a more rapid recovery of the most critical parts of a database application.

In my previous article, Designing Databases for Rapid Resilience, I suggested some ways to improve your disaster recovery strategy by ensuring a robust design of your databases.

In this article I would like to take the topic a bit further, and set out some of the opportunities that already exist, and some potential opportunities, for maintaining a database service in the face of unforeseeable events.

Even if it is true that ‘any backup is better than no backup’, there is still a lot to think about when it comes to improving your backup and recovery strategy.

Types of database objects…

SQL Server Backup isn’t discriminating. It saves everything that is required for the restoration of a database, whatever its value. To SQL Server, a backup is nothing more than a collection of 8 KB pages written one after another to a backup file on request.

Much of what is backed up isn’t data. In fact, only a small minority may be data. If you are using a lot of XML data, for example, and extracting data from it, the chances are that your indexes will dwarf the size of the original data. These indexes are quick to generate, yet SQL Server has no sense of their relative value and so includes them in the backup. Because a database isn’t functional until everything has been pulled out of the backup and recovery has completed, the service is likely to be down for longer.
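
To get a feel for the proportions in your own databases, the catalog views make the comparison easy. The following query is a minimal sketch that splits the space used by heaps and clustered indexes (the data itself) from the space used by non-clustered indexes; the output labels are illustrative only.

-- Compare data space with non-clustered index space in the current database.
SELECT  CASE WHEN i.index_id IN (0, 1) THEN 'Data (heaps + clustered indexes)'
             ELSE 'Non-clustered indexes'
        END                                   AS object_class,
        SUM(ps.used_page_count) * 8 / 1024    AS used_space_mb   -- 8 KB pages
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
          ON i.object_id = ps.object_id
         AND i.index_id  = ps.index_id
JOIN    sys.objects AS o
          ON o.object_id = i.object_id
WHERE   o.is_ms_shipped = 0
GROUP BY CASE WHEN i.index_id IN (0, 1) THEN 'Data (heaps + clustered indexes)'
              ELSE 'Non-clustered indexes'
         END;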

This is the safest way of doing it, but it means that, unless you are taking strategic advantage of filegroups with Enterprise edition, the database cannot function at all until everything is restored. (In Standard edition we can also take advantage of restoring one filegroup at a time, but we have two serious disadvantages: an object cannot span more than one filegroup, and the entire database has to be OFFLINE while the filegroup is being restored.)
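
For reference, here is a minimal sketch of the Enterprise-edition online piecemeal restore that makes the critical filegroups available first. The database, filegroup, and file names are illustrative, and the log handling is simplified to a single backup file.

-- Restore the primary and critical filegroups first and bring them online.
RESTORE DATABASE SalesDB
    FILEGROUP = 'PRIMARY', FILEGROUP = 'CriticalFG'
    FROM DISK = 'D:\Backups\SalesDB_full.bak'
    WITH PARTIAL, NORECOVERY;
RESTORE LOG SalesDB
    FROM DISK = 'D:\Backups\SalesDB_log.trn'
    WITH RECOVERY;

-- The PRIMARY and CriticalFG filegroups are now online and queryable.
-- The remaining filegroups can be restored later, while the database is in use.
RESTORE DATABASE SalesDB
    FILEGROUP = 'ArchiveFG'
    FROM DISK = 'D:\Backups\SalesDB_full.bak'
    WITH NORECOVERY;
RESTORE LOG SalesDB
    FROM DISK = 'D:\Backups\SalesDB_log.trn'
    WITH RECOVERY;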

To put it simply, could any database be more quickly restored if only the data and essential database objects were backed up, the rest being recreated via a DDL/DRI script?

To consider what opportunities there may be, let’s look into what objects there are in a database. Each database has some of the following:

  • Heaps
  • Clustered indexes
  • Non-clustered indexes
  • Statistics
  • System data (users, settings, etc)
  • DMO data

All of these objects are persisted on disk, except for the DMOs (Dynamic Management Objects), which are in-memory objects containing dynamically collected system data and which can be queried as if they were database tables or functions.

I mention the DMOs here because, even though their data is not persisted on disk and does not survive a SQL Server instance restart or a database restore, that data is still very important. If a disaster happens, it would be useful to have the database performance data as it stood at the time of the last backup.

The question of cost vs. value

Although all of these objects are the building blocks of a database, each one can be assessed against three aspects:

  • Performance value – what is the performance benefit when the object is present
  • Performance cost – what is the performance cost when the object is restored from backup
  • Importance – how vital it is to have the object in place in order to have a functional system at a minimal level

So, let’s look at each one of them:

  • Heaps – These cost a lot in I/O when restored from a backup, and are vital for the data (heaps are the database tables which have no clustered index)
  • Clustered indexes – These indexes cost a lot in I/O when restored from a backup, and are vital for the data availability and the query performance
  • Non-clustered indexes – These cost a lot in I/O when restored from a backup, but they are not vital for the data itself, although they may be vital for performance. An important exception is a non-clustered index that is used to enforce data integrity, as with a unique constraint, which makes that index vital for the data as well.
  • Statistics – They have some cost in I/O, especially when they are rebuilt; and they are vital for good performance, but not for the data availability itself
  • System data (users, settings etc.) – This costs very little in I/O when restored from a backup and is vital for the integrity of a database. The system tables are a very important part of the backup, showing which users were created, which settings were adjusted, and so on (for example, an index or statistics rebuild setting overwritten after a backup restore).
  • DMO data – This data cannot be included in a backup together with the database, but it is important for performance debugging, and hence it is a really good idea to offload the DMO data to persistent storage at the same time as backing up the database (a sketch of one way to do this follows this list).
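
As a hedged illustration of that last point, the snippet below copies the contents of sys.dm_db_index_usage_stats into a permanent table just before the backup runs. The DBAdmin utility database, the IndexUsageHistory table and the SalesDB database are assumptions made for the example, not part of any existing tooling.

IF OBJECT_ID('DBAdmin.dbo.IndexUsageHistory') IS NULL
    SELECT  GETDATE()            AS captured_at,
            DB_NAME(database_id) AS database_name,
            object_id, index_id,
            user_seeks, user_scans, user_lookups, user_updates
    INTO    DBAdmin.dbo.IndexUsageHistory
    FROM    sys.dm_db_index_usage_stats
    WHERE   1 = 0;   -- first run only: create an empty table with the right shape

-- Append a snapshot of the usage counters before every backup.
INSERT INTO DBAdmin.dbo.IndexUsageHistory
SELECT  GETDATE(), DB_NAME(database_id), object_id, index_id,
        user_seeks, user_scans, user_lookups, user_updates
FROM    sys.dm_db_index_usage_stats
WHERE   database_id = DB_ID('SalesDB');

BACKUP DATABASE SalesDB
    TO DISK = 'D:\Backups\SalesDB_full.bak'
    WITH INIT;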

How do we back up each of them?

As of the time of writing this article I am not aware of any tool or method for selective backup on a database object level. However, here are a few potential ways that I believe would really improve the real-world DBA’s working life and the speed of disaster-recovery:

  • I would like to have a way to back up only the heaps, clustered indexes and the system objects. If I could do this, then I would be able to restore the entire database much faster after a disaster, without having to perform that extra I/O for the non-clustered indexes. If this were feasible, I would prefer to have an easy way of generating scripts for only the non-clustered indexes and pairing them in the archive with the backups of the heaps, clustered indexes and the system objects. This way, I could restore the data quickly, bring it online and then immediately start recreating the non-clustered indexes by executing the scripts (a rough sketch of such a script follows this list).
  • There is a way to use an SMO script to create and archive the source script of the indexes and also the statistics of a database, and this is a great way to do performance tuning off-site without compromising the security of the data. This method is also great for performance tuning of VLDBs; let’s say we have a 1 TB database and we need to make it available on a laptop for some performance tuning. We do not need to go and buy a 1 TB hard drive; instead, all we need to do is script the schema, the indexes and the statistics and recreate them on the laptop. The execution plans will be the same, even though the data is not present.
  • I would like to have an automated way to back up all metadata from the DMOs related to a specific database with the click of a button, and keep it paired with the database backup. This would really help me keep track of performance data and would help maintain the databases by providing object usage statistics and so on.
  • I would like to have a way to determine the importance of objects (heaps and clustered indexes) and define which ones should go in a backup and which should not. For example, I would like to back up only the Sales and Customer data, but not the Vendor data, since only the first two are of critical importance to the business, and the last can be restored later on.
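
None of this exists as a built-in feature, but the first wish can be partially satisfied today. The query below is a minimal sketch that generates CREATE INDEX statements for the non-clustered indexes of the current database, so they can be archived next to the backup and re-run after a restore; it deliberately ignores included columns, filtered indexes, partitioning and compression options, which a production-quality scripter would have to handle.

-- Generate CREATE INDEX statements for all user non-clustered indexes.
SELECT  'CREATE ' + CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END
      + 'NONCLUSTERED INDEX ' + QUOTENAME(i.name)
      + ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' ('
      + STUFF(( SELECT ', ' + QUOTENAME(c.name)
                     + CASE WHEN ic.is_descending_key = 1 THEN ' DESC' ELSE '' END
                FROM   sys.index_columns AS ic
                JOIN   sys.columns AS c
                         ON c.object_id = ic.object_id AND c.column_id = ic.column_id
                WHERE  ic.object_id = i.object_id
                  AND  ic.index_id  = i.index_id
                  AND  ic.is_included_column = 0
                ORDER BY ic.key_ordinal
                FOR XML PATH('') ), 1, 2, '')
      + ');'                                   AS create_index_statement
FROM    sys.indexes AS i
JOIN    sys.tables  AS t ON t.object_id = i.object_id
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
WHERE   i.type = 2                             -- non-clustered only
  AND   i.is_primary_key = 0
  AND   i.is_unique_constraint = 0;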

In my previous article (“Designing Databases for Rapid Resilience”) I covered some of the benefits of partitioning and how it can help with the faster restore after a failure.

In reality, however, it is only the Enterprise edition which supports the splitting of the same object over multiple filegroups, which means that in the case of disaster, the filegroups can be restored in the order of importance.

In any other edition of SQL Server, we can store one object on one filegroup only, which still serves the idea of restoring the filegroups that hold the most important objects first after a disaster has happened.

In either case, however, we cannot avoid backing up the less important non-clustered indexes.

What this means is that we have to go through the I/O penalty of backing them up, and then sooner or later we will have to go through the I/O penalty of restoring them from a backup.

And when a disaster happens…

When a real disaster happens, we all hope that we have the most recent backups, that they have been tested and that they can be restored as fast as possible (ideally without any of the users or bosses experiencing any disruption).

In reality, if a disaster happens and part of the database is not available, what we can do is to back up all the filegroups except for the damaged one.
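
For instance, a minimal sketch of such a backup might look like this, assuming a database with three filegroups of which UserFileGroup1 is the damaged one; all names are illustrative.

-- Back up only the surviving filegroups.
BACKUP DATABASE SalesDB
    FILEGROUP = 'PRIMARY',
    FILEGROUP = 'UserFileGroup2'
TO DISK = 'D:\Backups\SalesDB_surviving_filegroups.bak';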

Then the smart thing to do is to look in the cache to see if any data from the failed filegroup is still cached, so we can offload it to a temporary location.

Here is an example: if we have Table1 on UserFileGroup1, and the disk where UserFileGroup1 resides has failed, the chances are high that the data is still in cache if the data from Table1 has recently been worked with.

In this case we can query the data as if there was nothing wrong with the UserFileGroup1’s disk, and we can insert the queried data into a table residing on a different filegroup.
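
A minimal sketch of that salvage step follows. It assumes that Table1 lives on the failed UserFileGroup1 and that a healthy filegroup called SafeFG exists; whether the SELECT succeeds depends entirely on the needed pages still being in the buffer pool, so this is a best-effort rescue rather than a guaranteed one.

-- Create a landing table on a healthy filegroup (the column list is illustrative).
CREATE TABLE dbo.Table1_Salvage
(
    Id   INT          NOT NULL,
    Col1 NVARCHAR(50) NULL
) ON SafeFG;

-- Copy whatever can still be read; this may fail with an I/O error
-- for any page that is no longer in cache.
INSERT INTO dbo.Table1_Salvage (Id, Col1)
SELECT Id, Col1
FROM   dbo.Table1;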

Of course, the chances of all the data being available in cache are slim, and they diminish as time passes by. This is why we should have a very smart and tested alert system and smartly designed and tested database backups.

Could a better way of restoring a database be devised, one with the strategy of restoring at least a limited ‘critical’ functionality more quickly and then allowing the full range of less vital services to be restored incrementally in the background?

In versions of SQL Server other than Enterprise there are still opportunities to design for rapid recovery. In any version of SQL Server, the possibility exists of creating a database application as a series of databases on the same instance that are backed up separately, so that the most vital one can be restored first, whilst the others, if damaged, are merely built from script with only static data, and their functionality disabled in software. Then, as each database is restored, its functionality can be restored. As well as requiring some clever scripting to automate the process, this cannot be achieved without planning and a careful analysis of the acceptable service level for each part of the application, just as I described in the first part of this series.
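
Purely by way of illustration, the restore order for such a multi-database design might be scripted along these lines; the database and file names are assumptions, not a recommendation of specific tooling.

-- Bring the critical database back first so the core service can come online.
RESTORE DATABASE OrdersCritical
    FROM DISK = 'D:\Backups\OrdersCritical.bak'
    WITH REPLACE, RECOVERY;

-- The less vital databases follow, while the application already runs
-- with reduced functionality.
RESTORE DATABASE ReportingSecondary
    FROM DISK = 'D:\Backups\ReportingSecondary.bak'
    WITH REPLACE, RECOVERY;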

To conclude

In conclusion, there are vitally important objects and not-so-important objects in every database and it would be very hard for a DBA to select only the relevant objects to be included in a backup; however, as described above, there are steps which can be taken to significantly minimize the impact of a database disaster.

As mentioned in the first part of the series, there are several ways to use smart database design to achieve great results; and as described in this article, some ‘positive discrimination’ at the database object level could speed up the backup and restore processes, make better use of system resources and, most importantly, minimize the downtime of our database systems.

Feodor Georgiev

Author profile:

Feodor has been working with SQL Server since 2002, starting on the 2000 version and mixing it up as newer versions - 2005, 2008 and 2012 - were released. He specializes in database performance tuning, documentation and scalability management. He also works as project leader and mentor on SQL Server and Business Intelligence projects on Microsoft-based solutions. His specialties include: Database Architecture, Microsoft SQL Server Data Platform, Data Model Design, Database Design, Integration Solutions, Business Intelligence, Reporting, Performance Optimization, Big Data. When he is not busy with his DBA work, keeping up with the latest SQL Server tricks or sharing tips on forums, he writes articles on SQLConcept.com.


Subject: Am I missing something?
Posted by: GCHQ (not signed in)
Posted on: Thursday, November 29, 2012 at 6:54 PM
Message: Based on the premise that timed backups are occurring, it should only take a matter of minutes to restore a DB unless every single DB on the server has crapped out. Our method is a little crude in that it stops the DB, deletes the DB and log files, copies them from the backup and restarts the DB again. We use the same method to automate rollback if a large procedure fails part way through, backing up before the procedure starts (as per the code snippet below).

Public Function RestoreDB_Global(ByVal HOAID As Integer) As Boolean
    Try
        vService = New Service1Client
        strSQL = "SELECT * FROM HOA_Connections WHERE Connection_ID = " & HOAID
        Dim DS As DataSet = vService.ReturnDataSetHAS(strSQL)
        Dim DT As New DataTable
        DT = DS.Tables(0).Copy
        Dim FilePrefix As String = "Error"
        For Each Row As DataRow In DT.Rows
            FilePrefix = Row("Connection_String")
        Next
        If FilePrefix = "Error" Then
            Return False
        End If

        'Stop the DB
        strSQL = "STOP DATABASE " & FilePrefix & " UNCONDITIONALLY"
        If vService.InsertDataHAS(strSQL, "Functions 2725") = False Then
            Return False
        End If

        'Delete the DB
        strSQL = "xp_cmdshell 'del /F c:\\DB3\\" & FilePrefix & ".*', 'no_output'"
        If vService.InsertDataHAS(strSQL, "Functions 2731") = False Then
            Return False
        End If

        'Copy the DB from the backup
        strSQL = "xp_cmdshell 'xcopy c:\\DB3\\System\\" & Current_HOA_ID & "\\" & FilePrefix & ".* C:\\DB2\\', 'no_output'"
        If vService.InsertDataHAS(strSQL, "Functions 2739") = False Then
            Return False
        End If

        'Restart the database
        strSQL = "START DATABASE 'C:\\DB3\\" & FilePrefix & ".db' AUTOSTOP OFF"
        If vService.InsertDataHAS(strSQL, "Functions 2744") = False Then
            Return False
        End If
        Return True
    Catch ex As Exception
        EmailError(ex)
        Return False
    Finally
        If Not vService Is Nothing Then
            vService.Close()
            vService = Nothing
        End If
    End Try
End Function


Subject: Am I missing something?
Posted by: sibir1us (view profile)
Posted on: Friday, November 30, 2012 at 10:39 AM
Message: @GCHQ - The article is a mental exercise on internal database objects, and its purpose is to bring to the reader's attention that backing up everything in a database is not always needed, and also that, during restores, certain objects should take precedence over others.
The method you suggest is quite interesting, and I guess that it works quite well in some cases.
One question, though: how do you copy the data and log files in the first place? In order to create a copy of the data and log files of a database which is ONLINE, you would either need a special tool or you would need to take the database OFFLINE. In either case, the most important thing is to have the database in a consistent state. I would like to know how you achieve that.

Subject: Backing up
Posted by: GCHQ (not signed in)
Posted on: Friday, November 30, 2012 at 12:25 PM
Message: I have no idea how that got double posted - think I just refreshed the page the following day - sorry!

Backup is achieved by issuing the backup command and the directory to back up to (as per this code snippet):

Public Function BackupDB_Global(ByVal HOAID As Integer) As Boolean
    Try
        vService = New ServiceReference1.Service1Client
        strSQL = "BACKUP DATABASE DIRECTORY 'C:\\DB3\\System\\" & HOAID & "'"
        If vService.InsertDataHOA(strSQL, "Functions 91 " & strSQL, HOAID) = False Then
            vService.Close()
            vService = Nothing
            Return False
        End If
        Return True
    Catch ex As Exception
        EmailError(ex)
        Return False
    Finally
        If Not vService Is Nothing Then
            vService.Close()
        End If
    End Try
End Function


Scheduled backups are a server event


BEGIN
    DECLARE day_of_week VARCHAR(9);
    DECLARE backup_stmt LONG VARCHAR;
    SET day_of_week = substr(dayname(today()),1,3);
    SET backup_stmt = 'BACKUP DATABASE DIRECTORY '||
        '''C:\\DB3\\BACKUP\\DBName\\' || day_of_week || ''' ';
    EXECUTE IMMEDIATE backup_stmt;
END

Subject: Re: Backing up
Posted by: sibir1us (view profile)
Posted on: Friday, November 30, 2012 at 12:44 PM
Message: The exercise in the article points to the fact that certain database objects have more value for the operations of the business than others.
Let's take an example of a database I work with: the database is 900 GB, and the compressed backup is about 300 GB. If I drop the non-clustered indexes and compress the backup, the backup will be about 100 GB, which makes a big difference to the money we spend on backup storage (not only space, but also throughput!), and to the time it takes to restore and have the business online (re-creating the non-clustered indexes is a secondary task).

Subject: Re: Backing up
Posted by: GCHQ (not signed in)
Posted on: Friday, November 30, 2012 at 6:13 PM
Message: Yes I understand the concept - but part of the argument was based on speed of recovery.

We do have a different situation, with each customer having a DB (or multiple DBs). The average size is around 16 GB and we can run up to 100 DBs per server. The reason we opt for this model is based on recovery time - if the backup is local, or on 40 Gb Ethernet, it's under 60 seconds - if we need to go to an older backup on 1 Gb NAS it's still under ten minutes.

 
