Douglas Reilly

Minimizing the need to restore a backup

15 September 2005

Part three in a series exploring major facets of SQL Server backup

The second installment of this series discussed the cost of maintaining systems with various levels of downtime, ways data can be lost, and what constitutes a backup. In this installment, I discuss how to configure your hardware to minimize the need to restore a backup.

Getting the hardware right

Properly setting up hardware is the first step in keeping your data safe. Decisions you make as you specify hardware and set up databases can determine whether you are a database hero or scapegoat. Your average hardware guy does not know SQL Server, and your average DBA does not know hardware. Being the one to understand enough about both worlds can make a real difference in the outcome.

Think for a moment about your average, run-of-the-mill, low-end server. It might have one or two hard drives and one or two power supplies. When SQL Server is installed with the default settings, all programs, data and logs are placed on the same drive, in the same folder (awkwardly, in my view, in the Program Files folder).

How many single points of failure do you have with the default setup? First, the failure of either a single hard drive or single power supply will bring down the server, at least temporarily. In the case of a hard-drive failure, you will need to get a new drive, reinstall SQL Server and restore backups. This will take some time.

There are a number of things you can do to reduce such possible exposure to catastrophic loss of data. The first thing is to consider purchasing a server that has at least two power supplies. A power supply is a fairly common point of failure, and often getting redundant power supplies adds only about $250 to the cost of the system. This is money well spent.

RAID levels

RAID (redundant array of independent, or inexpensive, disks) uses two or more drives in combination to provide fault tolerance and/or performance benefits. A Google search for RAID will turn up a large number of articles covering RAID in great detail, but below is a quick explanation of the most popular RAID levels.

  • RAID 0—striped disk array without fault tolerance. Blocks of data are spread over two or more drives, providing better performance but no redundancy.
  • RAID 1—mirroring and duplexing. Writes data to multiple disks at the same time. Given proper hardware support, speed is about the same as non-RAID drives. You get only one disk’s worth of capacity for each two disks you have in RAID 1 configuration.
  • RAID 5—data and parity information spread across multiple drives. RAID 5 uses a distributed parity arrangement, so that parity information on a single drive does not become a bottleneck. There is some additional overhead to read and write parity information.
  • RAID 0+1—two RAID 0 stripes with a RAID 1 mirror created over them.
  • RAID 10—multiple RAID 1 mirrors are created, and a RAID 0 is created on top of these.

There are a number of other RAID levels, but these are the ones used most often. RAID 4, for example, is like RAID 5, but has a dedicated parity drive. It is unlikely anyone would use RAID 4 when the same hardware can generally do a better job in RAID 5 configuration.
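The capacity trade-offs among these levels come down to simple arithmetic. As a rough illustration (a sketch, assuming equal-size disks and ignoring controller overhead and spares):

```python
def usable_capacity(level, n_disks, disk_gb):
    """Approximate usable capacity for common RAID levels, assuming equal-size disks."""
    if level == "0":
        return n_disks * disk_gb        # striping only: every byte is usable, no redundancy
    if level == "1":
        return disk_gb                  # all disks hold identical copies of the same data
    if level == "5":
        return (n_disks - 1) * disk_gb  # one disk's worth of space goes to distributed parity
    if level in ("10", "0+1"):
        return n_disks // 2 * disk_gb   # half the disks are mirror copies
    raise ValueError(f"unhandled RAID level: {level}")

for level, n in [("0", 4), ("1", 2), ("5", 4), ("10", 4)]:
    print(f"RAID {level}: {n} x 250 GB -> {usable_capacity(level, n, 250)} GB usable")
```

So, for example, four 250 GB disks yield 750 GB usable in RAID 5 but only 500 GB in RAID 10, which is part of why RAID 5 is attractive when write performance is not critical.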

In the case of the RAID levels that provide redundancy, you might ask what exactly happens when one of the drives goes bad. The answer also depends on the hardware involved.

In many cases, RAID drives are "hot swappable," meaning that when the controller senses a failure, you can simply remove the offending drive and replace it while the server continues to operate. After the drive is replaced, the new drive will be updated so that it will contain the data and parity information required. A RAID 5 configuration will fail if another drive fails before the first bad drive is replaced. Some controllers allow a "hot standby" drive that contains no data but will automatically be used when one of the drives fails.

Other possible points of failure are the network card, memory, processor and motherboard. Many servers offer redundant network cards and error-correcting memory to reduce the chances of catastrophic failure from those components. In the event of processor or motherboard failure, you will likely need to do some repairs.

Configuration for SQL Server

How does this information on hardware figure into how you will install and configure SQL Server? When specifying the hardware and OS setup for a SQL Server machine, there are a number of things you can do to improve reliability and recoverability.

First and foremost, you want to keep the log and the data on different drives.

Whenever SQL Server makes a change to the database, in addition to changing the data, information on the change is written to the log (this is a slight simplification: some operations are only minimally logged). By placing the log on one drive or RAID array and the data on another, you have the greatest chance of avoiding a catastrophic loss of data. If the log volume fails, you can simply repair the volume with no loss of data. If the data volume fails, given a backup and the log, you should be able to recover up to the point of failure.
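As a rough sketch of that recovery path (the database name and file paths here are hypothetical), the T-SQL sequence after a data-volume failure looks something like this:

```sql
-- After the data volume fails, back up the tail of the log;
-- NO_TRUNCATE lets this succeed even though the data files are gone.
BACKUP LOG Sales TO DISK = 'E:\Backups\Sales_tail.trn' WITH NO_TRUNCATE;

-- Restore the last full backup, leaving the database ready for log restores...
RESTORE DATABASE Sales FROM DISK = 'E:\Backups\Sales_full.bak' WITH NORECOVERY;

-- ...then apply the tail-log backup to recover up to the point of failure.
RESTORE LOG Sales FROM DISK = 'E:\Backups\Sales_tail.trn' WITH RECOVERY;
```

None of this is possible if the log shared a failed volume with the data, which is the whole argument for separating them.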

I recently had an embarrassing lapse in SQL Server backup procedures that stemmed from not placing data and logs on separate volumes. While I have overseen proper backup and recovery procedures on a number of client sites, my own personal sites (which were really hobby sites, not a part of my business) were less well protected. I was using a very old server that had been repaved a few times, and unknown to me, the different logical volumes I was using for data, backup and logs were on the same physical volume. You can guess the rest: hard drive failure, taking all recent data with it.

Fortunately, this was personal data that was not needed for my business; my business data was properly backed up and was restored within an hour or two. But just because this data had no commercial value doesn't mean it wasn't important. Recovering it involved some clever use of the Google cache of a page the data was on. What is that story about the shoeless son of the shoemaker?

While SQL Server can be installed on FAT, FAT32 or NTFS volumes, there is no reason not to install SQL Server on an NTFS volume. NTFS supports compression of a volume or directory; however, it is strongly recommended that you not compress any volume or directory used for SQL Server data or log files, since compression seriously degrades SQL Server performance. If a volume is to be dedicated to SQL Server's use for log or data, format the drive with a 64 KB cluster size, which matches the size of a SQL Server data extent.

Rather than simply using the same RAID level (RAID 1, RAID 5 or RAID 10) for both the data and log arrays, understanding how the log and data volumes will be used allows a more intelligent choice. Presuming you are interested in the best read-and-write performance and cannot tolerate any data loss, using RAID 10 or 0+1 for data and RAID 1 for the log offers outstanding performance and reliability, but at a substantial cost. For even more reliability, you can keep multiple mirrored copies of the RAID 10 or 0+1 data volume and the RAID 1 log volume.

Presuming your needs are a little less extreme, and especially if write performance is not critical, using a RAID 5 volume for the data and a RAID 1 volume for the log can offer much of what the more expensive RAID 10/RAID 1 option does.

Using either RAID 5 or RAID 10 for your data, you most likely will be able to recover from a single disk failure. Disk failure is the most likely hardware-related problem with your server.

Eliminating remaining points of failure

Even with the RAID setup described above, there are still multiple points of failure to consider. Should the motherboard or disk controller fail, your data will likely still be on the hard disks, but inaccessible until the hardware problem is resolved. I say the data is "likely" to be on the hard disks, since it is possible for a disk controller to fail in a spectacular enough way that it takes all the data with it.

I commonly work on web applications, and one of the great things about the protocols used on the web is the ease with which you can eliminate a single point of failure. Depending upon a number of factors, you can create a cluster of web servers, with a load balancer in the front of the cluster to distribute incoming requests. The stateless nature of HTTP requests generally means that each single request can be handled by any machine in the cluster. If a user requests a list of items, clicks on one of the items and is sent to a different web server, the request can be handled cleanly. This ignores things like session state, but many web technologies allow session state to be shared among machines in a cluster.

Clustering SQL Servers is much more difficult. To take advantage of the Microsoft solution, you must be using the Windows 2000/2003 Advanced Server or Datacenter version, as well as SQL Server Enterprise Edition. While I have often used clustering for web servers, I have not used clustering on SQL Server. I imagine that is due in part to the overwhelming cost of such a solution, as well as the difficulty of setting it up.

Note also that if you cluster SQL Server on two machines in the same physical location, you are not protecting against all of the types of failure I discussed in part two of this series. Clustering does not prevent catastrophic site failures, such as extended power outages, floods and hurricanes. There are more mundane site failures as well, such as loss of Internet connectivity.

There are a number of possible solutions for mundane site failures. Presuming normal connectivity between two sites, you can create a server that will contain a very recent copy of the live database. I discussed some ways to replicate data from one SQL Server to another in an article on synchronizing SQL Server databases using Red Gate Software's SQL Toolkit.

Something I have learned about recently is log shipping. Books have been written on SQL Server replication and log shipping, and while log shipping sounds like it might be a simple solution to the problem, it is not simple at all. More important, it is not cheap. It involves deploying the most reliable servers, adding a second failover site, keeping the two sites synchronized, and managing client access to the correct location. Such a setup and configuration can easily bring your costs into the millions of dollars.
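At its core, though, log shipping is an automated loop of a few steps: back up the log on the primary, copy the backup to the standby, and restore it there. A minimal manual sketch (server, database and path names are hypothetical) looks like this:

```sql
-- On the primary server: back up the transaction log
-- to a share the standby server can reach.
BACKUP LOG Sales TO DISK = '\\Standby\LogShip\Sales_0915.trn';

-- On the standby server: apply the log backup, leaving the database
-- in a state that can accept the next log backup in the chain.
RESTORE LOG Sales FROM DISK = 'D:\LogShip\Sales_0915.trn' WITH NORECOVERY;
```

The expense comes not from these commands but from automating, monitoring and failing over the whole arrangement reliably.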

Next: specifics of backing up SQL Server data

Now that you know why to back up, what exactly a backup is, and how to configure your hardware to minimize the need to restore a backup, the next step is to explore the options for SQL Server backup. Where should you back up to? How often? And what should you do with those backups after you make them? All that in the next article in this series.

Douglas Reilly

Author profile:

The late Douglas Reilly was the owner of Access Microsystems Inc., a small software development company specializing in ASP.NET and mobile development, often using Microsoft SQL Server as a database. He died late in 2006 and is greatly missed by the SQL Server community as one of the industry's personalities.




Subject: backup and recovery
Posted by: Anonymous (not signed in)
Posted on: Tuesday, April 17, 2007 at 8:35 PM
Message: Hi Douglas,

This is a good article, a must-read.

I have a few questions to consult you on:

Would it be a good idea to store the .mdf and .ldf files on the same RAID 10 drive, or rather to store the .ldf file on a separate RAID 1 drive?

I am also storing the SQL backup file on RAID 1.

Can you point me to some information on how to recover the database if we only have the .mdf file? And how to recover the database if we only have the SQL backup (.bak) file and the .ldf file?

Thanks a lot

gbkhor@gmail.com

Subject: My Methodology
Posted by: SQLWayne (view profile)
Posted on: Monday, August 13, 2007 at 11:17 AM
Message: Very good series. I would like to point out one thing regarding RAID controller failure. I have heard of, though not experienced, situations where the controller failed and an identical replacement could not be obtained. When a new card was installed, it couldn't recognize the RAID array because of differences between how the two cards implemented it. So when setting up RAID arrays, it is probably a good idea to buy two cards so that you have a spare. It is also important to have the RAID cards at the same firmware level.

For my work setup, I have my databases doing transaction log backups (or incremental database backups, depending on the recovery model in place) every 15 minutes. When that backup is done, it copies the log backup file to our SNAP server or, for incrementals, backs up again to the SNAP box. That server is backed up twice an hour by our off-site Tivoli system. At noon, I do incremental backups of all databases, locally and to the SNAP server. I have two large databases, 50 GB and 150 GB, that don't change at all (raster data for GIS); I back those up on the weekend.

I also have a system in place so that I have the last ten days of nightly backups on SNAP (automated, so no maintenance required) and also the backup from the first of the month for the last year there.

So theoretically, if the server goes up in flames, I'm out 15 minutes, tops, to recover from SNAP. If the data center goes up in flames, I'm out half an hour from Tivoli. I certainly hope the data center doesn't go *poof!* because my desk is about 20' from my server racks!

And since I have these aged backups on my SNAP server, unless I lose my data center, I should be able to restore any database without having to go to tape. Fortunately I have yet to have to restore a production database at my current employ.

I have the advantage that my shop is not 24/7, so I can take my time with backup schedules.


One thing that I don't recall seeing mentioned in your series is the importance of backing up Master and MSDB. You might as well include Model while you're at it. I back up my three system databases in a single job to a single device: back up Master to initialize the device, then append the backups for Model and MSDB.
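The job described above might look something like this (the device path is hypothetical); INIT overwrites the backup device, while NOINIT appends to it:

```sql
-- Overwrite the device with a fresh backup of master...
BACKUP DATABASE master TO DISK = 'F:\Backups\SystemDBs.bak' WITH INIT;

-- ...then append model and msdb to the same device.
BACKUP DATABASE model TO DISK = 'F:\Backups\SystemDBs.bak' WITH NOINIT;
BACKUP DATABASE msdb  TO DISK = 'F:\Backups\SystemDBs.bak' WITH NOINIT;
```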


And the final thing (HONEST!) re: delete queries. My boss at a former company ran a delete statement in Query Analyzer written by a consultant in another state. Instead of deleting 80 rows, it deleted 80,000. Fortunately I was able to do a point-in-time recovery and we didn't lose any data. Anyway, I showed him how to write a delete statement as follows:

--delete {tablename}
select * from {tablename}
where {delete condition}

This way you can see what will be affected by your statement before you commit. Uncomment the first line, comment out the second line, and you're gold.

 