In most databases, very small percentage of data changes between SQL Server backups. We tolerate this because it keeps things simple, and we have grown used to cheap storage and fast networks. To exploit the value of Cloud services, should we be rethinking this in favour of backup strategies that optimise resilience and minimise downtime?
Our data is growing in size and complexity, but so, at the same time, are everyone’s expectations about our ability to make this data constantly available without interruption. To meet these demands, I'll be arguing that we need to adopt more sophisticated backup strategies, also making use of cloud services can be efficiently used to deliver coherent backup solutions to the enterprise, and make disaster-recovery quicker.
At a price, we can make any service resilient. How, though, is it possible to achieve an efficient and economic backup strategy in the enterprise? We face having to provide ever more data with increasing data dependencies across more disparate systems.
To answer this question we need to first look at the strategies for achieving backup resilience as well as at the classification of the data that is stored in order to get a better estimate of the requirements for backup.
A brief summary of strategies for achieving backup resilience
A while back I wrote an article on Handling Backups for rapid Resilience, which outlined the approach to planning resilience by identifying the database objects most in need of backing up and thinking about the possible approaches to achieve more resilient backups. If you take this approach, you’ll find that any database has objects which are important to the functioning of the database, but which can be reconstituted without needing a backup.
In the article I highlighted the following database objects:
- Clustered indexes
- Non-clustered indexes
- System data (users, settings, etc)
- DMOs data
I wondered where the balance lay with the average database between the performance value, backup cost and importance of the above objects to the functioning database. Full database backups are very inefficient ways of storing information since so little data changes between each backup. Deduplication technology illustrates the point. They are very similar.
A brief summary of data classification
In another article (Disaster recovery for the enterprise) I wrote briefly about how data can be classified, and why it is important to classify data before you plan how to meet the requirements of the business to provide resilience for their systems.
In that article I tried to emphasize how important it is to classify the data we work with: depending on the patterns of access and use (transactional and analytical) and date patterns (how much of the data is merely for historical reference and how much of it is current).
The general idea in that article is that no one can afford to keep their data all together without thinking about segregation and classification in a smart way in order to cut costs.
In this article I will discuss how proper classification and specific attention to database objects translates into designing a successful cloud backup service.
There is also another way to classify data in order to get enough of an understanding of it to design an efficient backup regime: we can think of the data querying patterns.
In this classification we focus on how many rows are queried and what is returned. Are there many rows queried and aggregated to return one single row, or are we looking for single rows at a time?
To take this further, we should determine what part of the data is queried the most and how it is aggregated.
This is important, because we may want to attempt to backup only the ‘most important’ parts of the data, or even to pre-aggregate data and then back it up in order to save time, space and network bandwidth.
How can we segregate data within a database according to classification?
Straight out of the package, the system has no idea where in the database the more important data is. It just doggedly backs everything up with the same thoroughness. However, a logical database can, and often does, exist in several databases on the same server, or even on several servers. Each database can have its own backup regime. Within a database, you can back up individual files and filegroups (a logical grouping of database files) within a database. By using a File Differential backup, you can back up only the data modified since the latest full backup within a database file or filegroup. This, combined together, will give you several options for a backup strategy.
Taking a more radical stance, we could start by backing up the schema of our databases. The schema is just as important as the data and it is much easier and faster to backup. After securing the availability of the schema, we need to think about extracting and backing up only the critically important data, i.e. heaps, clustered indexes, unique indexes, lookup tables (tables referenced by foreign keys). In the case of failure, if we have the schema we can still restore it, and then proceed to restore the most valuable data from other sources. A database that provided the minimum acceptable service level (such as being able to take orders) could be created surprisingly quickly.
Can we make effective use of BaaS (Backups as a Service)?
The challenge is that we have gigabytes of data to backup and severe limitations of network and internet speeds. How can we make use of this technology in an intelligent way so as to minimize the backup times and ensure reasonable costs?
We cannot count on constantly backing up our entire databases and uploading the files to the cloud. Even if we were running full backups once a week and differential backups once per day, depending on the activity on the database, we could still have way too much data flowing as a backup to the cloud. /p>
IIt is not just the time it takes to offload the backups to the cloud every time. We also need local resources, particularly local storage, for both backups and restores.
Identifying the most vital data
To start with, we should estimate how much of our data we really need to backup; as I mentioned earlier, each business has its own pattern of collecting, storing, querying and distributing data, so the answer is bound to be different for every organization. Each business will consider some part of their data to be more valuable than other parts and thus having higher priority to be restored than the rest of the data.
Furthermore, as mentioned above, some businesses may discover that they can restore a service much more quickly if they pre-aggregate data, back it up as such and then restore the pre-aggregated data first, and much faster, in the case of disaster; The original/granular data can be restored in due course since its absence will not impact the important parts of the service.
Even though this topic is specific to each company and its data, I am confident that there are ways to track data patterns throughout the enterprise to determine data value and priority. (Nowadays collecting performance and usage of data is almost trivial, for example there are plenty of tools which can give comprehensive index usage, operational and physical statistics.) /p>
AAfter figuring out the value to the business of each category of data, it is necessary to find a strategy for doing backups in a smart way, without extreme overhead (and by ‘overhead’ I am referring to price, time, reliability and so on).
Different data categories require different backup strategies
Running the regular ‘full – incremental – log backups’ restore process will probably not be efficient in a big enterprise for several reasons: the price of reliable local storage, the network costs and the inefficient use of resources by backing up the same data over and over again.
Instead, by analyzing the data access patterns we can determine what data to backup and restore first, as well as how to approach it. (For example, if we use actively only ten percent of our data for daily business operations, we definitely do not need to backup the remaining ninety percent every day.)
Making economical use of the internet bandwidth
By using cloud backups in a smart way, we can also save money and effort. We can stream only the deltas of our backups to the cloud. We might do this by using a temporary inexpensive local drive to store the backup and then to stream only the difference to the cloud.
This way we do not need to overload the network by repeatedly sending gigabytes of the same data over and over again, but instead we copy only the deltas on a backup file level; this process is also known as deduplication. (There are several different ways to achieve data deduplication, however the technical details are not in the scope of this article. The important thing to remember is to only transit differences or ‘deltas’ over the network.)
However, a big problem posed by the cloud is latency of the data transfer. This is not only because of the time it takes to upload the files, but mainly because of the time it takes to download the files when needed, as well as achieving acceptable restore speeds. DBAs worry about doing local backups or restores over a local network that is an order of magnitude faster.
Hence, a further improvement of the eduplicated uploads to the cloud service would be to keep a local copy of our backups on local disks. Should this be available when we need to restore data, we could run a check which would merge the data available from the local disk and the cloud to restore the data from a combination of both, which would be much faster.
Conventional compression can work together with deduplication. If the deltas are transferred in compressed form, then the process will take around a tenth of the time. Depending on the degree of normalization of the data, we can get much higher compression ratios for our backups.
As mentioned above, if we have a way to backup only part of the database objects and if we manage to transpose the data to a column major order, then we can get a much higher compression ratio.
TThe tradeoff for all of this is some. Trading internet bandwidth for CPU is an easy decision to make nowadays.
Thinking outside the box for offsite backups
Another option for disaster recovery is to keep a hot standby in the cloud that is kept in sync with log shipping. This could provide an emergency service from the cloud if there was a disaster to the on-site database, and would allow the offsite backup to be merely a series of compressed transaction log backups. The drawbacks would be the extra cost of the server over cheap backup storage, and also the fact that one could not prioritize the various categories of data unless they were kept in separate databases.
As the quantity of data increases, backup strategies will need to be reconsidered and refined within the IT industry. The conventional ways of performing backups to local discs are not always optimal anymore, and to continue to use a monolithic backup strategy regardless of the value or uniqueness of the data will prove to be more and more expensive from now in terms of time, reliability, flexibility and money.
In conclusion, can we turn efficiently to BaaS and can we make it fly?
Considering all mentioned above, I don’t see why not. /p>