Alexander Karmanov

Database Deployment Challenges

30 January 2013

Database delivery patterns & practices

STAGE 1 VERSION CONTROL

There are a number of challenges that make the database deployment task difficult. Alex reviews the common techniques for deploying new databases and upgrading existing ones, and their flaws, and argues the advantages of an automated, incremental, script-based approach to deployments.

Many development teams give insufficient thought to how they plan to deploy a database in an IT project, both in terms of the initial deployment, into various environments, and the subsequent and inevitable "fixes" to the database's structure and configuration during upgrades. The problem stems from the fact that, often, the database is not subject to the same rigorous versioning and testing procedures as the application code. With no version control, developers attempt to create a database deployment script from the moving target of a live development database. It is a very fragile, generally non-repeatable process. It is also very difficult to work out 'what changed' between deployments, or to roll back to a previous working build in the face of problems.

When database version control exists, the situation is better but the build and deploy process still presents a significant obstacle. It assumes that, for each deployment, the DBA collects all the changes, and applies the scripts in the correct order to create (or recreate) the database, while maintaining a history of these changes. Too often, this is a frustratingly manual process, often involving a desperate search of a developer's machine for some changes that weren't checked in, and frustration when a deployment that works for one environment fails in another, due to differences in configuration or permissions.

These, and other difficulties, mean that teams often prefer to delay database deployment until as late as possible in the development process, which is a mistake. In fact, they must plan it carefully, script the entire process, and automate it as far as possible, with all necessary component parts, including configuration details, stored in a version control system.

Over the course of this short series of articles, I describe the important factors to consider when dealing with the code for a database deployment, and my solution to the problems. It comprises a set of scripts glued together with environment-aware configuration files and, as a whole, offers a database development and maintenance framework that supports an evolutionary (incremental) database development approach, and so fits well into an "agile" development methodology.

In this first article, I'll review some common techniques for deploying new databases and upgrading existing ones, and their flaws, and the advantages of an automated, incremental, script-based approach to deployments.

New Database Deployments

In the database lifecycle, there are two major types of deployment tasks: the deployment of a new database and an upgrade of an existing one (plus other tasks, such as data migration or data export).

The database comes alive when we deploy it for the first time into the target environment and so let's start by reviewing a few common approaches to this process.

Direct Database Creation

Sometimes developers deploy a new database directly into a "production" environment, creating all the objects manually, through SSMS. This is appropriate only for workgroup-based systems shared between a few users.

Development Database Copy (Backup/Restore)

Developers create a copy of the database with all the objects and initial data. The deployment is merely a copy operation between development (staging) and production environments.

There are several reasons why this approach is not applicable generally, perhaps the most significant being the underlying assumption that it's acceptable to copy around production data, possibly containing sensitive information, to various, possibly insecure development environments.

Abstract Database Definition

In projects concerned with portability, the database is supposed to be vendor-independent and so it becomes important to avoid all vendor-specific functionality in the database. We standardize the database structure as far as possible, describe it in a generic form, and apply it to specific database servers via a built-in application process.

While this is a viable approach, I will not consider it further in this article, because the complexity inherent in such an approach in turn requires a database deployment solution more complex than one based on simple scripting.

Scripting Database Creation

In some cases, the development team may decide to promote the database between environments by generating a database deployment script from the development copy. They can use SMO Scripting, SSMS, or a tool like Red Gate's SQL Compare to script the development database into a single script. They can also script the data (i.e. the system and reference data), or load it from an external data source (Excel or other files).
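
For the reference data in particular, one convenient option is to script it in a re-runnable form, so that the same script can seed a new database or top up an existing one. The following is only a sketch, and the dbo.OrderStatus lookup table and its values are hypothetical:

-- Re-runnable reference data load for a hypothetical dbo.OrderStatus lookup table.
MERGE dbo.OrderStatus AS target
USING (VALUES
    (1, 'New'),
    (2, 'Processing'),
    (3, 'Shipped')
) AS source (StatusId, StatusName)
ON target.StatusId = source.StatusId
WHEN MATCHED AND target.StatusName <> source.StatusName THEN
    UPDATE SET StatusName = source.StatusName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StatusId, StatusName) VALUES (source.StatusId, source.StatusName);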

Single Script

The benefit is that a single script is easy to apply to the target environment and it is easy to put it in source control. The problem is that all objects are contained in the same, often rather big file. Collaborative modification of such a file is usually complicated.

At the time of deployment, complications can arise if it proves to be necessary to modify or supplement the database script to accommodate the differences between the development and target environments. For example, we may need to modify the security settings (object permissions) to reflect different user names, logins, and so on, or the names and properties of the database files. We can anticipate such problems by ensuring that we configure all the environments identically. It does not guarantee problem-free deployment, but at least gives some process repeatability.
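
One way to absorb such environment differences is to parameterize the single script with SQLCMD variables, supplying the environment-specific values at deployment time. The sketch below assumes hypothetical variable names and file paths:

-- CreateDatabase.sql: run in SQLCMD mode, for example:
--   sqlcmd -S MyServer -i CreateDatabase.sql -v DataPath="D:\SQLData" LogPath="E:\SQLLogs"
-- $(DataPath) and $(LogPath) are resolved per environment before execution.
CREATE DATABASE TrackingDB
ON PRIMARY
    (NAME = TrackingDB_Data, FILENAME = '$(DataPath)\TrackingDB.mdf', SIZE = 512MB, FILEGROWTH = 256MB)
LOG ON
    (NAME = TrackingDB_Log, FILENAME = '$(LogPath)\TrackingDB.ldf', SIZE = 256MB, FILEGROWTH = 128MB);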

Multiple Scripts

In this approach, we script each database object into a separate file, or extract each script directly from source control. Again, we can script the data or load it from an external data source.

One of the major questions is how to apply those scripts, specifically in what order. The simplest option is to create a batch file and execute them in the preferred order. A slightly more advanced option is to use a configuration-driven tool, for instance a batch file supplied with a list of files to apply. In the absence of such a tool, the only option is to document each step and ask the DBA to execute the scripts in the described order. Although this might work, it is a task crying out for automation.
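
A minimal sketch of that automation, using hypothetical file names, is a single "master" script run in SQLCMD mode, where the :r directive includes each object script in the required order:

-- Deploy.sql: run with, for example, sqlcmd -S MyServer -d TrackingDB -i Deploy.sql
-- The :r directive pulls in each object script, fixing the execution order explicitly.
:r .\Tables\dbo.Users.sql
:r .\Tables\dbo.Events.sql
:r .\Views\dbo.vw_UserEvents.sql
:r .\StoredProcedures\dbo.GetUserEvents.sql
PRINT 'All object scripts applied.';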

Again, any deployment of a database from multiple scripts, automated or otherwise, will need to deal with issues of differing target environment configurations, such as file layout, size and growth options, and security options.

Database Upgrades

Once a database is in a production environment, we cannot, usually, simply drop it and recreate it using the new version of the code. From that moment, the database support team must provide a reliable and repeatable way to promote any database changes.

Usually, we can manage the required changes to a database schema, from one version to another, without any 'explicit migration', that is, without special scripted steps to adapt the data in order to preserve it in the new schema. A table alteration is generally benign, and the addition of a column is not going to result in data loss; however, deletion of a column will. A synchronization tool will usually infer how to preserve data, and thereby avoid the need for a manual migration script, by performing an implicit migration of data into the updated schema.

Where there is an obvious single solution to preserving data, a synchronization tool can do any necessary implicit migration of data as part of the synchronization process. There are, however, certain schema changes that leave too many questions open and so require an explicit migration script. To work reliably in a way that preserves data, your database comparison tool needs to be able to run hand-cut scripts instead of the auto-generated ones, so that it can progress, and roll back, transitions from one version of a database to another when a refactoring has made automatic synchronization impossible.
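
As an illustration of an explicit migration, consider a hypothetical refactoring in which a phone number column moves out of dbo.Customers into a table of its own; no comparison tool can infer this data movement, so a hand-cut script has to perform it:

-- Hypothetical explicit migration: dbo.Customers.PhoneNumber moves to dbo.CustomerPhones.
BEGIN TRANSACTION;

CREATE TABLE dbo.CustomerPhones
(
    CustomerId  int         NOT NULL,
    PhoneNumber varchar(20) NOT NULL,
    CONSTRAINT PK_CustomerPhones PRIMARY KEY (CustomerId, PhoneNumber)
);

-- Preserve the existing data before the column is dropped.
INSERT INTO dbo.CustomerPhones (CustomerId, PhoneNumber)
SELECT CustomerId, PhoneNumber
FROM dbo.Customers
WHERE PhoneNumber IS NOT NULL;

ALTER TABLE dbo.Customers DROP COLUMN PhoneNumber;

COMMIT TRANSACTION;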

Compare and Synchronize Schemas and Data

We can upgrade a database by comparing the schema of the production database with that of the staging, or development, database and producing a script to identify differences in database structure and data. We can then execute TSQL scripts against multiple servers to synchronize them. There are very good Schema Comparison and Data Comparison tools available from Red Gate and other vendors, but generating a synchronization script is only the first stage, or at least it should be just the first stage; in practice, developers often execute the change script generated by the tool immediately.

It scares me to think that a tool alone is making the decisions about which column to add or to drop during a production deployment, without a person to analyze and approve such changes. Someone with intimate knowledge of the deployed database must analyze the result of the synchronization process, comparing the database structure before and after, in order to verify that the resulting state is acceptable and all the differences are either expected or tolerable. Such differences might arise in user permissions, in the names of objects such as indexes or constraints added because the database participates in replication, or in auto-named objects concerned with declarative referential integrity (DRI), where the name varies from build to build. The resulting modifications to the deployment scripts may be either time-consuming when applied to a production-size database, or simply disregarded as unimportant. Moreover, discovering the differences at the very last moment makes each deployment unique and unrepeatable; we cannot be confident of a consistent result.

Perhaps the biggest problem with this approach is that the scripts produced by such tools are applicable to a specific combination of source and target database. I've seen cases where someone directly modified the target database after generation of the synchronization script, rendering it invalid for the new deployment.

Additional considerations are as follows:

  • Installation and security – either we must install the synchronization tool on the target machine, or the DBA's machine must have access to the target database server. In some cases, the production environment (along with some other high-level ones such as UAT or Performance Test) may be locked-down environments meaning it's not possible to generate scripts by comparing to the live database.
  • Dealing with radical DDL changes – for example, if there is a table-split, and the existing production data differs from the staging data, there must be extra scripting to ensure that the production data is migrated to the new data model.
  • Promoting changes to or from the wrong database – I've worked in teams where all developers and testers created their environments from copies of the same Virtual PC. In such cases, it can be surprisingly easy to connect to the wrong database by mistake.

"Apply These Scripts…"

The second option is a variation of the first, but we perform the DDL comparison in advance, and generate scripts for review, rather than synchronizing through a schema comparison tool. In other words, developers prepare a set of database modification scripts and pass them to the DBA/deployment team, along with instructions. Sometimes the DBA team will track each applied change in a change management system (CMS), though in most cases, its usage is limited to process logging/tracking. Some teams may use a source control system to preserve the scripts, along with additional information (such as deployment instructions).

With well-established change management procedures, this approach delivers stable, repeatable results but still requires DBA resources. However, in my experience, this process is often fairly ad-hoc, and disorganized, with deployment instructions dispersed over an email thread or IM, or, in the worst case, extracted from the developer's memory over a phone call. This can severely limit the repeatability of the deployment process.

Change Log-based Automated Upgrade

In this approach, we use a Change Log to track all changes made to the database structure and its static data. In essence, the change log organizes change scripts into a structure driving the deployment. It looks something like this:

<Builds>
	<Build Number="0.0.0.0" />
	<Build Number="0.0.0.1" />
	...
	<Build Number="0.1.0.3">
		<Step
			Number="1"
			Name="dbo.UserActivity table added to TrackingDB"
			Type="sql" DBName="TrackingDB"
			FileName="\0.1.0.3\dbo.UserActivity table added.sql"
		/>
		...
		<Step
			Number="10"
			Name="dbo.Events table indexes added"
			Type="sql" DBName="TrackingDB"
			FileName="\0.1.0.3\dbo.Events table indexes added.sql"
		/>
		...
	</Build>
	<Build Number="0.1.4.0" />
</Builds>

This change log defines which DDL scripts to apply, based on the required database version (or label of some sort), in order to update the target database from the current build to the new one. The deployment utility must also redeploy all stored procedures, and redefine permissions as described in associated configuration files. Upon success, it stamps the database with the new build number.
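
The version stamp itself can be as simple as a small table in the target database. The sketch below assumes a hypothetical dbo.BuildHistory table rather than any particular tool:

-- Created once, as part of the initial deployment.
CREATE TABLE dbo.BuildHistory
(
    BuildNumber varchar(20) NOT NULL PRIMARY KEY,
    AppliedOn   datetime    NOT NULL DEFAULT (GETDATE())
);

-- Recorded by the upgrade utility after all steps for a build succeed.
INSERT INTO dbo.BuildHistory (BuildNumber) VALUES ('0.1.0.3');

-- The current version of the database is simply the latest stamped build.
SELECT TOP (1) BuildNumber
FROM dbo.BuildHistory
ORDER BY AppliedOn DESC;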

This, simply put, is a "Press This Big Button" approach. It allows the deployment team to apply all the changes without any knowledge of the database internals, or even of the database itself. The deployment automation utility simply reports success or failure of the execution, so the administrator has just enough information. We can log additional information to a file for later analysis, as needed.

Each individual developer produces each change script in his or her preferred manner:

  • Write the script manually
  • Modify a local database copy in SSMS and then produce a difference script with SQL Compare or SSMS scripting facilities
  • Modify the source ERwin model and generate the new version of the database, then derive the database modification script

The important part is that each script then goes into the Source Control System (SCS), along with an associated entry in the Change Log, so that the SCS contains a series of database modification scripts linked together by the change log file. Whoever applies these changes simply needs to download the latest version of the database code and run the database upgrade tool.

These scripts form part of a database upgrade package, containing:

  • All the scripts required to upgrade a database from any of the previous versions
  • A list of changes (a.k.a. the Change Log), in the form of a configuration file
  • A deployment utility to upgrade the database
  • Additional configuration files to define security configuration, database and server settings, and so on, independently of the code and schema change scripts.

All the files included in the package must be stored in a source control system, with all the benefits that brings. That system becomes the one and only source of the database changes for everybody; nobody should make changes to the target database in any other way. These, in essence, are the mechanisms that underpin my automated deployment framework, which I'll go on to describe in subsequent articles.

Scripting Databases vs. Shared Database Development

In my experience, the two main database development models in common use are:

  • Shared live database, no source control – in its most common form, all developers work on a central copy of the database
  • Dedicated development databases with Source Control – each developer works on his or her own copy of the database generating a script for each change, which is checked into Source Control

A Critique of the "Shared Live" Model

It is common to find small-scale development projects in which only one copy of the database exists. We create a copy of the database, typically by restoring a backup of the production database. All developers then make their changes directly to this database. At an agreed milestone, the developers generate a deployment script, using comparison tools, to migrate the final state of the database to a testing environment. Typically, the live, shared database is the central reference point for a database deployment. I see several flaws with this approach, and Troy Hunt discusses several others in his article, here: http://www.simple-talk.com/sql/sql-tools/the-unnecessary-evil-of-the-shared-development-database/.

Fragility

The live database (especially a shared one) is too fragile and vulnerable to be a reference point. It is too easy to make a change in the database. It is complicated to track the changes and even more difficult to reproduce or replicate those changes in another environment. Once we allow direct, ad-hoc database changes, we can never assume the database is in a stable and reliable state.

Yes, one can put the whole database, as a backup file, under source control. However, that only preserves a snapshot of the database. To answer a simple question such as "what has changed between these two versions?" we need to restore the database backups and use special tools (such as Red Gate's SQL Compare and SQL Data Compare).

Proper Rollback is impossible

It must be possible for the deployment application, or script, to query the current version of a database so that it can apply the correct tested script to roll forward or roll back the database, whilst preserving the data within it.
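
As an illustration, and reusing the hypothetical dbo.BuildHistory stamp from the earlier sketch, an upgrade or rollback script can refuse to run against a database that is not at the version it was tested against:

-- Abort the batch if the database is not at the expected source build.
DECLARE @current varchar(20);

SELECT TOP (1) @current = BuildNumber
FROM dbo.BuildHistory
ORDER BY AppliedOn DESC;

IF @current IS NULL OR @current <> '0.1.0.3'
BEGIN
    RAISERROR('Expected build 0.1.0.3, found %s; aborting.', 16, 1, @current);
    RETURN;
END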

When generating change scripts from a live database, rolling back a database change is never a pure rollback but rather a compensation for the previously made erroneous steps.

Non-repeatable Deployments

If we generate the migration script by comparing one live database to another, we have a script that is useful in exactly one database deployment. What if, as often happens, the DBA must propagate the latest changes rapidly to many servers, whilst keeping these servers in a consistent state?

The live database lacks any information about other environments in which another version of the database may exist. What the DBA needs is a deployment technique whereby, by simply specifying the target environment, the changes can be deployed as appropriate for that environment, i.e. taking into account any differences in security settings and so on.

Advantages of an Automated, Scripting Approach

In my opinion, database scripting, when combined with appropriate automation, simplifies database development and maintenance. If we script a database into text files, we can then put them into a version control system where we can track, roll back or re-order changes. We can easily detect the differences between database builds using common source control tools. We can automate deployment of those scripts to the point where there is no need for a DBA to create the deployment script/batch file manually each time.

By making the source-controlled version the reference point for a deployment, rather than a live, shared development database, we also get the following benefits, among others:

  • Having as many copies of the database as one wants without bothering DBAs
  • Switching from a shared database development environment to isolated development boxes
  • Rather than have environment-specific details, such as file layout and characteristics or security settings (user logins in particular), buried in the code for a particular deployment, we can separate them into configuration files, making the actual deployment code much more reusable when a database exists in several environments (see the sketch after this list).
  • The database change process becomes evolutionary – a series of smaller, incremental steps – and therefore much more controllable
  • We can group parts of the database into sub-projects. Database functionality may be spread between tables, stored procedures, UDFs and other objects; seeing them together in the database is quite difficult, but having them scripted together in a folder is much more convenient
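
As a small illustration of the configuration point above, a permissions script can refer to an environment-specific login through a SQLCMD variable, with the real name held in a per-environment configuration file; the variable and login names here are hypothetical:

-- $(AppUserLogin) is supplied per environment, e.g. DEV\TrackingApp or PROD\TrackingApp.
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = '$(AppUserLogin)')
    CREATE USER [$(AppUserLogin)] FOR LOGIN [$(AppUserLogin)];

GRANT EXECUTE ON SCHEMA::dbo TO [$(AppUserLogin)];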

Summary

In this article, I described the various approaches to database deployments, and explained why I advocate an automated, evolutionary, script-based approach, using a version control system. In the next article, I'll explain in more detail the goals and advantages of an automated, script-based framework for database deployments, and its most important components.

This article is part of our database delivery patterns & practices series on Simple Talk.


Alexander Karmanov

Author profile:

Alexander lives and works in Toronto, Ontario, Canada.



Subject: A series of articles to watch for
Posted by: Philip Kelley (view profile)
Posted on: Tuesday, February 05, 2013 at 8:26 AM
Message: Looks like a good start to an interesting series of articles. I've used a series-of-scripts update methodology for years, and often compare it with accounting journal entries: treat it as a series of entries done in a specific sequence where you can never go back and undo an action (such as, pretend you never withdrew that $100) -- instead, you have to make correcting entries to "undo" prior actions.

I look forward to seeing how you handle the need to roll back Production database changes.

Subject: Automated deployment tools
Posted by: WBrewer (view profile)
Posted on: Thursday, February 07, 2013 at 11:19 AM
Message: Great article!
There is an interesting white paper that can be downloaded here from the Red Gate site ...Automated deployment using Red Gate tools (PDF)
This describes the technical details of a new tool, SQL Deployment Manager, that Red Gate is actively developing. From what I've seen of it, it should be able to get around many of these problems you mention.
William.

Subject: To Philip Kelley
Posted by: Alexander Karmanov (view profile)
Posted on: Thursday, February 07, 2013 at 12:34 PM
Message: Yes, that's a very good analogy.

>> I look forward to seeing how you handle the need to roll back Production database changes.

Ha, that's simple - rollback the database(s) to the point prior to the deployment, if you can.
If not - quickly hack up a correction and deploy it. Your accounting journal analogy suits here perfectly.

Originally I did have rollback entries in the change log, so for each step forward I could have a step back, and it did work.
Luckily there was a very wise operations chief who pointed out that nobody would guarantee that the system is working after such a rollback. Nobody would do the whole system regression test for the system after rolling the update back, which may not happen... So I removed that option, wishing I had consulted that man before.

Subject: To WBrewer
Posted by: Alexander Karmanov (view profile)
Posted on: Thursday, February 07, 2013 at 12:35 PM
Message: Thanks, I will read it.
Mind it didn't exist six years ago when I created mine...

Subject: Keeping a Ledger of script deployments
Posted by: Phil Factor (view profile)
Posted on: Friday, February 08, 2013 at 3:02 AM
Message: @Alex @Philip
Philip has a good point. The versioning history has to be like a ledger. A deployment cannot be wiped from the record if there has been any refactoring, and any data has changed between deployment and rollback.
I've made myself unpopular in the past by insisting on tested rollback scripts. If it is a simple deployment without messy refactoring (I don't get many of these) then the database reverts to a known version that will pass that version's tests. However, if you rollback a live system, there is normally new data that has no possible place in the previous version so a pure 'rollback' would lose that data. You usually cope with added tables to hold that data, but the version isn't strictly the old version since the metadata has changed. You can't really manufacture a new version, so this fact has to be recorded in the 'ledger' so that the next deployment can know that it must deal satisfactorily with the orphan data.

Subject: Using SSDT - modern tools are a huge improvement
Posted by: paschott (view profile)
Posted on: Thursday, February 21, 2013 at 1:50 PM
Message: We started using SSDT for our releases. It (and Red-Gate's Source Control) has the advantage that you tell the DB what you want it to look like and it handles the appropriate changes to make it look like that. If we want, we can script those changes as well so we can review them before making the changes. I went through the "save a bunch of scripts" methodology and it worked well until we then had to cherry-pick changes out of the release folder because not everything could go at that time. Weeks of poring through scripts, test runs, tests, rollbacks, repeats - I don't want to go through that again.

It's a tricky thing to handle, but I'll admit that I like the tools available now as opposed to saving lots of scripts. Besides, I can generate a release package and hand that off to someone else to push the changes if necessary - no worries about something else getting in or being left out. The same package pushed to our pre-prod servers is pushed to production so all is the same once released.

Subject: SSDT
Posted by: Alexander Karmanov (view profile)
Posted on: Friday, February 22, 2013 at 10:50 AM
Message: 2paschott:
>>tell the DB what you want it to look like and it handles the appropriate changes to make it look like that. If we want, we can script those changes as well so we can review them before making the changes.

That's exactly what I loathe to do. At the moment of the database update, the last thing I want is to think about why the hell there is such a difference and who did it. I can think about it during the development and initial test deployments, but not afterwards. Unfortunately no tool provides that level of assurance yet. Our development cycle could stretch over a few months, so for me it is simply impossible to remember all the modifications and why they were done.

>>It's a tricky thing to handle, but I'll admit that I like the tools available now as opposed to saving lots of scripts.

Once you grasp the idea - the rest is simply implementation... with any tool or process. Provided there is an idea behind it.

>>Besides, I can generate a release package and hand that off to someone else to push the changes if necessary - no worries about something else getting in or being left out. The same package pushed to our pre-prod servers is pushed to production so all is the same once released.

And to do it for each release? No thanks. With my approach the TFS trunk contains the current stable release, so anyone can take it and update whichever environment. No need even to bother me.

 
