Many development teams give insufficient thought to how they plan to deploy a database in an IT project, both in terms of the initial deployment, into various environments, and the subsequent and inevitable “fixes” to the database’s structure and configuration during upgrades. The problem stems from the fact that, often, the database is not subject to the same rigorous versioning and testing procedures as the application code. With no version control, developers attempt to create a database deployment script from the moving target of a live development database. It is a very fragile, generally non-repeatable process. It is also very difficult to work out ‘what changed’ between deployments, or to rollback a previous working build in the face of problems.
When database version control exists, the situation is better but the build and deploy process still presents a significant obstacle. It assumes that, for each deployment, the DBA collects all the changes, and applies the scripts in the correct order to create (or recreate) the database, while maintaining a history of these changes. Too often, this is a frustratingly manual process, often involving a desperate search of a developer’s machine for some changes that weren’t checked in, and frustration when a deployment that works for one environment fails in another, due to differences in configuration or permissions.
These, and other difficulties, mean that teams often prefer to delay database deployment until as late as possible in the development process, which is a mistake. In fact, they must plan it carefully, script the entire process, and automate it as far as possible, with all necessary component parts, including configuration details, stored in a version control system.
Over the course of this short series of articles, I describe the important factors to consider when dealing with the code for a database deployment, and my solution to the problems. It comprises a set of scripts glued together with environment-aware configuration files and, as a whole, offers a database development and maintenance framework that supports an evolutionary (incremental) database development approach, and so fits well into an “agile” development methodology.
In this first article, I’ll review some common techniques for deploying new databases and upgrading existing ones, and their flaws, and the advantages of an automated, incremental, script-based approach to deployments.
New Database Deployments
In the database lifecycle, there are two major types of deployment tasks: the deployment of a new database and an upgrade of the existing one (plus other tasks, such as the data migration, or data export).
The database comes alive when we deploy it for the first time into the target environment and so let’s start by reviewing a few common approaches to this process.
Direct Database Creation
Sometimes developers deploy a new database directly into a “production” environment, creating all the objects manually, through SSMS. This is appropriate only for workgroup-based systems shared between few users.
Development Database Copy (Backup/Restore)
Developers create a copy of the database with all the objects and initial data. The deployment is merely a copy operation between development (staging) and production environments.
There are several reasons why this approach is not applicable generally, perhaps the most significant being the underlying assumption that it’s acceptable to copy around production data, possibly containing sensitive information, to various, possibly insecure development environments.
Abstract Database Definition
In projects concerned with portability, the database is supposed to be vendor-independent and so it becomes important to avoid all vendor-specific functionality in the database. We standardize the database structure, as far as possible, describe in a generic form, and apply it to specific database servers, via a built-in application process.
While this is a viable approach, I will not consider it further in this article, because the complexity inherent in such an approach in turn requires a database deployment solution more complex that one based on simple scripting.
Scripting Database Creation
In some cases, the development team may decide to promote the database between environments by generating a database deployment script from the development copy. They can use SMO Scripting, SSMS, or a tool like Red Gate’s SQL Compare to script the development database into a single script. They can also script the data (i.e. the system and reference data), or load it from an external data source (Excel or other files).
The benefit is that a single script is easy to apply to the target environment and it is easy to put it in source control. The problem is that all objects are contained in the same, often rather big file. Collaborative modification of such a file is usually complicated.
At the time of deployment, complications can arise if it proves to be necessary to modify or supplement the database script to accommodate the differences between the development and target environments. For example, we may need to modify the security settings (object permissions) to reflect different user names, logins, and so on, or the names and properties of the database files. We can anticipate such problems by ensuring that we configure all the environments identically. It does not guarantee problem-free deployment, but at least gives some process repeatability.
In this approach, we script each database object into a separate file, or extract each script directly from source control. Again, we can script the data or load it from an external data source.
One of the major questions is how to apply those scripts, specifically in what order. The simplest option is to create a batch file and execute them in the preferred order. A slightly more advanced option is to use a configuration-driven tool, for instance a batch file supplied with a list of files to apply. In absence of such a tool, the only option is to document each step and ask the DBA to execute the scripts in the described order. Although this might work, it is a task crying out for automation.
Again, any deployment of a database from multiple scripts, automated or otherwise, will need to deal with issues of differing target environment configurations, such as file layout, size and growth options, and security options.
Once a database is in a production environment, we cannot, usually, simply drop it and recreate it using the new version of the code. From that moment, the database support team must provide a reliable and repeatable way to promote any database changes.
Usually, we can manage the required changes to a database schema, from one version to another, without any ‘explicit migration’, or special scripted steps, being required for adapting the data, in order to preserve it in the new schema. A table alteration is generally benign, and the addition of a column is not going to result in data loss. However, deletion of a column will. A synchronization tool will usually infer how to preserve data and thereby avoid the need for a manual script to perform the migration of the data. It will perform an implicit migration of data into the updated schema
Where there is an obvious single solution to preserving data, a synchronization tool can do any necessary implicit migration of data as part of the synchronization process. There are, however, certain schema changes that leave too many questions and so require an explicit migration script. To work reliably in a way that preserves data, your database comparison tool needs to be able to run hand-cut scripts instead of the auto-generated ones, to be able to progress, and roll-back transitions from one version of a database to another when a refactoring process has made automatic synchronization impossible.
Compare and Synchronize Schemas and Data
We can upgrade a database by comparing the schema of the production database with that of the staging, or development, database and producing a script to identify differences in database structure and data. We can then execute TSQL scripts against multiple servers to synchronize them. There are very good Schema Comparison and Data Comparison tools available from Red Gate and other vendors, but generating a synchronization script is only the first stage, or at least it should be just the first stage; in practice, developers often execute the change script generated by the tool immediately.
It scares me to think that a tool alone is making the decisions about what column to add or to drop during the production deployment, without a person to analyze and approve such changes. Someone with intimate knowledge of the deployed database must analyze the result of the synchronization process, comparing the database structure before and after, in order to verify that the resulting status is acceptable and all the differences are either expected or tolerable. Such differences might arise in user permissions, or in the names of objects, such as indexes or constraints, inserted due to it being a participant in a replication process, or auto-named objects concerned with declarative referential integrity (DRI), where the name varies from build to build. The resulting modifications to the deployments scripts may be either time-consuming when applied to the production size database, or simply disregarded as not important. Moreover, discovering the differences at the very last moment makes each deployment unique and unrepeatable; we cannot be confident of a consistent result.
Perhaps the biggest problem with this approach is that the scripts produced by such tools are applicable to a specific combination of source and target database. I’ve seen cases where someone directly modified the target database after generation of the synchronization script, rendering it invalid for the new deployment.
Additional considerations are as follows:
- Installation and security – either we must install the synchronization tool on the target machine, or the DBA’s machine must have access to the target database server. In some cases, the production environment (along with some other high-level ones such as UAT or Performance Test) may be locked-down environments meaning it’s not possible to generate scripts by comparing to the live database.
- Dealing with radical DDL changes – for example, if there is a table-split, and the existing production data differs from the staging data, there must be extra scripting to ensure that the production data is migrated to the new data model
- Promoting changes to or from the wrong database – I’ve worked in teams where all developers and testers created their environments from copies of the same Virtual PC. In such cases, it can be surprisingly easy to connect to the wrong database by mistake.
“Apply These Scripts…”
The second option is a variation of the first, but we perform the DDL comparison in advance, and generate scripts for review, rather than synchronizing through a schema comparison tool. In other words, developers prepare a set of database modification scripts and pass them to the DBA/deployment team, along with instructions. Sometimes the DBA team will track each applied change in a change management system (CMS), though in most cases, its usage is limited to process logging/tracking. Some teams may use a source control system to preserve the scripts, along with additional information (such as deployment instructions).
With well-established change management procedures, this approach delivers stable, repeatable results but still requires DBA resources. However, in my experience, this process is often fairly ad-hoc, and disorganized, with deployment instructions dispersed over an email thread or IM, or, in the worst case, extracted from the developer’s memory over a phone call. This can severely limit the repeatability of the deployment process.
Change Log-based Automated Upgrade
In this approach, we use a Change Log to track all changes made to the database structure and its static data. In essence, the change log organizes change scripts into a structure driving the deployment. It looks something like this:
<Build Number="0.0.0.0" />
<Build Number="0.0.0.1" />
id="dbo.UserActivity" table added to TrackingDB"
FileName="\0.1.0.3\dbo.UserActivity table added.sql"
id="dbo.Events" table indexes added"
Type="sql" DBName=" TrackingDB"
FileName="\0.1.0.3\dbo.Events table indexes added.sql"
<Build Number="0.1.4.0" />
This change log defines which DDL scripts to apply, based on the required database version (or label of some sort), in order to update the target database from the current build to the new one. It must also redeploy all stored procedures, and redefine permissions as described in associated configuration files. Upon success, it would stamp the database with the new build number.
This, simply put, is a “Press This Big Button” approach. It allows the deployment team to apply all the changes without any knowledge of the database internals and the database itself. The deployment automation utility simply reports success or failure of the execution, so the administrator has just enough information. We can log the additional information into a file for later analysis, as needed.
Each individual developer produces each change script in his or her preferred manner:
- Write the script manually
- Modify a local database copy in SSMS and then produce a difference script with SQL Compare or SSMS scripting facilities
- Modify the source ERwin model and generate the new version of the database, then derive the database modification script
The important part if that each script then goes into Source Control System (SCS), along with an associated entry in the Change Log, so that the SCS contains a series of the database modification scripts linked together with the change log file. Whoever applies these changes needs to simply download the latest version of the database code and run the database upgrade tool.
These scripts form part of a database upgrade package, containing:
- All the scripts required for upgrade a database from all of previous versions
- A list of changes (a.k.a. Change Log), in a form of a configuration file
- A deployment utility to upgrade the database
- Additional configuration files to define security configuration, database and server settings, and so on, independently of the code and schema change scripts.
All the files included in the package must be stored in a source control system, with all the benefits that brings. That system becomes the one and only source of the database changes for everybody; nobody should make changes to the target database in any other way. These, in essence, are the mechanisms that underpin my automated deployment framework, which I’ll go on to describe in subsequent articles.
Scripting Databases vs. Shared Database Development
In my experience, the two main database development models in common use are:
- Shared live database, no source control – in its most common form, all developers work on a central copy of the database
- Dedicated development databases with Source Control – each developer works on his or her own copy of the database generating a script for each change, which is checked into Source Control
A Critique of the “Shared Live” Model
It is common to find small-scale development projects in which only one copy of the database exists. We create a copy of the database, typically by restoring a backup of the production database. All developers then make their changes directly to this database. At an agreed milestone, the developers generate a deployment script, using comparison tools, to migrate the final state of the database to a testing environment. Typically, the live, shared database is the central reference point for a database deployment. I see several flaws with this approach, and Troy Hunt discusses several others in his article, here: http://www.simple-talk.com/sql/sql-tools/the-unnecessary-evil-of-the-shared-development-database/.
The live database (especially a shared one) is too fragile and vulnerable to be a reference point. It is too easy to make a change in the database. It is complicated to track the changes and even more difficult to reproduce or replicate those changes into another environment. Once we allow direct, ad-hoc database changes we can never assume it to be in a stable and reliable state.
Yes, one can put the whole database, its backup file under a source control. However, that only preserves a snapshot of the database. To answer a simple question such as “what has changed between these two versions?” we need to restore the database backup and to use a special tool (such as Red-Gate SQL Compare and Data Compare).
Proper Rollback is impossible
It must be possible for the deployment application, or script, to query the current version of a database so that it can apply the correct tested script to roll forward or roll back the database, whilst preserving the data within it.
When generating change scripts from a live database, rolling back a database change is never a pure rollback but rather a compensation for the previously made erroneous steps.
If we generate the migration script by comparing one live database to another, we have a script that is useful in exactly one database deployment. What if, as often happens, the DBA must propagate the latest changes rapidly to many servers, whilst keeping these servers in a consistent state?
The live database lacks any information about other environments in which another version of the database may exist. What the DBA needs is a deployment technique whereby by simply specifying the target environment, the changes can be deployed as appropriate for than environment i.e. taking into account any differences in security settings and so on.
Advantages of an Automated, Scripting Approach
In my opinion, database scripting, when combined with appropriate automation, simplifies database development and maintenance. If we script a database into text files, we can then put them into a version control system where we can track, roll back or re-order changes. We can detect easily the differences between the database builds using common source control tools. We can automate deployment of those scripts to a point where there is no need for a DBA to create the deployment script/batch file, manually, each time.
By making the source-controlled version the reference point for a deployment, rather than a live, shared development database, we also get the following benefits, among others:
- Having as many copies of the database as one wants without bothering DBAs
- Switching from shared database development environment to isolated development boxes
- Rather than have environment-specific details, such as file layout and characteristics or security settings (user logins in particular), buried in the code for a particular deployment, we can separate them into configuration files, making the actual deployment code much more reusable, when a database exists in several environments.
- The database change process becomes evolutionary – a series of smaller, incremental steps – and therefore much more controllable
- We can group parts of the database into sub-projects. Database functionality may be spread between tables/stored procedures/UDFs and other objects. Seeing them in the database together is quite difficult but having them scripted together in a folder is much more convenient
In this article, I described the various approaches to database deployments, and explained why I advocate an automated, evolutionary, script-based approach, using a version control system. In the next article, I’ll explain in more detail the goals and advantages of an automated, script-based framework for database deployments, and its most important components.