30 January 2013

Database Deployment Challenges

Traditionally, database deployment is not a process associated with smiles, bonhomie and tranquility. There are a number of challenges that make the task difficult. Alex reviews the common techniques for deploying new databases and upgrading existing ones, and their flaws, and argues the advantages of an automated, incremental, script-based approach to deployments.

[Banner: Database Lifecycle Management Patterns & Practices Library, Automated Deployment stage]

Many development teams give insufficient thought to how they plan to deploy a database in an IT project, both in terms of the initial deployment into various environments, and the subsequent and inevitable “fixes” to the database’s structure and configuration during upgrades. The problem stems from the fact that, often, the database is not subject to the same rigorous versioning and testing procedures as the application code. With no version control, developers attempt to create a database deployment script from the moving target of a live development database. It is a very fragile, generally non-repeatable process. It is also very difficult to work out ‘what changed’ between deployments, or to roll back to a previous working build in the face of problems.

When database version control exists, the situation is better, but the build and deploy process still presents a significant obstacle. It assumes that, for each deployment, the DBA collects all the changes and applies the scripts in the correct order to create (or recreate) the database, while maintaining a history of these changes. Too often, this is a manual process, involving a desperate search of a developer’s machine for changes that weren’t checked in, and frustration when a deployment that works in one environment fails in another, due to differences in configuration or permissions.

These, and other difficulties, mean that teams often prefer to delay database deployment until as late as possible in the development process, which is a mistake. Instead, they should plan it carefully, script the entire process, and automate it as far as possible, with all the necessary component parts, including configuration details, stored in a version control system.

Over the course of this short series of articles, I describe the important factors to consider when dealing with the code for a database deployment, and my solution to the problems. It comprises a set of scripts glued together with environment-aware configuration files and, as a whole, offers a database development and maintenance framework that supports an evolutionary (incremental) database development approach, and so fits well into an “agile” development methodology.

In this first article, I’ll review some common techniques for deploying new databases and upgrading existing ones, and their flaws, and the advantages of an automated, incremental, script-based approach to deployments.

New Database Deployments

In the database lifecycle, there are two major types of deployment task: the deployment of a new database and the upgrade of an existing one (plus other tasks, such as data migration or data export).

The database comes alive when we deploy it for the first time into the target environment, so let’s start by reviewing a few common approaches to this process.

Direct Database Creation

Sometimes developers deploy a new database directly into a “production” environment, creating all the objects manually, through SSMS. This is appropriate only for workgroup-based systems shared between a few users.

Development Database Copy (Backup/Restore)

Developers create a copy of the database with all the objects and initial data. The deployment is merely a copy operation between development (staging) and production environments.

There are several reasons why this approach is not applicable generally, perhaps the most significant being the underlying assumption that it’s acceptable to copy around production data, possibly containing sensitive information, to various, possibly insecure development environments.

Abstract Database Definition

In projects concerned with portability, the database is supposed to be vendor-independent, so it becomes important to avoid all vendor-specific functionality in the database. We standardize the database structure as far as possible, describe it in a generic form, and apply it to specific database servers via a built-in application process.

While this is a viable approach, I will not consider it further in this article, because the complexity inherent in such an approach in turn requires a database deployment solution more complex than one based on simple scripting.

Scripting Database Creation

In some cases, the development team may decide to promote the database between environments by generating a database deployment script from the development copy. They can use SMO Scripting, SSMS, or a tool like Red Gate’s SQL Compare to script the development database into a single script. They can also script the data (i.e. the system and reference data), or load it from an external data source (Excel or other files).

Single Script

The benefit is that a single script is easy to apply to the target environment and it is easy to put it in source control. The problem is that all objects are contained in the same, often rather big file. Collaborative modification of such a file is usually complicated.

At the time of deployment, complications can arise if it proves necessary to modify or supplement the database script to accommodate the differences between the development and target environments. For example, we may need to modify the security settings (object permissions) to reflect different user names, logins, and so on, or the names and properties of the database files. We can anticipate such problems by ensuring that we configure all the environments identically. This does not guarantee problem-free deployment, but at least gives the process some repeatability.
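For illustration, the fragment below shows the kind of environment-specific statements that typically need hand-editing in a single monolithic creation script; the database name, file paths and login are hypothetical, not taken from any particular project.

    -- Hypothetical environment-specific fragments of a single deployment script.
    -- File locations usually differ between development and production servers:
    CREATE DATABASE SalesDB
    ON PRIMARY
    (   NAME = SalesDB_Data,
        FILENAME = 'D:\SQLData\SalesDB.mdf',      -- e.g. 'C:\Dev\Data\SalesDB.mdf' on a dev box
        SIZE = 500MB, FILEGROWTH = 100MB )
    LOG ON
    (   NAME = SalesDB_Log,
        FILENAME = 'E:\SQLLogs\SalesDB_log.ldf',
        SIZE = 100MB, FILEGROWTH = 50MB );
    GO

    -- Logins and permissions also differ per environment:
    USE SalesDB;
    CREATE USER SalesAppUser FOR LOGIN [PROD\SalesService];   -- [DEV\SalesService] elsewhere
    GRANT EXECUTE ON SCHEMA::dbo TO SalesAppUser;
    GO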

Multiple Scripts

In this approach, we script each database object into a separate file, or extract each script directly from source control. Again, we can script the data or load it from an external data source.

One of the major questions is how to apply those scripts, and specifically in what order. The simplest option is to create a batch file and execute them in the preferred order. A slightly more advanced option is to use a configuration-driven tool, for instance a batch file supplied with a list of files to apply. In the absence of such a tool, the only option is to document each step and ask the DBA to execute the scripts in the described order. Although this might work, it is a task crying out for automation.
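As a minimal sketch of what that automation can look like, a single SQLCMD-mode “driver” script can impose the execution order; the variable and file names below are hypothetical.

    -- Hypothetical SQLCMD-mode driver script. Run it with
    --     sqlcmd -S <server> -d <database> -i deploy_all.sql
    -- or in SSMS with SQLCMD mode enabled; it applies the object scripts in a fixed order.
    :setvar ScriptDir "C:\Build\SalesDB\Scripts"
    :on error exit
    :r $(ScriptDir)\001_schemas.sql
    :r $(ScriptDir)\002_tables.sql
    :r $(ScriptDir)\003_foreign_keys.sql
    :r $(ScriptDir)\004_views.sql
    :r $(ScriptDir)\005_stored_procedures.sql
    :r $(ScriptDir)\006_reference_data.sql
    PRINT 'SalesDB build completed.';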

Again, any deployment of a database from multiple scripts, automated or otherwise, will need to deal with issues of differing target environment configurations, such as file layout, size and growth options, and security options.

Database Upgrades

Once a database is in a production environment, we cannot, usually, simply drop it and recreate it using the new version of the code. From that moment, the database support team must provide a reliable and repeatable way to promote any database changes.

Usually, we can manage the required changes to a database schema, from one version to another, without any ‘explicit migration’, that is, without special scripted steps to adapt the data so that it is preserved in the new schema. A table alteration is generally benign, and the addition of a column is not going to result in data loss; the deletion of a column, however, will. A synchronization tool will usually infer how to preserve the data, and thereby avoid the need for a manual migration script, by performing an implicit migration of the data into the updated schema.

Where there is an obvious single solution to preserving data, a synchronization tool can do any necessary implicit migration of data as part of the synchronization process. There are, however, certain schema changes that leave too many open questions and so require an explicit migration script. To work reliably in a way that preserves data, your database comparison tool needs to be able to run hand-cut scripts instead of the auto-generated ones, and to be able to progress, and roll back, transitions from one version of a database to another when a refactoring has made automatic synchronization impossible.
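As an example of a change that requires an explicit migration, consider moving a column into a new child table (a simple table split). The table and column names below are invented for illustration; the point is that a comparison tool cannot know, on its own, that the existing values must be copied before the old column is dropped.

    -- Hypothetical explicit migration: move Customer.PhoneNumber into a new
    -- CustomerPhone table, preserving existing data before dropping the old column.
    BEGIN TRANSACTION;

    CREATE TABLE dbo.CustomerPhone
    (
        CustomerId  int         NOT NULL
            CONSTRAINT FK_CustomerPhone_Customer REFERENCES dbo.Customer (CustomerId),
        PhoneNumber varchar(30) NOT NULL,
        CONSTRAINT PK_CustomerPhone PRIMARY KEY (CustomerId, PhoneNumber)
    );

    -- The explicit part: copy the existing values into the new structure.
    INSERT INTO dbo.CustomerPhone (CustomerId, PhoneNumber)
    SELECT CustomerId, PhoneNumber
    FROM   dbo.Customer
    WHERE  PhoneNumber IS NOT NULL;

    -- Only now is it safe to drop the old column.
    ALTER TABLE dbo.Customer DROP COLUMN PhoneNumber;

    COMMIT TRANSACTION;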

Compare and Synchronize Schemas and Data

We can upgrade a database by comparing the schema of the production database with that of the staging, or development, database and producing a script to identify differences in database structure and data. We can then execute TSQL scripts against multiple servers to synchronize them. There are very good Schema Comparison and Data Comparison tools available from Red Gate and other vendors, but generating a synchronization script is only the first stage, or at least it should be just the first stage; in practice, developers often execute the change script generated by the tool immediately.

It scares me to think that a tool alone is making the decisions about which column to add or drop during a production deployment, without a person to analyze and approve such changes. Someone with intimate knowledge of the deployed database must analyze the result of the synchronization process, comparing the database structure before and after, in order to verify that the resulting state is acceptable and that all the differences are either expected or tolerable. Such differences might arise in user permissions, in the names of objects such as indexes or constraints added because the database participates in replication, or in auto-named objects concerned with declarative referential integrity (DRI), where the name varies from build to build. The resulting modifications to the deployment scripts may be either time-consuming to apply to a production-sized database, or simply disregarded as unimportant. Moreover, discovering the differences at the very last moment makes each deployment unique and unrepeatable; we cannot be confident of a consistent result.

Perhaps the biggest problem with this approach is that the scripts produced by such tools are applicable to a specific combination of source and target database. I’ve seen cases where someone directly modified the target database after generation of the synchronization script, rendering it invalid for the new deployment.

Additional considerations are as follows:

  • Installation and security – either we must install the synchronization tool on the target machine, or the DBA’s machine must have access to the target database server. In some cases, the production environment (along with some other high-level environments, such as UAT or Performance Test) may be locked down, meaning it’s not possible to generate scripts by comparing against the live database.
  • Dealing with radical DDL changes – for example, if there is a table split, and the existing production data differs from the staging data, there must be extra scripting to ensure that the production data is migrated to the new data model.
  • Promoting changes to or from the wrong database – I’ve worked in teams where all developers and testers created their environments from copies of the same Virtual PC. In such cases, it can be surprisingly easy to connect to the wrong database by mistake.

“Apply These Scripts…”

The second option is a variation of the first, but we perform the DDL comparison in advance, and generate scripts for review, rather than synchronizing through a schema comparison tool. In other words, developers prepare a set of database modification scripts and pass them to the DBA/deployment team, along with instructions. Sometimes the DBA team will track each applied change in a change management system (CMS), though in most cases, its usage is limited to process logging/tracking. Some teams may use a source control system to preserve the scripts, along with additional information (such as deployment instructions).

With well-established change management procedures, this approach delivers stable, repeatable results but still requires DBA resources. However, in my experience, this process is often fairly ad-hoc, and disorganized, with deployment instructions dispersed over an email thread or IM, or, in the worst case, extracted from the developer’s memory over a phone call. This can severely limit the repeatability of the deployment process.

Change Log-based Automated Upgrade

In this approach, we use a Change Log to track all changes made to the database structure and its static data. In essence, the change log organizes change scripts into a structure driving the deployment. It looks something like this:
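As a rough, hypothetical illustration (the actual format used by my framework is described in later articles), the change log amounts to an ordered list of change scripts, grouped by the build they produce:

    Build 1.3 -> 1.4
        0041_add_customer_email_column.sql
        0042_split_customer_phone_table.sql
        0043_load_new_reference_data.sql
    Build 1.4 -> 1.5
        0044_create_order_audit_table.sql
        0045_alter_invoice_totals.sql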

This change log defines which DDL scripts to apply, based on the required database version (or label of some sort), in order to update the target database from the current build to the new one. It must also redeploy all stored procedures, and redefine permissions as described in associated configuration files. Upon success, it would stamp the database with the new build number.

This, simply put, is a “Press This Big Button” approach. It allows the deployment team to apply all the changes without any knowledge of the database internals and the database itself. The deployment automation utility simply reports success or failure of the execution, so the administrator has just enough information. We can log the additional information into a file for later analysis, as needed.
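The build-number stamp is what makes this repeatable. A minimal sketch of the idea in T-SQL (the table name and version values are hypothetical, not the framework’s actual implementation) might look like this:

    -- Hypothetical version stamp that the upgrade utility uses to decide which
    -- change-log entries still need to be applied.
    IF OBJECT_ID('dbo.DatabaseVersion') IS NULL
        CREATE TABLE dbo.DatabaseVersion
        (
            BuildNumber varchar(20) NOT NULL,
            AppliedOn   datetime    NOT NULL
                CONSTRAINT DF_DatabaseVersion_AppliedOn DEFAULT (GETDATE())
        );

    -- The utility reads the current build...
    SELECT TOP (1) BuildNumber
    FROM   dbo.DatabaseVersion
    ORDER BY AppliedOn DESC;

    -- ...applies the scripts listed in the change log between that build and the
    -- requested one and, only on success, stamps the new build number:
    INSERT INTO dbo.DatabaseVersion (BuildNumber) VALUES ('1.4.0');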

Each individual developer produces each change script in his or her preferred manner:

  • Write the script manually
  • Modify a local database copy in SSMS and then produce a difference script with SQL Compare or SSMS scripting facilities
  • Modify the source ERwin model and generate the new version of the database, then derive the database modification script

The important part is that each script then goes into the source control system (SCS), along with an associated entry in the Change Log, so that the SCS contains a series of database modification scripts linked together by the change log file. Whoever applies these changes simply needs to download the latest version of the database code and run the database upgrade tool.

These scripts form part of a database upgrade package, containing:

  • All the scripts required to upgrade a database from any of the previous versions
  • A list of changes (a.k.a. the Change Log), in the form of a configuration file
  • A deployment utility to upgrade the database
  • Additional configuration files to define security configuration, database and server settings, and so on, independently of the code and schema change scripts.

All the files included in the package must be stored in a source control system, with all the benefits that brings. That system becomes the one and only source of the database changes for everybody; nobody should make changes to the target database in any other way. These, in essence, are the mechanisms that underpin my automated deployment framework, which I’ll go on to describe in subsequent articles.

Scripting Databases vs. Shared Database Development

In my experience, the two main database development models in common use are:

  • Shared live database, no source control – in its most common form, all developers work on a central copy of the database
  • Dedicated development databases with Source Control – each developer works on his or her own copy of the database generating a script for each change, which is checked into Source Control

A Critique of the “Shared Live” Model

It is common to find small-scale development projects in which only one copy of the database exists. We create a copy of the database, typically by restoring a backup of the production database. All developers then make their changes directly to this database. At an agreed milestone, the developers generate a deployment script, using comparison tools, to migrate the final state of the database to a testing environment. Typically, the live, shared database is the central reference point for a database deployment. I see several flaws with this approach, and Troy Hunt discusses several others in his article, here: http://www.simple-talk.com/sql/sql-tools/the-unnecessary-evil-of-the-shared-development-database/.

Fragility

The live database (especially a shared one) is too fragile and vulnerable to be a reference point. It is too easy to make a change in the database. It is complicated to track the changes, and even more difficult to reproduce or replicate those changes in another environment. Once we allow direct, ad-hoc database changes, we can never assume the database is in a stable and reliable state.

Yes, one can put the whole database, as a backup file, under source control. However, that only preserves a snapshot of the database. To answer a simple question such as “what has changed between these two versions?” we need to restore the database backups and use a special tool (such as Red Gate SQL Compare and SQL Data Compare).

Proper Rollback is impossible

It must be possible for the deployment application, or script, to query the current version of a database so that it can apply the correct tested script to roll forward or roll back the database, whilst preserving the data within it.

When generating change scripts from a live database, rolling back a database change is never a pure rollback but rather a compensation for the previously made erroneous steps.

Non-repeatable Deployments

If we generate the migration script by comparing one live database to another, we have a script that is useful in exactly one database deployment. What if, as often happens, the DBA must propagate the latest changes rapidly to many servers, whilst keeping these servers in a consistent state?

The live database lacks any information about other environments in which another version of the database may exist. What the DBA needs is a deployment technique whereby, simply by specifying the target environment, the changes are deployed as appropriate for that environment, i.e. taking into account any differences in security settings and so on.

Advantages of an Automated, Scripting Approach

In my opinion, database scripting, when combined with appropriate automation, simplifies database development and maintenance. If we script a database into text files, we can then put them into a version control system where we can track, roll back or re-order changes. We can easily detect the differences between database builds using common source control tools. We can automate the deployment of those scripts to the point where there is no need for a DBA to create the deployment script or batch file manually each time.

By making the source-controlled version the reference point for a deployment, rather than a live, shared development database, we also get the following benefits, among others:

  • Having as many copies of the database as one wants without bothering DBAs
  • Switching from shared database development environment to isolated development boxes
  • Rather than have environment-specific details, such as file layout and characteristics or security settings (user logins in particular), buried in the code for a particular deployment, we can separate them into configuration files, making the actual deployment code much more reusable, when a database exists in several environments.
  • The database change process becomes evolutionary – a series of smaller, incremental steps – and therefore much more controllable
  • We can group parts of the database into sub-projects. Database functionality may be spread between tables, stored procedures, UDFs and other objects. Seeing them together in the database is quite difficult, but having them scripted together in a folder is much more convenient

Summary

In this article, I described the various approaches to database deployments, and explained why I advocate an automated, evolutionary, script-based approach, using a version control system. In the next article, I’ll explain in more detail the goals and advantages of an automated, script-based framework for database deployments, and its most important components.

This article is part of our Database Lifecycle Management patterns & practices.

Return to the library to find more articles on DLM, or visit red-gate.com/dlm for more information about SQL Server products for database lifecycle management.




Alexander Karmanov

Comments

  • Philip Kelley

    A series of articles to watch for
    Looks like a good start to an interesting series of articles. I’ve used a series-of-scripts update methodology for years, and often compare it with accounting journal entries: treat it as a series of entries done in a specific sequence where you can never go back and undo an action (such as, pretend you never withdrew that $100) — instead, you have to make correcting entries to "undo" prior actions.

    I look forward to seeing how you handle the need to roll back Production database changes.

  • WBrewer

    Automated deployment tools
    Great article!
    There is an interesting white paper, Automated deployment using Red Gate tools (PDF), which can be downloaded from the Red Gate site.
    This describes the technical details of a new tool, SQL Deployment Manager, that Red Gate is actively developing. From what I’ve seen of it, it should be able to get around many of these problems you mention.
    William.

  • Alexander Karmanov

    To Philip Kelley
    Yes, that’s a very good analogy.

    >> I look forward to seeing how you handle the need to roll back Production database changes.

    Ha, that’s simple – rollback the database(s) to the point prior to the deployment, if you can.
    If not – quickly hack up a correction and deploy it. Your accounting journal analogy suits here perfectly.

    Originally I did have rollback entries in the change log, so for each step forward I could have a step back, and it did work.
    Luckily there was a very wise operations chief who pointed out that nobody would guarantee that the system was working after such a rollback. Nobody would do the whole system regression test after rolling an update back, which may never happen… So I removed that option, wishing I had consulted that man before.

  • Alexander Karmanov

    To WBrewer
    Thanks, I will read it.
    Mind you, it didn’t exist six years ago when I created mine…

  • Phil Factor

    Keeping a Ledger of script deployments
    @Alex @Philip
    Philip has a good point. The versioning history has to be like a ledger. A deployment cannot be wiped from the record if there has been any refactoring, and any data has changed between deployment and rollback.
    I’ve made myself unpopular in the past by insisting on tested rollback scripts. If it is a simple deployment without messy refactoring (I don’t get many of these) then the database reverts to a known version that will pass that version’s tests. However, if you rollback a live system, there is normally new data that has no possible place in the previous version so a pure ‘rollback’ would lose that data. You usually cope with added tables to hold that data, but the version isn’t strictly the old version since the metadata has changed. You can’t really manufacture a new version, so this fact has to be recorded in the ‘ledger’ so that the next deployment can know that it must deal satisfactorily with the orphan data.

  • paschott

    Using SSDT – modern tools are a huge improvement
    We started using SSDT for our releases. It (and Red-Gate’s Source Control) has the advantage that you tell the DB what you want it to look like and it handles the appropriate changes to make it look like that. If we want, we can script those changes as well so we can review them before making the changes. I went through the "save a bunch of scripts" methodology and it worked well until we then had to cherry-pick changes out of the release folder because not everything could go at that time. Weeks of poring through scripts, test runs, tests, rollbacks, repeats – I don’t want to go through that again.

    It’s a tricky thing to handle, but I’ll admit that I like the tools available now as opposed to saving lots of scripts. Besides, I can generate a release package and hand that off to someone else to push the changes if necessary – no worries about something else getting in or being left out. The same package pushed to our pre-prod servers is pushed to production so all is the same once released.

  • Alexander Karmanov

    SSDT
    2paschott:
    >>tell the DB what you want it to look like and it handles the appropriate changes to make it look like that. If we want, we can script those changes as well so we can review them before making the changes.

    That’s exactly what I loathe to do. At the moment of the database update, the last thing I want is to think about why the hell there is such a difference and who did it. I can think about it during development and the initial test deployments, but not afterwards. Unfortunately, no tool provides that level of assurance yet. Our development cycle can stretch over a few months, so for me it is simply impossible to remember all the modifications and why they were done.

    >>It’s a tricky thing to handle, but I’ll admit that I like the tools available now as opposed to saving lots of scripts.

    Once you grasp the idea – the rest is simply implementation… with any tool or process. Provided there is an idea behind it.

    >>Besides, I can generate a release package and hand that off to someone else to push the changes if necessary – no worries about something else getting in or being left out. The same package pushed to our pre-prod servers is pushed to production so all is the same once released.

    And to do it for each release? No thanks. With my approach the TFS trunk contains the current stable release, so anyone can take it and update whichever environment they need. No need even to bother me.

  • erich8

    Automated SQL generation tool
    I know this article is a few years old, but I found it when researching how to improve on our current database change management process.

    We currently use DBSteward (https://github.com/nkiraly/DBSteward), which is an open-source tool for automating the SQL generation portion of database change automation. The full database schema and static data are defined in XML files stored in version control, and DBSteward compares two different versions (branches) of those database definition files to generate the SQL necessary to transform one version into the next (or conversely, to roll-back). It’s also used in test and development environments to build fresh copies of the database as part of our CI.

    However, DBSteward isn’t a full-fledged solution like the RedGate products. You still need to integrate the SQL generation calls into your build process, and the packaging and execution of the generated scripts into your deployment process. But it has allowed us to nearly eliminate developer-written SQL and all the problems that come along with managing those changes (though the process still allows for custom statements or code to be executed between or within the upgrade stages).

    I’d be interested to see what Redgate’s DLM Automation Suite could do, but we’re currently using PostgreSQL exclusively.
