17 June 2014

Using Migration Scripts in Database Deployments

A SQL migration script is similar to a SQL build script, except that it changes a database from one version to another, rather than building it from scratch. Although migration scripts are simple in essence, it is worth knowing how to use them effectively for stress-free database updates and deployments. It takes care to get them right, but the effort pays off.


What is a Migration Script?

Whereas a build script creates a database, a migration script, or ‘change’ script, alters a database. It is called a migration script because it changes all or part of a database from one version to another: it ‘migrates’ it between versions. This alteration can be as simple as adding or removing a column from a table, or a complex refactoring task such as splitting tables or changing column properties in a way that could affect the data they store.

For every likely migration path between database versions, we need to store in version control the migration scripts that describe precisely the steps required to perform the change and, if necessary, to move data around and transform it in the process.

Migration scripts are of two types:

  • Automated Migration Script – this is a script generated by a synchronization tool such as SQL Compare and normally checked, and perhaps altered, by a developer or DBA.
  • Manual Migration Script – this is a script written by a database programmer. Sometimes it is impossible for a tool to create a migration script that will preserve the data in the correct place. Although it knows what the ‘before’ and ‘after’ versions of the database look like, there is sometimes ambiguity regarding the correct transition path. The classic example is that of renaming a table from A to B. We know it’s the same table with a different name, but a tool would likely perform the transition by dropping A, and any data it contains, and creating B. Of course, there are other, more subtle examples where there is not enough evidence in the source and target databases for the tool to be certain of where the data should go without human intervention.

Migration scripts can be forward or “up” migrations that go to a newer version of a database, or backward, or “down” migrations to fall back to a previous version of a database. Any manual migration script is generally used to move between consecutive versions of a database.

A migration script must be idempotent, in that it will have no additional effect if run more than once on the same database. It isn’t wise to change a script once it has been run successfully, and the same set of migration scripts should be run in every server environment where the deployment is taking place.

With this discipline in place, we have a much greater chance of error-free deployments. To bring a database up-to-date, for example, all that is necessary is to run in the correct order the “up” migration scripts that will transform the target database from its current version to the latest version in source control. A ‘build script’ is then merely a special case of a migration script that migrates the database from nowhere to the first version.

It is a good precaution to write integration tests to check that the migration scripts have worked, and it is also essential to be able to roll back to a known version if an error occurs during the build process.

Why some refactoring tasks can’t be automated

Although a synchronization tool can describe most migrations, there are some database changes that can’t be inferred by comparing the beginning and end states of the database, because the tool has insufficient information to know how to migrate the existing data.

Let’s consider a few examples.

Table renaming

A table named People in one database might be named Customers in another. Although we know they are the same table containing the same data, a comparison tool cannot determine that with any certainty. In assessing the differences between the databases, any automated process will interpret them as two different tables: the People table exists only in the target database, the Customers table exists only in the source database, and an automatically-generated comparison script would drop the People table and create a new, empty table named Customers. When the script is run, the data in the People table is lost. To avoid this data loss, it is better to replace the generated step with a manual migration script that performs the rename using sp_rename.

Splitting or merging columns

When we split a column, an automated migration script would likely create two or more new columns and drop the original column. Likewise, a column merge would create a new column and drop the original columns.

In either case, any data in the dropped column(s) will be lost.

Instead, we can create a manual migration script to create the new column(s), run custom SQL to move the data, and then drop the original column(s).

Changing the data type or size of a column

When we change a column’s data type (for example, from INT to SMALLINT), data may be truncated during deployment if the new data type can’t accommodate some of the rows. Similarly, reducing the size of a column can result in truncation; for example, changing VARCHAR(50) to VARCHAR(20). We can create a manual migration script that first modifies any rows that would otherwise be truncated, and then alters the column.
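
As a rough sketch of such a script, the following assumes a hypothetical dbo.Orders table whose Quantity column is being narrowed from INT to SMALLINT; the capping rule is purely illustrative and would be replaced by whatever the business requires:

    IF COL_LENGTH(N'dbo.Orders', N'Quantity') IS NOT NULL
    BEGIN
        -- Cap any values the narrower type cannot hold (the business rule here is an assumption)
        UPDATE dbo.Orders
        SET    Quantity = 32767
        WHERE  Quantity > 32767;

        -- Nullability must be restated when altering the column
        ALTER TABLE dbo.Orders
        ALTER COLUMN Quantity SMALLINT NOT NULL;
    END;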

Dealing with large data movements

Regardless of how we generate the migration script, we may not be able to run it “as is” on a large database, especially one that is in production use.

An ALTER statement on a heavily used OLTP table that is being actively updated can have serious repercussions. Any operation, for example, that acquires a schema-modification lock will prevent user access to the table or its metadata during the modification. Alterations to a table are fully logged and so I/O-intensive. If they require table locks, then all other active users of the table will be forced to wait.

Operations such as dropping a column, or adding a NOT NULL column with a default value, can take a long time to complete and will generate many log records.

Just as we might break INSERT, UPDATE, or DELETE statements that affect many rows in large, active tables into smaller batches, so large-scale data migrations in such circumstances are best done incrementally, in discrete batches. However, this doesn’t apply to metadata changes: adding a column, for example, is a single operation that can’t be broken down, so avoiding taking the database offline would instead require a series of steps involving the creation of a new table, population with data, and a subsequent rename or, ideally, a synonym change. This process is usually done with indirection, using changes of synonyms because of their late binding, or with partition switching (Enterprise Edition only).
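
As a rough sketch of the batching approach, a large back-fill can be run in small chunks rather than as one huge UPDATE. The table, column, value and batch size below are assumptions for illustration only:

    DECLARE @BatchSize INT = 10000,
            @Rows      INT = 1;

    WHILE @Rows > 0
    BEGIN
        -- Each iteration touches at most @BatchSize rows, keeping each transaction and its log usage small
        UPDATE TOP (@BatchSize) dbo.Orders
        SET    OrderStatus = N'Unknown'
        WHERE  OrderStatus IS NULL;

        SET @Rows = @@ROWCOUNT;
    END;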

Writing Migration Scripts

Every migration script needs a way to check the preconditions in the environment in which it will run, to make sure that it is being run in the right context and that it is idempotent, meaning that running it repeatedly will have no further effect. For this purpose, we include in every script a guard clause that will, for example, check whether the migration task has already been accomplished before continuing to run it.

We might also write a clause that checks whether the migration script is valid for the version of the target database. For example, if we know that a migration script is intended for version 1.5, we can have the script start by interrogating the current version number of the target database, and aborting if it isn’t “1.5”.

Unfortunately, there is no standard way of attaching a version to a SQL Server database. The obvious place to store the current version is in the extended properties of the database, but these are not implemented in Windows Azure SQL Database (formerly SQL Azure, SQL Server Data Services, and later SQL Services). Therefore, if your database has to be portable to Azure SQL Database, it is probably safest to store the version in a special-purpose table-valued function or view in the target database.
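
A minimal sketch of such a version check, assuming the version is exposed through a view named dbo.DatabaseVersion (the view and column names are assumptions, not a standard mechanism):

    DECLARE @CurrentVersion NVARCHAR(20);
    SELECT @CurrentVersion = Version FROM dbo.DatabaseVersion;

    IF @CurrentVersion <> N'1.5'
    BEGIN
        RAISERROR (N'This migration expects database version 1.5 but found %s; aborting.', 16, 1, @CurrentVersion);
        SET NOEXEC ON;  -- stops the remaining batches from executing; finish the script with SET NOEXEC OFF
    END;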

The following sections review some typical IF EXISTS-style guard clauses.

Guard Clauses

Every migration script should include a guard clause that will stop it from making changes if it doesn’t need to be run. It is dangerous to assume that a script will only ever be run under the right preconditions. If you don’t include a guard clause, the migration script might fail or make unexpected changes to the database.

Guard clauses are standard practice for SQL Server scripting. When developing a stored procedure, for example, one can use CREATE PROCEDURE if the procedure doesn’t exist, or ALTER PROCEDURE if it does. The former will fail with an error if the procedure already exists, and the latter will fail if it doesn’t. To prevent errors, a guard clause is generally inserted to delete the procedure first if it exists; after that, it is safe to use CREATE PROCEDURE. Since SQL Server 2008, it has been possible to do this in a more elegant way that preserves existing permissions on the procedure: the guard clause runs SET NOEXEC ON if the procedure already exists, so that the CREATE statement that creates a blank procedure is parsed but not executed. This avoids deleting an object in a running system, preserves all the object’s existing attributes when making a change to it, and gets round the problem that a conditional ‘CREATE’ must be at the start of a batch. See Listing 1 in ‘Automating SQL Server Database Deployments: Scripting Details’.

In other circumstances, a guard clause can be used to abort a script if it determines that it has already been run, or to perform different logic depending on the database context in which the script is being run. In the following examples, PRINT statements are used to show where the real SQL statements would be.

To check that a table exists
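
A minimal sketch, using a hypothetical dbo.Customer table; the PRINT statements mark where the real migration statements would go:

    IF OBJECT_ID(N'dbo.Customer', N'U') IS NOT NULL
        PRINT N'dbo.Customer exists - the migration statements would run here.';
    ELSE
        PRINT N'dbo.Customer does not exist - nothing to do.';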

To check that a function exists
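
A minimal sketch, using a hypothetical dbo.ufnGetPostCode function; the type filter covers scalar, inline and multi-statement table-valued functions:

    IF EXISTS (SELECT 1 FROM sys.objects
               WHERE  object_id = OBJECT_ID(N'dbo.ufnGetPostCode')
                 AND  type IN (N'FN', N'IF', N'TF'))
        PRINT N'The function exists - the migration statements would run here.';
    ELSE
        PRINT N'The function does not exist.';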

To check that a procedure exists
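
A minimal sketch, using a hypothetical dbo.uspGetCustomer procedure:

    IF OBJECT_ID(N'dbo.uspGetCustomer', N'P') IS NOT NULL
        PRINT N'The procedure exists - the migration statements would run here.';
    ELSE
        PRINT N'The procedure does not exist.';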

To check that a particular column exists in a table
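
A minimal sketch, using the hypothetical dbo.Customer table and its Address column:

    IF COL_LENGTH(N'dbo.Customer', N'Address') IS NOT NULL
        PRINT N'The Address column exists - the migration statements would run here.';
    ELSE
        PRINT N'The Address column does not exist.';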

To check whether a column has any sort of constraint
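
A minimal sketch, again with assumed names; note that this information-schema view reports PRIMARY KEY, FOREIGN KEY, UNIQUE and CHECK constraints, but not DEFAULT constraints:

    IF EXISTS (SELECT 1
               FROM   INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
               WHERE  TABLE_SCHEMA = N'dbo'
                 AND  TABLE_NAME   = N'Customer'
                 AND  COLUMN_NAME  = N'PostCode')
        PRINT N'The column participates in at least one constraint.';
    ELSE
        PRINT N'No constraint references the column.';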

To check that a column has a check constraint
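
A minimal sketch, with assumed names, using the catalog view for CHECK constraints:

    IF EXISTS (SELECT 1
               FROM   sys.check_constraints AS cc
               WHERE  cc.parent_object_id = OBJECT_ID(N'dbo.Customer')
                 AND  cc.parent_column_id = COLUMNPROPERTY(OBJECT_ID(N'dbo.Customer'), N'PostCode', 'ColumnId'))
        PRINT N'A column-level CHECK constraint exists on the column.';
    ELSE
        PRINT N'No column-level CHECK constraint exists on the column.';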

To check that a column has a foreign key constraint
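
A minimal sketch, with assumed names, checking whether the column is on the referencing side of any foreign key:

    IF EXISTS (SELECT 1
               FROM   sys.foreign_key_columns AS fkc
               WHERE  fkc.parent_object_id = OBJECT_ID(N'dbo.Email')
                 AND  fkc.parent_column_id = COLUMNPROPERTY(OBJECT_ID(N'dbo.Email'), N'PersonID', 'ColumnId'))
        PRINT N'The column is part of a foreign key.';
    ELSE
        PRINT N'The column is not part of a foreign key.';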

To check that a column participates in a primary key
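
A minimal sketch, with assumed names, looking for the column in the table’s primary-key index:

    IF EXISTS (SELECT 1
               FROM   sys.index_columns AS ic
               JOIN   sys.indexes AS i
                 ON   i.object_id = ic.object_id AND i.index_id = ic.index_id
               WHERE  i.is_primary_key = 1
                 AND  ic.object_id = OBJECT_ID(N'dbo.Person')
                 AND  ic.column_id = COLUMNPROPERTY(OBJECT_ID(N'dbo.Person'), N'PersonID', 'ColumnId'))
        PRINT N'The column participates in the primary key.';
    ELSE
        PRINT N'The column does not participate in the primary key.';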

Examples of migration scripts

The following section provides typical examples of manual migration scripts for various common database refactoring tasks.

Updating a Stored Procedure

This technique preserves permissions on a routine (procedure or function) whilst doing the update. (SQL Server 2008 or above only.)
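
A minimal sketch of the technique, using a hypothetical dbo.uspGetCustomer procedure with an assumed body. If the procedure already exists, SET NOEXEC ON causes the blank CREATE to be parsed but not executed, so existing permissions are untouched; the ALTER then supplies the real definition:

    IF OBJECT_ID(N'dbo.uspGetCustomer', N'P') IS NOT NULL
        SET NOEXEC ON;
    GO
    CREATE PROCEDURE dbo.uspGetCustomer AS SELECT 1 AS Placeholder;  -- blank stub, created only the first time
    GO
    SET NOEXEC OFF;
    GO
    ALTER PROCEDURE dbo.uspGetCustomer
        @CustomerID INT
    AS
        SELECT CustomerID, Name
        FROM   dbo.Customer
        WHERE  CustomerID = @CustomerID;
    GO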

Renaming a table

This migration renames the Person table to Customer. The script includes a guard clause to check that the table Person exists in the database. If the table doesn’t exist, no changes are made.
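
A sketch of such a script (the dbo schema is assumed):

    IF OBJECT_ID(N'dbo.Person', N'U') IS NOT NULL
       AND OBJECT_ID(N'dbo.Customer', N'U') IS NULL
        EXEC sp_rename @objname = N'dbo.Person', @newname = N'Customer', @objtype = N'OBJECT';
    ELSE
        PRINT N'dbo.Person not found (or dbo.Customer already exists); no changes made.';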

Updating a column definition

This migration updates the column definition in the table Widgets with a default value for existing rows before adding a NOT NULL constraint. It includes a guard clause to check that the Widgets table contains a Description column. If the column doesn’t exist, no changes are made.
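
A sketch of such a script; the column’s data type (NVARCHAR(255)) and the default text are assumptions for illustration:

    IF COL_LENGTH(N'dbo.Widgets', N'Description') IS NOT NULL
    BEGIN
        -- Back-fill existing rows so the NOT NULL constraint can be applied safely
        UPDATE dbo.Widgets
        SET    Description = N'No description'
        WHERE  Description IS NULL;

        -- The data type must be restated when changing nullability
        ALTER TABLE dbo.Widgets
        ALTER COLUMN Description NVARCHAR(255) NOT NULL;
    END
    ELSE
        PRINT N'The Description column does not exist in dbo.Widgets; no changes made.';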

Splitting a column

In this example, the Customer table has an Address column with the postcode embedded in it, which makes reports that determine regions based on postcode difficult to write.


Figure 1: An Address column with embedded postcodes

The manual migration script splits Address into the columns StreetAddress and PostCode, and updates existing rows. Fortunately, the extraction of the postcode is relatively easy, since it is separated from the address by a semicolon character (‘;’).

The script includes a guard clause to check that the Customer table contains an Address column. If the column doesn’t exist, then no changes are made. If the column does exist, the guard clause checks that the StreetAddress and PostCode columns also exist. If they don’t exist, the script creates the columns before updating the rows.
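
A sketch of such a script follows. The new column sizes are assumptions, and the UPDATE is wrapped in dynamic SQL so that it does not reference the new columns at compile time, before they exist:

    IF COL_LENGTH(N'dbo.Customer', N'Address') IS NOT NULL
    BEGIN
        -- Create the new columns only if they are not already there
        IF COL_LENGTH(N'dbo.Customer', N'StreetAddress') IS NULL
            ALTER TABLE dbo.Customer ADD StreetAddress NVARCHAR(200) NULL;
        IF COL_LENGTH(N'dbo.Customer', N'PostCode') IS NULL
            ALTER TABLE dbo.Customer ADD PostCode NVARCHAR(20) NULL;

        -- The postcode is embedded after a semicolon, e.g. '12 High St;AB1 2CD'
        EXEC sp_executesql N'
            UPDATE dbo.Customer
            SET    StreetAddress = LEFT(Address, CHARINDEX('';'', Address) - 1),
                   PostCode      = LTRIM(SUBSTRING(Address, CHARINDEX('';'', Address) + 1, 20))
            WHERE  CHARINDEX('';'', Address) > 0;';

        ALTER TABLE dbo.Customer DROP COLUMN Address;
        PRINT N'Address has been split into StreetAddress and PostCode.';
    END
    ELSE
        PRINT N'The Address column does not exist in dbo.Customer; no changes made.';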

This would have the following effect when run:


Figure 2: The refactored Customer table

If it were inadvertently run again, the output would be …

Moving Data

In this example, a table named PersonData has two email columns, to accommodate the fact that some customers have two email addresses. It needs to be normalized, of course.


Figure 3: The PersonData table has two email columns

The migration script moves data from the table PersonData to the tables Person and Email. It includes a guard clause to check that the table PersonData exists in the database. If the table doesn’t exist, no changes are made. If the table exists, the guard clause checks that the Person and Email tables exist. If they don’t exist, the script creates the tables before updating the rows.
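
A sketch of such a script follows. The column names (PersonID, Name, Email1, Email2) and data types are assumptions for illustration, and the INSERTs are wrapped in dynamic SQL as a precaution, so that they are not compiled until the new tables are certain to exist:

    IF OBJECT_ID(N'dbo.PersonData', N'U') IS NOT NULL
    BEGIN
        IF OBJECT_ID(N'dbo.Person', N'U') IS NULL
            CREATE TABLE dbo.Person
            (
                PersonID INT NOT NULL PRIMARY KEY,
                Name     NVARCHAR(100) NOT NULL
            );

        IF OBJECT_ID(N'dbo.Email', N'U') IS NULL
            CREATE TABLE dbo.Email
            (
                PersonID     INT NOT NULL,
                EmailAddress NVARCHAR(256) NOT NULL
            );

        EXEC sp_executesql N'
            INSERT INTO dbo.Person (PersonID, Name)
            SELECT pd.PersonID, pd.Name
            FROM   dbo.PersonData AS pd
            WHERE  NOT EXISTS (SELECT 1 FROM dbo.Person AS p WHERE p.PersonID = pd.PersonID);

            -- One row per non-NULL email address, skipping any rows already copied
            INSERT INTO dbo.Email (PersonID, EmailAddress)
            SELECT PersonID, Email1 FROM dbo.PersonData WHERE Email1 IS NOT NULL
            UNION
            SELECT PersonID, Email2 FROM dbo.PersonData WHERE Email2 IS NOT NULL
            EXCEPT
            SELECT PersonID, EmailAddress FROM dbo.Email;';
    END
    ELSE
        PRINT N'The dbo.PersonData table does not exist; no changes made.';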

Here is the resulting table filled with the email addresses:


Figure 4: The refactored PersonData table

Table refactoring

Imagine that we have inherited a database that hasn’t been properly normalized and needs refactoring. Data for customers is held in a table named PersonData, which also contains all their contact details.

To normalize the table, we’ll break it into separate tables in a new schema named Contacts, as sketched below.

We now want to migrate Email1 and Email2 from PersonData into Contacts.Email, creating two separate rows for a person having both Email1 and Email2. Similarly, we migrate the phone and address details.
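
A sketch of the email part of the migration, assuming the same PersonData column names as above; the phone and address details would follow the same pattern into their own Contacts tables:

    IF SCHEMA_ID(N'Contacts') IS NULL
        EXEC (N'CREATE SCHEMA Contacts');  -- CREATE SCHEMA must be the only statement in its batch
    GO

    IF OBJECT_ID(N'Contacts.Email', N'U') IS NULL
        CREATE TABLE Contacts.Email
        (
            PersonID     INT NOT NULL,
            EmailAddress NVARCHAR(256) NOT NULL
        );
    GO

    -- Unpivot the two email columns into one row per address, skipping rows already copied
    INSERT INTO Contacts.Email (PersonID, EmailAddress)
    SELECT pd.PersonID, e.EmailAddress
    FROM   dbo.PersonData AS pd
    CROSS APPLY (VALUES (pd.Email1), (pd.Email2)) AS e (EmailAddress)
    WHERE  e.EmailAddress IS NOT NULL
      AND  NOT EXISTS (SELECT 1 FROM Contacts.Email AS x
                       WHERE  x.PersonID = pd.PersonID
                         AND  x.EmailAddress = e.EmailAddress);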

After running the migration script, we can drop the original PersonData table as a separate commit.

Conclusions

Although much of the chore of synchronizing databases for a deployment can be done using a tool such as SQL Compare, there are times when a step has to be done by hand, either because a table is too large to be altered in place, or because it isn’t unambiguously obvious how the existing data should be placed in the target revision of the database in such a way as to be sure of preserving it.

For the automated deployment of databases, hand-crafted SQL migration scripts will occasionally have a place, and should be stored in version control to run before any automated process for deploying changes to the database. They should be written so that, if a script is run at the wrong time, or run repeatedly, it will have no effect. Guard clauses are a good way of ensuring that Murphy’s law will have no consequences.

This article is part of our Database Lifecycle Management patterns & practices.

Return to the library to find more articles on DLM, or visit red-gate.com/dlm for more information about SQL Server products for database lifecycle management.


Jonathan Hickford


  • paschott

    Good read
    Only thing I’d say you don’t have to worry about as much w/ tools are the rename operations. I know SSDT offers a pretty decent renaming handler and creates a "refactor log" table to store those rename operations. Of course, it could be pretty basic just to say if table "ABC" exists, rename to "DEF", but it’s still there. I thought Red Gate offered something similar as well, but don’t know the Source Control product enough to say for sure.

    The majority of the other cases – I agree that those do come up from time to time and need special handling. I rarely look forward to trying to do those sorts of manipulations just because they’re complex and run the risk of breaking something or losing data. I appreciate you pointing out the different scenarios and providing example scripts for handling them.

  • Jonathan.Hickford

    Thanks Peter
    Thanks for your comments. You’re right there is a feature within SQL Source Control to add these migration scripts, in order to override the default behavior of the automated comparison engine. The same migrations are then honored by other Red Gate tools (SQL Compare, SQL CI, Deployment Manager) when deploying elsewhere. More details on http://www.red-gate.com/migrations if you want to find out more.

  • faustkeil

    ID Migration
    Hej,

    At our company it is often the case that, when we migrate data from database A to database B, we need to take care of identities from database A and move them across properly to database B. For example, we have a table "gender" with id 1 = male, id 2 = female, but in the destination database we have id 1 = female and id 2 = male.
    You could set identity insert on and update the values, but how do you handle the corresponding references, let’s say in the table "address" where the FK to the gender id is? And in our scenario we have plenty of foreign tables that point to the gender table, so all tables need to be updated with the new values in order to have correct data afterwards.

  • ToddHarry

    Dealing with script size and long-term retention
    Jonathan:

    I want you to know that I’ve read and re-read this article a few times. I enjoyed your article immensely, as we struggle with some of the same issues you addressed. I do wonder if you have a suggestion I can follow on storing scripts like the Pre- and Post-Deploy scripts you mentioned in this article.

    First some quick background, so you can see what I’m working through: at my shop, we use Team Foundation Server for source control and Visual Studio for development. I’m the only developer currently deploying database changes via the Deploy option of the SQL Server Database Project in my application’s VS solution file. Everyone else just hand-rolls TSQL scripts and checks-in those scripts to whatever "DB Stuff" folder they produce in TFS (so the database is defined outside of the solution), OR they just don’t version control the database at all (which I’m trying to sway the culture away from! 🙂

    Things are working well for me so far… I’ve been able to make slight table changes, stored procedure logic changes, etc. Recently however, I realized I needed to change the relationship between a few tables, and so that I don’t lose any data, I’ve decided to go with a Pre- and Post-Deployment script as you detailed above to manually drop the constraints, alter the tables, migrate the data, and setup new constraints.

    The problem with this plan is… do I -really- just write a few "Guard Clauses" into the scripts around the sections that make the change (that only affects this version of the database), and then just leave this added code for this one-time migration in my Pre- and Post-Deploy scripts -forever-?!

    I think over time, that would negatively affect the readability of those scripts, as they clutter up with more and more one-time deltas. If we deploy to a new environment for example, then this logic will never be needed, as the schema and relationships between tables in the SQL Server Database Project will be "all new" to the new environment. I can also see some future developer who is new to my application trying to get a sense of what’s going on by going through the scripts, and seeing all of this code that is no longer needed. It feels equivalent to scripting every change I’ve ever made to the schema into a single, huge log file and replaying it when I deploy.

    The only alternatives I can think of are:

    1) Script these changes into the Pre- and Post-deploy scripts, check them in, right-click Deploy the SQL Server Database Project as normal (which will execute the new logic), and then after confirming all is well, delete these changes and commit the reduced Pre- and Post-deploy scripts back to TFS, so that I won’t have this mass of code in the Pre- and Post-Scripts for the next deployment. I’d still have a history in TFS of the old commit I guess, if I ever needed to reconstruct the evolution of the application (but who really needs to do that very far back, anyway?) OR

    2) Script these changes and manually execute the scripts, forgoing the use of the automated deployment for this build, and just check-in these scripts into some folder for safe keeping. This would avoid the benefit of the Pre- and Post-deploy scripts, but might enhance the ease-of-comprehension of the SQL Server Database Project.

    (Whew long post.) So, given these choices, which would you choose? Is there a better option I don’t understand? Can you invalidate my gut feeling that populating a bunch of one-time-only logic into the Pre- and Post-deploy scripts is not really a problem to worry about?

    Thanks in advance.

  • paschott

    Pre/Post Scripts
    Todd, you might want to do one of a couple of things in your scenario.

    1. Write your scripts to check for certain conditions before running. Don’t run if those conditions aren’t met (for example, don’t run if the changes were made already). You might do this with a table, with a schema check, or some other way.

    2. When your changes are complete (and assuming only for one system), comment the relevant lines out of your pre/post deploy scripts. That will prevent them from being included in future builds.

    3. Possible alternative, though a bit messier – add a SQLCMD variable to your project for this particular change, perhaps defaulted to "False". Only run when the "True" parameter is passed to the build/sqlpackage command for that variable. (I’d still have the checks in place to avoid doing the work again, but this might be a way to stop it from running unless needed.)

    Reading your post, I’d probably do a variation of your first option in putting the script into the build with the appropriate safeguards around multiple releases, but comment out the script from the pre/post area when released. You’ll still have it in version control and still be able to pull it back pretty quickly if needed.
