Database Deployment: The Bits - Database Version Drift

30 November 2012
by Phil Factor
Database Lifecycle Management Patterns & Practices Library Stage 4


Stage 4: Release Management

When you are about to deploy a new version of a database by updating the current version, one of the essential pre-deployment checks is to make sure that the version in production is exactly what it should be. If changes have somehow slipped in, you'll need to understand them and deal with them before you can deploy.

Version-drift (or Version-creep) is the ironic name given to the habit of 'patching' the production database to make any type of change, without deploying it properly. The change could be as innocent as adding or changing an index to aid performance, but you can’t take that for granted. If you find changes in a production system that weren’t there when the version was deployed, they need to be investigated.

At the simplest level, when we detect version-drift, we are discovering that the version in production is not the version we think it is.

If we don't check for version-drift, then beneficial changes may be lost, or undesirable changes retained, unintentionally: the exact consequences depend on the methodology we're using for a deployment. If there has been drift, we need to find out why and take action to ensure that nothing gets lost, be it changes to indexes, configuration settings, or forensic evidence of fraud.

When deploying a database as a data-tier application, we might get a warning message about version-drift when we attempt the deployment. The safest way to check for version-drift is to compare the metadata before each deployment. Are the tables, routines, settings, static data, indexes, and so on, in production the same as in the archived version, usually represented by the scripts in source-control? I'll show you how to do this.

Why is drift a bad thing?

I've made plenty of changes to a live system in the past: They're called deployments; it is what they’re for. It should be easy to do the right thing. Version-drift shouldn't happen, but it does. If the deployment process wasn’t so irksome, then the temptation to circumvent it, and 'patch' in a change wouldn’t be so strong. If you are finding a lot of version-drift then there is something wrong with the deployment system, either its design or implementation. If drift has happened, then we must delay deployment whilst we put it right.

If we are creating the deployment script by running a Synchronization tool, and press on without correcting the drift, we will retain stuff that maybe shouldn't be there, whereas if we do it from source control, we will lose stuff that perhaps ought to be there. In either case, there is a chance that something will break. The only way to correct drift is to look at the changes, and 'sort out the sheep from the goats'; in other words, keep the changes that should have been deployed properly, and discard the alterations that were caused by humanity's imperfect state.

There is no sense in having an automated process to preserve unexpected changes since not all changes prove to be desirable and we need to inspect all of them before deciding how to deal with them. After all, while the unexpected change could be a genuine and effective change to improve performance, it could equally be part of a fraud, or an attempt to compromise the system's security. Even if unintentional, the mindless thrashings of a DBA in a muck-sweat can result in a change that leaves a security vulnerability, or a performance time-bomb, in the database. We must treat all unexpected changes as suspect until we prove them otherwise.

When there is an auditing system in place for a production system, it is unlikely that you'll hit version-drift, because you'll know of any change. Many shops will have an automated alarm that detects an unscheduled metadata change in a production system and then rings the big red bell on the wall, similar to the ones you see in the Bond films. (Okay, it sends an email alert.) In any event, it is part of a good database intrusion-detection system. We can and should, of course, trap and record any DDL changes as they happen and keep a log of them, as part of an auditing system. Depending on the method of doing this, it can come with a cost in extra resources, though in the case of Extended Events this isn't significant.
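As a sketch of the simplest sort of DDL logging, a database-scoped DDL trigger can record every DDL event as it happens. The table and trigger names here (DDLChangeLog, TrackDDLChanges) are my own invention for illustration, not an existing schema:

```sql
-- A minimal sketch of DDL auditing via a database-scoped DDL trigger.
-- The log table; names and columns are assumptions for this example.
CREATE TABLE DDLChangeLog
  (ChangeID INT IDENTITY PRIMARY KEY,
   EventTime DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
   LoginName SYSNAME NOT NULL,
   EventType NVARCHAR(128) NOT NULL,
   EventData XML NOT NULL);
GO
-- Fires for every database-level DDL event and stores the EVENTDATA payload
CREATE TRIGGER TrackDDLChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
  DECLARE @data XML = EVENTDATA();
  INSERT INTO DDLChangeLog (LoginName, EventType, EventData)
  VALUES (ORIGINAL_LOGIN(),
          @data.value('(/EVENT_INSTANCE/EventType)[1]', 'NVARCHAR(128)'),
          @data);
END;
```

The EventData column keeps the full XML of each event, so the who, what, and when of any 'patch' is preserved for the post-mortem; an Extended Events session achieves much the same thing with less overhead.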

How does drift happen?

When circumventing the deployment process in order to make a change to a live database system, we make a judgment that the risks of not making the change are greater than the risks of doing so, and hit the button. It could be that the production system isn't important enough to matter. It could be that the proper deployment process is horribly broken, or that the person in charge of deployment is a power-crazed cabbage. One of the downsides of teamwork is that we learn to tolerate the failings of others, so we manage the change retrospectively. We can't really assume that we can simply blitz all such database changes. We just have to accept that legitimate version-drift happens. It is frustrating at times, but one can feel a twinge of sympathy with anyone who prefers to go home in good time rather than rolling up their sleeves, updating the same version in Source Control, installing it on the integration server, running the automated integration tests, and then doing the deployment.

Version-Drift and Database Deployment

We need to check for 'uncontrolled' version drift before any deployment. If the metadata check reveals that the scripts in Source Control reflect what is in production and they are at the same version, then we breathe a sigh of relief. If there has been version-drift, without the changes going into Source Control, then our day starts to get more complicated.

Some differences, such as replication metadata, are added to the production server automatically on deployment and should be ignored. We can also ignore bogus differences arising from machine-generated names in the SQL Server metadata for 'anonymous' keys.
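In SQL Server, these machine-generated names are easy to spot because the catalog views flag them. Something like the following sketch lists the constraints whose names the server invented for 'anonymous' keys, defaults and checks, so you know which name differences to wave through:

```sql
-- List constraints whose names were machine-generated by SQL Server
SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS TableSchema,
       OBJECT_NAME(parent_object_id) AS TableName,
       name AS ConstraintName,
       type_desc AS ConstraintType
  FROM sys.key_constraints WHERE is_system_named = 1
UNION ALL
SELECT OBJECT_SCHEMA_NAME(parent_object_id), OBJECT_NAME(parent_object_id),
       name, type_desc
  FROM sys.default_constraints WHERE is_system_named = 1
UNION ALL
SELECT OBJECT_SCHEMA_NAME(parent_object_id), OBJECT_NAME(parent_object_id),
       name, type_desc
  FROM sys.check_constraints WHERE is_system_named = 1
ORDER BY TableSchema, TableName;
```

A good schema-comparison tool will do this filtering for you, but the query is handy when you're reduced to a text diff and need to know which name mismatches are bogus.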

Other modifications, however, we can't ignore. We need to review all the changes. If we're in luck, the developer will have documented the change in the live database, via extended properties or headers. If we need to keep the changes then we need to update source control and rerun the pre-deployment automated tests before attempting any deployment.

Checking for Version-Drift

So, what's required? As discussed, we need to inspect all changes. This process needs a human. Changes need to be understood. This requires eye-balling. We need to look at the differences between what's on the live system and what Source Control thinks is in this version. The best way of doing this is by means of a schema-comparison tool. This will give you a side-by-side rendering of the metadata in the production database, and the metadata as represented by the version in source control, or the build from it on the integration server. You can then scroll through all the changes rapidly and get a quick appreciation of any 'drift'.

I generally don't even touch the production system, but look at a recent backup set from it. I also compare directly with what's in source control via a Scripts Folder. If you have the SQL Toolbelt, and you haven't yet discovered SQL Compare, then you're in for a happy surprise, because SQL Compare does this fast and elegantly. You don't even need a database, but can compare a scripts folder directly with a backup set.

With a schema-comparison tool, the process is straightforward. You just kick off a new project if it is the first time you’re doing the comparison, or select an existing one. You then specify whether you wish to use a database, backup, snapshot or scripts folder for the database you want to check, and fill in the appropriate details. You do the same thing for the database you want to compare it against and away you go. You don’t need the advanced synchronization features: only metadata comparison, and you can navigate through the source by the type of database object and get immediate information about what database objects have changed, and how many of each.

If you don't have access to a database-schema comparison tool, then you can still do it, but in a rather more long-winded way: you use a differencing tool on the two source files. A text-comparison tool is part of every programmer's toolkit, even if only used when embedded in your source control system. There are a few snags that I’ve come across with using this method: you need to script the live system and compare it with an integration database (one built from the source in source control for a particular version); the script-generation process is surprisingly slow; and the comparison is purely text-based rather than semantic.

[Figure: file-comparison of the two scripts. Hah! The foreign keys are unchecked in the clone]

Using a Programmers File-Comparison Tool

WinMerge is on SourceForge, published under a GNU GPL. It is free and open-source. It is probably the most popular of the many tools that show the difference between two text files. There are plenty of others such as ExamDiff and Beyond Compare. This will not help us much with an ordinary build script, such as the ones generated from SSMS, because we need the order of all the objects to be the same in both scripts. SSMS will sort the tables in dependency order, but you’ll soon find that this can result in any number of orders for identical databases. We have to resort to SMO to get the compare script we want, which is merely a sequenced collection of all the individual scripts. Remember that, unlike a proper database comparison tool, a file comparison tool has no knowledge of semantics, only text differences: it doesn't understand SQL!

To generate the scripts, we need to build the integration or staging server to the version in source control and compare this to an exact copy of what is in production, by restoring a backup. By happy coincidence, a test restore is something we need to do anyway, occasionally. We can then run a script that will do something like this. (I use PowerShell, but C#, F# or IronPython would do just as well.)

<# This script creates two comparison database scripts that you can then compare with a file comparison tool. It is used for checking that the production database matches the copy of the database in Integration or Staging. We store the scripts in a directory, placed in subdirectories according to server and database. #>
$FileComparisonTool='C:\Program Files (x86)\WinMerge\WinMergeU.exe' #or whatever you choose
$directory='E:\MyScriptsDirectory' # the directory where you want to store the scripts
$filenames = @() #we make a list of the filenames
<# and we have a list of the serverInstance/Database we want the script from. There is a third column which is blank if it is Windows authentication and has the default Login if it is SQL Server authentication #>
$Databases = @(
       ('MyServer','MyDatabase','PhilFactor'),
       ('MyProductionServer','MyDatabase',''))
<# This script will actually create as many comparison scripts as you want. You can then use them later on.#>
Trap {
  # Handle the error
  $err = $_.Exception
  write-host $err.Message
  while( $err.InnerException ) {
   $err = $err.InnerException
   write-host $err.Message
   }
  # End the script.
  break
  }

set-psdebug -strict
$ErrorActionPreference = "stop"

#Load SMO assemblies, checking that it was successful
$MS='Microsoft.SqlServer'
@('.SMO','.SmoExtended','.ConnectionInfo') |
   foreach-object {
      if ([System.Reflection.Assembly]::LoadWithPartialName("$MS$_") -eq $null)
         {Throw "missing SMO component $MS$_"}
      }
# just to prevent the need for retyping
$My="$MS.Management.Smo"

$Databases | foreach-object{
   $ServerName = $_[0] #the name of the server or instance for this job
   $DatabaseName = $_[1] #the name of the database for this job
   $Login = $_[2] #the SQL Server login, or blank for Windows authentication
   <# and we can now establish the filename based on the server and database and maybe create the directories #>
   $Scriptdirectory = "$directory\$($ServerName -replace '[\\\/\:\.]','-')\$($DatabaseName -replace '[\\\/\:\.]','-' )"
   # a little easier than doing it the Powershell way
   if (![System.IO.Directory]::Exists($Scriptdirectory)) {$null=[System.IO.Directory]::CreateDirectory($Scriptdirectory)}
   $filename ="$($Scriptdirectory)\$($DatabaseName)_Compare.sql" #note different suffix to build
   $Filenames += $filename #remember the filename
   if ($Login -ne '')
      {
      $Credentials=get-credential -Credential $Login
      $mySrvConn = new-object Microsoft.SqlServer.Management.Common.ServerConnection
      $mySrvConn.ServerInstance = $ServerName
      $mySrvConn.LoginSecure = $false
      $mySrvConn.Login = $Credentials.UserName
      $mySrvConn.SecurePassword = $Credentials.Password
      $s = new-object ("$My.Server") $mySrvConn
      }
   else
      {$s = new-object ("$My.Server") $ServerName } #find the server instance
   if ($s.Version -eq $null ){Throw "Can't find the instance $ServerName"}
   $TheDB = $s.Databases[$DatabaseName]
   if ($TheDB.Name -ne $DatabaseName){Throw "Can't find the database '$DatabaseName' in $ServerName"}
   <# now we will use the canteen system of SMO to specify what we want from the script. It is best to have a list of the defaults to hand and just over-ride the defaults where necessary, but there is a chance that a later revision of SMO could change the defaults so beware! #>
   $scripter = new-object ("$My.Scripter") $s #script out the database creation
   $Scripter.Options.AllowSystemObjects = $false #who wants 'em?
   $Scripter.Options.ExtendedProperties = $true # yes we want these
   $Scripter.Options.DRIAll = $true # and all the constraints
   $Scripter.Options.Indexes = $true # Yup, these would be nice
   $Scripter.Options.Triggers = $true # This should be included when scripting a database
   $Scripter.Options.IncludeHeaders = $false # they have today's date!
   $Scripter.Options.ToFileOnly = $true #no need of string output as well
   $Scripter.Options.IncludeIfNotExists = $false #not necessary
   $Scripter.Options.Filename = $filename
   $scripter.options.AppendToFile = $false #because this is the first part that gets written
   $scripter.Script($TheDB) #do it
   $scripter.options.AppendToFile = $true #we add each object script
   foreach ($object in @("Tables","Functions","Views","XmlSchemaCollections","Assemblies","StoredProcedures","Schemas","Triggers","Roles","MessageTypes","Contracts","Queues","Routes","Services","RemoteServiceBindings")) #there are more in later versions, but not so important.
      {
      $ObjectCollection = $TheDB.$object #get the collection
      if ($ObjectCollection.count -gt 0) { #if there is anything there
         $ObjectCollection | #for each in the object collection
            Sort-Object name | # just in case
               foreach-object { #prefetch makes no difference at all
                  $scripter.Script($_) #and script each individual object
                  }
         }
      "done $object for $ServerName, $DatabaseName at $(get-date)"
      }
   }
if ($filenames.count -gt 1)
   {&$FileComparisonTool $filenames[0] $filenames[1]}

A Watched kettle never boils

Just don't hang about, watching the screen, because little furry mammals evolve in the time this script takes to run; AdventureWorks takes two minutes locally. No, there are definitely good reasons for having SQL Comparison tools. However, with advance planning, we can prepare all the scripts overnight.

Of course, for a great deal of the database, it is much quicker to get the script from the metadata, via SQL, but the build-script for tables and the related DRI aren't stored within the database metadata. We can create a SQL routine to do our table-build script, but getting everything we need would be immensely complex. It is also possible to pull down the metadata to a local database, but the devil is in the detail.
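For the objects whose source is stored in the metadata, a query against the catalog views pulls the definitions out directly and quickly. This sketch lists the source of every routine, view and trigger, ready for a text comparison; tables, as the paragraph above warns, won't appear here:

```sql
-- Pull the source of every module (routines, views, triggers) from the metadata
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS TheSchema,
       OBJECT_NAME(m.object_id) AS TheObject,
       o.type_desc AS ObjectType,
       m.definition AS TheSource
  FROM sys.sql_modules m
  JOIN sys.objects o ON o.object_id = m.object_id
 ORDER BY TheSchema, TheObject;
```

Run against both the production copy and the integration build, this gives two deterministically ordered result sets to diff, which covers the bulk of the objects in far less time than a full SMO scripting run.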

One way of speeding the process in C# is to increase the number of threads to an optimum of around six (thanks to Linchi Shea for discovering this), or in PowerShell we can run the scripts for each database simultaneously.

My solution would be to shrug and plan this in advance, running this on a scheduler.


Before any deployment, it is worth checking that the version of the database in production is the same as the version you think is there. If all is well, then we will know exactly what is in the production database since, for all practical purposes, it will be what's in source control for the same version. If we've suffered 'Version-Creep', and find there are differences, then we need to understand why these changes have happened and decide if they need to be made permanent by retrospectively legitimizing them in source control.

Red Gate DLM Dashboard

Want an easy way to discover database drift? DLM Dashboard tracks every change to your database schemas, so you can make sure all your databases are in the right state. As soon as a change occurs, DLM Dashboard will alert you to the details of who, what, and how of the database schema changes, via email notifications and a clear dashboard overview. Find out more.

This article is part of our Database Lifecycle Management patterns & practices.

Return to the library to find more articles on DLM, or visit Red Gate's website for more information about SQL Server products for database lifecycle management.