Phil Factor

Database Deployment: The Bits - Database Version Drift

30 November 2012

Database delivery patterns & practices

STAGE 3 CONTINUOUS INTEGRATION

When you are about to deploy a new version of a database by updating the current version, one of the essential pre-deployment checks is to make sure that the version in production is exactly what it should be. If changes have somehow slipped in, you'll need to understand them and deal with them before you can deploy.

Version-drift (or Version-creep) is the ironic name given to the habit of 'patching' the production database to make any type of change, without deploying it properly. The change could be as innocent as adding or changing an index to aid performance, but you can’t take that for granted. If you find changes in a production system that weren’t there when the version was deployed, they need to be investigated.

At the simplest level, when we detect version-drift, we are discovering that the version in production is not the version we think it is.

If we don't check for version-drift, then beneficial changes may be lost, or undesirable ones retained, unintentionally: the exact consequences depend on the methodology we're using for a deployment. If there has been drift, we need to find out why and take action to ensure that nothing gets lost, be it changes to indexes, configuration settings, or forensic evidence of fraud.

When deploying a database as a data-tier application, we might get a warning message about version-drift when we attempt the deployment. The safest way to check for version-drift is to compare the metadata before each deployment. Are the tables, routines, settings, static data, indexes, and so on, in production the same as in the archived version, usually represented by the scripts in source-control? I'll show you how to do this.

Why is drift a bad thing?

I've made plenty of changes to a live system in the past: they're called deployments; that is what they're for. It should be easy to do the right thing. Version-drift shouldn't happen, but it does. If the deployment process weren't so irksome, then the temptation to circumvent it and 'patch' in a change wouldn't be so strong. If you are finding a lot of version-drift, then there is something wrong with the deployment system, in either its design or its implementation. If drift has happened, then we must delay deployment whilst we put it right.

If we create the deployment script by running a synchronization tool, and press on without correcting the drift, we will retain stuff that maybe shouldn't be there; whereas if we generate it from source control, we will lose stuff that perhaps ought to be there. In either case, there is a chance that something will break. The only way to correct drift is to look at the changes and 'sort out the sheep from the goats'; in other words, keep the changes that should have been deployed properly, and discard the alterations that were caused by humanity's imperfect state.

There is no sense in having an automated process to preserve unexpected changes since not all changes prove to be desirable and we need to inspect all of them before deciding how to deal with them. After all, while the unexpected change could be a genuine and effective change to improve performance, it could equally be part of a fraud, or an attempt to compromise the system's security. Even if unintentional, the mindless thrashings of a DBA in a muck-sweat can result in a change that leaves a security vulnerability, or a performance time-bomb, in the database. We must treat all unexpected changes as suspect until we prove them otherwise.

When there is an auditing system in place for a production system, it is unlikely that you'll hit version-drift, because you'll know of any change. Many shops will have an automated alarm that detects an unscheduled metadata change in a production system and rings the big red bell on the wall, like the ones you see in the Bond films. (Okay, it sends an email alert.) In any event, it is part of a good database intrusion-detection system. We can and should, of course, trap and record any DDL changes as they happen and keep a log of them, as part of an auditing system. Depending on the method of doing this, it can come with a cost in extra resources, though in the case of Extended Events it isn't significant.
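To give an idea of what such DDL logging involves, here is a minimal sketch that installs a database-scoped DDL trigger to record every schema change in an audit table, using SMO from PowerShell as elsewhere in this article. The table and trigger names (dbo.DDLAudit, trgAuditDDL) are my own inventions, and a real auditing system would need rather more thought about permissions and retention.

# A sketch only: install a database-scoped DDL trigger that logs schema changes.
# The names (dbo.DDLAudit, trgAuditDDL) are hypothetical; adapt before use.
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO')
$server = new-object Microsoft.SqlServer.Management.Smo.Server 'MyProductionServer'
$db = $server.Databases['MyDatabase']
# one batch for the audit table...
$db.ExecuteNonQuery(@'
CREATE TABLE dbo.DDLAudit
  (AuditID int IDENTITY PRIMARY KEY,
   EventTime datetime NOT NULL DEFAULT GETDATE(),
   LoginName sysname NOT NULL,
   EventData xml NOT NULL)
'@)
# ...and a second batch for the trigger, since CREATE TRIGGER must start a batch
$db.ExecuteNonQuery(@'
CREATE TRIGGER trgAuditDDL ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
INSERT INTO dbo.DDLAudit (LoginName, EventData)
VALUES (ORIGINAL_LOGIN(), EVENTDATA())
'@)

The EventData column then holds the full XML description of each change, including the T-SQL that was run, which is exactly what you need when deciding whether drift was legitimate.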

How does drift happen?

When circumventing the deployment process in order to make a change to a live database system, we make a judgment that the risks of not making the change are greater than the risks of doing so, and hit the button. It could be that the production system isn't important enough to matter. It could be that the proper deployment process is horribly broken, or that the person in charge of deployment is a power-crazed cabbage. One of the downsides of teamwork is that we learn to tolerate the failings of others, so we manage the change retrospectively. We can't really assume that we can simply blitz all such database changes. We just have to accept that legitimate version-drift happens. It is frustrating at times, but one can feel a twinge of sympathy with anyone who prefers to go home in good time rather than rolling up their sleeves, updating the same version in Source Control, installing it on the integration server, running the automated integration tests, and then doing the deployment.

Version-Drift and Database Deployment

We need to check for 'uncontrolled' version drift before any deployment. If the metadata check reveals that the scripts in Source Control reflect what is in production and they are at the same version, then we breathe a sigh of relief. If there has been version-drift, without the changes going into Source Control, then our day starts to get more complicated.

Some differences, such as replication metadata, are added to the production server automatically on deployment and should be ignored. We can also ignore bogus differences arising from machine-generated names in the SQL Server metadata for 'anonymous' keys.

Other modifications, however, we can't ignore. We need to review all the changes. If we're in luck, the developer will have documented the change in the live database, via extended properties or headers. If we need to keep the changes then we need to update source control and rerun the pre-deployment automated tests before attempting any deployment.

Checking for Version-Drift

So, what's required? As discussed, we need to inspect all changes. This process needs a human: changes need to be understood, and that requires eye-balling. We need to look at the differences between what's on the live system and what Source Control thinks is in this version. The best way of doing this is by means of a schema-comparison tool. This will give you a side-by-side rendering of the metadata in the production database, and the metadata as represented by the version in source control, or the build from it on the integration server. You can then scroll through all the changes rapidly and get a quick appreciation of any 'drift'.

I generally don't even touch the production system, but look at a recent backup set from it. I also compare directly with what's in source control via a Scripts Folder. If you have the SQL Toolbelt, and you haven't yet discovered SQL Compare, then you're in for a happy surprise, because SQL Compare does this fast and elegantly. You don't even need a database, but can compare a scripts folder directly with a backup set.

With a schema-comparison tool, the process is straightforward. You just kick off a new project if it is the first time you're doing the comparison, or select an existing one. You then specify whether you wish to use a database, backup, snapshot or scripts folder for the database you want to check, and fill in the appropriate details. You do the same thing for the database you want to compare it against, and away you go. You don't need the advanced synchronization features, only metadata comparison; you can navigate through the source by the type of database object and get immediate information about which database objects have changed, and how many of each.
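Incidentally, if you want this check to run unattended as a deployment gate, SQL Compare also has a command-line version. The sketch below is from my memory of its documentation, so treat the switch names and the exit-code convention as assumptions to verify against the version you have installed.

# A hedged sketch: compare production against the scripts folder from the command line.
# Verify the switch names and exit-code behaviour against your version's documentation.
$SQLCompare = 'C:\Program Files (x86)\Red Gate\SQL Compare 10\SQLCompare.exe'
& $SQLCompare /Server1:MyProductionServer /Database1:MyDatabase `
    /Scripts2:'E:\SourceControl\MyDatabase' /AssertIdentical
if ($LASTEXITCODE -ne 0)
    { write-warning 'Possible version-drift: inspect the differences before deploying.' }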

If you don't have access to a database-schema comparison tool, then you can still do it, but in a rather more long-winded way: you use a differencing tool on the two source files. A text comparison tool is part of every programmer's toolkit, even if only used when embedded into your source control system. There are a few snags that I've come across with this method: you need to script the live system and compare it with an integration database (one built from the source in source control for a particular version); the script-generation process is surprisingly slow; and the comparison is purely text-based rather than semantic.

[Screenshot caption: Hah! The foreign keys are unchecked in the clone]

Using a Programmer's File-Comparison Tool

WinMerge is free and open-source, published on SourceForge under the GNU GPL. It is probably the most popular of the many tools that show the difference between two text files; there are plenty of others, such as ExamDiff and Beyond Compare. A file-comparison tool will not help us much with an ordinary build script, such as the ones generated from SSMS, because we need the order of the objects to be the same in both scripts. SSMS will sort the tables in dependency order, but you'll soon find that this can result in any number of orders for identical databases. We have to resort to SMO to get the compare script we want, which is merely a sequenced collection of all the individual scripts. Remember that, unlike a proper database comparison tool, a file comparison tool has no knowledge of semantics, only text differences: it doesn't understand SQL!

To generate the scripts, we need to build the integration or staging server to the version in source control, and compare this to an exact copy of what is in production, obtained by restoring a backup. By happy coincidence, a test restore is something we need to do occasionally anyway. We can then run a script that will do something like this (I use PowerShell, but C#, F# or IronPython would be just as good):

<# This script creates two comparison database scripts that you can then compare with a file comparison tool. This is used for checking that the production database matches the copy of the database in Integration or Staging. We store them in a directory, placed in subdirectories according to server and database. #>
$FileComparisonTool = 'C:\Program Files (x86)\WinMerge\WinmergeU.exe' # or whatever you choose
$directory = 'E:\MyScriptsDirectory' # the directory where you want to store the scripts
$filenames = @() # we make a list of the filenames
<# and we have a list of the serverInstance/Database we want the script from. There is a third column which is blank if it is Windows authentication and has the default login if it is SQL Server authentication #>
$Databases = @(
    ('MyServer', 'MyDatabase', 'PhilFactor'),
    ('MyProductionServer', 'MyDatabase', ''))

<# This script will actually create as many comparison scripts as you want. You can then use them later on. #>
Trap {
    # Handle the error
    $err = $_.Exception
    write-host $err.Message
    while ($err.InnerException) {
        $err = $err.InnerException
        write-host $err.Message
        }
    # End the script.
    break
    }

set-psdebug -strict
$ErrorActionPreference = "stop"

# Load SMO assemblies, checking that each load was successful
$MS = 'Microsoft.SQLServer'
@('.SMO', '.SmoExtended', '.ConnectionInfo') |
    foreach-object {
        if ([System.Reflection.Assembly]::LoadWithPartialName("$MS$_") -eq $null)
            { throw "missing SMO component $MS$_" }
        }
# just to prevent the need for retyping
$My = "$MS.Management.Smo"

$Databases | foreach-object {
    $ServerName = $_[0] # the name of the server or instance for this job
    $DatabaseName = $_[1] # the name of the database for this job
    $Login = $_[2].Trim()
    <# and we can now establish the filename based on the server and database, and maybe create the directories #>
    $Scriptdirectory = "$directory\$($ServerName -replace '[\\\/\:\.]','-')\$($DatabaseName -replace '[\\\/\:\.]','-')"
    # a little easier than doing it the PowerShell way
    if (![System.IO.Directory]::Exists($Scriptdirectory))
        { $null = [System.IO.Directory]::CreateDirectory($Scriptdirectory) }
    $filename = "$($Scriptdirectory)\$($DatabaseName)_Compare.sql" # note the different suffix from the build script
    $filenames += $filename # remember the filename
    if ($Login -ne '')
        {
        $Credentials = get-credential -Credential $Login
        $mySrvConn = new-object Microsoft.SqlServer.Management.Common.ServerConnection
        $mySrvConn.ServerInstance = $ServerName
        $mySrvConn.LoginSecure = $false
        $mySrvConn.Login = $Credentials.UserName
        $mySrvConn.SecurePassword = $Credentials.Password
        $s = new-object ("$My.Server") $mySrvConn
        }
    else
        { $s = new-object ("$My.Server") $ServerName } # find the server instance
    if ($s.Version -eq $null) { Throw "Can't find the instance $ServerName" }
    $TheDB = $s.Databases[$DatabaseName]
    if ($TheDB.name -ne $DatabaseName) { Throw "Can't find the database '$DatabaseName' in $ServerName" }
    <# now we will use the canteen system of SMO to specify what we want from the script. It is best to have a list of the defaults to hand and just override the defaults where necessary, but there is a chance that a later revision of SMO could change the defaults, so beware! #>
    $scripter = new-object ("$My.Scripter") $s # script out the database creation
    $scripter.Options.AllowSystemObjects = $false # who wants 'em?
    $scripter.Options.ExtendedProperties = $true # yes, we want these
    $scripter.Options.ContinueScriptingOnError = $false
    $scripter.Options.DRIAll = $true # and all the constraints
    $scripter.Options.Indexes = $true # yup, these would be nice
    $scripter.Options.Triggers = $true # this should be included when scripting a database
    $scripter.Options.IncludeHeaders = $false # they have today's date!
    $scripter.Options.ToFileOnly = $true # no need of string output as well
    $scripter.Options.IncludeIfNotExists = $false # not necessary
    $scripter.Options.Filename = $filename
    $scripter.Options.AppendToFile = $false # because this is the first part that gets written
    $scripter.Options.IncludeDatabaseContext = $true
    $scripter.Script($TheDB) # do it
    $scripter.Options.AppendToFile = $true # we add each object script
    foreach ($object in @("Tables", "Functions", "Views", "XmlSchemaCollections", "Assemblies", "StoredProcedures", "Schemas", "Triggers", "Roles", "MessageTypes", "Contracts", "Queues", "Routes", "Services", "RemoteServiceBindings")) # there are more in later versions, but not so important
        {
        $ObjectCollection = $TheDB.($object) # get the collection
        if ($ObjectCollection.count -gt 0) # if there is anything there
            {
            $ObjectCollection | # for each in the object collection
                Sort-Object name | # just in case
                    foreach-object { # prefetch makes no difference at all
                        $scripter.Script($_) # and script each individual object
                        }
            "done $object for $ServerName, $DatabaseName at $(get-date)"
            }
        }
    }

if ($filenames.count -gt 1)
    { & $FileComparisonTool $filenames[0] $filenames[1] }

A watched kettle never boils

Just don't hang about, watching the screen, because little furry mammals evolve in the time this script takes to run; AdventureWorks takes two minutes locally. No, there are definitely good reasons for having SQL Comparison tools. However, with advance planning, we can prepare all the scripts overnight.

Of course, for a great deal of the database it is much quicker to get the script from the metadata, via SQL, but the build script for tables and the related DRI isn't stored within the database metadata. We can write a SQL routine to generate our table-build script, but getting everything we need that way is immensely complex. It is also possible to pull the metadata down to a local database, but the devil is in the detail.
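As a sketch of that quicker route, for the code modules at least: SQL Server stores the source of procedures, functions, views and triggers in sys.sql_modules, so one query retrieves the lot far faster than object-by-object SMO scripting. The tables, as I said, can't be fetched this way, and the file path here is just an illustration.

# A sketch: fetch the source of all code modules in one query via sys.sql_modules.
# This is far quicker than SMO scripting, but covers code modules only, not tables.
$connection = new-object System.Data.SqlClient.SqlConnection `
    'Server=MyServer;Database=MyDatabase;Integrated Security=SSPI'
$query = @'
SELECT OBJECT_SCHEMA_NAME(object_id) AS [schema],
       OBJECT_NAME(object_id) AS [name], definition
FROM sys.sql_modules
ORDER BY [schema], [name]
'@
$command = new-object System.Data.SqlClient.SqlCommand ($query, $connection)
$connection.Open()
$reader = $command.ExecuteReader()
$moduleSource = @()
while ($reader.Read()) { $moduleSource += $reader['definition'] } # collect each module's source
$connection.Close()
$moduleSource | set-content 'E:\MyScriptsDirectory\MyDatabase_Modules.sql'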

One way of speeding up the process in C# is to increase the number of threads, to an optimum of around six (thanks to Linchi Shea for discovering this); in PowerShell, we can run the scripting for each database simultaneously.
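A sketch of the PowerShell route, using background jobs; it assumes the scripting logic above has been moved into a ScriptDatabase.ps1 file (a name of my own invention) that takes the server and database as parameters:

# A sketch: script each database in parallel with background jobs.
# ScriptDatabase.ps1 is a hypothetical wrapper around the SMO scripting above.
$jobs = $Databases | foreach-object {
    Start-Job -FilePath 'E:\MyScriptsDirectory\ScriptDatabase.ps1' `
        -ArgumentList $_[0], $_[1]
    }
$jobs | Wait-Job | Receive-Job # block until every script has been written
$jobs | Remove-Job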

My solution would be to shrug, plan this in advance, and run it on a scheduler.
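With the PSScheduledJob module in PowerShell 3.0, that takes only a couple of lines; a sketch, with a hypothetical script path:

# A sketch: generate the comparison scripts overnight, ready for the morning.
# Requires the PSScheduledJob module (PowerShell 3.0); the path is hypothetical.
$trigger = New-JobTrigger -Daily -At '2:00 AM'
Register-ScheduledJob -Name 'NightlyCompareScripts' `
    -FilePath 'E:\MyScriptsDirectory\GenerateCompareScripts.ps1' -Trigger $trigger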

Conclusions

Before any deployment, it is worth checking that the version of the database in production is the same as the version you think is there. If all is well, then we will know exactly what is in the production database since, for all practical purposes, it will be what's in source control for the same version. If we've suffered 'Version-Creep', and find there are differences, then we need to understand why these changes have happened and decide if they need to be made permanent by retrospectively legitimizing them in source control.

This article is part of our database delivery patterns & practices series on Simple Talk.

Find more articles on version control, automated testing, continuous integration & deployment.

Phil Factor

Author profile:

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Subject: Using checksum queries to identify schema changes
Posted by: Michael Baylon (not signed in)
Posted on: Thursday, December 06, 2012 at 6:33 AM
Message: Nice post.

Another relatively simple way to identify 'unexpected' changes to a production database is by carrying out checksum queries on INFORMATION_SCHEMA views for the different types of SQL objects.

In outline, run the checksum queries on your production schema (and then ideally store the returned values in a 'build version' table).

Then run the same queries before carrying out the next deployment.

If the production schema has not changed between deployments, the checksum values will be the same. If they differ, it indicates a change has taken place.

More detail on this process can be found on the following post -

http://michaelbaylon.wordpress.com/2010/10/16/using-checksums-to-test-for-unexpected-database-schema-changes/
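A minimal sketch of this checksum approach (the choice of columns to checksum is illustrative only), using PowerShell as in the article's other scripts:

# A sketch of the checksum idea: one number summarising the column-level schema,
# stored at deployment time and compared before the next deployment.
$connection = new-object System.Data.SqlClient.SqlConnection `
    'Server=MyProductionServer;Database=MyDatabase;Integrated Security=SSPI'
$query = @'
SELECT CHECKSUM_AGG(CHECKSUM(TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
                             DATA_TYPE, IS_NULLABLE))
FROM INFORMATION_SCHEMA.COLUMNS
'@
$command = new-object System.Data.SqlClient.SqlCommand ($query, $connection)
$connection.Open()
$schemaChecksum = $command.ExecuteScalar()
$connection.Close()
"Column-level schema checksum: $schemaChecksum"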

Subject: SSDT is free
Posted by: Peter Schott (not signed in)
Posted on: Thursday, December 06, 2012 at 10:05 AM
Message: Assuming this is for SQL Server, SQL Server Data Tools is available for free and works backwards to SQL 2005. There's a built-in Schema Comparison tool that is serviceable. It's better to keep up with the schema changes, but it's at least something. Wish it could ignore non-essentials (square braces, extra parentheses) a little better, but it works.

Setting up DDL Audits is definitely worthwhile. Automated notices of schema changes make keeping up with that drift a bit easier. I blogged about doing that a while back; it could save someone some time.

http://schottsql.blogspot.com/2010/02/ddl-schema-change-auditing-on-sql.html

Subject: SSDT + Git works well
Posted by: Anonymous (not signed in)
Posted on: Monday, December 10, 2012 at 3:18 PM
Message: I've recently switched away from using SSMS + Redgate Source Control for this purpose to using SSDT + Git and I'm very happy with the results. Redgate has served me well but it's always been a bit slow and not well integrated.

Switching to the full project model and deployment methodology in SSDT really makes this work a lot easier, especially the ability to merge schema changes in either direction (so you can easily review and accept/reject those rogue DBA changes from production :)

 
