24 July 2009

Doing Schema Comparisons with Production Servers

There are usually strict rules against accessing production servers that generally apply even to the developers of the systems involved. John O'Shea comes up with a very elegant and versatile solution with Powershell that uses SQL Compare's ability to take, and compare, snapshots of databases to enable the developers to then compare development databases against the latest snapshots of production ones. Result? Happy developers!

Last week I was asked to help out on a project where a group of users that didn’t have access to our Production databases needed to be able to perform schema compares of Development versus Production databases.  We currently use Red-Gate SQL Compare (which is a great product by the way) so a solution was designed around it.  The solution we decided on was to use SQL Compare to generate Snapshots of Production nightly to a network share that could then be compared against without the need for granting access to Production (Note: a SQL Compare Snapshot is a flat file which contains all information needed to perform a complete schema compare, but no actual data). 

We wanted the process that generated the Snapshots to automatically run nightly against all online, non-system databases on all Production SQL Instances, also excluding any databases reserved for DBA usage only.  The process should also be able to be run manually for an individual database, one Instance or a group of SQL Instances.  My final solution involved 2 PowerShell scripts that reside locally in a script repository (C:\Script) on our SQL2008 Central Management Server & a scheduled SQLAgent job.  Note: a Central Management Server is not required for this functionality; any SQL Instance with PowerShell installed can run the SQLAgent job.  One script actually creates the Snapshots & one script runs 4 iterations of the create script in parallel (more on this later) in order to maximize resource utilization.

SQL Compare 8.0 Professional Edition is required to create Snapshots from the command line & to compare against the Snapshots.  There is a free utility from Red-Gate that is designed for command line creation of Snapshots (they are also usable by SQL Compare 7.0 and higher) that can be found here SQLSnapper, in our case we needed some of the features offered by Professional Edition so we went that route. 

The automated steps look like this…

  1. Scheduled job “Create RG Snaps” runs… 
  2. PWS Script “Run-RGSnaps.ps1” which makes 4 parallel calls to… 
  3. PWS Script “Create-RGSnaps.ps1” which save the snaps to…  \\server\FOLDER\RedGate_Snaps\ in… 
  4. SERVER$INSTANCE.database naming format

There are 3 methods by which the PWS Script “Create-RGSnaps.ps1” can be run manually.

  • One Database
    • Open PowerShell
    • Navigate to scripts repository, issue a “sl C:\Script” (no quotes) command
    • To run against yourdb on SERVER1\INSTANCE1, for example, issue a “./Create-RGSnaps.ps1 SERVER1\INSTANCE1 yourdb” (no quotes) command. .Example output can be seen in Appendix A.

  • One Instance (all databases)
    • Open PowerShell
    •  Navigate to scripts folder, issue a “sl C:\Script” (no quotes) command
    • To run against SERVER1\INSTANCE1, for example, issue a “./Create-RGSnaps.ps1 SERVER1\INSTANCE1” (no quotes) command.  Example output can be seen in Appendix A, but will contain more lines for additional databases.

    • Multiple Instances
      • Open powershell
      • Navigate to scripts folder, issue a “sl C:\Script” (no quotes) command
      • To run against for example, create a file named Instances.txt (or a name of your choice) in “C:\Script” that contains entries for (one line per instance)
      • Issue a “Get-Content Instances.txt ForEach {./Create-RGSnaps $_}” (no quotes) command.  Example output can be seen in Appendix A, but will contain more lines for additional databases.

The script that does the work “Create-RGSnaps.ps1” takes a required parameter of Instance & an optional parameter of Database.  If a value for Database is passed the script calls the command line interface for SQL Compare with no further processing.  If no value for Database is passed the script parses all Databases on the Instance that are active & accessible and created a Snapshot for each database.  The code is here, I’ll go over some of the functional points below.

Using the = $(throw ”) syntax on a parameter makes it required & generates an error if it’s not passed.

Variables defined for 2 values used later in the code, these could just as easily be defined at the time needed, it’s just a little neater & easier to find this way.

Since we re-create all snaps nightly we decided an occasional failure (for example if we were temporarily unable to connect to the server for some reason) was acceptable & we didn’t want to see any error message..

Is a generic function to return a result set from a SQL query into a hash table.  The function connects to a SQL Instance via the SqlDataAdapter Class in the .NET Framework using the connection information defined in $ConnString & a query defined in $Query.  In this case the query returns a list consisting of all core SQL instances to create snaps against for the scheduled run.

This function returns a list of accessible databases (AND has_dbaccess(name) = 1) from the SQL Instance specified in $Instance using the parameters defined in $Query which excludes system databases (dbid > 4) and DBA reserved (name NOT IN (‘distribution’,’DBADatabase’)) databases (including distribution) that are not developed against.

This is the function that does the actual work of creating the snaps, we build the command we want to run into $Command & then use an Invoke-Expression $Command to run the command.  This syntax ($InstanceStrng = ($Instance -replace('\\','`$')).ToUpper()) is used to change the instance name from a SERVER\INSTANCE format to a SERVER$INSTANCE format since \ is an invalid character in a filename. 

Looking a little more closely at the -replace syntax \\ resolves to \ & we need to escape the $ character which we do with the PowerShell escape character of `. 

In our case our machine is x64 so we must account for this in our path to the .exe (C:\Program Files (x86)\), if you machine is x86 just remove that part of the path (C:\Program Files\). 

The /force switch overwrites any existing files with the same name, this works for us since we do not have the need for historical schema.  If we did need a history we could easily accomplish this by removing the /force switch & adding a datestamp (see below for an example of this).  Notes: the month must be specified as MM (mm returns minutes; HH returns hours in 24hr format, hh returns hours in 12hr format.

You can get granular all the way to the millisecond level with -Format by extending the format definition (Get-Date -Format 'yyyyMMddHHmmssms').  The date will format in exactly the pattern in specified in the -Format  parameter; for example Get-Date -Format 'yyyyMMddHHmmssms' formats as 200907231613421342, Get-Date -Format 'yyyy-MM-dd HH:mm:ss.ms' formats as 2009-07-23 16:13:42.1342.

This portion of the code controls the workflow based on the parameters passed.  PowerShell is object oriented so code blocks (denoted by curly brackets {}) (Get-Databases $Instance, for example) results in an object that can then be piped to other objects (this will be important in a minute) using the pipe “|” command (no quotes). 

If a database is specified (If ($Database)) the code calls {Invoke-SQLCompare $Instance $Database $Path} directly passing parameters for the SQL Instance ($Instance), database ($Database) & file path ($Path)

If no database is specified the code calls the Get-Databases $Instance function, passing a parameter for SQL Instance $Instance which generates a list of databases to run against; the list of databases is piped to Invoke-SQLCompare passing parameters for the SQL Instance ($Instance), database ($_.name) & file path ($Path) using ForEach which will loop through each database generated by Get-Databases.   

The parameter for database breaks down like this $_. acts a placeholder for the current object.  The official definition for $_. is the current pipeline object; used in script blocks, filters, the process clause of functions, where-object, foreach-object and switches.  In other words, it holds the values you just piped.  We use name because the column name was selected in the query used to get the list of databases (SELECT name FROM dbo.sysdatabases (NOLOCK)...), if a different column name had been specified in the result set from the query, for example, DB (SELECT name AS [DB] FROM dbo.sysdatabases (NOLOCK)...) the variable would be $_.DB.

The script that runs the Create script in parallel “Run-RGSnaps.ps1” is based on code written by my boss; his blog is here & was only modified by me to meet my needs.  Since in PowerShell 1.0 there’s no way to spawn background processes the Run script launches new PowerShell threads until it reaches the number defined as it’s max & then checks every 15 seconds for less than the max threads (excluding itself) & if the threads are less than the defined max, spawns a new thread.  I take no credit for his excellent work, just want to include for completeness.  The script takes no parameters, it has a built-in function to get a list of Production Instances we have defined in a database & pulls back the ones marked active & passes them to the create script to run in parallel. 

I tested running serially and with 4 & 8 iterations in parallel with nothing else running on the host server against 447 databases on 26 Instances.  The best balance between performance & resource utilization was at 1 iteration per CPU core (in our case, 4 parallel iterations).  The main server used for testing is a Proliant DL580 G3 with 4 single core processors, 16GB of RAM & Gigabit NIC.  I also tested on machines with both fewer & greater cores & confirmed the iteration per core balance.

Serially resource usage was minimal we saw 8% CPU, completion time was 56 minutes on average; at 4 parallel iteration we saw 33% CPU & completion time of 15 minutes on average; at 8 parallel iteration we saw 85% CPU & completion time of 12 minutes on average.  The code is here, I’ll go over some of the functional points below.

A collection of control variables defined at the top of the script for manageability. $ScriptRoot is the working directory; $SQLPSXServer, $SQLPSXDatabase & $Query define where we’ll pull our list of SQL Instances to run against from.  $ServerList defines an array the list of SQL Instances will be loaded into.

Calls the Get-SqlData function (defined separately in this script but the same as discussed above) passing parameters for SQL Instance ($SQLPSXServer), database ($SQLPSXDatabase) & query to run ($Query).  The results are piped to the $ServerList array defined at the beginning of the script using the add ({$ServerList.Add($_.Server) > $null}) method.

Is the code that actually launches the new PowerShell threads (basically just starts another process).  $StartInfo defines a new object in PowerShell from the ProcessStartInfo .NET Class.  $StartInfo.FileName defines the exe that will be started using the $pshome built-in variable which contains PowerShell’s home folder (where the exe resides).  $StartCommand defines the script to be run & any variables that should be passed (+ "Create-RGSnaps.ps1 "). $StartInfo.Arguments defines any arguments against the exe when started. -NoProfile starts PowerShell (or any exe) minimized & -Command runs the command (defined in $StartCommand).  [System.Diagnostics.Process]::Start actually starts the process.

Calls the Get-SqlList function (see above) passing parameters for SQL Instance ($SQLPSXServer), database ($SQLPSXDatabase) & query to run ($Query) which populates the array ($ServerList) used in the main While loop.

This is the main control flow of the script.  While the array remains populated (While ($ServerList.Count -gt 0)) it

  • checks the number of PowerShell processes running (If ($(Get-Process | where {$_.ProcessName -eq 'PowerShell' -and $_.Id -ne $PID} | Measure-Object).count -lt $MaxThread)) &

    • if the number of processes running is less than the max defined in $MaxThread
      • takes the top value from the array ($Server = $ServerList[0])
      • calls the LaunchThread $Server function passing a variable for SQL Instance ($Server) &
      • removes the value from the array ($ServerList.Remove("$server"))

      o   if the number of processes running is not less (equals or greater) than the max defined in $MaxThread (Else)

      • Sleeps for 15 seconds ([System.Threading.Thread]::Sleep(15000)) and re-checks

That’s basically it, our users have been highly satisfied with this solution & the process has been running flawlessly for several weeks now with no intervention needed on our part.

Appendix A

761-Appendix%20A-smallr.jpg

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

Downloads

This post has been viewed 12877 times – thanks for reading.

Tags: , , , , ,

  • Rate
    [Total: 17    Average: 4.9/5]
  • Share

John O’Shea is a SQL Database Administrator with 10 years of direct SQL experience ranging from SQL 7 to SQL 2008. He currently works for Raymond James, a diversified financial services holding company with subsidiaries engaged primarily in investment and financial planning, in addition to investment banking and asset management located in St Petersburg Florida. His blog SQL, Powershell & Etcetera, can be found here. John is interested in SQL, Powershell, Strength Athletics & Sci-Fi.

View all articles by John O'Shea

  • JohnC

    Wouldn’t it be nice.
    If the Red-Gate people put all this behind a UI? I need to “automagically” archive and generate reports of schema (and configuration) differences from week to week to email to audit.. The Red-Gate software is almost there it just needs a nice UI and a tie-in to the window scheduler.. Then they could sell the product to the thousands of auditors looking for a solution to track schema/config changes. All this powershell stuff is great until it breaks. It’s a whole market they are missing..