Click here to monitor SSC
Av rating:
Total votes: 31
Total comments: 3


William Brewer
Database Source Control - The Cribsheet
08 November 2011

Database Source Control- The Crib Sheet

For things you need to know rather than the things you want to know

As part of our long-running Cribsheet series, we asked William to come up with a brief summary of what was involved in bringing database development work under source control. What are the advantages it brings, and are there disadvantages?

Why have source control in a database?

Source Control is a requirement for any team development projects. In fact, it is essential for any development work. When an application depends on the coordination of a development that involves both database and application development, the scripts must be treated together, with the same version numbers applied,  so that any working build can be reproduced. To do this, it is important to be able to tie database changes to the code changes that they relate to, by checking them into source-control as a part of the same changeset transaction. This makes it far easier to make consistent builds and deployments. The complications with  integrating the database development into application development come with deployment of new versions from source control to existing live applications

With most developments, there are several points in the process where a consistent build that works should be available. For databases, there must be a correct version of the build script, not just for the database structure and routines, but also for at least the key ‘static’ data, and configuration information.  Everything that goes towards the build must be in source control, including configuration files, and it should be possible to build the entire application from these scripts. Where a deployment needs to be made to a live database, there must sometimes  be additional 'migration scripts' or 'change scripts' to enable existing data to fit in any changed schema.

Although the use of a predefined interface between the application and the database will help to keep conflicts to a minimum, it does not obviate the need for Source control, but just minimises the pain: It is likely that this interface will need to  change when the requirements of the applications change. It is much better to use source control than to stick rigidly to an interface definition that is too archaic and inflexible to meet the needs of the application.  The same is true of an Object-relational mapper (ORM). The conflicts just become more subtle and hard to eradicate if source control is not used.

Source Control allows developers more freedom to create branches of the codebase for testing or for releases, thereby making the role of the testers, configuration managers and releasers much easier. The alterations in the branches can then be merged back into the main trunk of development. This makes it easier for the team to work concurrently.

Traditional source control systems are not always an easy fit with databases, because of the diffuse nature of the code and the close interdependency of DDL and data. Problems have emerged when teams have assumed that database source code consists merely of a number of ‘objects’. A glance at the diagram of the interrelationships of the SQL Server 2008 system views will tell you why.  The dependencies within a working system are complex: for example, one needs only to change one item in the database configuration (e.g. the collation) to stop the database working.

However, source control brings with it obvious benefits

Protecting production systems from ‘uncontrolled’ changes and fraud.

Because of the ease with which alterations can be made to the code, structure, or configuration of a production database, there must be checks against ‘uncontrolled’ changes. The code that has been tested must be compared against the code that is released to make sure it is identical, and that the team knows about the release, can see the changes and have agreed to the release. It must be possible to check that no unplanned and untested fixes or functionality are made directly on the live system. If the code on the production system is identical to what is in source control, then it is easy to see who did what, and when it was done. Hopefully, also, it will tell you why it was done.

Coordinating and monitoring the development work

Everything that is to be released should be kept in source control so that there is a complete record of all changes that have been made to the live system. It must be possible to see which developer is working on which particular module, to see which changes have been applied and when they were applied, which modules are available for release, the current state of the production system and the current state of any test systems

Maintaining a history of changes made to the source

At all times, the scripts within the Source Control system  represent the master version of the source, and provide a historical archive of all the changes so that

  • The database structure can be rolled back to a previous version
  • Older versions can be created to find when a subtle bug was introduced.
  • A code review can check coding standards and  conventions
  • The reason for an obscure change done in the past can never be lost.
  • A series of changes done to a module for a temporary problem can be reversed out when no longer needed.

Supporting team-working

Source control is the most effective way of ensuring that each developer is always working on the latest version of a script for a given object (procedure, table etc)

Making more effective use of time and code

Where developments result in branches and merges, the complexity of database code quickly precludes the use of informal approaches to archiving source. Most commonly, branches are used for a release that requires a code-freeze so that development can continue. Merges are often required if a change is added to the release fork, (normally as a result of a bugfix) which weren’t also added to the development fork. It may also be necessary to develop two versions of the software at the same time. This could be where one version, the branch, has bugs fixed, but no new features, while the other version, the trunk is being worked on to apply new features.

Facilitating  testing

Source control also makes it easier to release code to the various stages of testing.  For example, the continuous-integration server must be able to build and update its own copy of the database, so that it can run automated tests of code and scripts that are checked in at the same time.

Source control clients

SSMS, the most commonly-used  IDE for developing SQL Server databases,  allows integrated source control within the Query Window. The client plugin varies according to the source control provider.  It is not supplied with SSMS but purchased separately.  After the Source Control  components are installed, they can be configured via the tools menu (Options ->Source Control-> Plug-in selection) which allows you to select a source control product.  The problem with this approach is that it does not enforce any restriction on the database, and so does not prevent alterations from being made outside source control.  SQL Source Control does not use this architecture as it aims for a much closer-integrated source control with SSMS

Types of Source Control Providers

 Although the most attention is generally given to the client that interacts with the Source control system in use, such as VisualSVN, TortoiseSVN or SQL Source Control, the most important part is the actual Source control system , such as SubVersion ,Vault,  GIT,  SourceSafe or TFS. These can use either the centralized or distributed model

Centralised Source Control Model

A large number of revision-control systems exist and are used for Development source Control. The majority are centralized, and based on the ‘library’ or ‘repository’ model, and based on CVS.  These use the concept of check-out and check-in and rely on file or module  locking to prevent clashes.

‘Check-out’ refers to the process of getting exclusive write-access to that module. Others can then read it but not alter it. ‘Check-in’  tells the source control system that you are relinquishing write access and are updating  the source so as to make your changes available to all co-workers. This process is usually ‘atomic’ in that it either succeeds in its entirety or is rolled-back.  It is easy to subvert this method of revision control.

 A more subtle approach uses version-merging, which attempts to deal with simultaneous edits by merging the results of the edits of different developers.  This works well, and creates less distraction for developers, until several people work on the same code, at which point it can easily result in inconsistent code that does not compile or run.

Distributed Source control

A distributed model no longer has the concept of a single ‘master’ source, and is used for large projects such as Linux and Mozilla where a number of developers need to co-work, sometimes in cases where network-contact isn’t permanent. As all co-workers keep a full copy of the project, it is very resilient.  It also allows developers a great deal of freedom to safely ‘sandbox’ their work. Most of these systems allow a centralised control of the ‘release version’ of a project if necessary.

 Distributed source control systems give each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch. Merging in many of these systems can be extremely sophisticated, based on the histories of the common ancestors, or merit of previous changes, and users are informed if a merge is impossible.

Team Working Database Applications  with source control

Application developers are used to agreeing  up-front on standards  so that they can  work together on code effectively, without friction or misunderstandings. These standards usually include

  • Styles and structure for commenting and documenting  code
  • check in/check out from source control
  • Coding best-practices and the definition of ‘code smells’.
  • Sharing a consensus Object model
  • development tools and software, and when they are used
  • Naming conventions
  • Procedures for defining the features and fixes that will be included in each build of the application

Development teams will  have processes for developing against several versions of the code-base, testing builds, packaging builds for deployment to test and production, and for rolling back a build.

Experience has shown that attempts to couple the development of the database too closely to this paradigm become fraught.  It certainly can lead to application developers criticising, or interfering with, base tables that aren’t even accessible to the application, and with only a partial understanding of the performance issues. It can lead to bizarre naming conventions, and tortuous attempts to bend the relational model to the object-oriented world-view.

 Where tables, views or routines (TVFs or stored procedures) make up the application-interface, they not only need to be in source control but should be commented and documented to project standards.  Although a mechanism exists to attach comments to tables and other database objects for which no script is stored within the database, (extended properties)  it is rare to find it used properly. The trickiest problem has always been to make this documentation available for the application developers in visual studio who are using ASP.NET, in the form of intellisense, though Entity framework and Linq have made inroads on this problem.

Database Development Methodologies.

The most draconian approach to source control involves the use of a single build-script for the entire database or schema, and to use it for all DDL as well as DML. This means that there is only one source control ‘object’ to check in or out per schema. (a database can have as many schemas as you wish)   This allows permanent comments and comment-blocks anywhere, including tables, columns, and parameters. This ensures that build scripts will compile as the order is predetermined in the script.  This approach has serious disadvantages, though:  It allows only one person to work on a schema at a time, and it has to include the BCP routines for the insertion of test data since it will destroy all existing versions of tables as part of the script. This approach fits comfortably with the idea of the schema in SQL Server.

A more generally satisfactory approach for the larger team is to work at ‘object’ level, where scripts for individual objects can be checked in and out independently.  This allows work at a more granular level, but it is easily possible to make database changes via the object browser or in script that affect one or more ‘objects’.  The use of the term ‘object’ has a different connotation to the actual way that database entities are related. Columns, indexes and parameters are not ‘objects’ within the system views, whereas constraints are.  Build scripts can easily have different parts of a ‘table object’ built at different parts of the script. If database ‘objects’ are stored independently of a complete build script, then these will need to be combined into a build script. Unfortunately, the order of scripting is important, and so there is an added risk at the point of creating a  build. This has led to the practice of maintaining a complete build script in addition to the object scripts.

Many developments use a shared development server. This is often used where the test database has to run against large test data, sometimes even an obfuscated copy of the actual data. This can be made to work but the chances are not always good. It relies on structured headers and information in extended properties to keep a tally of the work in progress and the author of an alteration.  The entire database script is then checked into source control at regular intervals, as well as all occasions when the team are likely to require a consistent build.  The code must be put in source control as a series of table-scripts and routine-scripts, as a complete build script, or both.  Unless the developers are particularly orderly in their approach to Source Control, it is usually better to use the hybrid approach as described in the next section.

Development databases.

Database developers need to have a ‘private’ version of the current database for doing the bulk of their development work, but they will need to be able to access a shared database that is built from the definitive source in source control for testing with realistic data sets and creating test data, and for making make ‘uncontrolled’ experimental changes that are overwritten from the master source in source control. It should be impossible to check in source directly from this shared database

Each database developer should either have a local version of SQL Server on their own workstations, or have access to a ‘sandbox’ server with instances for each developer. This allows them to test out routines, database configurations, and the effects of different database/server properties on performance, even to the extent that it crashes a server, or locks up the database without affecting others.  Additionally, there ought to be a central database development cell accessible to all, which allows test data to be created and updated with releases. This environment is 'uncontrolled' in that developers can change it at will - but agreement is needed before making changes that affect structure and common routines. This should be refreshed from time to time to get rid of test data.

Updating the database from source control

Ideally, it should be possible to build the application, both the client-side and database, from the scripts that are in source control. This is essential if you are using continuous integration with your development, because automated builds, with unit testing, will be either triggered on every check-in of source code or at a set time. With a database, there are problems with automating this approach. For a start, databases are not normally part of a unified Check-in, and build,  process. More importantly, change or deployment scripts (also known as Migration scripts, or rollout scripts) will also be required to supplement the code that synchronizes the live database with what is in source control. If the database has been subject to any refactoring between the two versions, it is impossible to modify a database entirely automatically without scripting some of the intermediate stages. If tables have been changed and data has been rearranged, or modified, one cannot merely change the schema. The data must be changed as well, as it may have been modified as apert of the refactoring, or be in different tables under different constraints. The problems of migrating the data to fit the schema will depend on constraints, referential integrity and a number of other factors. Where, for example, data has been de-duplicated at the time of the imposition of a unique constraint, then that de-duplication script must be included.  If a VARCHAR column has been reduced in size, then some strings may need to be truncated. A rollback may require a restore from backup, or for the tables to be filled with data from a previously made BCP native output file, since migration scripts aren't normally reversible.

Any Database Source Control system must therefore include the special Change or deployment scripts to ensure that the current data in the database is preserved and does not cause constraints to fire. Not only must  these extra scripts be included but they must be executed at the correct point in the build.

Conclusion

The lack of suitable and simple tools for doing database Source control in the past has led to a number of problems with team working on database applications. Source Control must be perceived by the developers as a way of saving on administrative work and speeding development. It should never get in the way.

Now that suitable tools exist for all the IDEs that are used for developing SQL Server databases , and the quality of source control systems has improved greatly, we’re reaching the point where Source Control for database developments can be subject to a list of simple best-practices rather than being an endless source of friction within development teams, and pain for those people tasked with providing consistent builds for applications.



This article has been viewed 13361 times.
William Brewer

Author profile: William Brewer

William Brewer is a SQL Server developer who has worked as a Database consultant and Business Analyst for several Financial Services organisations in the City of London. True to his name, he is also an expert on real ale.

Search for other articles by William Brewer

Rate this article:   Avg rating: from a total of 31 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Tools, and distributed version control
Posted by: Corrin (not signed in)
Posted on: Monday, July 12, 2010 at 2:57 PM
Message: Firstly William, good article on an important topic. There's two points I'd like to raise, the first relates to your discussion of distributed version control and the second to tools.

One big issue I have with version control systems is they feel like they're designed for application development and so tend to treat branch management as an advanced feature while even tiny IT shops will run PROD, PREPROD, TEST and DEV database instances. Distributed systems make it extremely easy to promote changesets between branches and non distributed systems don't. This means that a core operation that is fundamental to database development is handled much better by distributed version control.

Having said that I don't use distributed source control for the simple reason that the quality of GUI integration is not there yet. I use a product called Source Control by Red Gate which is an extremely good SSMS add-in. Because this tool integrates so tightly with the database it makes version control part of my day to day working rather than a hassle.

I noticed you did not name any tools in your article, so I am curious if there are any other good tools out there - especially tools that support distributed version control.

Subject: Naming of tools
Posted by: WBrewer (view profile)
Posted on: Tuesday, July 13, 2010 at 2:40 AM
Message: I wanted to write a general article about the various issues of Database Source/version Control from a developer's viewpoint, rather than do a review of available tools. In reviewing existing practices, one would need to tackle the thorny topic of using SMO scripts directly with the Provider's directories, which, up to now, has been a widespread practice. (I used TortoiseSVN!)

I now run SQL Source Control against Subversion. It is a pretty obvious and uncontroversial choice if you want to stay with SSMS. I'm keen on eventually trying GIT though, if SQL Source Control decides to support it!




Subject: database source control
Posted by: fairwan (view profile)
Posted on: Tuesday, August 03, 2010 at 11:27 PM
Message: A good read on an important and often overlooked subject. I completely agree with the need for an object based source control strategy which is tied to a release process used by both application developers and database developers.

I am a SQL server develoment DBA / Developer. I developed a solution where we have a "production trunk" which exactly matches the production code and a "Development Branch". Changes are made & checked into the development branch.

A Release Builder utilty is used to select and order the release objects which are checked in against this task and create a backout -so that the release can be easily backed out at a later date.

A Deploy tool (written in c#), releases only SQL code which has been checked in against the release task (TFS work item). It releases from source control directly, in the order specified by the builder. It is transactional. It also merges the Development Branch to the Production Trunk on a production release (but not on releases to other environments). This keeps the Production Trunk in step with the production database.

This solution has been successfully used in a team of up 30 developers for over 2 years now. I think that Microsoft should introduce something like this into their suite of tools. So often the database development tools are 2nd fiddle to application development tools!

This solution works well for maintenance and for entirely new releases and has the advantage of only releasing what has been deliberately checked in and chosen, rather than just releasing the differential between a devlopement database and live : a highly dangerous approach (in my humble opinion), used by other soltions on the market!

Although this solution is heavily integrated with Team Foundation Server's source control, it could easily be apapted for any other source control soltuion and any other destination RDMS.

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk