Click here to monitor SSC

Tony Davis

Simple-Talk Editor
News, views and good brews

Sweet and Sour Source Control

Published Thursday, May 27, 2010 3:37 PM

Most database developers don't use Source Control. A recent anonymous poll on SQL Server Central asked its readers "Which Version Control system do you currently use to store you database scripts?" The winner, with almost 30% of the vote was...none: "We don't use source control for database scripts". In second place with almost 28% of the vote was Microsoft's VSS. VSS? Given its reputation for being buggy, unstable and lacking most of the basic features required of a proper source control system, answering VSS is really just another way of saying "I don't use Source Control".

At first glance, it's a surprising thought. You wonder how database developers can work in a team and find out what changed, when the system worked before but is now broken; to work out what happened to their changes that now seem to have vanished; to roll-back a mistake quickly so that the rest of the team have a functioning build; to find instantly whether a suspect change has been deployed to production.

Unfortunately, the survey didn't ask about the scale of the database development, and correlate the two questions. If there is only one database developer within a schema, who has an automated approach to regular generation of build scripts, then the need for a formal source control system is questionable. After all, a database stores far more about its metadata than a traditional compiled application.

However, what is meat for a small development is poison for a team-based development. Here, we need a form of Source Control that can reconcile simultaneous changes, store the history of changes, derive versions and builds and that can cope with forks and merges. The problem comes when one borrows a solution that was designed for conventional programming. A database is not thought of as a "file", but a vast, interdependent and intricate matrix of tables, indexes, constraints, triggers, enumerations, static data and so on, all subtly interconnected. It is an awkward fit.

Subversion with its support for merges and forks, and the tolerance of different work practices, can be made to work well, if used carefully. It has a standards-based architecture that allows it to be used on all platforms such as Windows Mac, and Linux. In the words of Erland Sommerskog, developers should "just do it". What's in a database is akin to a "binary file", and the developer must work only from the file. You check out the file, edit it, and save it to disk to compile it. Dependencies are validated at this point and if you've broken anything (e.g. you renamed a column and broke all the objects that reference the column), you'll find out about it right away, and you'll be forced to fix it.

Nevertheless, for many this is an alien way of working with SQL Server. Subversion is the powerhouse, not the GUI. It doesn't work seamlessly with your existing IDE, and that usually means SSMS.

So the question then becomes more subtle. Would developers be less reluctant to use a fully-featured source (revision) control system for a team database development if they had a turn-key, reliable system that fitted in with their existing work-practices? I'd love to hear what you think.

Cheers,

Tony.

Comments

 

RossPatterson said:

"The problem comes when one borrows a solution that was designed for conventional programming. A database is not thought of as a "file", but a vast, interdependent and intricate matrix of tables, indexes, constraints, triggers, enumerations, static data and so on, all subtly interconnected. It is an awkward fit."

Surely you jest.  A program is also not a file, but a vast, interdependent and intricate matrix (in the non-mathematical sense) of functions, variables, data structures, and so on, ...  But some how programmers have been taught to successfully manage the source control aspects of this without resorting to the VSS "lock it and don't share" style of operation.  Of course, they've been building their matrices of "stuff" in a line-focused mode for over 50 years, and even Visual Studio hasn't managed to hide that completely (although Visual Basic tried).  Perhaps DBAs ought to look to their own history, and see what their predecessors did before they became so GUI-focused.  I remember when schemata were managed as card decks, with the ultimate line-replacement source control - human hands.
May 27, 2010 12:24 PM
 

RossPatterson said:

"Would developers be less reluctant to use a fully-featured source (revision) control system for a team database development if they had a turn-key, reliable system that fitted in with their existing work-practices?"

Heck yes.  In the programming world, the tipping point for revision control came when the IDEs started making it dirt simple to use.  Prior to that, one had to be disciplined to do it well.  

Isn't that what Visual Studio Team System Database Edition is supposed to be?
May 27, 2010 12:32 PM
 

Ross Presser said:

During early development, we keep entire database backups in svn. We also keep stored procedure CREATEs in SQL files which are checked into SVN.  We tried keeping CREATE TABLE schemas in SVN too but it got cumbersome, and since we've got the database backup we can accomplish what we need to.

As the database acquires lots of data and the backup gets huge, this will be less practical ...

As for "Subversion is the powerhouse, not the GUI", I don't see what you mean.  TortoiseSVN integrates smoothly into Windows Explorer; Ankhsvn integrates into Visual Studio.
May 27, 2010 2:04 PM
 

retracement said:

Teams ignore source control at their peril. One of the biggest problems I find even when they do use it is the way in which they string together the scripts. For instance simply having a create script for each object is actually pretty cumbersome and very little thought is usually given to the way in which these might be checked out and (re) deployed. The last thing that should be done is the removal of an object in order to replace it. On the other side of the coin, if an object does not exist, then an Alter based script would not work. Secondly if the scripts are deployed out of sync then their dependencies might cause them to fail.
May 27, 2010 3:34 PM
 

timothyawiseman@gmail.com said:

You make several points I would like to address if I may.

First, I would strongly advise against anyone doing any serious development work without some form of version control.  Even for the lone wolf developer, version control provides a system to recover from mistakes and see how the software has evolved.  I even use source control on most of my non-programming writing so I can see how things change as I go through edits and be able to easily retrieve old wordings if I decide later I like them better.  Of course, lone wolf developers may be able to use simpler systems (even just numbering or date-stamping their filenames as they make tweaks), but that is not nothing.

As to Visual Source Safe (VSS), I believe you are being too harsh.  Yes, there are far better out there now.  I have not had the chance to try it, but I have heard Team Foundation Server is much better for small teams.  Personally, I use Mercurial with TortoiseHG for my personal projects at home.  But I do use VSS at the office, and it gets the job done well for a small team and is extremely simple to learn.  I have taught novice programmers the basics in just a few minutes and it is simple enough that even non-programmers here use it to track word documents that go through many revisions.  It also does integrate with SSMS 2008 in a limited but effective and useful manner.  It is certainly an aging program that has been supplanted, but it still gets the job done quite nicely and is a far cry from nothing.

I shall also echo Ross Patterson.  All but the simplest of programs are not contained in a single stand alone file.  They often contain multiple files of source code written for that project along with the graphics they need and they may include other media or data files essential for the programs working.  And that of course is before mentioning the libraries which may or may not be part of the core libraries for the language.  The only difference is that programmers are accustomed to seeing all the related files in a program as files while a database is normally seen as a more integrated whole.  

Of course, a database does not need to be seen as an integrated whole, at least not from a development stand point.  I know this is something of a shop-by-shop issue, but I generally give every function, trigger, view, and procedure its own file along with another file to generate the tables and their constraints and relationships and possibly another file to populate some static data in the appropriate tables.  It is these files I track in source control and only in production does the entity appear as an integrated whole, which is much as a program looks to the end user after it is compiled for that matter.

As for subversion, it is a fantastic program that deserves the highest praise.  But it is not user friendly for a non-programmer or even for some inexperienced programmers.  Even with front ends such as TortoiseSVN I could not easily teach the non-programmers I work with to use it for documents as I have with VSS.  As you imply in your article, it does not synch with many IDEs even with add ons and in particular does not synch with SSMS, the most common IDE by far for SQL.  I am eagerly looking forward to Red Gate's SQL Source Control to help with that.  I am hoping that will overcome the reluctance of some database programmers to use source control.  After it leaves beta, it may actually convince me to migrate from Mercurial and to petition my company to move away from VSS, though personally I would prefer if it began support Mercurial instead.  Until then, I find the integration between VSS and SSMS2008 adequate and it only takes a couple extra mouse-clicks to commit my changes to Mercurial outside of SSMS after saving the code as a file.
May 27, 2010 4:57 PM
 

DamianM said:

At last an article which recognises the difference between application and database when it comes to source control. I'm pleased to see the recognition that a database is more of a matrix than a single entity. I'd like to add two other features which distinguish databases from applications in this regard.
Firstly and most critically, a database becomes a living organism as soon as you add DATA. From this point you can no longer simply "build from scratch" as you can with applications, but rather have to consider how to get from whatever is the current state to your desired endpoint. Imagine if Windows was not simply rolled out each time, but had instead to upgrade from whatever version was installed in the existing system.
The second difference is that most source control systems don't adequately reflect the reality of multiple databases (on the same design but with different data), and the separation between production and development environments. The idea that you "check out from the database, modify, and re-deploy" may work fine in a single system development database, but is less valid when you have to provide the same tested deployment mechanism to multiple production systems - none of which you or your turnkey tool has access to. Another common problem is database size, in that a table restructuring can work fine on a dev database but blow logs in production, so you need to give production dbas a mechanism for keeping the logs in control.

My own philosophy on this is this:
(i) separate code and structural elements of the database, in that the former can be deployed at will while the latter requires consideration and protection of the existing data.
(ii) stamp the code elements with a version (using keyword expansion - a feature that some source control systems have chosen to eliminate) so that there is some way in the database of determining what is deployed. Unlike applications, where a single resultant version number explicitly represents the build, the multiple objects in a database cannot be identified collectively, and my solution is to reference the individual versions to a build inventory.
(iii) update the structural aspects by means of creating a template database from source control, and comparing this with the target using something like SQL Compare (we have used an inhouse compare tool too).
June 1, 2010 5:23 AM
 

andrewbrown said:

In environments like rails, all DDL and test data generation is script-driven.  This naturally encourages proper storage in source control, because there is total parity between Rails source code and database code.  It all gets generated into a common and well-defined database structure, and the Rails tools used to create updates and test code make sure that the current state of database code matches the database state used during unit testing.  

Thus, minimal extra discipline is required on the developer's part to keep the database and related code in sync in a source control system.

I'm aware of no other development context where this is achieved as well.  VS Team System Database Edition makes it possible to generate database scripts, but it is completely up to the developer to (re)generate scripts at the correct points, to set up tests that are matched against a database built from these scripts, and to bind all of this to source control.  A tremendous amount of discipline is needed across the entire development team.

Integration between tools like SSMS and source control systems is a start, but in environments where database development is intertwined with code development, I think it is insufficient.  Until the database/code development process naturally generates artifacts into a common project structure that is in sync with the edit/compile/debug cycle, databases in source control are doomed to be out of sync with a project's code.
June 1, 2010 5:58 AM
 

marsanyi said:

Yes, source code control is hugely valuable.  Yes, data ("master" tables, test data, etc) needs to be part of the picture.  Rails does it well; so does Django/Python.  This was one of the reasons I purchased SQL Compare, the script generation being a nice way to generate code that could be versioned in SVN.  Django and others handle (small) data fixtures by having simple ways to generate/consume text representations of the contents of a database, as for example json or xml, which can be similarly versioned and compared, and simple ways to populate tables from code when the corresponding object structures are initialized.  Between these facilities and the tools available for SVN on different platforms, I don't feel a need for much more, but I miss them when I work in the SQL Server/MS domain.
June 1, 2010 2:52 PM
 

IowaWebDave said:

Your editorial was very timely because I was just hit with a "snafu" regarding stored procedures and source control...  

We use Team Foundations Server with Visual Studio 2005.  I was making a minor update to a project which included the stored procedures and - mistakingly - thought I could just update the SPs from Visual Studio and move them up.  After getting them moved into production, we discovered that I had lost a change made previously.  

No problem, we looked up the documentation for that change and I made an urgent production update to the SP and all was right with the world - or was it?  Actually, this same exact thing played out 2 more times until our documentation no longer contained the change that was obviously lost.  

It was at this point that I asked a DBA to restore a copy of the SP prior to my overwriting it in production.  So much for a nice simple change!!! ;-D

So, with Visual Studio, Team Foundation Server, the Database Project, and the handy "Run on ..." option for Creating/Altering stored procedures directly on the database, it is definitely easier to use source control for database "stuff", but it isn't foolproof.  As RossPatterson said above, if it's fully integrated, we're going to be more likely to use it!

What could make it better?  Here are some of my suggestions:
  - somehow be able to "turn off"/restrict creating/altering SPs any other way than via "the source control way" (in lieu of that, zap people with an electric charge - not too big of one, maybe like a strong static shock - if they don't do it the right way! ;-D)
  - allow an easy way - like a right-click menu option - to compare the database artifacts with WHAT'S IN THE DATABASE, instead of (/in addition to?) what's in source control (I don't want to have to go script out a stored procedure to a file and then open up my favorite text compare tool)

My 2 cents!
June 3, 2010 3:32 PM
You need to sign in to comment on this blog
<May 2010>
SuMoTuWeThFrSa
2526272829301
2345678
9101112131415
16171819202122
23242526272829
303112345
How to Kill a Company in One Step or Save it in Three
 The majority of companies that suffer a major data loss subsequently go out of business. David Wesley... Read more...

Migrating from OCS 2007 R2 to Lync: Part 4
 Having migrated the rest of our users and legacy resources across, and start getting ready to... Read more...

Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Seth Godin: Big in the IT Business
 Seth Godin has transformed our understanding of marketing in IT. He invented the concept of 'permission... Read more...

Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...