Checking program code into source control is a daily ritual for most developers, but versioning database code is less well-understood. Grant Fritchey argues that getting your databases under source control is not only vital for the stability of development and deployment, but it will make your life easier when something does go wrong.
Software development is a difficult and demanding discipline. It gets even more difficult when you bring teams of developers together on a single project. One of the fundamental issues arises around the code itself. Who is responsible for what piece of it? What changes have they made? How do you get those changes from one member of the team to the next? How do you keep a history of each change, in case it causes problems later that you need to track down and fix? The answer to all these questions, and many more, is to manage your code through a source control system.
Source control systems, also called version control systems (VCS) or revision control systems, date back to the birth of modern computing. One of the first be developed was in 1975 when there was finally enough disk space to store a second copy of the program, just in case. Since then, getting application code into source control has gone beyond being an industry standard practice to simply a part of programming, like writing a function or an IF clause. Yes, there are some shops or individual developers who don't use source control for their code, but those are the glaring exceptions that prove the almost universal rule that code goes into source control.
But databases are different.
The history of databases diverges from the history of code. At some point developers were no longer responsible for databases. Instead they moved into the hands of system administrators or dedicated database administrators, who, frankly, looked at the world quite a bit differently than developers. They spent time worrying about backups, availability, integrity and performance.
In many cases, database development work moved into the realm of the DBA. While many DBAs came from the ranks of developers, they spent more time worrying about all those administration tasks than the development tasks, and some of the best practices and methods created for managing code just weren't applied to databases. But they should be.
DBAs are very good at putting in place backup schemes that will protect the production data. However, when working with the code of a database, and the SQL that defines data structures and stored procedures is nothing but code, a full database backup is an unwieldy device by which to maintain copies of the schema, for retrieval of changes and historical tracking. For example, to find out what changed between the previous and current versions of a stored procedure a DBA would be forced to use a third party tool that could directly compare to a backup, or to run a full restore of the database to a secondary location, and then extract the stored procedure definition. This isn't always possible, it's frequently impractical and it's certainly going to be slow.
Getting a database into source control provides a much more efficient mechanism for backing up the SQL code for your database. Retrieving a previous version of a stored procedure from entails simply inspecting the history of changes within your VCS. Retrieval is nearly instantaneous.
Once you realize that your SQL is code, it immediately makes sense to use the same backup mechanisms that code uses, which is a VCS.
Within most database management systems, it's possible to find out when an object was created or last modified, and which login performed that action. However, there is usually no historical record of any previous modifications to that object. Further, depending on the security mechanism within the database, you may simply see that a system administrator or database owner made the change, with no indication as to the actual identity of the person working within that role.
If you have your database in a VCS, and use that VCS as a fundamental part of your development and deployment mechanisms, then it will provide exactly that type of tracking. All changes originate in the VCS and are not made directly against the production system outside the process around your VCS. You'll know who made what change and when it was made.
Many organizations have to comply with legal requirements for change auditing, such as those mandated by Sarbanes-Oxley. Implementing a VCS could be the quickest and easiest way to provide the required level of historical tracking of all changes so that for every change to the database you know who did it and when.
As soon as we enter a new code file into the VCS, it assigns it a version. Each time we commit a change to that file, the version increments, and we have access to the current version and all previous versions of the file. When we put a database into the VCS, this means that every database object (table, view stored procedure and so on) in the VCS has a version number. We can also create labels, or tags, that allow us to assign meaningful "build number" to the set of files that comprise a particular version of a database.
Furthermore, having the database in source control directly alongside the application will integrate the database changes with the application code changes, so that you'll always know that the version of the database being deployed directly corresponds to the version of the application being deployed. This direct integration helps to ensure better coordination between teams and it can help when troubleshooting issues.
If all changes needed for a production system are in a development database somewhere instead of inside a VCS, deployments are necessarily going to be a manual affair. You will need a process that will generate changes from your development database in order to make the production database mirror the newer design. There are third party products that can help, but how do you differentiate between objects that are meant to go out with one version and objects that are meant to go out with a different version of your code? From within a development database this is frequently impossible since there is no clear and easy methods for differentiating object changes within that database.
Once you start generating your deployment scripts from source control, a number of opportunities open up. You'll be able to differentiate the database objects into known versions which will allow you to control what is getting deployed. Once you can control what is being deployed, you can bring automation to bear on the deployment process. You'll be able to take advantage of continuous integration and other automated deployment and testing mechanisms that application code already uses. Automated deployments also means more testing and validation of those deployments which can help to ensure the final deployment to production is successful.
Since the SQL that defines a database is code, it just makes sense to take advantage of the existing and long-established mechanisms for managing that code. You'll have a better way to backup that code, and, more importantly, retrieve previous versions of that code. Source control for the database provides an audit trail to help with troubleshooting and legal compliance. You'll get better integration with your application code through shared source management. Finally you'll be able to automate your deployments. All these reasons makes putting databases into source control a smart move to help improve management of systems within your organization.