Troy Hunt

The unnecessary evil of the shared development database

23 March 2011

One of the greatest pain-points in developing a database-driven application happens when the application is in source control, but the database isn't. When the development database is shared, the pain increases, and it is not alleviated by source control alone. Troy Hunt spells out why each database developer must have their own version of the database.

I have memories from the late 1990s of building classic ASP apps in VBScript, using Dreamweaver, side by side with my fellow developers, working on the same set of files over the same mapped path. You had to talk a lot; “Have you closed the CSS file? I need to add a class”. And I remember the painful ‘find and replace’ process, safe to execute only once every developer had saved their work and closed all their files. We huddled around shared drives mapped to the same UNC path and recklessly worked on the same set of files before firing them up in the browser right off the same server.

Back then, there was source control of a kind, by which I mean Visual SourceSafe, CVS or even Rational ClearCase. But normally it would be the classic pattern of selecting the root folder of the app and then going “Ctrl-C -> Ctrl-V -> Rename” with the date appended at key points in the project.

In hindsight, there were probably better ways of doing it a dozen years back; but the practices were reasonably common. Today, no one in their right mind would consider building apps this way. So why are so many people still using these same methods to build the databases behind their web applications? Is the data layer really that special that it needs to be approached entirely differently? Don’t those hard-learned lessons from the last century apply to database development?

What are we talking about here?

Just so there’s no confusion, let’s be clear about what we’re talking about. In the shared database development model, the developers build the web app layer in the usual fashion (Visual Studio running locally, files stored on the developer’s PC), yet all connect to a remote database server and work directly on the same DB.

Usually they’d also be working with SQL Server Management Studio running locally and connected to the remote database server. When the app actually runs locally, all data connections go out to the shared, central development database. There is no local instance of SQL Server on the developers’ machines.

The alternative, of course, is dedicated development databases.

Obviously each developer has their own version of the database, but the biggest difference from the earlier model is the presence of a version control system. Why?

The problems of the shared development database

The “last writer wins” problem

The obvious problem with collectively working on a shared database is that when multiple developers work on the same object at the same time, it’s a case of “last writer wins”. No merge, no management of editing conflicts, just the last guy getting his way.

In order to mitigate the risk of this happening, you have to implement social mechanisms to work around the problem. So you’re back to developers communicating backwards and forwards in an attempt to keep their fellow coders out of their work in progress. It’s clumsy, it’s labour intensive and it will fail. It’s just a matter of time.

The “experimentation” problem

An essential part of software development is experimentation. Unless you’re a true believer in the waterfall mantra of “design then build”, where the two only ever proceed in that sequence, you’re inevitably going to build some code that sucks and consequently change direction. This is healthy; it’s part of the continuous improvement cycle we all implicitly exercise every time we build software.

What’s not healthy is to unintentionally impede the work of the other developers in the team by exposing the consequences of your experimentation to them. This can happen in all manner of ways: your work might break their code by removing dependencies, or unexpectedly change the result of their ORM layer generation by adding objects. Experimentation should be about you trying different approaches, not forcing them upon your team. In short, you need to play in your own sandbox; a sandbox that spans each layer of the application.

Martin Fowler has a nice summary in his article about Evolutionary Database Design:

Evolutionary design recognizes that people learn by trying things out. In programming terms, developers experiment with how to implement a certain feature and may make a few attempts before settling down to a preferred alternative. Database design can be like that too. As a result, it's important for each developer to have their own sandbox where they can experiment, and thereby prevent their changes from affecting anyone else.

To develop code, you need to be able to experiment alone, so you must insist selfishly on your own play-space.

The “unpredictable data state” problem

Whilst the problems of “last writer wins” can be largely ameliorated by such strategies as segmenting the tasks or announcing changes before making them, the unpredictable state of the data is a whole new level of trouble.

The problem is simply that databases tend to change as the client applications are used. I don’t so much mean changes at the object level as the data within the tables. That’s the whole reason we’ve got databases in the first place: so that we may manipulate and persist the state of the data.

Let’s take a typical example. We’ve got an app with an authentication module and an administration layer, and each of the developers has created accounts to test with. But the guy building the administration layer wants to test how the app behaves with no accounts in it. Now we’ve got a problem, because deleting the existing accounts is likely to hinder the other team members.

As things get more complex, the problems worsen. Applications that are highly dependent on the state of data, for example, become a nightmare when you simply can’t control how it’s being changed. Unpredictability is not your friend when it comes to building software.

The “unstable integration tests” problem

If you have complex data dependencies, then when the time comes to run integration tests it is essential to have a predictable, stable set of data to run against. Otherwise, with the data constantly changing underneath you, you can abandon all hope of a function returning a predictable result. You’ll never achieve that stability if the rest of the team is continuously evolving both the schema and the data.
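
Dedicated databases make that stability easy to engineer. One option, given that Developer Edition carries the Enterprise feature set (more on that below), is a database snapshot: capture the known-good state once, then revert to it after each destructive test run. A minimal sketch, assuming a database called DevDB whose single data file has the logical name DevDB — both names are invented for illustration:

    -- Capture the "ready state" of the dev database as a snapshot.
    CREATE DATABASE DevDB_Baseline
    ON (NAME = DevDB, FILENAME = 'C:\SQLSnapshots\DevDB_Baseline.ss')
    AS SNAPSHOT OF DevDB;
    GO

    -- After a destructive test run, revert the whole database in seconds.
    -- (Requires exclusive access, and DevDB_Baseline must be its only snapshot.)
    USE master;
    RESTORE DATABASE DevDB
    FROM DATABASE_SNAPSHOT = 'DevDB_Baseline';
    GO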

The “objects missing in VCS” problem

You will never get continuous integration of your team’s work into the codebase if every change is made centrally, outside source control, and is immediately visible to everyone. What’s the motivation to commit your code?

Of course, what’s really happening here is that the shared model is simply allowing a bad practice to creep in without any repercussions. If changes have to be committed before anyone else can see them, the benefits of version control quickly become apparent. Dedicated development databases nurture good VCS practice.
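
In practice, “committed” means each object lives in the repository as a re-runnable script. A minimal sketch of what one such file might look like — the procedure, table and column names are all invented for illustration:

    -- dbo.GetCustomer.sql, as it might live in version control.
    -- Dropping before creating keeps the script re-runnable on any copy of the DB.
    IF OBJECT_ID('dbo.GetCustomer', 'P') IS NOT NULL
        DROP PROCEDURE dbo.GetCustomer;
    GO

    CREATE PROCEDURE dbo.GetCustomer
        @CustomerId int
    AS
    BEGIN
        SELECT CustomerId, Name, Email
        FROM dbo.Customer
        WHERE CustomerId = @CustomerId;
    END
    GO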

The “disconnected commuter” problem

By using a shared development database, you are forced to be connected to the network at all times. If you want to take work with you and slip some development-time into the train journey, or if you want to work through the weekend in the comfort of your own home, you’re only going to have half the solution available. Yes, there are often VPN solutions or other means of remote access but you’re starting to increase the friction of working productively.

The same problem extends to working effectively with those outside the network segment which contains the shared database. Want to send work out to a partner for a few weeks? Sure, you can do that by backing up the DB and sending it out but the chances are you’re going to have issues integrating back into the trunk of development if the project is really not geared for distributed database development.

The project has been tightly coupled to a single internal implementation of a database server and this is always going to result in more difficulties further down the line.

The “sysadmin” problem

One of the problems with development in any server-based environment, including a database server, is that there are times when elevated privileges are required. When this environment is shared, there are potential consequences well beyond the scope of a single application.

Here’s a case in point: I recently suggested to a developer that their performance tuning could benefit from some SQL Server Profiler analysis, to take a closer look at things like reads and writes. This particular case involved a shared database, so the next thing that happens is I get an email back with a screenshot of the familiar Profiler complaint: you must be a member of the sysadmin fixed server role, or have the ALTER TRACE permission, to run a trace.

Frankly, I don’t want the guy to be sysadmin on a box that may contain totally unrelated databases to which he probably shouldn’t have access. I could give him the ALTER TRACE permission (and ultimately, I did), but this is a server-level permission, so now he has the right to inspect every query across every database on the instance.
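
For reference, the grant in question looks like this — the login name is made up, but the scope of the permission is the point:

    -- ALTER TRACE is server-scoped: once granted, this login can trace
    -- activity in every database on the instance.
    USE master;
    GO
    GRANT ALTER TRACE TO [DOMAIN\SomeDeveloper];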

This discussion would never have even taken place in the dedicated local database scenario. He would have simply already had the rights and it would have been dealt with locally. There are plenty of similar occasions where the rights a developer needs to do their job exceed what should be granted in a shared environment.

The “unexpected performance profiling” problem

Continuing from the previous point, performance profiling in a shared environment, where you have no control over the other processes running on the machine, is an absolute nightmare. That query which takes 20 seconds in one test run can easily blow out to 50 seconds a few moments later. Why? You have no idea.

Whilst it’s always a bit tricky getting consistent results from any sort of performance profiling, the worst thing that can happen in the midst of this is other processes getting in your way. When you’re doing this locally, you have both visibility and control over these processes.

Of course there are cases where performance profiling on a PC is not going to yield constructive results, such as when huge volumes of data would normally be queried on serious servers. However, for the majority of performance-tuning tasks, a developer needs a predictable environment more than anything else.

The “non-representational latency” problem

So let’s say you’re working on a shared database which is almost inevitably located on a server in a data centre. Where exactly is that? How many milliseconds of latency are being added to each round trip?

The problem is that, in a world of data centre consolidation, you’re quite possibly loading a whole heap of additional latency onto each ADO.NET connection, latency which won’t be representative of the target live environment. I’m guessing you don’t have the same sort of gigabit-Ethernet connectivity from your PC as the production web application server will have, and that creates a little bit of a problem.

It’s a problem in that application performance in the development environment is going to be comparatively slow. The degree of sluggishness will depend on the latency and the amount of activity over the wire. For example, 150ms to a remote SQL server coupled with a chatty application making 20 queries per page adds three full seconds to every page load, which is not going to make for a very accurate representation of real-world app performance.

The “my development machine is slow” problem

Of course, the “problem” with developing databases locally is that it is necessary to run SQL Server on the PC. I say “problem” in quotes because the issue is not so much that SQL Server asks too much of the machine; it’s that developers are all too frequently given slow, under-specified PCs.

If developers are given machines which struggle to concurrently run Visual Studio, SQL Server and the usual “business as usual” tools (Outlook, Word, etc.), there is a bigger underlying problem: developers are not cheap. In Australia, your average developer costs about $90k a year. Then there is a whole bunch of other costs to contend with, such as floor space, equipment (other than the PC), operating expenses (such as payroll) and on and on. Conservatively call it $100k annually, or around $420 for each day they work.

On the other hand, fast PCs are cheap. I recently replaced an ageing laptop, and the difference in price between a run-of-the-mill machine designed for the desk jockey who lives in the Office productivity suite and an 8GB, SSD, i7 machine was $0.60 a day over a three-year lifespan. Put it this way: if you fit the $90k/y bill and you’ve read this far (say 10 minutes), you’ve just consumed three weeks’ worth of super-fast machine upgrade, based on your hourly rate and the time cost of reading this post. Enough said.

And yes, yes, I know developers and their associated costs are a lot cheaper in other countries. So let’s assume only $25k annually: you’re still looking at over $100 a day for these guys against a $0.60-a-day cost to fundamentally improve their productivity. If you need to debate the mathematics of this with anyone, it’s probably time to take a good hard look at how conducive the environment is to a productive, fulfilling role. For some essential further reading, check out Jeff Atwood’s Programmer's Bill of Rights.

Getting to the solution of dedicated development databases

The “SQL Server Developer Edition” solution

Microsoft provides the perfect means of developing DBs locally in SQL Server Developer Edition. This is effectively a full-blown Enterprise edition licensed for non-production use. Chances are you already have a license if you have an MSDN subscription, but even if you don’t, it’s dirt cheap.

Installed locally, it can easily be configured (via SQL Server Configuration Manager or the Services console) so that the service doesn’t start automatically, if you’re really worried about it dragging down the performance of your PC when you’re not even using it.

But having said that, the resource usage is actually pretty small unless you’re seriously pounding it. Mine is sitting there consuming only 340MB of memory (about 4% of what’s on the machine) and 0.4% of CPU. So unless you’re running under-specced hardware (again, reflective of a deeper problem), the performance impact shouldn’t even be noticeable.
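
If you want to check the figure on your own machine, SQL Server 2008 and later expose it through a dynamic management view:

    -- How much memory is the local instance actually using?
    SELECT physical_memory_in_use_kb / 1024 AS memory_in_use_mb
    FROM sys.dm_os_process_memory;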

The “script it all” solution

One great thing about decentralising your development database is that it forces you to script a “ready state” of data. Versioning database objects is one thing, and it’s obviously essential, but we all know that most applications won’t play nice if they start out with zero records in them. All sorts of reference data is usually required to initialise a database, so the problem now becomes how you get it in there.

If you need to initialise the state of the database via SQL scripts, you are forced to think clearly about what your application needs to function. You need to think through the purpose of each table and what it needs to contain in order to achieve that “ready state”, rather than just organically growing reference records as required.
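
As a sketch of what such a script might look like — the reference table and its rows are purely illustrative — here’s an idempotent seed that any developer can re-run safely (MERGE requires SQL Server 2008 or later):

    -- Seed reference data to its ready state; safe to run repeatedly.
    MERGE dbo.OrderStatus AS target
    USING (VALUES
        (1, 'Pending'),
        (2, 'Shipped'),
        (3, 'Cancelled')
    ) AS source (StatusId, Description)
        ON target.StatusId = source.StatusId
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (StatusId, Description)
        VALUES (source.StatusId, source.Description);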

The other big bonus is that this script then goes into source control. It gets versioned along with the DB objects and persists in VCS in perpetuity.

Finally, scripts are fantastic for automation. It means that at any time you can pull a revision from VCS and have a clean, repeatable installation of the application. Tie that into a continuous integration environment and you now have one-click deployment of the entire app.
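
Tying the object and data scripts together can be as simple as a master script run in SQLCMD mode, with :r pulling in each versioned file. A minimal sketch — every file name below is hypothetical:

    -- build.sql: run with the sqlcmd utility, or in SSMS with SQLCMD mode enabled.
    :setvar DatabaseName DevDB
    USE [$(DatabaseName)];
    GO
    :r .\Tables\Customer.sql
    :r .\Tables\OrderStatus.sql
    :r .\Procedures\GetCustomer.sql
    :r .\Data\ReferenceData.sql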

The “this will damn well force you to do it right” solution

By working on dedicated local databases, the developer is forced into a number of good practices which could otherwise be circumvented in the shared world. The obvious one is source control: if you’re not versioning your database objects and reference data, you’re doing it wrong. You simply won’t be able to get away with it any more if the only way your colleagues can get your changes is via VCS.

So as to work effectively and not break builds, work needs to be modularised and committed atomically. You can no longer get away with randomly changing unrelated parts of the application; otherwise you begin “breaking the build” for others, something which is generally not received very positively by your peers. This is a good thing: it forces more thoughtful design and conscious completion of tasks.

And of course you can’t get away with running SQL Server on that tired, cheap PC with a single GB of RAM and an old 5,400 RPM disk. You actually have to get a half-decent machine – I mean one that is actually suitable for building software on!

So you see, the whole shared development database model can hide the practices you’re not doing properly to begin with. Working autonomously on a local DB becomes a self-perpetuating cycle of practice improvement, as it simply won't let you get away with taking nasty shortcuts.

Summary

If you’re using a shared development database, the chances are that you’ve simply inherited the practice. Take a good look around: are you really working this way because it’s the most effective possible way of building software? In times gone by, it wasn’t easy to version-control databases, but we’ve now got the tools at our disposal to do it.

In terms of .NET, there’s obviously the official Microsoft route with Team Foundation Server, but there are also offerings from third parties such as Red Gate’s SQL Source Control. Around the middle of last year I wrote about Rocking your SQL Source Control world with Red Gate, and then Foolproof Atomic Versioning of Applications a little after that, both of which go into detail about the importance and value of versioning your databases, so I won’t repeat the message here. Just make sure you’re doing it, ok?

Developing locally on dedicated databases is not only better for the process of database development; it’s better for configuration, which means better for deployment. It’s also better for development processes in general, such as experimentation and modularisation of work, and it solves all sorts of other problems engendered by the communal DB model.

So really, what’s stopping you?

Troy Hunt

Author profile:

Software architect and Microsoft MVP, Troy has spent the last 15 years building web applications in the finance, media and healthcare industries. Based out of Sydney, Australia, he now spends his days (and frequently nights) working as an architect for Pfizer Pharmaceuticals’ Emerging Markets. Troy’s software interests focus on enabling colleagues and partners to be productive in delivering high-quality applications within proven frameworks. He regularly blogs about application security, improving the software development process and all things technology-related at troyhunt.com.

Have Your Say

Subject: Nice Article!
Posted by: Anonymous (not signed in)
Posted on: Monday, March 28, 2011 at 7:33 AM
Message: Troy you hit a major problem in lots of shops! Very great article, well done! Now us database folks need to push and present this type of information to our groups and make things right.

Subject: Spot on!
Posted by: Adam Aspin (not signed in)
Posted on: Monday, April 04, 2011 at 2:27 AM
Message: Very clear analysis - thank you. I will now use all of this to keep trying to convince those further up the food chain...

Subject: Impediment - complex architecture
Posted by: David K Allen (view profile)
Posted on: Monday, April 04, 2011 at 6:29 AM
Message: You did a great job of describing the ideal. In your scenario, the architecture is simple: a custom application talks to a custom database. We tried to do this but could not. The reason was that our architecture is too complicated. We are writing code that gets embedded in a CRM server. But the code also runs against several other legacy systems. The total configuration in a development environment is anywhere from 20-30 servers, many of which are running COTS software. An "ideal" solution would be to use our amazing VMWare Lab Manager to clone the Development environment and give one to each developer. But each multi-server environment requires roughly one VERY beefy server, and substantial storage, which is expensive. Though after reading your article, I would be inclined to get our finance people to help us estimate the economic value of spending $20-40K in hardware for each developer in return for earlier detection of defects and less self-induced friction.

Subject: What about a Development DBA?
Posted by: Anonymous (not signed in)
Posted on: Monday, April 04, 2011 at 7:11 AM
Message: It's nice to see we are not the only ones who had these issues. We solved the problem by using task requests in Outlook, then the Development DBA manages the tasks. Works well but may be a little too time consuming.

Subject: Just curious
Posted by: Bryant (view profile)
Posted on: Monday, April 04, 2011 at 7:16 AM
Message: How do you ensure that every desktop has the same configuration as the target environment?

How does a local database solve the disconnected commuter problem when that disconnected commuter is also disconnected from VCS?

Subject: The deploy to user schema on shared database solution
Posted by: Eric Russell (view profile)
Posted on: Monday, April 04, 2011 at 7:47 AM
Message: First, in my environment there are typically only one or two database developers on any given project, and each database object is scripted and maintained in TFS or VSS. It's been years since I've worked on a project where there are a dozen developers all editing the same database scripts, and I can recall the headaches it can cause.
In the past, I've experimented with developing and unit testing on my own local copy of the database in Express or Developer edition, but it seems I was spending more time than I would like maintaining it and keeping it synchronized. I still do that on occasion, like when I'm making a major version change to the data model that would require a large deployment involving many objects. However, what I typically do instead is deploy my recently updated database objects to a different schema on the shared database. For example, I'll first deploy my updated procedure to something like erussell.p_calllog_iu where I can unit test it with real data and the latest version of related objects. Once the unit test is complete, I'll then deploy it to dbo.p_calllog_u where it becomes integrated with the application.

Subject: Large DB's
Posted by: MikeC (not signed in)
Posted on: Monday, April 04, 2011 at 7:48 AM
Message: We do mostly database development and our db is almost 200 GB. How does each developer have a copy of such a large db?

Subject: Large DB's
Posted by: MarkP (view profile)
Posted on: Monday, April 04, 2011 at 8:14 AM
Message: Interesting point made by MikeC about Large DB's, I would very much like to see an answer to that question.

Subject: Re: Large DBs
Posted by: Phil Factor (view profile)
Posted on: Monday, April 04, 2011 at 8:27 AM
Message: I must admit I much prefer to have the individual developer databases hosted on a 'sandbox' database server. Grant Fritchey has done pretty well to describe how to shoe-horn an apparently impossible number of these onto a server.
http://www.simple-talk.com/sql/sql-tools/supporting-large-scale-team-development/

Subject: Point of view
Posted by: JasonR (view profile)
Posted on: Monday, April 04, 2011 at 8:50 AM
Message: It has been my experience that when more than one developer works in the same database they end up affecting each other’s unit tests and wasting too much time trying to figure out why they aren't getting the expected results.

Subject: The Evil is Subjective
Posted by: James (view profile)
Posted on: Monday, April 04, 2011 at 9:13 AM
Message: It's been my experience that whether a shared development database is practical depends on a number of factors. Quality of team communication, number of developers, complexity of the environment, how source code is controlled, etc. all play into this. While you make some good points, to call a shared development database an "unnecessary evil" seems a knee jerk reaction to me. Like any other tool, whether or not to use a shared devlopment database should be evaluated on its merits, or lack thereof, and how it relates to the overall environment.

Subject: Answers
Posted by: troyhunt (view profile)
Posted on: Monday, April 04, 2011 at 6:17 PM
Message: Thanks for all the feedback everyone and for making this the most popular article in the last newsletter :)

This is not a "one size fits all" approach - this sort of thing never is. There will be times where logistics such as other dependencies will simply make this approach infeasible. It is, however, a sound practice in a significant portion of projects which would otherwise share a central DB for development.

@Bryant - In terms of ensuring each development configuration is identical, it's really no different to the same challenge faced in the app tier and the simple answer is "you can't". Of course you can mitigate the potential for configuration differences by having a standard desktop image but even within strict controls, you always end up with deviations. But of course this is why integration and test environments are so important so that in the (hopefully rare) case that a developer has a configuration which is unique to them and not able to be checked into source control, it should be caught very early in the development lifecycle. And as for commuters and VCS, just take the latest revision before heading out - you don't need to remain connected.

@MikeC - 200GB is large enough to suggest there's probably quite a bit of data in there, right? It's another somewhat philosophical discussion but if this is the case, you really don't want this in your development environment. If it's for testing, then use a test environment which is appropriately specced and seed it with data using something like SQL Data Generator rather than taking production data (not sure if that's the source of the volume). On the other hand, when terabyte disks are coming down as low as about $60 these days, is 200GB still an insurmountable challenge to run locally?

@James - Can we just call it poetic license? :) All your points are very valid and I totally concur with them. They're all important to a cohesive, productive team as is the appropriate use of environments.

Subject: A way to determine if you should be Shared vs. Dedicated
Posted by: Stoney DeVille (view profile)
Posted on: Monday, April 04, 2011 at 6:18 PM
Message: We determined there are factors in determining whether to go shared or dedicated.
Those on the left weight toward shared and on the right toward dedicated. Give each a single point.
Heavy read applications vs heavy write applications
Large db dev teams (5 or more) vs. small db dev teams (4 or less)
sysadmin developers vs. dbo developers
large data vs. small data
rapid development vs. test driven development

 
