Click here to monitor SSC

Simple-Talk developer. Yes, it's probably my fault.

An Agile House of Stone

Published 14 April 2011 1:17 pm
A couple of weeks ago, Tony discussed why complex databases can’t be created in the Agile way. As someone that works on code from the database right up to the user interface, I wonder what makes a database so different from application code. The arguments in favour of an agile approach are just as strong in databases as in code – in particular you can’t predict future requirements, and if you attempt to do so, you are very likely to get them wrong, or build something far more complex or larger than actually required.

Tony specifically mentioned that a database has to be planned in detail beforehand if it is to perform quickly, reliably and securely over time, but software often has the same requirements, and agile development already has approaches to deal with them, in code. If, say, performance really is a requirement for your application, then why not have an automated test suite that can consistently test performance? A key part of many agile practices is refactoring, and having a suite of automated regression tests gives you the confidence to make changes while knowing that everything still works. The automated part is particularly important – if you’re doing these tests (or, arguably more accurately, checks) manually, then their cost will keep growing as your application grows.

What is it about a database that makes an agile approach to development harder than with code? There are certainly different challenges in an agile approach to database development. One difference is the fact that at least some part of it is required to persist. With software, you can often just restart the application with the new code; with databases, you must migrate any existing data to the new schema. If you’re constantly changing your database, then you need to do this repeatedly. However, in my experience, this has only been a problem where the original schema is poorly understood – “What does that column do?”, “Is that date column the creation date, or modified date?”, “What does null mean?”, “Is that text plain-text or HTML-encoded?” and so on. Once you have a good handle on your existing (as well as new) schema, data migration isn’t usually a hellish experience, especially if you’re taking an agile approach by taking a small steps, so that you only have to migrate a small part of your database to an updated schema at once.

Another impediment to database refactoring is “leaky abstractions”. It is difficult to protect the application completely from changes in the database. Ideally, all database access would be underneath some layer so that, assuming the business logic remains the same, the layers beyond the abstraction are not affected by changes in the database. Of course, abstractions often leak, and a faulty database abstraction can leak all the way up to user interface code. This means that we still need to change code at every layer of the stack, despite having a database abstraction layer. However, this is really no different to any other part of your codebase that has such a central role. Strict adherence to DRY (don’t repeat yourself), often requiring ruthless refactoring, is crucial in minimising the cost of changing your database.

Refactoring can also be hindered by the changes required in the database abstraction layer itself. For instance, if you rename a column, you might have to find all the references to that column in dozens of SQL strings. This is where an ORM such as NHibernate, with its QueryOver interface, can come in handy. These allow you to write database queries in the comfort of C#, Java, or your language of choice. Design your database schema carefully, and use an ORM to perform a literal mapping from a database row to an object in your static language – if your database row has a field called “Title” of type varchar(255), then your corresponding object should have a field called “Title” of type String. You can lean on existing tools to, say, rename a field across your entire codebase, allowing you to refactor more quickly, and with greater confidence that you haven’t missed a pesky reference in an obscure part of the code. Even if you do reference a field that doesn’t exist, the compiler will point it out. This is no replacement for a thorough suite of tests, but it can give you a little more confidence and help you find errors more quickly.

Finally: would I build a house with an agile approach? No – but that’s because the costs of building a house are completely different. With both software and databases, I can delete huge swathes with a single key, or make changes to any component, and have a suite of tests that will tell me within minutes if I’ve done something horrifically wrong.

Cheers,

Michael

21 Responses to “An Agile House of Stone”

  1. Thomas Williams says:

    Hi Michael – nice take on Tony’s article, well put. I hadn’t felt completely comfortable with “software as construction” as a counter-point to agile, thanks too for expressing an alternative.

    And you make some good points that databases aren’t necessarily “special” in agile development, but do have special requirements like persistence that can be accounted for in a build/test process.

    Cheers, Thomas

  2. Anonymous says:

    Interesting Finds: April 15, 2011

  3. callcopse says:

    I am also respnsible for code from the database to the UI and I think I agree with what you are saying.

    From my perspective the database is like the foundation of an application. The UI and reporting code will properly flow from a well constructed schema. However it also has to be part of the agile method – there is little choice. You may not choose to alter the foundation of a house once constructed but you might extend it. Similarly, I shy away fom altering existing live schema once development has built upon it – as I believe was one of Tony’s points. At the very least I would discuss the implications with a colleague.

    I am however using Entity Framework / L2S as an ORM and it is reassuring these days that you get nice compile errors if field names change or are removed – this does give you some confidence in proceeding! I always hated the niggling feeling you might have missed some view, procedure or trigger code.

  4. Jim Swarr says:

    The only problem that I see with Agile design within the database realm is that pesky issue of persitance. I don’t mean persitance in the sense of commiting changes to a database – but more from an historical (and auditing) perspective.

    The database represents the permanent manifestation of the abstract objects that the codebase is manipulating and as such require more thought because you are going to have to live with the result a heck of a lot longer. Get it wrong and you’ll either be writing a lot of database conversion utilites or living with multiple legacy flags in your structure (even worse a little bit of both)

    Because of the persitant nature of the data that we are actually storing in these databases – you may have to build more stucture into your database than the current codebase is calling for.

  5. Keith Rowley says:

    I agree with you. I never quite bought the database is sacred and special and can’t be developed this way argument.

    I personally find it interesting that Ruby on Rails and other such agile development frameworks pretty much move the creation and changing of the database schema into the codebase and these products are still able to create high performance applications.

  6. paschott says:

    I definitely agree on the persistence part. This is one of the issues that I have with Agile. You may design a perfectly good schema that is just what is needed for the initial problem. However, once we start really getting into the new needs as the project evolves, that schema may need significant changes for performance and just generally better design. On the other hand, that simple schema may be just enough to carry through for years. This balance is the main area of struggle. I don’t want to design something overly complex, but I hate the idea of refactoring everything we’ve done, writing conversion scripts, and repeating that as new/different requirements surface. Maybe to has to do with how we do Agile, but it’s a concern.

    As noted, Code is pretty easy to refactor. Blow it away and replace. Data isn’t quite as easy and as we change our data storage needs, that tends to introduce a good amount of complexity to avoid data loss. We’ve finally reached an understanding that data <> code. Data has to stick around. Code can be replaced relatively easily.

    I think we’re getting to a place where we can do Agile DB Development, but it’s definitely still in a pretty early state. The tools are definitely getting better and as we get the point across that just throwing a different DB schema version out there doesn’t necessarily solve all problems, the relationship between Devs and DB people is improving.

  7. BuggyFunBunny says:

    On both threads, there’s been no discussion of why it is, and how it is, that database refactoring/Agility is antithetical to code Agile. There is the use of the word “complex” in the assertion that such can’t be done in an Agile way in the database.

    No one, yet, has taken up the alternate assertion that “complex” is a stalking horse for un-normalized, and that morphing from one flatfile image in an engine to another flatfile image is, you betcha, difficult. But that’s what we shouldn’t be doing in the first place. We should be starting from Normal (at least 3NF, BCNF better) and staying that way. With update control in the engine, in the form of DRI, SPs, and triggers, the client code can morph its widgets as it sees fit. New data gets added to whatever screen(s) are appropriate.

    However, “appropriate” is not an aesthetic decision in the database, it’s an engineering one, and is forced on the developer by the Normal forms. The database developer, unlike the code developer, doesn’t have the freedom to toss columns and tables into the existing structure on a whim. Stop that.

  8. paschott says:

    I’d tend to agree with you on that one, but for those of us who have inherited bad design or sometimes created one because it was simple and met the needs of the story as it was introduced (not knowing where we were going), it happens. And yes, complex is not a reason that it _can’t_ be done, but it does introduce complexity and sometimes when you’re trying to get things out the door, that refactor sits around for a while. For larger legacy tables, the conversion task can be pretty complex. We’re looking at mapping several years’ worth of historical data into a much better structure and that’s not an easy, nor short, task. It will require some careful planning, tight mapping, and a way to push this data across that will not interfere with normal operations.

    As for BCNF, sometimes it does make sense to denormalize, especially for data that would otherwise result in a very costly query. That’s part of the “art” behind DB design, though. Usually we shoot for good normalization, but sometimes performance concerns mean that we denormalize to avoid the cost.

    As for why it tends to go against the grain, I’d argue that a good portion of it is that we don’t have a clear picture of the end results and that makes it harder to come up with the best design. We tend to get the new requirements in pretty small pieces, which leads to us just tacking a column on because it’s just one column and doesn’t warrant its own table. Only after several iterations do we really conclude that separating that data in the first place would be the better solution and that results in a conversion/refactor of the design. Doable, but sometimes painful. It’s also a hard sell to the development team at times. “Why are you adding a new table when we’re just adding a single column?” and “Just keep it simple to meet the needs of the story.” In some ways it is hard to argue with that and one reason I press the Product Owner for some future vision so we can make better architecture decisions.

    I definitely agree that complex tends to mean un-normalized, but it can also mean that it was normalized, but poorly thought out, or worse – object-oriented instead of DB-oriented. Good thoughts on it and I tend to agree with what you’re saying.

  9. puzsol says:

    Personally, I disagree with the comments that it is easier to refactor code than it is to modify the data. I mean let’s face it developers on both sides don’t document the design to a sufficient level (if at all). So you want to make some changes…

    On the application side, you need to read and interpret the code, follow events, track back from interfaces to the concrete classes, sometimes through inversion of control, settings kept in obscure places etc. All of which takes a very skilled and careful person and lots of testing.

    On the database side, you should hopefully have a self-documenting structure – relationships are explicitly stated and can be listed by a query or self constructed in a diagram. You can add a column in three seconds. The data is all there and can be queried, manipulated, exported, copied, split, backed up and restored, and you can be fairly confident that nothing was missed or lost simply by looking at the row counts. I’m not saying conversion from one persons idea of a schema to another persons idea of a schema is easy or trivial – but that is not what we are talking about here… it is the incremental changes that make the data better represent the problem space and allow the application to do what it needs.

    I have seen some projects where you write the database to store the data you have, write the queries to get the data you need for the user, and that works just fine. The interface evolves and the data grows, so the database must be extended to handle the new requirements, and the queries need to be retuned for performance, indexes dropped and created etc. If this isn’t agile database development, then I don’t know what is, and it worked well, and the database side was easier than the code side. Sure if you wanted to generate years worth of data using the tools by RedGate and fine tune your holy grail database right from the start, then things might take awhile to develop. But if on the other hand you want to ship a usable product now, and get some money to fine tune the database once you have a few years of real data (and income to match), then I don’t see any problem with the agile approach to databases.

    Sorry if I seem ignorant to you, but what is an example of something that is too complex on the database side to even think about changing it?

  10. callcopse says:

    @puzsol

    Nothing is sacred and there are always workarounds but there are often times when it is not sensible to think about changing something. For instance external dependencies on a structure that cannot be altered – perhaps:
    - a third party import/export from part of a schema
    - strict auditing requirements
    - a functional ETL component that can no longer be edited due to the passing of time and changes of personnel or tools that no longer work on existing operating systems

  11. nick harrison says:

    I don’t think that anyone is arguing against normalizing the database. Rather we are acknowledging the sad fact that a properly normalized database in the wild is somewhat rare.

    Starting an application and building every database involved from scratch is also somewhat rare unless you are working for a startup, and even then this is somewhat rare.

    We also need to acknowledge that no matter how much you plan, you are likely to still make mistakes. These “mistakes” will often be the result of requirements / assumptions no longer being true. They may be the results of new technology making old choices seem naive. They may be the result of not understanding the choices that were originally made, or they may even be the result of a hurried decision on a bad day.

    The end result, regardless of the cause, is a system with aspects that do not optimally meet the current requirements.

    Refactoring provides a mechanism to help ensure that our systems don’t follow such a degenerative path as well as provide guidance for how to pull a system back that has already started slipping down this slope.

  12. puzsol says:

    Ah, I get it, we don’t change the database because its the foundation of a fragile house of stone.

  13. paschott says:

    @puzsol – I don’t think it’s a matter of never changing the database. However, if something is working and the ROI to make the change is relatively small, it may be put on the back burner for quite a while or until that particular area of the app is refactored. I remember one table design from before my time where a set of 4 columns were added to a table for each set of check-boxes. Then we’d deprecate one set, or worse – repurpose a set. Time to refactor? Probably a month or two. When you’re constantly trying to add value to the system, it costs quite a bit in time and people to refactor something that works, even if it’s somewhat painful.

    Similarly, we have all seen designs that were not well thought out for the long haul. It may be a major pain point, but until it reaches some threshold where the pain of maintenance is high enough, it won’t be touched.

    I like Nick’s answer – Requirements/Assumptions change (sometimes way too often and even within the sprint), New Tech comes along, sometimes we just want to get it done so we can move on. That leads to a need to refactor, though it may not be immediate.

    All this to say that we _should_ be looking at pain points in the system and addressing them, but the reality is that we don’t always have time or resources to do that as changing the DB can have far-reaching effects. In most cases, we can get by with adding a column or a new set of tables. The pain comes when you start messing with the core parts of the system. That doesn’t mean we can’t or won’t do it, but we’ll be more cautious or thorough when doing so.

  14. BuggyFunBunny says:

    @puzsol:

    I agree with the gist of your first posting, but the OP (and most of the comments) have been along the lines of: “we’ve got a flatfile mess of a schema (built by some bunch of idiots, before us of course) which we don’t have documented on either the database or client side, so the cost of cleaning it up isn’t worth the benefit; the application works (we think so anyway) now and will have to work after, so let’s not bother, since we don’t get any ‘new’ functionality”. So, what started as a Quick and Dirty Hack Application continues on that path for decades. Agile is just a formal justification for Q&D application building. For those who weren’t there, the original MS/DOS was officially named (by its author Seattle Computer) Quick and Dirty Operating System, QDOS. Lucky, English gives us both Dirty and Disk.

    Tony’s original post, as I read it anyway, was that the databases in such applications are even more fragile than the code, since much of ACID is executed out in disparate locations in said code. MySql applications are notorious in this respect; as well as much of Fortune X00 COBOL systems converted into java/C#.

    If RDBMS instances were relational, then you (and I, if I may be so bold) would be spot on. Getting the corner office suits to see that Q&D application building, while immediately cheaper (“Let’s just add a column here and a table there and not worry about the theoretical formalities”) to grow like Topsy end up being far more costly when viewed as NPV of TOC over a reasonable time horizon. The reason is that none of the actors involved (suits, managers, coders, and DBAs) intend to be around the application any longer than it takes to get promoted based on the cheap Q&D mods done this quarter.

  15. paschott says:

    Not sure I’d completely agree with that. True, the cost of refactoring doesn’t always match the benefits and some of these solutions will “just work” for quite some time even though we hate them. Once the pain of maintenance surpasses that threshold, we’ll redo the DB and code.

    It’s possible to be Agile and change things as needed and can work well, but I think Nick brought up some good points. If we’re always developing for a single story, we’ll have a larger tendency to either design poorly or need to convert/refactor more often. Sure it’s possible, but that brings its own special types of pain points.

    One thing I’ve found that helps with Agile type DB work is to get a slightly larger view of what’s coming. That way you can plan a better architecture up front and be ready for some of those changes. Obviously if the project takes a sharp turn for a while, that gets a bit more difficult because you’ve now got some extra structure that may not be needed for a while – real or perhaps ready to be deployed.

  16. puzsol says:

    As far as I can see, my original point still stands…. It would seem that we don’t do Agile DB development, not because it’s hard to change the database (as in actually change it), but because it tends to be the hub of systems that are not so easy to change (fragile not agile).

    I agree it is crazy to do a DB change without factoring in the cost of changing everything around it. We had a case recently where we wanted to change two date columns from being non-null to nullable. The change to the db – simple. The rest of the systems… check the reports for what null on that column means (not just find the column but interpret meaning), check the utilities for use, check the code for errors that could have tried to add a null value before but were stopped by the constraint, modify all the uses of the class to use HasValue for the now nullable data member (again slippling into the logic what a nullable value means), test the whole enchilada etc… Was it difficult? Moderately. Was it because the database itself was hard to change? No. Did it require co-operation between database changes and the rest of the system that use it? Of course.

    I still say if you ask me that the database (if it was an entity all by itself) would be one of the easist things to use agile development on. And as people who have ended up with a hobbily designed (or not designed) mess of a database would say, it can also be the worst place to do agile development. I would think you can end up with horrible databases either way, it just that the up-front design-development has a better chance of realizing the ladder is leaning up against the wrong wall before anyone starts climbing it. But there are projects where the up-front delay would kill it, which was the point of starting Agile development in the first place.

    My personal opinion? That database development can be as agile as the processes around it.

  17. paschott says:

    @puzsol – I think I’ll generally agree with that in most cases. There are definitely times when refactoring the DB results in a more complex task that holds up the coding more than normal, but on the whole those times tend to be rare. The case you brought up can definitely be a factor as well. I’ve run into times like that when a seemingly simple change took quite a while because we had to run it through so many different scenarios. On the whole, knowing at least a little of the up-front design proves helpful, but not always a show-stopper. It’s worse when requirements take a completely unforeseen turn. Even worse if they keep veering back and forth. :-)

  18. Lee says:

    Maybe the Agile approach could work with databases, not exactly sure. But if it could, it isn’t obvious how that follows from the fact that it works in application code. Databases and application code are fundamentally different in that applications are built on databases, not the other way around. The database is the mother of the application. Application components break all the time and this may or may not affect other components; but, when mama ain’t happy, ain’t no one happy.

    The “database as foundation” analogy to construction has already been cited in the comments. I like this analogy, but would also add another: the concept of leverage. If you were in the corporate sabotage business and you had a choice between messing up someone’s applications or his databases, as the evil saboteur, which would you choose? Which would have the more far-reaching implications? True, some application code ripples through and affects other application code. However, when the databases are changed, the likelihood of breaking the application code is many times greater.

    (And sometimes, depending on the development team, the database designer can fill in for the evil saboteur quite nicely, or so it seems to some developers.)

    I would add that I think the dismissal of the construction analogy (“I can delete huge swathes with a single key, or make changes to any component, and have a suite of tests that will tell me within minutes if I’ve done something horrifically wrong”) is perhaps a bit too facile. True, there are no materials costs for changing the database, but there are certainly labor costs. Plus, the real cost of the fast-and-loose approach to database design is not correcting the errors that are horrifically wrong, but in correcting and testing the code that must adjust to the new “truth” as modeled in the database. Thought is the developer’s concrete and steel, and changing one’s mind is a double-edged trowel: it can tuck in the mortar that holds it all together, or it can weaken the foundation that brings it all tumbling down — particularly if the change of mind was half-baked to begin with or inadequately tested to finish with.

  19. puzsol says:

    I really appreciate Lee’s comments, I think it really helps clarify some of the comments before.

    A couple of final thoughts I had:

    1 – I bet most databases go through an Agile phase in their life; before they give birth to the applications. I bet no database is deigned completely before it is implemented. I bet all of them have tables renamed, indexes added or removed, data types changed, triggers modified etc. But only up to a point (probably when the first application becomes a toddler – then mummy has to leave her unstable life behind her and be a sensible role-model for her new child)

    2 – Is there something that can be done about it? In “Command-Query Responsibility Segregation” (CQRS), commands are changed from “update this column to that value” to “the customer got married and changed their surname”, that is the commands that modify anything have inherant meaning. The queries are all made upon de-normalised tables that are populated for each use (even down to different tables for different user roles), so even the data-view has specific purpose. In this paradigm if a new application wants a slightly different view, a new table is created for it (disk space is cheap). I think this this sort of abstraction would help a database is to remain agile… it allows the current ‘view’ of the data to remain in place regardless of the ‘database’ structural changes (until it is no longer needed)…. it that captures intent independant of the structure… these are things that are not built into an SQL update statement, and while Data Abstraction Layers are simply mirrors of the database structure, the situation is unlikely to improve. [if you are interested in reading about CQRS, go here: http://www.udidahan.com/2009/12/09/clarified-cqrs/

    3 – In my professional life I have seen a number of occasions where people have been asked by the business to do something, and their reply is just a flat “no, can’t do that” (and I know it’s not true). I have also had that kind of answer from a database team when asked to make a change I needed. I really dislike that kind of answer. Ok some times the database team said, no, you need to use this table instead – I’m fine with that kind of answer. In my opinion, the IT team are there to serve the business, so answers should be more of “I’m not sure what your intent is”, “ok, but it will cost this much”, etc. And the database needs to serve both the business and the IT team, it makes a horrible master. Which I guess is why we are even having this discussion… I mean, if the database is not agile, then how can it adapt to serve the new business requirements? Perhaps that is the real question.

    Long may the database serve.

  20. BuggyFunBunny says:

    – Which I guess is why we are even having this discussion… I mean, if the database is not agile, then how can it adapt to serve the new business requirements? Perhaps that is the real question.

    Long may the database serve.

    The conversation has been going, in one form or another, since IDMS/IMS in the mid 60′s. Back then, it was more rancorous, since the access path was built into the database schema, and the inherent conflict between coders (who were accustomed to controlling both code and data files explicitly, e.g. COBOL/VSAM cowboys) and database architects emerged . Change to such a datastore is inherently destructive. Coders have thought that way ever since: “the real world is hierarchical” is the usual mantra/justification. The real world isn’t hierarchical, it’s relational, often with a veneer of hierarchy; case in point, the org chart.

    Business analysts and coders want to force orgs into hierarchy (since it’s the familiar and most rigorous structure they can grok), when, in fact, orgs are matrix structured; have been for decades. Matrix management (multiple supervision) is relational, not hierarchic. BAs and Coders generally reply with: “well, sure, but you’ve only got one primary boss”, as if that changed the structure. It doesn’t. This understanding was Codd’s brainstorm; the RM and what followed were just implementation details.

    SGML (and its perverted spawn, HTML/XML) hasn’t helped data understanding. These are document formatting languages, twisted into datastore responsibilities all too often.

    In both cases, because coders existed before database developers (alas, it’s true that guys were banging on 026s in the 50′s), they’ve been able to flummox corner office suits into letting them define datastores. As a result, we continue to get inappropriate tree data (defined by coders at the behest of BAs), which is, by definition, bloody to change.

    In sum, agile in datastores is only possible with orthogonal data from the outset. And that means using an industrial strength SQL database (admitting that no RM database yet exists) as normally as it permits. And coders have to accept that screen/file monomorphism will not be the path.

  21. puzsol says:

    “The real world isn’t hierarchical, it’s relational, often with a veneer of hierarchy; case in point, the org chart.”

    Funny thing.. I once drew an org chart for our company that was in a state flux as a suggestion that had the HR “department” underneath all the worker “leaves” of the tree (HR supports everyone don’t they?), with the CEO at the bottom (underneath HR) as well as at the top of the org chart… it was a kind of feedback loop if you like for the org health… try to put that into a strictly heirachial/tree structure! Of course it was nothing like the one actually implemented, but then I guess I am a little bit odd.

Leave a Reply