
Tony Davis is an Editor with Red Gate Software, based in Cambridge (UK), specializing in databases, and especially SQL Server. He edits articles and writes editorials for both the Simple-talk.com and SQLServerCentral.com websites and newsletters, with a combined audience of over 1.5 million subscribers. You can sample his short-form writing at either his Simple-Talk.com blog or his SQLServerCentral.com author page. As the editor behind most of the SQL Server books published by Red Gate, he spends much of his time helping others express what they know about SQL Server. He is also the lead author of the book, SQL Server Transaction Log Management. In his spare time, he enjoys running, football, contemporary fiction and real ale.

An Agile House of Straw

Published 31 March 2011 3:39 pm

The ideal Agile application developer welcomes changing requirements, even late in development. The DBA or Database Developer doesn’t. Why is that?

You can’t create complex databases in the Agile way, by breaking tasks into small increments, with minimal planning. Building a database that will perform quickly, reliably and securely over time, as it grows, is more like building a house: it involves architecture and has to be planned in detail beforehand. The tables, their data types, their key structures, their constraints, and the relationships between them all need to be thought out before you make the first cut.

The database structure then needs to be loaded with realistic transactional data and rigorously tested to make sure that the expected results are always returned in the various reports. It needs to be stress tested under load to ensure adequate performance, smooth scalability, and that there are no data integrity issues caused by predicted levels of concurrent access/modification of the data. (If you think SQL Server automatically protects you from such eventualities, it doesn’t, as Alex Kuznetsov proves in his Defensive Database Programming book).
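
For example, here is a minimal T-SQL sketch of the kind of concurrency defect that this sort of load testing needs to catch, together with the declarative fix. The dbo.Booking table and its columns are purely hypothetical:

    DECLARE @RoomID int = 12, @SlotDate date = '20110331';

    -- Hypothetical rule: one room can be booked only once per date.
    CREATE TABLE dbo.Booking
    (
        RoomID   int  NOT NULL,
        SlotDate date NOT NULL
    );

    -- "Check then insert": under the default READ COMMITTED isolation level,
    -- two concurrent sessions can both pass the check and both insert.
    IF NOT EXISTS (SELECT 1 FROM dbo.Booking
                   WHERE RoomID = @RoomID AND SlotDate = @SlotDate)
        INSERT dbo.Booking (RoomID, SlotDate) VALUES (@RoomID, @SlotDate);

    -- A declarative constraint closes that window, whatever the client code does.
    ALTER TABLE dbo.Booking
        ADD CONSTRAINT UQ_Booking_RoomSlot UNIQUE (RoomID, SlotDate);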

When a database architect finally gets all this right, it’s only with the greatest reluctance that any changes to the database structure are allowed. Sure, you can make changes, but doing so is far trickier. Every change necessitates another round of regression testing, stress testing and so on.

This need to test data structures plus data is the critical difference between database and application testing. Data retrieval logic is not enshrined in the SQL code; the database engine decides how to do it based on a range of factors including table/index structure, the data that is stored, its volume, distribution, and so on. This means that, often, each round of testing must use not just "realistic" data but also consistent data, in terms of the data types, ranges of values, and distribution of data within those ranges. The test data must stay the same between builds so that you can compare aggregations to ensure that nothing is broken, and get relative performance figures.
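
As a rough sketch of what that comparison might look like, each build can re-run an aggregation over the fixed test data and compare it with a stored baseline. The dbo.Orders table and the dbo.TestBaseline store are hypothetical:

    DECLARE @expected money, @actual money;

    -- Hypothetical baseline, captured once from a known-good build against the same test data.
    SELECT @expected = ExpectedValue
    FROM dbo.TestBaseline
    WHERE TestName = N'Q1 order total';

    SELECT @actual = SUM(OrderTotal)
    FROM dbo.Orders
    WHERE OrderDate >= '20110101' AND OrderDate < '20110401';

    IF @actual IS NULL OR @expected IS NULL OR @actual <> @expected
        RAISERROR ('Regression: Q1 order total no longer matches the baseline.', 16, 1);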

All of this makes Agile database development hard. It becomes a severe problem when database development is a full participant in the Agile process. There is no easy way to accommodate the evolutionary changes that some developers would like to make freely as part of an Agile sprint, without destroying the resilience, maintainability and performance of the database. After all, you wouldn’t build a house by an evolutionary process, would you?

Cheers,

Tony.

15 Responses to “An Agile House of Straw”

  1. paschott says:

    Sadly, it’s possible that I would build a house in an evolutionary manner. That’s why nobody lets me near hammers, saws, and drills. :-)

    However, I’d argue that _some_ things could be done in an Agile manner, but anything that will have far-reaching effects needs an architecture. It all depends on the features and how “new” they are as a concept in our experience. If something is pretty new to us and will have far-reaching effects, we spend more time in planning and prototyping. If it’s just a table to store some information that follows a pretty standard pattern, we can create it and iterate as we realize we need an additional bit of information.

    On the flip side, it’s also true that you can’t just change the DB Structure at will. If we decide that we need to completely change the way we store the data, that has a far-reaching impact in code (not everything uses Stored Procs and views) and in the time spent refactoring the database. If we can catch that refactor early in the process, it can work pretty well. If we catch it after months or years of released code, it’s a lot harder.

    There are ways to work around the data tests and our QA team handles that pretty well. I agree that there are problems in that as well, but you can work around it.

    Overall, I’m more in favor of the Agile methodology, if only because you can correct your course a lot more efficiently that way. I’ve done too many waterfall projects in the past where we were given requirements, went away to develop, came back months later and were told “that’s not what I meant”. Yes, that can happen whatever methodology you use, but waterfall has gotten me into more trouble than Agile. I don’t think I’ll ever be a fan of late-arriving requirement changes, especially in-sprint. If we get a set of requirements, they should stay the same throughout the sprint, or any change should be agreed upon by the team. Some of those “small” changes have a huge impact on the work done and should be started in a new story.

    This promises to be an interesting discussion. I look forward to the other responses.

  2. BuggyFunBunny says:

    tl;dr
    The question boils down to: what’s the purpose of Agile? I’ve found it’s used as an excuse for lazy minded development. Spolsky, IIRC, holds a similar view. Yes, one can build applications under RDBMS control in an Agile way, but the problems are largely organizational/bureaucratic, not technical.

    I’d argue that, if you want to do Agile and use an RDBMS efficiently, then stored procs and views are the only viable approach. But it comes back to the age-old question: where is the data control, in the client or the engine? The Agile folk want it back in the client, where their granddaddies wrote 80-column COBOL. Agile can be done with the active participation of database developers, but not, generally, when Agile is exercised in the usual fashion. The intent of that kind of Agile is to dumb down the datastore to just a bunch of dumb bytes, and that’s not what an RDBMS is, although it is too often treated as such.
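
    A minimal T-SQL sketch of that split, with every object and role name hypothetical: the clients get a published interface of views and procedures, and the engine keeps control of the base data.

        -- Hypothetical base table; clients never get direct access to it.
        CREATE TABLE dbo.Customer
        (
            CustomerID  int IDENTITY PRIMARY KEY,
            Email       nvarchar(256) NOT NULL UNIQUE,
            CreditLimit money NOT NULL
                CONSTRAINT CK_Customer_CreditLimit CHECK (CreditLimit >= 0)
        );
        GO
        CREATE VIEW dbo.CustomerSummary
        AS
            SELECT CustomerID, Email FROM dbo.Customer;
        GO
        CREATE PROCEDURE dbo.usp_AddCustomer
            @Email nvarchar(256), @CreditLimit money
        AS
            INSERT dbo.Customer (Email, CreditLimit)
            VALUES (@Email, @CreditLimit);
        GO
        -- The application connects as a role that only sees the published interface.
        CREATE ROLE AppRole;
        GRANT SELECT  ON dbo.CustomerSummary TO AppRole;
        GRANT EXECUTE ON dbo.usp_AddCustomer TO AppRole;
        DENY  SELECT, INSERT, UPDATE, DELETE ON dbo.Customer TO AppRole;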

    Agile, when done in a database environment, has to be done in discrete parts: figure out the data, then figure out how the data’s going to look on the screen; two Agile exercises. They can be done in tandem. As an aside: with small-form devices (iStuff, as archetype), BCNF databases, which can serve up small, independent chunks of data, will be superior to the age-old flatfile stores. The reason is that it’s far easier to build up (and maintain) from orthogonal data than it is to figure out how to tear it apart from un-normalized muck.

    – (Tony) You can’t create complex databases in the Agile way, by breaking tasks into small increments, with minimal planning.

    Well, yes you can, if you start with a BCNF schema, however few the tables. Let’s start with my meme architecture, the multi-processor/SSD machine, and an initial BCNF schema, product of some requirements document. With that as the Patient 0, it can be argued that changes from that point forward, on the assertion that the tables’ PKs are correct, amount to adding either related tables or non-key columns to existing tables. I know, at the very earliest time, the basic structure has to be worked out. NOBODY should be writing client code until the basic schema is defined. You have to know what’s what before you go adding bells and whistles. In other words, don’t worry about the widgets until you’ve figured out the data. Kiddie koders often embrace fire, aim, ready and call it Agile.

    Yes, the database will grow like Topsy, but unlike the flatfile, un-normalized messes that OO/COBOL coders create, this one grows in a controlled, organic way. The RM makes maintenance a matter of extending the BCNF schema in a relational manner; there’s no redundancy or duplication to worry about. What could be simpler or less dangerous?

    Here’s where code generation from schemas (generally CRUD simple) can be very useful. The klient koder kiddies just love to argue about appropriate widgets and color schemes and other such trivia. What matters is the datastore’s structure, which, in fact, defines the logic of the application. The client side is only responsible for getting input from the user, and sending back some answer from the datastore. Use a CRUD simple generator while defining the application and schema, leaving until last the klient kiddie koders to decide how to make it all look with widget choices and such; until then, just generated screens to test the logic of the application. It’s still Agile, but the developers don’t worry about pretty-pretty until after the datastore is stable.
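
    As a rough sketch of that kind of CRUD-simple generation (the dbo.Customer table and its CustomerID key are assumed purely for illustration), a “get by key” procedure can be generated straight from the catalog:

        -- Hypothetical target table: dbo.Customer, keyed by CustomerID.
        DECLARE @cols nvarchar(max), @sql nvarchar(max);

        -- Concatenate the table's column list from the catalog.
        SELECT @cols = STUFF((
            SELECT N', ' + QUOTENAME(COLUMN_NAME)
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE TABLE_SCHEMA = N'dbo' AND TABLE_NAME = N'Customer'
            ORDER BY ORDINAL_POSITION
            FOR XML PATH('')), 1, 2, N'');

        SET @sql = N'CREATE PROCEDURE dbo.usp_GetCustomer @CustomerID int
        AS
            SELECT ' + @cols + N'
            FROM dbo.Customer
            WHERE CustomerID = @CustomerID;';

        EXEC sys.sp_executesql @sql;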

    Since we’re only extending our initial schema, we don’t break any existing code. The new data supports new client side screens, which the Agile Babies can futz with until the cows come home. They get the data from the database with a SP or view call, pretty it up the way they want, and neither know nor care how the data are stored; that’s not their responsibility.

    How that input and those answers are displayed is irrelevant to the datastore structure; one view may join the tables, while another may only show the independent table. Those two were intermingled, of necessity, in many flatfiles back in the COBOL/VSAM era. We’ve learned a lot since then. A screen *isn’t* a map of a file. OO programmers still have a tendency to make that assertion. They’re quite wrong. They need a good whoopin’, sometimes.

    The Rails folks have had Migrations (and Scaffolds for screen generation) for some years now, although I’m not a Rails guy; so this notion has precedent. But if you think about it a bit, extending a BCNF schema which is accurate to the application at time/0 will be robust to change at time/X. Either you’ve added: a new dependent table, a new independent table, or a new column. In each case, the new data is orthogonal (since we’ve started with BCNF, and stay on the Yellow Brick Road) to existing data, and existing code (both sides) is itself extended to manage the new data.
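
    Assuming the hypothetical dbo.Customer table sketched above, those three kinds of extension might look like this; none of them invalidates code written against the original schema:

        -- 1. A new non-key column on the hypothetical dbo.Customer
        --    (nullable, so existing inserts keep working).
        ALTER TABLE dbo.Customer ADD PreferredName nvarchar(100) NULL;

        -- 2. A new dependent table, keyed off the existing primary key.
        CREATE TABLE dbo.CustomerAddress
        (
            CustomerID  int NOT NULL REFERENCES dbo.Customer (CustomerID),
            AddressType char(4) NOT NULL,        -- e.g. 'BILL' or 'SHIP'
            AddressLine nvarchar(200) NOT NULL,
            CONSTRAINT PK_CustomerAddress PRIMARY KEY (CustomerID, AddressType)
        );

        -- 3. A new independent table, with its own key.
        CREATE TABLE dbo.Currency
        (
            CurrencyCode char(3) NOT NULL PRIMARY KEY,
            CurrencyName nvarchar(50) NOT NULL
        );

        -- Existing views and procedures over dbo.Customer still compile and
        -- return the same columns; only new code needs the new objects.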

    Ah, but what if we’re moving data from one set of tables to another, and changing PKs and FKs and triggers and …. Then you’ve not reached Patient 0, basic schema; you’re still designing. Shippable client code, of any kind, cannot yet exist; to think that it does is folly.

    Now, if the original requirements were not thought out, and the business analysts (to use a pejorative term) are continually changing their minds, then the solution isn’t Agile; it’s get a bunch of smarter BAs.

    Agile, from my decades of experience, is just an excuse for lazy requirements definition, design, and development. The mantra goes something like: “you can’t know what you’re doing until you’ve done it”. Phooey, to quote Nero Wolfe. A concatenation of ad hoc design decisions is just sloppy stupidity, not something to brag about. Have a wander through Spolsky’s archive; for a young-ish guy, he actually gets it.

  3. mbaylon says:

    Tony, an interesting and thought provoking post.

    I agree with you that, as databases have both data and structure, they present different challenges when it comes to change than application software does.

    A database and its data are often going to be around an organisation a lot longer than the original app(s) they were built for.

    Therefore, it could be argued that it is even more important that a database built today is able to reflect the inevitable changes any organisation is going to undergo over the next 10+ years.

    There is also the issue, in many organisations, that if you won’t or can’t change in an appropriate time frame, it is very likely that you will be sidelined.

    The effects of this are often worse, with people going off and doing their own thing: creating another siloed/unmanaged copy of the data, EAVs, Access databases, etc.

    So perhaps the key issue is what can be done to make it easier to change – whilst maintaining overall quality.

    Not an exclusive list – but techniques that might help:

    1. More automated testing: database unit testing of both data and structure, data quality, integration and performance tests, etc. (a minimal sketch follows this list).
    2. Database development included as part of an overall continuous integration process, with automated testing quickly providing feedback on any potential ‘breaking’ changes.
    3. Automate db deployment: schema versioning, appropriate test environments, etc.
    4. Decouple databases from their clients by using sprocs and/or views, to allow more flexibility in changing a db independently of its clients.
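
    A minimal, framework-free sketch of the first point (dedicated tools such as tSQLt do this far more thoroughly); the dbo.OrderLine table and its columns are assumed purely for illustration:

        -- Structure test (hypothetical dbo.OrderLine): the column the reports depend on must exist.
        IF COL_LENGTH('dbo.OrderLine', 'LineTotal') IS NULL
            RAISERROR ('Schema test failed: dbo.OrderLine.LineTotal is missing.', 16, 1);

        -- Data test: the fixed test data set must contain no negative quantities.
        IF EXISTS (SELECT 1 FROM dbo.OrderLine WHERE Quantity < 0)
            RAISERROR ('Data test failed: negative quantities in dbo.OrderLine.', 16, 1);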

    I agree with a number of things mentioned in the previous comment – especially ‘the problems are largely organizational/bureaucratic, not technical’.

    I think it is key that an organisation has a group who are responsible for the overall enterprise data architecture and who are also directly involved with project teams. This can help balance the fact that many project teams are incentivised to take a siloed/tactical approach to development.

  4. GuinnessFan says:

    This is an over-generalization of agile development. It has been around for over ten years, and I doubt that projects with an RDBMS are in the minority. Agile is not about planning less, but about planning throughout the life of the project. Of course developers want to make changes as they see fit, without regard for network admins or DBAs. When an application gets to the point you mention, it’s important to be able to test the changes. I just can’t believe the database is the greatest hindrance to requirement changes. If all planning could be done at the beginning of a project with such accuracy that there was no need for changes, there would be no need for agile methods. There is a huge difference between believing that it is impossible to plan everything up front and not wanting to plan at all.

  5. allankelly says:

    Tony,
    Not only would I build a house in an iterative fashion, I suggest that the most successful towns and cities are built in exactly this fashion. Christopher Alexander explains how in “The Timeless Way of Building”; if it is only houses you are interested in, then read Stewart Brand’s “How Buildings Learn”.

    The aim is to produce something that is highly /adaptable/ rather than highly /adapted/.

    There are examples of database development that have done just this. I recently went to a talk by Chris Oldwood, who has created such a system, with the database covered by automated unit tests.

    Reading your blog post gave me a sense of déjà vu; these are the kind of points I heard made by application coders a few years ago.

    At the moment I don’t think the database community has developed tools and techniques that are able to support the iterative and Agile way of working. I believe it is only a matter of time before they do. However, I also believe the database vendors are not interested in this, and I suspect many in the database community are not interested either. There is currently too much “black magic” associated with databases; all too often, only the high priests are allowed to touch the database.

    Thanks for your thoughtful blog.

  6. TheCPUWizard says:

    Tony,

    I find it very interesting that you post this as the editorial in the same newsletter as an article that references Martin Fowler’s “Evolutionary Database Design” [http://martinfowler.com/articles/evodb.html], which refutes your basic premise.

    Perhaps you should give it a read……

  7. Phil Factor says:

    I quote from the Raven DB website….
    ‘No upfront agonizing about the database schema or trying to shoehorn data into a relational model’ http://bit.ly/h7F6rC
    I quote from Scott Ambler
    ‘My experience is also that many data professionals are difficult to work with, often because they are stuck in their “serial ways” but also because they have little or no experience following modern software development techniques.’ http://bit.ly/hnzx6Y

  8. Grant Fritchey says:

    Hey Tony,

    I’ve found that it is possible to develop databases in an agile fashion, but it really is quite painful to do so, and it requires a great deal of discipline from both the database team and the development teams. It’s this requirement for extreme discipline in following processes precisely that, I’ve seen, is the primary reason it’s not adopted. Instead, either the DBA team rejects the agile approach as “too hard” (and it is not easy), or the development team rejects the DBA team and goes with an ORM tool and builds an object database (which, shock of shocks, doesn’t perform well, can’t be used for reporting, and generally breaks at version 1.1 of the product, necessitating a complete rewrite). In my experience (hardly a huge, all-encompassing set of data), it’s more often the developers rejecting the DBA team and going after that Holy Grail of eliminating the object impedance mismatch, in order to arrive at a “pure” agile development approach without all that messy persistence stuff getting in the way.

  9. Michelle A. says:

    Fortunately, most of the developers that I work with understand this dilemma, and after many years of struggling and battling with me they see the importance of a sound database design early on in the project. They don’t take database changes lightly. Maybe they finally got tired of hearing me complain? I honestly think they have realized that I’ve got a point.

    Data without integrity is not worth anything. Without enforcing this integrity, the data can easily be fouled up and give the application a bad name. Making significant design changes to a database after it’s been living and breathing for a while can be risky, and it slows down development efforts. I don’t think that they try to avoid making database changes late in the project because they don’t like working with me. They typically have one project and I am often juggling several, so when a project requires database changes, it will often lead to delays.

    Scott Ambler won’t score any bonus points with data professionals by making statements like that! He’s just fueling the fire!

  10. BuggyFunBunny says:

    @allankelly:
    Houston is the result of such unplanned thinking; no one likes it. NYC, Paris, and Washington are more or less planned. If you take what I (and those far more well known than I) said to heart about the strengths of BCNF schemas, you will come to understand that fully normalized datastores are the *most* adaptable; far more so than the flatfile mucks that OO coders create. It still comes back to a matter of bureaucratic control; a datastore smart enough to keep the crap out is superior, technically, to data which is siloed to particular code. For an example, check out xTuple; any client code can be used. It’s Postgres, but the design is what matters.

    @Phil/Michelle:
    Ambler is a buffoon. Just as Pascal.

  11. BuggyFunBunny says:

    Ooops. I admire Pascal. It should be:

    Just ask Pascal.

  12. Ian Ringrose says:

    Is this the correct question?

    I think we should be asking:

    As Agile development is the best way to create a large class of applications that are impossible to define upfront, should we be moving away from conventional RDBMSs and DBAs…

    (I am not thinking of apps that are just lots of CRUD screens, but applications that solve problems no one has solved before, and hence no one really understands how the expert user will use the application; e.g. not custom in-house accounting systems.)

  13. BuggyFunBunny says:

    – should we be moving away from conventional RDBMSs and DBAs

    Only if someone (you, perhaps?) devises a model of data which has greater quality than the Relational Model. Again, the notion that data is dumb and has to be controlled by a mass of client side code is just a fallacy. There’s no reason that an application, which is data driven so not games or such, can’t be done iteratively with data in a store which is nearly a pure RM implementation. Conventional SQL databases are such. The fact that coders define flatfile mucks into a SQL engine doesn’t mean that the RM or SQL databases are inadequate. Lousy barbers make lousy haircuts; that doesn’t mean you should get your hair cut by a butcher.

  14. robert18mack says:

    There’s no reason that an application, which is data driven so not games or such, can’t be done iteratively with data in a store which is nearly a pure RM implementation. Conventional SQL databases are such.

  15. paschott says:

    @BuggyFunBunny – Just read through your first response where you mention that Agile tends to be lazy requirements gathering. I’d agree with that in part. However, I think there is a lot to be said for building out just what’s needed and getting regular feedback along the way. I remember several large waterfall-style projects where we reached the end and the result was not what was wanted at that time, because things did change along the way. Yes, it matched the original specs exactly, but by release time, those specs were not what was then needed. Having done both, I appreciate the more timely course corrections that Agile provides.

    I completely agree that you need a stable datastore design, and at our current workplace we’ve got a relatively stable DB schema. We adjust to add new features, but rarely have to completely change the schema and migrate data to adapt to the new way of doing things. It helps that the DB team tends to push for some future details on where we’re planning to go, so we have also tended to avoid massive schema rewrites, except perhaps for legacy (not so good) schema choices.

    Actually, I agree with most of your points, just not the one about Agile being about lazy planning. The point of Agile should be that you can adjust more quickly to legitimately changing requirements, as opposed to poorly planned changes. I’ve seen the poor planning, but I’ve also seen the change to where what we plan out generally has some thought behind it, so it won’t have huge changes in the future. We can then build out those new features and release them, moving on to the next releasable feature set within a short time. Our customers see progress, and if something needs to be enhanced or changed, we can do that within a week or two, as opposed to a couple of months for a waterfall project. (This assumes that it’s not a bug, but a legitimate change/enhancement.) Yes, a lot of people abuse it and then abuse the DB design, but if you have a good Dev team, they’ll realize the long-term effects of poor DB planning.
