Agile methodologies work well with database developments only if great care is taken to do things right. It requires good judgement and leaves little room for error. Dev Nambi, in an extract from the book Tribal SQL, argues that Agile works for smart, curious, and experienced software engineers.
- Agile 101: A History
- Agile 201: Implications
- ABC – Automate, Balance, and Communicate
- DDT – Design, Deployments and Tests
More and more software development teams use "Agile" methods. Done well, Agile improves software quality and makes development and releases more predictable. Unfortunately, these are not the typical results for "early stage" Agile adopters. Instead, we see:
- an ill-considered rush towards lots of features, with inadequate testing to avoid bugs
- daily, automated deployments, without getting them right
- emergent (read, chaotic) system design, and stream-of-consciousness programing leading to "spaghetti code"
- exponentially increasing technical debt and developer frustration.
Teams stop trusting each other. Human sacrifice! Dogs and cats, living together! Mass hysteria! Teams adopting Agile take care to improve designs continually, as well as their testing and deployment practices. Agile leaves little room for error; it requires good judgment.
Agile database development is particularly hard because databases contain state and must ensure data integrity. They are harder to upgrade or roll back than the front end of a website and so are more amenable to up-front design than continual refinement. Lastly, database developers and DBAs tend to have less experience in Agile practices, leading to additional struggle in the early stages. This article will explore the history and principles of Agile development with an emphasis on how we can apply Agile practices successfully to databases.
Agile 101: A History
Agile-style development emerged as a reaction to the practices of what the Agile movement termed traditional Waterfall-style software development. The latter places an emphasis on up-front design, documentation, and completeness, to create a safety net that can produce mission-critical, complex software in a small number of releases.
The "design-first" (so-called "Waterfall") project methodologies originated in the automotive and aeronautics industries, where safety and quality were paramount. The Internet sped everything up and rewarded companies that could rapidly turn an idea into a working product. Development speed became more important than thoroughness and so the main benefit of up-front design vanished. Agile became popular.
Agile development allows teams to spread development and deployment risks over more releases. The shift from plan-driven to feedback-driven work can also reduce business risk, since the team builds fewer unneeded features, potentially reducing technical complexity. Continuous feedback also makes it easier to abandon bad ideas and bad code practices.
Agile development also makes compromises, and adds new risks. The main compromise is trading away time spent up front on design, for more frequent feature delivery. Agile teams spread design time over numerous iterations, but the lack of an initial "unifying" design means that development teams must work hard, adopting best practices and deploying good judgment, to avoid creating incomprehensible system architectures. Maintenance time often suffers too, in the push to iterate and progress continuously. Without careful management, this is what leads to the buildup of the technical debt, i.e. the cut corners, inelegant architecture, incomplete tests that, unchecked, gradually take a toll on system performance and developer productivity. This is particularly bad for databases because they often support multiple applications and, unless we take the time to design proper interfaces, database-refactoring can happen only when all coupled applications can change, meaning it happens at the pace of the slowest application team.
The number of deployments increases, requiring investment in streamlined deployment practices. Ultimately, of course, this is a good thing, but the amount and pace of change is a shock for some professionals, and keeping up requires good communications skills. Agile works best with smart, curious, and experienced engineers.
It's all fun and games.
Agile 201: Implications
The early stages of Agile adoption often take a heavy toll on software engineers. Their working lives becomes faster, less certain, and with more change.
There are three reactions: two common and one wise. The first is defensive: I don't want to do this. It won't work. The second is the Superhero reaction: I can do this, I am invincible! The third reaction is the sensible reaction: I can do this, but only with help.
ABC – Automate, Balance, and Communicate
The necessary help starts with the adopting the ABCs: automation, balance, and communication.
Repetitive tasks haunt the lives of most IT professionals: checking email, filling out reports, responding to bugs and tickets, and so on. However, we can automate or streamline many of these tasks. Most DBAs use SQL Server Agent jobs, scripting languages like PowerShell or Python, SSIS packages, and vendor applications to automate many aspects of their monitoring and maintenance, especially for tasks that happen late at night. Many developers use C#, scripting, and tools such as MSBuild, MSTest and source control command-line applications, to automate their build and test processes.
However, most DBAs don't automate documentation, deployments, SQL installs, patching, and so on. Developers don't make the leap from automated builds and testing to continuous integration or tool-aided refactoring work. Even if we can't automate a whole process, we could probably automate most of it.
Automating your repetitive tasks
For DBAs, my favorite automation project is to tie together monitoring scripts with the ticketing system. That gets rid of 20% of my day-to-day work right there, if I script it carefully enough. For development work, I like to have a single-command build-and-test script. This prevents me from checking in a broken build or failed unit test at least once per new feature.
Here is my recommendation to help you extend your automation. Start with the automation tools you already know, and get more creative. Then, every four months, pick one additional tool, something that's useful and easy to learn, and add it to your arsenal. Most importantly, spend a little time every week automating away your most common headaches.
A few common tasks take up a lot of time that you can save if you automate them, even partially:
- Automate your monitoring alerts to update your ticketing system (and vice versa).
- Automate your monitoring alerts to trigger your maintenance processes (and vice versa).
- Automate your email inbox to coordinate with your to-do list.
- Automate your build process with your test process and with a developer check-in. Every time a developer checks in a change, build the code and run a set of unit tests against it. This improves code quality almost immediately.
- Build a script that can deploy a single set of changes, either one database or several, to a single environment. Targeting development or test environments is usually safe. Make sure it has a rollback mode.
- Create T-SQL scripts that check for issues such as non-indexed foreign keys, GUIDs in primary keys, tables without interfaces, and permissions without roles. Turn them into database design-validation tests, and make sure they run every time you check in a change.
You will never run out of tasks to automate and streamline. I know three brilliant DBAs who work two hours a day. How? Early on, they spend a month of weekends automating most of their jobs. If something fails three times, the same way, they automate it. They never let on how much of their job they automate. This gives them the time to work with developers, Project Managers and testers to make the databases more stable overall. The key is to make this "automation work" a part of your daily routine.
Switching to Agile development creates a lot of change, perhaps the most dramatic one being the shift from development work that is planned and deliberate, to intuitive and rapid. This can be perilous. The worst results are:
- Rampant technical debt – code bugs and design flaws pile up rapidly and go uncorrected because they compete with user features for developer time.
- Deployment time and code quality for each release don't improve despite a dramatic increase in the number of production releases. Every release causes a lot of pain.
- Business managers, delighted by the sudden potential for lots more user features, come to expect a feature delivery rate that is often unsustainable.
- Teams swap current processes and practices for new ones, with reckless abandon, and without considering their relative merits. Consequently, they encounter problems they previously avoided.
Blessed with logical engineering leaders, we can institute some important balance measures. For example, it's a wise practice to set aside 10–20% of development time to pay the "maintenance tax," in other words, reduce technical debt and fix operational bugs.
In addition, allow a fixed, reasonable amount of deployment time per week. Developers can deploy more frequently, in the longer term, if they are allowed time to streamline and clean up each individual deployment.
The key goal is to aim for a steady, sustainable pace. As soon as a Project Manager for an Agile team detects the build-up of technical debt, a large backlog of deployments, or a long list of security patches starting to pile up, then it is time to pay some maintenance tax, and push for a better balance. Ironically, creating a successful Agile team is a marathon, not a sprint.
One great idea that encourages balance is to reward good behavior, and introduce a mild penalty for bad behavior. I worked in a team where developers who broke the build had to bring in donuts for everyone. Conversely, the IT staff rewarded with beer any developers who made the build or deployment process significantly easier. It worked beautifully.
One big change with a switch to Agile is a dramatic decrease in documentation. This can be a benefit; thorough documentation is always out of date and rarely used. Without documentation, the code, including the tests, is the documentation. This, coupled with the rapid rate of change of the system design means that people, rather than documents, become the primary source of domain knowledge. In order for all of the team to keep up with this pace, and still understand the system and where it is heading, they must communicate all…of…the…time.
There are many different ways to communicate. Unless you are telepathic, the fastest way is speaking in person. Face-to-face communication is high bandwidth. The second most effective way to communicate is via phone. Spoken language is very efficient; we can convey about 150–200 words per minute, which is 2–4 times faster than we can type. Also, speaking lends itself to back-and-forth communication, which helps people ask questions at just the right time. I'd estimate speaking in person or over the phone is easily 5–10 times more efficient than instant messaging or email, meaning that we should be able to communicate in 10 minutes what would take close to 2 hours of email time.
One of the most common and effective spoken-word communication techniques is the daily scrum or stand-up. Having DBAs and developers at the same daily stand-up works wonders; developers learn about how the system is working, and DBAs learn about imminent code changes.
With this change comes opportunity. For DBAs, constant communication makes it easy to keep developers in the loop about production issues. It also makes it easy to ask for fixes to the most annoying issues of the day, and to get these fixes quickly. DBAs also get the opportunity to provide input into design choices that might affect database performance. I have worked in teams where DBAs contributed to all code reviews; the result was very stable code.
For developers, constant communication provides feedback about how well our code is doing, both good and bad, and allows us to fix production inefficiencies and bugs. This gradually helps us hone our craft; we learn very quickly what ideas and approaches work, and which ones don't. That means faster, cleaner, better code over time.
DDT – Design, Deployments and Tests
Now it's time to use DDT: Design, Deployments, and Tests. Why? DDT kills bugs. Most of a database is…code. Databases enable features, and have interfaces. The way we design databases has performance implications. We can deploy patterns in their design, we can refactor them; databases are code. We should treat them in the same way, and apply the same standards of quality to their design, deployment and testing.
Good design saves lives, usually your own. I use the SIIP approach to database design: Simplicity, Integrity, Interfaces, and Patterns.
Keep your design as simple as possible. There's a famous acronym for this: KISS (Keep It Simple, Stupid). Features being equal, a simpler design is easier to fix and extend than a complex design. You need to be twice as smart to troubleshoot a system as you do to build one. If you build the most complex system you can, you will not be able to fix it.
The most important objects in a database are its tables. They store the data and enable data integrity. The best way to ensure accurate data is to use a clean and simple data model. For an OLTP database, that means as much normalization as you can handle. Designing to anything lower than third normal form (3NF) usually causes painful headaches. Key-value tables are notoriously painful because they force the application to handle data integrity.
Some database objects are pure code. Stored procedures, views, functions, triggers, and metadata are pure code. They are similar to code objects in other programming languages. They should have the same standards of quality (unit tests, security considerations, versioning, and so on.
Data integrity matters in a database. Unlike C# or Java apps, databases store data, so data integrity is not optional.
The data integrity features of your database design are of prime importance. Start with data types, primary keys, foreign keys, and check constraints. Beyond that, there are data integrity implications in our choice of nullability, defaults, and the use of triggers. Triggers in particular are a double-edged sword; their hidden nature and support of complexity can lead to unexpected problems.
An interface is the (hopefully) simple contract between your system's guts and any application that calls it. Interfaces enable decoupling: the ability to separate what an application expects from its implementation. I have found that we need interfaces whenever:
- a database is used by multiple applications
- the database and application code change at different speeds.
Inside a database, interfaces are stored procedures, functions, and views. Other types of interfaces are Object Relational Mapping (ORM) and client-side tools. The most important, and most obvious, use of interfaces is to decouple a table's physical schema from the application using it. That way you can change the two independently. Having well-defined interfaces will save you a lot of pain.
Use patterns. Similar functionality should look the same. Database developers often use patterns to ensure a set of consistent naming conventions and data type standards. The goal is consistency: stored procedures, tables, column names, security practices, should be similar. Two stored procedures that do almost the same thing, for example, writing one record to a table, should look similar. For example, they should all use the same conventions and standards for parameter names, logging, join patterns, documentation, and so on.
Patterns are effective when they are widely used. Agile is a team sport, so the best way to adopt a set of patterns is through democratic practices like discussion and voting. The same goes for eliminating patterns that the team don't find useful.
Make sure you can enforce patterns automatically. Writing them down in a Word document does nothing at all.
I think I can, I think I can…
These ideas are not new, but they are particularly important in a fast development environment. If your code is changing constantly, your systems and processes will degrade naturally over time, unless you take specific steps to keep them clean and healthy as you progress; it's like entropy.
In an Agile environment, there is rarely the time for drawn-out discussions, so practitioners need to get into the habit of making the right design decision quickly, even intuitively. The best way to acquire this skill is through experience. I find that DBAs and developers with 6–12 months of experience, including production support, have the hard-won skill necessary to design intuitively. Weak or counter-intuitive designs lead to anarchy.
Automated validation tests can help guard against some aspects of poor database design. We can create validation tests that fail when a database fails a design check. Start out with the basics, and add additional design checks to clean up your system, over time. For example, we can create validation checks based on querying the system views to look for tables without primary keys, foreign keys without indexes, tables without abstraction views, or objects without permissions, and so on. If someone checks in code that fails one of these tests, they get an email automatically, warning them of their peril. Some tests are widely used already, such as those contained in the highly regarded sp_Blitz tool at (http://www.brentozar.com/blitz/).
If anyone can modify the database design without checks or limitations, then a good database design can turn into an ugly mess within weeks, and you will deeply upset your DBA in the process. Control carefully who has permissions to make database modifications. Ideally, all database changes will take place in Source Control, where they can be tested thoroughly and deployed through the approved process. If necessary, audit direct database modifications using a trigger (see, for example, the Avoiding Version Drift section of Alexander Karmanov's article at http://tinyurl.com/ma7u6a3).
Having a set of deployment best practices and tools can make it easy to test and release code, especially if you have multiple environments (development, test, production, and so on.)
The following simple equation can help the team estimate what time they need to allocate to manual database deployments:
Total Available DBA Time for Deployments =[Manual Time per Deployment] * [Average Risk per Deployment] * [Number of Deployments]
To increase the number of deployments, the team needs to automate and streamline the deployment process to reduce time and risk, or they need more DBAs! The only alternative, fewer manual deployments, is what happens in many "Waterfall" teams.
Database deployments are different from other application deployments in one critical way: databases contain data. It is relatively easy to upgrade a website because the ASP.NET or PHP pages don't have data in them, so we simply overwrite the entire site. With a database that is impossible, because dropping and re-creating a database drops all of the information, negating its raison d'être.
There are three tenets to database deployments: keep them robust, fast, and specific.
The first tenet of database deployments: make them robust. You should be able to rerun a set of scripts, regardless of whether someone ran them previously, and achieve the same result. As a simple example, instead of having a script that runs:
CREATE TABLE dbo.foo…
Our scripts should first check for the object's existence.
IF OBJECT_ID('dbo.foo') IS NULL
CREATE TABLE dbo.foo…
INSERT INTO DeploymentLog (Description) SELECT ('Created table foo')
Alternatively, we could code this such that, if the table does exist, the script fails immediately.
You should also record the actions of all previous deployments in a deployment log, ideally populated via DDL triggers. Having rerunnable and logged scripts is very useful. If your deployment fails at Step 4, you can restart it from Steps 1, 2, or 3 without worry.
Another pillar of robust deployments: always have a rollback mechanism to undo the changes made to your database during a deployment, and return the database schema and objects to their predeployment state.
To roll back a stored procedure change, redeploy the previous version. To roll back a new table, drop it. Instead of dropping a table during an upgrade, rename it out so that, if necessary, the rollback can rename it and put it back into place.
The second tenet of database deployments: make them fast. Agile teams deploy anywhere between once a week, to ten times a day, a pace unthinkable in a Waterfall environment. When dealing with a higher frequency of database deployments, it's important to make sure they don't disrupt users continuously with long (or, ideally, any) service outages.
For objects that contain no data, the key goal is to avoid user disruptions during deployment. A common way to change a stored procedure is to drop it, if it exists, then recreate it, as follows:
DROP PROCEDURE DoSomething
CREATE PROCEDURE DoSomething AS …<New Code>
The problem here is that if a user session calls the procedure in the period between dropping and creating it, then the user's operation will fail with an "object-not-found" error. This technique also removes any permissions associated with the stored procedure, so we need to reassign them to allow users to access it, once it's re-created.
A better approach is one where we don't drop the object. If the object doesn't exist we
CREATE it, if it does we
ALTER it. This technique also keeps the object permissions in place and we can use it for views, functions, triggers, i.e. any object that is procedural, or does not contain data.
IF OBJECT_ID('dbo.DoSomething') IS NULL
EXEC ('CREATE PROCEDURE AS BEGIN SELECT ''foo'' END');
ALTER PROCEDURE DoSomething AS …<Code>
The dynamic SQL approach can get messy for anything beyond a very simple stored procedure; the previously referenced article by Alexander Karmanov describes an elegant alternative approach using
IF OBJECT_ID('dbo.DoSomething') IS NOT NULL
SET NOEXEC ON
-- skipped, if object already exists
CREATE PROCEDURE DoSomething
PRINT 'DoSomething: Not implemented yet.'
-- always executed
SET NOEXEC OFF
ALTER PROCEDURE DoSomething AS …<Code>
For indexes, the choice is different. It is impossible to change an index definition using
INDEX. Instead, we must drop and re-create it. Another option for non-clustered indexes is to create a duplicate index, drop the old one, and rename the new index with the old index's name.
Dropping an index is a quick operation, but creating or altering (rebuilding) an index often requires significant processing time. If your SQL Server edition doesn't support online index operations, or if the specific operation must be offline (such as those on clustered indexes that contain Large Object (LOB) data types), then applications and users will be unable to access the table during the entire index operation. That's often unacceptable.
With online index creation, the indexes and underlying tables remain accessible but you can still expect to see an impact on performance during the operation, and users may experience blocking or deadlocking.
I'd recommend creating and modifying indexes when your application is idle. Commonly, teams perform these operations in the middle of the night. The exception is if a table isn't yet in use by applications, in which case there is minimal user impact.
The most difficult object to deploy is a table. This is especially true if applications query the table directly, without interfaces. In that situation, the table schema is tightly coupled to the applications themselves, and cannot be changed without coordinating the change among application teams. I would recommend implementing an interface first, to decouple the database design from the application design.
There are only a few ways to change a table: we can add or remove a column, or change its definition (modifying a data type, adding a constraint, and so on). Commonly, we might want to change a column name or data type. The former is a metadata-only operation and should have no performance impact on concurrent user operations. Changing a data type, however, can result in SQL Server checking every row, to make sure existing data does not violate the new data type.
Generally, making a column bigger is a metadata-only, quick change. Making a column smaller requires SQL Server to check the entire table for invalid data, resulting in a table scan, which can take considerable time on large tables.
A quick check to see if a change is metadata-only
IO before running the
ALTER. If you see no output, it's a metadata-only change.
Another common task is to remove an unneeded column. The simplest way to do this is to drop the column (assuming no foreign keys reference that column), but this does not allow for a rollback mechanism.
The key decision is whether to save off the data in the column we wish to drop. To roll back a column drop, first you need to save off all of the data.
INSERT INTO Save_Table
( PK ,
SELECT PK ,
Then if we need to undo the change, we simply re-create the column with
NULL values and then reload the column with the saved values.
The last common requirement is to add a column to a table and load it with data. If the table is small, we can add it in any way we like, since any blocking from populating the column is likely to be short-lived. For a large table, the way to add a column with minimal user impact is to add it as a
NULLable column. If you need to add values into the new column, for existing rows, do so in batches and then set the column to be
NULL, if that's required. This minimizes blocking, although it does require several steps.
Similar to indexes, it is best to change tables when your application is idle. However, by following practices such as these you can minimize downtime and disruption. For further reading on how to refactor tables without downtime, I recommend Alex Kuznetsov's article at http://tinyurl.com/kl8a3dd.
The third tenet of deployments: keep them specific. You should deploy all of the code you have changed, and only the code you have changed. Deployment verification queries are a great help here. For example, if you're deploying a new view, a verification query might look like this:
SELECT CASE WHEN OBJECT_ID('dbo.fooView') IS NOT NULL
THEN 'View dbo.fooView created'
ELSE 'View dbo.fooView missing'
It should be easy for anyone to identify exactly what is, and what isn't, part of a deployment.
Keep releases decoupled, so that you can deploy them independently of other deployments, such as application deployments. For example, deploy a
TABLE script before your application needs to use it. Just in case, make sure your application fails gracefully if the table isn't there. That way, we can deploy either application or database on their own, without any dependencies. That is also an example of forward compatibility.
We should also attach a version number to each database. For example, if we know that a test database is running version 4.1.120, and the production database is running version 4.1.118, we can find the codebase definitions for each version. In addition, we can identify quickly which changes we need to deploy in order to advance the production database to the current version.
Having a good folder-diff tool is a huge benefit. Database code is text, after all. Comparing text files is simple, with the right tool. This is an area in which we can benefit from application developers' solutions, who have had decades to solve the same challenges. Of course, keeping your database code in a source control system is even better, giving access to features such as change history, comments, branching and merging, code review tools and so on.
Database testing is the third component of DDT. Having a solid set of tests and testing tools increases code quality. Tests ensure that our code works, and that we aren't breaking existing functionality. They give us confidence, and reduce risk.
Focus the majority of your testing efforts on the most critical system components. For databases, I've found that the following factors predict quite accurately how critical a particular piece of database code is:
- it has lots of business logic
- it impacts one of the 20 biggest tables in a database
- it impacts one of the 20 most-commonly-queried tables in a database
- it impacts tables/databases that are used by several different applications
- it changes permissions, or requires odd permissions
- it deletes data from a database via DDL (for example, dropping a column or a table)
- it uses an uncommon piece of the database engine (
xp_cmdshell, linked servers, service broker, SQL CLR, log shipping).
If we have 100 tests but we only run 20 of them, those 20 are the valuable ones. Tests that aren't run effectively don't exist. Getting developers and testers to use the same set of tests and testing tools is the most important step. Do that, and then incrementally add to your tests.
Test-driven development (TDD) is a very common Agile technique that helps you to make sure you write the tests you need by writing them before you write or change code. You know your code works when all of the tests pass. Writing tests first forces you to think about different ways your code can fail.
There are three important categories of tests: unit, integration, and performance.
Put simply, unit tests test a single piece of code in isolation, and make sure it works. For example, having a unit test for a stored procedure makes a lot of sense, especially if the stored procedure has complicated logic. In our team, we have a build-and-test machine that automatically runs unit tests whenever a developer checks in code. It checks our source control system for code changes, rebuilds the entire database schema from the code base and runs a folder full of unit tests against the new database. We catch any test failures instantly, and can notify the developer(s) responsible. This is a continuous build system.
While unit tests will check just a particular database object, integration tests will make sure that a change doesn't "break" some dependent database object. In other words, integration tests check code changes to make sure they don't break other parts of a system. For example, if we make a change to the database, we should run integration tests to ensure we didn't break any applications or services that use the database. I have found them to be particularly important when working on a database without enough interfaces, or when changing an interface.
The key point with integration tests is to have a good set of cross-application tests running, to make sure that you're simulating system behavior. The emphasis of the test is on integration points; each test creates/modifies data in every application in the system architecture. For an OLTP system, that can mean calling the UI, the business middle-tier, and the database tier.
In our team, we have integration tests for each business feature and we usually try to run them every 1–3 days. When an integration test fails, we email developers on all teams who have made check-ins since the last successful test. Of course, the ideal might be to run them every time we make any change, with every team member getting immediate feedback, before checking in the change.
Performance tests verify that the system meets the identified performance criteria, in terms of query response times under various workloads, and so on. To put it another way, you run a piece of code to see if it is fast enough, and if it scales well under load.
In my experience, development teams don't run database performance tests as often as they should, mainly because running them can require a near-copy of production hardware and they don't have the IT resources or budget to do that.
I have found that the best systems to run performance tests on are restored copies of production databases. It's extremely rare to see test databases with enough data volume and variety comparable to a production environment. A simple PowerShell script can run a set of scripts against a test database and capture the runtime of each query. The best queries to run are copies of the most common ones on the system and the most critical intermittent query, like a monthly billing query.
Our team usually tries to run performance tests at least once per release, especially for changes to large or critical systems. When doing performance testing, define fast enough and scales enough before you test. Be realistic about your goals.
Agile works very well if you do the right things. It also breaks down quickly if you do the wrong things too often. The key is to know when something is working, and to improve on it. Good judgment, smart people, lots of communication, and a level head are what you really need. With that, you can make Agile work for you, in your own way.
Keep learning, and keep improving. Good luck!
This was originally published as a chapter from the book 'Tribal SQL' which is a reflection of how a DBA’s core and long-standing responsibilities sit alongside new thinking and fresh ideas about where the DBA role is going, and what it means to be a DBA in today’s businesses.
Currently available on Amazon, in paperback and Kindle versions.- all royalties go to Computers 4 Africa