Deployability is now a first class concern for databases, so why isn’t it as easy as it should be? Matthew Skelton explores seven of the most common challenges which will bring your database deployments to their knees, and the steps you can take to avoid them.
Deployability is now a first-class concern for databases, and there are several technical choices (conscious and accidental) which band together to block the deployability of databases. Can we improve database deployability and enable true Continuous Delivery for our software systems? Of course we can, but first we have to see the problems.
Until recently, the way in which software components (including databases) were deployed was not a primary consideration for most teams, but the rise of automated, programmable infrastructure, and powerful practices such as Continuous Delivery, has changed that. The ability to deploy rapidly, reliably, regularly, and repeatedly is now crucial for every aspect of our software.
We’re going to look at some of the most common reasons why databases often tend to become increasingly less deployable over time, some suggested remedies for how we can improve deployability, along with some clear practices and perhaps a little re-thinking around the way that databases are used and evolved. I’m not about to pretend that there aren’t team structures, responsibilities, and communication patterns that affect database deployability, but that’s a discussion for another day. For now, let’s focus on the most common technical blockers to database deployment and how to fix them.
Deployability and Continuous Delivery
The value of Continuous Delivery – regular, rapid, reliable and controlled delivery of working software systems into Production – is becoming increasingly apparent. Not only do your users see results and new features more quickly (thus making for a happier management team), but the shorter feedback cycle vastly increases your ability to learn about and improve the software system as a whole, reducing bug-fixing and 'firefighting'. Everyone wins!
However, in order to sustain this high rate of controlled change, the component parts must be easy and quick to deploy and verify, and the simplicity and speed of deployment (and post-deployment verification) are measures of their “deployability”. In short, it’s now essential to be able to deploy database schemas, objects, code, procedures, reference data, and a representative sample of dynamic data as easily as deploying an OS package or a web application.
Thankfully, the tools and techniques for achieving this are already largely known, although in some cases we may need to change how we use our databases in order to take full advantage of them.
What prevents deployability for databases?
Several 'forces' act against databases to make them increasingly undeployable over time: the complexity of database-internal logic; complexity of cross-application dependencies; sheer size (data volume); risk (or more accurately, perception of risk); and Production-specific tooling and configuration. None of these is necessarily a critical issue in itself, and most will be recognised by experienced DBAs. However, taken together, they will bring your deployment to its knees.
As a first step, we need to acknowledge that many databases – particularly those ingesting and supplying transactional data from live applications – need to be treated somewhat differently in Continuous Delivery from most other parts of the application and infrastructure estate, and this is partly why their deployability blockers might not be obvious at first.
Many of these problems aren’t at all unique to databases, but if you’re a DBA you’ll almost certainly recognise and have worked around them individually for separate reasons. However, even though these problems may seem disconnected, taken in combination, they are sure-fire ways of making your databases painful to deploy: the whole is greater than the sum of the parts.
By the same token, the 'remedies' described here may well be practices you’ve put into place before but, taken together, they’ll form the backbone of a painless database release process. Of course, none of these changes or steps are “free”, because refactoring is hard (as is explaining to the Business Intelligence team that they can’t have access to the production database anymore). If you want to optimise for database deployability – and you should! – then here are some things to tackle right away.
1. Flow of Production-originated data vs. everything else
Let’s start with a problem that actually IS fairly special to databases. Our applications, middleware, infrastructure, database schemas, reference data, and even cloud hosting 'fabric' can and should now all be defined as code, and flowed down a deployment pipeline from development towards Production, with significant automated (and possibly some manual) testing on the way. In effect, like salmon swimming upstream, Production-originated data needs to go 'against the flow' of changes and user requirements, which all flow 'downstream'. However, too often this pattern of 'Production first' is also used for changes to schemas, reference data, and database logic, leading to 'drift' between the Production database and versions used by development and test teams. As you can guess (and are probably aware from painful experience), that’s going to block your deployments.
Remedy: Minimize changes in Production
One way we can improve the deployability of databases is to separate the database changes that should flow downstream (with other code changes) from the data which cannot. Ideally, the only database changes that should be made 'in Production' are changes to live data, and reference data should change in development and flow down. The addition of new indexes and other performance tuning should be made in a pre-Live environment using recent live data and traffic replay techniques.
By reducing the 'specialness' of Production down to its live data, we can increase the amount of testing possible in upstream environments. This will increase confidence in database changes, encouraging a higher rate of change, and in turn a reduced fear of database deployment (not to mention reducing surprise fire-fighting). This also improves the team’s ability to track changes and roll-back changes that don’t go according to plan.
2. Accidental size and complexity
I’ve seen several almost identical cases where successful organisations, operating with an increasing online presence since around 1999, have reached a point in 2014 where their core database effectively dictates the rate of change of their systems. These organisations all have (or had) a large, central database that started out small and gradually grew to become highly unwieldy, apparently risky to change, and difficult to deploy.
The database was seen as the 'single source of truth' and allowed to grow, not only in terms of the accretion of live records, indexes and views, but also in terms of the applications that depended upon it directly. Given that they’re all now at varying stages of adopting Continuous Delivery, this presents them with a serious challenge!
Early on, the presence of all data in the same place was great for rapid development, but over time the complexity of the data required specialist skills to maintain, and changes became more and more painful and risky as multiple teams vied to make changes to suit their application's view of the business domain. This typically leads to highly expensive or complicated database technology in order to manage the database, perhaps available only in Production/Live (and I’ll come to that later), and other systems in the business suffer as budgets and resources are suddenly decimated. Sound familiar?
Of course, some of this is irreducible once your organisation or application reaches a certain scale, but often it stems from opaque legacy processes, organisational red-tape, or just inexperience. In any case, the additional complexity makes it difficult for individuals or teams to understand the database, and so – particularly when diagnosing a failed deployment – makes database deployment difficult.
Remedy: Reduce accidental complexity
In short, the smaller and less complex an individual database, the easier it becomes to deploy. As a general rule, I’d argue that it’s always preferable to reduce the internal complexity of any system, especially any unnecessary complexity, down to just what is truly irreducible. Complexity in systems generally scales geometrically with number of interrelationships, not linearly – so keeping the number of interrelationships fairly low is a useful engineering maxim if we want to build human-comprehensible systems.
Moving non-core data and logic out of the main database and into separate databases helps to make the core database less complex, and more amenable to human understanding. This both reduces fear around database changes and reduces the number of possible failure states for database changes! Reducing the size of databases by taking out accidental data then has the double benefit of reducing the hardware requirements to run the databases (potentially even allowing a technology change), and allowing more rapid and frequent upstream testing.
But how do you decide where to split the data? I’m glad you asked…
3. Lack of data warehousing and archiving
Sheer data size can also work against deployability. Large backups take a long time to run, transmit, and restore, preventing effective use of the database in upstream testing. Moving from eight hours for a backup/transmit/restore cycle to even 30 minutes makes a huge difference in a Continuous Delivery context, particularly when you include a few retries in the process. In some of the post-Dotcom systems I mentioned above, around 70-80% of the data held in the core transactional database was historical data that was rarely requested or used, but was preventing the rapid export/restore of data that would have led to effective testing.
Remedy: Archive, distinguish, and split data
This is one of the issues which is, at least at first glance, more pertinent to databases than many we’ve discussed (certainly more painful in many cases.) Applications (and developers) need to be more 'patient' when working with data; we should not expect to have all data readily to hand, but instead appreciate that data should, in fact, be archived. Working in a more asynchronous, request-and-callback fashion has the benefit of allowing smaller 'Live' databases, making restoring and testing on real data easier and quicker. By archiving rarely-requested data, we can drastically reduce the size of our 'data space' to something that more accurately reflects user demand. Of course, applications then need to be updated to understand that 'live' data is held only for (say) 9 or 12 months (depending on business need), and so to treat 'live' and historical data as two discrete sets.
In practice this might mean performing an asynchronous, lazy load for historical data (archived to secondary or tertiary location and rehydrated for the user when requested), but a synchronous, greedy load for 'live' data. This live vs. historical approach also matches human expectations, where 'retrieving from an archive' is naturally understood to take longer, so your users shouldn’t be up in arms.
Of course, we still need to distinguish between data that correctly belongs in a large central database, and data that has been added out of laziness or mere convenience, and store them separately. Once we’ve managed to split data along sensible partitions, we can also begin to practice polyglot persistence and use databases to their strengths: a relational database where the data is relational, a graph database where the data is highly connected, a document database where the data is more document-like, and so on.
Fortunately, there are tried and tested patterns for building software systems that do not have single central data stores: Event Sourcing, CQRS, and message-queue based publish/subscribe asynchronous designs work well and are highly scalable. The 'microservices' approach is another way to improve deployability, and is deep enough to warrant its own article later on.
These approaches are not easy, but the requirement for rapid and reliable changes (for which Continuous Delivery is a good starting point) already makes the problem domain more tricky, so we should expect an increase in the required skills.
4. Poor naming of system components
On a related note, the way in which subsystems or components are named can lead to that system unnecessarily aggregating all kinds of functionality and data.
I once did some work for an organisation which wanted some help with moving older systems to Continuous Delivery, but all the systems had names like (let's say) SPRUCE, REDWOOD, or CHESTNUT. The names were entirely opaque, giving no clues to their purpose, which had resulted in each new piece of functionality simply being bundled into an arbitrary monolithic subsystem! The tendency to 'lump' data into a poorly-named (and so poorly-defined) databases is depressingly common.
You can probably imagine the conversation:
"Where should this Customer data go?" "Uh, just put it in SPRUCE, I guess".
Whereas the answer should have been: "In the Customer service, of course!"
Now of course this isn’t unique to databases, but opaque names in any part of a system are bad for software. Well-named components tend to retain a good degree of coherence and focus, whereas their poorly-named cousins attract the unnecessary data and complexity we’ve just described.
Remedy: Name things transparently
This is a really simple fix, but we can dramatically improve deployability by using good names for databases and other software components, because opaque names tend to lead to arbitrary 'clumping' and coupling of features and data. Instead, use transparent names that describe the purpose of the subsystem or database, so that it is clear whether data should belong in that system.
If you want a little guidance in this area, I recommend "Uncle Bob" Martin's classic book Clean Code, where Tim Ottinger lays out some ground rules for effective naming of things in software systems, including: 'use intention-revealing names'; 'avoid mental mapping'; 'use solution domain names'; 'don't be cute'; and so on. Steve McConnell uses similar advice in his excellent book Code Complete.
5. Opportunistic Business Intelligence data
While we’re at it, having a single, large, data-rich database sometimes results in the database being co-opted for Business Intelligence (BI) purposes: simply provide a BI person with a read-only database login, and let them write SQL directly against the database, right? This ‘opportunistic BI’ pattern naturally leads to strong resistance to database change from the BI team; their SQL queries are often hand-crafted and brittle, and database changes tend to break them. This direct use of application & integration databases for BI actively pushes against more frequent changes and deployments by arguing for stability-at-all-costs (albeit in a particularly narrow domain). BI folk also tend to have the ear of the finance department, which usually overrules IT requests for change. The result: a tendency to avoid changing and deploying the database.
Remedy: Source Business Intelligence from a data warehouse
We can reduce the friction around database changes and deployments by explicitly and consciously servicing the BI team. We should not just get BI 'for free' by aggregating data in the same transactional database, but design a proper system that provides the necessary data from multiple data sources, aggregated into a data warehouse, and accessed via a specific datamart. If there is a need for BI, address that properly, not just as a happy result of the data being conveniently in the same place.
6. Optimisation for data administration
Another force working against database deployability (and related to the BI problem above) can be the desire of the DBA team to make the database as easy as possible to administer, especially if the database also has significant accidental complexity! For instance, the need to keep track of data for audit purposes is very real, and data protection and privacy regulation is increasing year by year. However, optimizing for the ease of audit at the database level by, for example, aggregating data in the same transactional database is an example of a local optimization that works against deployability. If you’re serious about making the deployability of your database a first-class concern, you might have to make some compromises.
Remedy: Value more highly the need for change
To avoid optimizing for data administration within a single database or database technology, we need to move to a more distributed view of authoritative/definitive data. By that I mean that we need never lose the sense of a 'single source of truth' for data, but the existence of read-only copies in multiple locations isn’t seen as a problem, and we might source different kinds of data from different locations and database technologies. These changes help to reduce the size and complexity of individual databases, and reduce the perceived risk of changes in each specific case, leading to easier database deployments. Yes, there is an increased overhead in management and administration, but with careful planning on good team communications, that increase can be minimised.
7. Production-only technology and configurations
Special Production-only database technologies, configurations (e.g. linked server mappings), and routines all work against database deployability. Even if the volumes of Production data cannot feasibly be re-created in upstream environments, the database technologies, configurations, and features should be – at the very least in a Pre-Production environment. I’d even go so far as to say that if the cost of licenses for upstream environments would be prohibitive, then you should seriously consider an alternative technology because the hidden costs of a 'unique' database set-up in Production are huge.
Some of this might stem from one of sheer data volume problems we’ve already discussed, which could be fixed in a few different ways, but it might also be down to resourcing, organisational politics, or some other set of priorities that don’t rate ‘the ability to release software’ as highly as they should.
One organisation which I recently worked with moved to Oracle Exadata for its live transactional database. The improvement in query speeds was astonishing, and the built-in diagnostic tooling very useful. However, the Exadata machines were for Production and Pre-Production only (due to cost) and, because Exadata worked so differently from the versions of Oracle in upstream environments, it was difficult to predict the query behaviour before deployment to Pre-Prod, leading to uncertainty about performance. Furthermore, the 'developer' version of the database, Oracle XE, only allowed a single schema, significantly hampering useful automated testing in deployment pipelines. The presence of the high-speed Exadata machine also meant that requests from developers and testers to improve the database to enable better development and testing tended to be ignored because "Exadata will handle it fine in Production", leading to ever-slower integration tests and development speed!
If you can’t test or develop every element of your software – including the database – in an environment that simulates Production, then how can you possibly have confidence that your software will deploy, let alone work as expected? Production-only setups work against database deployability by ignoring upstream development and testing requirements, meaning that high rates of failed deployments and buggy releases are effectively guaranteed.
Remedy: Avoid Production-only tooling and config where possible
We should aim to keep the live transactional databases fairly small and simple, moving rarely-requested data to a warehouse, and allow the transactional database to run on less specialised technology. Using the same technology and configuration across environments reduces uncertainty and fear within teams, aids collaboration and diagnosis, and even helps to keep data size and complexity low, all of which help to make the databases as deployable as possible.
For large data warehouses, expensive database technology is often needed purely for performance reasons. However, for live transactional databases that need rapid and reliable changes and deployments, it is usually better overall to use technologies that can be present in all environments from the developers’ laptops, through Test and Pre-Production, all the way to the Production racks.
None of the issues explored here would, by themselves, necessarily prevent rapid and reliable database deployments, but taken together these issues present significant blockers to database deployability. At the same time, none of these issues are necessarily new or unique to databases, but hopefully by considering them as a related set you can see how significant their combined impact can be. To address them individually perhaps doesn’t seem too challenging, but to tackle deployability requires close, effective collaboration between developers, DBAs, and operations teams to achieve the right balance between rapid deployment and access to data.