In the article, What’s a Database Pipeline, Steve Jones describes a database pipeline as “the set of databases across your delivery environments, through which your code flows.”
It’s described as a pipeline because changes to a database usually flow from left to right through each database environment. We make changes to the database in the Development (Dev) environment, such as altering a stored procedure or adding a new column to a table, save the changes in source control, and then deploy these changes through subsequent environments in the pipeline, such as Build, Test, and Staging, and ultimately through to Production (Prod). This can be envisaged as a pipeline like this:
The idea of a pipeline is very appropriate, but can cause problems if each of the environments is viewed by the team as a reservoir, where changes are accumulated over time, and the ‘valve’ opens only occasionally, and floods the parched downstream environment. Instead, we want a steady and continuous flow of changes through environments, so that we are constantly delivering value to the users.
How do we achieve this? We need to start thinking earlier about database changes and the impact they’ll have. We need to ‘shift left’ the database build, test, integration and release processes, performing them as early as possible in the application development cycle, so that we can deliver database changes to each subsequent environment, frequently and reliably.
This article considers how we can use ‘shift left’ for databases in areas such as build, integration, testing, and deployment, to help ensure early, frequent and successful release of database changes.
The advantages of ‘shift left’
Many teams find it difficult to shift left. Why? It is because the delivery process of a database relies on a range of knowledge and skills beyond development, and if these aren’t brought to bear on the process at the right time, what should be a steady flow of improvements gets ‘blocked up’ into an increasingly complex release that can swamp the capacity of those people who have to deal with the release.
We can ‘shift left’ if we can harness this body of knowledge and skills as early on in development as possible, to gain insights into high-availability, support, monitoring, compliance, data reconciliation, hardware requirements and network implications, and encourage early guidance and feedback from other IT activities with responsibility for the application. It gives developers a better picture of the sort of concerns that can cause delays in the delivery process. It gives the other experts with responsibility for the delivery process a much earlier idea of what is planned and why.
This is a ‘DevOps’ style of database development, release and deployment, and requires a cooperative approach between these teams. The idea is that if they work together, and share expertise, we can uncover mistakes early, correct them before the development effort takes a serious ‘wrong turn’, and therefore avoid ‘deployment blockers’ being discovered only days or hours before a scheduled deployment.
What processes need to be shifted left?
Too often database changes are integrated late in the application development lifecycle and the database build, integration, and deployment processes are right-sided, i.e. they are devised and implemented “on the fly”, close to the time when the changes need to be delivered to users, often without thorough testing. The DBAs and system administrators are often left to deal with and repair the inevitable bugs, security vulnerabilities, and performance-killing queries. The delays in deployment, and subsequent disruption that the bugs and performance issues cause, can cost organizations a lot of time and money.
In order to avoid this situation, we need to start deploying database changes more frequently, and this means that we need to implement our database mechanisms early, effectively shifting them left on the project timeline:
- Database Build – the team should be able to reconstruct any version of a database from scratch using the scripts in version control
- Database Integration – having established a working version of the database very early in the development cycle, the team continue to verify regularly that it remains in a working state by integrating and testing all changes
- Database Testing – unit testing, as well as integration testing, performance testing and so on will give the team confidence that what they are building will work and perform reliably without causing any regressions
- Database Release – a reliable, repeatable, and auditable process is required to deploy changes to production and practice on a production-like environment will ensure early feedback on the deployment scripts and any potential security, configuration or other operational issues.
If we can start to shift these processes over to the left then we can identify and fix problems quickly, as well as test and refine our build and deployment processes as early as possible in the timeline. All of this will help to minimize nasty surprises when releasing to Production.
For many organizations, Shifting-left will be a long term process, but it can be tackled in small, manageable steps, which we’ll discuss now.
Getting DBAs involved earlier
A key concept in DevOps is breaking down barriers between roles and working together towards a common goal to deliver valuable, reliable software to users quickly.
So, ask DBAs to help with database changes when they are happening in Dev (the left hand side). DBAs can make sure schema changes are designed correctly from the beginning. They can get involved in complex queries to make sure they’re optimized and will work well on larger, more realistic data sets. They can think about what indexes may be needed. They can see the bigger picture and know what other systems may be impacted by the change. They can start to think about servers, network capacity, backups and high availability. They’ll also consider any further downstream ETL processes that may be impacted or any reports that may need to be updated. They will have a lot of expertise with security, encryption and access control and will know about standards and policies. They will also know where to find expertise.
Also, this allows DBAs to understand what changes are being made to the database. This is extremely helpful when they are the first ones called when there’s a problem with Production. If you’re still doing manual deployments that involve throwing a script over to the DBA, I’m sure they’d love to know further in advance about what changes will be coming.
Of course, sometimes this is harder than it sounds. You might not have a DBA dedicated to your team. Also, even if you notify the production DBA of database changes as early as possible, there is no guarantee that they will have time to look at them; they are probably very busy administering the production servers, tuning queries, dealing with disk space or security issues, creating reports, applying patches, upgrading SQL Server to later versions, or a hundred other things.
However, over the longer term view is that as database build, integration, and deploy processes become automated, tested and reliable, and are continuously refined and improved based on feedback, so the deployments to controlled environments such as staging and production improve in quality, causing fewer subsequent production issues and less time firefighting.
Version control your database
Introducing database version control is the first step on the “left shift” journey. The Version Control System (VCS) becomes a single source of truth for any database version, often referred to as the canonical source. If your database is not in version control, then it will be very difficult to perform any subsequent build, integration, test, and deployment processes frequently or in a repeatable and automated fashion. As a consequence, they will tend to “right shift” in the time line.
Without database version control, a lot of teams rely on production database backups to “get the latest version” of a database, or to revert an environment to a previous database version, from an older backup, if needed. Restoring production backups can be time-consuming for big databases. If the database contains sensitive data, the DBA may need to remove or sanitize it, requiring more time and effort. Also, what happens if database development takes a wrong turn and the team need to revert to a previous development database version? Are the development databases being backed up? If so, how frequently? It’s very likely that development changes could be lost since they are overwritten directly in the database and there’s no way to undo a change or go back to a previous version if the database is not in version control. The team might be able to get back to roughly the right version from an old production backup, and then reapply any changes that they did need to keep, but the whole process can become very cumbersome and again has the problem of dealing with production data.
By contrast, with the database under version control, the development team can reconstruct any version of a database, and migrate from one version to another, just from the source-controlled database object scripts. In this way, version control gives the team a safety net. They are less afraid to make database changes, because they know they can revert and undo a mistake easily. This frees the team up to move quicker. It also makes it much easier for the team to track these changes back to specific requirements, user stories, or bugs.
Everything related to the application project should be in version control; the code, configurations, documentation, infrastructure, and of course the database. For the database, teams need to version control database creation scripts, configuration scripts, schema objects (tables, stored procedures, views, and so on), reference data (a.k.a. static or lookup data), and migration scripts for deployments. In short, everything needed to recreate a working version of the database.
Having achieved this, every database change can be seen and understood. This makes it easier and quicker for busy DBAs to understand and review database changes, to give them more confidence in the changes when it comes time to release to Production, and to reduce the amount of time they spend trying to supply development and test environments with a new version of a database.
A good working set-up for many development teams is for each developer to work on their own “sandbox database”, testing and committing database changes to a local source control repository. This also creates an additional step early on where developers will need to deploy their changes from their sandbox to an integration or test server, which tests a deployment of their changes as early as possible.
Shift left to include your database in the Build process
Without the database in version control, it’s actually rather hard to ‘build’ a database from scratch, at the required version. What tends to happen instead is that the development team try to come up with a list of just the database changes that are needed to support the new version, generate a set of change scripts, and run them against the database in the test environment. Inevitably scripts will be missed, adapted on the fly to fix issues, and so on. It’s messy and uncontrolled and makes repeatable builds hard to achieve.
Working from the VCS, however, the build process can create, from the canonical source, a working database at any particular version. It will create a new database, create its objects, and load any reference data, checking that the SQL syntax is valid and dependencies are in order.
If this is a manual process, then it can still be painful and prone to many of the same problems as ad-hoc builds, such as forgetting a particular configuration change, or missing a change script, or accidentally connecting to the wrong server.
As part of a shift left, the team needs to start perfecting the art of the database build as early as possible in the pipeline, as it in turn enables early integration and testing, so that you can find out about problems and fix them easier long before those changes hit production. In order to achieve this, the build process must be automated. Once the automated build for a given database version is tested and verified, we should deploy the package created as a result of the build, which is a known database version, to the other environments, in order to deploy that particular database version across the environments in a repeatable manner. In order to achieve this, any security configuration and database file configuration details, which could differ from one environment to the next, need to be abstracted into configuration files.
Shift left to include your database in Continuous Integration processes
As soon as a developer commits application changes to version control, typically a Continuous Integration (CI) server will build it and run automated unit tests to see if anything is broken. This gives teams a fast feedback loop so problems are found quickly and can be fixed more easily. If the integration fails, then what they changed and why is fresh in the developer’s mind and they can fix it more easily, or revert the change, if needed. Even more important is that the release of the changes through the pipeline stops. Since the CI build failed, this release never got deployed to Production and negatively impacted your users.
Similarly, we need to achieve a fast feedback loop as database changes are made to find any problems, and to prevent breaking database changes from ever reaching production, where they can cause problems for our users or for the production data, or both. In short, we need the database to be included in the CI process.
Developers can run database unit tests locally on their changes. If the tests pass, they commit the change to the VCS, and add their “commit tests” to the broader suite of tests for the application so that the database changes and database unit tests are also included as part of the CI process
Shift left to include your database in automated testing
In addition to automated database unit tests ran during CI, integration tests can also be automated. If the database is built and deployed on each change, assuming the unit tests pass, then automated integration tests can run across both the application and the database, ensuring that the tests are working against the latest version of each. Without this, integration tests may mock out the database, which isn’t testing the entire system, leaving this to be done at a later stage (further to the right).
We can think about performance testing or load testing sooner too, especially in a Greenfield project. Of course, to do this, you also need to “shift left” the mechanism you’ll use to generate the required load of realistic test data. For upgrades to existing databases, you might be able to use a copy of the production data, but in cases where that’s not possible, or for new databases, you’ll need other tools. You might use a data generation tool, for example, or simply import a standard data set using a tool such as BCP. Of course, to test the performance of a new database, you’ll need some way to estimate how much data you’re likely to collect in a set period. This means you can test against that sort of volume of data and make sure performance isn’t going to deteriorate. Again, finding the solution to such problems sooner rather than later is all part of the shift-left mindset.
Shift left to practice database deployments before deploying to Production
Practice makes perfect. We want to practice our deployments as much as possible and have a repeatable, reliable process to reduce the risk when deploying to production, so we don’t cause any costly downtime or upset our users. By deploying as early as possible to a “production like” environment, we can perform early acceptance testing, and verify that security, capacity, availability and performance characteristics of the production system are acceptable to the business or customer.
The first step is to introduce a temporary staging database into the pipeline, or ideally a staging environment that is configured as closely as possible to mimic the production environment. This staging environment must be under the same security regime as production.
This way, we are testing our deployment script before it runs on Production, which gives us more confidence that the Production deployment will go smoothly. We need to create a database in Staging that is as close as possible to Production, and validate it. This could be done by restoring the latest Production backup on another server or doing some kind of database copy. We then apply the exact deployment script to the Staging database that will subsequently be applied to the Production database.
Once we’ve tested the script and we’re happy with it, we need to make sure there hasn’t been any schema changes to Production since we ran the deployment on Staging. This is sometimes referred to as database drift. If our production database has drifted, then our Staging deployment as a “real” test of the Production deployment is no longer valid and we should repeat the Staging deployment dry run.
You can try to prevent schema changes happening directly on Production by limiting permissions and having a strong process in place that makes it easy for schema changes to flow through the pipeline. That means it’s just as easy to put the fix in Development and have it flow through and update Production safely and quickly, which could reduce the urge to jump straight on Production and fix it. There may be times when this is necessary, but best to avoid this if possible.
This may seem like a lot, but you don’t have to make all these changes right away. First, start encouraging developers and DBAs to collaborate earlier so database changes can be done right the first time and DBAs aren’t surprised by last minute production deployments.
Next, get your database in version control so developers and DBAs can understand the database changes that are happening. Version controlling your database is also the first step if you want to start doing CI, automated testing, and/or automated deployments.
Finally, setting up a Staging environment to test deployments in a production-like environment before going to production is extremely important to reduce risk and protect your Production data and your users from any problems.