Simple-Talk Cribsheets: For things you need to know rather than the things you want to know
As part of our long-running Cribsheet series, we asked William to write a guide to deployment that described in general terms what is involved in the deployment of a database application, and the sort of issues that one is likely to come up against.
What is 'Database Deployment'
The term 'deployment', especially when applied to databases, can mean one of several alternative activities, and this ambiguity can cause misunderstandings and friction between teams unless production IT staff are brought into discussions early in the development process and given a clear brief. Deployment could indicate one of several different activities:
- Shipping a new database that is closely integrated with a single application
- Releasing into 'production' a new version of a database, over the top of the old one, in synch with a revised version of a close-coupled application
- Placing into 'production' a new version of a database, over the top of the old one, which is accessed by several applications via a defined interface
- Releasing an update to any combination of the components listed above, whilst preserving the 'business data'
- Rolling back changes to a previous version, whilst preserving the 'business data'
- Upgrades and roll-backs for a production database where the SLA demands zero downtime
The target of a deployment for larger development projects could be a test server, integration server, development server, or staging server. We can deploy less important applications straight to production, having completed integration and release testing.
The number and scale of the various procedures that are required for a release will vary dramatically according to the scale and importance of the database to the organization. There is a huge difference between a simple utility and an enterprise-wide trading system but some procedures are common to all deployments. You always have to perform a final pre-deployment integration-check, for example. The most important disciplines for successful deployment is the management of the source components in their various versions in a source control system, and a reliable way of maintaining the version of all builds of the database.
Build is a term that is more appropriately used for application code than a database. It refers to the process of creating the application fresh from the code to a particular version by compiling and linking the source, and integrating component libraries.
It requires an analogous, but quite different process to build a database. For a start, there are more components, and a database has to preserve the ‘state’ of its data. All these components of the operational system, which should exist as scripts, must be used in the correct version in order to build a database. These components will include:
- The schema of the database – tables, views, routines, constraints, indexes and so on.
- The 'business' data – the result of trading or whatever the database is used for
- The static or reference data
- The database configuration – change a setting, such as the collation or the default way that transactions work, and unpleasant things happen
- The roles, users and security
- The scheduled tasks – such as daily reporting or scheduled integrity checks
- The physical layout of the database files – primary filegroup, growth-increment, size-limit and so on
- Any additional libraries or programs that are necessary for the functioning of the database
Some components are the same for all types of installation whereas others, such as permissions and the physical layout of database files, will depend on the setting.
A change in a database schema from one version to another can usually be managed easily without any ‘explicit migration’, or special scripted steps, being required for adapting the data so it can be preserved in the new schema. A table alteration is generally benign, and the addition of a column is not going to result in data loss. However, deletion of a column will. A synchronization tool will usually infer how to preserve data and thereby avoid the need for a manual script to perform the migration of the data. It will perform an implicit migration of data into the updated schema
Where there is an obvious single solution to preserving data, a synchronization tool can do any necessary implicit migration of data as part of the synchronization process. There are, however, certain schema changes that leave too many questions and so require an explicit migration script. These are:
- When you add a column to an existing table what values should go into that column for existing data if you haven't specified a default constraint?
- If you split a column into two or more, or move one or more columns into a different table, where does the existing data go?
- If you rename a column, how does the DIFF database tool know it is the same column, just with a different name?
- What happens to existing data if you subsequently apply a constraint that renders it invalid?
- If you change the data type of a column, how is the existing data converted? Do you need to specify an explicit conversion routine?
- What if a redesign of the database schema requires a modification to the data itself?
To work reliably in a way that preserves data, your database comparison tool needs to be able to run hand-cut scripts instead of the auto-generated ones, to be able to progress, and roll-back transitions from one version of a database to another when a refactoring process has made automatic synchronization impossible.
Database Source Control
Whatever the scale of the database application, the source of the schema of a database will generally be the version in source control. Sensibly, we will also script into source control the reference data, database configuration, users and scheduled tasks, but these will change less frequently. Source Control for a database should also include the following:
- The database DDL code and the database build script, including all database settings where there is one database to one application
- The application interface within the database, if the database is shared
- All scripts used for testing and deployment
- Build and migration scripts between each major release
Database Versions and 'versioning'
It must be possible for the build application, or script, to query the current version of a database so that it can apply the correct tested script to roll forward or roll back the database, whilst preserving the data within it. Most third-party deployment tools require a method for attaching a version to a database application.
Unfortunately, there is no single, established way to associate a database with a version number. We can store this version number as a value or structured document (e.g. XML or JSON) in an extended property of the database, as a table in the database itself, a table in the master database, or on a central coordinating database.
For a complex database, it is quite possible for the application interface (normally in a separate schema) to be maintained as a separate version to the database, since it needs to keep in step with one or more application. It is perfectly feasible in a large database for schema to be versioned separately.
If a production database contains extra objects, over what is in source control, (e.g., Replication objects), then it will not be completely comparable to the version in source control.
Any database, even for small database applications, should have the following tests before deployment:
- Functional and Regression Testing – Testing each database feature with a wide range of normal and erroneous input data, checking the resulting state of the data against what is expected, from a business and systems perspective.
- Integration Testing – Testing the complete database system with a full data load, simulated or obfuscated, to ensure that the new release causes no unintended or unexpected side effects.
- Performance Testing – Identify performance bottlenecks in high use with a full data loading, using an automated test approach that simulates normal, peak, and exceptional load conditions.
- Stress Testing – Test the database under simulated loadings beyond the limits of its specification in terms of transactional throughput and connections to determine the load under which it fails and whether it fails gracefully without data loss. A sub-category of stress testing is sometimes known as 'limit testing' and involves deliberately failing components such as network hubs or disk drives to ensure that failure alerts are timely and effective and the application fails gracefully without data loss. These tests are hard to automate.
- Systems Integration Testing – To ensure that all applications work with the new release and nothing is broken. Only relevant if more than one application uses the database.
Collectively, the process of running these various types of tests is 'Acceptance testing'. Additionally, the users may want to do their own independent acceptance tests based on business criteria. Most often, these tests are exploratory or 'touring' tests and are less amenable to automation.
Finally, there are the unit tests to ensure that the deployment scripts work. For these tests to work effectively, you need tools that can compare database schema and data, and can simulate usage, so that we can rollback a deployment and check that no data has been lost. Keep these scripts in source control, along with instructions for use.
One would expect, in the course of a handover from development to the operations team, to check that:
- Issue tracking is in place – it helps a great deal if an automated procedure script can write to it when alerts are fired
- All policies that are required of a production system are in place – for example naming conventions, best practices, performance requirements, documentation.
- The scripts cannot be accidentally run twice
- Upgrades and rollbacks are 'all or nothing'.
- All synchronization and migration scripts or change sets that are required to deploy or rollback a release are available in source control.
- Integration tests are in place to check that the migration scripts preserve data correctly.
- Scripts are run on the integration server that simulate usage and check whether the data changes are correct, and in line with what the business expects.
- If more than one application is using the database, will the changes cause faults in one or more of these applications?
The operations team will want to know about:
- Infrastructure (e.g. Server and network) requirements and provision for this release
- Details that allow production to use virtualization and if so, anything that would affect a decision of a virtualization strategy.
- Auditing processes, and documentation
- What provision are in place for monitoring the database, and associated processes such as ETL or reporting.
- Security considerations for the servers, over and above the database server.
- Any categories of data that will require special consideration for data retention and backup. (e.g. offsite backups to cloud providers)
- The extent and mechanism of alerting – for system warning, errors and other abnormal events.
- The actions that production staff should take on all the different classes of alerts, the point at which they should handover to other levels of support, and who to handover to
- Service levels and the details of your planning for maintaining continuity in event of failure.
- Any associated middleware that is required, and how the systems are linked.
- All data feeds, and their maintenance implication
Database Change Management
At the very simplest, where the application and database are in a one-to-one relationship, then releases can be kept in step, and the application and database can be stored in the same source control repository under the same version number. The same developer writes application code and writes a database change script. The developer tests these changes locally, on a development machine. Both the database change script and application code changes are committed to source control together as a unified change, normally against a ticket or bug code. It is then a development task to aggregate a number of check-ins into a meaningful release or ‘version’. We then apply the database changes to the integration server, and test the application and database both in isolation and in combination. A range of tests (see later) will check for compatibility issues. The revised application code is also tested.
Where, instead, one database serves several applications, or the business requires deployments to be done with zero down-time, then it is best to decouple database and application releases by providing a defined interface or abstraction layer between application and database, based on views, functions and stored procedures. ORMs do not easily provide this decoupling and, without prior planning, this tends to make decoupling more difficult. By creating an abstraction layer, the interface specification rather than the entire database needs to be in source control together with each application. Different applications can have different interfaces and accompanying specifications.
New Deployments versus Upgrades
When deploying a new database from scratch, we extract each database change-script directly from source control. We can script the static data or load it from file via BCP. We need to apply those scripts in the correct order. All this will have been done in order to create the pre-deployment integration tests and the obvious solution is therefore to script this in a way that runs each build script in the correct order. If this isn’t possible for some reason, the only option is to document each step and ask the DBA to execute the scripts in the described order. It is a task crying out for automation to eliminate the possibility of human error. Any automated way of deploying a database from multiple scripts will have to deal smoothly with such issues as different environment configurations for each target server, such as file layout, size and growth options, and security options.
Where a database is a live system with business data, it's best to regard all changes from one version of a database to another as a migration, or transition, from the first state to the second, rather than dig-up/replant. It is much more expensive in terms of time and effort to recreate a database of any size than to script changes to the database with the data in place. For a typical enterprise-wide server, dig-up/replant is impossible.
We cannot change the structure of a database in the same light-hearted way as we make changes to an object-oriented application. This is why so much effort is devoted to up-front database design.
In some cases, we can upgrade a database by using schema and data comparison tools between the production database and the staging, or development, database. This will then produce a script to synchronize the differences, unless an explicit migration is required. In the latter case, the synchronization tool will need to be able to get this from source control. However, there can, in rare circumstances, be problems with this approach. When deploying to production, the 'one true source' could be in source control, yet the server code could be different, as when replication is installed.
Because the database deployment process is complicated, and difficulties tend to rear-up unexpectedly, teams tend to delay deployment until it reaches the stage of being a major release. This actually exacerbates the anxiety that surrounds deployment and leads to larger, less frequent releases. On the other hand, frequent releases leave less to chance and can make the process much easier, with less administrative overhead. The key to making this viable is to automate the process where possible, but with a tested means of rolling back without loss of data or service.
The most subtle problems tend to come from security issues involving one of the participant processes having the wrong access rights. A mislaid password can stop a deployment in its tracks. Even worse, a missing password for one part of the process can prevent an emergency rollback. Rehearsal can flush out many of these problems.