SQL Server Replication-
The Crib Sheet
For things you need to know rather than the things you want to know
- Replication topologies
- Replication Methods
- Replication Agents
- Monitoring Replication
- Checking throughput
- Changing the settings
- Updating articles
- Programming Replication Topologies
- Further Reading
Replication is intended to be a way of distributing data automatically from a source database to one or more recipient databases. As such, it can have obvious uses in a distributed system. It has also been used to implement high-availability systems. It is not useful for one-off synchronization, or for simply copying data. It is intended as a long-term data relationship between databases. Typical uses are in:
- Data warehousing and reporting
- Integrating data from several, possibly only partially connected, sites
- Improving scalability and availability
- Integrating heterogeneous data from other databases via OLE DB, integrating data from mobile users, getting data to and from Point-Of-Sale systems
- Offloading/delegating batch processing tasks.
Examples of the use of replication within an application could be
- Distributing data 'owned' by a particular application to other applications that are 'consumers' of that data. (For example, sales records to reporting services, manufacturing stock levels to purchasing systems.)
- Creating several instances of a database to distribute load on it
- Updating a central database with information from a number of remote Laptops that might be only partially connected, and to resynchronise the laptops.
There are three methods of replication: Snapshot, Transactional, and Merge. These are provided to try to meet the wide range of business requirements for replication.
Replication uses a 'Magazine Publishing' vocabulary. Anything from an 'Article' to an entire database can be replicated. An Article is the smallest component of distribution, and can be a table, procedure or function. If it is a table, then a filter can be applied to it so that only certain rows or columns are replicated. This 'Magazine Publishing' analogy can be confusing because, in replication, a Subscriber can sometimes make updates, and a Publisher usually sends out incremental changes to the articles in a publication.
A 'Publisher' maintains the original copy of the data. It holds the definition of the 'Publication', which defines the 'articles' that are to be 'published'. (The database with the original location of the data determines what is to be distributed).
A 'Subscriber' receives the articles from a publisher. It can subscribe to one or more publications. Any database can take on either role or even both roles at once.
A Distributor is a specialist database that runs the 'Replication agents'.
Replication is not part of the SQL Server engine, but an external application. This makes it much easier to involve other database systems in replication. Any SQL Server database, or other database system with an OLE DB provider, can be a publisher or subscriber in snapshot or transactional replication.
It is essential to plan out the replication in detail as a first stage, and to be very certain of the type of replication you wish to implement. A common mistake is to use replication in cases where a much less complex solution is possible.
A problem with production systems using replication is the difficulty of restoring the topology after a disaster.. This requires a fully documented recovery strategy, which has to be periodically tested and practiced. This means that the whole replication topology and configuration must be scripted so it can be re-created in an emergency, even if the system was originally built using the GUI tools. The Object Browser (or Enterprise Manager) makes the initial deployment relatively simple to do, and there are plenty of step-by-step guides, but it is not the best option when trying to restore an existing topology, and settings, in order to achieve a recovery from major failures.
Problems often follow from developers adding or dropping articles, changing publication properties, and changing schema on published databases. It is therefore best to create the replication once the design of the publisher is relatively stable.
In most topologies, it makes sense for publishers, distributors, and subscribers to be on separate physical hardware.
The commonest form of replication is to have a single publisher with the source data, with one or more subscribers. The Distributor database can be on the same, or preferably different, server.
Often, where the data from several databases need to be 'warehoused' centrally in an OLAP or reporting database, one will find a single 'reporting' database subscribing to several publications.
One can come across other topologies such as 'bi-directional' and 'peer to peer' which are really special cases of Central Publisher or Central Subscriber and use transactional replication
The distribution of data can be relayed to other subscribers via a publishing subscriber. This allows replication to be implemented over low-bandwidth WANs to a subscriber that, in turn, distributes it to other servers within its high-bandwidth LAN
How Replication Works
Replication begins with the initial synchronization of the published objects between the Publisher and Subscribers, using a snapshot. A snapshot is a copy of all of the objects and data specified by a publication. After the snapshot is created on the publisher, it is delivered to the Subscribers via the distributor.
For Snapshot replication, this is sufficient. For other types of replication, all subsequent data changes to the publication flow to the Subscriber as they happen, in a queue, or on request.
The snapshot replication process provides the initial synchronization for transactional and merge publications. However, in several cases, this initial synchronization is all that is necessary. This would include circumstances where data hardly changes, or if the latest version of the data is not essential to the subscriber, where the amount of data is small, or if a large number of changes takes place rapidly.
Snapshot replication involves copying the articles that make up the publication. Normally, if they exist already on the subscriber, they are over-written, though this behavior can be changed. Snapshot replication is more expensive in terms of overhead and network traffic and only takes place at intervals. Because locks are held during snapshot replication, this can impact other users of the subscriber database. It is therefore more suitable for static data and enumerations. In SQL Server 2005, several articles can be processed in parallel, and interrupted snapshots can be recommenced from the point of interruption. Snapshots can be queued or immediate.
Data changes are not tracked for snapshot replication; each time a snapshot is applied, it completely overwrites the existing data.
Transactional replication is used if:
- Changes to the data must be propagated immediately
- The database application taking out a subscription needs to react to every change
- The Publisher has a very high volume of insert, update, and delete activity
- The Publisher or Subscriber is a different database application reached via OLE DB.
Essentially, Transaction replication distributes data in one direction, but transactional replication does offer options that allow updates at the Subscriber. Once a snapshot replication has synchronized the subscribers with the publisher, all committed transactions on the publisher are then propagated to the subscribers in sequence, via distributed transactions. One can select a queued update or immediate, depending on requirements.
This is a special type of transactional replication in which every participant is both a publisher and subscriber (2005 Enterprise only) and is most useful for up to ten databases in a load-balancing or high-availability group.
This is where two databases replicate the same articles to each other via a distributor. There must be loopback detection. Data conflicts aren't handled and the replication must be implemented in code, since the GUI doesn't support it.
Transactional replication tracks changes through the SQL Server transaction log
Merge replication allows various sites to work autonomously and later merge updates into a single, uniform result.
Merge Replication is complex, but provides the means to implement part of a high-availability system, as well as its original purpose of serving mobile and disconnected users. It is designed for cases where the publishers are not in constant communication with the subscribers. After the initial snapshot synchronization, subsequent changes are tracked locally with triggers, and the databases are merged when in contact, using a series of rules to resolve all possible conflicts.
Merge replication is used when several Subscribers might need to update the same data at various times and propagate those changes back to the Publisher and thence to other Subscribers. It is also required in applications that involve Subscribers receiving data, making changes offline, and finally reconnecting with the publisher to synchronize changes with the Publisher and other Subscribers.
To make this possible, each Subscriber requires a different partition of data and there has to be a set of rules to determine how every conflict that takes place in the update of the data is detected and resolved. These conflicts occur when the data is merged because there can be no 'locking' and so the same data may have been updated by the Publisher and by more than one Subscriber.
Merge Replication does not use transactions. Merge replication uses a set of conflict-resolution rules to deal with all the problems that occur when two databases alter the same data in different ways, before updating the subscribers with a 'consensus' version. It normally works on a row-by-row basis but can group rows of related information into a logical record. One can specify the order in which 'articles' are processed during synchronisation.
Merge replication tracks changes through triggers and metadata tables.
Replication is done by several different agents, which are separate applications each responsible for part of the process. The replication agents should not be run under the SQL Server Agent account in a production system. Instead, they need the minimal permissions necessary to perform their function.
SQL Server Agent
This manages the overall replication process via SQL Server Agent jobs.
The Snapshot agent
Snapshot.exe executes on the Distributor. It extracts the schema and data defined by the publication, which is then sent to the subscriber via a 'snapshot folder'. It also updates status information on the distribution database. . It is used in all forms of replication
The Log Reader Agent
LogRead.exe is used in transactional replication to extract relevant committed transactions from the publisher's log, repackage them and send them to the distributor in the correct sequence.
Distrib.exe takes the snapshots, and log entries from the agents we've described, and dispatches them to the subscribers.
ReplMer.exe is used only in Merge Replication to send a snapshot when the subscriber is initialized, and also exchanges transactions between publisher and subscriber
Queue Reader Agent
QrDrSvc.exe is used to queue the updates in transactional or snapshot replication when queuing has been specified.
Many problems associated with replication can be avoided by .regular checks. The most obvious check is to make sure that the data has been transferred as expected. Periodic checks with SQL Compare and SQL Data Compare can be very useful in addition to the tools that come with Replication. Additionally the replication processes and jobs need to be checked to make sure they are working.
The performance of replication must be regularly monitored, and performance-tuned as necessary.
The Replication Monitor is used to check on the operational state of publications, and inspect the history, and errors. Right-clicking the replication node in Object Explorer will gain access to it.
One of the most important concerns is the time delay, or latency, of transactions from the publications appearing in the subscriber database. At times of high transaction throughput on the publisher database, bottlenecks can occur. Whereas the stored procedure sp_browseReplCmds on the distribution database can tell you how far behind the synchronisation is at any particular time, one cannot determine where the problems lies just from the data. Tracer tokens are now used to measure the actual throughput of the replication architecture at any particular time to help diagnose such bottlenecks.
There is always an element of doubt as to whether the replication has entirely worked. There are stored procedures provided to compare the 'articles' on the publisher and subscribers to make sure they are the same.
The sp_publication_validation stored procedure validates the data associated with each article by calling sp_article_validation (after the articles associated with a publication have been activated). The sp_article_validation stored procedure invokes sp_table_validation stored procedure, which calculates the number of lines and, optionally, the checksum of the published table. It is considered good practice to perform a daily row-count and weekly checksum. SQL Data Compare is ideal for mending a broken replication.
The Distribution Agent raises the '20574' system message if validation fails, or the '20575' system message if it passes. The Distribution Agent will replicate changes to a subscriber even if the validation shows that the subscriber is out of synchronization. It is a good policy to configure the Replication Alert on the '20574' message so as to send E-Mail, Pager, or Network notification.
This validation approach will only work within certain restrictions. For example, it will not work if certain filters have been applied. They should be used with caution.
Changing the settings
It is best to use the default replication settings unless there are clear performance gains to be made, or if the application design forces the issue. However, one cannot assume that the changes will be generally beneficial to the entire topology without comprehensive testing.
Articles are the smallest unit of a publication. An article can be a table, view, stored Procedure or function. Where an article is based on a table or view, it can contain all the data or just part of it. These filters of two types.: More common are the static 'WHERE' clauses, but filters can be used dynamically in Merge Replication to publish different 'content' (rows) to different 'subscribers' (databases receiving data). These latter Filters are called 'Dynamic' and can be simple Row Filters, or Join Filters, where the selection of rows to publish is based on a join with other tables, rather than a simple WHERE clause.
Normally, any alteration to an article that is a table is propagated to all the subscribers. You can also opt to propagate schema objects associated with the article such as indexes, constraints, triggers, collation and extended properties.
In Merge replication, the subscriber can update the article. This is, of course, a recipe for conflict, and these have to be resolved automatically. When the Merge Agent comes across a row that might have changed recently, it examines the history or 'lineage' of each site's version of the row to see if there is a conflict. If so, then the update that is finally used. Has to be based on either
- A "first wins" resolution,
- a user-specified priority scheme to determine the update to select,
- a customised resolution, using COM and stored procedures.
The 'lineage' is a history of changes in a table row in MSmerge_contents, which is maintained automatically when a user updates a row. Each column contains one entry for each site that has updated the row.
Conflicts to the data in the base table can occur within a column or a row. Most usual are column-tracked articles this means that, within any row, updates are only recognized as conflicts if the same column is updated by more than one subscriber. Occasionally, however, the business rules of the application may treat simultaneous changes to the any column within the row as a conflict, in which case row-level tracking is used.
Programming Replication Topologies
Replication agents and replication topologies can be administered and monitored remotely via SQL Scripts or RMO scripts. The task of building and maintaining replication topologies is made much easier of all parts of the deployments are scripted, even if this is done after the other methods such as wizards or RMO are used for the initial operation.
There are other uses for a replication script. It will be required in end-user application where, for example, such as s a pull subscription is synchronized when the user clicks a button, or where a routine administration task such as monitoring replication throughput is performed from a custom console. It is also generally used for writing customized business rules that are executed when a merge subscription is synchronized.
One can use the Object Explorer in SSMS, or the Enterprise Manager in earlier versions of SQL Server, to set up replication. BOL provide worked examples. Alternatively, RMO can be used with VB or C# to script the replication. Whatever system you use, it is a good idea to use the Transact SQL script as the reference for the replication topology you create
Ultimately, the functionality in a replication topology is provided by system stored procedures. The easiest approach is to use Transact-SQL script files to perform a logical sequence of replication tasks, because it provides a permanent, repeatable, copy of the steps used to deploy the replication topology that can, for example, be used to configure more than one subscriber. It also provides a measure of documentation and disaster-recovery. A script can be stored as a query object in a SQL Server Management Studio project.
Replication scripts can be created by hand, by the script generation of the replication wizards in SQL Server Management Studio, or by using Replication Management Objects (RMOs) to programmatically generate the script to create an RMO object.
When creating scripts to configure replication, it is best to use Windows Authentication so as to avoid storing security credentials in the script file. Otherwise you must secure the script file
SQL Server Replication
For the details of implementing replication, all the steps are documented here in Implementing Replication
details on configuring and maintaining a replication are here Configuring and Maintaining replication
For details on peer-to-peer replication, read Peer-to-Peer Transactional Replication
Some of the wider issues and dangers of replication are discussed here Data Replication as an Enterprise SOA Antipattern, in the excellent Microsoft Architecture Journal