Steve Joubert

Data Synchronization

01 October 2005

Making your smart client application work in a disconnected environment

Let’s say you’ve decided to build a smart client application. You’ve determined that it will run on Microsoft SQL Server and employ a data store solution that the client will use in disconnected mode. Your next step will be deciding how the two data storage mechanisms will interact with one another.

This article explains the design components of a smart client application, and focuses on the data synchronization controller, which is responsible for communication between the client-side disconnected data store and the server-side connected data store.

Architecture: key components

The key design components of your smart client application are the data access and data synchronization controllers.

The data access controller is based on, and works much like, the class factory pattern. A request for an object is posted first. The data access controller retrieves the data, instantiates the object with the data, and returns the object to the requestor. When the data is modified, the object is posted back to the data access controller, which processes it against the data store and notifies the caller that the data has been saved or deleted. Where the data comes from is irrelevant to the process.

The data access controller must also know if the smart client application is running in connected or disconnected mode. With this intelligence, it will know which data store to use when retrieving and updating data.
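The routing described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the class and method names are hypothetical, and plain dictionaries stand in for the two data stores.

```python
# Hypothetical sketch of a data access controller that routes reads and
# writes to the connected or disconnected store based on a mode flag.
class DataAccessController:
    def __init__(self, server_store, local_store):
        self.server_store = server_store  # stand-in for the SQL Server store
        self.local_store = local_store    # stand-in for the client-side store
        self.connected = True             # set by whatever detects the network

    def _store(self):
        return self.server_store if self.connected else self.local_store

    def get_customer(self, customer_id):
        # Where the data comes from is irrelevant to the caller.
        return dict(self._store()[customer_id])

    def save_customer(self, customer_id, data):
        self._store()[customer_id] = dict(data)
        return "saved"                    # notification back to the caller
```

The caller never knows which store served the request; flipping `connected` is the only change needed to move between modes.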

The second component is the data synchronization controller, which is responsible for communication between the client-side disconnected data store and the server-side connected data store. This controller:

  • ensures that the required data is available in the disconnected data store.
  • ensures that lookup data is synchronized.
  • locks records if required.
  • synchronizes data from the client data store to the server data store when the connection is restored.

Loading data

You must first decide how to get the necessary data from the server data store to the client data store so the application can run entirely off a local database. Unless you are locking records so they can’t be edited, you simply copy data from one data store to the other. If you are building a data synchronization controller and are using SQL Server, you could also create a DTS package to accomplish the job.

If the data is too large to copy over and won’t fit on the local hard drive, determine if you really need all the data to run the application in disconnected mode. Here is an example of how it is possible to deal with very large data sets by building intelligence into your data loading:

Let’s say we are writing a sales application for a multinational pharmaceutical company that makes a wide range of products, from over-the-counter pain relievers to surgical kits to medical imaging products. The company’s client base includes retailers, doctor’s offices, and medical institutions such as hospitals, surgical centers and hospices.

A single pain reliever from the company has more than three dozen SKUs, or stock-keeping units: alphanumeric identifiers that allow a product to be tracked for inventory purposes. Some SKUs are used for doctor’s offices, some for medical institutions, and some for retailers. The company’s sales representatives specialize in selling to only one of these vertical markets, so the data synchronization controller can be configured to copy over only the product data for the market each rep sells to, and only for that rep’s region or territory. Likewise, sales reps only need the client data and potential leads in their region.

When copying a subset of data to the client, you must ensure that all related data is copied over as well. This may include a cascade of data from tables that are four or five levels removed from the primary table in your schema. In our pharmaceutical sales example, a user may be able to view sales history for a client, but that history might contain SKUs that have since become historical in the system. These need to be copied over to the client so that historical data can be viewed offline.
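The cascade of related data can be sketched as a small filtering step. This is a hypothetical helper (the field names and flat row structure are assumptions for illustration) that selects a rep's product rows and then pulls in any SKUs the sales history still references:

```python
# Hypothetical sketch: copy only the rows a sales rep needs, then pull in
# any historical SKUs that the rep's sales history still references.
def rows_to_copy(products, sales_history, rep_market):
    # Start with the products for the rep's vertical market.
    selected = {p["sku"] for p in products if p["market"] == rep_market}
    # Related data: SKUs referenced by sales history but filtered out above
    # must still come over, or history cannot be viewed offline.
    for sale in sales_history:
        selected.add(sale["sku"])
    return [p for p in products if p["sku"] in selected]
```

In a real schema the same idea would walk foreign keys several levels deep rather than a single SKU reference.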

You should also consider limiting functionality when an application is disconnected. Perhaps you only allow a view of a client’s order history going back three years. Maybe there are reports that can only be run when the application is connected. While some of these decisions may be based on business requirements, they may be influenced by technical and practical aspects of their implementation. If you are limiting functionality when the application is disconnected, you need a clear way to communicate this to the user.

Client-side data caching

You can build additional intelligence into the data access controller and alleviate some of the burden on the server-side data store by getting look-up data such as a list of states or phone number types from the client-side data store. The data synchronization controller makes sure that the look-up data is in sync with the server, but the application will receive its data from the client data store even when the application is fully connected.

Look-up or static data could also be cached in memory by the data access controller. You could cache the data on the first request or have a background thread pull the data sets into memory at the start of the application.
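Caching on first request can be sketched in a few lines. The class and the loader callback here are hypothetical, standing in for whatever reads look-up data from the client data store:

```python
# Hypothetical sketch of caching static look-up data on first request.
class LookupCache:
    def __init__(self, loader):
        self._loader = loader   # function that reads from the client data store
        self._cache = {}

    def get(self, name):
        if name not in self._cache:
            # First request for this data set: load it and keep it in memory.
            self._cache[name] = self._loader(name)
        return self._cache[name]
```

The background-thread variant would simply call `get` for each known data set at application start.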

Let’s now say that the application is disconnected. The user starts creating new data and modifying or even deleting existing data. When the application connects back to the network, how do you get the new and modified data from the client data store to the server? How do you know what has changed? How do you deal with identity field assignments on the client that may not match those on the server?

First, be sure to synchronize only the data that has changed. A few approaches you can take include database delta, change logging, or data flagging.

In all three approaches you must consider identity fields, because the ID assigned to a record on the client will probably not match the one assigned on the server. In fact, in a busy system, the ID will likely have been used by the time you synchronize. Another consideration is the order in which the tables are synced: if a hierarchy of tables exists, you must respect that structure and start at the appropriate place.

Synchronization concerns

In the database delta approach, the system must look at each database table that can be synchronized and evaluate if a record has been changed, inserted or deleted. Changed records are easiest to identify and update. New records need to take into account identity fields, particularly if the ID field is referenced in other places. It then becomes the responsibility of the data synchronization controller to ensure that the new ID, assigned by the server, cascades to the referencing tables before the data is synchronized.

Deleted records are more problematic. You can’t assume that if a record exists on the server and not on the client a delete should be performed. If someone added a record while you were disconnected, it would be wrongfully deleted.

One way to implement database delta synchronization is to use a time-stamped field on each record and note the time of the last synchronization. Then any record whose time stamp is greater than the last synchronization is either updated or inserted. Any record that exists on the server with a time stamp that is less than the last synchronization time and does not exist on the client can be deleted. While this approach is workable, it isn’t very elegant and can be dangerous if not implemented carefully.
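As a rough sketch of the time-stamp approach (the row structure and field names are assumptions, with rows keyed by ID and stamps as simple numbers):

```python
# Hypothetical sketch of time-stamped database delta synchronization.
def delta_sync(client_rows, server_rows, last_sync):
    # Any client record stamped after the last sync is updated or inserted.
    upserts = [r for r in client_rows.values() if r["stamp"] > last_sync]
    # A server record stamped before the last sync that is absent from the
    # client was deleted while disconnected, so it can be removed.
    deletes = [rid for rid, r in server_rows.items()
               if r["stamp"] < last_sync and rid not in client_rows]
    return upserts, deletes
```

The danger the article warns about is visible here: the delete rule depends entirely on the stamps and the sync time being recorded reliably.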

Change logging is a safer method. Think of it as implementing your own SQL Server transaction log. Every action you take against the local database is logged as a synchronization record. When you are connected to the server again, the synchronization records are played against the server and the data is synched.
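A minimal sketch of change logging, with the log modeled as a simple list of action records (the record shape is hypothetical):

```python
# Hypothetical sketch of change logging: every action against the local
# database is appended to a log, then replayed against the server.
def record(log, action, table, row_id, values=None):
    log.append({"action": action, "table": table, "id": row_id, "values": values})

def replay(log, server):
    # Play the synchronization records against the server in order.
    for entry in log:
        table = server.setdefault(entry["table"], {})
        if entry["action"] == "delete":
            table.pop(entry["id"], None)
        else:  # insert or update
            table[entry["id"]] = entry["values"]
    log.clear()  # the log has been fully applied
```

Because the log preserves ordering, an insert followed by an update and a delete replays exactly as it happened locally.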

Data flagging is another option. It is safer than the database delta approach, and while less elegant than change logging, it is simpler to implement. With data flagging, each table that can be updated is given an extra column—a flag that tells the status of the record, whether it is unchanged, updated, inserted or deleted. In addition, any table with an identity field has a GUID field added to it.

When the application inserts, updates or deletes a row, the status of the record changes appropriately. When a record is inserted into a table with an identity field, however, a GUID is used instead. The GUID relates to other tables rather than to the identity key until synchronization occurs. Then, during synchronization, the identity field is assigned on the server and can be matched to related records via the GUID. Synchronization is simplified because each record that needs to participate is clearly marked and indicates the operation that needs to be performed.
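A rough sketch of the flag-and-GUID idea (the table layout and field names are hypothetical, and only the insert path is shown):

```python
import uuid

# Hypothetical sketch of data flagging: each row carries a status flag, and
# inserted rows get a GUID so that related rows can be matched up after the
# server assigns the real identity value.
def insert_row(table, values):
    row = dict(values, status="inserted", guid=str(uuid.uuid4()))
    table.append(row)
    return row["guid"]

def sync_flagged(table, next_server_id):
    id_map = {}  # GUID -> server-assigned identity
    for row in table:
        if row["status"] == "inserted":
            # The server assigns the identity; related records can be
            # matched to it via the GUID.
            id_map[row["guid"]] = next_server_id
            row["id"] = next_server_id
            next_server_id += 1
        row["status"] = "unchanged"  # record has participated in the sync
    return id_map
```

A full version would also branch on the "updated" and "deleted" flags, but the GUID-to-identity mapping is the part that solves the identity-field problem.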

Locking records and concurrency issues

Data locking is a different approach to data loading and synchronization. You create a flag alerting the server that a record or set of records is locked by a user; all other users have read-only access to the information. This approach is restrictive, but it enables you to identify the data sets that need to be loaded onto the client and eliminates concurrency issues.
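The lock flag can be sketched as follows (field and function names are hypothetical):

```python
# Hypothetical sketch of record locking: a lock flag gives one user write
# access while all other users are limited to read-only access.
def lock_records(table, row_ids, user):
    locked = []
    for rid in row_ids:
        row = table[rid]
        if row.get("locked_by") is None:
            row["locked_by"] = user
            locked.append(rid)
    return locked  # these rows can now be copied to the client for editing

def can_edit(table, row_id, user):
    return table[row_id].get("locked_by") == user
```

The rows returned by `lock_records` are exactly the data set that needs to be loaded onto the client, which is why this approach also answers the data-loading question.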

When synchronizing data back to the server, you must have a set of rules by which the data is transferred. Problems can occur when the record you are updating or deleting has already been altered. Do you overwrite the data? Discard new changes? Let the client decide?

The “last-in” method is the easiest to implement but also the best way to lose valuable changes. In this method, the last person to synchronize or save his or her data controls the values in the record. Your updates overwrite previous updates and your delete removes records—even those that have been modified.

“First-in” disallows changes to records that have been modified since the start of the disconnected work. The idea is to flag records that need to be reviewed and enable the user to view the new version of the record and re-apply his or her changes. You may also want to build in more intelligence and enable the user to do an on-the-spot, side-by-side comparison of the two records and select needed values from each record before updating.
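The first-in check can be sketched as a single comparison against the time the user went offline (field names are assumptions):

```python
# Hypothetical sketch of "first-in": a client change is rejected when the
# server row has changed since the client disconnected.
def try_update(server_row, client_row, disconnect_time):
    if server_row["stamp"] > disconnect_time:
        return "conflict"  # flag for review; the user re-applies changes
    server_row.update(client_row)
    return "applied"
```

The "conflict" result is where the side-by-side comparison UI described above would take over.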

Conclusion

Making an application work in both a connected and disconnected environment is no small task when you consider the complexities of data loading and return synchronization. You must take the time to design, implement and test a system if you are going to be successful.


Author profile:

Steve Joubert is a Microsoft Certified Professional in developing web applications in C#. He has spent nine years developing Microsoft technologies, including building .NET applications, and has a background in such diverse markets as pharmaceuticals, biotech, banking, finance and entertainment. He currently works for ASPSOFT in Orlando, Fla.
