23 January 2014

The Synchronisation Trap

The original generation of PDAs, the ancestors of today’s mobile devices, was notably limited in its connectivity. These devices relied on a regular, often daily, ritual of synchronisation in which they were connected to a desktop machine by a wire and exchanged data with it.

Mobile devices are changing the way applications are designed in some fundamental ways. Some changes are obvious, such as the need to support a variety of different form factors. Others are easily overlooked, including one of the most fundamental: mobile devices can be moved. Users can pick up their device, take it with them and use your application wherever they want… only to be told they can’t actually do that, because they took their device somewhere without reliable internet access.

This seems absurd: mobile devices have a plethora of connection options, most of them wireless and able to connect directly to data sources. It seems that it should no longer be necessary to deal with the clunky and (if we’re honest) complicated business of synchronisation, but the reality is simply not that neat.

In the office, where the applications are designed, mobile devices can often seem to be just as connected as desktop machines. Maybe even more so: they can often connect using a cellular network when wi-fi is not available.

Not An Ideal World

However, this is only true some of the time. Some devices, particularly tablets, can only access the internet through wi-fi. If they are used as mobile devices, they will probably often be taken out of range of the networks they can easily access. This is a problem for any server-based application, as it cannot be used at all until a network connection is established (something that might actually involve the user physically moving).

Even where a device has mobile data connectivity, the situation might not be ideal; from the user’s perspective, it might even be worse. Mobile data connections can be unreliable as well as simply absent. Users may find that the application becomes unusable midway through a session, or get part-way through an update only to find themselves back at the beginning because of a network glitch.

Unlike desktop systems, which tend to stay in the office with its reliable network, mobile devices need to be able to move: to go to places where connectivity might not be good enough for an always-on design. They have a wide variety of options for connectivity not because availability is good but because it is often bad.

For the developer, possibly the simplest way to address this problem is to improve error handling: if the server isn’t available for a while, handle the problem gracefully and let the user resume what they were doing once a network connection is re-established. This is less than ideal, because users still can’t use the application everywhere they can take their device.
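To make “handle the problem gracefully” a little more concrete, here is a minimal Python sketch, assuming a hypothetical `send` function that raises `ConnectionError` when the server can’t be reached. It retries with exponential backoff and, if every attempt fails, leaves the user’s draft intact so they can resume later. The names `Draft` and `submit_with_retry` are illustrative and don’t come from any particular library.

```python
import time
from typing import Callable


class Draft:
    """A user's in-progress edit, kept locally so it survives a failed send."""

    def __init__(self, payload: dict):
        self.payload = payload
        self.sent = False


def submit_with_retry(
    draft: Draft,
    send: Callable[[dict], None],  # hypothetical: raises ConnectionError when offline
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send a draft, backing off between attempts.

    Returns True if the server accepted it. On failure the draft is left
    untouched, so the user can resume once the network is back instead of
    losing their work.
    """
    for attempt in range(attempts):
        try:
            send(draft.payload)
            draft.sent = True
            return True
        except ConnectionError:
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False
```

This only softens the failure: the user still can’t finish the task until the server is reachable.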

A step up from this is to add caching: it’s often relatively simple to cache the server’s responses so that they can be used instead when no network connection is available. This is generally fine if the application never needs to update data on the server. When updates are required, two problems appear: users might be updating out-of-date records, and although they can view data without a network connection, they still need one to make an update.
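A read-through cache of this kind can be sketched in a few lines of Python. This is an illustrative sketch, assuming a hypothetical `fetch` function that raises `ConnectionError` when offline; `CachingClient` is not a real library class. The `is_stale` flag it returns is the only signal the application gets that a record may be out of date.

```python
from typing import Callable, Dict, Tuple


class CachingClient:
    """Wraps a hypothetical server fetch function with a local response cache."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}

    def get(self, key: str) -> Tuple[dict, bool]:
        """Return (record, is_stale), falling back to the cache when offline."""
        try:
            record = self._fetch(key)
        except ConnectionError:
            if key in self._cache:
                return self._cache[key], True  # possibly out of date
            raise  # never fetched this record, so there is nothing to show offline
        self._cache[key] = record  # refresh the cache on every successful fetch
        return record, False
```

Nothing in this wrapper helps with updates: writing still requires a live connection, which is exactly the limitation described above.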

The Offline Rabbit-hole

Many developers faced with adapting existing online applications to the mobile world will follow a similar, predictable path, fixing problems as users report them. The first problem is simply accessing the data, so caching is the answer. It then becomes clear that users need to update the data as well, so queuing is added, which starts to expose problems with conflicts and data loss. Each problem seems small by itself, but there are a lot of them, and the piecemeal approach means that a lot of increasingly specialised and fragile code eventually accumulates in the application and on the server.
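The queuing step is often implemented as an “outbox” of pending updates that is replayed, oldest first, when the connection returns. The Python sketch below is a hypothetical illustration (`Outbox` and the `send` callback are assumed names, not a specific library); it preserves ordering across a failed flush but, on purpose, does nothing about conflicts.

```python
from collections import deque
from typing import Callable, Deque


class Outbox:
    """Queue of updates made offline, replayed in order when a connection returns."""

    def __init__(self) -> None:
        self._pending: Deque[dict] = deque()

    def enqueue(self, update: dict) -> None:
        self._pending.append(update)

    def flush(self, send: Callable[[dict], None]) -> int:
        """Send queued updates oldest-first, stopping at the first network failure.

        Returns the number of updates sent. Anything unsent stays queued,
        still in its original order, for the next attempt.
        """
        sent = 0
        while self._pending:
            try:
                send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()  # only remove an update once it was sent
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

If `send` simply writes each queued value over the server’s copy, any change someone else made to the same record while the device was offline is silently overwritten: the queue preserves this user’s edit at the cost of another’s.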

The solution to these problems starts to look very much like an old-fashioned synchronisation system. These systems were common in earlier generations of mobile devices, as the only available connectivity was typically via a cable connected to a desktop machine. In a modern device, synchronisation has a slightly different purpose: connectivity is now common but can’t be relied upon, so synchronisation is needed to make applications work reliably.

Synchronisation is simple in concept but tricky to execute in a way that is truly reliable. A developer who follows the rabbit-hole and tries to build around the issues as they appear faces an uncomfortable reality: the REST API their application is built around is designed for interactivity, not for the intricacies of synchronisation. Suddenly there’s a need to support change tracking and conflict resolution. Worse, the API needs to be reliable on a whole new level: a transient error in an online API can usually be worked around simply by retrying, but in a synchronising API it can leave the server and the device confused about each other’s state (veterans of the 1990s will probably be familiar with calendar entries that can’t be deleted and similar artifacts of this problem). There’s no simple path of development that leads from caching to reliable synchronisation.
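One concrete way a transient error causes this confusion: the server applies a change, the acknowledgement is lost, and the device retries, so the change is applied twice. A common mitigation (not one the article itself prescribes) is to tag each change with an ID generated on the device, so the server can recognise and ignore replays. The Python sketch below assumes a toy in-memory server; `CalendarServer`, `create_entry` and `new_op_id` are hypothetical names.

```python
import uuid
from typing import Dict, List


class CalendarServer:
    """Toy server that makes 'create entry' idempotent using operation IDs."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self._applied: Dict[str, int] = {}  # operation ID -> index of entry it created

    def create_entry(self, op_id: str, title: str) -> int:
        # A replayed operation returns the original result instead of
        # creating a duplicate entry.
        if op_id in self._applied:
            return self._applied[op_id]
        self.entries.append(title)
        self._applied[op_id] = len(self.entries) - 1
        return self._applied[op_id]


def new_op_id() -> str:
    """Generate the operation ID once, on the device, before the first attempt."""
    return str(uuid.uuid4())
```

For this to work, the operation ID has to be stored alongside the queued change so that a retry, even after the app restarts, reuses it. A real server would also need a policy for pruning the record of applied IDs, which this sketch ignores.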

This Isn’t A Solved Problem

The crucial difference between a ‘live’ system and one that synchronises is what happens when an update produces an error. In a live system, the server reports the error to the client and the client reports it to the user, who can immediately see what has gone wrong. When synchronising, the problem might not be detected until much later, when the user has moved on to other things. If the application asks the user to resolve the problem, they’re faced with an error message about a task they thought was already completed, which is now stopping them from doing whatever they had moved on to. They’re likely to make a snap decision to get the message out of the way, and that decision may be wrong, leading to lost data and confusion. On the other hand, if the server tries to resolve the situation silently, it might also make the wrong decision, leading to lost data and confusion.
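Conflicts like this are commonly detected with optimistic concurrency: each record carries a version number, and an update is accepted only if it was based on the server’s current version. The Python sketch below is an illustration under that assumption (`VersionedStore`, `Record` and `SyncResult` are hypothetical names); it shows where the conflict surfaces, not how to resolve it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    value: str
    version: int


@dataclass
class SyncResult:
    ok: bool
    current: Record  # the server's copy after the attempt


class VersionedStore:
    """Server-side store using optimistic concurrency (one version per record)."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def apply(self, key: str, new_value: str, base_version: int) -> SyncResult:
        current = self.records.get(key, Record("", 0))
        if current.version != base_version:
            # Someone else changed the record since this device last synced.
            # Only now, possibly long after the user made the edit, is the
            # conflict discovered; the caller must decide how to resolve it.
            return SyncResult(False, current)
        updated = Record(new_value, current.version + 1)
        self.records[key] = updated
        return SyncResult(True, updated)
```

Whether the rejected change is then put in front of the user or merged automatically is exactly the dilemma described above; the version check only tells you that a decision is needed.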

So: for an application to be truly mobile, it must allow the user to be mobile. To do that, it must also support the user moving to somewhere where an internet connection might be unavailable or unreliable. Mobile devices aren’t useful until they can move around, so this can’t be treated as an error. That means resurrecting the notion of synchronisation, which is hard. To make it reliable, and in particular to ensure that data is not lost and users are not confused, it’s necessary either to build both the client and server architecture around it, or to face the difficult job of refitting an existing architecture. This is the unfortunate rock-and-hard-place in which a lot of mobile developers now find themselves, and there doesn’t seem to be an easy way out of it.

Recently, a new movement has appeared: called ‘offline first’, it was first proposed by Hoodie. It’s based on the idea that applications should be built from the ground up to work offline, because that’s what the reality of mobile devices requires. It seems to have struck a chord with many developers and users, with numerous follow-up posts around the internet.

However, it’s a doctrine that can only be followed for new applications. Existing applications seem to have some hard choices in front of them: should they be redesigned and rewritten as offline first? Is the rabbit-hole of adaptation good enough? Is there a short cut – a way of avoiding reinventing the wheel and implementing synchronisation from scratch? Perhaps the problem isn’t severe enough: is it good enough to just keep on treating offline as an error?

I’d love to hear your experiences in the comments.

Andrew Hunter is a Software Engineer at Red Gate who is responsible for much of the recent rewrite of ANTS Performance Profiler and ANTS Memory Profiler. Before that, he wrote the SQL Layout utilities for SQL Refactor/SQL Prompt. He has been described as resident master of the dark .NET arts. Dereferenced in a freak accident, he is forced to spend his days hiding from the garbage collector.
