Data Syncing in Core Data Based iOS Apps

In this post, we discuss the issues you may face when implementing data syncing in iOS applications, and the approaches and tools that help you tackle them. We look at the problem from several angles to give a general overview of the topic. Our main focus, however, is on applications whose data model is sophisticated enough to require a structured database and the Core Data framework.

When would you want to implement syncing in your app? You may need it if the data handled by the application should be available on other devices or via a Web interface, or if you simply do not want the data to be lost when the application is reinstalled. In all such cases you could make do with a single database instance on a server and a thin client on the app side. But then your application has to stay online all the time to be usable. If your app reloads data from the server every time a screen opens, or sends a request to the server on every user action, just try it on an intermittent cellular connection during your daily subway trip, and you’ll get the picture.

A Bit of Theory

Data syncing is quite a broad concept. It covers a range of approaches, each with its upsides and downsides. A recent post by Drew McCormack gives one possible classification along two axes: synchronous vs. asynchronous, and client/server vs. peer-to-peer. A syncing implementation depends heavily on these factors, as well as on data model complexity, the amount of data transferred and stored, and other requirements. So in each particular case you should choose the simplest implementation that meets the app requirements.

Based on a review of existing off-the-shelf solutions, we can distinguish several major classes of syncing, which differ in the granularity of the objects being synchronized:

Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.
Syncing of key-value pairs can be used in apps with a simple data structure, where the values are considered atomic, i.e. not divided into logical components. This option is similar to syncing of whole documents, as both the value and the document are overwritten completely. However, from a user perspective a document is a complex object composed of many parts, whereas a key-value pair is just a short string or a number. Therefore, we can use a simpler conflict resolution strategy here: the value that was changed last is considered the relevant one.
Syncing of data structured as a tree or a graph is used in more sophisticated applications, where the amount of data is too large to send the database in its entirety at every update. In this case, conflicts have to be resolved at the level of individual objects, fields or relationships. In this post, we primarily focus on this option.
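
The "last change wins" strategy mentioned for key-value pairs can be sketched in a few lines. This is only an illustration: the Entry type and the merge_last_write_wins function are hypothetical names, and the sketch assumes that the timestamps of different devices are comparable.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    value: str
    modified_at: float  # assumption: timestamps from different devices are comparable


def merge_last_write_wins(local: Dict[str, Entry], remote: Dict[str, Entry]) -> Dict[str, Entry]:
    """Return a merged store where, for every key, the most recently changed value wins."""
    merged = dict(local)
    for key, remote_entry in remote.items():
        local_entry = merged.get(key)
        # Take the remote value if the key is new locally or the remote change is newer.
        if local_entry is None or remote_entry.modified_at > local_entry.modified_at:
            merged[key] = remote_entry
    return merged
```

Because each value is atomic, there is nothing to merge inside it, which is why such a simple rule is usually acceptable here and not for whole documents.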

If you prefer to immerse yourself deeper into the theory and algorithms of syncing, you can search the Web by the following keywords:

Set Reconciliation Problem
The problem is as follows: we have to reconcile the differences between two sets of data. There are some very simple ways to address this problem; for instance, we can reset both sets. In practice, however, the syncing algorithm has to meet user expectations and be fast. Let’s consider one possible algorithm that meets these requirements. First of all, certain metadata should be kept for each object: creation date, last modified date and a unique ID. The syncing then proceeds in the following steps:
1. Retrieve a unique identifier, creation date, and modified date, for each object on the server and on the client.
2. By comparing the dates in the sets of objects available on the server and on the client, determine which objects have been changed, added or deleted.
3. Send the objects that have been added, deleted or modified on the client, to the server.
4. Retrieve from the server the objects that have changed on the server.
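
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration rather than a complete algorithm: the names Meta, SyncPlan and plan_sync are hypothetical, and the sketch additionally assumes that the time of the last successful sync (last_sync) is known, which is what lets it tell a newly added object from one that was deleted on the other side.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Meta:
    uid: str
    created_at: float
    modified_at: float


@dataclass
class SyncPlan:
    upload: List[str] = field(default_factory=list)            # send client -> server
    download: List[str] = field(default_factory=list)          # fetch server -> client
    delete_on_server: List[str] = field(default_factory=list)  # removed on the client
    delete_on_client: List[str] = field(default_factory=list)  # removed on the server


def plan_sync(client: Dict[str, Meta], server: Dict[str, Meta], last_sync: float) -> SyncPlan:
    plan = SyncPlan()
    for uid in sorted(set(client) | set(server)):
        c, s = client.get(uid), server.get(uid)
        if c and s:
            # Present on both sides: the newer modification date decides the direction.
            if c.modified_at > s.modified_at:
                plan.upload.append(uid)
            elif s.modified_at > c.modified_at:
                plan.download.append(uid)
        elif c:
            # Only on the client: created after the last sync, or deleted on the server.
            (plan.upload if c.created_at > last_sync else plan.delete_on_client).append(uid)
        else:
            # Only on the server: created after the last sync, or deleted on the client.
            (plan.download if s.created_at > last_sync else plan.delete_on_server).append(uid)
    return plan
```

Steps 3 and 4 then consist of executing the plan. Note that the sketch does not handle an object modified on both sides since the last sync; it simply prefers the later date.
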
Operational Transformation
This is a mathematical model in which all changes to the database are represented as a sequence of actions on a timeline, and the problem of synchronization is reduced to merging several such sequences into a single one, after which all the data is synchronized. The model was initially developed to support collaborative document editing and is now used in such projects as Apache Wave (formerly known as Google Wave) and Google Docs.
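
To give a feel for the idea, here is a toy sketch of a transformation for two concurrent text insertions. It is not how Wave or Google Docs implement it; the Insert type, the site field used to break ties, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the text where the new characters go
    text: str
    site: int   # hypothetical replica ID, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust `op` so that it can be applied after `against` has already been applied."""
    shifts = against.pos < op.pos or (against.pos == op.pos and against.site < op.site)
    if shifts:
        # The other insertion landed before ours, so our position moves right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op
```

Each replica applies its own operation first and then the transformed remote one. With this rule, both replicas end up with the same text regardless of the order in which the operations arrive.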

Lamport Timestamps and Vector Clock Algorithm
These algorithms generate a partially ordered history of events in a distributed system and detect violations of causality.
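
A vector clock can be sketched as a dictionary that maps a device name to a counter. The function names below (tick, merge, compare) are chosen for this illustration only.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with the counter of `node` incremented (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, used when a device receives changes from another device."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def compare(a: Clock, b: Clock) -> str:
    """Return 'equal', 'before', 'after' or 'concurrent' for the events stamped a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither event could have known about the other
```

The "concurrent" result is the interesting one for syncing: it marks two changes made independently of each other, which is exactly where a conflict may need to be resolved.
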
Some of the principles and ideas used in version control systems, such as Git or Subversion, can also serve as a source of inspiration for your own synchronization algorithm.

Off-the-Shelf Solutions: Libraries and Services

Today, there are several off-the-shelf solutions of varying maturity that tackle the issue of data synchronization. Let’s discuss some of them.

iCloud

First of all, there are the iCloud features built into the iOS SDK: iCloud Document Storage, iCloud Key-Value Store and iCloud Core Data. The first two are designed for document-level and key-value syncing, so we will not dwell on them and will focus on the latter.

Core Data Sync runs on top of the same technology as document syncing. The application has access to a special directory, separate from the app’s sandbox, where it can store data. This directory is synchronized with iCloud by the operating system, and the Core Data framework writes local changes to iCloud and merges the changes received from iCloud into the local database.

It is recommended to use the SQLite database to sync via iCloud Core Data, as it helps to minimize data transfer over the network by sending incremental changes. In this case, each device maintains its own database instance, and only the change log is exchanged with iCloud.

When working with iCloud, keep in mind the abnormal situations that may occur and make sure your app can handle them: iCloud may be unavailable at application launch, the user may switch to another iCloud account, and conflicts may arise when the app runs on several devices in parallel. While debugging, monitor the amount of data exchanged and, if necessary, modify your app’s logic accordingly. Also, certain Core Data features are restricted, such as migration based on a mapping model and ordered relationships. For more details, please refer to the iCloud documentation, section Testing and Debugging Your iCloud App.

Benefits of iCloud:

It is a server infrastructure that is free both for the app and the user.
It offers the maximum level of integration with the Apple platform: automatic backups, an API built directly into the SDK, and association with an Apple ID.
Apple has taken care to ensure secure transmission and storage of user data.

Here are the reasons why syncing via iCloud may not be a choice for your application:

The data is associated with the user’s Apple ID rather than with your service.
The server part of your application cannot access the iCloud data, so you can neither synchronize data with applications running on platforms other than iOS and Mac OS, nor provide the user with access to the data via a Web interface.
In iOS 6, iCloud Core Data had a lot of problems and, from the viewpoint of both developers and Apple, was almost unusable. In iOS 7, Apple tried to fix these problems, but if you have to support earlier iOS versions, using iCloud Core Data is strongly discouraged.
Syncing may fail because the user has disabled iCloud in settings or has run out of iCloud storage space.

Dropbox Sync API and Other Cloud Storages

Many App Store applications sync via such services as Dropbox or Yandex.Disk, simply saving their data as files. Obviously, this can be used for document syncing only.

Cloud based storage services often offer an easy-to-use SDK to third-party applications, providing methods for working with files and user authorization.
To use this type of syncing, the user has to sign in to one of these services.
You can synchronize data between applications running on different platforms or even upload it to your own server.
However, there is no conflict resolution at the level of individual database records, so the file can only be fully overwritten.

Dropbox Datastore API

Having identified the developers’ need to synchronize their existing structured data, Dropbox offered its service and API for this. Its features include: support of a variety of platforms, offline browsing, automatic conflict resolution. In the Dropbox Developers Blog you can find lots of interesting information about the Datastore API, with a detailed description of algorithms used in it.

Datastore API is not just a synchronization framework; it implements all aspects of data storage and retrieval, essentially replacing Core Data. This means that extra effort is needed to make the Datastore API interoperate with any other storage system the application already uses. If you choose to use the Datastore API only, you are locked in to a single syncing system and give your users no choice. Fortunately, there is the open-source ParcelKit library. It mirrors the Core Data database to the Datastore API and vice versa, so the application works with Core Data while all syncing is handled by the Datastore API.

A similar functionality to sync the Core Data DB is provided by other cloud-based commercial services. Among them are:

Such services offer similar functionality: cloud data storage, administrative Web interface, native SDK for iOS (and other platforms sometimes), REST API for data access using a native server-side code.

TICoreDataSync

Automatic syncing for Core Data based applications.

It is compatible with iOS 5.1 or higher.
It can run in the offline mode.
To enable syncing, all model objects must inherit from the TICDSSynchronizedManagedObject class.
The TICDSDocumentSyncManager class object is in charge of syncing management.
The database is initialized by full import of the database from the server.
As soon as the library is alerted to save the context, it automatically serializes all the changes and saves them for later syncing with the server.
The syncing process involves the following steps:
1. First, all the changes are downloaded from the server.
2. Then the downloaded changes are committed to the local database. At this stage, conflicts between the changes made locally and those made on the server are resolved.
3. After the changes are downloaded from the server and saved, the local changes are sent to the server, and the app saves the information on the last syncing date.
The conflicts are resolved at the client application level. A decision on which change is relevant is made by the application developer using a delegate method.
All data between the client and the server are sent as a set of files containing information about the changes.
Dropbox or a WebDAV storage can be used as the synchronization backend. You can add support for any other cloud storage to the library by overriding a few classes.
Also, there is an experimental branch to support synchronization via iCloud Document Storage.

You can look through the library’s unresolved issues in its issue list on GitHub. Most of them have remained open for several months to a year. Among them are: how to handle Core Data model migration, how to support ordered relationships, and how to merge two existing databases. However, these are genuinely hard problems that have no obvious solution.

Ensembles

This is a synchronization framework for Core Data.

It is a relatively new library, conceived as a further development of the TICoreDataSync concepts, but designed and written from scratch. It is developed by Drew McCormack, who has been working on this problem for a long time. Before that, he used TICoreDataSync in his own projects and even contributed patches to it. Ensembles looks like a serious attempt to implement syncing properly and to avoid the shortcomings of other libraries.

It builds on top of the Core Data framework, adding syncing between devices via cloud storage services, such as iCloud or Dropbox. Support for any other file syncing service can be added.
The library intercepts the data that the application saves into its database, and appends to the local database the data arriving from all other devices.
This requires minimal changes to the existing code. There is no need to inherit model classes from the library class.
Not all of the planned functionality has been implemented yet, and the library has not been tested as thoroughly as other libraries. To date, syncing of fields, relationships and ordered relationships is considered implemented. The developers still regard migration support, BLOB syncing and high database update rates as problematic use cases.
The framework implements the eventual consistency paradigm. It also uses the vector clock algorithm to apply changes on all devices in the proper order, unlike other libraries, where the databases on different devices may sometimes get out of sync permanently.
Conflicts are resolved automatically by Core Data whenever possible. Otherwise, they are resolved by the app logic. In that case, the conflict resolution itself is saved as a separate change and is also synced. Distributed version control systems are based on the same principle.

Couchbase Lite

Couchbase Lite (formerly known as TouchDB) is an engine to support a syncing-enabled document-oriented NoSQL database.

Instead of interoperating with Core Data, it runs its own database engine, but it still deserves a mention. The Couchbase database stores data as JSON documents and lets you run all the queries common to document-oriented databases: filtering, aggregation, ordering, and map/reduce. The query language used is similar to SQL.

iOS 6.0 or higher is supported.
It is designed for "sometimes connected devices", so it supports the offline mode.
It is assumed that the database is not very large, but it is possible to add large multimedia files to it.
The logic of conflict resolution is implemented at the application level.
The interface to the server follows the CouchDB replication protocol, which comprises a REST API and a specific data model. It is assumed that the server also uses Couchbase.
The application stores the database version ID at the time of last syncing, and requests only those documents from the server which have changed since this version.
The engine also supports the mode with a constant connection to the server. In this case, syncing can be initiated by the server.

How We Implemented Syncing in Together

Together Video Camera is an application and service designed to manage the user’s personal collection of videos, organize them by various attributes, make movies of them, store all the data in the cloud and access it from multiple devices and via the Web interface. No network connection is needed to access most of the app’s features. Hence syncing in Together can be classified as asynchronous and client-server.

There are many reasons why it might be preferable to implement syncing independently rather than use an off-the-shelf solution. In our case, the main drivers in favor of this path were as follows:

The data model and the database structure require a special approach from the viewpoint of the app’s logic: relationships between objects should be properly preserved, and collections of video content should be ordered. These rules cannot be captured by a universal framework.
Such Core Data features as ordered relationships or file storage in the database are not very well supported by the existing libraries.
All the off-the-shelf solutions impose certain requirements on the database structure or the format of server interaction, which was not possible or practical to meet in our case.
So we preferred to use our own REST protocol to communicate with the server. The protocol should be identical for all platforms the app runs on, and should not be restricted by an off-the-shelf framework in any way.
We opted to keep all the data on our own servers and access them from different platforms, which excludes third-party cloud services.

Let’s now discuss each of the syncing specifics implemented in Together.

Loading the initial data into the database. We assumed that the amount of data in a user account can be quite large, and that the application should work decently even over a poor connection. This means that when a user logs in to a newly installed application, we do not download all of their data from the server at once, but incrementally: at each stage only the data needed to display the current screen is requested. Moreover, this data is not downloaded all at once either, but page by page, according to the current scroll position.

Retrieval of data updates from the server proceeds in a similar manner: the application requests only what is needed for the current screen. In this case, the server returns all the requested records rather than only the updated ones. This does not put too high a load on the network, since the amount of data transferred in each query is small. It also simplifies the server implementation and lets us use the same API for thin clients that do not save data locally.

Data is imported into the local database in a separate Core Data context running on a background thread. To reduce import time, we update the local database only with the objects that have changed since the latest sync, which are easily identified by the last-change timestamp available for every object. The import has also been optimized by fetching all the affected objects from Core Data at once, in a single fetch request. For more detail on this and other import optimization methods for Core Data, please refer to this post: Importing Large Data Sets, objc.io issue 4.
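
The two optimizations can be illustrated with a language-neutral sketch in Python, where a dictionary stands in for the Core Data store. The import_batch function and the record fields id and updated_at are hypothetical; in the app the single lookup would be one fetch request for all incoming IDs.

```python
from typing import Dict, List


def import_batch(store: Dict[int, dict], incoming: List[dict]) -> int:
    """Upsert server records into the local store and return how many were written."""
    # One lookup for all incoming IDs (stands in for a single fetch request),
    # instead of one query per record.
    ids = [record["id"] for record in incoming]
    existing = {i: store[i] for i in ids if i in store}

    written = 0
    for record in incoming:
        current = existing.get(record["id"])
        # Skip objects whose timestamp shows they have not changed since the last sync.
        if current is not None and current["updated_at"] >= record["updated_at"]:
            continue
        store[record["id"]] = record
        written += 1
    return written
```

Skipping unchanged records matters because the server, as described above, returns all the requested records and not only the updated ones.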

To convert JSON structures to data model objects, we decided not to use off-the-shelf libraries like RESTKit or Mantle. This lets us parse highly structured and not always homogeneous JSON responses, and validate responses against arbitrary rules directly in the code. The data received in each JSON response is imported as if within a single transaction, which either saves all the changes completely or rolls them back in case of an error.

[gist]https://gist.github.com/kolyuchiy/9827218[/gist]

There is also an asynchronous variant of this method. All transactions run sequentially, since Core Data serializes all performBlock calls.

With this approach, however, you cannot learn which objects have been deleted on the server in order to remove them from the local DB as well. For this, there is a separate API method that returns the list of objects deleted after a certain date. This request always uses server time, so there is no need to synchronize clocks across clients.
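
The point about server time can be shown with a small sketch. FakeServer is a hypothetical stand-in for the backend, not the actual Together API: the client never reads its own clock, it only stores the time returned by the server and passes it back in the next request.

```python
from typing import Dict, List, Tuple


class FakeServer:
    """Hypothetical stand-in for the backend; not the real API."""

    def __init__(self) -> None:
        self.clock = 0                            # server time, the only clock involved
        self.deleted: List[Tuple[int, str]] = []  # (deletion time, object ID)

    def delete(self, object_id: str) -> None:
        self.clock += 1
        self.deleted.append((self.clock, object_id))

    def deleted_since(self, since: int) -> Tuple[List[str], int]:
        """Return the IDs deleted after `since` together with the current server time."""
        ids = [oid for ts, oid in self.deleted if ts > since]
        return ids, self.clock


def pull_deletions(store: Dict[str, dict], server: FakeServer, since: int) -> int:
    """Remove locally every object deleted on the server after `since`.

    Returns the server time to pass as `since` in the next call.
    """
    deleted_ids, server_time = server.deleted_since(since)
    for object_id in deleted_ids:
        store.pop(object_id, None)  # the object may never have been downloaded
    return server_time
```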

The above principles can also be used in simpler applications that move data in one direction only, from the server to the client. Often it is convenient to introduce an intermediate step, saving the data to a local database, between receiving it from the server and displaying it to the user. This way you can keep the data for offline access, run complex queries against the database, easily create various views of the same data for different application screens, work with model classes and objects rather than with dynamically typed arrays and dictionaries, and make the server data structure more application-specific.

Now let’s look at syncing in the opposite direction: saving the changes made by the user to the server, in both online and offline mode.

To send changes to the server, the API offers a common CRUD REST interface with such methods as POST, PUT, or DELETE. Again, as with data import, the API may just as well be used by a "thin client." This approach has an obvious downside: when a long queue of changes is processed, the server receives a large number of requests, either in sequence or in parallel, which increases the syncing time and puts a heavy load on the backend. This can be partially mitigated by using HTTP pipelining and by "collapsing" the queue, i.e. combining overlapping changes.

All changes to the local database that are subject to syncing are made inside a special wrapper around the standard performBlockAndWait method. In addition to updating the data, it saves the current context and serializes all the changes in a format that survives application restarts, so that a request to the backend can later be built from this data. The changes in a context can easily be retrieved through the standard NSManagedObjectContext properties insertedObjects, deletedObjects and updatedObjects, and the NSManagedObject methods changedValues and committedValuesForKeys. The objects that store a serialized change are not standardized; they depend strongly on the data model of our application and the specifics of the server API. For instance, in addition to the standard change types, such as insert, delete or field update, there are separate change types for album contents updates and video screenshot updates.

Then each change is added to a queue that generates requests to the backend in FIFO order. The queue guarantees a valid sequence of requests to the server when changes are interdependent. For example, editing a video title depends on the video having been created, so these operations cannot be sent in reverse order. Groups of independent changes, on the other hand, can be sent to the backend in parallel.
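
One simple way to find groups of independent changes is to split the queue into per-object "lanes". The sketch below assumes, for illustration only, that two changes depend on each other exactly when they touch the same object; Change and split_into_lanes are hypothetical names, and a real data model would also have to account for dependencies between different objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    kind: str        # e.g. "create", "update", "delete"
    object_key: str  # local key of the object the change touches


def split_into_lanes(queue: List[Change]) -> List[List[Change]]:
    """Group a FIFO queue into lanes; order is kept inside a lane, lanes may run in parallel."""
    lanes: Dict[str, List[Change]] = {}
    for change in queue:  # the queue is already in FIFO order
        lanes.setdefault(change.object_key, []).append(change)
    return list(lanes.values())
```

Requests inside one lane are sent strictly one after another, so a title edit can never overtake the creation of its video.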

We have also partially implemented queue reduction by combining overlapping changes. For example, when an object that has not yet been created on the backend is removed, the removal is not added to the queue, and all the pending changes associated with that object are dropped from the queue as well.
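
A possible shape of such queue reduction is sketched below. The two rules it applies (folding an update into an earlier pending change of the same object, and dropping everything for an object that is deleted before it was ever created on the backend) are one reading of the idea, not a description of the actual implementation; Change and collapse are hypothetical names, and the sketch assumes that field values do not refer to other objects still waiting in the queue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    kind: str                      # "create", "update" or "delete"
    object_key: str                # local key of the object
    fields: Dict[str, object] = field(default_factory=dict)


def collapse(queue: List[Change]) -> List[Change]:
    """Return a shorter queue that has the same effect on the backend."""
    result: List[Change] = []
    for change in queue:
        pending = [c for c in result if c.object_key == change.object_key]
        if change.kind == "delete" and any(c.kind == "create" for c in pending):
            # The object never reached the backend: forget it completely.
            result = [c for c in result if c.object_key != change.object_key]
            continue
        if change.kind == "update" and pending and pending[-1].kind in ("create", "update"):
            # Fold the update into the previous pending change; later values win.
            pending[-1].fields.update(change.fields)
            continue
        # Store a copy so that the caller's queue is never modified.
        result.append(Change(change.kind, change.object_key, dict(change.fields)))
    return result
```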

Unlike many other syncing methods, we do not assign a globally unique identifier to an object when it is created. The object receives its identifier only after it has been successfully created on the backend. The only exception is videos imported from the phone library, which are assigned links to videos in the assets library (these include a UUID).

Now a few words on conflict resolution. First, we minimize the risk of conflicts by sending only the changed fields rather than entire objects. If a user changes the order of videos in an album, only the information about the moved videos is sent, and the move is expressed not in terms of numerical indexes in the list but relative to other elements, for example, "video X moved to the position following video Y". This makes conflicts much less frequent. In the worst case, if video Y has been deleted in the meantime, the video is moved to the end or the beginning of the list. When conflicts do arise, they are resolved according to the following principles:

On the server, the change that was sent to the server later prevails.
On the client side, changes received from the server for an object are not imported as long as that object remains in the change queue.
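
The relative form of a move, "video X moved to the position following video Y", can be applied to a list as in the sketch below. The apply_move function is a hypothetical illustration; where the text allows a fallback to either the end or the beginning of the list, the sketch arbitrarily picks the end.

```python
from typing import List, Optional


def apply_move(order: List[str], moved: str, after: Optional[str]) -> List[str]:
    """Move `moved` directly after `after`; `after=None` means 'to the beginning'."""
    if moved not in order:
        return list(order)  # the moved video no longer exists, nothing to do
    result = [video for video in order if video != moved]
    if after is None:
        return [moved] + result
    if after not in result:
        # The anchor video was deleted in the meantime: fall back to the end of the list.
        return result + [moved]
    index = result.index(after) + 1
    return result[:index] + [moved] + result[index:]
```

Because the change names its neighbour instead of an index, it still makes sense after other devices have inserted or removed unrelated videos, which is why such moves conflict much less often.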

The application lets the user switch accounts. At the data model level, multiuser support is implemented by assigning an owner property to each object. On every screen, all database queries are filtered by the current user. Even if the user has not signed in to the app under any account, all the objects they create are assigned to a special anonymous default user. When a user logs in for the first time, all the objects created anonymously before that are reassigned to this user. When the user logs out, the application does not remove their data from the local database, so the data does not need to be downloaded again on the next login. The application will not log the user out or switch to another account until all the changes made by the current user have been sent to the server. The reason is that on logout the server invalidates the user session ID, and the application can no longer send data on behalf of that user. Similarly, some actions, such as sharing a video on social networks, have to wait until the relevant object is synced.

How Other Applications Implement Syncing

Vesper

Brent Simmons, the developer of the Vesper notes app, has published a series of posts on developing the syncing engine for his application. He shares his thinking during development, without knowing what the engine will look like in the end. This lets the reader follow the entire development process together with the author and gain better insight into the reasons behind his decisions. You can even influence the final result by commenting on his articles via Twitter.

Things

In 2012, Things 2, a new version of the popular to-do list app, was released with support for data syncing. Syncing was the main reason for a significant delay in the release, yet the developers still decided to build a syncing engine from scratch. They have published a series of posts giving a high-level overview of the issues you may face when adding syncing to your app and explaining how they resolved them.

Evernote

On the Evernote Web site you can find a very detailed syncing protocol specification that describes every aspect of the app’s interaction with the server. The specification is independent of the data storage mechanisms used within the app, on the assumption that the internal implementation may differ between clients. The main features of the protocol are: the server database is considered the master copy, and all clients sync with it; conflicts are resolved on the app side, with the client saving the server state after each sync; both complete and incremental data transfer is supported; to enable incremental updates, each synchronized object maintains a counter that is incremented after each change to the object. The Synchronization Speedupification post gives some idea of how this is implemented on the server side.
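
Counter-based incremental updates can be sketched as follows. This is a simplified illustration and not the Evernote API: the Server and Client classes are hypothetical, and the sketch assumes an account-wide counter whose current value is stamped on every object when it changes.

```python
from typing import Dict, Tuple


class Server:
    """Hypothetical master copy: every change stamps the object with the next counter value."""

    def __init__(self) -> None:
        self.update_count = 0
        self.objects: Dict[str, Tuple[int, str]] = {}  # guid -> (counter value, payload)

    def put(self, guid: str, payload: str) -> None:
        self.update_count += 1
        self.objects[guid] = (self.update_count, payload)

    def changes_after(self, counter: int) -> Tuple[Dict[str, str], int]:
        changed = {g: p for g, (c, p) in self.objects.items() if c > counter}
        return changed, self.update_count


class Client:
    def __init__(self) -> None:
        self.last_counter = 0  # 0 means "never synced", which yields a complete sync
        self.objects: Dict[str, str] = {}

    def sync(self, server: Server) -> int:
        """Fetch only the objects changed since the previous sync; return their number."""
        changed, self.last_counter = server.changes_after(self.last_counter)
        self.objects.update(changed)
        return len(changed)
```

A counter has the same advantage as server time in the deletion request described earlier: the client never has to trust its own clock.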

Clear

Clear also has its own sync engine, based on iCloud File Storage. The synchronized changes are expressed at the highest possible level of abstraction, i.e., closer to the user’s perspective than to the internal implementation. This allows potential conflicts to be resolved more efficiently. All the changes are ordered by vector clock and timestamp, stored in iCloud and played back on all devices. For more detail on this implementation, please visit the Clear blog.

***

Despite the large number of tools, public libraries and commercial services, there is no direct, generic answer to the question of which kind of syncing is the best fit for your app. In each case you have to weigh the advantages and disadvantages of a particular approach against all the relevant factors. In some cases, the best choice is to write your own syncing code. We hope that our review will help you navigate this sophisticated yet fascinating topic and make the best choice.

DENIVIP Media

Video apps and services, technologies and business