The Mental Blog

Software with Intellect


Under the Sheets with iCloud and Core Data: Seeding iCloud

Before you think about the fine details of syncing your Core Data store via iCloud, you need to answer some big questions: How are you going to enable and disable iCloud in your app, and how are you going to seed iCloud with data when syncing is enabled?

In an ideal world, it would work like this:

  • You bring a new app to market
  • An end-user installs the app on one device with iCloud activated
  • The app creates a new store with ubiquity options enabled
  • Core Data writes all changes to the iCloud container
  • The end-user installs the app on a second device with iCloud active
  • The app imports transaction logs from the first device
  • The user never turns off iCloud on any device
  • The user never removes the iCloud Data for your app
  • The user never switches iCloud accounts

Unfortunately, this utopia is far from the reality that most developers will have to face. There are lots of corner cases, and a reasonable chance that you could end up being the dunce. For example:

  • What if the user disables iCloud for a while?
  • What if the user wipes iCloud data for the app?
  • What if they switch iCloud accounts for a while, perhaps to let someone else borrow their device?
  • What if your app is not new, but has existed for some time, and the user could have existing data, perhaps across multiple devices?
  • What if your app is not new, and you have a pre-existing method of sync (e.g., Wi-Fi, Dropbox, MobileMe), such that, unbeknownst to iCloud, there is logically equivalent data on multiple devices?

When dealing with user data, you can’t be blasé. As an argument, “it seems unlikely, so let’s just ignore these issues” is not going to cut it.

iCloud Does Not Repeat Itself

When deciding how to handle your iCloud setup, it is important to grasp a fundamental implementation detail of iCloud: it moves a given piece of data into or out of a device at most once. It never repeats.

For example, if someone enters data or makes changes on a device where iCloud is turned off, those changes will not generate transaction logs, and they will be lost to iCloud forever. They will not automatically appear when iCloud is switched back on.

Also, once Core Data has imported some iCloud data on a device, that data will not be imported again, even if a new local store is created. You might think that if you delete the local store, then bring up the Core Data stack, it would see that the store is empty and reimport all changes from iCloud. That does not happen. As far as Core Data is concerned, the store already has the changes, and your store will remain empty.
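The rule is easier to see in a toy model. The Swift sketch below is not how Core Data is implemented: `TransactionLog` and `ToyDevice` are hypothetical types, and keeping the record of imported logs outside the store file is an assumption made only to mirror the behaviour described above. Deleting the store empties it, but a second import pass brings nothing back.

```swift
import Foundation

/// Hypothetical stand-in for an iCloud transaction log: an ID plus the objects it inserts.
struct TransactionLog {
    let id: String
    let insertedObjects: [String]
}

/// Toy device that imports each log at most once. ASSUMPTION: the set of imported
/// log IDs lives outside the store, so deleting the store does not reset it.
final class ToyDevice {
    private(set) var localStore: Set<String> = []   // object IDs in the local store
    private var importedLogIDs: Set<String> = []    // survives deleteLocalStore()

    func importLogs(_ logs: [TransactionLog]) {
        for log in logs where !importedLogIDs.contains(log.id) {
            localStore.formUnion(log.insertedObjects)
            importedLogIDs.insert(log.id)
        }
    }

    func deleteLocalStore() {
        localStore.removeAll()   // the import record is deliberately left untouched
    }
}

let cloudLogs = [
    TransactionLog(id: "log-1", insertedObjects: ["note-A", "note-B"]),
    TransactionLog(id: "log-2", insertedObjects: ["note-C"]),
]

let device = ToyDevice()
device.importLogs(cloudLogs)
print(device.localStore.count)   // 3

device.deleteLocalStore()
device.importLogs(cloudLogs)     // every log is already marked as imported
print(device.localStore.count)   // 0
```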

Stoppages

I’ve already hinted that changes made while iCloud is disabled, either via a switch in your app or via System Preferences, are permanently lost as far as syncing is concerned. This has some important consequences for your app.

For a start, any objects inserted while iCloud is disabled, and other changes made to the data, will not appear on other devices, even after iCloud is re-engaged. That means the user will see two distinct sets of data. They will wonder why the objects they see on one device are not appearing on the other. But that will likely be the least of the problems…

With inconsistent data sets on different devices, transaction logs may no longer import properly. Imagine that an object was created and saved on Device A while iCloud was disabled. The user then enables iCloud again and makes a change to the object. Device B now receives a transaction log for an update to an object which, as far as it is concerned, does not exist. The import of that log fails, taking with it any other changes that happened to be committed in the same save.

The more entangled the ‘missing’ objects are with the rest of your object graph, the more likely they will be involved in an update, and cause transaction log imports to fail. To the user, it just looks like iCloud is not syncing properly.
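The failure mode can be sketched as another toy model. It is an illustration, not Core Data's real importer: `Change`, `ImportError` and `apply(_:to:)` are hypothetical, and the all-or-nothing application of a log is the assumption that reproduces the behaviour described above, where one bad update sinks every change in the same save.

```swift
/// Toy model: a transaction log is applied all-or-nothing, so one update to an
/// object this device has never seen causes every change in that save to be dropped.
enum Change {
    case insert(id: String, title: String)
    case update(id: String, title: String)
}

enum ImportError: Error, Equatable {
    case unknownObject(String)
}

func apply(_ log: [Change], to store: [String: String]) throws -> [String: String] {
    var staged = store   // work on a copy so a failure leaves the store untouched
    for change in log {
        switch change {
        case let .insert(id: id, title: title):
            staged[id] = title
        case let .update(id: id, title: title):
            guard staged[id] != nil else { throw ImportError.unknownObject(id) }
            staged[id] = title
        }
    }
    return staged
}

// Device B never received "note-X": Device A created it while iCloud was disabled.
var deviceB: [String: String] = ["note-1": "Groceries"]
let logFromA: [Change] = [
    .insert(id: "note-2", title: "Holiday plans"),
    .update(id: "note-X", title: "Edited after iCloud was re-enabled"),
]

do {
    deviceB = try apply(logFromA, to: deviceB)
} catch {
    print("Import failed: \(error)")   // the valid insert of note-2 is lost too
}
print(deviceB.count)   // 1
```

The more objects created during a stoppage, the more often a log like `logFromA` turns up, which is why syncing appears to degrade gradually rather than fail outright.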

Stoppages are bad news, even if you happen to have an app that otherwise fits perfectly with the utopian workflow described earlier. The only way to get the user’s data consistent again is to destroy iCloud’s data container, allow one device to refill it with data, remove local stores on other devices, and allow them to re-import the new data from the iCloud container.

iCloud Seeding

If the picture painted above doesn’t seem very rosy, what can you do about it? The first decision you will need to make is how you will handle data inconsistencies, and what your policy will be for seeding iCloud with data when they arise. You need a way to coordinate devices such that eventually they are all working on the same logical data set.

To make the decision easier, I’m going to lay out the options as explicitly as possible in the coming sections.

Vanilla Seeding

The first option is to follow Apple’s basic prescription entirely: you simply set up all your stores with the ubiquitous options enabled. If your app is new, users don’t have pre-existing data, and they never log out of iCloud, this should work quite well.

But if a stoppage occurs and syncing becomes unreliable, the fix requires considerable manual intervention from the user, and is not very palatable. To end up with consistent data across all devices, the user will have to:

  • Quit/Kill the app on all devices
  • Delete the app’s iCloud container via System Preferences on all devices
  • Remove the app’s local data store on all but one device
  • Launch the app on the device with the intact local store, allowing it to seed the iCloud container.

When the app is next launched on the other devices, it should import the newly seeded data, and all devices should have a consistent set. But this is far from user friendly.
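Removing the local store is one step the app itself could help with, for instance behind a reset option. A minimal sketch follows, assuming a SQLite store file. `removeLocalStore(at:)` is a hypothetical helper that deletes the store file together with the `-wal` and `-shm` files SQLite can create beside it; it should only run after the store has been removed from its persistent store coordinator.

```swift
import Foundation

/// Hypothetical helper: deletes a SQLite store file and its "-wal"/"-shm" sidecars.
/// ASSUMPTION: the store has already been removed from its coordinator.
func removeLocalStore(at storeURL: URL, fileManager: FileManager = .default) throws {
    let directory = storeURL.deletingLastPathComponent()
    let baseName = storeURL.lastPathComponent
    for suffix in ["", "-wal", "-shm"] {
        let fileURL = directory.appendingPathComponent(baseName + suffix)
        if fileManager.fileExists(atPath: fileURL.path) {
            try fileManager.removeItem(at: fileURL)
        }
    }
}

// Usage: build a throwaway "store" in a temporary directory, then remove it.
let tempDir = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString, isDirectory: true)
try FileManager.default.createDirectory(at: tempDir, withIntermediateDirectories: true)
let storeURL = tempDir.appendingPathComponent("Notes.sqlite")
for path in [storeURL.path, storeURL.path + "-wal"] {
    _ = FileManager.default.createFile(atPath: path, contents: Data())
}
try removeLocalStore(at: storeURL)
```

On iOS 9 and OS X 10.11 and later, `NSPersistentStoreCoordinator` also offers `destroyPersistentStore(at:ofType:options:)`, which lets Core Data do this itself. Either way, deleting the local store is only part of the recovery: because Core Data will not re-import logs it has already imported, the iCloud container has to be deleted as well.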

Sophie’s Choice Seeding

If you don’t want your users to have to go through this, or you have a more complex scenario where there may be existing data on multiple devices, you will probably need a more sophisticated approach. In the Sophie’s Choice approach, whenever inconsistencies arise, the user will get to choose whether they want to keep the data from the cloud, or the data from the local store. This should always result in a consistent data set across devices.

It is not quite as straightforward as giving the user a free choice of data set, though. If the device has not previously been syncing, and thus has no transaction logs in the iCloud container, the user can indeed choose to keep either the local or the cloud data. But if the device has been syncing, and has logs in iCloud, the only option is to replace the iCloud data with the local data set. This follows from the constraint discussed earlier: Core Data will not reimport data it has already imported from iCloud, so deleting the local store and re-enabling the ubiquitous options will not bring all of the iCloud data back.

The Sophie’s Choice variant of seeding requires that your app knows whether the device has previously synced, and whether the container has been reset, so that it can offer the user the appropriate options when inconsistencies arise. At the time of writing, Core Data gives you no notification of these state changes. In a future post, I will introduce so-called sentinel files, which reside in the iCloud container and, through continuous monitoring, can notify you of changes to the set of syncing devices.
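Once the app can tell these states apart, the rule reduces to a small decision function. The sketch below is hypothetical: `DeviceSyncState`, `SeedingChoice` and `availableChoices(for:)` are illustrative names, and how the two flags are obtained (for example, through the sentinel files mentioned above) is left out.

```swift
/// The resolutions the app might offer when data sets are inconsistent.
enum SeedingChoice: Equatable {
    case keepCloudData   // discard the local store and import from the iCloud container
    case keepLocalData   // wipe the container and reseed it from this device
}

/// What the app knows about this device (hypothetical flags).
struct DeviceSyncState {
    var hasPreviouslySynced: Bool   // this device has transaction logs in the container
    var containerHasData: Bool      // some device has seeded the container
}

func availableChoices(for state: DeviceSyncState) -> [SeedingChoice] {
    guard state.containerHasData else {
        // Empty container: there is nothing to choose; this device seeds it.
        return [.keepLocalData]
    }
    if state.hasPreviouslySynced {
        // Core Data will not re-import logs it has already imported, so keeping
        // the cloud data cannot rebuild this device's store.
        return [.keepLocalData]
    }
    return [.keepCloudData, .keepLocalData]
}

// A device joining a container that other devices have already seeded:
let newDevice = DeviceSyncState(hasPreviouslySynced: false, containerHasData: true)
let choices = availableChoices(for: newDevice)   // both options are offered
```

Keeping the cloud data is deliberately never offered on a device that has already synced, which is exactly the constraint described above.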

Every-Drop-is-Sacred Seeding

If the idea of requiring the user to choose between data sets is not attractive, the last option you have is to attempt to merge data sets whenever a new device is added to iCloud. Unfortunately, if a stoppage occurs, the user will still be forced to fully replace their cloud data in order to guarantee a consistent set of data, but at least it will be possible to merge existing data when devices first start to sync.

One problem with this approach is that you may end up with duplicate data. To fix this, you will want to have a globally unique identifier for your main entities. You can then weed out and delete the duplicates. You will need to do a sweep for duplicates whenever iCloud data is merged.
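A sweep for duplicates might look like the following sketch. It rests on assumptions: `NoteRecord` is a hypothetical stand-in for a fetched managed object with a `uuid` attribute, and keeping the most recently modified copy is one reasonable policy, not the only one. In a real app, the returned records would be deleted from the managed object context and the context saved.

```swift
import Foundation

/// Stand-in for a fetched object whose entity carries a globally unique identifier.
struct NoteRecord {
    let uuid: String
    let title: String
    let modified: Date
}

/// Groups records by UUID and returns every copy except the most recently
/// modified one in each group; those are the duplicates to delete.
func duplicatesToDelete(in records: [NoteRecord]) -> [NoteRecord] {
    let groups = Dictionary(grouping: records, by: { $0.uuid })
    var duplicates: [NoteRecord] = []
    for (_, group) in groups where group.count > 1 {
        let newestFirst = group.sorted { $0.modified > $1.modified }
        duplicates.append(contentsOf: newestFirst.dropFirst())
    }
    return duplicates
}

let t0 = Date(timeIntervalSince1970: 0)
let merged = [
    NoteRecord(uuid: "A", title: "Shopping (device 1)", modified: t0),
    NoteRecord(uuid: "A", title: "Shopping (device 2)", modified: t0.addingTimeInterval(60)),
    NoteRecord(uuid: "B", title: "Ideas", modified: t0),
]
let toDelete = duplicatesToDelete(in: merged)
print(toDelete.map(\.title))   // ["Shopping (device 1)"]
```

The sweep only works if the UUID is assigned once, when an object is first created, and then travels with the object to every device.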

Migrating Data into iCloud

Getting data into iCloud is generally not as straightforward as just enabling the ubiquitous store options. You have to be very aware of what data is already in iCloud, and which devices have contributed to that data. Once you know that, you may have to explicitly migrate data from a non-ubiquitous store to a ubiquitous store, to allow Core Data to generate transaction logs.

If the iCloud container is empty, such that the current device will be the first to sync, you can simply engage the ubiquity options, and the data will be added to iCloud, no migration needed. Core Data generates a baseline in iCloud, which is effectively a copy of your local store. There has been some discussion as to whether the baseline is actually used, or is just in place for future functionality, but my tests seem to show the baseline is in use. (Update: Although the baseline is used — the initial data is transferred to other devices — future changes to that data do not seem to propagate properly. At this point, I advise against relying on the baseline for seeding initial data.)

The baseline is only generated once, for the first iCloud-enabled device. If you have existing data on other devices, it will have to be migrated in, or it will not be synced between devices. Because of this limitation, you may be better off migrating data in on all devices, even the first to sync.

So how do you actually perform the migration? We will consider this in more detail in the next post, but there are basically two approaches:

  1. Use the persistent store coordinator’s migratePersistentStore:toURL:options:withType:error: method, together with the appropriate ubiquity options, to perform a wholesale migration of the entire non-ubiquitous local store to a ubiquitous one.
  2. Add both the ubiquitous and non-ubiquitous stores to the persistent store coordinator, and copy objects from one to the other in code.

The latter gives you more control over what is imported, but obviously requires more effort.
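To give a flavour of the second approach without a full Core Data stack, the sketch below models both stores as plain arrays. It is an illustration under stated assumptions: `StoredItem` and `copyItems(from:into:where:)` are hypothetical, and in a real app the copying would go through managed objects, with `NSManagedObjectContext`'s `assign(_:to:)` directing new objects into the ubiquitous store. The point is the extra control: you choose which entities are copied, and you can skip anything the ubiquitous store already holds.

```swift
/// Stand-in for an object in either store; a real app would use managed objects.
struct StoredItem: Equatable {
    let uuid: String
    let entity: String
    let payload: String
}

/// Copies items from the non-ubiquitous store into the ubiquitous one, keeping only
/// items accepted by `shouldSync` and skipping UUIDs the target already contains.
/// Returns the number of items copied.
@discardableResult
func copyItems(from local: [StoredItem],
               into ubiquitous: inout [StoredItem],
               where shouldSync: (StoredItem) -> Bool) -> Int {
    var knownUUIDs = Set(ubiquitous.map(\.uuid))
    var copied = 0
    for item in local where shouldSync(item) && !knownUUIDs.contains(item.uuid) {
        ubiquitous.append(item)
        knownUUIDs.insert(item.uuid)
        copied += 1
    }
    return copied
}

let localItems = [
    StoredItem(uuid: "1", entity: "Note", payload: "Groceries"),
    StoredItem(uuid: "2", entity: "Thumbnail", payload: "cached image data"),
    StoredItem(uuid: "3", entity: "Note", payload: "Ideas"),
]
var cloudItems = [StoredItem(uuid: "3", entity: "Note", payload: "Ideas")]

// Copy only notes; thumbnails stay local, and note 3 is already in the cloud store.
let copiedCount = copyItems(from: localItems, into: &cloudItems) { $0.entity == "Note" }
print(copiedCount)   // 1
```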

To Switch or Not to Switch

One question that I have not addressed in this discussion is whether your app should have a switch that allows iCloud syncing to be disabled independent of the global iCloud settings.

If you have very little data to sync, I would suggest not bothering the user with a switch: just assume that if the user is logged into iCloud, they want the app to sync. On the other hand, if your app needs to sync a lot of data, e.g., media files, it is probably best to give the user the option to disable syncing.

There is a third way: if your app generates a lot of data, but most of it does not need to be synced, you could put the entities that do need syncing in one ubiquitous store, and the rest in a non-ubiquitous store. Cross-store relationships have to be weak (you fetch the related objects using some identifying property, such as a URL or UUID), but this is a relatively small inconvenience.
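Here is roughly what such a weak relationship amounts to, sketched with plain Swift types rather than Core Data. `Album`, `Photo` and `LocalPhotoStore` are hypothetical: the synced album keeps only photo UUIDs, and resolving them means looking objects up in the other store. With Core Data, that lookup would typically be a fetch request with a predicate such as `uuid IN %@`.

```swift
import Foundation

/// Lives in the non-ubiquitous store (e.g. a large media record that is not synced).
struct Photo {
    let uuid: UUID
    let fileName: String
}

/// Lives in the ubiquitous store; refers to photos by UUID instead of a relationship.
struct Album {
    let name: String
    var photoUUIDs: [UUID]
}

/// Stand-in for the non-ubiquitous store, indexed by UUID.
struct LocalPhotoStore {
    var photosByUUID: [UUID: Photo] = [:]

    mutating func insert(_ photo: Photo) {
        photosByUUID[photo.uuid] = photo
    }

    /// Resolves an album's weak references. UUIDs with no local object (for
    /// example, photos that exist only on another device) are skipped.
    func photos(for album: Album) -> [Photo] {
        album.photoUUIDs.compactMap { photosByUUID[$0] }
    }
}

var localStore = LocalPhotoStore()
let beach = Photo(uuid: UUID(), fileName: "beach.jpg")
localStore.insert(beach)

let summer = Album(name: "Summer", photoUUIDs: [beach.uuid, UUID()])
print(localStore.photos(for: summer).map(\.fileName))   // ["beach.jpg"]
```

Skipping unresolved UUIDs, rather than treating them as errors, suits this split: the album syncs everywhere, but a given device only has the photos that were created on it.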

Next Time…

This has been a very high level discussion of some of the issues you encounter when seeding iCloud with data. Next time, I will introduce a test app, and dig down into the code that you will need. If you want to get an early look, the test app is already on GitHub.

Filed under iCloud Coredata software Mac

  1. mentalfaculty posted this