Does Core Data Sync Quack?
After a spate of articles on the topic, it’s been quite a while since I posted anything about iCloud/Core Data sync. To be honest, I hit some serious stumbling blocks, and ended up giving up on it entirely.
I did end up shipping Mental Case for Mac with iCloud sync, but to do so I adopted a customized fork of the TICDS framework. Mental Case 2 for iOS is just around the corner, and I am using the same sync solution there. (If you are interested in going that route, I recommend first giving the main repository a try. My iCloud changes were recently pulled back into it.)
Since my original series of posts, many well-known developers have spoken out on the issues that still plague iCloud when used in combination with Core Data. Some have argued against the principle of building sync into Core Data directly, and others (e.g. Black Pixel, Bare Bones, Jumsoft) have described their first-hand experiences.
Two years after the technology was first introduced, very few apps have shipped with iCloud/Core Data support. I know of only two — Time Butler and ToDoMovies — each of which has a relatively simple entity model, and very little data to transfer. Perhaps most telling is that the only Apple app that seems to use the technology is iTunes Movie Trailers, which presumably also has a very simple model and little data.
With WWDC just around the corner, we again start to ponder if this will be the year Apple gets iCloud right. But the more I dig into iCloud/Core Data sync, the more I have come to realize that even if it worked as designed, it still may be quite flawed as a solution. They may have gotten it wrong from the outset, and some design failures are probably not easily addressed. What follows is a list of what I think is fundamentally wrong with iCloud/Core Data’s design, leaving aside any of the practical failures that we have witnessed in the past.
iCloud/Core Data issues that may never be resolved:
1) Brent Simmons has already criticized the approach of coupling sync so tightly with Core Data. I don’t think the problem is that Core Data is involved. After all, what use would an object graph management framework be if it didn’t help you manage your objects, including synchronization? But I do agree that Core Data sync should not be so tightly coupled to a single storage mechanism, namely iCloud. By not allowing for extension to other storage facilities, Core Data sync is not an option for many categories of app, including those not in an App Store, cross-platform apps, apps that require secure data storage, and apps with developers who simply don’t want to be locked into a single service. In short, sync in Core Data should be an interface that can work with many backends, not just iCloud.
2) Core Data sync currently offers no developer access to the process. It seems Apple expected to be able to make a completely generic sync algorithm that would work in every case, but that was not realistic. Five minutes spent thinking through the problems that can arise quickly tells you that it is not possible to come up with generic solutions for many cases. Developer intervention is needed in all but the most trivial examples. For example, there is no way for the developer to influence conflict resolution. If conflicts arise, Core Data either decides how to proceed itself, or halts syncing altogether. Some have endeavored to work within these constraints, using hacks to detect conflicts and recover (e.g. the UbiquityStoreManager project, Tom Harrington), but these attempts just make it even more evident that the API of Core Data sync is fundamentally flawed.
3) Because the developer is not invited to the process, the first notification you get of sync changes arrives after they are already resident in the persistent store. In other words, in many situations, your store will actually contain data in an invalid state. Apple recommends ‘cleaning up’ (e.g. deduping, applying validation constraints) when the merge notification is fired, but I would argue that this is too late. It is OK for a managed object context to be in an invalid state, but your on-disk store should never be invalid. The fact that Core Data sync requires that your store become invalid is a fundamental design flaw of the API.
4) Sync has been built into Core Data with minimal changes to the existing API. Basically, there are a few new notifications, and new metadata properties on the NSPersistentStore class. As a result, many aspects of sync are arguably too tightly coupled to existing classes. For example, if the user logs out of their iCloud account while an app is running, the developer must immediately tear down their Core Data stack, or expect a crash. Not only that, they must migrate data out of the persistent store to a new store, or move to a completely different (un-synced) store. It’s a big mess, and you are left wondering why. Why does the persistent store even need to know it is actively being synchronized via iCloud? Presumably, a better solution would be to have another class monitoring saves into the store, and handling that completely independently. The Core Data stack should not be so fragile, and should continue to function, with sane solutions to sync interruptions offered.
5) Related to 4). Changes to iCloud availability make a developer’s life very difficult, requiring reloading and migration of data stores. Apple produced code to demonstrate how to handle this a year after the technology was first introduced, but it is far from a satisfactory solution. It seems like they didn’t give much thought to store management until it was too late, and then had to resort to obscure workarounds to make the system work. Offering a new persistent store manager class that handles the most important cases would be more sensible and make the technology a lot easier for developers to adopt.
6) Take an app like ToDoMovies, turn on iCloud, and mess with it on two devices at the same time. It doesn’t take long before each device shows different data, and remains that way for all eternity, never ending up back in a fully-synced state. This points to pretty serious flaws in the algorithms being used behind the scenes. I would rather Core Data ensure exactly the same operations end up being applied to each data store, and then require developer intervention to manually massage a potentially invalid state before saving merges, than have data end up systematically out-of-sync.
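To make the alternative in points 1, 3 and 6 a little more concrete, here is a minimal sketch of the shape I have in mind. It is written in Python purely to keep it short, and none of it is Core Data or iCloud API: `Change`, `ChangeBackend`, `InMemoryBackend`, `Replica` and the `repair` hook are all hypothetical names, and the last-writer-wins ordering is just one simple assumption about how a deterministic merge could work. Every device replays the same changes in the same order, and the developer gets to repair the merged result before it is committed to the store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Set


@dataclass(frozen=True, order=True)
class Change:
    """One recorded edit. Field order defines the global sort order."""
    timestamp: int  # logical clock value from the originating device
    device: str     # tie-breaker, so every device sorts changes identically
    key: str
    value: str


class ChangeBackend(Protocol):
    """Hypothetical storage interface; iCloud would be just one implementation."""
    def push(self, changes: List[Change]) -> None: ...
    def pull(self) -> List[Change]: ...


class InMemoryBackend:
    """Stand-in backend for the sketch; a real one would talk to a server."""
    def __init__(self) -> None:
        self._changes: Set[Change] = set()

    def push(self, changes: List[Change]) -> None:
        self._changes.update(changes)  # a set makes repeated pushes harmless

    def pull(self) -> List[Change]:
        return list(self._changes)


# Developer-supplied hook: receives the merged state, returns a valid one.
Repair = Callable[[Dict[str, str]], Dict[str, str]]


class Replica:
    """One device. Its committed store only ever holds repaired data."""
    def __init__(self, device: str, backend: ChangeBackend, repair: Repair) -> None:
        self.device = device
        self.backend = backend
        self.repair = repair
        self.local: List[Change] = []
        self.store: Dict[str, str] = {}
        self._clock = 0

    def edit(self, key: str, value: str) -> None:
        self._clock += 1
        self.local.append(Change(self._clock, self.device, key, value))

    def sync(self) -> None:
        self.backend.push(self.local)
        merged: Dict[str, str] = {}
        # Replay every known change in one global order; later changes win.
        for change in sorted(self.backend.pull()):
            merged[change.key] = change.value
            self._clock = max(self._clock, change.timestamp)
        # The developer repairs the merged state *before* it is committed.
        self.store = self.repair(merged)


if __name__ == "__main__":
    def drop_blank(rows: Dict[str, str]) -> Dict[str, str]:
        return {k: v for k, v in rows.items() if v.strip()}

    backend = InMemoryBackend()
    mac, phone = Replica("mac", backend, drop_blank), Replica("phone", backend, drop_blank)
    mac.edit("title", "Alien")
    phone.edit("title", "Aliens")  # conflicting edit made at the same time
    mac.sync(); phone.sync(); mac.sync()
    print(mac.store == phone.store)
```

Because both devices sort the same set of changes the same way, they cannot drift apart once each has seen everything, provided the repair hook is itself deterministic. Swapping `InMemoryBackend` for something else does not touch `Replica` at all, which is the kind of decoupling I am arguing for in point 1.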
Who knows, maybe Apple will get iCloud working well enough without addressing these issues to make it a success. But I doubt it. I fear Core Data sync in its current form is a lame duck.