The Lessons of Functional Programming for Cocoa Developers
I’ve been reading Ash Furrow’s new book Functional Reactive Programming on iOS. I’ve always had a vague idea of what Functional Programming (FP) was about, but this book, having been written for Objective-C programmers, offered a good opportunity to learn more.
And it is a good introduction. It’s not dauntingly long, so if you have a passing interest, you can easily read it in a few hours without having to commit to a whole new career. I recommend you take a look if FP is something that you have wondered about.
Having said that, I’m not about to throw everything out and start again with the ReactiveCocoa framework, which is the focus of the book. There are lessons to be learned from it, but to me it doesn’t feel like a native Cocoa API, and I’m not convinced that code becomes more readable when you adopt it.
So ReactiveCocoa hasn’t won me over, but I do see many advantages to FP in general, and plenty of lessons to take away for improving “imperative” code bases. I had picked up some of these lessons over the years via other channels, but they are central pillars of FP.
For example, mutability of data is considered as close to evil as you can get in FP. I think most experienced Cocoa developers are also aware of the advantages of not allowing data to be changed. It simplifies code when you can assume data is immutable. When you know that a property is readonly, you know it will never change behind your back, and you don’t have to account for all the ways it can change.
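As a minimal sketch of what that looks like in Objective-C (assuming ARC), here is a hypothetical SyncEvent model class whose properties are all readonly and fixed at initialization. "Changing" one produces a new object, so any code holding the original can rely on it staying the same.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class: every property is readonly and set once in the
// initializer, so no other code can change it behind your back.
@interface SyncEvent : NSObject
@property (nonatomic, readonly, copy) NSString *deviceIdentifier;
@property (nonatomic, readonly) NSInteger revision;
- (instancetype)initWithDeviceIdentifier:(NSString *)deviceIdentifier revision:(NSInteger)revision;
- (SyncEvent *)eventWithRevision:(NSInteger)revision;
@end

@implementation SyncEvent

- (instancetype)initWithDeviceIdentifier:(NSString *)deviceIdentifier revision:(NSInteger)revision
{
    if ((self = [super init])) {
        // Copy the string so a caller holding an NSMutableString cannot
        // alter this object's state after the fact.
        _deviceIdentifier = [deviceIdentifier copy];
        _revision = revision;
    }
    return self;
}

// "Changing" an immutable object means making a new one; the receiver is untouched.
- (SyncEvent *)eventWithRevision:(NSInteger)revision
{
    return [[SyncEvent alloc] initWithDeviceIdentifier:self.deviceIdentifier revision:revision];
}

@end
```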
I’ve taken this approach to the data that gets stored in the cloud by my new Core Data Ensembles sync framework. I made a decision early on to keep cloud data storage as simple as possible, and to make all files there immutable. Once a file is written, it never gets overwritten or modified. This makes the stored data easier to reason about, and makes it less likely that conflicts caused by concurrent changes on different devices will confuse the file syncing service (e.g., iCloud). The same approach to data mutability has made functional languages such as Erlang ideal for programming complex concurrent systems like telephone networks.
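On a local file system, one way to get write-once semantics is NSData's NSDataWritingWithoutOverwriting option, which makes a write fail if something already exists at the destination. The sketch below, assuming ARC, pairs that option with a random UUID file name. The helper name and naming scheme are hypothetical and are not how Ensembles lays out its cloud files.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes `data` to a new, uniquely named file in
// `directory` and returns its path, or nil on failure.
// NSDataWritingWithoutOverwriting makes the write fail if a file already
// exists at the path, so existing files are never replaced. (That option
// cannot be combined with NSDataWritingAtomic.)
static NSString *WriteImmutableFile(NSData *data, NSString *directory, NSError **error)
{
    NSString *filename = [[NSUUID UUID] UUIDString];
    NSString *path = [directory stringByAppendingPathComponent:filename];
    BOOL written = [data writeToFile:path options:NSDataWritingWithoutOverwriting error:error];
    return written ? path : nil;
}
```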
FP also frowns on side effects. If you take a look through the Ensembles source code, you will notice I often use method arguments where I could easily pass data via instance variables. I prefer the transparency of a method argument, and the flexibility it provides. FP takes it even further, but even standard Objective-C classes benefit from a concerted effort to reduce side effects.
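A small, hypothetical illustration of the difference: RevisionFilter below receives both the revisions and the last merge point as arguments, instead of reading them from instance variables that some other method must have set first, and it modifies no state. The same inputs always give the same output, which also makes it trivial to test. The class and method names are invented for the example, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class. Everything the method depends on is visible in its
// signature; it reads no instance variables and changes no state.
@interface RevisionFilter : NSObject
- (NSArray *)revisions:(NSArray *)revisions newerThan:(NSInteger)lastMerge;
@end

@implementation RevisionFilter

// Returns the NSNumber revisions strictly greater than lastMerge, in order.
- (NSArray *)revisions:(NSArray *)revisions newerThan:(NSInteger)lastMerge
{
    NSMutableArray *newer = [NSMutableArray array];
    for (NSNumber *revision in revisions) {
        if (revision.integerValue > lastMerge) [newer addObject:revision];
    }
    return [newer copy];
}

@end
```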
Asynchronicity
Closely tied to FP, and something I have struggled with recently, is how best to handle asynchronicity in Cocoa apps. Asynchronicity is rife in a sync framework like Ensembles, which has long running background operations and plenty of network activity.
I have quite a bit of experience developing with a forerunner of Ensembles, TICoreDataSync. This taught me that there are a few things that probably should be avoided. In particular, TICoreDataSync makes extensive use of delegate callbacks to handle asynchronous tasks, and you end up with something of a delegate method forest. Delegates are fine for certain tasks in moderation, and they are used sparingly in Ensembles, but overuse makes program flow very hard to follow. It may be the closest thing to spaghetti code in modern programming.
Completion Blocks
An alternative to delegate methods, and one that was not mature when TICoreDataSync was first being developed, is the block callback. Blocks provide a neat way to track task completion. They are much easier to read, because the completion code follows directly after the initiating code. The completion code also has direct access to the same data scope as the initiation code.
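To illustrate both points, here is a minimal sketch assuming ARC and GCD. UploadFileWithCompletion and UploadAndReport are hypothetical functions, not part of any framework. The completion block is written right below the call that starts the work, and it uses variables from the enclosing scope even though it runs later on a background queue.

```objc
#import <Foundation/Foundation.h>

typedef void (^CompletionBlock)(NSError *error);

// Hypothetical asynchronous API: does its work on a background queue, then
// calls the completion block (with nil for success).
static void UploadFileWithCompletion(NSString *path, CompletionBlock completion)
{
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // ... the actual upload of `path` would happen here ...
        if (completion) completion(nil);
    });
}

// The caller. The completion code sits right below the call that starts the
// task, and it can use `path` and `startDate` from the enclosing scope even
// though it runs later, on another thread.
static void UploadAndReport(NSString *path, void (^report)(NSString *message))
{
    NSDate *startDate = [NSDate date];
    UploadFileWithCompletion(path, ^(NSError *error) {
        NSTimeInterval elapsed = -[startDate timeIntervalSinceNow];
        NSString *outcome = error ? @"failed" : @"finished";
        report([NSString stringWithFormat:@"%@ %@ after %.3fs", path, outcome, elapsed]);
    });
}
```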
Problem solved, right? Well, not exactly. As any experienced developer will tell you, when you start to use completion blocks, you tend to need more completion blocks, and you often end up with deep nesting. Readable synchronous code like this
- (void)merge
{
    [self checkCloudFileSystemIdentity];
    [self processPendingChanges];
    [self.cloudManager importNewRemoteEvents];
    CDERevisionNumber lastMerge = [self.eventStore lastMergeRevision];
    [self.eventIntegrator mergeEventsImportedSinceRevision:lastMerge];
    [self.cloudManager exportNewLocalEvents];
}
turns into this when you add asynchronous callbacks
- (void)mergeWithCompletion:(CDECompletionBlock)completion
{
    [self checkCloudFileSystemIdentityWithCompletion:^(NSError *error) {
        [self processPendingChangesWithCompletion:^(NSError *error) {
            [self.cloudManager importNewRemoteEventsWithCompletion:^(NSError *error) {
                CDERevisionNumber lastMerge = [self.eventStore lastMergeRevision];
                [self.eventIntegrator mergeEventsImportedSinceRevision:lastMerge completion:^(NSError *error) {
                    [self.cloudManager exportNewLocalEventsWithCompletion:^(NSError *error) {
                        // Last step done: report back to the caller.
                        if (completion) completion(error);
                    }];
                }];
            }];
        }];
    }];
}
(I’ve removed error handling to make the examples easier to read.)
This code represents a sequence of asynchronous tasks that need to take place in a sync operation. Each stage of the sync is embedded in the completion block of the previous one. The fact that the code is laid out sequentially makes it easier to read than delegate callbacks, but the nesting is ugly at best.
What about NSOperation?
At this point you may be thinking: “Why don’t you just use an NSOperationQueue?” In all honesty, I thought exactly the same thing. But there is something about operation queues that makes me feel uneasy when applied to tightly controlled sequences of tasks.
To me, operation queues work best when you want a fire hydrant of independent operations. Yes, you can set up dependencies, but NSOperation has no built-in way for an operation to report an error, so if something goes wrong, it all becomes a bit ad hoc. You have to cancel the remaining operations yourself, and figure out how to transfer an NSError back to where it matters.
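To make the error-handling gap concrete, here is a minimal sketch, assuming ARC, of two dependent NSBlockOperations. The download and merge steps and the error domain are hypothetical stand-ins. The dependency only guarantees ordering, so the failure has to be passed along through a shared variable, and the dependent operation has to check it by hand.

```objc
#import <Foundation/Foundation.h>

// Runs a simulated failing "download" followed by a dependent "merge".
// Returns the download error and reports whether the merge did any work.
static NSError *RunDownloadThenMerge(BOOL *mergeDidWork)
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    __block NSError *downloadError = nil;  // ad hoc channel for the failure
    __block BOOL didWork = NO;

    NSBlockOperation *download = [NSBlockOperation blockOperationWithBlock:^{
        // Simulate a failed network call.
        downloadError = [NSError errorWithDomain:@"ExampleErrorDomain" code:1 userInfo:nil];
    }];
    NSBlockOperation *merge = [NSBlockOperation blockOperationWithBlock:^{
        // This block still runs after a failed download; nothing stops it
        // except this manual check of the shared error.
        if (downloadError) return;
        didWork = YES;
    }];
    [merge addDependency:download];
    [queue addOperations:@[download, merge] waitUntilFinished:YES];

    if (mergeDidWork) *mergeDidWork = didWork;
    return downloadError;
}
```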
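On top of that, writing an NSOperation subclass to wrap an asynchronous task is a bit hairy. Because the work finishes after start returns, the subclass has to override start, track its own executing and finished state, and post the key-value observing notifications for isExecuting and isFinished by hand. It is all doable, but not very user friendly.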
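For a sense of what that involves, here is a minimal sketch (assuming ARC) of a hypothetical AsyncBlockOperation that wraps any asynchronous task in an NSOperation. It omits refinements a production version would want, such as responding to cancellation while the task is already running.

```objc
#import <Foundation/Foundation.h>

typedef void (^AsyncTaskDone)(void);
typedef void (^AsyncTask)(AsyncTaskDone done);

// Hypothetical class: the operation is finished only when the task calls
// `done`, so it must report its own state and post KVO notifications.
@interface AsyncBlockOperation : NSOperation
- (instancetype)initWithTask:(AsyncTask)task;
@end

@implementation AsyncBlockOperation {
    AsyncTask _task;
    BOOL _executing;
    BOOL _finished;
}

- (instancetype)initWithTask:(AsyncTask)task
{
    if ((self = [super init])) _task = [task copy];
    return self;
}

- (BOOL)isAsynchronous { return YES; }
- (BOOL)isConcurrent { return YES; } // older name for isAsynchronous

- (BOOL)isExecuting
{
    BOOL executing;
    @synchronized (self) { executing = _executing; }
    return executing;
}

- (BOOL)isFinished
{
    BOOL finished;
    @synchronized (self) { finished = _finished; }
    return finished;
}

- (void)start
{
    if (self.isCancelled) {
        [self completeOperation];
        return;
    }
    [self willChangeValueForKey:@"isExecuting"];
    @synchronized (self) { _executing = YES; }
    [self didChangeValueForKey:@"isExecuting"];
    // Hand the task a callback; the operation finishes when it is invoked.
    _task(^{ [self completeOperation]; });
}

- (void)completeOperation
{
    [self willChangeValueForKey:@"isExecuting"];
    [self willChangeValueForKey:@"isFinished"];
    @synchronized (self) {
        _executing = NO;
        _finished = YES;
    }
    [self didChangeValueForKey:@"isFinished"];
    [self didChangeValueForKey:@"isExecuting"];
}

@end
```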
All in all, it just didn’t feel like the existing queueing mechanisms gave me the surgical control I wanted for my problem.
Method Chaining
A better approach, when you want to maintain tight control over asynchronous tasks, is method chaining.
- (void)mergeWithCompletion:(CDECompletionBlock)completion
{
    [self checkCloudFileSystemIdentityWithCompletion:^(NSError *error) {
        [self performProcessPendingChangesWithCompletion:completion];
    }];
}

- (void)performProcessPendingChangesWithCompletion:(CDECompletionBlock)completion
{
    [self processPendingChangesWithCompletion:^(NSError *error) {
        [self performImportNewRemoteEventsWithCompletion:completion];
    }];
}

- (void)performImportNewRemoteEventsWithCompletion:(CDECompletionBlock)completion
{
    [self.cloudManager importNewRemoteEventsWithCompletion:^(NSError *error) {
        [self performMergeEventsImportedSinceRevisionWithCompletion:completion];
    }];
}

- (void)performMergeEventsImportedSinceRevisionWithCompletion:(CDECompletionBlock)completion
{
    CDERevisionNumber lastMerge = [self.eventStore lastMergeRevision];
    [self.eventIntegrator mergeEventsImportedSinceRevision:lastMerge completion:^(NSError *error) {
        [self performExportNewLocalEventsWithCompletion:completion];
    }];
}

- (void)performExportNewLocalEventsWithCompletion:(CDECompletionBlock)completion
{
    [self.cloudManager exportNewLocalEventsWithCompletion:completion];
}
This fixes the nesting problem, but it suffers a bit from the same ailments as the delegate approach. Your class gets polluted with methods that don’t really do a lot. Not only that, we have lost the clear flow of the original code, which declared the explicit sequence of steps involved in carrying out a merge. That sequence is now implicit in the method chaining, and not immediately obvious at first glance.
CDEAsynchronousTaskQueue
While struggling with this, I was also playing with node.js. Node.js is a server-side JavaScript runtime that makes extensive use of asynchronicity and callback functions. As a result, its community has had to come up with ways of dealing with the same nesting problem.
One of the approaches used is to add a queue class that contains an array of task blocks, and executes them in order. It passes a callback function when launching each task, and the task invokes that callback when it is done, so the next task can be dequeued and launched. This effectively flattens those callback hierarchies into a sequential queue of tasks.
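A minimal sketch of that idea in Objective-C, assuming ARC: RunTasksInSeries is a hypothetical function (not the Ensembles class) that launches the first task, hands it a callback, and only launches the remaining tasks once that callback fires. The first error ends the run and goes straight to the completion block.

```objc
#import <Foundation/Foundation.h>

typedef void (^SeriesCallback)(NSError *error);
typedef void (^SeriesTask)(SeriesCallback next);

// Runs the SeriesTask blocks in `tasks` one after another. Each task must
// call `next` exactly once; a non-nil error skips the remaining tasks.
static void RunTasksInSeries(NSArray *tasks, SeriesCallback completion)
{
    if (tasks.count == 0) {
        if (completion) completion(nil);
        return;
    }
    SeriesTask task = tasks.firstObject;
    NSArray *remaining = [tasks subarrayWithRange:NSMakeRange(1, tasks.count - 1)];
    task(^(NSError *error) {
        if (error) {
            if (completion) completion(error);
            return;
        }
        // Only now is the next task dequeued and launched.
        RunTasksInSeries(remaining, completion);
    });
}
```

Because each callback recurses into the remaining tasks, a long list of tasks that call back synchronously will grow the stack; tasks that call back from a background queue, as real network code does, start each step on a fresh stack.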
The CDEAsynchronousTaskQueue class in Ensembles is based on this approach. The class itself, although relatively simple, has some interesting attributes:
- It is an NSOperation subclass, and can be added to any NSOperationQueue.
- Used with a single task, it provides an easy way to make an asynchronous NSOperation for any asynchronous task.
- You can initialize a queue with a single task, and ask for it to be repeated a number of times.
- It has several different error handling policies. For example, it can terminate if an error is encountered, or continue regardless of errors. It can also repeat a task until it succeeds, which is useful for networking calls that are inclined to fail.
- It returns an NSError upon completion, and will combine multiple errors into a single error where necessary.
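A "repeat until success" policy and error combining can be sketched in isolation. The helper below is hypothetical, assumes ARC, and stores the individual failures under a made-up userInfo key; it only shows the shape of the technique, not how CDEAsynchronousTaskQueue implements its policies.

```objc
#import <Foundation/Foundation.h>

typedef void (^RetryCallback)(NSError *error);
typedef void (^RetryTask)(RetryCallback next);

// Hypothetical userInfo key under which the individual failures are stored.
static NSString * const RetryUnderlyingErrorsKey = @"RetryUnderlyingErrors";

// Runs `task`; if it reports an error, runs it again, up to `attemptsLeft`
// runs in total (0 is treated like 1). Calls `completion` with nil on the
// first success, or with one combined error holding every failure.
static void RetryTaskUntilSuccess(RetryTask task, NSUInteger attemptsLeft,
                                  NSArray *previousErrors, RetryCallback completion)
{
    task(^(NSError *error) {
        if (error == nil) {
            if (completion) completion(nil);
            return;
        }
        NSArray *errors = [previousErrors arrayByAddingObject:error];
        if (attemptsLeft <= 1) {
            NSDictionary *userInfo = @{RetryUnderlyingErrorsKey : errors};
            NSError *combined = [NSError errorWithDomain:@"RetryErrorDomain"
                                                    code:(NSInteger)errors.count
                                                userInfo:userInfo];
            if (completion) completion(combined);
            return;
        }
        RetryTaskUntilSuccess(task, attemptsLeft - 1, errors, completion);
    });
}
```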
Using CDEAsynchronousTaskQueue, the merge method looks like this
- (void)mergeWithCompletion:(CDECompletionBlock)completion
{
    CDEAsynchronousTaskBlock checkIdentityTask = ^(CDEAsynchronousTaskCallbackBlock next) {
        [self checkCloudFileSystemIdentityWithCompletion:^(NSError *error) {
            next(error, NO);
        }];
    };

    CDEAsynchronousTaskBlock processChangesTask = ^(CDEAsynchronousTaskCallbackBlock next) {
        [self processPendingChangesWithCompletion:^(NSError *error) {
            next(error, NO);
        }];
    };

    CDEAsynchronousTaskBlock importRemoteEventsTask = ^(CDEAsynchronousTaskCallbackBlock next) {
        [self.cloudManager importNewRemoteEventsWithCompletion:^(NSError *error) {
            next(error, NO);
        }];
    };

    CDEAsynchronousTaskBlock mergeEventsTask = ^(CDEAsynchronousTaskCallbackBlock next) {
        CDERevisionNumber lastMerge = [self.eventStore lastMergeRevision];
        [self.eventIntegrator mergeEventsImportedSinceRevision:lastMerge completion:^(NSError *error) {
            next(error, NO);
        }];
    };

    CDEAsynchronousTaskBlock exportEventsTask = ^(CDEAsynchronousTaskCallbackBlock next) {
        [self.cloudManager exportNewLocalEventsWithCompletion:^(NSError *error) {
            next(error, NO);
        }];
    };

    NSArray *tasks = @[checkIdentityTask, processChangesTask, importRemoteEventsTask, mergeEventsTask, exportEventsTask];
    CDEAsynchronousTaskQueue *taskQueue =
        [[CDEAsynchronousTaskQueue alloc] initWithTasks:tasks
                                      terminationPolicy:CDETaskQueueTerminationPolicyStopOnError
                                             completion:^(NSError *error) {
            if (completion) completion(error);
        }];
    [taskQueue start];
}
Tasks are represented by blocks of the type CDEAsynchronousTaskBlock. This block takes a single argument, which is a callback block itself.
CDEAsynchronousTaskBlock processChangesTask = ^(CDEAsynchronousTaskCallbackBlock next) {
    [self processPendingChangesWithCompletion:^(NSError *error) {
        next(error, NO);
    }];
};
The asynchronous task you want to perform is initiated inside the block, and the block must call back when the task is finished. So in the example above, the next callback is called in the completion block of processPendingChangesWithCompletion:. The callback takes two arguments: an NSError, which should be nil upon successful completion, and a BOOL indicating whether the queue should stop prematurely. Passing NO for this causes the queue to continue.
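The stop argument is the one the merge example never exercises. Below is a hypothetical task, assuming ARC, whose block types mirror the shape of CDEAsynchronousTaskBlock and CDEAsynchronousTaskCallbackBlock as inferred from calls like next(error, NO); the type and function names are invented for the sketch. When it finds nothing to import, it reports success but asks the queue to stop.

```objc
#import <Foundation/Foundation.h>

// Block types with the same shape as the article's task blocks, inferred
// from next(error, NO). The names are hypothetical, not Ensembles' own.
typedef void (^TaskCallbackBlock)(NSError *error, BOOL stop);
typedef void (^TaskBlock)(TaskCallbackBlock next);

// Builds a hypothetical import task. With nothing to import it succeeds
// (nil error) but passes YES for `stop`, asking the queue to skip the
// remaining tasks. Otherwise it lets the queue carry on.
static TaskBlock MakeImportTask(NSArray *pendingFiles)
{
    TaskBlock task = ^(TaskCallbackBlock next) {
        if (pendingFiles.count == 0) {
            next(nil, YES);
            return;
        }
        // A real task would start its asynchronous import here and call
        // next(error, NO) from that work's completion block; this sketch
        // reports success straight away.
        next(nil, NO);
    };
    return task;
}
```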
Once a number of tasks have been defined, you kick off the queue like this
NSArray *tasks = @[checkIdentityTask, processChangesTask, importRemoteEventsTask, mergeEventsTask, exportEventsTask];
CDEAsynchronousTaskQueue *taskQueue =
    [[CDEAsynchronousTaskQueue alloc] initWithTasks:tasks
                                  terminationPolicy:CDETaskQueueTerminationPolicyStopOnError
                                         completion:^(NSError *error) {
        if (completion) completion(error);
    }];
[taskQueue start];
An array of tasks is passed to the new queue, together with a termination policy and completion block. At this point, you can invoke start to begin running tasks immediately, or submit the whole CDEAsynchronousTaskQueue to an NSOperationQueue.
I’m the first to admit the code is a bit verbose — it is quite a bit cleaner in node.js — but there are some advantages to this approach over the options listed earlier.
First, it is declarative, in the sense that you clearly define what your tasks are, and state explicitly that you want them executed in a certain order, all inside the mergeWithCompletion: method. It may be a bit verbose, but intentions are completely clear. There is no implicit sequence hidden in a cluster of delegate methods. The program flow is clear, once you understand how the CDEAsynchronousTaskQueue works.
Second, you declare explicitly how you want errors to be handled. There is nothing ad hoc or implicit in the error handling. You choose the policy that is appropriate, and you state that in your code.
The CDEAsynchronousTaskQueue is quite well tested now, being a linchpin of the Ensembles framework, and is a relatively standalone class. If you want to use it in your own project, just grab the source from GitHub.