
The megahertz race in desktop CPUs petered out sometime in the mid-2000s, as Intel and AMD found it increasingly difficult to keep raising the clock speeds of their processors the way they had consistently done for the fifteen years before. The competition shifted from who could pack the most MHz into a chip to who could deliver the best overall performance across a growing number of processing cores.

In a multicore world, writing snappy code ceased to be just about picking the algorithm with the smallest Big-O and began to rely on concurrent programming techniques that let programs delegate their work to separate cores to run simultaneously. If I had two independent tasks that each took N seconds to complete, the program would take 2N seconds in a single-core world, but only N seconds on a dual-core machine. A process that could split its work into independent units could now run only as long as its slowest component – not the sum of all of them. If programmed correctly.

For several years, mobile devices were content to run on single-core processors, but as chipsets caught up, multiple cores proliferated in handheld devices, and we find ourselves reviewing specification sheets with phrases like “2.3 GHz quad-core Krait 400” (Nexus 5) and “1.4 GHz ARMv8-A dual-core CPU” (iPhone 5S). Core-mania has infected our pockets now as well.

Making the most of the processors on our mobile devices requires that we understand how our mobile apps run and what opportunities exist to offload work to other processors to ensure a snappy and responsive user experience. The key construct we must understand is the event loop.

Event loops are basically structured in the following manner:

main()
{
    // App initialization goes here
    dont_quit = true;

    while (dont_quit) // this is the event loop
    {
        event = wait_for_event();

        process(event);
    }

    // App cleanup and wrapup goes here

    exit;
}

The program basically enters an endless loop where it waits for a stream of events to process. These events can be user interface interactions (a user clicked a button or typed text) or system-generated events like updating the state of a progress indicator.
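
To make that concrete, here is a rough sketch of the same structure in Java. This is purely illustrative (not how any particular GUI toolkit actually implements its loop), with a blocking queue standing in for wait_for_event() and Runnable objects standing in for events:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoop
{
    // Events are just pieces of work waiting to run on this loop's thread.
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<Runnable>();

    private volatile boolean dontQuit = true;

    // Other threads call this to post an event to the loop.
    public void post(Runnable event)
    {
        this.events.offer(event);
    }

    public void run() throws InterruptedException
    {
        while (this.dontQuit) // this is the event loop
        {
            Runnable event = this.events.take(); // wait_for_event()

            event.run();                         // process(event)
        }
    }

    public void quit()
    {
        this.post(new Runnable()
        {
            public void run()
            {
                EventLoop.this.dontQuit = false;
            }
        });
    }
}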

When we’re programming user interfaces, a term that pops up frequently is the main thread or the UI thread. A thread is a unit of execution that typically runs on a single processor. The UI thread is important in development because most GUI toolkits are not thread-safe, meaning that they are not designed to be used concurrently, so tasks like updating the label on a button must be executed on the main thread.

So, if we need to manipulate our user interfaces on the main thread, why worry about concurrent programming at all? The primary reason is that in the code above, one process(event) must complete before the next one can begin, and HCI studies have experimentally shown that humans start to perceive delays when something takes more than 100 to 200 milliseconds. So, if one operation to update the user interface exceeds that 200ms threshold, the user will start to feel that your app is unresponsive and a bit laggy.

So, once your app grows beyond some basic level of functionality, you're likely to run into work that needs to be done off the main thread so that your app's responsiveness isn't compromised. A common pattern emerges within the code that processes the event:

process(event)
{
    do_ui_updates();

    r = block()
    {
        results = time_intensive_task();
        results += network_intensive_task();
        results += io_intensive_task();

        s = block()
        {
            do_ui_updates(results);

            return;
        };

        run_on_main_thread(s);

        return;
    };

    run_on_new_thread(r);

    return;
}

Overall, this code is structured to keep UI updates on the main thread while offloading, as soon as possible, any tasks that could take a significant amount of time to another thread that can run concurrently on another core. While those time-intensive tasks are running, the main event loop is free to process other events. Once they complete, the independent thread constructs a set of operations that update the user interface and are run on the main thread. The outcome of do_ui_updates(results) becomes just another event posted to the event loop.

So, what are some tasks that might be time intensive? In my own work, I’ve run across the following examples in the last week alone:

  • Processing and saving content to any kind of persistent storage.
  • Resizing images to optimize limited memory use.
  • Retrieving content from the network.
  • Non-trivial database queries.
  • Time-intensive mathematical operations (e.g. fast Fourier transform).

So, given the event loop, how do we write apps in a reasonable way to make the most of the processing resources available?

An iOS example for loading and resizing images from the network:

- (void) loadImageFromUrl:(NSURL *) url size:(CGSize) size block:(void (^)(UIImage *)) block
{
    AFHTTPRequestOperationManager * manager = [AFHTTPRequestOperationManager manager];

    dispatch_async(dispatch_get_main_queue(), ^{
        [UIApplication sharedApplication].networkActivityIndicatorVisible = YES;
    });

    [manager GET:[url description] 
      parameters:nil 
         success:^(AFHTTPRequestOperation *operation, id responseObject)
    {
        [UIApplication sharedApplication].networkActivityIndicatorVisible = NO;

        dispatch_queue_t backgroundQueue = dispatch_queue_create("fetch_image", 0);
        dispatch_async(backgroundQueue, ^{
            NSData * responseData = operation.responseData;

            UIImage * responseImage = [UIImage imageWithData:responseData];

            dispatch_queue_t backgroundQueue = dispatch_queue_create("resize_image", 0);

            dispatch_async(backgroundQueue, ^{
                UIImage * scaledImage = [FCDatabase resizedImage:responseImage ofSize:size];
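                // Cache the resized image to disk. NOTE: path (the on-disk destination) is assumed to be defined elsewhere in this class.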

                [UIImagePNGRepresentation(scaledImage) writeToFile:path atomically:YES];

                UIImage * image = [UIImage imageWithContentsOfFile:path];

                dispatch_async(dispatch_get_main_queue(), ^{
                    block(image);
                });
            });
        });
    }
    failure:^(AFHTTPRequestOperation *operation, NSError *error)
    {
        NSLog(@"Error: %@", error);

        NSLog(@"RESPONSE: %@", operation.responseString);
    }];
}

In this example, the loadImageFromUrl:size:block: selector is typically called from the main UI thread. Blocks (the code between the ^{ and } symbols) such as

dispatch_async(dispatch_get_main_queue(), ^{
    [UIApplication sharedApplication].networkActivityIndicatorVisible = YES;
});

instruct the app to post a UI event to the main UI thread. When called from the main thread itself, this dispatch_async simply defers the block to a later turn of the main event loop; when called from another thread, it ensures that the UI updates are delivered to the proper thread.

Next, we call the AFNetworking library to fetch the image data, passing it success and failure blocks that define the operations to carry out once the fetch attempt completes. These blocks are invoked on the main UI thread, so we take advantage of that to update the UI, then we create a background thread (a queue, in iOS jargon) to process the image:

dispatch_queue_t backgroundQueue = dispatch_queue_create("resize_image", 0);

And then we run the operation:

dispatch_async(backgroundQueue, ^{
    UIImage * scaledImage = [FCDatabase resizedImage:responseImage ofSize:size];

    [UIImagePNGRepresentation(scaledImage) writeToFile:path atomically:YES];

    UIImage * image = [UIImage imageWithContentsOfFile:path];
});

When the resized image is ready to go, we update the main UI thread:

dispatch_async(dispatch_get_main_queue(), ^{
    block(image);
});

where block is usually defined as something like:

^(UIImage * image)
{
    imageView.image = image;
}

The functions dispatch_queue_create, dispatch_async, and their supporting types are all provided by a relatively recent iOS/Mac framework called Grand Central Dispatch. Previous versions of iOS and Mac OS X achieved the same results using the NSThread class, which dates back to the early NeXTSTEP days. Since GCD is the preferred way of implementing concurrent processing on Apple platforms, I'll skip the NSThread equivalent and leave it as an exercise for the reader.

Android exposes a similar mechanism called Handlers, which post work to event loops (provided by the Looper class) running on different threads. This is roughly equivalent to the Grand Central Dispatch functionality Apple provides and is especially useful if you want to post events with delays.
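
Here's a minimal sketch of that mechanism. The class, field, and method names (HandlerExample, statusLabel, onWorkFinished) are hypothetical, but Handler and Looper are the standard Android APIs:

import android.os.Handler;
import android.os.Looper;
import android.widget.TextView;

public class HandlerExample
{
    // A Handler bound to the main thread's Looper: anything posted to it
    // runs on the main event loop.
    private final Handler mainHandler = new Handler(Looper.getMainLooper());

    // Hypothetical view that gets updated when background work finishes.
    private TextView statusLabel;

    // Called from a background thread once some long-running work completes.
    public void onWorkFinished()
    {
        mainHandler.post(new Runnable()
        {
            public void run()
            {
                statusLabel.setText("Work complete.");
            }
        });

        // Handlers also make it easy to post an event with a delay.
        mainHandler.postDelayed(new Runnable()
        {
            public void run()
            {
                statusLabel.setText("");
            }
        }, 5000); // clear the label roughly five seconds later
    }
}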

However, Android also provides low-level threads that are more flexible and work in other Java contexts. Purple Robot uses threads extensively. For example, when the user wants to delete archived files (an operation that can take quite some time if enough files have accumulated), the HttpUploadPlugin.deleteArchiveFiles method is called:

public void deleteArchiveFiles(final Activity activity)
{
    final HttpUploadPlugin me = this;

    Runnable r = new Runnable()
    {
        public void run()
        {
            File pendingFolder = me.getArchiveFolder();

            activity.runOnUiThread(new Runnable()
            {
                public void run()
                {
                    Toast.makeText(activity, activity.getString(R.string.message_clearing_archive), Toast.LENGTH_LONG).show();
                }
            });

            pendingFolder.delete();
        }
    };

    Thread t = new Thread(r);
    t.start();
}

In this case, we’re passing in the Activity object from which the method is called. The first thing we do is set up a new Runnable (r, analogous to a GCD block) and create a thread to run it. Within r, we update the main user interface via activity.runOnUiThread to show a toast notification and then we delete the folder. Once the deletion is complete, the parent method completes and the thread is destroyed.

The disadvantage of using threads in this manner (as opposed to Handlers or GCD queues) is that it opens up the possibility of flooding the system with threads if the software is poorly written. Keep in mind that (active) threads are useful only as long as you don't exceed the processor core count. If you have more threads than cores, you're only creating additional work for the device, since it needs to manage the thread bookkeeping along with everything else. Any non-trivial app will exceed the core count once its threads are added to those of the OS and any concurrent apps, so don't worry about it too much – just don't go overboard. (The specific issues are beyond the scope of this post.)
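
If you do find yourself spawning lots of short-lived threads, one common mitigation (a sketch of the general technique, not necessarily what Purple Robot does) is to route background work through a fixed-size pool sized to the core count:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundWork
{
    // A pool sized to the device's core count keeps us from flooding the
    // system with threads, no matter how many tasks get submitted.
    private static final ExecutorService POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public static void submit(Runnable task)
    {
        POOL.submit(task);
    }
}

Calls like new Thread(r).start() then become BackgroundWork.submit(r), and excess tasks simply wait in the pool's queue rather than each claiming a thread of its own.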

So, to get back to the title of this post, how do these threading models compare with single-threaded, event-driven systems like Twisted or Node.js/V8? These frameworks work off of the same basic event loop model. When you perform an operation such as fetching a network resource or loading an image (in a browser context), the underlying runtime handles that work for you on a thread from a managed internal pool and fires events back to your main event loop, which your code handles via the callbacks you provide.

In these contexts, any code that you write still executes on the main thread. If you're working on an application where your own code isn't CPU-intensive, this model works well because each piece of work completes well below the limit of human lag perception.

However, if you’re doing something non-trivial (such as running a pure-JS Fourier transform on a non-trivial amount of data), the system will begin to exhibit lag and it’s likely that your users’ operations will pile up as they wait for them to be processed by the main event loop. This is the point where treating asynchronous operations as functionally equivalent to concurrent processing breaks down. On the single-core machines from the past, this distinction was largely academic (indeed, events were better in this case due to lower bookkeeping overhead). However, as the Megahertz Race peters out in the mobile realm and we repeat the experiences of the past by stuffing more cores into our mobile devices, our ability to implement more computationally-intensive tasks will depend upon making the most of all of our local processors instead of relying on an individual cores to become faster on their own.