Getting started with Computer Vision on iPhone
I still remember my first mobile phone back in 1998. It was an Alcatel One Touch Easy. You could do thousands of things with it: calling your parents, calling your girlfriend, calling your friends, calling the police, calling a random number, ... Endless possibilities, right?
Ok, let's get serious. Needless to say, today's smartphones are orders of magnitude more powerful than my old Alcatel thanks to Moore's law. In fact, they are becoming powerful enough to execute some computer vision algorithms at decent resolutions and frame rates. Smartphones cannot yet rival desktop machines but, in the end, it is not only a question of computing power. Their portability alone is creating many new application niches for Computer Vision.
Until a few months ago, I had only programmed computer vision software for desktop machines and laptops. I had always thought of trying Android or some other mobile operating system, but never had the time to really dig into it. Then I came across a potential client whose product is compatible with iPhone and iPad. It was the perfect chance to learn how to use Computer Vision within smartphone apps, while getting some extra reward out of it besides satisfying my curiosity. What I mean by this is that I am not encouraging you to stick to Apple. Circumstances led me to program for iOS, the iPhone operating system, but I am sure that Android or Windows Phone are also good options for mobile computer vision app development.
I am assuming that some readers have no background in iOS development, so you might like to skip some parts if they are too basic for you.
iOS development basics
First things first, we will start with some basic definitions. iPhone, iPod Touch and iPad run iOS, Apple's operating system for its mobile devices. An iOS application can run on virtually any iOS device. iOS has an Application Programming Interface (API) called Cocoa Touch; your iOS app will be calling Cocoa Touch to render the user interface, for instance. To develop the app, you will probably be using Xcode, Apple's development environment, which is the simplest option as it lets you design user interfaces by point-and-click. With Xcode you program iOS apps in the Objective-C language. Before going on, you must know that Apple will charge you $99 per year to be part of the iOS Developer Program. High or low? It depends on the profit that you are going to make out of your membership. Now that you know, you can continue reading while you grumble.
The Objective-C language
If you don't have prior knowledge of Objective-C, Ray Wenderlich's iPhone tutorials are a good starting point. There are two tracks depending on your programming experience. I personally took the experienced programmer track and it really helped me get started with the language and grasp the basics of Xcode.
Development environment: OS X and Xcode
In order to use the Xcode development environment, you will need a computer running OS X, Apple's operating system for Macs. If you don't own a Mac, you can run to your local store now and spend several bucks, OR you can install a Mac virtual machine. There are several virtualization solutions out there. I personally use VirtualBox on Ubuntu Linux. I first installed it from the Ubuntu Software Center and was not able to run any virtual machine. My recommendation is that you download and run the installation package from their website; that was my second attempt and everything worked like a charm. There is also a Windows version, if that is your case. As you will probably be using a USB 2.0 port to connect your iOS device, you should also install the VirtualBox Extension Pack, which is available on the VirtualBox download page.
The Mac virtual machine will do nothing without an operating system. If you don't have a legal copy of OS X ready to install on your virtual machine, you can try to fish some torrent on the net; iAtkos ML2, for instance, is often updated with fairly current OS X Mountain Lion versions. However, I must warn you that installing OS X in a virtual machine on non-Apple hardware violates Apple's End-User License Agreement (EULA).
Once your OS X machine is up and running, download and install Xcode from the Apple Developer site. After registering on the site, you can download any Xcode version for free. Before downloading any package, check the OS X version for which it is intended: any Xcode package that does not match your OS X version will refuse to install, which can be quite frustrating after waiting for such a large download to finish.
Tip: If you don't succeed in adjusting the video mode to your display resolution with step 7 of the iAtkos installation tutorial, you can simply press HostKey+C. HostKey will usually be the right Control key, unless you changed it in the configuration. That will scale everything to fit your screen, at the price of heating up your graphics card, though.
iOS Developer Program membership
As I said, being part of the iOS Developer Program will cost you $99 a year. After your payment is processed, you will be able to test your apps on real iOS devices. If you prefer not to enroll, you still have the iPhone and iPad simulator that ships with Xcode, but you won't be able to access the camera, for instance. Since you want to play with Computer Vision, that is a major obstacle because, at the time of writing, I was not able to find any workaround on the net.
Once you are part of the developer program, you will have to register your devices, generate certificates and perform other operations that are well documented in Apple's App Distribution Guide. I strongly recommend reading this guide if you are new to the developer program. Following the guide, you will create a Team Provisioning Profile that will allow you to test your app on all your iOS devices during development. Then, inside your Xcode project, you must select the certificates from the provisioning profile to sign your app.
After everything is set up, you can run your apps on a real iOS device by selecting it under "Product > Set Destination" before execution. Once you run an app from Xcode, it will remain installed on the device after you disconnect the USB cable.
Tip: I had some problems when trying to select my recently created provisioning profile. First, the profile was not shown in Xcode; I had to click "Refresh" at "Organizer > Library" in order for the provisioning profile to appear. However, I still could not select the profile certificates. To solve this, just restart Xcode and you will be able to select the certificates at "Project Properties > Code Signing Identity".
What libraries to use?
Ok, so now you have a full development environment for iOS apps. If you are familiar with Computer Vision, you probably want to start with a library of known algorithm implementations. OpenCV was probably your first thought, as it is the most popular library today and contains a large collection of state-of-the-art algorithms. We are lucky because OpenCV can run on iOS. The downside is that OpenCV won't take any advantage of the GPU on an iOS device.
Tip: I got some errors while compiling OpenCV from the git repository. If you run into problems too, you can downgrade to a previous version that is known to work. In my case, I did that with "git checkout 2.4.3" and rebuilt.
iPhones and iPads are driven by systems on a chip that integrate an ARM-based CPU and a PowerVR GPU. At the time of writing, OpenCV only supports NVIDIA's CUDA technology for parallel computing on GPUs. This means that, on an iOS device, OpenCV will run exclusively on the CPU.
Can we unleash the power of the GPU in our iOS device? Well, the GPUs in the iPhone 4, iPad and higher models support OpenCL, another parallel programming technology, and there seems to be an experimental port of OpenCV to OpenCL. However, iOS does not support OpenCL yet, although it is likely that it will in the future. So what can we do? We have three possibilities:
- Use Apple's Core Image and Accelerate frameworks: These are libraries that can be included directly in your project from Xcode. They have many useful functions for image processing and they make optimal use of the GPU and CPU, but don't expect to find an OpenCV-equivalent set of computer vision algorithms (a minimal Core Image sketch follows this list).
- Use GPUImage: This is an open-source framework by Brad Larson for GPU-based image and video processing on iOS. I haven't used this framework myself, but the author claims that GPUImage handles video faster than Core Image, and it lets you run your own custom filters. However, it lacks some advanced features that are present in Core Image, like face detection.
- Use OpenGL shaders: Most iOS device GPUs support OpenGL ES 2.0 shaders, so you can use computer vision algorithm implementations that make use of the shader language (SIFT features, for instance). Or you can write shaders yourself to speed up your vision algorithm. If you do it and want to share, don't hesitate to leave a comment with a link here.
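To give you an idea of the first option, here is a minimal Core Image sketch in Objective-C. It applies a Gaussian blur to a UIImage with the CIGaussianBlur filter (available since iOS 6); Core Image decides internally whether to run the filter on the GPU or the CPU. The helper name and the radius value are mine, just for illustration:

```objc
#import <CoreImage/CoreImage.h>
#import <UIKit/UIKit.h>

// Hypothetical helper: blur a UIImage with Core Image.
UIImage *blurredImage(UIImage *input, CGFloat radius)
{
    CIImage *ciInput = [CIImage imageWithCGImage:input.CGImage];

    // Create the built-in Gaussian blur filter and set its inputs
    CIFilter *blur = [CIFilter filterWithName:@"CIGaussianBlur"];
    [blur setValue:ciInput forKey:kCIInputImageKey];
    [blur setValue:@(radius) forKey:kCIInputRadiusKey];

    // Render the result; cropping to the input extent keeps the original size
    CIContext *context = [CIContext contextWithOptions:nil];
    CGImageRef cgOutput = [context createCGImage:[blur outputImage]
                                        fromRect:[ciInput extent]];

    UIImage *result = [UIImage imageWithCGImage:cgOutput];
    CGImageRelease(cgOutput);
    return result;
}
```

The same pattern applies to the rest of Core Image's built-in filters: create a CIFilter by name, set its input keys, and render the output through a CIContext.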
In conclusion, if you need common image processing operations, like filters, you can use the first two options in the list, as they will probably do the job faster than OpenCV's CPU implementations. However, if you need pretty advanced state-of-the-art computer vision algorithms, you might want to use OpenCV, as they will probably be there and not in the other libraries. Once your prototype app is running with OpenCV and does what it is expected to do, you might like to optimize some stages of your visual pipeline with OpenGL shaders.
[EDIT] Michele Pratusevich kindly suggested PointCloud for iOS. It is a free library that implements SLAM and image detection and tracking. I could not find whether it uses the GPU, but I bet it does if it delivers what it claims in real time.
Example with OpenCV
By using OpenCV, you will have everything you need to write a full computer vision pipeline almost without leaving the library: from reading the camera to drawing processing results on the frame or saving them to a file. There is just one small detail to pay attention to. When reading from the camera, OpenCV calls iOS AVFoundation functions, and the output frames come in a 32-bit BGRA format instead of the 24-bit BGR format that most OpenCV functions understand. The most common workflow is converting the camera frames to 24-bit BGR with OpenCV itself, doing your processing, and converting back to 32-bit BGRA before displaying the result (sketched below). There is a good tutorial here that explains this technique and other useful aspects, like how to include the OpenCV library in your project or how to receive frames through a delegate.
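As a minimal sketch of that round trip, assuming frames arrive through OpenCV's CvVideoCamera delegate in a .mm (Objective-C++) file; the class name VideoProcessor is just an illustration:

```objc
#import <opencv2/opencv.hpp>
#import <opencv2/highgui/cap_ios.h>

@interface VideoProcessor : NSObject <CvVideoCameraDelegate>
@end

@implementation VideoProcessor

// CvVideoCameraDelegate callback: 'image' arrives as CV_8UC4 (32-bit BGRA)
- (void)processImage:(cv::Mat &)image
{
    cv::Mat bgr;
    cv::cvtColor(image, bgr, CV_BGRA2BGR);   // 32-bit BGRA -> 24-bit BGR

    // ... run your OpenCV processing on 'bgr' here ...

    cv::cvtColor(bgr, image, CV_BGR2BGRA);   // back to BGRA so iOS can display it
}

@end
```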
I wrote a simple test application on an iPhone 4S to detect faces. It uses an OpenCV cascade classifier with Haar features. Fork the open-source project if you wish. Below is a rough sketch of the detection step, followed by a quick test in the video:
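This is not the project's exact code; the cascade file name and the detectMultiScale parameters are just reasonable defaults, and loadFaceDetector() / detectAndDrawFaces() are hypothetical helpers meant to be called from a processImage: callback like the one above:

```objc
#import <Foundation/Foundation.h>
#import <opencv2/opencv.hpp>

static cv::CascadeClassifier faceDetector;

// Load a Haar cascade (e.g. "haarcascade_frontalface_alt2.xml", shipped with
// OpenCV) that you have previously added to the app bundle.
static void loadFaceDetector()
{
    NSString *path = [[NSBundle mainBundle] pathForResource:@"haarcascade_frontalface_alt2"
                                                     ofType:@"xml"];
    faceDetector.load([path UTF8String]);
}

// Detect faces in a 24-bit BGR frame and draw a green rectangle around each one
static void detectAndDrawFaces(cv::Mat &bgr)
{
    // The classifier works on a grayscale, contrast-normalized image
    cv::Mat gray;
    cv::cvtColor(bgr, gray, CV_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    faceDetector.detectMultiScale(gray, faces, 1.1, 2, 0, cv::Size(60, 60));

    for (size_t i = 0; i < faces.size(); i++)
        cv::rectangle(bgr, faces[i], cv::Scalar(0, 255, 0), 2);
}
```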