Movidius Takes Deep Learning to the Edge
Neural Compute Stick makes AI possible in low power embedded applications.
Artificial intelligence (AI) is one of those things that, like clean energy from nuclear fusion, has for decades promised to have a profound impact on society, and advances made in recent years have finally seen AI put to use in a growing number of practical applications. However, an enduring challenge has been that many key applications, such as vision processing, are computationally intensive, which rules out local processing on low power devices.
Movidius — now an Intel company — is changing this with its Myriad 2 Vision Processing Unit (VPU), which delivers software programmable machine vision with an ultra low power budget.
In this post we take a first look at the Myriad VPU technology and associated SDK, and get hands-on via the Neural Compute Stick (NCS) which enables low cost development and prototyping.
100Gflops for 1 watt
Source: Movidius Myriad 2 VPU Product Brief
The Movidius Neural Compute Stick (139-3655) delivers fantastic performance for vision processing applications: over 100GFlops for only around 1 watt of power consumption. It achieves this via a 600MHz system-on-chip (SoC) that integrates 12 VLIW 128-bit processors, centred around 2MB of on-chip memory with a 400Gbps transfer rate.
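To put the headline figures in context, the arithmetic can be sketched in a few lines of Python. The 100 GFLOPS and 1 watt values are the approximate ones quoted above; the inference latency is a purely hypothetical number used to show how energy per inference would be worked out, not a measurement.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return compute efficiency in GFLOPS per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


def energy_per_inference_joules(watts: float, seconds_per_inference: float) -> float:
    """Energy used by one inference: power (W) multiplied by time (s)."""
    return watts * seconds_per_inference


# Approximate headline figures quoted in the text.
efficiency = gflops_per_watt(100.0, 1.0)

# HYPOTHETICAL latency, chosen only to illustrate the arithmetic.
assumed_latency_s = 0.1
energy = energy_per_inference_joules(1.0, assumed_latency_s)

print(f"{efficiency:.0f} GFLOPS/W, {energy:.2f} J per inference (assumed latency)")
```

Substituting a latency measured on your own setup would give a more meaningful energy figure.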
Key features include:
- Support for FP16, FP32 and integer operations with 8-, 16- and 32-bit accuracy
- Real-time, on-device inference without Cloud connectivity
- Deploy existing convolutional neural network (CNN) models or uniquely trained networks
- All data and power provided over a single USB 3.0 port on a host PC
While it is equipped with USB 3.0, a USB 2.0 port is likely to suffice for many uses. Note also that for increased performance, multiple NCS devices can be connected via a suitable USB hub.
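Since FP16 is among the supported formats, it is worth having a feel for what half precision does to a value. The following is a minimal sketch using only the Python standard library (the `struct` module's `e` format is IEEE 754 half precision); it runs on the host and says nothing about how the NCS itself handles numbers internally. The sample values are arbitrary.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and return the result."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Arbitrary sample values: a probability, a mid-range number and the
# largest finite value that half precision can represent.
for original in (0.6060, 1000.1, 65504.0):
    rounded = to_fp16_and_back(original)
    error = abs(original - rounded)
    print(f"{original!r} -> {rounded!r} (absolute error {error:.6f})")
```

Values larger than 65504 cannot be packed as half precision and raise an `OverflowError` with this approach.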
Support for deep learning applications is provided via the open source Caffe framework, which is developed by the Berkeley Artificial Intelligence Research (BAIR) Lab. Caffe’s guiding principles are:
- Expression: models and optimizations are defined as plaintext schemas instead of code.
- Speed: for research and industry alike speed is crucial for state-of-the-art models and massive data.
- Modularity: new tasks and settings require flexibility and extension.
- Openness: scientific and applied progress call for common code, reference models, and reproducibility.
- Community: academic research, startup prototypes, and industrial applications all share strength by joint discussion and development in a BSD-2 project.
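To give a flavour of what "plaintext schemas instead of code" means in practice, here is a small sketch. The embedded network description is an illustrative fragment written in Caffe's prototxt style, not one of the models shipped with the SDK, and the regular expression is a deliberate simplification (it assumes `name` comes directly before `type` in each layer) rather than a real prototxt parser.

```python
import re

# Illustrative fragment in Caffe's plaintext model format (not an SDK model).
EXAMPLE_PROTOTXT = '''
name: "TinyNet"
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 16 kernel_size: 3 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
'''


def list_layers(prototxt_text):
    """Return (name, type) pairs for each layer block, in order of appearance."""
    pattern = re.compile(r'layer\s*\{\s*name:\s*"([^"]+)"\s*type:\s*"([^"]+)"')
    return pattern.findall(prototxt_text)


for layer_name, layer_type in list_layers(EXAMPLE_PROTOTXT):
    print(f"{layer_name}: {layer_type}")
```

The point is simply that the network structure lives in a readable text file, so it can be inspected or edited without touching any framework code.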
The project website provides a tour of the Caffe anatomy and functions. If, like me, you have little experience with AI technology, there is quite a lot to take on board. However, fear not, as the Movidius Neural Compute (NC) SDK packages the framework together with hardware support, plus example applications that make use of existing neural network models. In other words, you can be up and running in no time, evaluating the performance of networks and the NCS hardware.
A computer running Ubuntu 16.04 is required in order to install the NC SDK. It may be that other distributions could be made to work, but this is the only one specified.
What follows is a summary of the steps taken. For more detailed instructions, it is recommended that you download the Getting Started Guide PDF and other official documentation.
Once downloaded, the NC SDK can be installed with:
$ sudo apt-get update
$ sudo apt-get upgrade
$ tar xvf MvNC_SDK_1.07.07.tgz
$ tar xzvf MvNC_Toolkit-1.07.06.tgz
$ cd bin
This may take a little while, as the toolkit is installed together with its associated dependencies. Note that the setup script updates your ~/.bashrc file to set the PYTHONPATH environment variable accordingly. E.g. with the default install location the line added is:
If other users need to be able to use the SDK, they will also need to add this line to their own ~/.bashrc file.
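A quick way for any user to confirm that the PYTHONPATH change has taken effect is to check it from Python itself. This is a minimal sketch using only the standard library; the directory shown is a hypothetical placeholder and should be replaced with whatever path the setup script added on your system.

```python
import os
import sys


def pythonpath_entries():
    """Split the PYTHONPATH environment variable into its entries."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def is_on_python_path(directory):
    """Return True if `directory` is one of the entries in sys.path."""
    target = os.path.realpath(directory)
    return any(
        os.path.realpath(entry or os.curdir) == target for entry in sys.path
    )


if __name__ == "__main__":
    # HYPOTHETICAL location: substitute the directory that the setup
    # script actually added to your ~/.bashrc.
    sdk_python_dir = os.path.expanduser("~/example_sdk/python")
    print("PYTHONPATH entries:", pythonpath_entries())
    print("SDK directory importable:", is_on_python_path(sdk_python_dir))
```

Remember that changes to ~/.bashrc only apply to shells started afterwards, so open a new terminal before running the check.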
Assuming we’re still in the bin directory, we can then install Caffe models for the example code:
$ cd data
If we go back to the directory where the SDK was first extracted to, we can now install the API:
$ cd ../..
$ tar xzvf MvNC_API-1.07.07.tgz
$ cd ncapi
There are a series of examples that can then be run from the bin directory to test NCS operation.
$ cd ../bin
$ make example00
$ make example01
$ make example02
$ make example03
Video stream infer example
Python stream_infer example
Now we can get on to a more interesting example! This requires a video input device and we decided to use the Logitech C920 Full HD Webcam (125-4272), as it has excellent Linux support.
$ cd ../ncapi/tools
$ cd ../py_examples/stream_infer
$ python3 stream_infer.py
Ouch! 60.60% shower cap?! OK, let’s not shoot the messenger: the NCS was simply doing the heavy lifting, and it is the convolutional neural network (CNN) in use that we have to thank for the inference made. Which, to be fair, is perhaps in some way understandable. In any case, we can see above that the same example did much better with a coffee mug, and it did similarly well with other objects.
What’s important here is the speed at which our AI is able to make inferences, and for the energy used it certainly feels fast.
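For anyone wondering how a figure such as "60.60% shower cap" comes about, a classifier of this kind produces one score per category, and the example then reports the highest-scoring labels. The sketch below shows that ranking step in plain Python. It is not taken from stream_infer.py, and the labels and scores are made-up values for illustration only.

```python
def top_k(probabilities, categories, k=5):
    """Return the k most likely (label, percent) pairs, best first."""
    if len(probabilities) != len(categories):
        raise ValueError("expected one probability per category")
    ranked = sorted(
        zip(categories, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return [(label, 100.0 * probability) for label, probability in ranked[:k]]


# HYPOTHETICAL labels and scores, for illustration only.
labels = ["coffee mug", "shower cap", "cup", "bath towel"]
scores = [0.10, 0.606, 0.25, 0.044]

for label, percent in top_k(scores, labels, k=3):
    print(f"{percent:6.2f}%  {label}")
```

Looking at the runners-up as well as the top result often makes a surprising classification easier to understand.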
The default model employed by the stream_infer.py example is SqueezeNet, a CNN that achieves accuracy similar to that of AlexNet, which pre-dates it by some 4 or so years and was trained to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into 1,000 different classes. SqueezeNet, however, is stated as achieving this with a model that is 510x smaller than AlexNet’s.
stream_infer.py can be configured to use either SqueezeNet or AlexNet, allowing their performance on the NCS to be compared. It’s simply a matter of (un)commenting lines near the top of the Python file. There are also Gender and GoogLeNet models that can be configured in the same way. E.g.:
NETWORK_IMAGE_WIDTH = 224 # the width of images the network requires
NETWORK_IMAGE_HEIGHT = 224 # the height of images the network requires
NETWORK_IMAGE_FORMAT = "RGB" # the format of the images the network requires
NETWORK_DIRECTORY = "../../networks/GoogLeNet/" # directory of the network
The stream_infer.py example will look for "graph", "stat.txt" and "categories.txt" files in NETWORK_DIRECTORY. If we compare the size of the graph file for AlexNet and SqueezeNet:
Not quite a difference of 510x for the binary graph, but still a significant size difference.
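The same check and comparison can be scripted. The following is a sketch under a couple of assumptions: the file names are those listed above, while the AlexNet and SqueezeNet directory names are guesses that follow the pattern of the GoogLeNet path and may differ in your SDK install.

```python
import os

# File names that the example is described as looking for.
REQUIRED_FILES = ("graph", "stat.txt", "categories.txt")


def missing_network_files(network_directory):
    """Return the required files that are absent from a network directory."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(network_directory, name))
    ]


def graph_size_ratio(larger_dir, smaller_dir):
    """Return how many times bigger one graph file is than another."""
    larger = os.path.getsize(os.path.join(larger_dir, "graph"))
    smaller = os.path.getsize(os.path.join(smaller_dir, "graph"))
    if smaller == 0:
        raise ValueError("smaller graph file is empty")
    return larger / smaller


if __name__ == "__main__":
    # ASSUMED directory names, modelled on the GoogLeNet path above.
    alexnet_dir = "../../networks/AlexNet/"
    squeezenet_dir = "../../networks/SqueezeNet/"
    problems = {d: missing_network_files(d) for d in (alexnet_dir, squeezenet_dir)}
    if any(problems.values()):
        for directory, missing in problems.items():
            if missing:
                print(f"{directory}: missing {', '.join(missing)}")
    else:
        ratio = graph_size_ratio(alexnet_dir, squeezenet_dir)
        print(f"AlexNet graph is {ratio:.1f}x the size of the SqueezeNet graph")
```

Run from the stream_infer directory, this either reports which files are missing or prints the size ratio between the two compiled graphs.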
Network compilation and profiling
Source: Movidius NCS Toolkit Documentation
New CNNs, e.g. for classifying types of image not already covered by an existing CNN, would be designed and trained using an appropriate framework. The network can then be compiled to a graph file and profiled using the NCS Toolkit that is supplied as part of the SDK.
Targeting embedded platforms
Although the API must initially be installed on the same computer as the Toolkit (an Ubuntu 16.04 x86-64 system), the libraries, headers etc. can subsequently be installed on other platforms. In fact, a set of packages for Raspbian Jessie is provided with the SDK. This means that after installing these, plus dependencies from the Raspbian repo, just one line in stream_infer.py needs to be modified in order to get this example up and running on a Raspberry Pi.
Movidius VPU selected for 4K VR pixel processing in new Motorola Moto Mod. Source: movidius.com
Machine vision applications include:
- Drones and robotics
- Augmented and virtual reality
- Smart security
It’s easy to see how the Myriad VPU could be put to use in security cameras that, for example, identify a vehicle parked in the driveway or distinguish a burglar from a pet. You could equally imagine it adding a great deal of value in a household robot — e.g. vacuum cleaner — and drone applications, where you might want to avoid or seek out certain objects. These are just some of the uses for the technology and there are quite clearly going to be a great deal more.
The Movidius technology is already enabling the practical application of AI in many real world applications and continues to find use in cutting edge products, such as 4K VR pixel processing for smartphone add-ons and sense-and-avoid systems for drones. The availability of the Neural Compute Stick means that anyone can immediately start to experiment with the Myriad 2 VPU, and with ease add powerful deep learning capabilities to existing embedded platforms.