Movidius Takes Deep Learning to The Edge

Andrew Back 25 Aug 2017

Neural Compute Stick makes AI possible in low power embedded applications.

Artificial intelligence (AI) is one of those technologies that, like clean energy from nuclear fusion, has for decades promised a profound impact on society, and advances made in recent years have finally seen AI put to use in a growing number of practical applications. However, an enduring challenge has been that many key applications, such as vision processing, are computationally intensive, which has ruled out local processing in low power devices.

Movidius — now an Intel company — is changing this with its Myriad 2 Vision Processing Unit (VPU), which delivers software programmable machine vision with an ultra low power budget.

In this post we take a first look at the Myriad VPU technology and associated SDK, and get hands-on via the Neural Compute Stick (NCS) which enables low cost development and prototyping.

100Gflops for 1 watt

Source: Movidius Myriad 2 VPU Product Brief

The Movidius Neural Compute Stick (139-3655) delivers fantastic performance for vision processing applications: over 100GFlops for only around 1 watt of power consumption. It achieves this via a 600MHz system-on-chip (SoC) that integrates 12 VLIW 128-bit processors, centred around 2MB of on-chip memory with a 400Gbps transfer rate.
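
To put the headline figure in perspective, the numbers quoted above can be turned into energy per operation and operations per clock cycle. This is a back-of-envelope sketch only: it treats 100GFlops and 1 watt as exact, sustained values, which is an assumption, and the function names are illustrative.

```python
# Figures quoted in the article; treated here as exact for the arithmetic.
FLOPS_PER_SECOND = 100e9   # "over 100GFlops"
POWER_WATTS = 1.0          # "around 1 watt"
CLOCK_HZ = 600e6           # 600MHz SoC clock
NUM_PROCESSORS = 12        # VLIW 128-bit processors


def energy_per_flop_picojoules(flops_per_second, watts):
    """Energy spent per floating-point operation, in picojoules."""
    return watts / flops_per_second * 1e12


def flops_per_cycle(flops_per_second, clock_hz):
    """Floating-point operations completed per clock cycle, chip-wide."""
    return flops_per_second / clock_hz


per_cycle = flops_per_cycle(FLOPS_PER_SECOND, CLOCK_HZ)
print("Energy per FLOP (pJ):", energy_per_flop_picojoules(FLOPS_PER_SECOND, POWER_WATTS))
print("FLOPs per cycle, whole chip:", round(per_cycle, 1))
print("FLOPs per cycle, per processor:", round(per_cycle / NUM_PROCESSORS, 1))
```

Under those assumptions this works out at roughly 10 picojoules per floating-point operation, and around 14 operations per processor per clock cycle.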

Key features include:

Support for FP16, FP32 and integer operations with 8-, 16- and 32-bit accuracy
Real-time, on-device inference without Cloud connectivity
Deploy existing convolutional neural network (CNN) models or uniquely trained networks
All data and power provided over a single USB 3.0 port on a host PC
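
For a feel of what FP16 means in practice, the sketch below rounds a few values to half precision using Python's standard struct module (the "e" format code, available from Python 3.6). It illustrates the number format only and does not involve the NCS or the SDK; to_fp16 is a hypothetical helper.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # Packing with the "e" format stores the value in 16 bits;
    # unpacking returns the (possibly rounded) value as a Python float.
    return struct.unpack("<e", struct.pack("<e", value))[0]


for original in (1.0, 0.1, 0.606, 2049.0):
    rounded = to_fp16(original)
    print(original, "->", rounded, "(error", abs(rounded - original), ")")
```

Half precision carries roughly three significant decimal digits, and values beyond 65504 cannot be represented (struct raises OverflowError for them).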

While it is equipped with USB 3.0, a USB 2.0 port is likely to suffice for many uses. Note also that for increased performance multiple NCS can be networked via a suitable USB hub.

Caffe support

Support for deep learning applications is provided via the open source Caffe framework that is developed by Berkeley Artificial Intelligence Research (BAIR) Lab. Caffe’s guiding principles are:

Expression: models and optimizations are defined as plaintext schemas instead of code.
Speed: for research and industry alike speed is crucial for state-of-the-art models and massive data.
Modularity: new tasks and settings require flexibility and extension.
Openness: scientific and applied progress call for common code, reference models, and reproducibility.
Community: academic research, startup prototypes, and industrial applications all share strength by joint discussion and development in a BSD-2 project.

The project website provides a tour of the Caffe anatomy and functions. If, like me, you have little experience with AI technology, there is quite a lot to take on board. However, fear not, as the Movidius Neural Compute (NC) SDK packages the framework together with hardware support, plus example applications that make use of existing neural network models. In other words, you can be up and running in no time, evaluating the performance of networks and the NCS hardware.

SDK install

NCS_example031_b6dc86b7d1ac74ed95bfb384e88e9a3386b55ad6.jpg

A computer running Ubuntu 16.04 is required in order to install the NC SDK. It may be that other distributions could be made to work, but this is the only one specified.

What follows is a summary of steps taken and it is recommended to download the Getting Started Guide PDF and other official documentation for more detailed instructions.

Once downloaded the NC SDK can be installed with:

$ sudo apt-get update

$ sudo apt-get upgrade

$ tar xvf MvNC_SDK_1.07.07.tgz

$ tar xzvf MvNC_Toolkit-1.07.06.tgz

$ cd bin

$ ./setup.sh

This may take a little while, as the toolkit is installed together with its associated dependencies. Note that the setup script updates your ~/.bashrc file to set the PYTHONPATH environment variable accordingly. For example, with the default install location, the line added is:

export PYTHONPATH=$env:"/opt/movidius/caffe/python":$PYTHONPATH

If other users need to use the SDK, they will also need to add this line to their own ~/.bashrc file.
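
Whether the variable has taken effect for a given user can be checked from Python itself. A minimal sketch, assuming the default install location shown above; caffe_on_path is an illustrative helper and not part of the SDK.

```python
import os
import sys

# Default location taken from the line added to ~/.bashrc.
CAFFE_PYTHON_DIR = "/opt/movidius/caffe/python"


def caffe_on_path(path_entries, caffe_dir=CAFFE_PYTHON_DIR):
    """Return True if caffe_dir appears among the given path entries."""
    wanted = os.path.normpath(caffe_dir)
    # Empty entries are skipped; normpath makes trailing slashes irrelevant.
    return any(os.path.normpath(entry) == wanted for entry in path_entries if entry)


if caffe_on_path(sys.path):
    print("Caffe Python directory found on sys.path")
else:
    print("Caffe Python directory missing; check PYTHONPATH in ~/.bashrc")
```

Run with python3 from a fresh shell, this reports whether the directory from PYTHONPATH has made it onto the interpreter's module search path.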

Assuming we’re still in the bin directory, we can then install Caffe models for the example code:

$ cd data

$ ./dlnets.sh

If we go back to the directory where the SDK was first extracted to, we can now install the API:

$ cd ../..

$ tar xzvf MvNC_API-1.07.07.tgz

$ cd ncapi

$ ./setup.sh

There is a series of examples that can then be run from the bin directory to test NCS operation:

$ cd ../bin

$ make example00

$ make example01

$ make example02

$ make example03

Video stream infer example

coffee_mug1_63fc1f088f0459a8af395ba3ceb5d5c611083cf9.jpg

Python stream_infer example

Now we can get on to a more interesting example! This requires a video input device and we decided to use the Logitech C920 Full HD Webcam (125-4272), as it has excellent Linux support.

$ cd ../ncapi/tools

$ ./get_models.sh

$ ./convert_models.sh

$ cd ../py_examples/stream_infer

$ python3 stream_infer.py

shower_cap1_f06ff3be07988e14156d30a3bb279ef114eebc22.jpg

Ouch! 60.60% shower cap?! OK, let’s not shoot the messenger: the NCS was simply doing the heavy lifting, and we have the convolutional neural network (CNN) used to thank for the inference made, which, to be fair, is perhaps in some way understandable. In any case, we can see above that the same example did much better with a coffee mug, and it did similarly well with other objects.
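
A result such as "60.60% shower cap" is simply the highest-scoring entry in the network's output. A minimal sketch of that ranking step in plain Python follows: the labels and probabilities are made up for illustration, and top_k is a hypothetical helper, not a function from the NC SDK.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only; a real run would take the labels from
# categories.txt and the probabilities from the network's output.
labels = ["coffee mug", "shower cap", "bathing cap", "wig"]
probabilities = [0.05, 0.606, 0.21, 0.134]

for label, probability in top_k(probabilities, labels, k=3):
    print("{:.2%} {}".format(probability, label))
```

The point is that the NCS only produces the scores; which label ends up on top depends entirely on the network that was loaded onto it.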

What’s important here is the speed at which our AI is able to make inferences, and for the energy used, it certainly feels fast.

The default model employed by the stream_infer.py example is called SqueezeNet, a CNN that achieves accuracy similar to that of AlexNet, which pre-dates it by some four years and which was trained to classify the 1.3 million high-resolution images in the LSVRC-2010 ImageNet training set into 1,000 different classes. SqueezeNet, however, is stated as achieving this with a model that is 510x smaller than AlexNet’s.
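
The 510x figure can be sanity-checked with simple arithmetic. The sketch below uses model sizes as reported in the SqueezeNet paper (240MB for AlexNet, 4.8MB for SqueezeNet, and 0.47MB for SqueezeNet after additional model compression); these numbers come from that paper rather than from anything measured here, so treat them as assumptions.

```python
# Model sizes in megabytes, as reported in the SqueezeNet paper (assumed).
ALEXNET_MB = 240.0
SQUEEZENET_MB = 4.8
SQUEEZENET_COMPRESSED_MB = 0.47


def size_reduction(baseline_mb, model_mb):
    """How many times smaller model_mb is than baseline_mb."""
    return baseline_mb / model_mb


print("Uncompressed: {:.1f}x smaller".format(
    size_reduction(ALEXNET_MB, SQUEEZENET_MB)))
print("Compressed:   {:.1f}x smaller".format(
    size_reduction(ALEXNET_MB, SQUEEZENET_COMPRESSED_MB)))
```

On these figures, the uncompressed SqueezeNet model is about 50x smaller than AlexNet, and the 510x figure applies only once compression is added.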

stream_infer.py can be configured to use either SqueezeNet or AlexNet, allowing their performance on the NCS to be compared. It’s simply a matter of (un)commenting lines near the top of the Python file. There are also Gender and GoogLeNet models that can be configured in the same way. For example:

NETWORK_IMAGE_WIDTH = 224 # the width of images the network requires

NETWORK_IMAGE_HEIGHT = 224 # the height of images the network requires

NETWORK_IMAGE_FORMAT = "RGB" # the format of the images the network requires

NETWORK_DIRECTORY = "../../networks/GoogLeNet/" # directory of the network

The stream_infer.py example will look for "graph", "stat.txt" and "categories.txt" files in NETWORK_DIRECTORY. If we compare the size of the graph file for AlexNet and SqueezeNet:

graphSize1_a2d9b802cedf8eb33b3132a47e8551baf5d499dc.jpg

Not quite a difference of 510x for the binary graph, but still a significant size difference.
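
The same comparison can be scripted. The following is a rough sketch that checks each network directory for the files stream_infer.py looks for and then reports the ratio of the two graph file sizes. The AlexNet and SqueezeNet directory names are assumed to follow the same pattern as the GoogLeNet path shown earlier, and both helper functions are hypothetical.

```python
import os

# Files that stream_infer.py looks for in NETWORK_DIRECTORY.
REQUIRED_FILES = ("graph", "stat.txt", "categories.txt")


def missing_network_files(network_directory):
    """Return the names of required files absent from network_directory."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(network_directory, name))]


def graph_size_ratio(larger_dir, smaller_dir):
    """Size of larger_dir's graph file divided by that of smaller_dir's."""
    larger = os.path.getsize(os.path.join(larger_dir, "graph"))
    smaller = os.path.getsize(os.path.join(smaller_dir, "graph"))
    if smaller == 0:
        raise ValueError("graph file in {} is empty".format(smaller_dir))
    return larger / smaller


# Assumed directory names, modelled on "../../networks/GoogLeNet/".
alexnet_dir = "../../networks/AlexNet/"
squeezenet_dir = "../../networks/SqueezeNet/"

problems = {d: missing_network_files(d) for d in (alexnet_dir, squeezenet_dir)}
if any(problems.values()):
    for directory, missing in problems.items():
        if missing:
            print(directory, "is missing:", ", ".join(missing))
else:
    print("AlexNet graph is {:.1f}x the size of SqueezeNet's".format(
        graph_size_ratio(alexnet_dir, squeezenet_dir)))
```

Run from the stream_infer directory, this either lists what is missing or prints the ratio directly, without needing a directory listing.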

Network compilation and profiling

CompileProfile1_d4b9897b3546f58da417bd4049a2d2c3a6317eb1.jpg

Source: Movidius NCS Toolkit Documentation

New CNNs (e.g. for classifying types of image not already covered by an existing CNN) would be designed and trained using an appropriate framework, after which the network can be compiled to a graph file and profiled using the NCS Toolkit that is supplied as part of the SDK.

Targeting embedded platforms

Although the API must initially be installed on the same computer as the Toolkit (an Ubuntu 16.04 x86-64 system), the libraries, headers and so on can subsequently be installed on other platforms. In fact, a set of packages for Raspbian Jessie is provided with the SDK, which means that after installing these plus dependencies from the Raspbian repo, just one line in stream_infer.py needs to be modified in order to get this example up and running on a Raspberry Pi.

Typical applications

Movidius VPU selected for 4K VR pixel processing in new Motorola Moto Mod. Source: movidius.com

Machine vision applications include:

Drones and robotics
Augmented and virtual reality
Wearables
Smart security

It’s easy to see how the Myriad VPU could be put to use in security cameras that, for example, identify a vehicle parked in the driveway or distinguish a burglar from a pet. You could equally imagine it adding a great deal of value in a household robot — e.g. vacuum cleaner — and drone applications, where you might want to avoid or seek out certain objects. These are just some of the uses for the technology and there are quite clearly going to be a great deal more.

Initial thoughts

The Movidius technology is already enabling the practical use of AI in many real world applications and continues to find its way into cutting edge products, with examples including 4K VR pixel processing for smartphone add-ons and sense-and-avoid systems for drones. The availability of the Neural Compute Stick means that anyone can immediately start to experiment with the Myriad 2 VPU and, with ease, add powerful deep learning capabilities to existing embedded platforms.

— Andrew Back

Open source (hardware and software!) advocate, Treasurer and Director of the Free and Open Source Silicon Foundation, organiser of Wuthering Bytes technology festival and founder of the Open Source Hardware User Group.