Getting set up with an Nvidia Jetson Nano and experimenting with Machine Learning to create a person-following robotic trolley.
Machine learning and AI are all the rage right now within the tech industry. The barrier to entry to the world of AI can initially seem high but, in fact, it can be quite the opposite.
In this project, we’re aiming to build a trolley that can follow a person — think of something along the lines of a garden trolley, but motorised and with a computer vision system that can identify and follow a person. We’re using a Jetson Nano as this is a platform well-suited to embedded computer vision and machine learning, featuring an Nvidia Maxwell GPU and a quad-core ARM processor.
The first thing we did was unpack the Jetson Nano Developer Kit, which includes the Jetson Nano module fastened to a carrier board. The carrier board breaks out an M.2 slot for storage, two MIPI-CSI camera connectors, 4K-capable HDMI and DisplayPort outputs, four USB 3.0 ports, a Raspberry Pi-compatible GPIO header, Ethernet, and two power inputs for flexibility: plenty of I/O for most applications, and what you’d typically expect on a single-board computer.
A 5V power supply is needed to run the board at full performance; we picked a 4A-capable supply to give plenty of headroom. A Raspberry Pi Camera Module v2 or a USB webcam is also required to capture a live video stream.
To get started, Nvidia has provided a clear guide that details the getting started steps. We deviated from the guide slightly to follow the note on switching the power supply over to the barrel jack input. This is detailed within the Jetson Developer Kit User Guide and consists of moving jumper J48 (located just next to the barrel jack) to connect the two pins.
With the power input reconfigured, the SD card can be written with the Nvidia-provided image. This is quite a large (6GB) ZIP archive which unpacks to 16GB, so be sure you have plenty of disk space before starting! The file can then be written to an SD card using a tool such as Balena Etcher or Win32DiskImager, a process that should be familiar to anyone who has used a Raspberry Pi or another SBC before.
The rest of the setup consisted of selecting a few options — the defaults are a good choice — and then updating the system with “sudo apt update && sudo apt upgrade”.
Running an Example
Nvidia has provided a range of examples tailored to running on the Jetson Nano, including a short course titled “Two Days to a Demo” which walks through setting up the environment, running some short examples and then building upon knowledge gained from the short examples to run more complex examples.
To verify our setup, we used the “Hello AI World” example set, which demonstrates deep learning object inference. To get started, we followed the steps listed in the GitHub repository under “Building the Project from Source”, which clone the repo, install the necessary prerequisites, and then build all the examples.
This takes some time, as models have to be downloaded and a lot of code compiled. By default, not all of the image recognition models are selected, but should you change your mind later, the “download-models.sh” script can be run again to change the selection.
With everything compiled and installed, we ran an example that uses a Pi camera plugged into the development kit. The first run with any model that has not been used before takes some time, as the model’s network file is optimized and then cached to disk to speed up future loading.
With this demonstration working, we moved on to thinking about how we would achieve person tracking without using any sort of facial recognition.
Object tracking and identification is a well-solved problem, one that is already widely used in industrial machine vision applications, such as robots on production lines that need to move items or reject parts coming down a conveyor belt. It can be done using tags stuck onto objects, particularly when motion-tracking a single item; however, the problem we’re tackling shouldn’t require the user to wear a tag, and should identify them using only what they’re wearing.
Facial recognition provides a way to identify a person and track them, but it would not work in this situation: the trolley will be following behind a person, so their face will not be visible. This restricted somewhat the methods available to achieve our goal.
Some judicious searching returned the term “person re-identification”, which is described by ScienceDirect as being “the problem of recognizing an individual captured in diverse times and/or locations over several non-overlapping camera views”. This is closer to the issue we are trying to solve (person identification) but again different — we have only one camera angle and no diverse times.
Yolo v4 + DeepSORT
Object tracking is the closest match to the problem we’re trying to solve. With another liberal application of Google-fu, we found this article that uses “DeepSORT” to perform object tracking. This looked to be along the right lines, so with yet more searching we found a GitHub repository that looked suitable; in one of the example videos, people are identified and tracked as they move throughout the scene.
The example also demonstrates a box being drawn around the people, which brought us very close to being able to then centre a specific person in the middle of the scene by moving the robot.
Documentation in the repository also looked good enough to get started with, so we decided to give it a try. Unfortunately, it appears as though this particular program would not work — the Jetson Nano could not handle the large “YOLOv4” model but did manage to process the “YOLOv4-tiny” model, and then choked when trying to run the actual object detection script. This may not be the case with a more powerful Jetson SoM, or of course a PC that is equipped with a PCIe Nvidia GPU.
We next tried another program from this GitHub repository that uses a “YOLOv4” model. Again, documentation was clear with a script provided to get started.
With the GitHub repository cloned, we could begin the installation. There is a setup script located at “scripts/install_jetson.sh”, which we started with. This looked promising until it reached the point of installing PyCUDA, which failed with a missing header. Again, searching around online led us to this Nvidia forum thread, where another GitHub repository was linked that contained a script specifically to install PyCUDA.
Having successfully installed PyCUDA, we re-ran the original install script, which got further before falling over while installing another dependency. This was caused by a missing header file called “xlocale.h”, and again the fix came from more searching. The accepted answer from this Stack Overflow post worked, and consisted of running “ln -s /usr/include/locale.h /usr/include/xlocale.h” to create an “xlocale.h” symbolic link pointing at the existing “locale.h” header.
With the symbolic link in place, we ran the original “install_jetson.sh” script again. This time all the dependencies installed, and a lot of time was then spent compiling LLVM; this takes a few hours on the Jetson Nano, so it’s well worth walking away and finding something else to fill your time.
With everything installed, we downloaded the necessary models, which should have been as easy as running “./scripts/download_models.sh”. Or so we thought. The script installs a Python library called “gdown”, which handles downloading large files from Google Drive; this should appear as an executable in the shell but, alas, it didn’t. More digging around returned the directory where pip-installed executables are kept, which in our case was “/home/jetson/.local/bin”.
With the script modified (any instance of “gdown” used as an executable was replaced with “/home/jetson/.local/bin/gdown”), all the models were downloaded. Some more compilation then happens to build the TensorRT plugin, as per the documentation in the repository, and then the program is ready to be run.
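An alternative to editing the script, assuming pip installed “gdown” into the default per-user location, is to put that directory on the shell’s search path so the bare “gdown” command resolves:

```shell
# Put pip's per-user bin directory on PATH so "gdown" can be found.
# Assumes the default user-level install location (~/.local/bin).
export PATH="$HOME/.local/bin:$PATH"
```

Adding that line to “~/.bashrc” makes the change persist across sessions.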
We launched the program with the command “python3 app.py --input_uri csi://0 --mot --gui” from the desktop environment. This took quite some time, as all ~700MB of models were compiled. It didn’t work initially, but a fix is already documented here on the project’s GitHub issue tracker: changing one line across two files from “1 << 30” to “1 << 28” fixed the issue.
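The fix makes more sense with the numbers unpacked: the two constants are byte counts, and (our assumption, based on the nature of the failure) they set a memory budget used when building the optimized model, so the change cuts it to a quarter of the original, which suits the Nano’s smaller memory:

```python
# The constants in the fix are byte counts expressed as powers of two.
original = 1 << 30  # 1073741824 bytes = 1 GiB
reduced = 1 << 28   # 268435456 bytes = 256 MiB

print(original // (1024 ** 2))  # 1024 (MiB)
print(reduced // (1024 ** 2))   # 256 (MiB)
print(original // reduced)      # 4
```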
Re-running the command to launch the program took some more time (compilation had been interrupted previously) and eventually a window appeared, displaying the camera feed. This was a good sign. We then moved into the camera’s field of view and were greeted with a box and a number! This was a success, showing that the network is capable of detecting and tracking people as they move around. Performance was not stellar, but this is to be expected: the documentation specifies a Jetson TX2/Xavier/Xavier NX, and the Jetson Nano has less processing power available.
Now that we had a working tracking solution, we could begin modifying the code to output coordinates that can then be used to command the robot electronics; this shouldn’t be too difficult, as the example code already draws boxes around tracked items.
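As a sketch of that next step: given a tracked bounding box, the horizontal offset of its centre from the image centre can be turned into a simple steering error. Everything here (the function name, the frame width, the box format) is our own illustration rather than the tracker’s actual API:

```python
def steering_error(bbox, frame_width=1280):
    """Return a steering error in [-1, 1] from a tracked bounding box.

    bbox is (x_min, y_min, x_max, y_max) in pixels; negative means the
    person is left of centre, positive means right. Illustrative only:
    the box format and frame width are assumptions, not the tracker's API.
    """
    x_min, _, x_max, _ = bbox
    box_centre = (x_min + x_max) / 2
    frame_centre = frame_width / 2
    return (box_centre - frame_centre) / frame_centre

# A person tracked slightly right of centre:
print(steering_error((700, 200, 900, 700)))  # 0.25
```

A value of 0 would mean the person is already centred; the robot’s motor controller could then turn towards positive values and away from negative ones.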
In this post, we have touched on getting started with the Nvidia Jetson Nano, taken a look at a variety of methods for tracking a person within a frame, and got a FastMOT-based person tracker up and running. In part 2, we will take a look at building the frame for the robot.