The Technical Challenges Involved in Creating the DS AR Application
When approaching any new technology, there are always technical challenges that present themselves throughout development, and the DesignSpark Augmented Reality (DS AR) application was no different.
DS AR (developed by Austella) utilises two key technologies: AR and Optical Character Recognition (OCR). AR injects digital content into the real world as seen on a device screen, whilst OCR allows a device to pick out text from an image (such as a camera feed).
The powerful combination of the two technologies allows users to effortlessly ‘scan’ an RS Stock Number or manufacturer part number (MPN) from printed text or monitors, and instantly access information on the product such as technical specifications, pricing, stock availability, and the 3D CAD model. This 3D CAD model can then be ‘placed’ in the real world to allow the user to walk around and inspect the prospective product.
Augmented Reality Summary
If you haven’t read the Augmented Reality DesignSpark article by Mark Cundle of Austella, I would highly recommend it. Below is a quick recap of AR with some technical insight.
Augmented Reality (AR) is still a relatively new technology for developers, especially when it comes to the consumer market. Many well-known brands are beginning to investigate and adopt AR solutions that can be deployed on the mobile devices of their consumers to (for example) offer extended experiences of their existing products, or give a new interactive way to engage with existing content.
AR can be broken down into 3 separate categories:
Snapchat's early environmental effects employed a simple form of AR, overlaying digital content and using the phone's rotation state to shift the effects appropriately with the environment. Source: Engadget
This form of AR overlays 3D digital content onto a real-world image, such as a mobile camera feed, to inject digital augmentations into the user's view. It is the easiest form of AR to implement, but by far the least impressive and interactive. Some mobile video games and social apps use this option, sometimes in combination with device rotation, as it is the quickest to achieve and the most predictable in terms of behaviour. It relies on no real-world information: the camera feed is literally used as a backdrop, like a green screen in the film industry.
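To make the "device rotation" idea concrete, here is a minimal sketch of how an overlay might be shifted on screen as the phone rotates. It assumes a simple pinhole camera model, and the function name, the 60 degree field of view and the sign conventions are all illustrative assumptions rather than anything taken from a particular app.

```python
import math


def overlay_offset(yaw_deg, pitch_deg, screen_w, hfov_deg=60.0):
    """Return the (dx, dy) pixel shift for an overlay.

    yaw_deg and pitch_deg describe how far the device has rotated since
    the effect was pinned. hfov_deg is an assumed horizontal field of view.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (screen_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Turning the phone to the right moves the content left on screen.
    dx = -focal_px * math.tan(math.radians(yaw_deg))
    # Tilting the phone upwards moves the content down on screen
    # (screen y grows downwards).
    dy = focal_px * math.tan(math.radians(pitch_deg))
    return dx, dy
```

Note that nothing here looks at the camera image itself. Only the rotation sensor is used, which is why this approach is so predictable and also why the content cannot react to the real environment.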
Image recognition based AR used in an art gallery to display appropriate information. Source: Wikipedia
Often similar in appearance to QR codes, AR markers are specially designed images that an application can recognise and use as a real-world anchor point. Typically, a user would have a physical printout of the marker, and when this comes into the camera’s field of view, a digital anchor point is created to follow the marker. This allows the 3D digital object to seemingly be placed on the marker, giving the digital object the correct position and rotation in relation to the mobile device. Some marker-based solutions now offer image recognition functionality, allowing developers to use more natural images, such as magazine advertisements, as a marker rather than a blocky code-type marker.
It’s a huge step up in interactivity from the simple AR implementation, and uses real-world data to augment reality. Due to the marker tracking, the user can physically move around the marker to navigate around the digital content on screen.
This is becoming a common form of AR, with brands distributing leaflets, web links to printouts, and various other media as companions to their AR applications.
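The way a marker gives digital content "the correct position and rotation" can be sketched as a coordinate transform. The example below is a simplified illustration only: it assumes the tracking library has already reported the marker's position and its rotation about the vertical axis relative to the camera, and the function name and axis conventions are hypothetical.

```python
import math


def marker_to_camera(point, marker_yaw_deg, marker_pos):
    """Map a point given in marker coordinates into camera coordinates.

    marker_yaw_deg: the marker's rotation about the vertical (Y) axis.
    marker_pos: the marker's (x, y, z) position in the camera frame.
    """
    x, y, z = point
    c = math.cos(math.radians(marker_yaw_deg))
    s = math.sin(math.radians(marker_yaw_deg))
    # Rotate about the Y axis, then translate by the marker position.
    xr = c * x + s * z
    zr = -s * x + c * z
    tx, ty, tz = marker_pos
    return (xr + tx, y + ty, zr + tz)
```

A model vertex 10 cm above the marker, for example, stays 10 cm above it however the marker is turned, because the vertex is always expressed relative to the marker first and only then moved into the camera's frame.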
Simultaneous Localisation and Mapping (SLAM)
SLAM AR used by an application to place furniture in an environment. Source: Wikipedia
SLAM, often dubbed markerless AR, is the most premium form of AR currently available for mobile devices. It uses algorithms to process each frame from the mobile camera feed to rebuild a digital point cloud map of the environment; in other words, it simultaneously localises the device and maps the environment. The code is then able to keep track of the user’s device in both the real-world and digital environments, allowing 3D objects to be ‘placed’ into the real world on the device screen.
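Once the device's position is being tracked, 'placing' an object comes down to keeping it at a fixed point in the mapped world and re-projecting it onto the screen every frame. The sketch below illustrates only that last step, under simplifying assumptions: a pinhole camera, rotation about the vertical axis only, and hypothetical names and default values. The hard part of SLAM, estimating the camera pose from the camera feed, is not shown.

```python
import math


def project_anchor(anchor, cam_pos, cam_yaw_deg,
                   focal_px=1000.0, width=1080, height=1920):
    """Project a fixed world-space anchor onto the screen.

    Camera frame convention (assumed): x right, y up, z forward.
    Returns (u, v) in pixels, or None if the anchor is behind the camera.
    """
    dx = anchor[0] - cam_pos[0]
    dy = anchor[1] - cam_pos[1]
    dz = anchor[2] - cam_pos[2]
    c = math.cos(math.radians(cam_yaw_deg))
    s = math.sin(math.radians(cam_yaw_deg))
    # Undo the camera's rotation to express the anchor in the camera frame.
    x_c = c * dx - s * dz
    z_c = s * dx + c * dz
    if z_c <= 0:
        return None
    u = width / 2 + focal_px * x_c / z_c
    v = height / 2 - focal_px * dy / z_c
    return (u, v)
```

The anchor never moves. Only the camera pose changes from frame to frame, so as the user walks around, the object appears to stay put in the room.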
The latest breakthrough in SLAM comes from Apple with their native ARKit recently announced for iOS 11. This will allow many iOS developers to begin creating SLAM AR applications, although only for a small selection of Apple products.
Some larger brands are beginning to explore SLAM to create apps that allow their users to place digital representations of their products, such as furniture, into the real world. In the DS AR application's case, these are 3D CAD models from the RS Components catalogue.
Of course, there are many more forms of AR that you may have seen, such as Microsoft’s HoloLens (technically Mixed Reality), but these are the 3 most common forms of AR in the mobile device market.
One of the core goals of the DS AR app was to provide an efficient and easy-to-use application that can be used anywhere, without restrictions. To this end, we chose to implement SLAM, empowering the user to use the app without the need for an AR marker, wherever they are: at work, at home, even on their commute! It is also the most interactive and impressive form of AR available, and we felt it was important that users coming across from other services found the same high-quality experience in the DS AR application.
Optical Character Recognition
Optical Character Recognition (OCR) allows devices to read text from an image. Some applications are beginning to utilise it to let their users scan in long voucher codes and other text that can be difficult or time-consuming to read and type by hand.
Looping back to one of the goals of the DS AR app, we wanted the user to have an efficient and easy-to-use experience throughout. Whilst the application allows the user to manually search for an RS Stock Number or MPN with their device's keyboard, we were aware that some of the manufacturer numbers were quite long and could be time-consuming to type, especially with frequent use of the application. To remove this limitation, we implemented a custom OCR solution that allows the app to ‘scan’ RS Stock Numbers and MPNs, cutting down the time it takes to reach the key product information and AR content the user wants.
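Raw OCR output usually contains everything the camera can see, so a 'scan' feature needs some way of picking the product number out of the surrounding text. The sketch below shows one possible approach using a regular expression. The stock number format (three digits, an optional hyphen, four digits) and the function name are assumptions made purely for illustration, not the rules used by the DS AR app.

```python
import re

# Assumed format for illustration only: "123-4567" or "1234567".
STOCK_NUMBER = re.compile(r"\b(\d{3})-?(\d{4})\b")


def find_stock_numbers(ocr_text):
    """Return candidate stock numbers found in OCR text, as "123-4567"."""
    return [f"{first}-{second}"
            for first, second in STOCK_NUMBER.findall(ocr_text)]


# Example usage with some made-up label text.
candidates = find_stock_numbers("RS Stock No. 123-4567  Qty 5")
```

MPNs are harder to match this way because every manufacturer uses its own format, so in practice the candidates would most likely need to be checked against the catalogue.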
Now that you’ve been given an insight into the different types of AR and a summary of OCR, you’re probably wondering what technical challenges they pose when it comes to development.
To allow the widest number of users to access the application, we decided from the beginning of development that we wanted to target both Android and iOS devices. This first challenge required us to decide on our development approach – did we want to create a single application that was compatible with both platforms, or did we want to build 2 versions of the application, one for each platform?
Clearly, our preferred answer was to create a single application that was compatible with both platforms. This left us with a few choices of development environment, each with its own problems, most notably:
- Unity Engine
  - Pro: Our developers are very familiar with this engine.
  - Pro: AR SDKs available for the engine with runtime model capabilities.
  - Con: No OCR solution available for the engine.
- Xamarin cross-platform developer
  - Pro: AR SDKs available for this environment.
  - Pro: OCR SDK available for this environment.
  - Con: AR SDKs had limitations in terms of dynamically loading 3D models at run-time. This issue would have forced us to embed several thousand 3D models into the application, which is not practical when the resulting app would be extremely large on the user’s device.
  - Con: Our developers were not fully familiar with the environment.
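The run-time loading point is worth illustrating, as it drives the app size concern. The sketch below shows the general idea of fetching a model only when it is first requested and keeping a small number in memory, rather than bundling every model with the app. It is an illustrative pattern only: the class name, the `fetch` callback and the capacity are hypothetical, and it does not describe how either SDK or the DS AR app actually loads models.

```python
from collections import OrderedDict


class ModelCache:
    """Load 3D models on demand and keep only the most recently used."""

    def __init__(self, fetch, capacity=3):
        self._fetch = fetch          # callable: stock number -> model data
        self._capacity = capacity    # maximum number of models kept in memory
        self._models = OrderedDict()

    def get(self, stock_number):
        if stock_number in self._models:
            # Already loaded: mark it as most recently used.
            self._models.move_to_end(stock_number)
            return self._models[stock_number]
        # Not loaded yet: fetch it now (for example, from a server).
        model = self._fetch(stock_number)
        self._models[stock_number] = model
        if len(self._models) > self._capacity:
            # Drop the least recently used model to bound memory use.
            self._models.popitem(last=False)
        return model
```

With this kind of approach, the installed app stays small however large the catalogue grows, at the cost of a short wait the first time each model is viewed.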
From the simple breakdown above, the Unity Engine was our development environment choice, but this left us with a large hole in our development goals – with no OCR solution available for Unity Engine, how were we going to implement OCR for our end users?
To achieve OCR, we worked with a small external development team to create a custom OCR module, the first of its kind to work with Unity Engine on both Android and iOS. Whilst there are some OCR solutions built for native iOS and native Android, there was no easy way to integrate these into a Unity C# project, so we created our own version of the Tesseract open-source API. A software wrapper was then created so that the application can call the same logic on both platforms.
To ensure we chose the right AR SDK for the application, we went through a lengthy due diligence process, trying out different SDKs, pushing them to their limits, and experimenting to see if they fit our requirements. Whilst there are some really good marker-based solutions, there are only a handful of SDKs capable of performing fully markerless SLAM. I was sceptical of the markerless solutions at first, but after using them and implementing our chosen SDK (Wikitude) into the final application, I’m very impressed with how far Augmented Reality has come.
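The wrapper idea can be sketched as a common interface with one implementation per platform. The example below is written in Python purely to show the shape of the pattern, whereas the real module sits behind Unity C#. All class and function names are hypothetical, and the two backends are stubs standing in for the native code.

```python
from abc import ABC, abstractmethod


class OcrBackend(ABC):
    """The single interface the application code talks to."""

    @abstractmethod
    def recognise(self, image_bytes: bytes) -> str:
        """Return the text found in an encoded camera frame."""


class AndroidOcrStub(OcrBackend):
    def recognise(self, image_bytes: bytes) -> str:
        # A real backend would pass the frame to the native Android library.
        return f"android-stub:{len(image_bytes)}"


class IosOcrStub(OcrBackend):
    def recognise(self, image_bytes: bytes) -> str:
        # A real backend would pass the frame to the native iOS library.
        return f"ios-stub:{len(image_bytes)}"


_BACKENDS = {"android": AndroidOcrStub, "ios": IosOcrStub}


def create_ocr_backend(platform: str) -> OcrBackend:
    """Pick the backend once; callers never branch on platform again."""
    try:
        return _BACKENDS[platform.lower()]()
    except KeyError:
        raise ValueError(f"No OCR backend for platform: {platform}") from None
```

The benefit is that the rest of the application only ever calls `recognise`, so the scanning logic is written once and the platform differences stay hidden inside the wrapper.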
This powerful combination of OCR and AR has allowed us to achieve a very efficient, interactive experience for the end user that can cut down their time on day-to-day tasks by providing an effortless interface for searching the RS Components catalogue, and displaying all available information for any given product. Users quite literally have the entirety of the RS Components catalogue at their fingertips!