Future Hardware Research

A project log for SNAP: Augmented Echolocation

Sightless Navigation And Perception (SNAP) translates surroundings into sound, providing continuous binaural feedback about the environment.

Colin Pate • 10/19/2017 at 23:41 • 0 Comments

Since building our hardware prototype, a great deal of our research has centered on improving the real-world performance of the SNAP concept. The single greatest limitation of our current prototype is probably the amount of noise and incorrectly measured points in the depth map; the depth camera's limited field of view is a close second. We've considered two different options to address these limitations: a newer depth camera, and stereo visual odometry.
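To make the noise problem concrete, here's a minimal sketch, in Python with OpenCV, of the kind of per-frame cleanup a depth map like ours typically needs. The function name and 5-pixel kernel are illustrative choices on our part, not part of any camera SDK:

    import cv2
    import numpy as np

    def clean_depth(depth_mm: np.ndarray) -> np.ndarray:
        """Suppress speckle noise in a 16-bit depth map (0 = no measurement)."""
        # A median filter knocks out isolated bad pixels without smearing
        # object edges the way a box or Gaussian blur would.
        filtered = cv2.medianBlur(depth_mm, 5)
        # Keep unmeasured pixels marked invalid so the audio rendering
        # can skip them instead of sonifying garbage.
        filtered[depth_mm == 0] = 0
        return filtered

Filtering like this suppresses speckle, but it can't recover points the sensor never measured, which is exactly why we're looking at better imaging hardware.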

Intel's New Depth Cameras

We recently received an email from Intel informing us that the RealSense Robotic Development Kit is being phased out and sold at a discounted price in anticipation of their newest RealSense units. We promptly bought another RealSense RDK and began researching the new depth camera offerings. They look like they could be a decent platform, with a few potential issues. Shown below is the Intel RealSense D435.

The listed FOV is 91.2 degrees horizontal by 65.5 degrees vertical, almost twice the FOV of the R200 we are currently using. In addition, depth output is now up to 1280×720 at 90 FPS, up from 640×480 at 30 FPS. However, this model is pricier than the old one: the camera alone costs $180, and no development kits are listed yet. This could be a very promising platform for SNAP version 2 if the image quality is high and development kits become available.
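For reference, here's a minimal sketch, assuming Intel's pyrealsense2 Python bindings, of requesting the new depth mode and reading back one distance. The stream parameters come from the spec above; everything else is illustrative:

    import pyrealsense2 as rs

    # Request the D435's advertised depth mode: 1280x720, Z16, 90 FPS.
    # (If the camera can't actually stream this combination, start() raises.)
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 90)
    pipeline.start(config)
    try:
        frames = pipeline.wait_for_frames()
        depth = frames.get_depth_frame()
        # Distance in meters at the center pixel.
        print("distance at center: %.2f m" % depth.get_distance(640, 360))
    finally:
        pipeline.stop()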

Stereo Visual Odometry

Another, completely different option we've considered is rolling our own stereo visual odometry (SVO) solution. In SVO, two cameras facing the same direction are used to compute a depth map: matching points are found in the two images, and depth is computed from the disparity (the offset between the matched points), much as human eyes gauge distance. The image below demonstrates feature matching between the two images. Source: http://www.mtbs3d.com/phpBB/viewtopic.php?f=138&t=18055
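Here's a minimal sketch of that matching step, using OpenCV's semi-global block matcher. The parameter values, focal length, and baseline are illustrative defaults, not numbers tuned for any particular rig:

    import cv2
    import numpy as np

    # Rectified grayscale frames from the left and right cameras.
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Semi-global block matching finds, for each left-image pixel, the
    # best-matching pixel along the same row in the right image.
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # search range in pixels; must be divisible by 16
        blockSize=5,
    )
    # compute() returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0

    # With focal length f (pixels) and baseline B (meters) known from
    # calibration, depth follows directly from disparity: Z = f * B / d.
    f, B = 700.0, 0.06   # illustrative values for a small stereo rig
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = f * B / disparity[valid]

One design note: numDisparities bounds how close an object can get before matching fails, so it would need tuning for near-field obstacle detection.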

This imaging technique could allow us to use wide-angle lenses, or even multiple cameras, to provide a very wide field of view. However, it also requires far more computational power than a bundled depth camera solution, and it depends on visible light for reliable output.

With either of these imaging solutions, an ideal application would be a real-time SLAM (simultaneous localization and mapping) algorithm that allows SNAP to remember rooms, objects, and places, and to use this data to provide clear and timely cues to the wearer about their surroundings. This is a very computationally difficult problem, but it's never too early to make design decisions with this potential in mind. It's clear that SNAP only stands to benefit from the growing corporate and public interest in computer vision and depth cameras.
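As a flavor of what "remembering" might look like, here's a minimal sketch of accumulating depth pixels into a persistent voxel map, assuming some odometry front end already supplies a camera pose. The intrinsics, pose convention, and 10 cm voxel size are all illustrative assumptions:

    import numpy as np

    VOXEL = 0.10      # map resolution in meters (illustrative)
    occupied = set()  # persistent set of voxel indices seen so far

    def integrate(depth_m, fx, fy, cx, cy, pose):
        """Back-project one depth frame into the world map.

        depth_m: HxW float array of depths in meters (0 = invalid)
        fx, fy, cx, cy: pinhole intrinsics of the depth camera
        pose: 4x4 camera-to-world transform from the odometry front end
        """
        v, u = np.nonzero(depth_m)   # pixel coords of valid measurements
        z = depth_m[v, u]
        # Pinhole back-projection into the camera frame.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        pts_cam = np.stack([x, y, z, np.ones_like(z)])
        pts_world = (pose @ pts_cam)[:3].T  # rotate/translate into world frame
        # Quantize to voxels so revisited surfaces don't grow the map forever.
        for idx in map(tuple, np.floor(pts_world / VOXEL).astype(int)):
            occupied.add(idx)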
