Towards gesture-controlled computers with real-time structured light

A novel set of decoding equations helps PCs understand hand movements by speeding up acquisition, processing, and display of 3D data.
11 May 2010
Kai Liu, Yongchang Wang, Daniel Lau, Laurence Hassebrook and Qi Hao

Many areas of science and industry require 3D measurements made in real time. One example is a human-to-computer interface where hand gestures are used to control a computer. Structured-light illumination (SLI)1 is a noncontact, optical, active, triangulation-based 3D reconstruction technique that is well known for its simple implementation, low cost, and high accuracy.2 However, the processing rates that real-time operation demands have, until now, proved unattainable.

Structured light works on the same principle as stereo vision, where an object's coordinate in 3D space is derived by triangulating between pixels of two cameras. SLI avoids the computational complexities of matching pixels across camera views by replacing one of the two component cameras with a projector that generates a series of striped patterns. By analyzing the change in the pattern at a particular point on the target object's surface (a process known as demodulating the captured images), unique correspondences can be derived between the camera and projector pixels. However, decoding the projected patterns and deriving absolute 3D coordinates for each pixel of the captured images are two significant bottlenecks that limit the achievable pixel rates. Many solutions have been proposed to overcome these issues,3 including new pattern-decoding algorithms4 and the use of a dedicated graphics-processing unit,5 but the improvements these methods offer are limited because of the complexity associated with classic demodulation. We have derived a novel set of decoding equations that can be implemented efficiently using a series of look-up tables (LUTs).
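
As an illustration of the kind of striped patterns a projector might display, the following minimal sketch generates a set of phase-shifted sinusoidal stripes. It assumes 8-bit grayscale patterns whose intensity varies along the projector's row coordinate; the function name make_patterns and its parameters are hypothetical and are not taken from the system described here.

```python
import numpy as np

def make_patterns(width, height, n_patterns, freq=1):
    """Return n_patterns phase-shifted sinusoidal stripe images (uint8).

    Pattern n has intensity 0.5 + 0.5*cos(2*pi*freq*y - 2*pi*n/n_patterns),
    where y is the normalized row coordinate in [0, 1).
    """
    y = np.arange(height) / height                 # normalized row position
    patterns = []
    for n in range(n_patterns):
        shift = 2.0 * np.pi * n / n_patterns       # phase shift of pattern n
        row = 0.5 + 0.5 * np.cos(2.0 * np.pi * freq * y - shift)
        img = np.tile(row[:, None], (1, width))    # same value along each row
        patterns.append(np.round(255.0 * img).astype(np.uint8))
    return np.stack(patterns)                      # shape (n, height, width)
```

For example, make_patterns(1024, 768, 3, freq=16) would give three patterns with 16 stripe periods each. Because every camera pixel then sees a sinusoid over the pattern index, the phase of that sinusoid encodes which projector row illuminated the pixel.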

Our system comprises a high-speed camera/projector pair (see Figure 1) that is synchronized by a software-controlled trigger and coupled to a four-core central processing unit. In operation, the system projects a series of well-designed time-multiplexed patterns at a rate of up to 150 frames per second (fps), which is the maximum allowable frame rate of the monochrome projection unit for eight-bits-per-pixel video. The computer then performs a real-time decoding algorithm on the captured images to extract coordinate information regarding the surface shape of the scanned objects at a benchmarked rate of up to 230fps for 640×480-pixel video frames. Finally, the measured 3D data-point clouds are displayed in real time at a maximum frame rate of 40fps (the maximum rate of the LCD).


Figure 1. Real-time structured-light-illumination 3D scanner.

Processing in the host PC is divided across four threads. The first thread grabs images from the camera. The second and third generate the phase map and compute the absolute 3D coordinates. Finally, the fourth thread displays the measured 3D data in real time. For deriving phase, we take advantage of the fixed bit depth of the camera sensor by building a LUT instead of employing a time-consuming arctangent process. To derive the absolute 3D coordinates from the phase information, we eliminated the usual steps of matrix inversion and multiplication by expanding the matrix equations into a complex form. This form is divided into two parts: one containing static information about the camera calibration, and the other containing dynamic phase information. As such, the first part of the computation can be replaced by seven new LUTs, with the output of this processing subsequently combined with the phase information extracted from our earlier LUT, using one division, four multiplications, and four additions to produce absolute 3D point clouds. At no point does the use of LUTs reduce the accuracy of the calculations, since we have one LUT entry for every possible bit combination of input pixel values.
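
One way such a static/dynamic split could look is sketched below. This is a derivation under stated assumptions, not necessarily the authors' exact equations: it assumes pinhole models with 3×4 projection matrices Mc (camera) and Mp (projector), and a projector row coordinate yp already recovered from phase. All names (build_tables, reconstruct, solve_pixel) and the calibration numbers are hypothetical. Each world point is written as M + N·T with T = 1/(C·yp + 1), where the seven per-pixel tables (three channels of M, three of N, and C) depend only on calibration.

```python
import numpy as np

def build_tables(Mc, Mp, width, height):
    """Precompute seven per-pixel tables: M (3 channels), N (3 channels), C."""
    A_inv = np.linalg.inv(Mc[:, :3])
    origin = -A_inv @ Mc[:, 3]                    # camera centre, world coords
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    d = pix @ A_inv.T                             # viewing-ray direction per pixel
    # Projector row constraint along the ray X = origin + t*d:
    #   yp = (a2 + t*g2) / (a3 + t*g3)
    a2 = Mp[1, :3] @ origin + Mp[1, 3]
    a3 = Mp[2, :3] @ origin + Mp[2, 3]
    g2 = d @ Mp[1, :3]                            # assumed nonzero for all pixels
    g3 = d @ Mp[2, :3]                            # assumed nonzero for all pixels
    C = -g3 / g2
    M = origin + (-a3 / g3)[..., None] * d
    N = (a3 / g3 - a2 / g2)[..., None] * d
    return M, N, C

def reconstruct(M, N, C, yp):
    """Per pixel: 1 division, 4 multiplications, 4 additions."""
    T = 1.0 / (C * yp + 1.0)
    return M + N * T[..., None]

def solve_pixel(Mc, Mp, xc, yc, yp):
    """Reference: the usual 3x3 linear solve for a single pixel."""
    rows = np.stack([Mc[0] - xc * Mc[2], Mc[1] - yc * Mc[2], Mp[1] - yp * Mp[2]])
    return np.linalg.solve(rows[:, :3], -rows[:, 3])

# Hypothetical calibration: camera at the world origin, projector offset from it.
Kc = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Kp = np.array([[1000.0, 0, 512], [0, 1000, 384], [0, 0, 1]])
Mc = np.hstack([Kc, np.zeros((3, 1))])
Mp = Kp @ np.hstack([np.eye(3), np.array([[0.0], [50.0], [50.0]])])

M, N, C = build_tables(Mc, Mp, 640, 480)

# Synthetic scene: a flat wall 900 units in front of the camera.
xs, ys = np.meshgrid(np.arange(640), np.arange(480))
wall = np.stack([(xs - 320) / 800 * 900, (ys - 240) / 800 * 900,
                 np.full(xs.shape, 900.0)], axis=-1)
h = wall @ Mp[:, :3].T + Mp[:, 3]                 # project the wall into the projector
yp = h[..., 1] / h[..., 2]                        # projector row seen by each pixel
cloud = reconstruct(M, N, C, yp)
```

In this sketch the tables are built once after calibration, so the per-frame cost is independent of the calibration matrices; solve_pixel is included only as a slow reference for checking that both routes give the same point.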

To limit the effects of sensor noise on the resulting 3D point clouds, we further developed a novel four-pattern strategy that employs a high-frequency sinusoidal grating that, during modulation, produces an additive-noise-suppressed, high-quality, wrapped phase signal. Embedded with this high-frequency sinusoid is a unit-frequency sinusoid whose coarse signal is used during demodulation to produce an instantly decodable, unwrapped phase term so that the process of unwrapping the higher-frequency phase is avoided. Figure 2 (left) shows a static object and a moving hand scanned using this new pattern strategy, while Figure 2 (right) shows live real-time capture and display of two isolated hands, where 3D point clouds are shown using pseudocolor depth rendering. Traditional phase unwrapping would make the differentiation of two discontinuous objects in absolute space impossible. As a further demonstration, the front, top, and side views of 3D reconstructed point clouds for a moving hand are shown in Figure 3.
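
The role of the unit-frequency term can be illustrated with the standard two-frequency unwrapping step, sketched below. This assumes the wrapped high-frequency phase and the unit-frequency phase have already been extracted from the captured patterns; it does not reproduce how the four-pattern scheme embeds or separates the two sinusoids. The function name and arguments are hypothetical.

```python
import numpy as np

def unwrap_with_unit_frequency(phi_high, phi_unit, freq):
    """Resolve the fringe order of a wrapped high-frequency phase, pixel by pixel.

    phi_high : wrapped phase of the high-frequency sinusoid, in [0, 2*pi)
    phi_unit : coarse phase of the unit-frequency sinusoid, in [0, 2*pi)
    freq     : number of periods of the high-frequency sinusoid
    Returns the phase on the unit-frequency scale, with the noise level
    of the high-frequency measurement divided by freq.
    """
    # The coarse phase predicts the unwrapped high-frequency phase as
    # freq * phi_unit; rounding the difference gives the integer fringe order.
    k = np.round((freq * phi_unit - phi_high) / (2.0 * np.pi))
    return (phi_high + 2.0 * np.pi * k) / freq
```

Because each pixel is decoded independently of its neighbors, two objects separated by a depth discontinuity (such as two isolated hands) each receive an absolute phase. The rounding step gives the correct fringe order as long as the error in the coarse phase stays below π/freq.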


Figure 2. (left) A static object and a moving hand are scanned using a new pattern strategy. (right) Live real-time capture, reconstruction, and display of two isolated hands, where 3D point clouds are shown in pseudocolor depth rendering.

Figure 3. (top) Front, (middle) top, and (bottom) side view of the 3D reconstructed point clouds for a hand making different gestures.

We applied our LUT-based system to the traditional SLI-pattern strategy of phase-measuring profilometry (PMP).6 We achieved 3D measurement in real time: see Figure 4, which compares the benchmarked performances for both phase generation and 3D point cloud reconstruction for PMP using three to six component patterns. Using six patterns is known to produce point clouds that are less susceptible to distortion caused by sensor noise, but three-pattern PMP is less sensitive to object motion given the shorter scan period. As expected, the time to perform phase generation is negatively impacted by an increase in the number of projected patterns given the larger LUTs, while the processing load associated with generating the 3D point cloud depends only on the number of pixels per frame.
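
For three-pattern PMP, replacing the arctangent with a table could be sketched as follows. The sketch assumes 8-bit images of the form I_n = A + B·cos(φ − 2πn/3), for which φ = atan2(√3·(I1 − I2), 2·I0 − I1 − I2); the sign convention, the indexing of the table by the two integer differences, and all function names are assumptions rather than details taken from the article.

```python
import numpy as np

def build_phase_lut():
    """Tabulate atan2 for every integer pair (I1 - I2, 2*I0 - I1 - I2)."""
    u = np.arange(-255, 256)        # all possible values of I1 - I2
    v = np.arange(-510, 511)        # all possible values of 2*I0 - I1 - I2
    return np.arctan2(np.sqrt(3.0) * u[:, None], v[None, :])  # (511, 1021)

def phase_from_lut(lut, i0, i1, i2):
    """Wrapped phase using only integer arithmetic and one table lookup."""
    i0, i1, i2 = (img.astype(np.int32) for img in (i0, i1, i2))
    return lut[i1 - i2 + 255, 2 * i0 - i1 - i2 + 510]

def phase_direct(i0, i1, i2):
    """Reference: per-pixel arctangent."""
    i0, i1, i2 = (img.astype(np.float64) for img in (i0, i1, i2))
    return np.arctan2(np.sqrt(3.0) * (i1 - i2), 2.0 * i0 - i1 - i2)

# Synthetic 640x480 frames with a known phase at every pixel.
rng = np.random.default_rng(0)
truth = rng.uniform(-np.pi, np.pi, size=(480, 640))
frames = [np.round(128 + 100 * np.cos(truth - 2 * np.pi * n / 3)).astype(np.uint8)
          for n in range(3)]

lut = build_phase_lut()
phase = phase_from_lut(lut, *frames)
```

Since the table holds an entry for every possible pair of integer differences, the lookup returns the same value as the direct arctangent. With more patterns the combined terms span a wider integer range, so a table built this way grows, which is in line with the larger LUTs mentioned above.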


Figure 4. Performance of solutions based on look-up tables for three-, four-, and six-pattern phase-measuring profilometry. (left) Phase generation and (right) 3D reconstruction.

In summary, we have introduced a series of LUTs to efficiently implement a novel set of decoding equations for real-time 3D shape measurement. With this SLI system in our arsenal, our research group is now working on a real-time human/computer interface based upon hand gestures.


Kai Liu, Yongchang Wang, Daniel Lau, Laurence Hassebrook
Department of Electrical and Computer Engineering
University of Kentucky
Lexington, KY
Qi Hao
Department of Electrical and Computer Engineering
University of Alabama
Tuscaloosa, AL
