Electronic Imaging & Signal Processing

Multi-view dynamic geometry capture using structured light

A low-cost system composed of multiple cameras and projectors positioned around a central performance area captures the dynamic geometry of a scene.
19 February 2013, SPIE Newsroom. DOI: 10.1117/2.1201301.004724

In structured light (SL) systems, a projector illuminates a scene or an object with special patterns that are then captured by one or several cameras. By determining which projector pixel each camera pixel observes (i.e., the correspondence), the 3D position of each point in the scene can be triangulated. SL systems are sometimes referred to as markerless motion capture systems because they do not use LED lights or markers on an object to recover or track its motion. The Kinect, a device used with Microsoft's Xbox 360 video-game console, is perhaps the most commonly used SL system. For gesture recognition in gaming and entertainment applications, it projects an IR pattern that is invisible to the human eye, which minimizes visual interference with the scene.
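The triangulation step can be illustrated with a minimal sketch. It assumes that calibration has already converted a camera pixel and its corresponding projector pixel into two 3D rays (an origin and a direction each); the function name and the example coordinates are hypothetical and are not taken from the system described here. Because measured rays rarely intersect exactly, the sketch returns the midpoint of the shortest segment joining them.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two 3D rays.

    c1, c2: ray origins (camera and projector centers).
    d1, d2: ray directions (need not be normalized).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undefined")

    # Parameters of the closest points along each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * v for o, v in zip(c1, d1)]
    p2 = [o + t * v for o, v in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Hypothetical setup: camera at the origin, projector 1 unit to its right,
# both observing a surface point at (0.5, 0, 2).
point = triangulate_midpoint(
    [0.0, 0.0, 0.0], [0.5, 0.0, 2.0],
    [1.0, 0.0, 0.0], [-0.5, 0.0, 2.0],
)
```

In practice the ray directions come from the calibrated intrinsic and extrinsic parameters of each device, which is why accurate calibration matters so much for the recovered geometry.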

Figure 1. System layout with the target positioned in multiple orientations at the center of the scene. C: Camera. P: Projector.

Most existing SL systems, including the Kinect, consist of one station, which correspondingly captures the 3D depth of a scene from a single view. Over the past five years, we have been developing a multi-station SL (MS-SL) system consisting of off-the-shelf commercially available cameras and projectors to capture the full 360° dynamic geometry of a moving scene surrounded by multiple stations. Each station in our three-station SL system consists of one projector and three cameras (see Figure 1). Two of the three cameras are exclusively used to capture the projector pattern illuminated on the scene whereas the third camera is used to capture the texture information of the scene.

The main challenge in the design and implementation of an MS-SL system is the light interference between the various stations' projectors. One way to get around this is to use different wavelengths for the different projectors. However, this would necessitate specialist projector and camera hardware that is not readily available. Accordingly, we have opted to use a temporal multiplexing approach with round-robin scheduling in which each station is turned on only a fraction of the time. This approach results in a reduced capture rate. For a system with N stations, the effective capture rate is 1/N of the equivalent single-station system. In a temporally multiplexed MS-SL system, each station only captures a partial view of the scene at a specific instant in time. Thus, these views need to be spatio-temporally registered and merged to reconstruct the complete intended geometry across time. Doing so requires all the projectors and cameras at the various stations to be accurately calibrated prior to data capture.
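The round-robin idea and the resulting 1/N capture rate can be sketched in a few lines. The function names and the 60 frames-per-second figure below are illustrative assumptions, not measurements from the system.

```python
def round_robin_schedule(num_stations, num_slots):
    """Index of the station whose projector is active in each time slot."""
    return [slot % num_stations for slot in range(num_slots)]


def effective_capture_rate(single_station_rate, num_stations):
    """Per-station capture rate when the stations share time equally."""
    return single_station_rate / num_stations


# Three stations taking turns over seven time slots.
schedule = round_robin_schedule(3, 7)

# Assumed single-station rate of 60 captures per second (hypothetical).
rate = effective_capture_rate(60.0, 3)
```

Because only one projector is lit in any slot, the views produced by different stations are offset in time, which is why they must be registered temporally as well as spatially.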

Figure 2. (a) Vertical binary (Bv) coded projection patterns. (b) Three phase-shifted sinusoidal patterns.

Figure 3. Projection of sphere for the proposed method along the (a) x-axis, (c) y-axis, (e) z-axis, and for pairwise calibration for the (b) x-axis, (d) y-axis, (f) z-axis. (g) Zoomed-in portion of (a). (h) Zoomed-in portion of (b). (i) The points for each camera (Cam) are assigned a unique color.

We have recently developed a method to simultaneously calibrate all cameras and projectors in an MS-SL system.1 Our method uses a translucent sheet mounted on a plastic frame as the calibration target. The target is placed at the center of the capture area (see Figure 1) so that it is visible to all projectors and cameras. Simple binary patterns, shown in Figure 2(a), are then projected onto the target by each projector, and the resulting images are captured by all cameras to determine the pixel-to-pixel correspondences between cameras and projectors. The translucent planar sheet is then rotated to multiple positions and, at each position, the above correspondence process is repeated. The translucent nature of our target allows it to be simultaneously visible to cameras on both sides of the sheet during pattern projection, which enables us to generate correspondences across nearly all devices surrounding the capture area.
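As a rough illustration of how binary patterns yield correspondences, the sketch below decodes the projector column seen by one camera pixel. It assumes a plain binary code projected most significant bit first and a fixed per-pixel intensity threshold. The article does not specify the exact coding or thresholding scheme, so these choices and the function names are hypothetical.

```python
def pattern_bit(column, pattern_index, num_patterns):
    """Bit (0 or 1) that a projector column shows in a given pattern.

    Pattern 0 carries the most significant bit of the column index.
    """
    shift = num_patterns - 1 - pattern_index
    return (column >> shift) & 1


def decode_binary_column(intensities, threshold):
    """Recover the projector column from one camera pixel's intensities.

    intensities: values observed at the pixel, one per projected pattern,
    in projection order (most significant bit first).
    """
    column = 0
    for value in intensities:
        bit = 1 if value > threshold else 0
        column = (column << 1) | bit
    return column


# A pixel that appears bright, dark, bright, bright under four patterns
# is observing projector column 0b1011.
column = decode_binary_column([200, 30, 220, 210], threshold=128)
```

Repeating the decoding for every camera pixel, and for every target orientation, produces the camera-to-projector correspondences that the calibration uses as input.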

One of the main advantages of this calibration strategy is that all cameras and projectors in the MS-SL system are calibrated simultaneously rather than by the traditional method of pairwise calibration. Figure 3 compares the calibration error of a sphere between our proposed and pairwise calibration methods for the three-station system depicted in Figure 1. The alignment error for our method is considerably lower than for pairwise calibration. An example of a dynamic point cloud of a moving human actor captured by our calibrated three-station SL system can be viewed online.2

Another challenge for any single- or multiple-station SL system is to reconstruct the 3D depth of the scene in a temporally coherent fashion to avoid unpleasant flickering artefacts. To address this issue, we developed a temporally coherent 3D reconstruction method for phase-shifted sinusoidal (PSS) SL systems.3 PSS patterns are commonly used in SL systems owing to their fast capture time and low decoding complexity. The main idea is to sequentially project three PSS patterns, shown in Figure 2(b), onto the scene and to capture camera images after each projection. Recovered phase values for each point in the scene can then be used to establish a correspondence between each camera pixel and a projector column and, hence, determine depth. However, since the projected sinusoidal pattern is periodic with many periods, the phase needs to be ‘unwrapped’ to determine the true correspondence. Essentially, the ‘wrapped’ phase is constrained to the values within a single period of our projected pattern. The transitions between neighboring periods produce discrete jumps in the phase images, which are seen as jumps in color from red to blue. An ‘unwrapped’ phase is a continuous function that produces a seamless transition in the pixel colors (see Figure 4).
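The phase recovery and the resulting ambiguity can be illustrated with the standard three-step phase-shifting formula. The sketch assumes the three patterns are shifted by -2π/3, 0, and +2π/3, a common convention that the article does not confirm for this system; the intensity model and all names are illustrative.

```python
import math

SHIFT = 2.0 * math.pi / 3.0


def simulate_intensities(phase, offset=100.0, amplitude=50.0):
    """Intensities a pixel would record under the three shifted patterns."""
    return (
        offset + amplitude * math.cos(phase - SHIFT),
        offset + amplitude * math.cos(phase),
        offset + amplitude * math.cos(phase + SHIFT),
    )


def wrapped_phase(i1, i2, i3):
    """Wrapped phase in (-pi, pi] from three phase-shifted intensities."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


# Two surface points one full pattern period apart give the same
# intensities and therefore the same wrapped phase.
phi_a = wrapped_phase(*simulate_intensities(1.0))
phi_b = wrapped_phase(*simulate_intensities(1.0 + 2.0 * math.pi))
```

Both calls return approximately 1.0 radian, which shows why the wrapped phase alone cannot identify the projector column: the period index must be recovered by unwrapping.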

In most existing PSS systems, phase unwrapping is performed on individual frames separately, resulting in disturbing flicker artefacts. To alleviate this problem, we developed a temporally coherent approach to phase unwrapping whereby we stack consecutive wrapped phase images into a volume of phase data and unwrap over the x, y, and time axes simultaneously. Our approach is inspired by existing 3D phase unwrapping algorithms typically used in magnetic resonance imaging.4 Essentially, we locally unwrap pixels according to a measured confidence value and create chains of unwrapped pixels in the phase volume. For each pixel, we calculate the probability of taking on the unwrapped phase from each possible period using stereo observations from the second camera. As each new pixel is added to each chain, we merge the connected pixel probabilities to estimate the true unwrapped phase of the entire chain. A video sequence demonstrating temporal consistency of our proposed phase unwrapping process for a scene consisting of a moving human actor can be viewed online.5
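For intuition about what unwrapping along an axis means, the sketch below applies the classic one-dimensional (Itoh-style) rule to the wrapped phase of a single pixel over consecutive frames. This is a deliberately simplified stand-in, not the confidence-ordered, stereo-assisted volume method described above: it assumes the true phase changes by less than π between neighboring samples and uses no probabilities. The same rule applies along the x or y axis.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase):
    """Wrap a phase value into (-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def unwrap_sequence(wrapped):
    """Unwrap a 1D sequence of wrapped phases (Itoh-style).

    Whenever two neighboring samples differ by more than pi, a multiple
    of 2*pi is added so that the sequence becomes continuous.
    """
    unwrapped = [wrapped[0]]
    offset = 0.0
    for previous, current in zip(wrapped, wrapped[1:]):
        difference = current - previous
        if difference > math.pi:
            offset -= TWO_PI
        elif difference < -math.pi:
            offset += TWO_PI
        unwrapped.append(current + offset)
    return unwrapped


# One pixel whose true phase grows steadily across 20 frames.
true_phase = [0.5 * frame for frame in range(20)]
observed = [wrap(p) for p in true_phase]
recovered = unwrap_sequence(observed)
```

A single noisy sample can mislead this simple rule and corrupt every later value in the sequence, which is one reason for preferring a confidence-ordered approach that draws on additional evidence such as the second camera.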

Figure 4. Phase image (a) before phase unwrapping and (b) after phase unwrapping.

In summary, we have developed an MS-SL system that uses readily available hardware, a novel calibration method, and a temporally coherent phase unwrapping approach to produce clear, artefact-free renderings of a scene. Our future work involves developing algorithms to efficiently merge the independently captured geometries to create a dynamic, closed mesh of the objects or humans in the scene.

Ricardo R. Garcia, Avideh Zakhor
Electrical Engineering & Computer Sciences
University of California (UC)
Berkeley, CA

Ricardo R. Garcia received his BS in electrical engineering from The University of Texas at Austin in 2007. He went on to complete his MS in electrical engineering at UC Berkeley in 2009. Currently, he is a PhD candidate whose research interests are in computer vision, with an emphasis on structured light systems and dynamic geometry capture.

Avideh Zakhor holds the Qualcomm Chair in the electrical engineering department at UC Berkeley where she has been a faculty member since 1988. Her interests are in the areas of video and image processing, and 3D computer vision. She is a fellow of the Institute of Electrical and Electronics Engineers (IEEE).

1. R. R. Garcia, A. Zakhor, Geometric calibration for a multi-camera-projector system, IEEE Workshop Appl. Comput. Vision (WACV), 2013.
2. Video of dynamic geometry from multi-view structured light system. Credit: Ricardo R. Garcia, UC Berkeley. http://spie.org/documents/newsroom/videos/4724/MultiViewDynamicGeometry_Garcia.wmv
3. R. R. Garcia, A. Zakhor, Temporally-consistent phase unwrapping for a stereo-assisted structured light system, Int'l Conf. 3D Imaging Model. Process. Visual. Transm., p. 389-396, 2011. doi:10.1109/3DIMPVT.2011.56
4. H. S. Abdul-Rahman, M. A. Gdeisat, D. R. Burton, M. J. Lalor, F. Lilley, C. J. Moore, Fast and robust three-dimensional best path phase unwrapping algorithm, Appl. Opt. 46(26), p. 6623-6635, 2007. doi:10.1364/AO.46.006623
5. Video of temporally-consistent phase unwrapping for a stereo-assisted structured light system. Credit: Ricardo R. Garcia, UC Berkeley. http://spie.org/documents/newsroom/videos/4724/PhaseUnwrap Garcia.wmv