3D image data is increasingly being used in stereoscopic display systems, virtual reality systems, robotics, and so forth. The methods used to acquire 3D information about objects are mostly ranging methods based on triangulation [1] and time of flight [2]. The 3D data is generally stored as ordinary 2D images plus range information, a combination that is not versatile enough to be used easily across the many potential applications. For many of these, it is preferable to define the 3D image in a common 3D Cartesian coordinate system.
The Axi-Vision Camera [3] was developed for the production of broadcast television. Using the time-of-flight method, it can simultaneously capture both an ordinary high-definition television (HDTV) color image and a range image of the scene at video frame rates (see Figure 1). Initially, use of the 3D information was restricted to depth keying, where the range (depth) information is used to combine two different images [3,4]. However, we recognized that, were the Axi-Vision 3D data held in a more versatile form, it would have potential applications in various kinds of stereoscopic displays. We have therefore been working with the camera's 3D image data in a more flexible way, and have applied the results to a stereoscopic display using an integral photography (IP) system [5]. As well as demonstrating 3D data acquisition and display, this shows that we have addressed and solved the mismatch between the image capture and display systems, a problem that had previously seemed unavoidable.
Figure 1. The Axi-Vision image data is composed of both a color and a range image.
Figure 2 shows the perspective-projection model used to acquire 3D image data I(X, Y, Z) from an ordinary color image i(x, y) and the corresponding range image R(x, y), where (X, Y, Z) and (x, y) are specified in the camera coordinate system and in the image coordinate system on the CCD image plane, respectively. The image intensity (color and brightness) and the range of the point (X, Y, Z) are projected as i(x, y) and R(x, y), respectively, onto the point (x, y) of the image plane. Z was approximated by R(x, y), which can be obtained for each pixel without any complex processing. X, Y and Z were derived by perspective projection theory as:
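Under the standard pinhole model, and assuming that x and y are pixel indices counted from the optical axis (an assumption not stated here), relations of the following form apply:

```latex
\begin{aligned}
X &= \frac{k_x\, x}{f}\, Z, \\
Y &= \frac{k_y\, y}{f}\, Z, \\
Z &\approx R(x, y),
\end{aligned}
```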
where f is the focal length of the camera lens, and kx and ky are the pixel spacings of the CCD in the x and y directions, respectively. Based on these relations, each image intensity i(x, y) was mapped onto the corresponding point (X, Y, Z), resulting in the 3D image data I(X, Y, Z) defined in the camera coordinate system.
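The mapping from (x, y) to (X, Y, Z) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the usual pinhole relations X = kx·x·Z/f and Y = ky·y·Z/f with Z taken as R(x, y), and it assumes the optical axis passes through the centre of the image. The function name `back_project` and its arguments are hypothetical.

```python
import numpy as np

def back_project(color, rng, f, kx, ky):
    """Map each pixel of a color image to a 3D point in camera coordinates.

    color: (H, W, C) array, the image i(x, y)
    rng:   (H, W) array, the range image R(x, y)
    f:     focal length; kx, ky: pixel spacings (same length unit as f)
    Assumption: the optical axis passes through the image centre.
    """
    h, w = rng.shape
    # Pixel indices measured from the assumed principal point.
    x = np.arange(w) - (w - 1) / 2.0
    y = np.arange(h) - (h - 1) / 2.0
    xg, yg = np.meshgrid(x, y)

    Z = rng.astype(float)        # Z is approximated by R(x, y)
    X = kx * xg * Z / f          # assumed pinhole relation
    Y = ky * yg * Z / f

    points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    colors = color.reshape(-1, color.shape[-1])
    return points, colors
```

The result is a colored point set, one point per pixel, which is one simple way to hold I(X, Y, Z) in a common Cartesian frame.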
Figure 2. The perspective projection model.
A stereoscopic video image display has been successfully demonstrated with the 3D image data on an IP system, which combines a 4-inch VGA (640×480 pixel) liquid crystal color display panel (LCD) and a 64×48 pinhole array with a spacing of 5.75mm. First, the 3D coordinates were converted in parallel, using up/down scaling transformations, into coordinates for which the 3D image was appropriate for the LCD size. Then, the IP images were created. One of these, shown in Figure 3(a), was composed of 64×48 elemental images of 10×10 pixels, each of which was generated for the corresponding pinhole using the transformed 3D image data. That the IP images displayed on the LCD and viewed through the pinhole array are stereoscopic can be confirmed from the difference between Figure 3(b), a view from the left, and Figure 3(c), a view from the right.
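One way to generate elemental images of this kind is to trace a ray from each 3D point through each pinhole to the LCD plane behind it. The sketch below is an assumption-laden illustration rather than the method actually used: the function name `render_ip_image`, the pinhole-to-LCD distance `gap`, the default parameter values, and the simple nearest-point-wins occlusion rule are all hypothetical, and image axis orientation conventions are ignored.

```python
import numpy as np

def render_ip_image(points, colors, n_pin=(64, 48), n_sub=10, pitch=1.0, gap=1.0):
    """Render an IP image by projecting 3D points through each pinhole.

    points: (N, 3) array in display coordinates, origin at the centre of the
            pinhole array, larger Z nearer the viewer.
    colors: (N, 3) array of per-point colors.
    n_pin:  number of pinholes in (x, y); n_sub: pixels per elemental-image side.
    pitch:  pinhole spacing; gap: pinhole-to-LCD distance (hypothetical values).
    """
    nx, ny = n_pin
    pix = pitch / n_sub                          # LCD pixel size
    image = np.zeros((ny * n_sub, nx * n_sub, 3), dtype=colors.dtype)
    nearest = np.full(image.shape[:2], -np.inf)  # Z of the point drawn so far

    keep = points[:, 2] != 0                     # points in the pinhole plane are skipped
    X, Y, Z = points[keep, 0], points[keep, 1], points[keep, 2]
    cols = colors[keep]

    for j in range(ny):
        for i in range(nx):
            # Pinhole centre in display coordinates.
            hx = (i - (nx - 1) / 2.0) * pitch
            hy = (j - (ny - 1) / 2.0) * pitch
            # Where the line through each point and this pinhole meets the LCD,
            # in pixels relative to the centre of the elemental image.
            u = (hx - X) * gap / Z / pix
            v = (hy - Y) * gap / Z / pix
            cu = np.floor(u + n_sub / 2.0).astype(int)
            cv = np.floor(v + n_sub / 2.0).astype(int)
            ok = (cu >= 0) & (cu < n_sub) & (cv >= 0) & (cv < n_sub)
            for k in np.nonzero(ok)[0]:
                r, c = j * n_sub + cv[k], i * n_sub + cu[k]
                if Z[k] > nearest[r, c]:         # point nearer the viewer wins
                    nearest[r, c] = Z[k]
                    image[r, c] = cols[k]
    return image
```

With the default 64×48 pinholes and 10×10 pixels per elemental image, the output is 640×480 pixels, matching the VGA panel described above.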
Figure 3. (a) IP image and reproduced stereoscopic images (b) from left viewpoint and (c) right viewpoint.
There was a mismatch between the image resolutions of the Axi-Vision camera and the IP system, which resulted in aliasing in the reproduced stereoscopic image because of discrete spatial sampling by the pinhole array. This aliasing was effectively suppressed by pre-processing the color Axi-Vision image data with a two-dimensional lowpass filter. The filter was designed based on analyses of the maximum spatial frequency of the 3D image, appropriately transformed for the LCD size, and of the Nyquist spatial frequency of the IP system.
By storing the 3D image data in this more versatile form, we should therefore be able to use the Axi-Vision camera to feed almost any kind of IP-based 3D display.
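A pre-filter of this kind can be sketched as a separable windowed-sinc lowpass filter. The filter type is not specified here, so the Hamming-windowed sinc design, the function names `lowpass_kernel` and `lowpass_2d`, and the tap count are all assumptions. The cutoff is expressed in cycles per image pixel and would come from the Nyquist analysis: a sampling array with spacing p has a Nyquist frequency of 1/(2p).

```python
import numpy as np

def lowpass_kernel(cutoff, taps=21):
    """1D windowed-sinc lowpass kernel; cutoff in cycles per pixel (0 to 0.5)."""
    n = np.arange(taps) - (taps - 1) / 2.0
    # Ideal lowpass impulse response, tapered by a Hamming window.
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(taps)
    return h / h.sum()                     # unity gain at zero frequency

def lowpass_2d(channel, cutoff, taps=21):
    """Filter one image channel along rows, then columns (taps must be odd)."""
    k = lowpass_kernel(cutoff, taps)
    pad = taps // 2
    # Replicate edge pixels so the output keeps the input size.
    padded = np.pad(channel.astype(float), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```

As a purely illustrative case, if ten image pixels span one pinhole spacing after scaling, the sampling limit is 1/(2×10) = 0.05 cycles per pixel, and each color channel would be filtered with `lowpass_2d(channel, 0.05)`.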
Osaka City University
Tahito Aida is a professor of electronic circuits at Osaka City University in Osaka. Research interests include image processing, optical devices and micro electro-mechanical systems (MEMS).
3. M. Kawakita, K. Iizuka, T. Aida, H. Kikuchi, H. Fujikake, J. Yonai, K. Takizawa, Axi-Vision Camera (real-time distance-mapping camera), Applied Optics 39, no. 22, pp. 3931-3939, 2000.