- Human Factors
- Stereoscopic Compression
- Stereoscopic Image Processing and Rendering
- Stereoscopic Camera Systems
- Autostereoscopic Displays I
- Autostereoscopic Displays II
- Autostereoscopic Displays III
- Stereoscopic Video
- Integral 3D Imaging
- Stereoscopic Developments I
- Stereoscopic Developments II
- Poster Pop Session
- Synthesis and Design
- Research Programs
- Technology and Applications
- Special Session: Virtual Reality Works
Human Factors
Development and evaluation of amusement machine using autostereoscopic 3D display
Pachinko is a pinball-like game peculiar to Japan, and is one of the most common pastimes in the country. Recently, with the videogame market contracting, various multimedia technologies have been introduced into Pachinko machines. The authors have developed a Pachinko machine incorporating an autostereoscopic 3D display and evaluated its effects on visual function. As of April 2003, the new Pachinko machine has been on sale in Japan. The stereoscopic 3D image is displayed using an LCD. Backlighting for the right and left images is separate, and passes through a polarizing filter before reaching the LCD, which is sandwiched with a micro-polarizer. The content selected for display was ukiyo-e pictures (traditional Japanese woodblock prints). The authors intended to reduce visual fatigue by presenting 3D images with depth "behind" the display and by switching between 3D and 2D images. For evaluation of the Pachinko machine, a 2D version with identical content was also prepared, and the effects were examined and compared by testing psycho-physiological responses.
Perception of 3D spatial relations for 3D displays
We test the perception of 3D spatial relations in 3D images rendered by a 3D display (Perspecta from Actuality Systems) and compare it to that of a high-resolution flat-panel display. 3D images provide the observer with such depth cues as motion parallax and binocular disparity. Our 3D display renders a 3D image by displaying, in rapid succession, radial slices through the scene on a rotating screen. The image is contained in a glass globe and can be viewed from virtually any direction. In the psychophysical experiment several families of 3D objects are used as stimuli: primitive shapes (cylinders and cuboids) and complex objects (multi-story buildings, cars, and pieces of furniture). Each object has at least one plane of symmetry. On each trial an object or its "distorted" version is shown at an arbitrary orientation. The distortion is produced by stretching the object in a random direction by 40%, which eliminates its symmetry. The subject's task is to decide whether or not the presented object is distorted, under several viewing conditions (monocular/binocular, with/without motion parallax, and near/far). The subject's performance is measured by the discriminability d', a conventional dependent variable in signal-detection experiments.
Stereo display for chest CT
Based on the need to increase the efficacy of chest CT for lung cancer screening, a stereoscopic display for viewing chest CT images has been developed. Stereo image pairs are generated from CT data by conventional stereo projection derived from a geometry that assumes the topmost slice being displayed is at the same distance as the screen of the physical display. Image grayscales are modified to make air transparent so that soft tissue structures of interest can be more easily seen. Because the process of combining multiple slices has a tendency to reduce the effective local contrast, we have included mechanisms to counteract this, such as linear and nonlinear local grayscale transforms. The physical display, which consists of a CRT viewed through shutter glasses, also provides for real-time adjustment of displayed thickness and axial position, as well as for changing brightness and contrast. While refinement of the stereo projection, contrast, and transparency models is ongoing, subjective evaluation of our current implementation indicates that the method has considerable potential for improving the efficiency of the detection of lung nodules. A more quantitative effort to assess its impact on performance, by ROC type methods, is underway.
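As a rough illustration of the slice-compositing idea described above (not the authors' actual projection or grayscale model), the sketch below shifts each CT slice horizontally in proportion to its depth below the topmost slice, which stays in the screen plane, and treats air as transparent; the maximum shift, the air threshold, and the demo data are all hypothetical.

```python
import numpy as np

def stereo_pair_from_ct(slices, max_shift_px=8, air_threshold=-400):
    """Composite a top-down stack of axial CT slices into a left/right pair.

    The topmost slice stays in the screen plane (zero parallax); deeper
    slices are shifted horizontally in opposite directions for the two
    eyes, and voxels at or below the air threshold are treated as fully
    transparent so soft-tissue structures show through. Front-to-back
    alpha compositing approximates the overlaying of slices.
    """
    n = len(slices)
    h, w = slices[0].shape
    views = {}
    for sign, eye in ((+1, "left"), (-1, "right")):
        out = np.zeros((h, w))
        acc = np.zeros((h, w))                     # accumulated opacity
        for k, s in enumerate(slices):             # topmost slice first
            shift = sign * int(round(max_shift_px * k / max(n - 1, 1)))
            s = np.roll(s, shift, axis=1)
            alpha = (s > air_threshold).astype(float)
            vis = (1.0 - acc) * alpha              # what is still visible
            out += vis * s
            acc += vis
        views[eye] = out / np.maximum(acc, 1e-6)
    return views["left"], views["right"]

# Hypothetical demo stack of 20 noise "slices" with Hounsfield-like values.
rng = np.random.default_rng(0)
stack = [rng.uniform(-1000.0, 400.0, (64, 64)) for _ in range(20)]
left, right = stereo_pair_from_ct(stack)
```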
Development of a miniaturized system for monitoring vergence during viewing of stereoscopic imagery using a head-mounted display
Head-mounted displays (HMDs) are popular for viewing stereoscopic imagery due to their immersive qualities. However, symptoms and visual problems are commonly associated with their use. The discrepancy between vergence and accommodation cues, present in stereoscopic imagery, has been implicated in these adverse effects. The aim of this investigation was to develop a high-resolution but relatively inexpensive online vergence monitoring system for use within an HMD, to enable important information about the vergence response to be obtained. The new vergence monitoring system used infrared (IR) light-emitting diodes (LEDs) for illumination and miniature CCIR video cameras, one for each eye, to capture images of the eyes. The infrared light reflected from the eyes was directed to the cameras via cube beam splitters, which allowed an uninterrupted line of sight to the HMD screens. An image acquisition board was used to capture the images, and a program was designed using LabVIEW to process them. The resolution was at least 0.2 degrees, which translates to vergence changes of 7 cm from the image plane of the V6 HMD. The vergence monitoring system enables a better understanding of the contribution of accommodation and vergence mismatch to the symptoms and visual problems associated with viewing stereoscopic imagery.
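The quoted resolution can be sanity-checked with basic vergence geometry. The sketch below assumes a 63 mm IPD and an image-plane distance of about 1.1 m (the V6's image-plane distance is not given in the abstract, so this value is an assumption); with those numbers, a 0.2 degree vergence step corresponds to roughly 7 cm of depth, consistent with the figure above.

```python
import math

def vergence_deg(distance_m, ipd_m=0.063):
    """Vergence angle (in degrees) for a fixation target at distance_m."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))

def depth_step_m(distance_m, step_deg=0.2, ipd_m=0.063):
    """Depth change matching one vergence-resolution step at distance_m,
    found by numerically inverting vergence_deg with bisection."""
    target = vergence_deg(distance_m, ipd_m) - step_deg  # relax vergence
    lo, hi = distance_m, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if vergence_deg(mid, ipd_m) > target:
            lo = mid
        else:
            hi = mid
    return lo - distance_m

# With the assumed ~1.1 m image plane, one 0.2 degree step is ~0.07 m:
print(depth_step_m(1.1))
```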
Variation and extrema of human interpupillary distance
Mean interpupillary distance (IPD) is an important and oft-quoted measure in stereoscopic work. However, there is startlingly little agreement on what it should be. Mean IPD has been quoted in the stereoscopic literature as being anything from 58 mm to 70 mm. It is known to vary with respect to age, gender and race. Furthermore, the stereoscopic industry requires information on not just mean IPD, but also its variance and its extrema, because our products need to be able to cope with all possible users, including those with the smallest and largest IPDs. This paper brings together those statistics on IPD which are available. The key results are that mean adult IPD is around 63 mm, the vast majority of adults have IPDs in the range 50-75 mm, the wider range of 45-80 mm is likely to include (almost) all adults, and the minimum IPD for children (down to five years old) is around 40 mm.
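One practical consequence of the extrema, sketched below with audience labels of my own choosing: the uncrossed screen parallax of the farthest scene point should be limited by the smallest IPD in the audience, since parallax beyond a viewer's IPD forces that viewer's eyes to diverge.

```python
def max_uncrossed_parallax_mm(audience="children"):
    """Largest safe uncrossed screen parallax for a given audience.

    A background point whose on-screen parallax exceeds a viewer's IPD can
    only be fused by diverging the eyes, so the design limit is set by the
    smallest IPD expected in the audience (values from the abstract), not
    by the 63 mm adult mean.
    """
    min_ipd_mm = {"most_adults": 50.0,   # vast majority of adults
                  "all_adults": 45.0,    # (almost) all adults
                  "children": 40.0}      # down to five years old
    return min_ipd_mm[audience]

# A display calibrated to the adult mean of 63 mm would exceed the
# divergence limit for every child and for many small-IPD adults.
print(max_uncrossed_parallax_mm("children"))   # 40.0
```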
Stereoscopic Compression
Coding of multiview images
Multi-view images visualized by autostereoscopic displays are a heavy load for networks. The amount of data which has to be stored and transmitted is huge compared to mono-view images. In this paper, we show that the upcoming ITU-T H.264 standard, which is designed to compress moving pictures, is also suited to coding multi-view images after some minor modifications. This upcoming standard is even capable of outperforming the best stereo image coders known so far. Its only shortcomings are that it cannot code multi-view images progressively and that it lacks the multi-resolution property of wavelet-based coders.
Video memory compression for multiview autostereoscopic displays
Nowadays, virtual 3D imaging is very common in various domains, e.g. medical imaging or virtual reality. So far these 3D objects have been projected for display on 2D visualization systems (e.g. a computer monitor or a printed sheet of paper) by the application itself, a graphics library, or specific hardware. New display systems are now appearing that allow computers to display 3D objects in real 3D, often based on the stereo-vision principle, whose ultimate evolution is the multi-view autostereoscopic system, which displays different images at the same time, visible from different positions by different observers. When the number of images grows and these different images are stored directly, the required memory becomes very large. This article proposes an algorithm for coding multi-view stereograms with very low quality loss and very fast and simple decoding, which allows all the stereoscopic images to be computed with little memory. The algorithm projects the objects onto the screen but stores the associated depth of each one. Some background voxels are not erased by foreground voxels even if they are projected to the same point on the screen. All these voxels are sorted in a way that speeds up decoding, which is reduced to only a few memory copies.
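A minimal sketch of the decoding idea as described (the sample layout, the gain, and the linear depth-to-parallax approximation are my own simplifications): because the stored samples are pre-sorted far to near, reconstructing any view reduces to a single pass of depth-shifted writes, and the retained background samples fill in disocclusions.

```python
import numpy as np

def decode_view(samples, view_offset, width, parallax_gain=0.5):
    """Reconstruct one view of a multi-view stereogram from stored
    depth-augmented screen samples (a sketch of the decoding idea only).

    `samples` holds (x, color, depth) tuples for one scanline, pre-sorted
    far to near, with `depth` measured behind the screen plane; crucially,
    it still contains background samples that landed on the same screen
    position as foreground ones. Decoding is then a single pass of
    shifted writes: nearer samples overwrite farther ones, and the kept
    background fills disocclusions as the viewpoint changes.
    """
    line = np.zeros(width)
    for x, color, depth in samples:                 # far-to-near order
        shift = int(round(view_offset * parallax_gain * depth))
        xs = x + shift                              # per-view parallax
        if 0 <= xs < width:
            line[xs] = color
    return line

# Hypothetical scanline: a far wall (depth 4) behind a near block (depth 1).
wall = [(x, 0.3, 4.0) for x in range(64)]
block = [(x, 1.0, 1.0) for x in range(20, 30)]
views = [decode_view(wall + block, v, 64) for v in range(-4, 5)]  # 9 views
```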
Multiresolution image compression using image foveation and simulated depth of field for stereoscopic displays
Spatial contrast sensitivity varies considerably across the field of view, being highest at the fovea and dropping towards the periphery, in accordance with the changing density, type, and interconnection of retinal cells. This observation has enabled researchers to propose the use of multiple levels of detail for visual displays, a technique known as image foveation. These methods offer improved performance when transmitting images across low-bandwidth media by conveying only highly visually salient data in high resolution, or by conveying the most visually salient data first and gradually augmenting it with the periphery. For stereoscopic displays, the image foveation technique may be extended to exploit an additional acuity constraint of the human visual system caused by the focal system: limited depth of field. Images may be encoded at multiple resolutions laterally, taking advantage of the space-variant nature of the retina (image foveation), and may additionally contain blur simulating the limited depth-of-field phenomenon. Since optical blur has a smoothing effect, areas of the image inside the high-resolution fovea but outside the depth of field may be compressed more effectively. The artificial simulation of depth of field is also believed to alleviate symptoms of virtual simulator sickness resulting from accommodation-convergence separation, and to reduce diplopia.
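To make the depth-of-field component concrete, here is a small sketch using the standard small-angle defocus relation (blur angle is approximately pupil diameter times defocus in diopters); the pixels-per-degree figure and the diameter-to-sigma conversion are assumptions of mine, not values from the paper.

```python
import math

def retinal_blur_deg(z_m, focus_m, pupil_mm=4.0):
    """Angular diameter (degrees) of the retinal blur circle for a point
    at z_m when the eye accommodates to focus_m, via the small-angle
    relation: blur angle = pupil diameter x defocus in diopters."""
    defocus_diopters = abs(1.0 / z_m - 1.0 / focus_m)
    return math.degrees((pupil_mm / 1000.0) * defocus_diopters)

def codec_blur_sigma_px(z_m, focus_m, px_per_deg=30.0):
    """Gaussian sigma (in display pixels) that a depth-of-field simulation
    could apply at simulated depth z_m without exceeding the eye's own
    optical blur; such regions then tolerate stronger compression. The
    px_per_deg value and the diameter-to-sigma factor are assumptions."""
    return retinal_blur_deg(z_m, focus_m) * px_per_deg / 2.0

# A point 0.5 m behind a 1 m fixation blurs by about 0.08 degrees:
print(retinal_blur_deg(1.5, 1.0))
```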
Stereoscopic Image Processing and Rendering
Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV
This paper presents details of a system that allows for an evolutionary introduction of depth perception into the existing 2D digital TV framework. The work is part of the European Information Society Technologies (IST) project "Advanced Three-Dimensional Television System Technologies" (ATTEST), an activity, where industries, research centers and universities have joined forces to design a backwards-compatible, flexible and modular broadcast 3D-TV system. At the very heart of the described new concept is the generation and distribution of a novel data representation format, which consists of monoscopic color video and associated per-pixel depth information. From these data, one or more "virtual" views of a real-world scene can be synthesized in real-time at the receiver side (i.e. a 3D-TV set-top box) by means of so-called depth-image-based rendering (DIBR) techniques. This publication will provide: (1) a detailed description of the fundamentals of this new approach on 3D-TV; (2) a comparison with the classical approach of "stereoscopic" video; (3) a short introduction to DIBR techniques in general; (4) the development of a specific DIBR algorithm that can be used for the efficient generation of high-quality "virtual" stereoscopic views; (5) a number of implementation details that are specific to the current state of the development; (6) research on the backwards-compatible compression and transmission of 3D imagery using state-of-the-art MPEG (Moving Pictures Expert Group) tools.
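A much-simplified sketch of the kind of DIBR warping step named in point (4), assuming a parallel ("shift-sensor") virtual camera pair; the baseline, focal length, and convergence distance below are placeholders, and real systems add depth-map preprocessing and hole-filling.

```python
import numpy as np

def dibr_virtual_view(color, depth_m, eye, baseline_m=0.065,
                      focal_px=700.0, convergence_m=3.0):
    """Synthesize one eye's view from monoscopic color plus per-pixel depth,
    a much-simplified sketch of depth-image-based rendering (DIBR).

    Each pixel shifts horizontally by a disparity proportional to the
    half-baseline and to the difference between its inverse depth and the
    inverse convergence distance (which sets the zero-parallax plane).
    Closer surfaces win via a z-buffer; disocclusions remain as holes,
    which a real system would fill.
    """
    h, w = depth_m.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    disp = eye * (baseline_m / 2.0) * focal_px * (1.0 / depth_m
                                                  - 1.0 / convergence_m)
    for y in range(h):
        for x in range(w):
            xs = x + int(np.round(disp[y, x]))
            if 0 <= xs < w and depth_m[y, x] < zbuf[y, xs]:
                zbuf[y, xs] = depth_m[y, x]
                out[y, xs] = color[y, x]
    return out

# Hypothetical frame: a near square (1.5 m) over a far background (5 m).
depth = np.full((120, 160), 5.0)
depth[40:80, 60:100] = 1.5
color = (depth < 5.0).astype(float)
left = dibr_virtual_view(color, depth, eye=-1)
right = dibr_virtual_view(color, depth, eye=+1)
```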
Non-orthogonal subsampling and anti-alias filtering for multiscopic 3D displays
Multiview passive 3-D displays, such as those based on lenticular or parallax-barrier technologies, require multiplexing of views into a single same-size RGB image. Thus, multiplexing of N views necessitates N:1 sub-sampling of each view and must be preceded by suitable lowpass filtering to prevent, or at least reduce, aliasing. Without such filtering, objectionable "jagged" edges, distorted textures, or Moiré patterns are perceived although, admittedly, these effects are not as disturbing as in the case of single-view sub-sampling without multiplexing with other views. In this paper, unlike in our previous work, we consider anti-alias filtering derived from a non-orthogonal lattice. First, we approximate the pixel layout for each view (sampling pattern) by a two-dimensional lattice; we find the parameters of the lattice by minimizing a mismatch error between lattice and single-view points. Then, based on the lattice parameters, we find the frequency-domain specifications of the anti-alias filter. The filter has a hexagonal passband and thus is non-separable. Although previously we designed such filters for floating-point implementations, here we opt for the more practical fixed-point arithmetic; the resulting filters can be easily implemented on ubiquitous fixed-point DSP chipsets. The fixed-point filters depart slightly from the desired magnitude specifications, but when applied to actual multiview images they produce results almost indistinguishable from those obtained with floating-point counterparts.
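For readers unfamiliar with multidimensional sampling theory, the standard relationship underlying such a filter specification can be written as follows (the generator-matrix notation V is mine, not necessarily the paper's):

```latex
\[
  \Lambda = \{\, V\mathbf{n} : \mathbf{n} \in \mathbb{Z}^2 \,\}, \qquad
  \Lambda^{*} = \{\, 2\pi V^{-T}\mathbf{m} : \mathbf{m} \in \mathbb{Z}^2 \,\}
\]
```

The alias-free passband is the Voronoi cell of the reciprocal lattice Λ* about the origin; for slanted sub-pixel patterns this cell is hexagonal, which is why the resulting filter is non-separable.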
Mapping perceived depth to regions of interest in stereoscopic images
The usable perceived depth range of a stereoscopic 3D display is limited by human factors considerations to a defined range around the screen plane. There is therefore a need in stereoscopic image creation to map depth from the scene to a target display without exceeding these limits. Recent image capture methods provide precise control over this depth mapping but map a single range of scene depth as a whole and are unable to give preferential stereoscopic representation to a particular region of interest in the scene. A new approach to stereoscopic image creation is described that allows a defined region of interest in scene depth to have an improved perceived depth representation compared to other regions of the scene. For example in a game this may be the region of depth around a game character, or in a scientific visualization the region around a particular feature of interest. To realize this approach we present a novel algorithm for stereoscopic image capture and describe an implementation for the widely used ray-tracing package POV-Ray. Results demonstrate how this approach provides content creators with improved control over perceived depth representation in stereoscopic images.
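The sketch below conveys the general budget-allocation idea with a piecewise-linear depth mapping of my own construction; the paper's actual method controls stereoscopic capture parameters rather than remapping depths directly, so treat this purely as an illustration.

```python
def roi_depth_mapping(z, scene=(1.0, 20.0), roi=(4.0, 6.0),
                      display=(-30.0, 30.0), roi_share=0.6):
    """Piecewise-linear mapping from scene depth z to a display depth
    budget (here, perceived depth in mm around the screen plane), giving
    a region of interest a disproportionate share of the budget.
    All numbers are illustrative defaults, not values from the paper.
    """
    zn, zf = scene
    rn, rf = roi
    dn, df = display
    span = df - dn
    # split the non-ROI share of the budget between near and far regions
    near_share = (1.0 - roi_share) * (rn - zn) / ((rn - zn) + (zf - rf))
    b1 = dn + near_share * span        # display depth at the ROI near plane
    b2 = b1 + roi_share * span         # display depth at the ROI far plane
    if z <= rn:
        return dn + (z - zn) / (rn - zn) * (b1 - dn)
    if z <= rf:
        return b1 + (z - rn) / (rf - rn) * (b2 - b1)
    return b2 + (z - rf) / (zf - rf) * (df - b2)

# The 2 m deep ROI receives 60% of the display depth range, so detail
# around, say, a game character is stereoscopically emphasized:
for z in (1.0, 4.0, 5.0, 6.0, 20.0):
    print(z, round(roi_depth_mapping(z), 1))
```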
Mosaicing impossible stereo views
Most image rendering methods try to mimic real cameras by generating images having the perspective projection. In contrast, a unique power of image mosaicing is the ability to generate new views with "impossible" projections which are not perspective. This can be done with mosaicing methods that construct a panoramic mosaic image by stitching together narrow strips, each strip taken from a different source image. A different selection of strips gives a different mosaicing effect using the same set of source images, including the generation of stereo images. For example, given a sequence of source images from a camera moving sideways, a set of panoramic stereo views can be generated, even though perspective cameras allow only a very narrow view for stereo images. And even though the original (single) camera moved sideways, a sequence of forward moving stereo images can be generated. As each of the stereo views is generated synthetically from the original images, stereo effects can be adjusted in the post production stage. Such effects include changing the stereo baseline and the vergence. Post production enables the same set of original images to be used for generating stereo images for various displays and viewing conditions.
Virtual voxel: a quantitative figure of merit for autostereoscopic display technology and implementation
A stereoscopic display based on the viewing of two eye-multiplexed co-planar images correlated by perspective disparity exhibits a three-dimensional lattice of finite-sized volume elements (virtual voxels) and corresponding depth planes. The number, global and individual shapes, and spatial arrangement of these voxels all depend on the number, shape, and arrangement of the pixels in the underlying planar display, and on the viewer's interocular distance and viewing geometry relative to the display. This paper illustrates the origin of the virtual voxel lattice, derives its quantitative geometry, and relates these to the quality of the display likely to be perceived and reported by a typical viewer.
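The basic geometry behind the depth-plane lattice can be sketched as follows (symbols are mine: viewing distance V, interocular distance I, pixel pitch p, and homologous points separated by n pixel pitches); the paper's full derivation also accounts for voxel shape and magnification:

```latex
\[
  z_{\mathrm{behind}}(n) = \frac{V\,n\,p}{I - n\,p}, \qquad
  z_{\mathrm{front}}(n) = \frac{V\,n\,p}{I + n\,p}
\]
```

The depth planes are therefore not equally spaced: they crowd together in front of the screen and spread rapidly behind it, and each plane's voxels inherit the size and shape of the underlying pixels.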
An immersive display with enhanced user/data access
An immersive display that presents physically accessible 3D data will be discussed and demonstrated. We describe a non-planar visual environment with tactile-feedback I/O that allows direct manipulation of stereoscopic images and data in user space. We also present and discuss a dynamic display that is incorporated into a calibrated data/display/user XYZ coordinate system.
Stereoscopic Camera Systems
Real-time capturing and interactive synthesis of 3D scenes using integral photography
This paper proposes a system which can capture a dynamic 3D scene and synthesize arbitrary views of it in real time. Our system consists of four components: a Fresnel lens, a micro-lens array, an IEEE 1394 digital camera, and a PC for rendering. The micro-lens array forms an image which consists of a set of elemental images, in other words, multiple viewpoint images of the scene. The Fresnel lens controls the depth of field by demagnifying the 3D scene. The problem is that the scene demagnified by the Fresnel lens is compressed along its optical axis. Therefore, we propose a method for recovering the original scene from the compressed scene. The IEEE 1394 digital camera captures multiple viewpoint images at 15 frames per second and transfers these images to the PC. The PC synthesizes any perspective of the captured scene from the multiple viewpoint images using image-based rendering techniques. The proposed system synthesizes one perspective of the captured scene within 1/15 second. This means that a user can interactively move his/her viewpoint and observe even a moving object from various directions.
The camera convergence problem revisited
Convergence of the real or virtual stereoscopic cameras is an important operation in stereoscopic display systems. For example, convergence can shift the range of portrayed depth to improve visual comfort; can adjust the disparity of targets to bring them nearer to the screen and reduce accommodation-vergence conflict; or can bring objects of interest into the binocular field-of-view. Although camera convergence is acknowledged as a useful function, there has been considerable debate over the transformation required. It is well known that rotational camera convergence or 'toe-in' distorts the images in the two cameras producing patterns of horizontal and vertical disparities that can cause problems with fusion of the stereoscopic imagery. Behaviorally, similar retinal vertical disparity patterns are known to correlate with viewing distance and strongly affect perception of stereoscopic shape and depth. There has been little analysis of the implications of recent findings on vertical disparity processing for the design of stereoscopic camera and display systems. We ask how such distortions caused by camera convergence affect the ability to fuse and perceive stereoscopic images.
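The toe-in keystone effect is easy to reproduce numerically. The sketch below projects points through two pinhole cameras converged on a 2 m target (the baseline and distances are arbitrary illustrative choices) and prints the resulting vertical disparity, which vanishes on-axis and grows toward the image corners.

```python
import numpy as np

def project(point, cam_pos, yaw_rad, f=1.0):
    """Pinhole projection for a camera at cam_pos, yawed about the
    vertical axis (positive yaw turns the optical axis toward +x)."""
    p = np.asarray(point, dtype=float) - cam_pos
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    x_cam = c * p[0] - s * p[2]          # world -> camera rotation
    z_cam = s * p[0] + c * p[2]
    return f * x_cam / z_cam, f * p[1] / z_cam

# Stereo rig "toed in" on a point 2 m ahead, 65 mm baseline.
b, zc = 0.065, 2.0
beta = np.arctan((b / 2) / zc)                      # convergence half-angle
left = lambda P: project(P, np.array([-b / 2, 0.0, 0.0]), +beta)
right = lambda P: project(P, np.array([+b / 2, 0.0, 0.0]), -beta)

# An on-axis point shows no vertical disparity; a point toward the image
# corner does show one: the keystone pattern that hampers fusion.
for P in ([0.0, 0.5, 2.0], [0.8, 0.5, 2.0]):
    (xl, yl), (xr, yr) = left(P), right(P)
    print(P, "vertical disparity:", round(yl - yr, 5))
```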
A real-time ray-space acquisition system
This paper proposes novel Ray-Space acquisition systems that capture dynamic dense Ray-Space at video rate. Most previous work on Ray-Space acquisition targeted “static” Ray-Space because of the difficulty of dealing with dynamic dense Ray-Space. In this paper, we investigate two types of real-time Ray-Space acquisition systems. One uses multiple video cameras: we developed a 16-camera setup and a capturing system using a PC cluster, and interpolation of Ray-Space is introduced to generate dense Ray-Space from sparsely placed video cameras. The other acquisition system captures a “real” dense Ray-Space without interpolation: using a synchronized galvanometric mirror and a high-speed camera, we succeeded in capturing more than 100 view images in 1/30 second. We also developed special hardware that receives the digital high-speed camera signal and outputs an arbitrary viewpoint image based on the Ray-Space method.
Development of a reliable and practical HD stereoscopic camera system
We have developed a reliable and practical HD stereoscopic camera system. It consists of a pair of all-digital box-type HD video cameras, small-radius SD-class lenses, a multiplexer board, and several other control boards. The camera rig is of the parallel-axis type; we control convergence by slightly moving each lens inward, the lenses being separate from the camera bodies. Two sets of linear motor modules precisely control the convergence and the distance between the two cameras. The various camera parameters relevant to stereoscopic viewing can be displayed in the viewfinder, stored with the video, and used for studying picture quality improvement and assessment. We have combined zoom control with convergence control for convenient stereoscopic image capture, so both can be controlled with one knob; they can also be controlled individually. The built-in multiplexer board receives the two video signals from the left and right cameras and combines them into one side-by-side image, in which the two images are compressed to half width horizontally. After this process the video can be recorded on a normal VCR; the original two images are later reconstructed by a demultiplexer for stereoscopic viewing.
An improved stereovision scheme using one camera and a composite lens array
The stereovision scheme is a method to extract three-dimensional information about an object from two or more images of it. In the conventional scheme, two or more cameras are used to acquire a number of images with different perspectives. As the number of cameras increases, the complexity of the entire system also increases, and there can be many difficulties, such as camera calibration and vibration. In this paper, an improved stereovision scheme using a single camera and a composite lens array is proposed. In the proposed system, only one camera and a composite lens array are required, decreasing the complexity of the entire system. With the use of a composite lens array, it is possible to improve the performance of the system compared with the method using a conventional lens array. The proposed method is shown to be useful by simulation and experimental results.
Autostereoscopic Displays I
Assessment and improvement of the stereo-image visualization on X3D technologies 3D displays
For the presentation of the stereo image, the 3D display of X3D Technologies uses a special optical filter structure. One of the problems arising here is the alignment of the optical filter with respect to the surface of the pixel structure. In many practical cases the width of the display's subpixels is not known exactly, or the attempt to fabricate the optical filter to the required dimensions with the necessary accuracy fails. To solve these problems, we suggest using scaling, translation, or rotation of the stereo image to match the size of the elements of the optical filter structure to the pixel structure of the 3D display. We also demonstrate that the scaling of stereo images has some special features in contrast to the scaling of ordinary 2D images. The approach may also be applied to displays with a different pixel form or with individually different widths of the RGB subpixels. Numerical algorithms based on the approach considered here have been successfully tested on our 3D displays. These approaches may also be used for 3D displays based on lenticular lenses.
Temporally consistent virtual camera generation from stereo image sequences
The recent emergence of auto-stereoscopic 3D viewing technologies has increased demand for the creation of 3D video content. A range of glasses-free multi-viewer screens have been developed that require as many as 9 views generated for each frame of video. This presents difficulties in both view generation and transmission bandwidth. This paper examines the use of stereo video capture as a means to generate multiple scene views via disparity analysis. A machine learning approach is applied to learn relationships between disparity generated depth information and source footage, and to generate depth information in a temporally smooth manner for both left and right eye image sequences. A view morphing approach to multiple view rendering is described which provides an excellent 3D effect on a range of glasses-free displays, while providing robustness to inaccurate stereo disparity calculations.
Multiview 3D projection system
A new optical architecture for multiview projection displays, based on a mirrored light tunnel, is presented. In this concept, all perspective images are arranged in a line on the same projection panel. The light tunnel is made of two or more parallel plane mirrors installed between the image projector and the screen. The mirrors are installed horizontally, close to the screen, perpendicular to its surface. One of the perspective images is projected directly onto the screen while the others undergo reflection before they strike the screen. The vertical arrangement of the viewing zones is changed to horizontal with the help of a slanted directional diffuser incorporated in the directional screen. The important features of the new architecture are the use of a single projection lens and the absence of keystone distortion in the projected perspective images.
Autostereoscopic Displays II
Three-dimensional interaction with autostereoscopic displays
We describe new techniques for interactive input and manipulation of three-dimensional data using a motion tracking system combined with an autostereoscopic display. Users interact with the system by means of video cameras that track a light source or a user's hand motions in space. We process this 3D tracking data with OpenGL to create or manipulate objects in virtual space. We then synthesize two to nine images as seen by virtual cameras observing the objects and interlace them to drive the autostereoscopic display. The light source is tracked within a separate interaction space, so that users interact with images appearing both inside and outside the display. With displays that use nine images inside a viewing zone (such as the SG 202 autostereoscopic display from StereoGraphics), user head tracking is not necessary because there is a built-in left/right look-around capability. With such multi-view autostereoscopic displays, more than one user can see the interaction at the same time, and more than one person can interact with the display.
The three-dimensional display for user interface in free viewpoint television system
We have been developing a new television system named Free Viewpoint Television (FTV) that can generate views from freely chosen viewpoints. We propose a new user interface for FTV using an automultiscopic display (a multi-view autostereoscopic display) and a head-tracking system. Our head-tracking system uses cameras and does not require any sensors to be attached to the user. We succeeded in extending the viewing zone of the automultiscopic display, enabling natural interaction with FTV.
Implementation of projection-type autostereoscopic multiview 3D display system for real-time applications
In this paper, a new projection-type autostereoscopic multiview 3D display system for real-time applications is proposed, using IEEE 1394 digital cameras, an Intel Xeon server, a projection-type 3D display, and Microsoft's DirectShow programming library; its performance is analyzed in terms of image-grabbing frame rate, displayed image resolution, possible color depth, and number of views. In the proposed system, four-view color images are captured by four IEEE 1394 digital cameras, processed on the Intel Xeon server, and transmitted in real time to a graphics card with four output ports supporting the four-view stereoscopic display. These outputs are finally projected onto a specially designed Fresnel screen through four projectors to form a four-view autostereoscopic image. The overall system control program is developed on the basis of Microsoft's DirectShow programming library. Experimental results show that the proposed system can display four-view VGA images with 16-bit color at a frame rate of 15 fps in real time.
Autostereoscopic Displays III
Multiview autostereoscopic display with floating real image
This paper proposes a multiview version of the autostereoscopic display FLOATS (Fresnel Lens based Optical Apparatus for Touchable-distance Stereoscopy), which combines the generation of a floating real image with parallax presentation to show a realistic 3-D image within the viewer's reach. Earlier versions of FLOATS required a head tracker, physical motion control of filters or mirrors, and transformation of the image in accordance with the viewer's motion to keep presenting different images to each eye. To do away with these requirements, we propose two methods which realize multiview presentation to the viewer. One method is to use multiple LCD panels and multiple fixed mirrors instead of mobile mirrors. The other method is to use multiple projectors, fly's-eye lenses, and Fresnel lenses. Though the former system does not cost much, it is not practical for presenting more than 10 views. The latter system can practically present more than 30 views, which enables presentation of both horizontal and vertical parallax. With this technology viewers can perceive undistorted 3-D space from any angle, which makes it possible for multiple viewers to observe the 3-D image at a consistent position from different angles at the same time.
Step barrier system multiview glassless 3D display
The step barrier technology with multiple parallax images overcomes a problem of the conventional parallax-barrier system, in which the quality of each image deteriorates only in the horizontal direction; the step barrier distributes the resolution loss across both the horizontal and vertical directions. The system has a simple structure, consisting of a flat-panel display and a step barrier. The apertures of the step barrier are not stripes but tiny rectangles arranged in the shape of stairs, and the sub-pixels of each image have the same arrangement. Three image processes for the system, applicable to computer graphics and real images, have been proposed. Two types of 3-D displays were then developed: a 22-inch model and a 50-inch model. The 22-inch model employs a very-high-definition liquid crystal display of 3840 x 2400 pixels; the number of parallax images is seven and the resolution of each image is 1646 x 800. The 50-inch model has four viewing points on a plasma display panel of 1280 x 768 pixels; it can provide stereoscopic animations and the resolution of each image is 960 x 256 pixels. Moreover, a 2-D/3-D compatible system, implemented structurally or electrically, was also developed.
Novel view sequential display based on DMD technology
The authors present work that was conducted as a collaboration between Cambridge University and MIT. The work is a continuation of previous research at Cambridge University, where several view-sequential 3D displays were built. The authors discuss a new display which they built and compare performance to previous versions. The new display utilizes a DMD projection engine, whereas previous versions used high frame rate CRTs to generate imagery. The benefits of this technique are discussed, and suggestions for future improvements are made.
DepthCube solid-state 3D volumetric display
The DepthCube 3D Volumetric Display is a solid-state, rear-projection, volumetric display that consists of two main components: a high-speed video projector and a multiplanar optical element composed of an air-spaced stack of liquid crystal scattering shutters. The high-speed video projector projects a sequence of slices of the 3D image into the multiplanar optical element, where each slice is halted at the proper depth. Proprietary multiplanar anti-aliasing algorithms smooth the appearance of the resulting stack of image slices to produce a continuous-appearing, truly three-dimensional image. The resulting 3D image is of exceptional quality and provides all the 3D vision cues found in viewing real objects.
The fabrication of a novel projection screen for autostereoscopic display systems
Entertainment is usually considered an important application for stereoscopic display technologies. In order to provide a more realistic and exciting VR effect, it is desirable to have as large a screen as possible. However, the screen sizes of current autostereoscopic display technologies are limited by either the display panel or the optical components. In a government-sponsored project, we designed and fabricated a novel projection screen for autostereoscopic display. The screen consists of two layers of microretarder and a layer of polarization-preserving diffuser. Both the screens and the projectors can be arrayed to build a large autostereoscopic display system; curved or multi-plane screens are also possible. This kind of autostereoscopic display screen has the advantages of easy scale-up, low cost, and no need for precise alignment between the projectors and the screen. In this paper, the manufacturing considerations for such a screen are studied and the experimental results are presented.
Stereoscopic Video
Production and evaluation of stereoscopic video presentation in surgical training
Stereoscopic video teaching can facilitate understanding of current minimally invasive operative techniques. This project was created to set up a digital stereoscopic teaching environment for the training of ENT residents and medical students. We recorded three ENT operative procedures (tympanoplasty, paranasal sinus surgery, and laser chordectomy) at the University Hospital Aachen. The material was edited stereoscopically at Waseda University and converted into a streaming 3-D video format which does not depend on PAL or NTSC signal standards. Video clips were evaluated by 5 ENT specialists and 11 residents in single sessions, on an LCD monitor (8 participants) and a CRT monitor (8 participants). Emphasis was laid on depth perception, visual fatigue, and the time needed to achieve a stereoscopic impression. Qualitative results were recorded on a visual analogue scale ranging from 1 (excellent) to 5 (bad). The overall impression was rated 2.06 to 3.13 in the LCD group and 2.0 to 2.62 in the CRT group. The depth impression was rated 1.63 to 2.88 (LCD) and 1.63 to 2.25 (CRT). Stereoscopic video teaching was regarded as useful in ENT training by all participants. Further points for evaluation will be the quantification of depth information as well as the information gain in teaching junior colleagues.
Visual comfort/discomfort and visual fatigue caused by stereoscopic HDTV viewing
The problems associated with watching stereoscopic HDTV can be classified into three groups. The first is how natural or unnatural stereoscopic pictures look to viewers. It is known that the shooting and viewing conditions affect the depth of a stereoscopic image, and this depth distortion is a major factor influencing the viewer's stereoscopic perception. The second group concerns visual comfort and discomfort. Visual discomfort is caused by the difficulty of fusing left and right images in the presence of excessive binocular parallax and its temporal changes. We have studied how visual comfort is affected by the range of the parallax distribution and by temporal parallax changes. The results show that stereoscopic images are comfortable to view for an angular parallax of up to about 60 minutes, and that visual comfort is maintained if discontinuous temporal changes in parallax are kept to an angle of 60 minutes or less. The third group concerns the visual fatigue that a viewer experiences after viewing a stereoscopic HDTV program, thought to be mainly caused by the mismatch between the eyes' convergence and accommodation. We confirmed that, after observing stereoscopic images for about an hour, the fusion range diminishes and the viewer's visual functions deteriorate.
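As a worked example of the 60-minute limit (the viewing distance is assumed, since the abstract does not give one): angular parallax α corresponds to an on-screen disparity d at viewing distance D of roughly

```latex
\[
  d \approx D \tan\alpha, \qquad
  d \approx 3\,\mathrm{m} \times \tan 1^{\circ} \approx 5.2\,\mathrm{cm}
\]
```

so at a 3 m viewing distance, comfortable disparities stay within about 5 cm on the screen in either the crossed or uncrossed direction.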
DepthQ: universal system for stereoscopic video visualization on WIN32 platform
We have developed software for flexible and cost-effective high-resolution stereoscopic video playback on an off-the-shelf Windows-compatible computer. The software gains a highly flexible input format through compatibility with the Microsoft DirectShow standard. Video processing speed depends on the selected compression method in combination with hardware-accelerated OpenGL post-processing. The key features of the software are: support for multiple input and output formats, on-the-fly format conversion, up to HDTV (currently 1280 x 720) per-eye resolution, the ability to preview data from stereoscopic cameras, and adjustable stereoscopic data corrections.
Integral 3D Imaging
Improvement of integral 3D image quality by compensating for lens position errors
Integral photography (IP) or integral imaging is a way to create natural-looking three-dimensional (3-D) images with full parallax. Integral three-dimensional television (integral 3-D TV) uses a method that electronically presents 3-D images in real time based on this IP method. The key component is a lens array comprising many micro-lenses for shooting and displaying. We have developed a prototype device with about 18,000 lenses using a super-high-definition camera with 2,000 scanning lines. Positional errors of these high-precision lenses as well as the camera's lenses will cause distortions in the elemental image, which directly affect the quality of the 3-D image and the viewing area. We have devised a way to compensate for such geometrical position errors and used it for the integral 3-D TV prototype, resulting in an improvement in both viewing zone and picture quality.
Extraction and conversion of the 3D information for integral imaging
Integral imaging is one of the most attractive methods for displaying three-dimensional images. The lens array mismatch between the pickup and display systems or between different display systems is an important problem for the practical implementation of the three-dimensional display system based on integral imaging. In this paper, we provide a solution to that problem by extracting the three-dimensional information from the elemental images. The extracted three-dimensional information is modified to be suitable for the different lens arrays in the integral imaging display systems. Thus our method gives excellent flexibility on the system parameters of the various integral imaging systems and has additional advantage of reducing the required data size for the three-dimensional data storage or transmission.
Stereoscopic Developments I
Dynamic dimension: system for simultaneous 3D and monoscopic viewing
We propose the 'Dynamic Dimension' system, which enables simultaneous viewing of 3D and monoscopic content on glasses-based stereo displays (e.g. CRT, plasma, LCD). A viewer can choose to wear glasses and see content in 3D, or may decide not to wear glasses and see high-quality monoscopic content. The Dynamic Dimension system is based on simple image processing such as addition and subtraction. The input images can be captured by a triple-camera setup or rendered from so-called RGBD video, an ad-hoc standard for 3D video. From several subjective tests, we conclude that Dynamic Dimension produces a strongly present and appealing 3D effect, while the monoscopic image quality remains high and unaffected.
HMD-type multifocus 3D display system
A 3-dimensional image display system using only binocular disparity can induce eye fatigue because of the mismatch between the accommodation of each eye and the convergence of the two eyes. This paper introduces a new 3-dimensional display system, for a single observer, that can solve the eye fatigue caused by the mismatch between accommodation and convergence. Experimental results from this display system demonstrate that the accommodation of one eye can be satisfied.
Ghosting in anaglyphic stereoscopic images
Anaglyphic 3D images are an easy way of displaying stereoscopic 3D images on a wide range of display types, e.g. CRT, LCD, print, etc. While the anaglyphic 3D method is cheap and accessible, its use requires a compromise in stereoscopic image quality. A common problem with anaglyphic 3D images is ghosting. Ghosting (or crosstalk) is the leakage of an image to one eye when it is intended exclusively for the other eye. Ghosting degrades the observer's ability to fuse the stereoscopic image, and hence the quality of the 3D image is reduced. Ghosting is present at various levels in most stereoscopic displays; however, it is often particularly evident with anaglyphic 3D images. This paper describes a project whose aim was to characterize the presence of ghosting in anaglyphic 3D images due to spectral issues. The spectral response curves of several different display types and several different brands of anaglyph glasses were measured using a spectroradiometer or spectrophotometer. A mathematical model was then developed to predict the amount of crosstalk in anaglyphic 3D images when different combinations of displays and glasses are used, and therefore to predict the best type of anaglyph glasses for use with a particular display type.
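A sketch of the kind of spectral crosstalk model described above, with synthetic spectra standing in for the measured spectroradiometer data; the paper's actual model may include display gamma and additional terms.

```python
import numpy as np

def anaglyph_crosstalk(display_spd, intended_lens, other_lens):
    """Crosstalk ratio: light from one display channel leaking through the
    wrong eye's filter, relative to light passing the intended filter.
    All spectra must share one uniform wavelength grid, so the sample
    spacing cancels in the ratio."""
    leak = np.sum(display_spd * other_lens)
    signal = np.sum(display_spd * intended_lens)
    return leak / signal

# Hypothetical smooth spectra standing in for measured data: a CRT-like
# red primary, a red lens passing long wavelengths, and its cyan complement.
wl = np.linspace(400.0, 700.0, 301)
red_primary = np.exp(-0.5 * ((wl - 610.0) / 25.0) ** 2)
red_lens = 1.0 / (1.0 + np.exp(-(wl - 580.0) / 8.0))
cyan_lens = 1.0 - red_lens
ghost = anaglyph_crosstalk(red_primary, red_lens, cyan_lens)
print(f"predicted ghosting: {100 * ghost:.1f}%")
```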
Stereoscopic Developments II
Stereoscopic retinal scanning laser display with integrated focus cues for ocular accommodation
We describe a full-color stereoscopic display that varies the focus of objects at different distances in a displayed scene to match vergence and stereoscopic retinal disparity demands, better approximating natural vision. In natural vision, the oculomotor processes of accommodation (eye focus) and vergence (angle between lines of sight of two eyes) are reflexively linked such that a change in one drives a matching change in the other. Conventional stereoscopic displays require viewers to decouple these processes, and accommodate at a fixed distance while dynamically varying vergence to view objects at different stereoscopic distances. This decoupling generates eye fatigue and compromises image quality when viewing such displays. In contrast, our display overcomes this cue conflict by using a deformable membrane mirror to dynamically vary the focus of luminance-modulated RGB laser beams before they are raster-scanned and projected directly onto the retina. The display has a large focal range (closer than the viewer's near point to infinity) and presents high-resolution (1280x480) full-color images at 60 Hz. A viewer of our display can shift accommodation naturally from foreground to background of a stereo image, thereby bringing objects at different distances into and out of focus. Design considerations and human factors data are discussed.
Implementation issues for the full-time full-resolution stereoscopic 3D flat panel display
The proliferation of remotely operated vehicles (ROVs) has resulted in a need for the capability to see the operational environment in stereo. In a previous paper the theoretical underpinnings for new types of stereoscopic and autostereoscopic flat-panel displays with full-time, full-resolution images (i.e., no temporal multiplexing and no spatial multiplexing) were presented. Recently, a stereoscopic prototype has been constructed at the U.S. Army Aviation & Missile RDEC and testing is underway. The research presented here describes the application of two liquid crystal displays (LCD) sandwiched together to form a compact, rugged stereoscopic display. Polarized glasses are used to view the image in stereo. The prototype provides a full-time, full-resolution stereoscopic 3D display in a package slightly thicker, but no larger, than the standard liquid crystal display used in laptop computers. The LCDs have been characterized using a Stokes vector polarimeter. The characterization results were very interesting and led to some changes in the encoding algorithms. Significant improvements in the display quality were achieved through these adaptations.
Poster Pop Session
Camera system for autostereoscopic display using floating real image
The present paper proposes a 3D camera system for teleoperation using an autostereoscopic display based on a floating real image. To present the operator with 3-D images that correspond to his viewpoint, the image has to be updated in accordance with the motion of the operator's head. The proposed method combines camera motion control, which keeps capturing the proper texture for the viewpoint, with image transformation software, which copes with viewer motions too fast for the camera motion to follow. With this technology, robust presentation of 3-D images is realized.
Design and feasibility test for directional diffractive optical elements for LCD-based stereoscopic systems
We introduce the concept of directional diffractive optical elements (DOEs) for LCD backlight illumination in 3D display systems. Stereoscopic images are obtained by a two-camera system and displayed on an LCD by interlacing: the LCD is programmed to display the left and right images of a stereo pair on alternate rows of pixels. The backlight is diffracted by DOEs designed to split the viewing directions and to adjust the size of the viewing zones. The lines are spaced with respect to the LCD pixel rows such that the left eye sees all the lines through the odd rows of the LCD, while the right eye sees them through the even rows. The DOEs are designed by an iterative Fourier transform algorithm for high diffraction efficiency and uniform illumination distribution. To eliminate twin-image noise, the DOEs are designed with eight quantized levels and synthesized to generate the stereoscopic viewing region. We discuss the design issues of this method, its advantages over the conventional lenticular method, and initial experiments on the DOE characteristics for this purpose.
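A bare-bones illustration of an iterative Fourier transform (Gerchberg-Saxton style) phase design loop with eight-level quantization; the target pattern, grid size, and iteration schedule are arbitrary, and practical DOE design adds many refinements.

```python
import numpy as np

def ifta_doe_phase(target_intensity, n_iter=50, levels=8, seed=0):
    """Phase-only DOE design by an iterative Fourier transform algorithm
    (Gerchberg-Saxton style) with uniform phase quantization.

    The DOE plane and the far-field (viewing-zone) plane are related by a
    Fourier transform: each iteration enforces the target amplitude in
    the far field, then the quantized phase-only constraint at the DOE.
    """
    rng = np.random.default_rng(seed)
    target_amp = np.sqrt(target_intensity)
    phase = rng.uniform(0.0, 2 * np.pi, target_amp.shape)
    step = 2 * np.pi / levels
    for _ in range(n_iter):
        far = np.fft.fft2(np.exp(1j * phase))             # propagate to far field
        far = target_amp * np.exp(1j * np.angle(far))     # impose target amplitude
        doe = np.fft.ifft2(far)                           # back to DOE plane
        phase = np.angle(doe) % (2 * np.pi)               # keep phase only
        phase = (np.round(phase / step) % levels) * step  # quantize to 8 levels
    return phase

# Hypothetical far-field target: two bright spots, one per eye's viewing zone.
target = np.zeros((64, 64))
target[32, 12] = 1.0   # left-eye direction
target[32, 52] = 1.0   # right-eye direction
phase = ifta_doe_phase(target)
```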
Depth-enhanced integral 3D imaging using a polarization-multiplexed display with different optical path lengths
Depth-enhanced integral three-dimensional (3D) imaging using different optical path lengths and a polarization-selective mirror pair is proposed. In this approach, the image depth is enhanced by repositioning two types of elemental image planes, so that effectively two central depth planes are obtained. The system makes use of a two-arm structure with different optical path lengths and polarization-selective mirrors. The primary advantage of our proposed method is that we can observe 3D images that maintain some level of viewing resolution over a large depth difference without any mechanically moving parts. We experimentally demonstrated our proposal by successfully reconstructing real and virtual images with a depth difference of 140 mm.
Development of a stereoscopic 3D display system to observe restored heritage
The authors have developed a binocular-type display system that allows digital archives of cultural assets to be viewed in their actual environment. The system is designed for installation in locations where such cultural assets were originally present. The viewer sees buildings and other heritage items as they existed historically by looking through the binoculars. Images of the cultural assets are reproduced by stereoscopic 3D CG in cyberspace, and the images are superimposed on actual images in real-time. This system consists of stereoscopic CCD cameras that capture a stereo view of the landscape and LCDs for presentation to the viewer. Virtual cameras, used to render CG images from digital archives, move in synchrony with the actual cameras, so the relative position of the CG images and the landscape on which they are superimposed is always fixed. The system has manual controls for digital zoom. Furthermore, the transparency of the CG images can be altered by the viewer. As a case study for the effectiveness of this system, the authors chose the Heijyoukyou ruins in Nara, Japan. The authors evaluate the sense of immersion, stereoscopic effect, and usability of the system.
Synthesis and Design
Presentation of a large amount of moving objects in a virtual environment
Managing the presentation of a large number of moving objects in a virtual environment requires considerable care. A motion state model (MSM) is used to represent the motion of objects, and a 2^n-tree is used to index the motion data stored in databases or files. To minimize the memory needed for static models, a cache with LRU or FIFO refreshing is introduced. DCT and wavelet transforms work well with different playback speeds of motion presentation because they can filter low frequencies from the motion data and adjust the filter according to playback speed. Since large amounts of data are continuously retrieved, processed, used for display, and then discarded, multithreading is naturally employed, though a single thread with carefully arranged data retrieval also works well when the number of objects is not very large. With multithreading, concurrency should be placed at data retrieval, where waiting may occur, rather than at calculation or display, and synchronization should be carefully arranged so that different threads collaborate well. Collision detection is not needed when playing back history data and sampled current data; however, it is necessary for spatial state prediction. When the current state is presented, either a predicting-adjusting method or a late-updating method can be used according to the user's preference.
The DAVRS environment for architecture design
This paper describes the design of the DAVRS system. This system not only provides a 3D design environment for architects but also realizes distributed collaboration between designers over the Internet. The DAVRS system uses Java3D to construct the virtual scene, XML to package scene-control data, and Java MQ to transmit data. Moreover, a three-level distributed collaboration model is designed to ensure the safety of Internet-based collaboration.
Experiments to evolve toward a tangible user interface for computer-aided design parts assembly
In this paper, we present the concepts of the ESKUA (Experimentation of a Kinesics System Usable for Assembly) platform, which allows designers to carry out the assembly of mechanical CAD (Computer Aided Design) parts. This platform, based on a tangible user interface, leads to assembly constraints being taken into account from the beginning of the design phase, and especially during the manipulation of CAD models. Our goal is to propose a working environment where the designer is confronted with real assembly constraints, which are currently masked by existing CAD software functionalities. Thus, the platform is based on the handling of physical objects, called tangible interactors, which provide a physical perception of the assembly constraints. To this end, we have defined a typology of interactors based on concepts proposed in Design For Assembly methods. We present here the results of studies that led to the evolution of this first set of interactors. One concerns an experiment to evaluate the cognitive aspects of the use of interactors; the other is an analysis of existing mechanical products and fasteners. We show how these studies drive the evolution of the interactors based on the use of functional surfaces.
Research Programs
Sharing skills: using augmented reality for human-robot collaboration
Both stationary 'industrial' and autonomous mobile robots nowadays pervade many workplaces, but human-friendly interaction with them is still very much an experimental subject. One of the reasons for this is that computer and robotic systems are very bad at performing certain tasks well and robustly. A prime example is the classification of sensor readings: which part of a 3D depth image is the cup, which the saucer, which the table? These are tasks that humans excel at.
To alleviate this problem, we propose a team approach, wherein the robot records sensor data and uses an Augmented Reality (AR) system to present the data to the user directly in the 3D environment. The user can then perform classification decisions directly on the data by pointing, gestures, and speech commands. After the classification has been performed by the user, the robot takes the classified data and matches it to its environment model. As a demonstration of this approach, we present an initial system for creating objects on the fly in the environment model. A rotating laser scanner is used to capture a 3D snapshot of the environment. This snapshot is presented to the user as an overlay over his view of the scene. The user classifies unknown objects by pointing at them. The system segments the snapshot according to the user's indications and presents the segmentation results back to the user, who can then inspect, correct, and enhance them interactively. After a satisfactory result has been reached, the laser scanner can take more snapshots from other angles and use the previous segmentation hints to construct a 3D model of the object.
Jedi training: playful evaluation of head-mounted augmented reality display systems
A fundamental decision in building augmented reality (AR) systems is how to accomplish the combining of the real and virtual worlds. Nowadays this key question boils down to two alternatives: video see-through (VST) vs. optical see-through (OST). Both systems have advantages and disadvantages in areas like production simplicity, resolution, flexibility in composition strategies, field of view, etc. To provide additional decision criteria for high-dexterity, high-accuracy tasks and subjective user acceptance, a gaming environment inspired by the Star Wars movies was programmed that allowed good evaluation of hand-eye coordination. During an experimentation session with more than thirty participants, a preference for optical see-through glasses in conjunction with infrared tracking was found. In particular, the high computational demand of video capture and processing, and the resulting drop in frame rate, emerged as a key weakness of the VST system.
Shared database of annotation information for wearable augmented reality system
This paper describes a database of annotation information for augmented reality (AR) on wearable computers. With the advance of computing, AR systems using wearable computers have received a great deal of attention as a new method for displaying location-based information in real time. To overlay annotations on the real scene image, the user's computer needs to hold annotation information. Until now, since the database of annotation information has usually been held in advance on the wearable computer, it has been difficult for information providers to update or extend it effectively. The purpose of this paper is to construct a networked database system of annotation information for wearable AR systems. The proposed system provides users with annotation information from a server via a wireless network, so that wearable computers do not need to hold it in advance and information providers can easily update and extend the database with a web browser. In this study, we have developed a shared database of annotation information and proven the feasibility of a prototype system with preliminary experiments. In the experiments, the orientation and position of the user's viewpoint were measured by integrating several kinds of sensors, and position-based annotations were shown to the user automatically. Moreover, we confirmed that the database can be successfully updated and extended by information providers with a web browser.
Immersive telepresence system using high-resolution omnidirectional movies and a locomotion interface
Show abstract
Technology that enables users to experience a remote site virtually is called telepresence. Telepresence systems using real-environment images are expected to be used in fields such as entertainment, medicine, and education. This paper describes a novel telepresence system that enables users to walk through a photorealistic virtualized environment by actually walking. To realize such a system, a wide-angle high-resolution movie is projected on an immersive multi-screen display to present the virtualized environment, and a treadmill is controlled according to the user's detected locomotion. In this study, we use an omnidirectional multi-camera system to acquire images of a real outdoor scene. The proposed system provides users with a rich sense of walking through a remote site.
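One core operation in such a system is selecting, for each frame, the portion of the omnidirectional movie that faces the user's current walking direction. A minimal sketch, assuming the frame is stored as a cylindrical panorama (the field of view and names are illustrative, not taken from the paper):

```python
import numpy as np

def extract_view(panorama, heading_deg, fov_deg=90.0):
    """Crop a forward-facing view from a 360-degree cylindrical panorama.

    panorama    : (H, W, 3) image covering 0..360 degrees horizontally
    heading_deg : user's walking direction, e.g. from treadmill sensing
    fov_deg     : horizontal field of view of one screen (assumed)
    """
    h, w, _ = panorama.shape
    center = int((heading_deg % 360.0) / 360.0 * w)
    half = int(fov_deg / 360.0 * w / 2)
    cols = np.arange(center - half, center + half) % w  # wrap at 360
    return panorama[:, cols]
```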
Technology and Applications
Real-time data fusion on stabilizing camera pose estimation output for vision-based road navigation
Show abstract
This paper presents a novel framework for a vision-based road navigation system, which superimposes virtual 3D navigation indicators and traffic signs onto the real road scene in an Augmented Reality (AR) space. To properly align objects in the real and virtual worlds, it is essential to keep tracking the camera's exact 3D position and orientation, a challenge well known as the registration problem. Traditional vision-based or inertial-sensor-based solutions are mostly designed for well-structured environments, which are unavailable in uncontrolled outdoor road navigation applications. This paper proposes a hybrid system that combines vision, GPS, and 3D inertial gyroscope technologies to stabilize the camera pose estimation output. The fusion approach is based on our PMM (parameterized model matching) algorithm, in which the road shape model is derived from a digital map referenced to the absolute road position from GPS, and is matched against road features extracted from the real image. Inertial data provide an initial motion estimate and also serve as a relative tolerance to stabilize the pose output. The algorithms proposed in this paper are validated with experimental results from real road tests under different road conditions.
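At a high level, the described fusion can be read as a predict/correct loop: the gyroscope predicts the next pose, and the map-referenced road-shape match corrects it. The sketch below is a schematic under our own assumptions (a simple complementary blend with outlier gating), not the authors' PMM algorithm:

```python
import numpy as np

def fuse_pose(pose, gyro_rates, dt, vision_pose, vision_ok, alpha=0.1):
    """One step of a simple predict/correct orientation filter.

    pose        : (roll, pitch, yaw) in radians, current estimate
    gyro_rates  : angular rates from the 3D gyroscope (rad/s)
    dt          : time step (s)
    vision_pose : pose recovered by matching the GPS/map road model
                  to road features in the image (may be unreliable)
    vision_ok   : whether the road-shape match succeeded this frame
    alpha       : blend weight toward the vision correction (assumed)
    """
    pose = np.asarray(pose, dtype=float)
    predicted = pose + np.asarray(gyro_rates) * dt  # inertial prediction
    if not vision_ok:
        return predicted                            # coast on inertia
    vision_pose = np.asarray(vision_pose, dtype=float)
    # the inertial prediction also acts as a tolerance band:
    # reject vision estimates that jump implausibly far from it
    if np.max(np.abs(vision_pose - predicted)) > 0.2:
        return predicted
    return (1 - alpha) * predicted + alpha * vision_pose
```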
Effect of visual distortion on postural balance in a full immersion stereoscopic environment
Show abstract
This study attempted to determine the influence of non-linear visual movements on our capacity to maintain postural control. An 8×8×8-foot CAVE immersive virtual environment was used. Body sway recordings were obtained for both head and lower back (lumbar 2-3) positions. The subjects were presented with visual stimuli for periods of 62.5 seconds. Subjects were asked to stand still on one foot while viewing stimuli consisting of multiplied sine waves generating undulating movement of a textured surface (waves moving in a checkerboard pattern). Three wave amplitudes were tested: 4 feet, 2 feet, and 1 foot. Two viewing conditions were also used: observers looking 36 inches in front of their feet, and observers looking at a distance near the horizon. The results were compiled using an instability index, and the data showed a profound and consistent effect of visual disturbances on postural balance, particularly for the x (side-to-side) movement. We have demonstrated that non-linear visual distortions, similar to those generated by the progressive ophthalmic lenses used for presbyopia correction, can generate significant postural instability. This instability is particularly evident for side-to-side body movement and is strongest in the near viewing condition.
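A stimulus of this kind can be reproduced with a product of sine waves displacing a textured plane. A minimal sketch with assumed spatial and temporal frequencies (the abstract reports only the three amplitudes, not these parameters):

```python
import numpy as np

def wave_height(x, y, t, amplitude_ft=4.0, kx=0.5, ky=0.5, omega=0.8):
    """Vertical displacement of the checkerboard surface at (x, y, t).

    Multiplied sine waves produce the undulating motion described in
    the study; amplitude was 4, 2, or 1 foot. kx, ky (cycles/ft) and
    omega (rad/s) are illustrative values, not the study's.
    """
    return (amplitude_ft
            * np.sin(kx * x + omega * t)
            * np.sin(ky * y + omega * t))

# displacement field over an 8x8-foot floor at t = 1 s
xs, ys = np.meshgrid(np.linspace(0, 8, 64), np.linspace(0, 8, 64))
z = wave_height(xs, ys, t=1.0)
```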
Human factor integration into the development of a realistic tree-rendering system based on lidar remote sensing
Show abstract
This paper introduces the application of the Cave Automatic Virtual Environment (CAVE) to forest visualization, along with user studies designed to gain insight into human factors for system development. This interdisciplinary research project was undertaken by the Visualization, Analysis, and Imaging Laboratory and the Department of Forestry at Mississippi State University (MSU). The purpose was to create a forest management tool for remote examination of stands in a stereoscopic environment that allows users to observe and interact with realistic virtual stands. The datasets used in this study include measurements such as total height, Diameter at Breast Height (DBH), and crown radii, generated directly and indirectly from Light Detection and Ranging (LiDAR) data. Datasets from immature (eight-year-old) high-density and mature (40-year-old) low-density loblolly pine (Pinus taeda) stands were used to generate three types of tree models, representing trees at different levels of graphic complexity and thus interactivity. In general, higher fidelity is preferred in visualization; however, there is a trade-off between graphic detail and interaction speed. To determine an optimal model, a user study was designed to examine the influence that photo-realism and interactivity have on the viewer's perception. Human subjects recruited from MSU's Department of Forestry will explore virtual stands rendered with one of the tree models in the CAVE and estimate forest parameters.
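In rendering terms, the fidelity-versus-interactivity trade-off the study examines is a level-of-detail (LOD) decision. A minimal sketch of distance-based selection among three tree representations of increasing complexity (the thresholds and model names are assumptions, not the paper's):

```python
def select_tree_model(distance_m, thresholds=(30.0, 100.0)):
    """Pick one of three tree representations by viewer distance.

    A renderer balancing detail against frame rate would typically use
    detailed geometry up close, simpler geometry at mid range, and a
    billboard (textured quad) far away. Threshold values are assumed.
    """
    near, far = thresholds
    if distance_m < near:
        return "detailed_mesh"
    elif distance_m < far:
        return "simplified_mesh"
    return "billboard"
```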
Development of a 3D interaction table
Show abstract
We have identified the need for, and started development of, a new tool we call an interaction table. In this paper, our experiences with the first prototype are described. The interaction table presents a computer-generated, autostereoscopic, three-dimensional image that can be viewed and interacted with.
Special Session: Virtual Reality Works
Visual navigation structures in collaborative virtual environments
Show abstract
The international Grid, or iGrid, is fertile ground for exploring levels of sensorial communication, visual metaphors, and navigation strategies. This paper seeks to answer the following questions: What is the iGrid? What does it mean to share a collaborative virtual environment (CVE)? What implications does sharing CVEs have for communication? What are visual navigation strategies across a high-performance, high-speed network? How can art shape experience in a technological world? Networking virtual environments via the iGrid establishes a performance theater where academics and researchers create dialogues between disciplines. CVEs synergize toward a new dimension of literacy in which knowledge is presented as abstract visual engagement. In CVEs, the visuals act as three-dimensional navigation icons that can symbolize a choice to be made, a direction to consider, or a sequence in a narrative. The user's ability to influence events and receive feedback from the environment through sensorial stimulation enhances the level of immersion. Art CVEs function across the network based on aesthetic style, engagement, levels of interaction, and the quality of audio immersion. A level of plasticity, or malleability, is required in CVEs to encourage participants to become directly involved in understanding and realizing the environment.
Stereoscopic Compression
Novel viewing zone control method for computer-generated integral 3-D imaging
Show abstract
We propose a novel algorithm to maximize the viewing zone of an integral 3-D imaging (II) display. In our algorithm, the elemental image array consists of two kinds of elemental images, whose numbers of sub-pixels are N and (N+1). The pitch of the exit pupils is set to N times the sub-pixel width, and the average width of the elemental images is designed to slightly exceed the exit-pupil pitch by distributing the elemental images consisting of (N+1) sub-pixels among those consisting of N. Under this condition, all light rays generated from the elemental images can be directed into the viewing width on the viewing line at distance L, without the rays converging to points around L. This algorithm was applied to a one-dimensional II system with 32 parallax light rays using a 20.8”-QUXGA-LCD (192 ppi) equipped with a lenticular sheet. The viewing width at 1.5 m was expanded to 500 mm, a value almost five times larger than that of a conventional display system. Even with a fixed hardware configuration, our algorithm maximizes the viewing zone at a given distance L.
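The distribution of the two elemental-image widths can be sketched as error diffusion: the fractional excess of the target average width over N accumulates per image, and an (N+1)-sub-pixel image is emitted each time it reaches one, spreading the wider images evenly across the panel. This is a minimal illustration under our own assumptions, not the authors' exact procedure:

```python
def elemental_widths(num_images, n_subpixels, avg_width):
    """Assign each elemental image a width of N or N+1 sub-pixels.

    num_images  : number of elemental images across the panel
    n_subpixels : N, the exit-pupil pitch in sub-pixels
    avg_width   : target average width, with N < avg_width < N+1
    Returns a list of widths whose mean approximates avg_width, with
    the (N+1)-wide images distributed evenly (Bresenham-style).
    """
    assert n_subpixels < avg_width < n_subpixels + 1
    widths, error = [], 0.0
    for _ in range(num_images):
        error += avg_width - n_subpixels  # fractional excess per image
        if error >= 1.0:
            widths.append(n_subpixels + 1)
            error -= 1.0
        else:
            widths.append(n_subpixels)
    return widths

# e.g. N = 12 sub-pixels, average width 12.1: every tenth image is 13 wide
print(elemental_widths(20, 12, 12.1).count(13))  # -> 2
```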