3D TV marks a turning point in the history of television. In 3D TV, technologies from computer graphics, computer vision, multimedia, telecommunications, broadcasting, and related fields converge to extend the experience of watching conventional 2D TV.1
As a result, there has been much research on multi-view image processing and 3D display.2,3
However, most previous studies have focused on acquiring and displaying images while optimizing their 'naturalness', computational simplicity, and real-time operation. Free-viewpoint TV (FTV)4 takes this idea further. Free-viewpoint systems allow viewers to look at a scene from the perspective they want rather than the one a director chooses to provide.1
To generate a scene according to the user's requirements, a position in 3D space must be defined as the viewpoint.
To achieve this, we propose an intelligent approach that automatically generates reference viewpoints based on a theory of human visual attention. Many studies of visual attention and eye movements have shown that humans generally attend to only a few areas of an image rather than scanning all of it. Visual attention models therefore provide a general approach to controlling the activities of active vision systems.5
These models of selective visual attention have been proposed on the basis of evidence from psychology, psychophysics, physiology, and other fields, and they can also be exploited for our purposes.
Figure 1 shows an example of how our artificial human visual attention system can be used to obtain a base position from which to generate a scene that corresponds to a user's viewpoint. The system also simultaneously computes the salience intensity of objects in a given region. In fact, several objects or regions can be detected in terms of their individual saliencies, which can in turn be used to describe them. The system computes early visual features from a set of pre-attentive feature maps in a massively parallel way. Activity from all feature maps is combined at each location, giving rise to the responses in the topographic saliency map shown in Figure 2.
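To make the combination step concrete, here is a minimal sketch of a bottom-up saliency map in the general style of center-surround attention models. It is not the authors' implementation: the box-blur center-surround operator, the min-max normalization, the plain averaging of feature maps, and every function name are assumptions chosen for illustration. Each feature map is processed independently, which is what allows the massively parallel computation described above, although this sketch runs sequentially.

```python
from typing import List

Grid = List[List[float]]  # a 2D feature map, indexed [row][col]


def box_blur(img: Grid, radius: int) -> Grid:
    """Mean filter over a (2r+1) x (2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def center_surround(feature: Grid, surround_radius: int = 2) -> Grid:
    """|center - surround|: large where a location differs from its neighbourhood."""
    surround = box_blur(feature, surround_radius)
    return [[abs(c - s) for c, s in zip(rc, rs)] for rc, rs in zip(feature, surround)]


def normalize(m: Grid) -> Grid:
    """Rescale a map to [0, 1]; a flat map carries no salience and becomes all zeros."""
    lo = min(min(r) for r in m)
    hi = max(max(r) for r in m)
    if hi == lo:
        return [[0.0] * len(r) for r in m]
    return [[(v - lo) / (hi - lo) for v in r] for r in m]


def saliency_map(feature_maps: List[Grid]) -> Grid:
    """Average the normalized center-surround (conspicuity) maps at each location."""
    conspicuity = [normalize(center_surround(f)) for f in feature_maps]
    h, w = len(conspicuity[0]), len(conspicuity[0][0])
    n = len(conspicuity)
    return [[sum(m[y][x] for m in conspicuity) / n for x in range(w)] for y in range(h)]


def most_salient(sal: Grid) -> tuple:
    """(row, col) of the strongest response in the saliency map."""
    _, y, x = max((v, y, x) for y, r in enumerate(sal) for x, v in enumerate(r))
    return y, x


# Usage: one bright spot on a dark background, plus a feature with no contrast.
intensity = [[0.0] * 7 for _ in range(7)]
intensity[3][3] = 1.0
flat = [[0.5] * 7 for _ in range(7)]
print(most_salient(saliency_map([intensity, flat])))  # (3, 3)
```

Normalizing each map before combining keeps a feature with no contrast anywhere (the flat map) from contributing spurious salience, so the combined map still peaks at the one location that stands out from its surroundings.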
Figure 1. Schematic diagram showing the proposed system based on a bottom-up approach to attention.
Figure 2. Images (a) and (d) show the inputs and (b) and (e) the resulting saliency maps. Graphs (c) and (f) show the corresponding salience intensity. As shown, the attention system detects spatially salient objects in the environment.
In our perceptive remote-control system, the focuses of attention, as determined by our model, are displayed and candidate viewpoints are presented (see Figure 3). The user then chooses among them using a touch screen. The proposed remote controller is smart in that it offers viewers several interesting viewpoints without disorienting them.
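The article does not say how the several focuses of attention are extracted from the saliency map. A common technique in bottom-up attention models is winner-take-all with inhibition of return, sketched below under stated assumptions: the square suppression neighbourhood, the parameter names `k` and `inhibit_radius`, and the `Focus` type are all hypothetical. Each focus keeps its salience value, which could serve as the salience intensity used to rank the candidate viewpoints offered to the viewer; how a focus is turned into a 3D viewpoint is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Focus:
    row: int          # location of the focus of attention in the saliency map
    col: int
    salience: float   # salience intensity, used here to rank candidate viewpoints


def select_focuses(sal: List[List[float]], k: int = 3,
                   inhibit_radius: int = 1) -> List[Focus]:
    """Winner-take-all with inhibition of return over a saliency map."""
    m = [row[:] for row in sal]  # copy so the caller's map is left unchanged
    h, w = len(m), len(m[0])
    focuses: List[Focus] = []
    for _ in range(k):
        # Winner-take-all: the single most salient remaining location.
        v, y, x = max((v, y, x) for y, r in enumerate(m) for x, v in enumerate(r))
        if v <= 0.0:
            break  # nothing salient left to attend to
        focuses.append(Focus(y, x, v))
        # Inhibition of return: suppress a square neighbourhood around the winner
        # so that the next iteration shifts attention to a different region.
        for yy in range(max(0, y - inhibit_radius), min(h, y + inhibit_radius + 1)):
            for xx in range(max(0, x - inhibit_radius), min(w, x + inhibit_radius + 1)):
                m[yy][xx] = 0.0
    return focuses


# Usage: two peaks, one with a weaker neighbour that belongs to the same object.
sal = [[0.0] * 5 for _ in range(5)]
sal[1][1], sal[1][2], sal[4][4] = 0.9, 0.8, 0.5
for f in select_focuses(sal, k=3):
    print(f)
# Focus(row=1, col=1, salience=0.9)
# Focus(row=4, col=4, salience=0.5)
```

The response at (1, 2) falls inside the suppressed neighbourhood of the first winner, so it is treated as part of the same object rather than offered as a separate candidate; this is one simple way to avoid presenting near-duplicate viewpoints that could disorient the viewer.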
Figure 3. The diagram shows the interactive relationship between the 3D TV system and the proposed smart remote controller. Several focuses of attention are computed using the attention system and transmitted to the device. A viewer then chooses a candidate viewpoint using the touchscreen and the corresponding viewpoint is displayed.
In our experiment, the 3D TV is realized using a networked desktop computer (see Figure 4). The 3D TV is connected to the PDA (the smart remote controller) over a wireless link. Zigbee and ultra-wideband (UWB) are possible candidates for this link in the future.
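The exchange between the 3D TV and the remote controller consists of two messages: the TV offers candidate viewpoints, and the remote returns the viewer's choice. The article does not specify a message format, so the JSON schema, the field names, and the functions below are hypothetical, and the transport (wireless Internet in the experiment, possibly Zigbee or UWB later) is left out entirely. Candidates are given as (row, col, salience) tuples, matching the focuses of attention computed by the attention system.

```python
import json
from typing import Dict, List, Tuple


def encode_candidates(candidates: List[Tuple[int, int, float]]) -> bytes:
    """TV -> remote: candidate viewpoints as (row, col, salience), most salient first."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    msg = {
        "type": "candidates",
        "items": [{"id": i, "row": r, "col": c, "salience": round(s, 3)}
                  for i, (r, c, s) in enumerate(ranked)],
    }
    # Compact separators keep the payload small for low-rate links.
    return json.dumps(msg, separators=(",", ":")).encode("utf-8")


def encode_selection(candidate_id: int) -> bytes:
    """Remote -> TV: the candidate the viewer tapped on the touch screen."""
    return json.dumps({"type": "select", "id": candidate_id},
                      separators=(",", ":")).encode("utf-8")


def resolve_selection(selection: bytes, candidates: bytes) -> Dict:
    """TV side: map the viewer's choice back to the candidate it refers to."""
    sel = json.loads(selection.decode("utf-8"))
    if sel.get("type") != "select":
        raise ValueError("expected a 'select' message")
    for item in json.loads(candidates.decode("utf-8"))["items"]:
        if item["id"] == sel["id"]:
            return item
    raise KeyError(sel["id"])


# Usage: the TV offers two candidates; the viewer picks the second-ranked one.
offer = encode_candidates([(4, 4, 0.5), (1, 1, 0.9)])
print(resolve_selection(encode_selection(1), offer))
# {'id': 1, 'row': 4, 'col': 4, 'salience': 0.5}
```

Text JSON is convenient over a wireless LAN. An IEEE 802.15.4 frame, which Zigbee builds on, carries at most 127 bytes, so a Zigbee deployment would likely need a more compact binary encoding of the same two messages.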
Figure 4. The proposed remote controller is realized using a personal digital assistant (PDA) and wireless Internet. The system first computes candidate viewpoints (left) and transmits them to the PDA, where the viewer chooses one; the selected view is then displayed on the 3D TV.
Free-viewpoint television, despite being a great advance, could initially prove difficult for some people, requiring users to acquire some skill before they can enjoy it fully. Our proposed remote controller addresses this problem by providing the user with reference viewpoints, anticipating their likely interests on the basis of human visual attention models. We believe that this is one of the first times that the user's (rather than the producer's) perspective on the technology has been properly addressed, and that it is therefore an important contribution to the technology.
Min-Chul Park, Sung Kyu Kim
Systems Technology Division,
Korea Institute of Science and Technology (KIST)
Min-Chul Park received his MS and PhD in Information and Communication Engineering from the University of Tokyo in 1997 and 2000, respectively. He was invited to become an associate professor with the Department of Electrical Engineering, Tokyo University of Science, in 2005. He is currently a senior research scientist at KIST. His research interests include 3D image display and visual communication. In addition, he has presented several papers on 3D TV, video, and display at SPIE's Optics East conference in the last two years.
Sung-Kyu Kim received his BS, MS, and PhD degrees from the Quantum Optics Group, Physics Department, Korea University, in 1989, 1991, and 2000, respectively. In 2001 he was appointed as a senior research scientist at KIST. His research interests include optical design of 3D display systems, digital holography, and multi-focus 3D display. He has also presented several papers on 3D TV, video and display at recent SPIE Optics East meetings.
Whole Image Laboratory,
Jung-Young Son received his PhD in applied optics in 1985 from the University of Tennessee. He worked at KIST as a principal research scientist in optics from 1989 to 2002. Since 2002, he has been a research professor at Hanyang University. His primary interests are the recording, display, and transmission of 3D images, electro-holography, and laser-based optical instrumentation and measurement. He is a fellow of SPIE, the Optical Society of Korea, Sigma Xi, Phi Kappa Phi, IEEE, and the Optical Society of America. He has authored more than 200 journal and conference-proceedings articles and nine books, and holds more than 50 registered patents.