Proceedings Volume 6074

Multimedia on Mobile Devices II


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 23 January 2006
Contents: 11 Sessions, 32 Papers, 0 Presentations
Conference: Electronic Imaging 2006
Volume Number: 6074

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Invited Paper I
  • Multimedia Coding I
  • Multimedia Coding II
  • Invited Paper II
  • Mobile Multimedia Retrieval and Classification
  • Processors for Multimedia
  • Multimedia Applications I
  • Multimedia Applications II
  • Multimedia Data Management
  • HCI Issues for Mobile Devices
  • Poster Session
Invited Paper I
New APIs for mobile graphics
Kari Pulli
Progress in mobile graphics technology during the last five years has been swift, following a path similar to that of PCs: early proprietary software engines running on integer hardware paved the way to standards that provide a roadmap for graphics hardware acceleration. In this overview we cover five recent standards for 3D and 2D vector graphics on mobile devices. OpenGL ES is a low-level API for 3D graphics, meant for applications written in C or C++. M3G (JSR 184) is a high-level 3D API for mobile Java that can be implemented on top of OpenGL ES. Collada is a content interchange format and API that allows combining digital content creation tools and exporting the results to different run-time systems, including OpenGL ES and M3G. Two new 2D vector graphics APIs mirror the relationship between OpenGL ES and M3G: OpenVG is a low-level API for C/C++ that can be used as a building block for JSR 226, a high-level mobile Java API.
Multimedia Coding I
A novel fast inter-prediction mode decision for H.264/AVC
Yi Guo, Houqiang Li, Shibao Pei, et al.
Video coding in a wireless environment requires lower computational complexity and lower energy consumption than storage-oriented or network-oriented applications. Although the H.264/AVC standard provides considerably higher compression efficiency than previous standards, its complexity is significantly increased at the same time. In an H.264/AVC encoder, the most time-consuming components are variable-block-size motion estimation and mode decision using rate-distortion optimization (RDO). In this paper, we propose a novel fast inter-prediction mode decision that exploits the high correlation between the rate-distortion costs (RD costs) of macroblocks in the current inter frame and those of their co-located macroblocks in the previous inter frame. Using this new algorithm, we can reduce the number of inter mode candidates and skip motion estimation for the eliminated modes. In addition, our algorithm can also decrease the number of tested intra modes. Simulation results show that our approach saves 20% to 50% of encoding time, with a negligible PSNR loss of less than 0.1 dB and a bit rate increase of no more than 2% for almost all test sequences.
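The temporal-correlation idea in this abstract can be sketched in a few lines: the RD cost of the co-located macroblock in the previous frame serves as an early-termination threshold for the current macroblock's mode search. The mode names, their ordering, and the `alpha` margin below are illustrative assumptions, not the authors' exact algorithm.

```python
def fast_mode_decision(rd_cost, colocated_cost,
                       modes=("SKIP", "16x16", "16x8", "8x16", "8x8", "INTRA"),
                       alpha=1.1):
    """Evaluate candidate modes in (assumed) likelihood order and stop as
    soon as a mode's RD cost drops below alpha * colocated_cost, skipping
    the remaining candidates.

    rd_cost: callable mapping a mode name to its RD cost for the current MB.
    colocated_cost: RD cost of the co-located MB in the previous inter frame.
    Returns (best_mode, best_cost, number_of_modes_actually_tested).
    """
    best_mode, best_cost = None, float("inf")
    tested = 0
    for mode in modes:
        cost = rd_cost(mode)
        tested += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if cost <= alpha * colocated_cost:  # high temporal correlation: stop early
            break
    return best_mode, best_cost, tested
```

When temporal correlation is high, the loop typically terminates after one or two candidates, which is where the 20-50% encoding-time saving would come from.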
Multimedia Coding II
Concealment driven bit rate reduction using the H.264 video coding standard
With the continual increase in both the popularity and power of portable multimedia devices, it is often desirable to stream video across telephone networks using modern encoding standards, even when the transmission network has only limited bandwidth available. This paper demonstrates an alternative use of the passive error concealment algorithm that is currently used in the H.264 video coding standard. As each macroblock is encoded, a concealed version is also generated in the same way that a decoder would conceal an erroneous macroblock. Occasionally the concealed version is mathematically closer to the original source than the normally reconstructed version; in these cases the macroblock is flagged as containing no data and is not included in the bit stream. The decoder then uses the already present weighted-pixel-value-averaging concealment technique to reconstruct the flagged macroblocks as it renders the picture. The proposed method has been tested over a wide variety of test sequences at various sizes and qualities. The outcome of the research is that, for almost identical mathematical and visual metric results, a significant reduction in bit stream size can be achieved, in some cases by as much as 5%.
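The encoder-side decision described above amounts to a per-macroblock comparison: if the concealed block is mathematically closer to the source than the reconstructed one, it can be flagged and omitted. A minimal sketch, with sum of absolute differences as an assumed distortion metric:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two pixel blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def flag_for_concealment(original, reconstructed, concealed):
    """Return True when the concealed macroblock is closer to the source
    than the normally reconstructed one, so the encoder can flag the MB
    as containing no data and leave it out of the bit stream."""
    return sad(original, concealed) < sad(original, reconstructed)
```

In a smooth region the decoder's weighted averaging of neighbouring pixels can beat a coarsely quantized reconstruction, which is exactly when the flag pays off.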
Image embedded coding with edge preservation based on local variance analysis for mobile applications
Gaoyong Luo, David Osypiw
Transmitting digital images via mobile devices is often subject to bandwidth constraints that are incompatible with high data rates. Embedded coding for progressive image transmission has recently gained popularity in the image compression community. However, current progressive wavelet-based image coders tend to send information on the lowest-frequency wavelet coefficients first. At very low bit rates, compressed images are therefore dominated by low-frequency information, and the high-frequency components belonging to edges are lost, blurring the signal features. This paper presents a new image coder employing edge preservation based on local variance analysis to improve the visual appearance and recognizability of compressed images. The analysis and compression are performed by dividing an image into blocks. A fast lifting wavelet transform is developed that is computationally efficient and minimizes boundary effects by changing the wavelet shape when filtering near the boundaries. A modified SPIHT algorithm, which uses more bits to encode the wavelet coefficients while transmitting fewer bits in the sorting pass, is implemented to reduce the correlation of the coefficients at scalable bit rates. Local variance estimation and edge strength measurement effectively determine the best bit allocation for each block, preserving local features by assigning more bits to blocks with higher variance and edge strength. Experimental results demonstrate that the method performs well both visually and in terms of MSE and PSNR. The proposed image coder offers a potential solution for mobile applications, with support for parallel computation and low memory requirements.
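The lifting idea is easy to illustrate. The sketch below implements one level of the integer LeGall 5/3 lifting transform with symmetric edge handling; the paper's boundary-adapted wavelet is more elaborate, so treat this as a generic stand-in rather than the authors' scheme.

```python
def lifting_53_forward(x):
    """One level of the LeGall 5/3 integer lifting wavelet transform.
    Returns (approximation, detail) coefficient lists; len(x) must be even.
    Edges use symmetric extension, one simple way of handling filtering
    near the boundaries."""
    s = list(x[0::2])   # even samples -> will become approximation
    d = list(x[1::2])   # odd samples  -> will become detail
    n = len(d)
    # predict step: detail = odd - floor((left_even + right_even) / 2)
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[i]   # symmetric extension
        d[i] -= (s[i] + right) >> 1
    # update step: even += floor((left_detail + right_detail + 2) / 4)
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]        # symmetric extension
        s[i] += (left + d[i] + 2) >> 2
    return s, d
```

On a linear ramp the detail band is almost entirely zero, which is what makes edge-free blocks cheap to code and, conversely, makes the extra bits worth spending on blocks where the details are large.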
Image coding using adaptive resizing in the block-DCT domain
In this paper, we propose an image coding scheme that uses an adaptive resizing algorithm to obtain a more compact coefficient representation in the block-DCT domain. Standard coding systems, e.g. JPEG baseline, utilize the block-DCT transform to reduce spatial correlation and to represent the image information with a small number of visually significant transform coefficients. Because neighboring coefficient blocks may contain only a few low-frequency coefficients, a downsizing operation can combine the information of two neighboring blocks into a single block. Fast and elegant image resizing methods operating in the transform domain have been introduced previously; in this paper, we introduce a way to use these algorithms to reduce the number of coefficient blocks that need to be encoded. At the encoder, the downsizing operation must be performed carefully to gain compression efficiency: the information of neighboring blocks can be efficiently combined if the blocks do not contain significant high-frequency components and if they share similar characteristics. Based on our experiments, the proposed method can offer from 0 to 4 dB of PSNR gain for block-DCT-based coding processes. The best performance can be expected for large images containing smooth, homogeneous areas.
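The merging decision can be sketched as follows. This toy version works on 1-D blocks and takes a spatial-domain round trip (inverse DCT, 2:1 downsampling, forward DCT); the transform-domain resizing methods the abstract refers to achieve the same effect without leaving the DCT domain. The high-frequency threshold is an assumed parameter.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix of size N x N."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

def merge_blocks(A, B, hf_threshold=1e-6):
    """Merge two neighboring 1-D DCT coefficient blocks into one block,
    but only when neither carries significant high-frequency energy.
    Returns the merged coefficient block, or None to keep them separate."""
    N = len(A)
    if max(np.abs(A[N // 2:]).max(), np.abs(B[N // 2:]).max()) > hf_threshold:
        return None                         # too much detail: do not merge
    C = dct_matrix(N)
    x = np.concatenate([C.T @ A, C.T @ B])  # inverse DCT -> 2N spatial samples
    y = 0.5 * (x[0::2] + x[1::2])           # low-pass 2:1 downsampling
    return C @ y                            # one N-coefficient block
```

Two smooth neighboring blocks collapse into a single block, halving the number of coefficient blocks sent for that region.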
Spatial scalability of multiple ROIs in scalable video coding
Tae Meon Bae, Truong Cong Thang, Duck Yeon Kim, et al.
In this paper, we propose a new functionality for Scalable Video Coding (SVC): support for multiple regions of interest (ROIs). SVC targets the flexible extraction of sub-bitstreams from the original SVC bitstream and is under discussion for standardization in the MPEG committee. A region of interest is an area that is semantically important to a particular user, and it is expected that a bitstream containing the ROI can be extracted without any transcoding operations. In many cases, however, the user may want to see more than one ROI at the same time, and the existence of multiple ROIs makes it difficult to extract a bitstream containing them all. In this paper, we present solutions to address these difficulties.
Invited Paper II
Implementing energy efficient embedded multimedia
Olli Silven, Tero Rintaluoma, Kari Jyrkkä
Multimedia processing in battery-powered mobile communication devices is pushing their computing power requirements to the level of desktop computers. At the same time, the energy dissipation limit stays at 3 W, the practical maximum that prevents the devices from becoming too hot to handle. In addition, several hours of active usage time should be provided on battery power. During the last ten years, the active usage times of mobile communication devices have remained essentially the same despite large energy-efficiency improvements at the silicon level. The reasons can be traced to design paradigms that are not explicitly targeted at creating energy-efficient systems, but at facilitating the implementation of complex software solutions by large teams. Consequently, the hardware and software architectures, including the operating system principles, are the same for mainframe computers and current mobile phones. In this paper, we weigh these developments against the needs of video processing in mobile communication devices and examine means of implementing energy-efficient video codecs both in hardware and in software. Although inflexible, monolithic video acceleration hardware is an attractive solution, while software-based codecs are becoming increasingly difficult to implement in an energy-efficient manner due to growing system complexity. Approaches that combine the flexibility of software with the energy efficiency of hardware remain to be seen.
Mobile Multimedia Retrieval and Classification
A study of low-complexity tools for semantic classification of mobile video
With the proliferation of cameras in handheld devices that allow users to capture still images and videos, providing users with software tools to efficiently manage multimedia content has become essential. In many cases users want to organize their personal media content using high-level semantic labels. In this paper we describe low-complexity algorithms that can be used to derive semantic labels, such as "indoor/outdoor," "face/not face," and "motion/not motion," for mobile video sequences. We also describe a method for summarizing mobile video sequences. We demonstrate the classification performance of the methods and their computational complexity on a processor typical of many mobile terminals.
Audio-based queries for video retrieval over Java enabled mobile devices
Iftikhar Ahmad, Faouzi Alaya Cheikh, Serkan Kiranyaz, et al.
In this paper we propose a generic framework for efficient retrieval of audiovisual media based on its audio content. The framework is implemented in a client-server architecture: the client application is developed in Java to be platform independent, whereas the server application is implemented for the PC platform. The client application adapts to the characteristics of the mobile device on which it runs, such as screen size and available commands. The entire framework is designed to take advantage of high-level segmentation and classification of audio content to improve the speed and accuracy of audio-based media retrieval. The primary objective of the framework is therefore to provide an adaptive basis for performing efficient video retrieval operations based on audio content and type (i.e. speech, music, fuzzy, and silence). Experimental results confirm that such an audio-based video retrieval scheme can be used from mobile devices to search and retrieve video clips efficiently over wireless networks.
Processors for Multimedia
Parallel implementation of MPEG-2 video decoder
Abhik Sarkar, Kaushik Saha, Srijib Maiti
The demand for real-time MPEG decoding is growing in multimedia applications. This paper discusses a hardware-software co-design for MPEG-2 video decoding and describes an efficient parallel implementation of the software module. We advocate the use of hardware for VLD, since it is inherently serial and efficient hardware implementations are available. The software module is a macroblock-level parallel implementation of the IDCT and motion compensation. Parallelism is achieved by dividing the picture into two halves for the 2-processor implementation and into four quadrants for the 4-processor implementation, and assigning the macroblocks in each partition to a processor. The processors perform IDCT and motion compensation in parallel for the macroblocks in their allotted sections; thus each processor produces 1/N of a picture frame for N processors. This implementation minimizes the data dependency among processors during motion compensation, since dependencies occur only at the edges of the divided sections. Load balancing among the processors is also achieved, as all processors perform computation on an equal number of macroblocks. Beyond these benefits, the major advantage is that the time taken to perform the IDCT and motion compensation decreases linearly with the number of processors.
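The partitioning described above is straightforward to express. A sketch of the macroblock-to-processor assignment (the halves/quadrants split follows the abstract; the coordinate convention and the top/bottom orientation of the 2-processor split are illustrative assumptions):

```python
def partition_macroblocks(mb_cols, mb_rows, n_proc):
    """Assign macroblocks to processors by splitting the picture into two
    halves (2 processors) or four quadrants (4 processors).
    Returns {proc_id: [(col, row), ...]}."""
    assert n_proc in (1, 2, 4)
    half_c, half_r = mb_cols // 2, mb_rows // 2
    parts = {p: [] for p in range(n_proc)}
    for r in range(mb_rows):
        for c in range(mb_cols):
            if n_proc == 1:
                p = 0
            elif n_proc == 2:
                p = 0 if r < half_r else 1              # top / bottom halves
            else:
                p = (0 if r < half_r else 2) + (0 if c < half_c else 1)  # quadrants
            parts[p].append((c, r))
    return parts
```

Because each partition holds the same number of macroblocks, the load is balanced by construction, and inter-processor data dependencies for motion compensation arise only along the partition edges.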
Software-based geometry operations for 3D computer graphics
Mihai Sima, Daniel Iancu, John Glossner, et al.
In order to support a broad dynamic range and a high degree of precision, many of 3D rendering's fundamental algorithms have traditionally been performed in floating point. However, fixed-point data representation is preferable to floating point in graphics applications on embedded devices, where performance is of paramount importance while the dynamic range and precision requirements are limited by the small display sizes (current PDAs are 640 × 480 (VGA), while cell phones are even smaller). In this paper we analyze the efficiency of a CORDIC-augmented Sandbridge processor implementing a vertex processor in software using fixed-point arithmetic. A CORDIC-based solution for vertex processing exhibits a number of advantages over classical multiply-and-accumulate solutions. First, since a single primitive is used to describe the computation, the code can easily be vectorized and multithreaded, and thus fits the major Sandbridge architectural features. Second, since a CORDIC iteration consists of only a shift operation followed by an addition, the computation may be deeply pipelined. We first outline the Sandbridge architecture extension, which encompasses a CORDIC functional unit and the associated instructions. We then consider rigid-body rotation, lighting, exponentiation, vector normalization, and perspective division (some of the most important data-intensive 3D graphics kernels) and propose a scheme to implement them on the CORDIC-augmented Sandbridge processor. Preliminary results indicate that the performance improvement with the extended instruction set ranges from 3× to 10× (with the exception of rigid-body rotation).
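The shift-and-add nature of a CORDIC iteration, which the abstract credits for easy pipelining, is visible in this minimal rotation-mode sketch (written in floating point for readability; a real fixed-point unit replaces the multiplications by 2^-i with arithmetic shifts):

```python
import math

def cordic_rotate(angle, iterations=24):
    """Rotation-mode CORDIC: each iteration performs only a 'shift'
    (scaling by 2^-i) and an add/subtract, so the datapath pipelines well.
    Returns (cos(angle), sin(angle)) for |angle| < ~1.74 rad."""
    # precompute the CORDIC gain K = prod(1 / sqrt(1 + 2^-2i))
    K = 1.0
    for i in range(iterations):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, angle           # start pre-scaled so the result is unit length
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate toward zero residual angle
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
    return x, y
```

The same primitive, run in vectoring mode or with hyperbolic angles, covers the normalization, exponentiation, and perspective-division kernels the abstract lists, which is why a single functional unit suffices.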
MVSP: multithreaded VLIW stream processor
Somayeh Sardashti, Hamid Reza Ghasemi, Omid Fatemi
Stream processing is a new trend in computer architecture design that fills the gap between inflexible special-purpose media architectures and programmable architectures with low computational ability for media processing. Stream processors are designed for computationally intensive media applications characterized by high data parallelism and producer-consumer locality with little global data reuse. In this paper, we propose a new stream processor named MVSP (Multithreaded VLIW Stream Processor). MVSP is a programmable stream processor based on Imagine [1] and exploits the TLP, DLP, SP, and ILP forms of parallelism inherent in media applications. A full simulator of MVSP has been implemented, and several media workloads composed of EEMBC [2] benchmarks have been applied. The simulation results show performance and functional unit utilization improvements of more than two times compared with the Imagine processor.
System-on-chip architecture with media DSP and RISC core for media application
Systems-on-chip provide single-chip solutions for many embedded applications, meeting the applications' size and power requirements. Media processing, such as real-time compression and decompression of video signals, is now expected to be the driving force in the evolution of media processors. The MediaSoC322xA consists of two fully programmable processor cores and integrates a digital video encoder. Each programmable core targets a particular class of algorithms: the MediaDSP3200 handles RISC/DSP-oriented functions and multimedia processing, and the RISC3200 handles bit stream processing and control functions. Dedicated interface units for DRAM, SDRAM, Flash, SRAM, on-screen display, and the digital video encoder are connected to the processor cores via a 32-bit system bus. The MediaSoC322xA is fabricated in a 0.18 um 6LM standard-cell SMIC CMOS technology, occupies about 20 mm², and operates at 180 MHz. The MediaSoC322xA is used as an audio/video decoder for embedded multimedia applications.
IIA: a novel method to optimize media instruction set of embedded processor
To accelerate media processing, many media enhancement instructions have been adopted into the instruction sets of embedded processors. In this paper, a novel method called interaction between instructions and algorithms (IIA) is proposed to optimize these media enhancement instructions. Based on analysis of the inherent characteristics of video processing algorithms and of the processor's architecture, three measures are proposed: three single-cycle bit-level manipulation instructions are implemented to speed up variable-length decoding; a data path is designed to resolve data misalignment in SIMD processing instead of relying on software; and a memory architecture is proposed to support 128-bit word-parallel processing. All of these suggestions are used in the optimization of an embedded processor, the MediaDSP3200, which fuses a RISC architecture with DSP computation capability and provides a reduced instruction set with 64-bit SIMD instructions and various addressing modes in a unified RISC pipeline architecture. Simulation results show that this optimization method can reduce clock cycles by more than 26.4% for VLD, 47.8% for IDCT, and 66.8% for MC in real-time processing.
Multimedia Applications I
Wireless steganography
Modern mobile devices are among the most technologically advanced devices people use on a daily basis, and current trends in mobile phone technology indicate that the tasks achievable by mobile devices will soon exceed our imagination. This paper presents a case study of the development and implementation of one of the first known steganography (data hiding) applications on a mobile device. Steganography is traditionally accomplished using the high processing speeds of desktop or notebook computers. With the introduction of mobile platform operating systems, users have the opportunity to develop and deploy their own applications. We take advantage of this opportunity by introducing wireless steganographic algorithms, demonstrating that custom applications popular with security establishments can also be developed on mobile systems, independent of both the mobile device manufacturer and the mobile service provider. This might, for example, be an important feature if the communication is to be controlled exclusively by authorized personnel. The paper begins by reviewing the technological capabilities of modern mobile devices. We then address a suitable development platform based on the Symbian/Series 60 architecture. Finally, two data hiding applications developed for Symbian/Series 60 mobile phones are presented.
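As a flavour of what such an application does at its core, here is a minimal least-significant-bit embedding sketch. The paper does not describe its Symbian/Series 60 algorithms at this level of detail, so this is a generic illustration of the data-hiding idea, not the authors' method:

```python
def embed_lsb(pixels, message_bits):
    """Hide message bits (0/1) in the least significant bit of successive
    pixel values; each pixel changes by at most 1, so the cover image is
    visually unchanged."""
    assert len(message_bits) <= len(pixels)
    out = list(pixels)
    for i, bit in enumerate(message_bits):
        out[i] = (out[i] & ~1) | bit   # clear the LSB, then set it to the bit
    return out

def extract_lsb(pixels, n_bits):
    """Recover the first n_bits hidden by embed_lsb."""
    return [p & 1 for p in pixels[:n_bits]]
```

The computational cost is one masked write per pixel, which is one reason data hiding is feasible even on the modest processors of the phones discussed here.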
Multimedia Applications II
Image processing for navigation on a mobile embedded platform: Design of an autonomous mobile robot
Harald Loose, Christiane Lemke, Chavdar Papazov
This paper deals with intelligent mobile platforms connected to a camera controlled by a small hardware platform called RCUBE. This platform provides the features of a typical actuator-sensor board, with various inputs and outputs, as well as computing power and image recognition capabilities. Several intelligent autonomous RCUBE devices can be equipped and programmed to participate in the BOSPORUS network. These components form an intelligent network for gathering sensor and image data, sensor data fusion, and the navigation and control of mobile platforms. The RCUBE platform provides a standalone solution for image processing, which is explained and presented here; it plays a major role for several components in a reference implementation of the BOSPORUS system. On the one hand, intelligent cameras are positioned in the environment, analyzing events from a fixed point of view and sharing their perceptions with other components in the system. On the other hand, image processing results contribute to the reliable navigation of a mobile system, which is crucially important. Fixed landmarks and other objects appropriate for determining the position of a mobile system can be recognized. For navigation, other methods such as GPS calculations and odometry are added.
Image processing for navigation on a mobile embedded platform
Thomas Preuss, Lars Gentsch, Mark Rambow
Mobile computing devices such as PDAs or cellular phones may act as "personal multimedia exchanges," but they are limited in their processing power as well as in their connectivity. Sensors, cellular phones, and PDAs are able to gather multimedia data, e.g. images, but lack the computing power to process that data on their own. It is therefore necessary that these devices connect to more powerful devices that provide, for example, image processing services. In this paper, a generic approach is presented that connects different kinds of clients with each other and allows them to interact with more powerful devices. This architecture, called BOSPORUS, is a communication framework for dynamic peer-to-peer computing. Each peer offers and uses services in the network and communicates with the others in a loosely coupled, asynchronous fashion. These features make BOSPORUS a service-oriented network architecture (SONA). As an application of the framework, a mobile embedded system is shown that uses external BOSPORUS-based services for image processing.
The future is 'ambient'
The research field of ambient media is starting to spread rapidly, and the first applications for consumer homes are on the way. Ambient media is the logical continuation of research around media, which has been evolving from old media (e.g. print media), to integrated presentation in one form (multimedia, or new media), to generating a synthetic world (virtual reality), to the natural environment as the user interface (ambient media), and will evolve towards real and synthetic indistinguishable media (bio-media or bio-multimedia). After the IT bubble burst, multimedia lacked a vision of potential future scenarios and applications. In this research paper, the potential, applications, and commercially available solutions of mobile ambient multimedia are studied. The features of ambient mobile multimedia are manifold and include wearable computers, adaptive software, context awareness, ubiquitous computers, middleware, and wireless networks. The paper focuses especially on algorithms and methods that can be utilized to realize modern mobile ambient systems.
Multimedia Data Management
Performance analysis of MPEG-21 technologies on mobile devices
Saar De Zutter, Frederik De Keukelaere, Chris Poppe, et al.
This paper gives an introduction to technologies and methodologies for measuring the performance of MPEG-21 applications in mobile environments. Since resources such as processing time, available memory, storage, network capacity, and battery life are very scarce on mobile devices, it is important that technologies use as little of those resources as possible. To identify possible optimization points for MPEG-21 technologies, performance measurement techniques are applied to a prototype implementation of MPEG-21 Digital Item Declaration and Digital Item Processing. The goal of the upcoming MPEG-21 standard is to provide transparent and augmented use of multimedia resources across a plethora of networks and devices. The prototype, which has been implemented on the J2ME platform, gives information about possible bottlenecks in the design of MPEG-21-based applications. The results of the measurements are discussed and used to identify which improvements are needed to reduce memory and processor consumption when implementing the discussed parts of the MPEG-21 standards on a mobile platform. The paper ends with a discussion and concluding remarks.
TV anytime and MPEG-21 DIA based seamless content mobility prototype system for digital home applications
Munjo Kim, Chanseok Yang, Jeongyeon Lim, et al.
Much research has been done to enable ubiquitous video services over various kinds of user information terminals anytime and anywhere. In this paper, we design a prototype system for seamless, preference-based consumption of TV program content via various kinds of user information terminals in a digital home environment, and we present implementation and test results for the prototype. The system uses TV-Anytime metadata to drive the consumption of TV program content according to user preferences for TV program genres, and uses the MPEG-21 DIA (Digital Item Adaptation) tools, representation schema formats that describe the context information for user environments, user terminal characteristics, and user characteristics, for universal access and consumption of the preferred TV program content. The proposed content mobility prototype system supports one or more users in seamlessly consuming the same TV program content via various kinds of user terminals. It consists of a home server, display TV terminals, and user information terminals. We used 42 TV programs in eight different genres from four different TV channels to test the proposed prototype system.
A mobile phone-based context-aware video management application
Janne Lahti, Marko Palola, Jari Korva, et al.
We present a video management system comprising a video server and a mobile camera-phone application called MobiCon, which allows users to capture videos, annotate them with metadata, specify digital rights management (DRM) settings, upload the videos over the cellular network, and share them with others. Once stored in the video server, users can then search their personal video collection via a web interface, and watch the video clips using a wide range of terminals. We describe the MobiCon architecture, compare it with related work, provide an overview of the video server, and illustrate a typical user scenario from the point of capture to video sharing and video searching. Our work takes steps forward in advancing the mobile camera-phone from a video playback device to a video production tool. We summarize field trial results conducted in the area of Oulu, Finland, which demonstrate that users can master the application quickly, but are unwilling to perform extensive manual annotations. Based on the user trial results and our own experience, we present future development directions for MobiCon, in particular, and the video management architecture, in general.
HCI Issues for Mobile Devices
MIKE's PET: A participant-based experiment tracking tool for HCI practitioners using mobile devices
Dean Mohamedally, Stefan Edlich, Enrico Klaus, et al.
Knowledge Elicitation (KE) methods are an integral part of Human Computer Interaction (HCI) practices. They are a key aspect to the synthesis of psychology empirical methods with requirements engineering, User Centred Design (UCD) and user evaluations. Examples of these methods include prototyping, focus groups, interviews, surveys and direct video observation. The MIKE project (Mobile Interactive Knowledge Elicitation) at the Centre for HCI Design, City University London, UK provides mobile cyberscience capabilities for HCI practitioners conducting such research while at stakeholder locations. This paper reports on the design and development of a new MIKE based tool, named PET, a Participant-based Experiment Tracking tool for HCI practitioners using Java-based (J2ME) mobile devices. PET integrates its user tracking techniques with the development of the second generation implementation of the CONKER (COllaborative Non-linear Knowledge Elicitation Repository) Web Service. We thus report further on CONKER v2.0's new capabilities developed to enable tighter collaboration and empirical data management between HCI practitioners, considering their UCD needs. The visualisation, tracking and recording of HCI participant-based datasets via PET is explored with close connectivity with the CONKER v2.0 Web Service, in order to provide mobile-web cyberscience for remote and local HCI practitioners.
Maintenance support: case study for a multimodal mobile user interface
G. Fuchs, D. Reichart, H. Schumann, et al.
Maintaining and repairing complex technical facilities such as generating plants requires comprehensive knowledge of subsystems and of operational and safety procedures on the part of the technician. Upgrades to the facility may render this knowledge outdated, raising the need for documentation at the working site. Today's commonplace availability of mobile devices motivates the use of digital, interactive manuals over printed ones. Such applications should provide high-quality illustrations and interaction techniques tailored to specific tasks, while at the same time allowing flexible deployment of these components on a multitude of (mobile) hardware platforms. This includes the integration of multimodal interaction facilities, like speech recognition, into the user interface. To meet these demands, we propose a model-based approach that combines task, object, and dialog models to specify platform-independent user interfaces. New concepts, like relating tasks to domain objects and dialog views, allow us to generate abstract canonical prototypes. Another focus is the necessary adaptation of visual representations to platform capabilities so that they remain effective and adequate, requiring tight coupling of the underlying model, the visualization, and alternative input/output modes. These aspects have been addressed in a prototype for air-conditioning unit maintenance, presented at the CeBIT 2005 fair.
Breaking the news on mobile TV: user requirements of a popular mobile content
Hendrik O. Knoche, M. Angela Sasse
This paper presents the results of three lab-based studies that investigated different ways of delivering mobile TV news by measuring user responses to different encoding bit rates, image resolutions, and text qualities. All studies were carried out with participants watching news content on mobile devices, with a total of 216 participants rating the acceptability of the viewing experience. Study 1 compared the acceptability of a 15-second video clip at different video and audio encoding bit rates on a 3G phone at a resolution of 176x144 and on an iPAQ PDA (240x180). Study 2 measured the acceptability of the video quality of full news clips of 2.5 minutes, recorded from broadcast TV, encoded at resolutions ranging from 120x90 to 240x180, combined with different encoding bit rates and audio qualities, and presented on an iPAQ. Study 3 improved the legibility of the text included in the video, simulating separate text delivery. The acceptability of the news clips' video quality was greatly reduced at a resolution of 120x90. The legibility of text was a decisive factor in the participants' assessment of video quality. Resolutions of 168x126 and higher were substantially more acceptable when accompanied by optimized high-quality text than with proportionally scaled inline text. When accompanied by high-quality text, TV news clips were acceptable to the vast majority of participants at resolutions as small as 168x126 for video encoding bit rates of 160 kbps and higher. Service designers and operators can apply this knowledge to design a cost-effective mobile TV experience.
Multimodal audio guide for museums and exhibitions
In our paper we introduce a new Audio Guide concept for exploring buildings, grounds, and exhibitions. Currently proposed solutions mostly rely on dedicated devices that users have to buy or borrow. These systems often entail complex technical installations and require considerable user training for device handling. Furthermore, the activation of audio commentary related to exhibition objects is typically based on additional components such as infrared, radio-frequency, or GPS technology. Besides requiring the installation of specific devices for user localization, these approaches often support only automatic activation, with no or limited user interaction. The elaboration of alternative concepts therefore appears worthwhile. Motivated by these aspects, we introduce a new concept based on the visitor's own mobile smart phone. The advantages of our approach are twofold: firstly, the Audio Guide can be used in various places without purchasing devices or extensively installing additional components in or around the exhibition objects. Secondly, visitors can experience the exhibition on individual tours simply by uploading the Audio Guide at a single point of entry, the Audio Guide Service Counter, and keeping it on their personal devices. Furthermore, users are usually quite familiar with the interface of their own phones and can thus interact with the application easily. Our technical concept uses two general ideas for location detection and activation: firstly, we suggest enhanced interactive number-based activation that exploits the visual capabilities of modern smart phones, and secondly, we outline an active digital audio watermarking approach in which information about objects is transmitted via an analog audio channel.
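The audio-watermarking activation idea can be illustrated with a minimal spread-spectrum sketch. This is our own illustration under simplifying assumptions, not the authors' actual scheme: each bit of an object identifier is embedded by adding a low-amplitude pseudo-noise sequence to the exhibit audio, and the phone recovers it by correlating against the same sequence.

```python
import random

def _pn_sequence(length, seed=42):
    """Shared pseudo-noise chip sequence of +/-1 values."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed_watermark(audio, bits, chip_len=2048, alpha=0.05, seed=42):
    """Embed one bit per chip_len-sample frame by adding a scaled
    pseudo-noise sequence whose sign carries the bit."""
    pn = _pn_sequence(chip_len, seed)
    out = list(audio)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(chip_len):
            out[i * chip_len + j] += alpha * sign * pn[j]
    return out

def detect_watermark(audio, n_bits, chip_len=2048, seed=42):
    """Recover each bit by correlating its frame with the PN sequence:
    a positive correlation means bit 1, negative means bit 0."""
    pn = _pn_sequence(chip_len, seed)
    bits = []
    for i in range(n_bits):
        corr = sum(audio[i * chip_len + j] * pn[j] for j in range(chip_len))
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because the host audio is roughly uncorrelated with the PN sequence, the correlation is dominated by the embedded term even at low amplitude, which is what makes the watermark inaudible yet detectable.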
Human sound detection on experience movies
Shogo Shimura, Yasushi Hirano, Shoji Kajita, et al.
In this paper, we describe an indexing method for video lifelogs that uses sounds generated by human actions. The miniaturization of information-processing devices has recently made it possible to record experience movies continuously. However, these movies include many parts that are useless to the user, so it is important to automatically extract only the useful experiences from the whole record. If a tool is provided that makes it easy to attach indices to important experiences, users can mark them as they happen. Although indices could easily be added with a dedicated device, to keep the user's burden low it is undesirable for users to wear devices other than the microphone and video camera needed to record the experiences. We therefore propose a method that lets users add indices using sounds they can generate with their own bodies. In particular, we analyzed typical sounds such as hand claps and finger snaps, and developed a method for detecting these two index sounds. We performed an experiment to measure the recall ratio and relevance ratio of the two index sounds: a subject wore the recording system for ten days (about 100 hours), and applying the proposed detection method to the recorded data detected the two index sounds with a recall ratio of 86.0% and a relevance ratio of 83.6%.
Poster Session
A FGS coding method based on CL multiwavelet transform
The multiwavelet transform has important traits for image processing, such as symmetry, orthogonality, smoothness, and short support. Theoretically, multiwavelets should therefore be a satisfying tool for image coding. In this paper, we investigate the application of multiwavelets to scalable coding and propose a new transmission-oriented image coding method: a fine granularity scalable (FGS) coding method based on the CL multiwavelet transform. The experimental results show that the proposed coding method is effective and demonstrate the value of the multiwavelet transform in image coding.
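The fine-granularity-scalability idea itself can be sketched independently of the transform used. In this minimal illustration (ours, not the paper's codec), quantized transform coefficients are sent as bit-planes from most to least significant, so the stream can be truncated after any plane and still decode to a coarser reconstruction.

```python
def fgs_encode(coeffs, n_planes=8):
    """Split quantized transform coefficients into sign flags and
    bit-planes, most significant plane first; the plane list can be
    truncated anywhere for fine-granularity scalability."""
    signs = [(1 if c > 0 else -1 if c < 0 else 0) for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(n_planes - 1, -1, -1)]
    return signs, planes

def fgs_decode(signs, planes, n_planes=8):
    """Rebuild coefficients from however many planes were received;
    missing low-order planes simply coarsen the reconstruction."""
    mags = [0] * len(signs)
    for plane in planes:
        mags = [(m << 1) | b for m, b in zip(mags, plane)]
    shift = n_planes - len(planes)
    return [s * (m << shift) for s, m in zip(signs, mags)]
```

Truncating to the first four planes, for example, keeps only the high-order magnitude bits of each coefficient, which is exactly the graceful-degradation behavior FGS streaming relies on.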
A context-aware video display scheme for mobile devices
A fully automatic and computationally efficient method is proposed for the intelligent display of soccer video on small multimedia mobile devices. Rapid progress in multimedia signal processing has contributed to the extensive use of multimedia devices with small LCD panels. On these flourishing small-display devices, video sequences captured for normal viewing on standard TV or HDTV may make it uncomfortable for viewers to understand what is happening in a scene. For instance, in a soccer video sequence shot with a long-shot camera technique, tiny objects (e.g., the soccer ball and players) may not be clearly visible on the small LCD panel. An intelligent display technique is therefore needed for small-display viewers. To this end, one key technology is determining the region of interest (ROI): the part of the scene to which viewers pay more attention than to other regions. In this paper, we focus on soccer video display for mobile devices. Instead of relying on visual saliency, we take a domain-specific approach that exploits the characteristics of soccer video. We propose a context-aware soccer video display scheme comprising three components: ground color learning, shot classification, and ROI determination. The experimental results show that the proposed scheme is capable of context-aware video display on mobile devices.
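The first component, ground color learning, can be sketched as a hue-histogram peak search. This is a hedged illustration of the general technique, not the paper's exact algorithm: in soccer video the dominant hue is the grass green, so pixels outside the learned range become candidates for players and the ball.

```python
def learn_ground_hue(frames, n_bins=36):
    """Histogram hue samples (degrees, 0-360) from training frames and
    return the dominant bin's hue range -- in soccer video this peak
    corresponds to the grass color."""
    hist = [0] * n_bins
    width = 360.0 / n_bins
    for frame in frames:
        for hue in frame:
            hist[min(int(hue / width), n_bins - 1)] += 1
    peak = hist.index(max(hist))
    return peak * width, (peak + 1) * width

def is_ground(hue, lo, hi, margin=10.0):
    """True if a pixel's hue falls in (or near) the learned range."""
    return lo - margin <= hue < hi + margin
```

Masking out the ground pixels then leaves compact non-ground blobs whose bounding box is a natural ROI to crop and scale for the small display.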
Verification of WIPI-based T-DMB platform for interactive mobile multimedia services
Byungjun Bae, Woosuk Kim, Joungil Yun, et al.
This paper presents the architecture of an interactive terrestrial digital multimedia broadcasting (T-DMB) system and an integrated receiver that uses the code division multiple access (CDMA) network, and proposes a novel architecture for a WIPI-based T-DMB platform. The proposed extended WIPI platform is implemented as an emulator, and an integrated browser that can simultaneously display multimedia data and interactive data content was developed to run on it. To verify the proposed platform and the interactive mobile broadcasting services, we implemented the integrated receiver as well as the data broadcasting server and the return-channel server. Using the proposed platform, users can access interactive services or obtain more information about a broadcast through the CDMA network. This demonstrates the possibility of new interactive services in digital broadcasting that exploit the mobility of T-DMB and CDMA.
New TPEG applications based on digital multimedia broadcasting
Youngho Jeong, Sammo Cho, Geon Kim, et al.
Digital multimedia broadcasting (DMB) provides an economical way to deliver massive multimedia data services, and is thus emerging as an optimal solution to several drawbacks of the mobile communication network. Among the data services based on DMB, the traffic and travel information (TTI) service has attracted attention for its economic impact and information usability. For example, demand for TTI services has increased dramatically with the rapid growth in the number of vehicles and in the number of people traveling frequently on long weekends. The Transport Protocol Experts Group (TPEG) protocol was developed to satisfy this demand. However, TPEG has so far been applied to only two application areas, and new services that would be valuable to drivers and travelers have not yet been standardized. This paper presents new, economical, and effective point of interest (POI) and news services, compatible with TPEG, that can be provided over DMB. To verify the stability of the POI and news protocols, we implemented a TTI data server/client and a DMB/DAB receiver with a TTI decoder. Experimental broadcasting over a local DMB network shows that the proposed protocols operate stably and effectively in a navigation system.
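A POI service of this kind boils down to serializing small structured records for broadcast. The layout below is hypothetical and for illustration only (NOT the actual TPEG wire format, which this sketch does not attempt to reproduce): a message id, latitude/longitude in micro-degrees, a category code, and a length-prefixed UTF-8 name.

```python
import struct

# Hypothetical, simplified POI record: id, lat, lon (micro-degrees),
# category code, then a length-prefixed UTF-8 name.
_HEADER = ">HiiBB"

def pack_poi(msg_id, lat, lon, category, name):
    """Serialize one POI record into a big-endian byte string."""
    payload = name.encode("utf-8")
    return struct.pack(_HEADER, msg_id, round(lat * 1e6),
                       round(lon * 1e6), category, len(payload)) + payload

def unpack_poi(buf):
    """Parse a record produced by pack_poi back into Python values."""
    size = struct.calcsize(_HEADER)
    msg_id, lat_u, lon_u, category, n = struct.unpack_from(_HEADER, buf)
    return (msg_id, lat_u / 1e6, lon_u / 1e6, category,
            buf[size:size + n].decode("utf-8"))
```

Fixed-point micro-degree coordinates keep the record compact, which matters on a one-way broadcast channel where many POIs share limited bandwidth.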
An effective method and its implementation for splicing in terrestrial DMB
Yonghoon Lee, Jinhwan Lee, Gwangsoon Lee, et al.
The implemented T-DMB splicing system provides multimedia service without any discontinuity in video or audio when commercials or specific programs are inserted into the main DMB broadcast. It can also be used to insert local programming while a central broadcast is being retransmitted. This paper introduces a terrestrial digital multimedia broadcasting (T-DMB) splicing method based on Eureka-147 DAB and presents a new transmission-system architecture for T-DMB splicing.
Media digital signal processor core design for multimedia application
Peng Liu, Guo-jun Yu, Wei-guang Cai, et al.
An embedded single media processor named MediaDSP3200 core fabricated in a six-layer metal 0.18um CMOS process which implemented the RISC instruction set, DSP data processing instruction set and single-instruction-multiple-data (SIMD) multimedia-enhanced instruction set is described. MediaDSP3200 fuses RISC architecture and DSP computation capability thoroughly, which achieves RISC fundamental, DSP extended and single instruction multiple data (SIMD) instruction set with various addressing modes in a unified pipeline stage architecture. These characteristics enhance system digital signal processing performance greatly. The test processor can achieve 32x32-bit multiply-accumulate (MAC) of 320 MOPS, with 16x16-bit MAC of 1280MOPS. The test processor dissipates 600mW at 1.8v, 320MHz. Also, the implementation was primarily standard cell logic design style. MediaDSP3200 targets diverse embedded application systems, which need both powerful processing/control capability and low-cost budget, e.g. set-top-boxes, video conferencing, DTV, etc. MediaDSP3200 instruction set architecture, addressing mode, pipeline design, SIMD feature, split-ALU and MAC are described in this paper. Finally, the performance benchmark based on H.264 and MPEG decoder algorithm are given in this paper.