Proceedings Volume 6821

Multimedia on Mobile Devices 2008

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 12 March 2008
Contents: 10 Sessions, 25 Papers, 0 Presentations
Conference: Electronic Imaging 2008
Volume Number: 6821

Table of Contents

  • Front Matter: Volume 6821
  • Multimedia Applications
  • Video Coding
  • Invited Paper I
  • Media Processing
  • Multimedia Content Protection
  • Systems for Multimedia
  • HCI Issues on Multimedia I
  • HCI Issues on Multimedia II
  • Poster Session
Front Matter: Volume 6821
Front Matter: Volume 6821
This PDF file contains the front matter associated with SPIE Proceedings Volume 6821, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Multimedia Applications
Real-time scalable visual analysis on mobile devices
Interactive visual presentation of information can help an analyst gain faster and better insight from data. When combined with situational or context information, visualization on mobile devices is invaluable to in-field responders and investigators. However, the form factor of mobile devices poses several challenges in developing such systems. In this paper, we classify these challenges into two broad categories: issues in general mobile computing and issues specific to visual analysis on mobile devices. Using NetworkVis and Infostar as example systems, we illustrate some of the techniques we employed to overcome many of the identified challenges. NetworkVis is an OpenVG-based real-time network monitoring and visualization system developed for Windows Mobile devices. Infostar is a Flash-based, interactive, real-time visualization application intended to give attendees access to conference information. Highlights of these applications include linked time-synchronous visualization, stylus/button-based interactivity, vector graphics, overview-context techniques, details-on-demand, and statistical information display.
REST based mobile applications
Mark Rambow, Thomas Preuss, Jörg Berdux, et al.
Simplicity is the major advantage of REST-based web services. Whereas SOAP is widespread in complex, security-sensitive business-to-business applications, REST is widely used for mashups and end-user-centric applications. In that context, we give an overview of REST and compare it with SOAP. Furthermore, we use the GeoDrawing application as an example of a REST-based mobile application and discuss the pros and cons of using REST in mobile application scenarios.
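To make the uniform-interface idea concrete, here is a minimal Python sketch of REST-style dispatch for a drawings collection. The resource path `/drawings`, the `DrawingResource` class, and the JSON payload shape are hypothetical and are not taken from the GeoDrawing application; there is no networking, only the mapping from HTTP method and URL to an operation.

```python
import json
from typing import Dict, Tuple


class DrawingResource:
    """Hypothetical in-memory collection exposed through a REST-style uniform interface."""

    def __init__(self) -> None:
        self.items: Dict[int, dict] = {}

    def handle(self, method: str, path: str, body: str = "") -> Tuple[int, str]:
        """Return (HTTP status code, response body) for a method and URL path."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "drawings" or len(parts) > 2:
            return 404, ""
        if len(parts) == 1:  # the collection: /drawings
            if method == "GET":
                return 200, json.dumps(sorted(self.items))
            if method == "POST":
                new_id = max(self.items, default=0) + 1
                self.items[new_id] = json.loads(body)
                return 201, json.dumps({"id": new_id})
            return 405, ""
        if not parts[1].isdigit():  # a single item: /drawings/<id>
            return 404, ""
        item_id = int(parts[1])
        if method == "GET":
            if item_id in self.items:
                return 200, json.dumps(self.items[item_id])
            return 404, ""
        if method == "PUT":  # create or replace the item
            self.items[item_id] = json.loads(body)
            return 200, ""
        if method == "DELETE":
            if item_id in self.items:
                del self.items[item_id]
                return 204, ""
            return 404, ""
        return 405, ""
```

Each operation is addressed by a URL and selected by the HTTP method, so no SOAP envelope or WSDL contract is needed; that is the kind of simplicity the abstract refers to.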
Open source OCR framework using mobile devices
Steven Zhiying Zhou, Syed Omer Gilani, Stefan Winkler
Mobile phones have evolved from passive one-to-one communication devices into powerful handheld computing devices. Today most new mobile phones can capture images, record video, browse the internet, and much more. Exciting new social applications are emerging in the mobile landscape, such as business card readers, sign detectors, and translators. These applications help people quickly gather information in digital form and interpret it without carrying laptops or tablet PCs. However, despite all these advances, very little open source software is available for mobile phones. For instance, there are currently many open source OCR engines for desktop platforms but, to our knowledge, none for mobile platforms. With this in mind, we propose a complete text detection and recognition system with speech synthesis ability, built on existing desktop technology. In this work we developed a complete OCR framework from subsystems of the open source desktop community. These include Tesseract, a popular open source OCR engine, for text detection and recognition, and the Flite speech synthesis module for text-to-speech.
Video Coding
A high-level simulator for the H.264/AVC decoding process in multi-core systems
Florian H. Seitner, Ralf M. Schreier, Michael Bleyer, et al.
H.264, a new-generation video coding algorithm, is becoming increasingly important for international broadcasting standards such as DVB-H and DMB. Compared with its predecessors MPEG-2 and MPEG-4 SP/ASP, H.264 achieves improved compression efficiency at the cost of increased computational complexity. Real-time execution of the H.264 decoding process is a major challenge for mobile devices because of their low processing capabilities. Multi-core systems provide an elegant and power-efficient way to overcome this performance limitation. However, efficiently distributing the video algorithm among multiple processing units is a non-trivial task. It requires detailed knowledge of the algorithmic complexity, dynamic variations, and inter-dependencies between functional blocks. The objective of this paper is to investigate the dynamic behavior of the H.264 decoding process and the interaction between the main decoding tasks in multi-core environments. We use an H.264 decoder model to investigate the efficiency of a decoding system under various conditions (e.g., different FIFO buffer sizes, bitstreams, coding features, and bitrates). The insights gained are then used to optimize the runtime behavior of a multi-core decoding system and to find a good trade-off between core usage and buffer sizes.
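The trade-off between FIFO size and throughput can be illustrated with a toy two-stage pipeline: a parsing stage feeds a reconstruction stage through a bounded FIFO, with per-macroblock processing times that vary. This is a hypothetical model written for illustration, not the authors' decoder model; the function name and the rule that stage 1 reserves a FIFO slot before it starts a macroblock are assumptions.

```python
from typing import List


def pipeline_makespan(t_parse: List[float], t_recon: List[float], fifo_size: int) -> float:
    """Finish time of the last macroblock in a hypothetical two-stage decoder pipeline.

    Stage 1 (parsing) reserves a FIFO slot before it starts a macroblock;
    stage 2 (reconstruction) frees that slot when it pops the macroblock.
    """
    if fifo_size < 1:
        raise ValueError("fifo_size must be >= 1")
    n = len(t_parse)
    finish1 = [0.0] * n   # when stage 1 finishes macroblock i
    start2 = [0.0] * n    # when stage 2 pops macroblock i from the FIFO
    finish2 = [0.0] * n   # when stage 2 finishes macroblock i
    for i in range(n):
        start1 = finish1[i - 1] if i > 0 else 0.0
        if i >= fifo_size:
            # FIFO full: wait until macroblock i - fifo_size has been popped
            start1 = max(start1, start2[i - fifo_size])
        finish1[i] = start1 + t_parse[i]
        start2[i] = max(finish1[i], finish2[i - 1] if i > 0 else 0.0)
        finish2[i] = start2[i] + t_recon[i]
    return finish2[-1] if n else 0.0
```

With parse times [1, 1, 1, 5] and reconstruction times [5, 1, 1, 1], buffer sizes of 1, 2, and 3 give completion times of 13, 12, and 9 time units: a larger FIFO lets the parser run ahead while the first macroblock is slowly reconstructed.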
Human visual system based adaptive inter quantization
Block effect is one of the most annoying artifacts in digital video processing and is especially visible in low-bitrate applications such as mobile video. To alleviate this problem, we propose an adaptive quantization method for inter frames that reduces visible block effect in DCT-based video coding. In the proposed method, a set of quantization matrices is constructed before the video data are processed, by exploiting the temporal frequency limitations of the human visual system. The method adapts to motion information and selects an appropriate quantization matrix for each inter-coded block. Experimental results show that the proposed scheme achieves better subjective video quality than conventional flat quantization, especially in low-bitrate applications. Moreover, it introduces no extra computational cost in a software implementation. Because the method does not change the standard bitstream syntax, it can be applied directly to many DCT-based video codecs. Potential applications include mobile phones and other digital devices with low-bitrate requirements.
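A sketch of motion-adaptive matrix selection follows. The matrix set, the motion thresholds, and the weighting rule (weights growing with spatial frequency, more steeply for faster motion) are hypothetical stand-ins; the paper derives its matrices from the temporal frequency limits of the human visual system, which are not reproduced here.

```python
import math
from typing import List, Sequence

Matrix = List[List[int]]


def make_matrix(slope: int, size: int = 4) -> Matrix:
    """Hypothetical weighting: grows with spatial frequency (u + v); slope 0 is flat."""
    return [[16 + slope * (u + v) for v in range(size)] for u in range(size)]


# Hypothetical matrix set, built once before coding starts (flat -> steep).
MATRICES = [make_matrix(s) for s in (0, 2, 4, 8)]
# Hypothetical motion thresholds in pixels per frame separating the four matrices.
THRESHOLDS = (1.0, 4.0, 8.0)


def select_matrix(mv: Sequence[float]) -> Matrix:
    """Pick a steeper matrix for a faster-moving inter-coded block."""
    speed = math.hypot(mv[0], mv[1])
    index = sum(speed >= t for t in THRESHOLDS)  # number of thresholds exceeded
    return MATRICES[index]


def quantize(block: List[List[float]], matrix: Matrix, qstep: float) -> List[List[int]]:
    """Weighted quantization: effective step is qstep * weight / 16 per coefficient."""
    return [[int(round(c * 16 / (qstep * w))) for c, w in zip(row, wrow)]
            for row, wrow in zip(block, matrix)]
```

Only the encoder's choice of weights changes; whether a real codec can signal per-block matrices without altering bitstream syntax depends on the standard, so this sketch only illustrates the selection logic.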
Wyner-Ziv video coding based on a new hierarchical block matching algorithm
Rong Ke Liu, Hong Bo Zhao, Zhi Yue
Distributed video coding (DVC) is a new video coding paradigm that shifts complexity from the encoder side to the decoder side. One particular case of DVC, the Wyner-Ziv coding scheme, encodes each video frame separately and decodes the video sequence jointly with side information. This paper presents a new Wyner-Ziv video coding scheme based on a hierarchical block matching algorithm (HBMA). In the proposed scheme, the side information is greatly refined to assist the reconstruction of the Wyner-Ziv frames. Bidirectional and forward motion estimation are combined to generate an interpolated frame from the temporally adjacent key frames, yielding high-fidelity side information. During bidirectional motion estimation, the block size and the search area vary across the levels of the hierarchy. In addition, small blocks inherit motion vectors from large blocks by choosing the vector with the smallest mean absolute difference (MAD) among neighboring blocks. Preliminary experimental results show that the proposed scheme improves rate-distortion performance by 0.5-1 dB compared with existing Wyner-Ziv video coding, at the cost of slightly increased decoding complexity.
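The core of hierarchical block matching can be sketched as follows: large blocks are searched over a wide window, then each small sub-block starts from the best of the vectors inherited from its parent block and the parent's neighbors (scored by MAD) and refines it in a small window. Block sizes, search radii, and the neighbor set are assumptions, and the sketch covers only one-directional matching, not the paper's bidirectional interpolation of side information.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]


def mad(cur: Frame, ref: Frame, x: int, y: int, size: int, mv: MV) -> float:
    """Mean absolute difference between a block of cur and the displaced block of ref."""
    dx, dy = mv
    if not (0 <= x + dx and x + dx + size <= len(ref[0])
            and 0 <= y + dy and y + dy + size <= len(ref)):
        return float("inf")  # displaced block falls outside the reference frame
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total / (size * size)


def search(cur: Frame, ref: Frame, x: int, y: int, size: int, center: MV, radius: int) -> MV:
    """Full search in a (2*radius+1)^2 window around center; ties keep the earlier vector."""
    best, best_cost = center, mad(cur, ref, x, y, size, center)
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = mad(cur, ref, x, y, size, (dx, dy))
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def hierarchical_match(cur: Frame, ref: Frame, big: int = 8, small: int = 4,
                       big_radius: int = 4, small_radius: int = 1) -> Dict[Tuple[int, int], MV]:
    """Two-level HBMA sketch; partial blocks at the frame border are ignored."""
    h, w = len(cur), len(cur[0])
    # Level 1: large blocks, wide search window around zero motion.
    coarse = {(x, y): search(cur, ref, x, y, big, (0, 0), big_radius)
              for y in range(0, h - big + 1, big)
              for x in range(0, w - big + 1, big)}
    fine: Dict[Tuple[int, int], MV] = {}
    for (px, py) in coarse:
        neighbors = [(px, py), (px - big, py), (px + big, py), (px, py - big), (px, py + big)]
        candidates = [coarse[n] for n in neighbors if n in coarse]
        for y in range(py, py + big, small):
            for x in range(px, px + big, small):
                # Level 2: inherit the candidate with the smallest MAD, then refine locally.
                start = min(candidates, key=lambda mv: mad(cur, ref, x, y, small, mv))
                fine[(x, y)] = search(cur, ref, x, y, small, start, small_radius)
    return fine
```

The coarse level keeps the search cheap for large displacements, while the small refinement window at the fine level recovers local detail.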
Invited Paper I
New video applications on mobile communication devices
Video applications on mobile communication devices have usually been designed for content creation, access, and playback. For instance, many recent mobile devices replicate the functionality of portable video cameras, video recorders, and digital TV receivers. These are all demanding uses, but nothing new from the consumer point of view. However, many current devices have two built-in cameras: one for capturing high-resolution images, and the other for lower-resolution video telephony, typically VGA (640x480 pixels). We employ video to enable new applications and describe four actual solutions implemented on mobile communication devices. The first is a real-time motion-based user interface that can be used to browse large images or documents, such as maps, on small screens; the motion information is extracted from the image sequence captured by the camera. The second is a real-time panorama builder, and the third assembles document panoramas, both from individual video frames. The fourth is a real-time face and eye detector. It provides another foundation for motion-based user interfaces, since knowing whether human faces are present in the camera view, and how they move, can be a powerful application enabler.
Media Processing
Non-photorealistic rendering for energy conservation
As images are increasingly used in wireless communication, for example on mobile phones and PDAs, it is important to reduce the energy consumed in transmitting and receiving them. This energy is approximately proportional to the size (number of bytes) of the images. Many existing techniques aim to improve compression ratios while preserving image fidelity without perceivable differences. In this paper, we propose a new approach that allows visible distortion in the images. Our method eliminates or reduces fine details (textures) so that the new images have smaller file sizes and require less energy to transmit or receive. Even though the new images may look different, the essential information is preserved. Our experiment uses 400 images and achieves file-size reductions of up to 40.1%, with an average of 32.2%.
An image registration technique aimed at super resolution on mobile devices
Mihail Georgiev, Ilian Todorov, Atanas Boev, et al.
We propose an image registration technique to be implemented on mobile devices equipped with cameras. We address the limited computational power and low-quality optics of such devices and aim to design a registration algorithm that is fast, robust to noise, and able to correct optical distortions. We favor a feature-based approach consisting of feature extraction, feature filtering, feature matching, and transformation estimation. In our application, the transformation estimation is robust to local distortions and accurate enough to allow subsequent super-resolution on the registered images. The performance of the technique is demonstrated with a fixed-point implementation on the TMS320C5510 DSP.
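Transformation estimation, the last step of the pipeline, can be sketched as a least-squares affine fit over matched feature pairs, followed by one pass that discards matches with a large residual and refits. The affine model, the 2-pixel threshold, the helper names, and the floating-point arithmetic are assumptions; the paper's estimator runs in fixed point and may use a different model.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[List[float], List[float]]  # ([a, b, c], [d, e, f])


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        sol[r] = (m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))) / m[r][r]
    return sol


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least squares for x' = a*x + b*y + c and y' = d*x + e*y + f via normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atu), _solve3(ata, atv)


def warp(params: Affine, p: Point) -> Point:
    (a, b, c), (d, e, f) = params
    return a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f


def robust_fit(src: Sequence[Point], dst: Sequence[Point], max_residual: float = 2.0) -> Affine:
    """Fit, drop matches whose residual exceeds max_residual pixels, then refit."""
    params = fit_affine(src, dst)
    keep = [(s, t) for s, t in zip(src, dst) if math.dist(warp(params, s), t) <= max_residual]
    if len(keep) < 3:
        return params
    return fit_affine([s for s, _ in keep], [t for _, t in keep])
```

A single rejection pass is enough when outliers are few; with many bad matches, a sampling-based estimator such as RANSAC would be the usual replacement.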
Selective frame dropping based on hypothetical reference decoder buffer model for initial buffering delay reduction
We propose a method for selective frame dropping, based on the hypothetical reference decoder buffer model, to reduce initial buffering delay. Client-side buffering consists of two logical buffers: a de-jitter buffer and a pre-decoder buffer. To play back an encoded bit-stream without underflow, the client must perform a minimum amount of initial buffering. This minimum initial buffering is a property of the bit-stream and relates to the pre-decoder buffer. In addition, the client can perform extra initial buffering to handle network jitter and other bandwidth variations. Our approach reduces the minimum initial buffering delay for an already encoded bit-stream: we selectively drop frames to reduce the amount of initial buffering the client needs to avoid underflow during streaming. The method is especially applicable to pre-stored content and particularly useful for variable bit-rate (VBR) encoded media. It can be used by a streaming server or, alternatively, implemented in a trans-rater/transcoder. In a preferred embodiment, the method is applied in advance to a pre-stored bit-stream to decide which frames to drop to reduce the required minimum initial buffering.
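The minimum initial buffering can be computed directly under a simplified model: a constant-rate channel fills the pre-decoder buffer from time 0, and frame i is removed at delay + i/fps, so the delay must cover the largest value of (bits sent through frame i) / rate - i / fps. The greedy drop rule below (repeatedly drop the droppable frame that lowers this delay most) is an assumed strategy for illustration; the paper's selection criterion and the full HRD parameter set are not reproduced.

```python
from fractions import Fraction
from typing import AbstractSet, Sequence, Set


def min_initial_delay(sizes: Sequence[int], rate: int, fps: int,
                      dropped: AbstractSet[int] = frozenset()) -> Fraction:
    """Smallest start-up delay (seconds) that avoids pre-decoder buffer underflow.

    Assumes a constant channel rate in bits/s starting at t = 0; frame i leaves the
    buffer at delay + i/fps. Dropped frames are never sent but keep their slot.
    """
    delay = Fraction(0)
    sent = 0
    for i, size in enumerate(sizes):
        if i not in dropped:
            sent += size
        delay = max(delay, Fraction(sent, rate) - Fraction(i, fps))
    return delay


def choose_drops(sizes: Sequence[int], droppable: Sequence[bool], rate: int, fps: int,
                 target: Fraction) -> Set[int]:
    """Greedily drop droppable frames until the delay meets target or nothing helps."""
    dropped: Set[int] = set()
    current = min_initial_delay(sizes, rate, fps, dropped)
    while current > target:
        best, best_delay = None, current
        for i, ok in enumerate(droppable):
            if ok and i not in dropped:
                d = min_initial_delay(sizes, rate, fps, dropped | {i})
                if d < best_delay:
                    best, best_delay = i, d
        if best is None:
            break  # no single remaining drop lowers the delay
        dropped.add(best)
        current = best_delay
    return dropped
```

With frame sizes [400, 100, 100, 300, 100, 100] bits at 1000 bit/s and 10 frame/s, the large frame 3 sets the bound; dropping it lowers the required start-up delay from 0.6 s to 0.4 s. Marking only non-reference frames as droppable keeps the remaining stream decodable.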
Multimedia Content Protection
Digital watermarking in parametric slant transform domain
Jiong Xie, Sos Agaian, Joseph Noonan
In this paper, we introduce a novel signal-dependent parameterization of the Slant-Hadamard transform. A secure transform-domain watermarking scheme is proposed based on this new orthonormal transform. Watermarking performance is then analyzed by modeling the watermarking scheme as a spread-spectrum communication problem. We also propose an optimal parameter selection algorithm and demonstrate the robustness of the method through simulations that compare it with watermarking approaches in other orthogonal domains in the presence of lossy compression.
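Spread-spectrum embedding in an orthonormal transform domain can be sketched as adding alpha times a key-seeded +1/-1 pseudo-noise pattern to the transform coefficients and detecting it by correlation. This sketch substitutes the ordinary Walsh-Hadamard transform for the paper's parametric Slant-Hadamard transform, works on a 1-D signal for brevity, and the key handling and coefficient selection (all coefficients except DC) are assumptions.

```python
import math
import random
from typing import List, Sequence


def fwht(x: Sequence[float]) -> List[float]:
    """Orthonormal fast Walsh-Hadamard transform (its own inverse); len must be 2^k."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b  # butterfly
        h *= 2
    scale = 1 / math.sqrt(n)
    return [v * scale for v in y]


def pn_sequence(key: int, length: int) -> List[int]:
    """Key-seeded +1/-1 spreading sequence."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(signal: Sequence[float], key: int, alpha: float) -> List[float]:
    c = fwht(signal)
    w = pn_sequence(key, len(c) - 1)
    for k, chip in enumerate(w, start=1):  # leave the DC coefficient untouched
        c[k] += alpha * chip
    return fwht(c)  # inverse transform (the orthonormal WHT is self-inverse)


def detect(signal: Sequence[float], key: int) -> float:
    """Correlation statistic: mean of coefficient * chip over the marked band."""
    c = fwht(signal)
    w = pn_sequence(key, len(c) - 1)
    return sum(c[k] * chip for k, chip in enumerate(w, start=1)) / len(w)
```

Because the transform is orthonormal and self-inverse, the correlation of the marked signal exceeds that of the host by alpha (up to rounding); a practical detector compares the statistic with a threshold chosen from the host's coefficient variance.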
Systems for Multimedia
Context adaptive binary arithmetic decoding on transport triggered architecture
Joona Rouvinen, Pekka Jääskeläinen, Tero Rintaluoma, et al.
Video coding standards such as MPEG-4, H.264, and VC-1 define hybrid, transform-based, block motion-compensated techniques that employ almost the same coding tools. This observation has been a foundation for defining the MPEG Reconfigurable Multimedia Coding framework, which aims to facilitate multi-format codec design. The idea is to send a description of the codec with the bit stream and to reconfigure the coding tools accordingly on the fly. This kind of approach favors software solutions and is a substantial challenge for implementers of mobile multimedia devices that aim at high energy efficiency. In particular, as high-definition formats are about to be required of mobile multimedia devices, variable length decoders are becoming a serious bottleneck. Even at current moderate mobile video bitrates, software-based variable length decoders consume a major portion of the resources of a mobile processor. In this paper we present a programmable implementation, based on a Transport Triggered Architecture (TTA), of Context Adaptive Binary Arithmetic Coding (CABAC) decoding, which is used, e.g., in the main profile of H.264 and in JPEG2000. The solution can also be used for other variable length codes.
The Rosetta phone: a hand-held device for automatic translation of signs in natural images
When traveling in a region where the local language is not written in the Roman alphabet, translating written text (e.g., documents, road signs, or placards) is particularly difficult, since the text cannot easily be entered into a translation device or looked up in a dictionary. To address this problem, we are developing the "Rosetta Phone," a handheld device (e.g., a PDA or mobile telephone) capable of acquiring a picture of the text, identifying the text within the image, and producing both an audible and a visual English interpretation of the text. We started with English as a development language, for which we achieved close to 100% accuracy in identifying and reading text. We then modified the system to read and translate words written in the Arabic character set. We currently achieve approximately 95% accuracy in reading words from a small directory of town names.
Software-only implementation of DVB-H
Daniel Iancu, Hua Ye, John Glossner, et al.
In this paper, we present the system and software implementation of the Digital Video Broadcasting protocol for handheld applications (DVB-H) on Sandbridge Technology's multithreaded digital signal processor, the SB3011. The I and Q baseband analog output signals from the tuner are digitized, filtered, and further processed in conformance with ETSI EN 302 304 V1.1.1 (2004-06). All processing blocks, including receiver synchronization and forward error correction, are executed entirely in software. At 1.5 Mbps, processor usage is below 40%, with a maximum power consumption of 120 mW.
Energy efficiency analysis of multi-stream MPEG-4 decoder systems
Sébastien Lafond, Jani Boutellier, Johan Lilius, et al.
This paper compares two systems that can simultaneously decode multiple videos using a simple CPU and dedicated function-level hardware accelerators. The first system is implemented in the traditional way: the decoder instances access the accelerators concurrently without external coordination. The second system coordinates the tasks' accelerator accesses by scheduling. The solutions are compared in terms of execution cycles, energy consumption, and cache hit ratios. In the traditional solution, each decoder task continuously requests access to the hardware accelerators it needs. However, since the other tasks compete for the same resources, tasks must often yield and wait for their turn, which reduces energy efficiency. The scheduling-based approach assumes that accelerator latencies are deterministic and assigns time slots for the accelerator accesses required by each task. The accelerator access schedule is redesigned for each macroblock at run time, which avoids over-allocation of resources and improves energy efficiency. Deterministic accelerator latencies ensure that the CPU is not interrupted when an accelerator finishes. The contribution of this study is a comparison of this accelerator timing solution with the traditional approach.
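A greedy version of time-slot assignment under deterministic latencies is sketched below: each task's accelerator accesses run in order, an accelerator serves one task at a time, and the pending access that can start earliest is placed next. The list-scheduling rule, the cycle counts, and the accelerator names are assumptions for illustration; the paper's scheduler may differ.

```python
from typing import Dict, List, Optional, Tuple

Access = Tuple[str, int]          # (accelerator name, latency in cycles)
Slot = Tuple[int, str, int, int]  # (task id, accelerator, start cycle, end cycle)


def schedule(tasks: List[List[Access]]) -> List[Slot]:
    """Greedy slot assignment for one macroblock of every decoder task."""
    accel_free: Dict[str, int] = {}  # cycle at which each accelerator becomes free
    task_ready = [0] * len(tasks)    # cycle at which each task may issue its next access
    next_access = [0] * len(tasks)
    slots: List[Slot] = []
    remaining = sum(len(t) for t in tasks)
    while remaining:
        # Pick the pending access that can start earliest (ties -> lowest task id).
        best: Optional[Tuple[int, int, str, int]] = None
        for tid, accesses in enumerate(tasks):
            if next_access[tid] < len(accesses):
                acc, lat = accesses[next_access[tid]]
                start = max(task_ready[tid], accel_free.get(acc, 0))
                if best is None or start < best[0]:
                    best = (start, tid, acc, lat)
        start, tid, acc, lat = best
        end = start + lat
        slots.append((tid, acc, start, end))
        accel_free[acc] = end
        task_ready[tid] = end
        next_access[tid] += 1
        remaining -= 1
    return slots
```

Since every slot has a known end time, the CPU can collect results at those times instead of being interrupted when an accelerator finishes.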
A mobile video surveillance system with intelligent object analysis
Yuan-Kai Wang, Li-Ya Wang, Yung-Hsiang Hu
A mobile video surveillance system is a video surveillance system that uses mobile clients to display surveillance video over mobile networks. However, mobile networks and mobile clients have limited computational and network resources. The system combines moving object detection and video transcoding techniques to help users monitor remote sites through video streaming over 3G networks. Moving object detection and tracking skim off the useful video clips. Networking services, comprising video transcoding, short text messaging, and mobile video streaming, deliver surveillance information to mobile appliances. Moving objects are detected by background subtraction with adaptive Gaussian mixture modeling and tracked with a particle filter. A spatial-domain cascaded transcoder converts the filtered image sequence of detected objects into the 3GPP video streaming format. Experimental results show that the system successfully detects all moving-object events in a complex surveillance scene and that the transcoder achieves high PSNR.
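Background subtraction can be sketched with a simplified per-pixel model: a single adaptive Gaussian per pixel rather than the mixture used in the paper, with a pixel flagged as foreground when it lies more than k standard deviations from the mean. The learning rate, k, the initial variance, and the variance floor are hypothetical values; tracking and transcoding are not shown.

```python
from typing import List, Sequence


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (a simplification of an adaptive GMM)."""

    def __init__(self, first_frame: Sequence[Sequence[float]], alpha: float = 0.05,
                 k: float = 2.5, init_var: float = 25.0) -> None:
        self.alpha, self.k = alpha, k
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[init_var] * len(row) for row in first_frame]

    def apply(self, frame: Sequence[Sequence[float]]) -> List[List[bool]]:
        """Return a foreground mask and update the model where the pixel is background."""
        mask = []
        for y, row in enumerate(frame):
            mrow = []
            for x, value in enumerate(row):
                mu, var = self.mean[y][x], self.var[y][x]
                d = value - mu
                fg = d * d > self.k * self.k * var
                if not fg:  # adapt only where the pixel matches the background
                    mu += self.alpha * d
                    var = (1 - self.alpha) * var + self.alpha * d * d
                    self.mean[y][x], self.var[y][x] = mu, max(var, 1.0)  # hypothetical floor
                mrow.append(fg)
            mask.append(mrow)
        return mask
```

Not updating foreground pixels keeps a stationary object from being absorbed too quickly; a mixture model additionally handles multi-modal backgrounds such as swaying trees.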
HCI Issues on Multimedia I
Performance analysis of visual tracking algorithms for motion-based user interfaces on mobile devices
Stefan Winkler, Karthik Rangaswamy, Jefry Tedjokusumo, et al.
Determining the self-motion of a camera is useful for many applications. A number of visual motion-tracking algorithms have been developed to date, each with its own advantages and restrictions. Some of them have also found their way into the mobile world, powering augmented reality applications on phones with built-in cameras. In this paper, we compare the performance of three feature- or landmark-guided motion tracking algorithms: marker-based tracking with MXRToolkit, face tracking based on CamShift, and MonoSLAM. We analyze and compare the complexity, accuracy, sensitivity, robustness, and restrictions of each method. Our performance tests are conducted in two stages: the first uses video sequences created with simulated camera movements along the six degrees of freedom to compare tracking accuracy, while the second analyzes the robustness of the algorithms against manipulative factors such as image scaling and frame skipping.
Profiles of the evaluators: impact of psychographic variables on the consumer-oriented quality assessment of mobile television
In the product development of services, it is important to adjust mobile video quality to the quality requirements of potential users. Careful participant selection is therefore very important. However, the literature often describes participant selection with little detail. This is also reflected in the handling of experimental results, where the impact of psychographic factors on quality is rarely reported. As user attributes can have a large effect on the results, we investigated the role of various psychographic variables in the subjective evaluation of audiovisual video quality in two different experiments. The variables studied were age, gender, education, professionalism, television consumption, experience with different digital video qualities, and attitude towards technology. The results showed that quality evaluations were affected by almost all background factors. The most significant variables were age, professionalism, knowledge of digital quality features, and attitude towards technology. Knowledge of these factors can be exploited in careful participant selection, which in turn increases the validity of the results, as the subjective evaluations better reflect the requirements of potential users.
An infrastructure to manage errors and originalities in mobile multimedia development
Daniel Oltmanns, Henrik Hörning, Reidar Hörning, et al.
This paper describes a portal that communicates with mobile devices to gather device data. The idea is to transmit features, characteristics, and errors to a middleware layer, in our case the Bugfinder portal. This has two advantages. First, the information can be looked up with a normal web browser. Second, the information can be made available where development takes place; Eclipse is therefore one target that should be supported, through a Bugfinder Eclipse plug-in. In this way the developer can get information and instant code hints during development. We present a first approach to this infrastructure, which includes a ready-to-use web framework (ajajajava.org).
HCI Issues on Multimedia II
Personalized summarization using user preference for m-learning
Sihyoung Lee, Seungji Yang, Yong Man Ro, et al.
As Internet and multimedia technologies advance, digital multimedia content is becoming abundant in the learning domain. To facilitate access to digital knowledge and to meet the need for lifelong learning, e-learning can be a helpful alternative to conventional learning paradigms. E-learning is a unifying term for online, web-based, and technology-delivered learning. Mobile learning (m-learning) is defined as e-learning through mobile devices using wireless transmission. In a survey, more than half of the respondents remarked that re-consumption was one of the convenient features of e-learning. However, it is not easy to find a user's preferred segments within a full version of lengthy e-learning content. In m-learning especially, a content summarization method is strongly required because mobile devices have limited processing power and battery capacity. In this paper, we propose a new user preference model for re-consumption, which we use to construct personalized summaries. The user preference for re-consumption is modeled statistically from user actions. Based on this model of personalized user actions, our method distinguishes the preferred parts of the entire content. Experimental results demonstrate successful personalized summarization.
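One simple way to turn logged user actions into per-segment preference scores and a personalized summary is sketched below. The action names, their weights, and the greedy selection under a time budget are hypothetical; the abstract does not specify the paper's statistical model.

```python
from typing import List, Tuple

# Hypothetical weights: replays and bookmarks signal interest, skips signal disinterest.
ACTION_WEIGHTS = {"replay": 2.0, "bookmark": 3.0, "pause": 0.5, "skip": -1.5}


def preference_scores(log: List[Tuple[int, str]], n_segments: int) -> List[float]:
    """Sum the weights of the actions logged on each content segment."""
    scores = [0.0] * n_segments
    for seg, action in log:
        scores[seg] += ACTION_WEIGHTS.get(action, 0.0)
    return scores


def summarize(scores: List[float], durations: List[float], budget: float) -> List[int]:
    """Greedily pick the best positively scored segments that fit the time budget,
    returned in playback order."""
    chosen, used = [], 0.0
    for seg in sorted(range(len(scores)), key=lambda s: scores[s], reverse=True):
        if scores[seg] > 0 and used + durations[seg] <= budget:
            chosen.append(seg)
            used += durations[seg]
    return sorted(chosen)
```

Keeping the summary in playback order preserves the lecture's narrative even though segments are selected by score.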
Poster Session
Video contents authoring system for efficient consumption on portable multimedia device
Hyun-Seok Min, Sung Ho Jin, Young Bok Lee, et al.
In a mobile consumption environment, users want not only to preview video content through highlights, but also to consume the attractive segments of a video rather than the whole video. A condensed representation that captures both the whole video content and the video structure is therefore needed. In this paper, we propose a video content authoring system that allows content authors to filter the video structure and to compose content and metadata efficiently and effectively. The proposed authoring system consists of two modules: a video analyzer and a metadata generator. The video analyzer detects shot boundaries and scenes and establishes temporal segmentation metadata, including shot and scene boundary information. Shot detection uses adaptive thresholding with multiple windows of different sizes to segment the raw video into shots. The segmented shots are grouped and merged according to the similarity between adjacent shots. To minimize the time spent on shot clustering, we use a span, defined as an aggregation of successive shots, as the unit of computation. The metadata generator allows authors to edit the video metadata in addition to the temporal segmentation metadata detected by the video analyzer. The video metadata supports hierarchical representation of individual shots and scenes.
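Shot boundary detection with adaptive thresholds can be sketched as follows: compute the difference between consecutive frame histograms and declare a cut where the difference exceeds mean + k * std of its neighbors in every one of several windows. The window sizes, k, the absolute floor, and the use of L1 histogram differences are assumptions; shot grouping and span-based clustering are not shown.

```python
import statistics
from typing import List, Sequence


def hist_diffs(hists: Sequence[Sequence[int]]) -> List[float]:
    """L1 distance between consecutive normalized histograms; diffs[i] compares frames i-1 and i."""
    out = [0.0]
    for prev, cur in zip(hists, hists[1:]):
        n_prev, n_cur = sum(prev), sum(cur)
        out.append(sum(abs(a / n_prev - b / n_cur) for a, b in zip(prev, cur)))
    return out


def detect_cuts(diffs: Sequence[float], windows: Sequence[int] = (5, 11),
                k: float = 3.0, floor: float = 0.1) -> List[int]:
    """Frame indices where a cut starts; the difference must beat every window's threshold."""
    cuts = []
    for i in range(1, len(diffs)):
        is_cut = diffs[i] > floor  # hypothetical absolute floor against tiny fluctuations
        for w in windows:
            half = w // 2
            ctx = [diffs[j] for j in range(max(1, i - half), min(len(diffs), i + half + 1))
                   if j != i]
            if len(ctx) >= 2:
                threshold = statistics.mean(ctx) + k * statistics.pstdev(ctx)
                is_cut = is_cut and diffs[i] > threshold
        if is_cut:
            cuts.append(i)
    return cuts
```

The short window adapts to local activity such as fast motion, while the long window guards against flagging a run of moderately large differences.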
Reducing the overheads of hardware acceleration through datapath integration
Pekka Jääskeläinen, Heikki Kultala, Teemu Pitkänen, et al.
Hardware accelerators are used to speed up the execution of specific tasks such as video coding. Often the purpose of hardware acceleration is to be able to use a cheaper or, for example, more energy-efficient processor to execute the majority of the application in software. However, hardware acceleration introduces new overheads, mainly due to transferring data to and from the accelerator and signaling the completion of the accelerator computation to the processor. We find the traditional mechanisms suboptimal for fine-grain hardware acceleration, especially when energy efficiency is important. This paper explores a technique, unique to Transport Triggered Architectures, for interfacing with hardware accelerators. The proposed technique places hardware accelerators in the processor data path, making them visible to the programmer as regular function units. This reduces communication costs, as data can be transferred directly to the accelerator from other processor data path components, and synchronization can be done by polling a simple ready flag in the accelerator function unit. Additionally, this setup enables the compiler's instruction scheduler to schedule the hardware accelerator like any other operation, thus partially hiding its latency behind other program operations. The paper presents a case study with an audio decoder application in which fine-grain and coarse-grain hardware accelerators are integrated into the processor data path as function units. The case is used to study several synchronization, communication, and latency-hiding techniques enabled by this kind of setup.
Flexible management of shared resources on multiprocessor system on chip
Antti Rasmus, Ari Kulmala, Erno Salminen, et al.
This paper presents a new method for run-time management of shared processing resources in multiprocessor systems on chip. A centralized resource manager (RM) unit dynamically allocates shared processing resources according to the system state and given constraints. It implements hardware mutual exclusion, so no inter-processor synchronization is required for accessing the resources. It also supports dynamic power management. In addition, a hardware implementation of the resource manager is proposed. In a case study, the resource manager is evaluated in a data-parallel MPEG-4 video encoder on a multiprocessor system on chip implemented on an FPGA. The RM eases the design of six different architectures featuring two to twelve shared hardware accelerators. Only a few accelerators are required for the best performance, as the accesses are efficiently scheduled.
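The resource manager's behavior can be modeled in software: one central allocator grants accelerators of a requested kind, queues processors when none is free, hands a released accelerator directly to the next waiting processor, and reports idle units that a power manager could switch off. This single-threaded model, its naming scheme, and its first-come-first-served waiting policy are assumptions; the paper's manager is a hardware unit.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ResourceManager:
    """Hypothetical software model of a centralized accelerator allocator."""

    def __init__(self, accelerators: Dict[str, int]) -> None:
        # e.g. {"me": 2} creates accelerators "me0" and "me1" of kind "me"
        self.kind_of: Dict[str, str] = {}
        self.free: Dict[str, List[str]] = {}
        for kind, count in accelerators.items():
            names = [f"{kind}{i}" for i in range(count)]
            self.free[kind] = names
            self.kind_of.update({name: kind for name in names})
        self.waiting: Dict[str, Deque[int]] = {k: deque() for k in accelerators}
        self.owner: Dict[str, int] = {}

    def request(self, cpu: int, kind: str) -> Optional[str]:
        """Grant a free accelerator of this kind, or queue the processor and return None."""
        if self.free[kind]:
            acc = self.free[kind].pop(0)
            self.owner[acc] = cpu
            return acc
        self.waiting[kind].append(cpu)
        return None

    def release(self, acc: str) -> Optional[Tuple[int, str]]:
        """Free acc; if a processor is waiting for that kind, hand acc over directly."""
        del self.owner[acc]
        kind = self.kind_of[acc]
        if self.waiting[kind]:
            cpu = self.waiting[kind].popleft()
            self.owner[acc] = cpu
            return cpu, acc
        self.free[kind].append(acc)
        return None

    def idle(self) -> List[str]:
        """Accelerators a dynamic power manager could clock-gate."""
        return sorted(a for names in self.free.values() for a in names)
```

Because all requests go through one allocator, processors never synchronize with each other directly; in a multi-threaded software port, that allocator would itself need a lock.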
Automatic Bluetooth testing for mobile multi-user applications
Dennis Luck, Henrik Hörning, Stefan Edlich
In this paper we present a simple approach for the development of multiuser and multimedia applications based on Bluetooth. One main obstacle to Bluetooth synchronization of mobile applications is that devices lack complete implementations of the specification. Nowadays such applications must reach the market as fast as possible, so developers must be able to test dozens of mobile devices for their Bluetooth capabilities. Surprisingly, these capabilities differ not only between Bluetooth specifications 1.0 and 2.0. The current work was triggered by the development of mass-market applications such as mobile multiuser games (e.g., Tetris). Our application can be distributed to several mobile phones. Once started, the Bluetooth applications try to connect to each other and automatically start detecting device capabilities. These capabilities are gathered and sent to a server, which performs statistical analyses and aggregates the results into a report. The result is faster development of mobile communication applications.