A novel multiple description scalable coding scheme for mobile wireless video transmission
Author(s):
Haifeng Zheng;
Lun Yu;
Chang Wen Chen
In this paper we propose a new multiple description scalable coding (MDSC) scheme based on the in-band motion compensated temporal filtering (IBMCTF) technique, in order to achieve high video coding performance and robust video transmission. The input video sequence is first split into equal-sized groups of frames (GOFs). Within a GOF, each frame is hierarchically decomposed by the discrete wavelet transform. Since after wavelet decomposition there is a direct relationship between wavelet coefficients and what they represent in the image content, we are able to reorganize the spatial orientation trees to generate multiple bit-streams. We have shown that multiple bit-stream transmission is very effective in combating error propagation in both Internet video streaming and mobile wireless video. We adopt IBMCTF in our MDSC scheme to take full advantage of its good performance and flexible scalability, particularly at low spatial resolution. To achieve high coding performance for each bit-stream, we apply in-band motion estimation and motion compensated temporal filtering using the over-complete DWT of each frame. Furthermore, we adopt bi-directional motion estimation between different bit-streams so as to facilitate error concealment at the decoder and guarantee robust video transmission over error-prone channels. Unlike traditional multiple description schemes, the integration of these techniques enables us to generate more than two bit-streams, which may be more appropriate for multiple-antenna transmission of compressed video. The integration also provides a flexible tradeoff between coding efficiency and error resilience. Preliminary results on several standard video sequences confirm the high coding performance and flexible scalability of the proposed scheme.
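The reorganization of spatial orientation trees into multiple bit-streams is specific to the proposed coder, but the basic mechanics of multiple description coding -- partitioning subband coefficients into independently decodable sets that can be reassembled from whatever arrives -- can be sketched as follows. The polyphase column split and the function names are illustrative assumptions, not the paper's actual tree partitioning:

```python
import numpy as np

def split_descriptions(coeffs, n=2):
    # Split a subband coefficient array into n descriptions by taking
    # every n-th column. Illustrative only: the paper instead reorganizes
    # spatial orientation trees, grouping coefficients across subbands.
    return [coeffs[:, k::n] for k in range(n)]

def merge_descriptions(parts):
    # Reassemble the full subband when all descriptions are received.
    n = len(parts)
    height = parts[0].shape[0]
    width = sum(p.shape[1] for p in parts)
    out = np.empty((height, width), dtype=parts[0].dtype)
    for k, p in enumerate(parts):
        out[:, k::n] = p
    return out

coeffs = np.arange(16.0).reshape(4, 4)   # toy subband
d0, d1 = split_descriptions(coeffs, 2)   # two descriptions
assert np.array_equal(merge_descriptions([d0, d1]), coeffs)
```

If one description is lost, the surviving one still covers the whole frame at reduced resolution, which is the error-resilience property that multiple description coding trades some coding efficiency for.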
Simplified three-dimensional discrete cosine transform-based video codec
Author(s):
Jari J. Koivusaari;
Jarmo H. Takala
In this paper, a simplified three-dimensional discrete cosine transform (3D DCT) based video codec is proposed. The computational complexity of the baseline 3D DCT based video codec is reduced by simplifying the transformation block. In video sequences with low motion activity, consecutive images are highly correlated in the temporal dimension, so the DCT does not usually produce significant coefficient values at the higher temporal frequencies. Therefore, for some of the cubes we can use a simple averaging operation followed by a 2D DCT instead of the full 3D DCT. Furthermore, some of the resulting cubes can be combined to achieve a more efficient binary representation.
Based on our results, the simplifications considerably improved the compression efficiency of the 3D DCT based codec for video sequences with low motion activity, while the compression efficiency for sequences with high motion activity was maintained. At the same time, the coding speed of the simplified 3D DCT based video codec increased compared to the original. Although the compression efficiency of the H.263 video codec was not reached, the 3D DCT based encoder ran many times faster than H.263.
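The shortcut described above can be checked numerically: for a cube of nearly identical frames, the temporal DCT leaves almost no energy outside the lowest temporal frequency, and with orthonormal scaling the 2D DCT of the frame average equals that lowest plane up to a factor of sqrt(N). A minimal sketch using SciPy's DCT routines (the 8x8x8 cube is an assumed size, not necessarily the codec's):

```python
import numpy as np
from scipy.fft import dct, dctn

# An 8x8x8 cube from a low-motion scene: eight nearly identical frames.
rng = np.random.default_rng(0)
base = rng.random((8, 8))
cube = np.stack([base + 0.001 * rng.random((8, 8)) for _ in range(8)])

# Full 3D DCT (temporal axis first).
full = dctn(cube, norm='ortho')

# Shortcut: average the frames temporally, then apply only a 2D DCT.
approx = dctn(cube.mean(axis=0), norm='ortho')

# Higher temporal frequencies carry almost no energy...
temporal = dct(cube, axis=0, norm='ortho')
assert np.square(temporal[1:]).sum() < 1e-3 * np.square(temporal[0]).sum()

# ...so the cheap 2D result matches the lowest temporal plane of the
# full 3D DCT up to the orthonormal scale factor sqrt(8).
assert np.allclose(full[0], np.sqrt(8) * approx)
```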
Visualization of medical images over mobile wireless handheld devices
Author(s):
Min Wu;
Khurana Ashish;
Nadar Mariappan;
Chang Wen Chen
With recent advances in wireless communication and personal mobile handheld devices, the emerging technology of medical visualization on mobile handhelds is expected to provide valuable services for physicians, especially in image-based diagnosis. In this paper, we present an easy-to-use medical visualization system implemented on a mobile handheld device over WLAN. The system gives physicians a convenient way to interactively access image data from their Pocket PCs without being restricted to a fixed location. System architecture, technical problems and solutions are discussed. Because of the huge gap in processing power between the image server and the Pocket PC client, the client is used only to display images and to edit visualization parameters interactively; most rendering tasks are moved to the server. Since wireless bandwidth is limited, we adopt a simple image compression scheme to achieve the best trade-off between computational complexity and compression efficiency. A Reed-Solomon (RS) code is employed for forward error coding over the UDP socket connection. Experimental results show the system to be practical for clinical diagnosis: the frame rate for lossless transmission of 256x256 24-bit color images reaches 5 fps over 802.11b. We believe the system will provide a valuable service for physicians.
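The forward-error-coding idea -- adding redundant packets so that lost UDP datagrams can be rebuilt without retransmission -- can be illustrated with a single XOR parity packet. This is a deliberate simplification: the system above uses a Reed-Solomon code, which can correct multiple losses per group, whereas XOR parity recovers exactly one:

```python
def add_parity(packets):
    # Append one XOR parity packet to a group of equal-length packets.
    parity = bytearray(len(packets[0]))
    for p in packets:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return packets + [bytes(parity)]

def recover(group, lost_index):
    # Rebuild the single missing packet by XOR-ing all the others.
    out = bytearray(len(group[0]))
    for j, p in enumerate(group):
        if j == lost_index:
            continue
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)

group = add_parity([b'abc', b'def', b'ghi'])
assert recover(group, 1) == b'def'   # packet 1 lost, then recovered
```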
Video frame rate conversion for mobile devices
Author(s):
Wei Pien Lee;
Harm J. W. Belt;
Erik van der Tol
In recent years, the growing demand for innovative mobile devices has been a major driving force behind mobile multimedia applications. Essential for further growth is a wide range of multimedia content available through the networks, sent either by mobile devices or by network servers. Limitations on network bandwidth and storage capacity have a large impact on video quality.
One important quality issue concerns the low video frame rate, typically between 5 and 15 frames per second. Such low frame rates lead to undesired artefacts like motion judder or blur. A typical refresh rate of a mobile display is 60 Hz. Currently a frame memory is used as a buffer between the software application and the display to deal with the frequency mismatch, which amounts to frame repetition.
In this paper, frame rate conversion is considered to improve motion portrayal on mobile devices. Two motion estimators are discussed along with simplifications to comply with limited platform resources, and with refinements for acceptable quality. Two frame interpolators and their relationships with the motion estimators are treated. The proposed frame rate converter enhances the quality of portrayed motion significantly, while required platform resources are kept minimal by exploiting typical properties of mobile displays.
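A minimal sketch of the two stages -- motion estimation followed by motion-compensated interpolation of a halfway frame -- is given below for the simplest case of one global motion vector per frame pair. Actual converters, including the one treated here, estimate per-block vectors; the exhaustive SAD search and function names are illustrative:

```python
import numpy as np

def estimate_global_motion(prev, curr, search=4):
    # Exhaustive search for the shift of prev that best matches curr,
    # using the sum of absolute differences (SAD) as the criterion.
    best_sad, best_v = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            sad = np.abs(np.roll(prev, (dy, dx), axis=(0, 1)) - curr).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_v = sad, (dy, dx)
    return best_v

def interpolate_midframe(prev, curr, v):
    # Motion-compensated halfway frame: shift both frames half the
    # motion vector towards each other and average them.
    dy, dx = v
    a = np.roll(prev, (dy // 2, dx // 2), axis=(0, 1))
    b = np.roll(curr, (-(dy // 2), -(dx // 2)), axis=(0, 1))
    return (a.astype(float) + b) / 2

prev = np.zeros((16, 16)); prev[4:8, 4:8] = 1.0
curr = np.roll(prev, (0, 2), axis=(0, 1))      # object moved 2 px right
v = estimate_global_motion(prev, curr)
mid = interpolate_midframe(prev, curr, v)
assert v == (0, 2)
assert np.array_equal(mid, np.roll(prev, (0, 1), axis=(0, 1)))
```

The interpolated frame places the object halfway along its trajectory, which is exactly what removes the judder that plain frame repetition produces.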
Motion estimation algorithms based on complex halfband filters for OMAP platform
Author(s):
Chavdar D. Kalchev;
Atanas R. Boev;
Atanas P. Gotchev;
Karen O. Egiazarian;
Jaakko T. Astola
Motion estimation (ME) is the most time consuming part in contemporary video compression algorithms and standards. In recent years, certain transform domain "phase-correlation" ME algorithms based on Complex-valued Wavelet Transforms have been developed to achieve lower complexity than the previous approaches.
In the present paper we describe an implementation of the basic phase-correlation ME techniques on a fixed-point dual-core processor architecture such as TI's OMAP. We aim at low computational complexity and algorithm stability without sacrificing estimation accuracy.
The first stage of our ME algorithm is a multiscale complex-valued transform based on all-pass filters. We have developed wave digital filter (WDF) structures to ensure better performance and higher robustness in fixed-point arithmetic environments. For higher efficiency the structures utilize some of the dedicated filtering instructions present in the 'C5510 DSP part of the dual-core processor.
The calculation of motion vectors is performed using maximum phase-correlation criteria. Minimum subband squared difference is estimated for every subband level of the decomposition. To minimize the number of real-time computations we have adapted this algorithm to the functionality of the hardware extensions present in the 'C5510.
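As a floating-point reference for the criterion described above, the basic phase-correlation step can be written in a few lines of NumPy: normalize the cross-power spectrum of two frames so that only phase remains, and locate the resulting impulse. The paper's contribution lies in mapping such computations onto the fixed-point 'C5510; this sketch shows only the principle:

```python
import numpy as np

def phase_correlation_shift(a, b):
    # Normalized cross-power spectrum: magnitudes cancel, leaving only
    # the phase difference, whose inverse FFT is an impulse at the shift.
    F, G = np.fft.fft2(a), np.fft.fft2(b)
    cross = np.conj(F) * G
    r = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(r), r.shape)
    # Map peak coordinates to signed shifts.
    h, w = r.shape
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)

rng = np.random.default_rng(1)
a = rng.random((32, 32))
b = np.roll(a, (3, 5), axis=(0, 1))   # frame shifted by (3, 5)
assert phase_correlation_shift(a, b) == (3, 5)
```

For a pure circular shift the correlation surface is an exact impulse; real frames with noise and occlusion blur the peak, which is one motivation for evaluating the criterion per subband of a multiscale decomposition.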
We consider our approach quite promising for realizing video coding standards on mobile devices, as many of them utilize fixed-point DSP architectures.
MPEG-4-based 2D facial animation for mobile devices
Author(s):
Thomas B. Riegel
The enormous spread of mobile computing devices (e.g. PDAs, cellular phones, palmtops, etc.) emphasizes scalable applications, since users like to run their favorite programs on whatever terminal they operate at the moment. Applications that can be adapted to the hardware at hand without losing much of their functionality are therefore of interest. A good example of this is "Facial Animation," which offers an interesting way to achieve such scalability. By employing MPEG-4, which provides a dedicated profile for facial animation, a solution for low-power terminals including mobile phones is demonstrated. From the generic 3D MPEG-4 face a specific 2D head model is derived, which consists primarily of a portrait image superposed by a suitable warping mesh and adapted 2D animation rules. Thus the MPEG-4 animation process need not be changed, and standard-compliant facial animation parameters can be used to displace the vertices of the mesh and warp the underlying image accordingly.
Fusion strategies for speech and handwriting modalities in HCI
Author(s):
Claus Vielhauer;
Sascha Schimke;
Valsamakis Thanassis;
Yannis Stylianou
In this paper we present a strategy for handling multimodal signals from pen-based mobile devices for Human-Computer Interaction (HCI), focusing on the modalities of spoken and handwritten input. Each modality by itself is quite well understood, as the exhaustive literature demonstrates, although a number of challenges remain, such as improving recognition results. Among the potentials of multimodal HCI are improvements in recognition and robustness as well as seamless man-machine communication, based on fusing different modalities and exploiting the redundancies among them. However, such fusion of modalities still poses problems; open problems today include design approaches for fusion strategies. With the increasing number of mobile and pen-based computers, techniques for fusing handwriting and speech in particular appear to have great potential, yet few publications address it. In this work we introduce a conceptual approach based on a model describing a bimodal HCI process. We analyze four exemplary applications with respect to the structure of this model, highlight the open problems within these applications, and outline possible solutions to these challenges. Such a fusion model for HCI may simplify the development of seamless and intuitive user interfaces on pen-based mobile devices. For one of our application scenarios, a bimodal system for form data recording and recognition in medical or financial environments, we present first experimental results.
Mike’s Conker: a collaborative nonlinear knowledge elicitation repository for mobile HCI practitioners
Author(s):
Dean Mohamedally;
Stefan Edlich;
Panayiotis Zaphiris;
Helen Petrie
In the field of Human Computer Interaction (HCI), we use a variety of Knowledge Elicitation (KE) techniques to capture user cognitive issues, e.g. via interviews, paper prototyping, card sorting, focus group debates and more. MIKE (Mobile Interactive Knowledge Elicitation) is an ongoing research direction to enhance the KE capabilities of HCI practitioners via mobile and electronic methods. The MIKE tools are a suite of mobile HCI software and hardware configurations for a variety of mobile platforms. With MIKE's CONKER we describe a Collaborative Non-linear Knowledge Elicitation Repository for HCI practitioners. Its intention is to provide a scalable infrastructure for supporting the management and collaborative retrieval of mobile-based KE datasets. Its functional design requirements include managing HCI practitioner profiles, tracking the experimental progress of dispersed mobile HCI teams, timetabling expenditures for time-critical empirical capture and participant management, and enabling concurrent HCI specialists to compare elicited mobile data. Further expansion of the CONKER system will incorporate distributed psychometric analysis methods. CONKER is realized as a SourceForge-style web portal using state-of-the-art web framework technologies. We describe several approaches to the capture and management of HCI data and how CONKER makes this data available to the HCI community.
A 3D character animation engine for multimodal interaction on mobile devices
Author(s):
Enrico Sandali;
Fabio Lavagetto;
Paolo Pisano
Talking virtual characters are graphical simulations of real or imaginary persons that enable natural and pleasant multimodal interaction with the user by means of voice, eye gaze, facial expression and gestures. This paper presents an implementation of a 3D virtual character animation and rendering engine, compliant with the MPEG-4 standard, running on Symbian-based smartphones. Real-time animation of virtual characters on mobile devices is a challenging task, since many limitations must be taken into account with respect to processing power, graphics capabilities, disk space and execution memory size. The proposed optimization techniques make it possible to overcome these issues, guaranteeing smooth and synchronous animation of facial expressions and lip movements on mobile phones such as Sony Ericsson's P800 and Nokia's 6600. The animation engine is specifically targeted at the development of new "Over The Air" services based on embodied conversational agents, with applications in entertainment (interactive story tellers), navigation aid (virtual guides to web sites and mobile services), news casting (virtual newscasters) and education (interactive virtual teachers).
Proxy-assisted multicasting of video streams over mobile wireless networks
Author(s):
Maggie Nguyen;
Layla Pezeshkmehr;
Melody Moh
This work addresses the challenge of providing seamless multimedia services to mobile users by proposing a proxy-assisted multicast architecture for the delivery of video streams. We propose a hybrid system of streaming proxies, interconnected by an application-layer multicast tree, where each proxy acts as a cluster head streaming content to its stationary and mobile users. The architecture is based on our previously proposed Enhanced-NICE protocol, which uses an application-layer multicast tree to deliver layered video streams to multiple heterogeneous receivers. Our study targets the placement of streaming proxies to enable efficient delivery of live and on-demand video, supporting both stationary and mobile users. The simulation results, obtained with the J-SIM simulator, are evaluated and compared against two baseline scenarios: one with a centralized proxy serving the entire population, and one with mini-proxies, each serving its local users. The results show that even though proxies in the hybrid scenario experienced a slightly longer delay, they had the lowest drop rate of video content. This finding illustrates the significance of task sharing among multiple proxies: the resulting load balancing delivers better video quality to a larger audience.
Efficient content production for a university multimedia learning environment
Author(s):
Ali Kurtze;
Ronald Grau;
Reiner Creutzburg
The use of multimedia learning environments in educational institutions is often associated with the need for uncomplicated content production and, moreover, the problem of motivating users to maintain the system and its content. A recently developed prototype, 'Instant Seminar,' in use at the University of Applied Sciences Brandenburg in Germany, is an approach to creating an easy-to-use, low-cost learning management system (LMS) for multimedia learning environments. The system could suit universities that wish to extend their teaching with an electronic learning environment but are not yet prepared for a commercial LMS implementation, mostly for financial or organizational reasons. 'Instant Seminar' follows the didactic concept of 'Hybrid Learning Arrangements,' and corresponding features for communication and personal tutoring were part of the design. An important part of the prototype is a production tool called 'Lecture Wizard,' essentially a Web-based application running on Apache and MySQL servers. It employs Windows Media Services to allow production, modification and broadcasting of AV streams alongside the other content of created lectures. Produced streams may have different bandwidths, and lectures can be distributed within local and wide area networks. Recently, 'Instant Seminar' has been further developed to serve mobile devices.
Streaming TPEG contents in the MPEG-4 system over DMB network
Author(s):
Hyun Jung;
Kyung-Ae Cha;
Jeongyoen Lim;
Munchurl Kim
Telematics, a compound of 'telecommunications' and 'informatics,' denotes information services that provide traffic, public transport and emergency information to drivers via car navigation or other interactive communication systems. In particular, as DAB (Digital Audio Broadcasting) and DMB (Digital Multimedia Broadcasting) technologies are introduced and commercialized, telematics is rapidly converging with various broadcasting and communication services.
In this paper, we show how a telematics service can be realized as a DMB application that brings multimedia services to mobile devices. To this end, we generate multimedia content including TPEG (Transport Protocol Experts Group) content, which carries road and traffic information. TPEG is an expert group that defines a byte-oriented protocol for broadcasting transport information, including Road Traffic Messages, Public Transport Information and location information that enables safe and efficient driving. In Europe, TPEG content has been delivered over the DAB network, which supports audio-only broadcasting. We investigate techniques to deliver multimedia content together with TPEG content over the DMB network, so that we can provide telematics information as well as multimedia content.
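A 'byte-oriented protocol' simply means that messages are defined as fixed binary field layouts rather than text. The sketch below packs a hypothetical road-traffic message into bytes; the field layout, sizes and scaling are invented for illustration and are not the real TPEG wire format:

```python
import struct

# Hypothetical layout: message id (u16), event code (u8), and a WGS84
# position as two signed 32-bit integers in microdegrees -- big-endian.
FMT = '>HBii'

def pack_message(msg_id, event, lat_deg, lon_deg):
    return struct.pack(FMT, msg_id, event,
                       int(round(lat_deg * 1e6)), int(round(lon_deg * 1e6)))

def unpack_message(data):
    msg_id, event, lat, lon = struct.unpack(FMT, data)
    return msg_id, event, lat / 1e6, lon / 1e6

wire = pack_message(42, 7, 52.5200, 13.4050)   # an event near Berlin
assert len(wire) == 11                          # fixed-size binary record
assert unpack_message(wire) == (42, 7, 52.52, 13.405)
```

Fixed, compact records like this are what make such messages cheap to broadcast in a DAB/DMB data channel.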
Integration of MPEG-4 streaming server with remote control webcam in a GPRS network
Author(s):
Oliver Thiede;
Uwe Dummann;
Reiner Creutzburg
The aim of this paper is to describe the conception, implementation and integration of a streaming server for the GSM/GPRS test network of the SIEMENS Wireless Modules Research & Development (WMRD) division. We describe a client-server application for the broadcast of MPEG-4 live streams, with the possibility of controlling the camera's position and direction through a web interface at the client. The implementation and testing in the GPRS network are described.
Design of a multimedia gateway for mobile devices
Author(s):
Raf Hens;
Nico Goeminne;
Sofie Van Hoecke;
Tom Verdickt;
Thomas Bouve;
Frank Gielen;
Piet Demeester
Although mobile users are currently offered many more capabilities on their mobile devices, they still experience limitations. They can surf the Internet, read their e-mail and receive MMS messages, but they have limited processing power, storage capacity and bandwidth, and limited access to peripherals (e.g. printers). We have designed and implemented a multimedia gateway for mobile devices that reduces these limitations. It gives mobile devices transparent access to high-capacity devices connected to the gateway, which is built around a central, modularly extensible server that can run on any PC or home gateway. The server manages two sets of modules: one set offering the actual services, and another set handling the IP-based wireless interaction with the client applications on the mobile devices. These modules can be added and removed dynamically, offering new services on the fly. Currently services for storage, printing, domotics and playing music are provided; others can easily be added later. This paper discusses the architecture and development, the management of modules, and the actual services and their benefits. Besides a proprietary implementation, it also examines OSGi and how the two platforms compare with respect to design, architecture, ease of development and functionality.
A multimedia session-aware QoS provisioning scheme for cellular networks
Author(s):
Mona El-Kadi Rizvi;
Stephan Olariu
Multimedia applications often involve a set of cooperating streams that together form a multimedia session. We propose a novel local QoS provisioning scheme for cellular networks that is aware of the relationships between the streams composing a session. As a rule, existing schemes either allow the component streams to compete with one another for resources, or provide QoS to the session as an atomic entity, leaving to the application the task of managing QoS for the individual streams. Our new MUltimedia SessIon-aware Cellular (MUSIC) QoS provisioning scheme manages the QoS of the individual streams in a session and, with knowledge of their relationships, prevents competition between them. Further, by allowing application-specified prioritization among the streams in a session, the MUSIC scheme achieves a significant performance improvement over session-unaware schemes.
Aorta: a management layer for mobile peer-to-peer massive multiplayer games
Author(s):
Stefan Edlich;
Henrik Hoerning;
Andreas Brunnert;
Reidar Hoerning
The development of Massive Multiplayer Games (MMPGs) for personal computers builds on a wide range of frameworks and technologies; MMPG development for cell phones, in contrast, lacks framework support. We present Aorta, a multi-purpose lightweight MIDP 2.0 framework that supports transparent, uniform API usage for peer-to-peer communication via HTTP, IP and Bluetooth. We gained particular experience with the Bluetooth support through load tests on Nokia 6600s, using a server-as-client architecture to create ad-hoc networks based on piconet functionality. In addition, scatternet functionality with more than 12 cell phones, which upcoming devices will support, has been tested in a simulated environment. The core of the Aorta framework is the Etherlobby, which manages connections, peers, the game lobby, game policies and much more. The framework itself was developed to enable fast development of mobile games, regardless of the distance between players, be it across the schoolyard or far away. The earliest market-ready application shown here is a multimedia game for cell phones utilizing all of the framework's features. This game, called Micromonster, serves as a platform for developer tests as well as providing valuable information about interface usability and user acceptance.
Grid-based interaction for effective image browsing on mobile devices
Author(s):
Rene U. Rosenbaum;
Heidrun Schumann
Compared with stationary environments, mobile devices suffer from a number of limitations, such as small screen space and limited processing power and bandwidth. Thus it is difficult and expensive to browse large images on current mobile hardware. This publication introduces a new image browsing technique especially designed for mobile devices with limited screen space, along with a completely new concept for communicating important image properties based on a well-defined grid structure. As every browsing technique needs sound concepts for user interaction, this publication introduces intuitive ways for image exploration that require only little user action during browsing and little processing power to calculate an appropriate image representation. To reduce the need for expensive bandwidth in remote environments, we also show how to combine this browsing technique with image compression and transmission; thus, a whole system for image communication is presented. Due to its excellent compression performance and flexibility, the modern JPEG2000 image coding standard is adopted as the foundation of the proposed system for compliant compression and efficient transmission of the image. Concrete performance measures show the applicability of the introduced system.
An OSGi compatible implementation of a Java resource monitor
Author(s):
Bruno Van Den Bossche;
Koen Van Boxstael;
Nico Goeminne;
Frank Gielen;
Piet M.A. Demeester
The OSGi Service Platform is a good choice for developing component-based, self-adapting and self-configuring software for embedded and mobile devices. Self-adapting software can download, install and run new components on the fly, changing itself to provide the best QoS. When the device faces a resource shortage (processing time, memory, bandwidth...), it switches to a less demanding component. Detecting and identifying such resource bottlenecks is usually a nontrivial operation in a J2ME based environment. The most straightforward solution is to bypass the Java Virtual Machine and invoke native code using JNI. However, it is not desirable for every developer to write his own native code and consequently lose the platform-independent properties of the Java platform. This paper focuses on developing a hardware resource monitor component that eliminates the need for native C code and active polling. This component can be plugged into the OSGi framework like any other component and provides a developer-friendly, generic and extensible API for monitoring hardware resources. The software is notified when relevant changes are detected, allowing the development of platform-independent adaptive software bundles that can be automatically deployed on a wide range of mobile and embedded devices.
Visual object recognition for mobile tourist information systems
Author(s):
Lucas Paletta;
Gerald Fritz;
Christin Seifert;
Patrick Luley;
Alexander Almer
We describe a mobile vision system that is capable of automated object identification using images captured with a PDA or camera phone. We present a solution for the enabling technology of outdoor vision-based object recognition that will extend state-of-the-art location- and context-aware services towards object-based awareness in urban environments. In the proposed application scenario, tourist pedestrians are equipped with GPS, W-LAN and a camera attached to a PDA or camera phone. They are interested in whether their field of view contains tourist sights that would point to more detailed information. Multimedia data about related history, architecture, or other cultural context of historic or artistic relevance might be explored by a mobile user who intends to learn within the urban environment. Learning from ambient cues is in this way achieved by pointing the device towards an urban sight, capturing an image, and consequently getting information about the object on site and within the focus of attention, i.e., the user's current field of view.
Evaluating a mobile location-based multimodal game for first-year students
Author(s):
Palle Klante;
Jens Kroesche;
Susanne C. J. Boll
We developed an exciting location-based mobile game that lets first-year students explore the university campus in the fashion of a paper chase. The players try to find virtual, geo-referenced riddles located at different interesting spots. When a player approaches such a spot, the corresponding multimedia riddle is displayed and the player tries to solve it. To support the player's orientation and navigation, we developed a multimodal user interface. On the one hand, the checkpoints and the current position are shown on a geo-referenced graphical map. Alternatively, a weakly intrusive auditory display tells the player, by sounds of different loudness, whether he or she is walking in the right or wrong direction on the way to the next checkpoint. We conducted a first usability evaluation with five teams of two players each. The course of the game, the interaction of the players with each other and with additional persons, was observed and recorded; the players also answered a short questionnaire. The results are very promising: playing the game was fun, the players quickly got used to the game idea, and the multimodal user interface of the mobile device was easily understood. The auditory support was considered helpful and a good complement to the graphical visualisation.
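The paper does not specify how loudness is derived from the player's direction, so the mapping below is purely an assumed example of how such an auditory display could work: the angular error between walking direction and the bearing to the next checkpoint is turned into a 0..1 volume via a cosine ramp:

```python
import math

def volume_for_bearing(heading_deg, bearing_to_target_deg):
    # Angular error folded into [0, 180] degrees, then mapped to volume:
    # 1.0 when walking straight at the checkpoint, 0.0 when walking away.
    err = abs((bearing_to_target_deg - heading_deg + 180) % 360 - 180)
    return 0.5 * (1 + math.cos(math.radians(err)))

assert volume_for_bearing(0, 0) == 1.0              # on course: full volume
assert volume_for_bearing(90, 270) < 1e-9           # opposite direction: silent
assert abs(volume_for_bearing(0, 90) - 0.5) < 1e-9  # perpendicular: half
```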
Personal video retrieval and browsing for mobile users
Author(s):
Anna Sachinopoulou;
Satu-Marja Makela;
Sari Jarvinen;
Utz Westermann;
Johannes Peltola;
Paavo Pietarila
The latest camera-equipped mobile terminals and fast cellular networks have increased the interest in mobile multimedia services. The growing amount of stored digital information presents new challenges for video retrieval and browsing. Mobile multimedia services can be accessed with a wide variety of mobile terminals, so content delivery through mobile networks and its presentation on terminals with limited capabilities requires special attention. This paper presents a system for the creation, storage and retrieval of home videos. The server uses a distributed database, an adaptive user interface and adaptive video transmission techniques to serve diverse mobile terminals while minimizing the required data transfer. Manually and automatically created metadata about the video contents is also stored in the database and can be queried intelligently by means of a domain ontology. The user can browse the search results and choose a video to view in a Metaplayer, in which in-video browsing is facilitated by a metadata display. This helps the user find the interesting points in the video without streaming unnecessary parts, thus reducing network load. State-of-the-art terminal profiling techniques are used to adapt the user interfaces and the presentation of search results according to the user's preferences and device capabilities.
The construction and integration of XML editor into mobile browser
Author(s):
Marko M. Palviainen;
Timo Laakko
As in the wired Internet today, consumers will in the future produce content on mobile devices. We expect this content to be produced in browser-like applications: mobile browsers present content, and embedded editor components enable users to edit it. Editor components can be adapted to various kinds of devices and contexts of use, and can be installed at runtime. This paper introduces a new generic framework for component-based XML editors (called FEdXML) to be integrated into mobile browsers. FEdXML is a lightweight, scalable and portable framework offering interfaces for the core parts of an XML editor, usable in various (e.g., mobile) environments and with various object-oriented languages (e.g., Java and C++). Thus, instead of designing the whole editor for the target language, effort can be focused on designing the language-specific components (e.g., the XML model, view, and controller). In addition, strong reuse of existing, well-tested components improves the quality of editors. The reference Java implementations of the core parts of FEdXML form a base for new XML editors. We have used FEdXML in J2SE and Java MIDP environments.
Effect of TV content in subjective assessment of video quality on mobile devices
Author(s):
Satu Hannele Jumisko;
Ville Petteri Ilvonen;
Kaisa Anneli Vaananen-Vainio-Mattila
Selection of test materials in subjective assessment methodology recommendations is based mainly on technical parameters: materials should test the ability of the codec to cope with spatial and temporal redundancy. However, consumers watch TV for a reason -- one of the main criteria is interesting content. In this study we examined whether content recognition and subjects’ personal interests have an effect on quality assessment. We also studied subjective assessment criteria for video quality. The study was done using small-resolution, low bit rate video on mobile phones in a laboratory environment. Altogether 135 subjects, aged 18-65 years, participated in the tests. Each test started with a subjective assessment of video quality using well-known TV content. Afterwards a survey was done to measure content recognition and level of interest in the content. The test session ended with a qualitative interview about evaluation criteria. Our studies showed a connection between interest in the content and the quality score given to TV content. We therefore raise a concern about content selection and recommend measuring the evaluator’s interest in the content in subjective assessment studies. The study of subjective evaluation criteria revealed that subjects pay attention to content and to quality impairments, especially in regions of interest.
Content-based image retrieval on mobile devices
Author(s):
Iftikhar Ahmad;
Shafaq Abdullah;
Serkan Kiranyaz;
Moncef Gabbouj
Show Abstract
Content-based image retrieval possesses tremendous potential for exploration and utilization, for researchers and industry alike, due to its promising results. Expeditious retrieval of desired images requires indexing of the content of large-scale databases along with extraction of low-level features based on the content of the images they contain. With the advancement of wireless communication technology and the availability of multimedia-capable phones, it has become vital to query image databases and retrieve results based on the content of the query. Our implemented system, “Mobile MUVIS”, based on the contemporary MUVIS framework, aims to bring content-based query capability to any device supporting the Java platform. It consists of a lightweight client application running on a Java-enabled phone and a server containing a servlet running inside a web server. The server responds to an image query using efficient native code operating on the selected MUVIS database. The client application, running on the mobile phone, forms a query request that the servlet parses to find the closest matches to the query image. The query response is retrieved over a GPRS/HSCSD network and the images are displayed on the mobile phone. We conclude that such a system is feasible, but with limited results due to resource constraints on hand-held devices and the reduced network bandwidth available in mobile environments.
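The client/servlet exchange described above can be sketched as a simple serialize-and-parse round trip. This is not the published Mobile MUVIS protocol (the abstract does not specify one); the field names and encoding are assumptions for illustration only:

```python
# Illustrative sketch of a phone client forming a query request and the
# servlet side parsing it back out. Field names ("db", "img", "n") are
# hypothetical, not part of MUVIS.
from urllib.parse import urlencode, parse_qs

def build_query(db: str, image_id: str, max_results: int) -> str:
    """Client side: serialize the query sent over the GPRS/HSCSD link."""
    return urlencode({"db": db, "img": image_id, "n": max_results})

def parse_query(request: str) -> dict:
    """Servlet side: recover the query parameters before running the search."""
    q = parse_qs(request)
    return {"db": q["db"][0], "img": q["img"][0], "n": int(q["n"][0])}

req = build_query("corel", "beach_042", 12)
print(parse_query(req))   # → {'db': 'corel', 'img': 'beach_042', 'n': 12}
```

Keeping the request a flat key-value string suits the constrained MIDP client, which only needs string concatenation to form it.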
A face authentication system for mobile devices: optimization techniques
Author(s):
Kwok Ho Pun;
Yiu Sang Moon;
Jian Sheng Chen;
Hoi Wo Yeung
Show Abstract
This paper presents an experimental study of the implementation of a face authentication system for mobile devices. Our system is based on a widely adopted face recognition technique, Principal Component Analysis (PCA). The execution time of the baseline system on a PDA is unacceptably slow -- a typical authentication session takes more than 30 seconds. To make real-time face authentication possible on mobile devices, optimization is needed. In our study, extensive profiling was done to pinpoint the execution hotspots in the system. Based on the profiling results, our optimization strategy focuses on replacing the large amount of slow floating-point calculations with fixed-point versions. Range estimation is also carried out to determine the range of floating-point values that must be accommodated by the final, fixed-point version of our system. Experimental results indicate that, compared with the baseline system, our optimized system runs as much as 47 times faster for the PCA projection. Using the optimized system, a complete authentication session takes only 5 seconds. Real-time face authentication on mobile devices is thus achieved with no significant loss in recognition accuracy.
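The core of the optimization -- recasting the PCA projection (a dot product of the mean-subtracted face vector with each eigenface) in fixed-point arithmetic -- can be sketched as follows. The Q16.16 format and helper names are our illustration, not the paper's actual implementation, and the paper's range estimation step is what would justify the chosen Q-format:

```python
# Minimal sketch of a fixed-point PCA projection, assuming a Q16.16 format.
Q = 16                      # fractional bits (Q16.16)
ONE = 1 << Q

def to_fixed(x: float) -> int:
    """Quantize a float to Q16.16 (range estimation must guarantee it fits)."""
    return int(round(x * ONE))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the shift restores the fixed-point scale."""
    return (a * b) >> Q

def project_fixed(face, eigenface, mean) -> float:
    """One PCA projection coefficient computed entirely in integer arithmetic."""
    acc = 0
    for f, e, m in zip(face, eigenface, mean):
        acc += fixed_mul(to_fixed(f - m), to_fixed(e))
    return acc / ONE        # back to float only for display

face      = [0.6, 0.2, 0.9, 0.4]
mean      = [0.5, 0.5, 0.5, 0.5]
eigenface = [0.5, -0.5, 0.5, -0.5]

fx = project_fixed(face, eigenface, mean)
fl = sum((f - m) * e for f, m, e in zip(face, mean, eigenface))
print(round(fx, 4), round(fl, 4))   # → 0.45 0.45
```

On a DSP or PDA without a floating-point unit, the integer multiply-accumulate in the loop is what yields the large speedup the authors report; the trade-off is the small quantization error visible only beyond the rounded digits.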
General motion model: a general model for moving objects
Author(s):
Huanzhuo Ye;
Tao Shang;
Jing Ye;
Jianping Pan
Show Abstract
Modeling moving objects is the basis of managing information about them. The common practice in modeling moving objects is to regard them as points that occupy different positions in space at different times, regardless of their structure, size, color, etc. The General Motion Model (GMM) proposed here is suitable for representing both 2D and 3D moving objects. It combines the sampling method and the function method, and encapsulates all the data, including parameters and sampled data, together with the operations on them, in an object-oriented manner. GMM mixes discrete and continuous processing of motion data, and offers multiple functions to fulfill the needs of motion-data level of detail (LOD), thus giving users the option to balance processing speed against precision. Nonlinear interpolation and extrapolation can be applied to GMM as well as linear interpolation and extrapolation. GMM also gives users the flexibility of choosing the dynamic attributes of moving objects they are interested in from a dynamic attribute set; these attributes need not be represented explicitly, incrementally, and separately as in other models. Experiments show that GMM is easy to implement and easy to operate. With proper indexing of the motion data, GMM is also efficient in spatiotemporal data queries.
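The combination of sampling and function methods can be illustrated with a toy example: positions are stored as discrete time-stamped samples, and a query at an arbitrary time applies an interpolating function between the bracketing samples. This sketch (our own, not the authors' code) uses linear interpolation; a nonlinear variant would only change the blend formula:

```python
# Toy stand-in for a GMM object: a sampled 2D trajectory plus a continuous
# position() query obtained by linear interpolation between samples.
from bisect import bisect_left

class MovingPoint:
    def __init__(self, samples):
        # samples: list of (t, x, y) tuples; kept sorted by time
        self.samples = sorted(samples)

    def position(self, t):
        ts = [s[0] for s in self.samples]
        i = bisect_left(ts, t)
        if i == 0:
            return self.samples[0][1:]           # before first sample
        if i == len(ts):
            return self.samples[-1][1:]          # after last (could extrapolate)
        (t0, x0, y0), (t1, x1, y1) = self.samples[i - 1], self.samples[i]
        a = (t - t0) / (t1 - t0)                 # linear blend factor
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

p = MovingPoint([(0, 0.0, 0.0), (10, 10.0, 5.0)])
print(p.position(5))   # → (5.0, 2.5)
```

The LOD idea maps onto this structure naturally: a coarser LOD simply keeps fewer samples and leans more heavily on the interpolating function.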
Evaluation and analysis of Terrestrial-DMB system based on Eureka-147
Author(s):
Byungjun Bae;
Sammo Cho;
Joungil Yun;
Young Kwon Hahm;
Soo In Lee
Show Abstract
Since the Eureka-147 DAB (Digital Audio Broadcasting) system was announced in the mid-1990s, many kinds of applications have been introduced in countries around the world as well as in Europe. T-DMB (Terrestrial Digital Multimedia Broadcasting), developed in Korea, is one of the applications that have emerged from the Eureka-147 DAB system. The T-DMB system provides mobile multimedia broadcasting services, including moving pictures as well as CD-quality digital audio, in the VHF band. This paper introduces the Korean T-DMB standard based on Eureka-147 DAB and, to verify it, proposes the architecture of a new T-DMB transmission system that uses a new device called the Ensemble Remultiplexer. Although the proposed T-DMB transmission system is an improved version of the conventional DAB system, it still utilizes existing DAB devices in many parts. Therefore, it provides a solution with high flexibility and low cost for mobile multimedia broadcasting services.
We also verified the Korean T-DMB standard and evaluated the proposed T-DMB transmission system by driving at high speed through a city surrounded by tall buildings. The T-DMB signal was transmitted from the broadcasting station site on Kwanak Mountain in Korea. From the results of this experiment, we could confirm that T-DMB maintains good picture quality without any of the problems seen in analog broadcasting. T-DMB is expected to provide mobile multimedia broadcasting services to users from the middle of 2005 and will evolve to focus on more intelligent and interactive services in Korea.
Feasibility study of a real-time operating system for a multichannel MPEG-4 encoder
Author(s):
Olli Lehtoranta;
Timo D. Hamalainen
Show Abstract
The feasibility of the DSP/BIOS real-time operating system for a multi-channel MPEG-4 encoder is studied. The performance of two MPEG-4 encoder implementations, with and without the operating system, is compared in terms of encoding frame rate and memory requirements. The effects of task-switching frequency and the number of parallel video channels on the encoding frame rate are measured. The research is carried out on a 200 MHz TMS320C6201 fixed-point DSP using the QCIF (176x144 pixels) video format. Compared to a traditional DSP implementation without an operating system, inclusion of DSP/BIOS reduces total system throughput by only 1 QCIF frame/s. The operating system has a 6 KB data memory overhead and a program memory requirement of 15.7 KB. Hence, the overhead is considered low enough for resource-critical mobile video applications.
Data hiding for error concealment of H.264/AVC video transmission over mobile networks
Author(s):
Alessandro Piva;
Roberto Caldelli;
Francesco Filippini
Show Abstract
In this paper, a new data-hiding-based error concealment algorithm is proposed. The method increases video quality in H.264/AVC wireless video transmission and real-time applications, where retransmission is unacceptable. Data hiding is used to carry to the decoder the values of 6 inner pixels of every macroblock (MB), which are used to reconstruct lost MBs in Intra frames through a bilinear interpolation process. The side information concerning a slice is hidden in another slice of the same frame by properly modifying some quantized AC coefficients of the Integer Transform of the sixteen 4x4 blocks composing the MBs of the host slice. At the decoder, the embedded information can be recovered from the bit-stream and used in the bilinear interpolation to reconstruct the damaged slice. This method, while keeping the system fully compliant with the standard, improves on the conventional error concealment methods adopted by H.264/AVC in terms of visual quality and Y-PSNR. In particular, it improves the interpolation process adopted by H.264/AVC by reducing the distance between interpolating pixels from 16 to 5.
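The bilinear interpolation underlying such concealment weights each known pixel in inverse proportion to its distance from the lost pixel. The sketch below shows the standard weighted form (the exact pixel layout used by the authors is not given in the abstract, so the distances are illustrative):

```python
def conceal_pixel(p_left, p_right, p_top, p_bottom,
                  d_left, d_right, d_top, d_bottom):
    """Weighted bilinear interpolation for spatial error concealment:
    each known pixel contributes in inverse proportion to its distance
    from the lost pixel, so the weight of the left pixel is the distance
    to the right pixel, and so on."""
    num = (p_left * d_right + p_right * d_left +
           p_top * d_bottom + p_bottom * d_top)
    den = d_left + d_right + d_top + d_bottom
    return round(num / den)

# With only MB-boundary pixels available, interpolating pixels in a 16x16 MB
# can be 16 samples apart; the hidden inner pixels shrink the span to ~5,
# so the distances (and thus the weights) are based on much shorter gaps.
print(conceal_pixel(100, 120, 90, 130, d_left=2, d_right=3, d_top=2, d_bottom=3))
# → 107
```

Shorter interpolation spans mean each reconstructed pixel is predicted from nearer neighbors, which is why the hidden inner pixels improve concealment quality over boundary-only interpolation.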
Scene cut detection in three-dimensional discrete cosine transform-based video codec
Author(s):
Jari J. Koivusaari;
Jarmo H. Takala
Show Abstract
This paper describes how a scene cut detector can be utilized in a video codec based on the three-dimensional discrete cosine transform (3D DCT). In the 3D DCT-based video codec, data is processed in 8x8x8 cubes; hence a set of 8 images needs to be available in memory at a time. A change of video scene may occur between any of the images stored in the memory. A rapid scene change within an 8x8x8 cube produces significant high-frequency coefficients in the temporal dimension of the DCT domain. If these important high-frequency coefficients are discarded, information from the two scenes is mixed around the scene cut position, causing ghost artifacts in the reconstructed video sequence. Therefore, an approach to handle each of the eight possible scene change positions within an 8x8x8 cube is proposed. The proposed method includes the utilization of the 8x8x4 DCT, forced-fill, repeat-previous-frame, and average-to-previous-frame techniques. By utilizing a scene cut detector in the 3D DCT-based video codec, unnecessary quality drops can be avoided without reducing the compression ratio. Notable quality improvements can be achieved for images around a scene cut position.
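The effect the abstract describes -- a cut inside the 8-frame cube concentrating energy in the temporal high frequencies -- can be demonstrated on a single pixel's 8-sample temporal trajectory. This toy sketch (our own, with DCT scaling factors omitted since only relative energy matters) is not the paper's detector:

```python
import math

def dct_1d(x):
    """Unnormalized 8-point DCT-II, the transform applied along the
    temporal axis of each 8x8x8 cube."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]

def high_freq_energy(temporal_samples):
    """Energy in the non-DC temporal coefficients; a scene change inside
    the cube concentrates energy here."""
    c = dct_1d(temporal_samples)
    return sum(v * v for v in c[1:])

steady = [100] * 8             # same scene across all 8 frames
cut    = [100] * 4 + [20] * 4  # scene change between frames 4 and 5
print(high_freq_energy(steady) < 1e-6, high_freq_energy(cut) > 1000)
# → True True
```

A constant trajectory puts all its energy in the DC coefficient, while the step change spreads large values across the AC coefficients; thresholding this AC energy per cube is one plausible way a detector could flag the eight possible cut positions.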