Proceedings Volume 5242

Internet Multimedia Management Systems IV

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 26 November 2003
Contents: 7 Sessions, 32 Papers, 0 Presentations
Conference: ITCom 2003
Volume Number: 5242

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions
  • Video Skimming and Summarization
  • Music and Audio Processing
  • Universal Multimedia Access
  • Content-Based Retrieval
  • Applications
  • QoS Support and Resource Management
  • Poster Session
Video Skimming and Summarization
Video summarization: methods and landscape
The ability to summarize and abstract information will be an essential part of intelligent behavior in consumer devices. Various summarization methods have been the topic of intensive research in the content-based video analysis community. Summarization in traditional information retrieval is a well-understood problem, but while there has been a lot of research in the multimedia community, there is no agreed-upon terminology and classification of the problems in this domain. Although the problem has been researched from different aspects, there is usually no distinction between the various dimensions of summarization. The goal of the paper is to provide basic definitions of widely used terms such as skimming, summarization, and highlighting. The different levels of summarization (local, global, and meta-level) are made explicit. We distinguish among the dimensions of task, content, and method, and provide an extensive classification model for them. We map the existing summary extraction approaches in the literature into this model and classify the aspects of the proposed systems. In addition, we outline the evaluation methods and provide a brief survey. Finally, we propose future research directions based on the gaps we identified by analyzing existing systems in the literature.
Hierarchical video summarization based on context clustering
A personalized video summary is dynamically generated in our video personalization and summarization system based on user preference and usage environment. The three-tier personalization system adopts the server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. Besides finding the desired contents, the objective is to present a coherent summary. There are diverse methods for creating summaries, and we focus on the challenges of generating a hierarchical video summary based on context information. In our summarization algorithm, three inputs are used to generate the hierarchical video summary output. These inputs are (1) MPEG-7 metadata descriptions of the contents in the server, (2) user preference and usage environment declarations from the user client, and (3) context information including MPEG-7 controlled term list and classification scheme. In a video sequence, descriptions and relevance scores are assigned to each shot. Based on these shot descriptions, context clustering is performed to collect consecutively similar shots to correspond to hierarchical scene representations. The context clustering is based on the available context information, and may be derived from domain knowledge or rules engines. Finally, the selection of structured video segments to generate the hierarchical summary efficiently balances between scene representation and shot selection.
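As a rough illustration of the context-clustering step described in this abstract, the following Python sketch groups consecutive shots whose feature vectors are similar. The shot features, the cosine measure, and the threshold are illustrative assumptions, not the system's actual implementation.

# Minimal sketch of grouping consecutive shots into scene-level clusters.
# Shot feature vectors, the cosine measure, and the threshold are
# illustrative assumptions, not the system described in the paper.
import numpy as np

def cluster_consecutive_shots(shot_features, threshold=0.8):
    """Merge consecutive shots whose feature vectors are similar enough."""
    clusters = [[0]]
    for i in range(1, len(shot_features)):
        a, b = shot_features[clusters[-1][-1]], shot_features[i]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if sim >= threshold:
            clusters[-1].append(i)       # same context: extend current cluster
        else:
            clusters.append([i])         # context change: start a new cluster
    return clusters

# Example: six shots with three-dimensional annotation scores.
shots = np.array([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
                  [0, 0.95, 0.1], [0, 0, 1], [0.1, 0, 0.9]])
print(cluster_consecutive_shots(shots))   # e.g. [[0, 1], [2, 3], [4, 5]]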
An extended framework for adaptive playback-based video summarization
In our previous work, we described an adaptive fast playback framework for video summarization where we changed the playback rate using the motion activity feature so as to maintain a constant “pace.” This method provides an effective way of skimming through video, especially when the motion is not too complex and the background is mostly still, such as in surveillance video. In this paper, we present an extended summarization framework that, in addition to motion activity, uses semantic cues such as face or skin color appearance, speech and music detection, or other domain dependent semantically significant events to control the playback rate. The semantic features we use are computationally inexpensive and can be computed in compressed domain, yet are robust, reliable, and have a wide range of applicability across different content types. The presented framework also allows for adaptive summaries based on preference, for example, to include more dramatic vs. action elements, or vice versa. The user can switch at any time between the skimming and the normal playback modes. The continuity of the video is preserved, and complete omission of segments that may be important to the user is avoided by using adaptive fast playback instead of skipping over long segments. The rule-set and the input parameters can be further modified to fit a certain domain or application. Our framework can be used by itself, or as a subsequent presentation stage for a summary produced by any other summarization technique that relies on generating a sub-set of the content.
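To make the idea of pace-constant adaptive playback concrete, here is a minimal Python sketch in which low motion activity speeds playback up and semantic cues slow it back down. The feature names, weights, and rate bounds are assumptions for illustration, not the paper's actual rule set.

# Illustrative sketch of adapting the playback rate so that "pace" stays
# roughly constant: low motion activity allows fast playback, while semantic
# cues (faces, speech, music) pull the rate back toward normal.
def playback_rate(motion_activity, semantic_score,
                  target_pace=2.0, max_rate=8.0):
    """Return a speed-up factor for the current segment."""
    # Keep pace ~= motion_activity * rate near target_pace (before clipping).
    raw = max_rate if motion_activity <= 0 else target_pace / motion_activity
    rate = min(max_rate, max(1.0, raw))
    # Semantically important segments are slowed back toward normal playback.
    rate = max(1.0, rate * (1.0 - semantic_score))
    return rate

for motion, semantic in [(0.25, 0.0), (0.25, 0.8), (4.0, 0.0)]:
    print(motion, semantic, round(playback_rate(motion, semantic), 2))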
Scene-based scalable video summarization
Ying Li, C. C. Jay Kuo, Daniel Tretter
A scalable video summarization and navigation system is proposed in this work. Particularly, given the desired number of keyframes for a video sequence, we first distribute it among underlying video scenes and sinks based on their respective importance ranks. Then, we select the most important shot of each sink as its R-shot and further assign each sink's designated number of keyframes to its R-shot. Finally, a time-constrained keyframe extraction scheme is developed to locate all keyframes. Consequently, we can achieve a scalable video summary from the initial keyframe set by exploiting such a video structure-based ranking scheme. In addition, a content navigation tool is also developed which could help users freely access or locate specific video scenes or shots. Sophisticated user studies have shown that this summarization and navigation system can not only help users quickly browse video content, but also assist them in searching for particular video segments.
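The scene-level keyframe distribution can be pictured with a small sketch that splits a global keyframe budget among scenes (or sinks) proportionally to importance scores; the scores and the rounding rule are assumed for illustration and are not taken from the paper.

# Rough sketch of distributing a global keyframe budget across scenes in
# proportion to importance scores, as in a structure-based ranking scheme.
def allocate_keyframes(total_keyframes, importance):
    """Split total_keyframes among scenes proportionally to importance."""
    weight_sum = sum(importance)
    quotas = [total_keyframes * w / weight_sum for w in importance]
    counts = [int(q) for q in quotas]
    # Hand the remaining keyframes to the scenes with the largest remainders.
    remainders = sorted(range(len(quotas)),
                        key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in remainders[:total_keyframes - sum(counts)]:
        counts[i] += 1
    return counts

print(allocate_keyframes(10, [0.5, 0.3, 0.1, 0.1]))  # e.g. [5, 3, 1, 1]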
Automatic movie skimming with general tempo analysis
In this research, story units are extracted by general tempo analysis, including the tempos of both audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units called story units is still a challenging problem. By focusing on a certain type of video, such as sports or news, we can explore models with specific application domain knowledge. For movie content, many heuristic rules based on audiovisual clues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.
Music and Audio Processing
Audio fingerprint extraction for content identification
In this work, we present an audio content identification system that identifies unknown audio material by comparing its fingerprint with those extracted off-line and saved in a music database. We describe in detail the procedure to extract audio fingerprints and demonstrate that they are robust to noise and content-preserving manipulations. The main feature in the proposed system is the zero-crossing rate extracted with an octave-band filter bank. The zero-crossing rate can be used to describe the dominant frequency in each subband at a very low computational cost. The audio fingerprint is small and can be efficiently stored along with the compressed files in the database. It is also robust to many modifications such as tempo change and time-alignment distortion. In addition, the octave-band filter bank is used to enhance robustness to distortion, especially distortions localized in certain frequency regions.
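A highly simplified sketch of the per-band zero-crossing-rate idea follows; the octave bands are approximated by FFT masking, and the band edges, frame size, and lack of quantization are illustrative assumptions rather than the paper's exact filter bank.

import numpy as np

def octave_band_zcr(signal, sr, n_bands=5, frame=2048):
    """Per-frame zero-crossing rates in a handful of octave-like bands."""
    edges = [sr / 2 / (2 ** k) for k in range(n_bands, 0, -1)] + [sr / 2]
    fingerprints = []
    for start in range(0, len(signal) - frame, frame):
        spectrum = np.fft.rfft(signal[start:start + frame])
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        frame_features = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            masked = np.where((freqs >= lo) & (freqs < hi), spectrum, 0)
            band = np.fft.irfft(masked, n=frame)
            # Fraction of adjacent sample pairs whose signs differ.
            frame_features.append(float(np.mean(band[:-1] * band[1:] < 0)))
        fingerprints.append(frame_features)
    return np.array(fingerprints)

sr = 8000
t = np.arange(sr) / sr
test = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1760 * t)
print(octave_band_zcr(test, sr).shape)    # (frames, bands)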
Audio thumbnailing using MPEG-7 low-level audio descriptors
In this paper we present an audio thumbnailing technique based on audio segmentation by similarity search. The segmentation is performed on MPEG-7 low-level audio feature descriptors, a growing source of multimedia metadata. Especially for database applications or audio-on-demand services this technique could be very helpful, because there is no need to have access to the possibly copyright-protected original audio material. The result of the similarity search is a matrix containing off-diagonal stripes that represent similar regions, which are usually the refrains of a song and thus very suitable segments to be used as audio thumbnails. Using the a priori knowledge that we are searching for off-diagonal stripes that must represent several seconds of audio data and whose alignment must be characteristic, we implemented a filter to enhance the structure of the similarity matrix and to extract a relevant segment as an audio thumbnail.
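The self-similarity idea can be sketched as follows: a cosine similarity matrix is built from frame-level feature vectors and each lag is scored by the strength of its off-diagonal stripe. The features, lag scoring, and minimum stripe length are assumptions for illustration only.

import numpy as np

def best_repeating_lag(features, min_lag=4):
    """Return the lag (in frames) whose off-diagonal stripe is strongest."""
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-9
    normalized = features / norms
    sim = normalized @ normalized.T          # self-similarity matrix
    lag_scores = {lag: np.mean(np.diagonal(sim, offset=lag))
                  for lag in range(min_lag, len(features) // 2)}
    return max(lag_scores, key=lag_scores.get)

# A feature sequence that repeats with period 8 should yield a lag near 8.
rng = np.random.default_rng(0)
motif = rng.normal(size=(8, 12))
features = np.vstack([motif, motif, motif])
print(best_repeating_lag(features))          # expected: 8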
Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video
Regunathan Radhakrishan, Ziyou Xiong, Ajay Divakaran, et al.
In our past work, we have attempted to use a mid-level feature, namely the state population histogram obtained from the Hidden Markov Model (HMM) of a general sound class, for speaker change detection so as to extract semantic boundaries in broadcast news. In this paper, we compare the performance of our previous approach with another approach based on video shot detection and speaker change detection using the Bayesian Information Criterion (BIC). Our experiments show that the latter approach performs significantly better than the former. This motivated us to examine the mid-level feature closely. We found that the component population histogram enabled discovery of broad phonetic categories such as vowels, nasals, fricatives, etc., regardless of the number of distinct speakers in the test utterance. In order for it to be useful for speaker change detection, the individual components should model the phonetic sounds of each speaker separately. From our experiments, we conclude that state/component population histograms can only be useful for further clustering or semantic class discovery if the features are chosen carefully so that the individual states represent the semantic categories of interest.
Semi-automatic approach for music classification
Audio categorization is essential when managing a music database, whether a professional library or a personal collection. However, complete automation in categorizing music into proper classes for browsing and searching is not yet supported by today’s technology. Also, the issue of music classification is subjective to some extent, as each user may have his own criteria for categorizing music. In this paper, we propose the idea of semi-automatic music classification. With this approach, a music browsing system is set up that contains a set of tools for separating music into a number of broad types (e.g., male solo, female solo, string instrument performance, etc.) using existing music analysis methods. With the results of the automatic process, the user may further cluster music pieces in the database into finer classes and/or adjust misclassifications manually according to his own preferences and definitions. Such a system may greatly improve the efficiency of music browsing and retrieval, while at the same time guaranteeing accuracy and the user’s satisfaction with the results. Since this semi-automatic system has two parts, i.e., an automatic part and a manual part, they are described separately in the paper, with detailed descriptions and examples of each step of the two parts.
Universal Multimedia Access
Coding format independent multimedia content adaptation using XML
Christian Timmerer, Gabriel Panis, Harald Kosch, et al.
Due to the heterogeneity of the current terminal and network infrastructures, multimedia content needs to be adapted to specific capabilities of these terminals and network devices. Furthermore, user preferences and user environment characteristics must also be taken into consideration. The problem becomes even more complex by the diversity of multimedia content types and encoding formats. In order to meet this heterogeneity and to be applicable to different coding formats, the adaptation must be performed in a generic and interoperable way. As a response to this problem and in the context of MPEG-21, we present an approach which uses XML to describe the high-level structure of a multimedia resource in a generic way, i.e., how the multimedia content is organized, for instance in layers, frames, or scenes. For this purpose, a schema for XML-based bitstream syntax descriptions (generic Bitstream Syntax Descriptions or gBSDs) has been developed. A gBSD can describe the high-level structure of a multimedia resource in a coding format independent way. Adaptation of the resource is based on elementary transformation instructions formulated with respect to the gBSDs. These instructions have been separated from the gBSDs in order to use the same descriptions for different adaptations, e.g., temporal scaling, SNR scaling, or semantic adaptations. In the MPEG-21 framework, those adaptations can be steered for instance by the network characteristics and the user preferences. As a result, it becomes possible for coding format agnostic adaptation engines to transform media bitstreams and associated descriptions to meet the requirements imposed by the network conditions, device capabilities, and user preferences.
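As a hedged illustration of coding-format-independent adaptation driven by an XML structure description, the sketch below drops bitstream units above a target layer from a toy description; the surviving byte ranges would then be copied from the original bitstream. The element and attribute names are invented for illustration and do not follow the actual gBSD schema.

import xml.etree.ElementTree as ET

# Toy, invented description of a layered bitstream (not the real gBSD schema).
description = """
<bitstream>
  <unit layer="0" start="0"    length="4096"/>
  <unit layer="1" start="4096" length="2048"/>
  <unit layer="2" start="6144" length="1024"/>
</bitstream>
"""

def temporal_scale(xml_text, max_layer):
    """Remove all units above the target layer from the description."""
    root = ET.fromstring(xml_text)
    for unit in list(root):
        if int(unit.get("layer")) > max_layer:
            root.remove(unit)            # drop units above the target layer
    return root

adapted = temporal_scale(description, max_layer=1)
for unit in adapted:
    print(unit.get("layer"), unit.get("start"), unit.get("length"))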
Content adaptation architecture with efficient usage of cached data in a multimedia proxy server
Sunil Bandaru, Mrinal K. Mandal
The use of multimedia data is growing at a rapid rate, and bringing multimedia services to terminals with limited capabilities is a major challenge. Efficient schemes are hence required for adapting multimedia content for delivery to devices with limited resources. In conventional server- and proxy-based architectures, the adaptation is performed either at the server or at the proxy, which loads the server or the proxy, respectively. In this paper we propose a novel distributed adaptation architecture suitable for resource-limited multimedia terminals as well as for wired connections with high bandwidth. Here, the data can be adapted at the proxy or at the server, resulting in a faster adaptation process. In addition, we propose an efficient cache replacement policy at the proxy. The proposed architecture is very flexible for both mobile and wired networks. The experimental results show that the proposed architecture improves the performance of the proxy and the server, and reduces network congestion and latency.
Content-Based Retrieval
DRAG: a database for recognition and analysis of gait
Prem Kuchi, Raghu Ram V. Hiremagalur, Helen Huang, et al.
A novel approach is proposed for creating a standardized and comprehensive database for gait analysis. The field of gait analysis is gaining increasing attention for applications such as visual surveillance, human-computer interfaces, and gait recognition and rehabilitation. Numerous algorithms have been developed for analyzing and processing gait data; however, a standard database for their systematic evaluation does not exist. Instead, existing gait databases consist of subsets of kinematic, kinetic, and electromyographic activity recordings by different investigators, at separate laboratories, and under varying conditions. Thus, the existing databases are neither homogenous nor sufficiently populated to statistically validate the algorithms. In this paper, a methodology for creating a database is presented, which can be used as a common ground to test the performance of algorithms that rely upon external marker data, ground reaction loading data, and/or video images. The database consists of: (1) synchronized motion-capture data (3D marker data) obtained using external markers, (2) computed joint angles, and (3) ground reaction loading acquired with plantar pressure insoles. This database could be easily expanded to include synchronized video, which will facilitate further development of video-based algorithms for motion tracking. This eventually could lead to the realization of markerless gait tracking. Such a system would have extensive applications in gait recognition, as well as gait rehabilitation. The entire database (marker, angle, and force data) will be placed in the public domain, and made available for downloads over the World Wide Web.
Content-based retrieval of distorted images using a hybrid genetic algorithm augmented by a self-organizing network
Content-based image retrieval involves a direct matching operation between a query image and a database of stored images. In cases where the query image can be significantly distorted in relation to the stored image, the common methods of computing similarities between the feature vectors of the images might not provide a sufficiently robust basis for successful image retrieval. The retrieval problem can then be reformulated as an optimization problem of finding a correct mapping between images in a considerably large search space. An approach is proposed that combines a hybrid genetic algorithm and a self-organizing neural network in a global search for a set of parameters defining a correct mapping between images. In order to compute unique characteristics of the query image that are invariant to image transformation and distortion, an image transform is introduced in the form of the image local response. The transform is applied at the pre-processing stage to the stored and the query images to extract their dynamic content. Correlation is then applied to both images in the response space. The technique reduces the search space and makes it possible to find a set of subregions that can contain a potentially correct match. After pre-processing, a hybrid genetic algorithm augmented by a self-organizing network for local refinement finishes the search in the reduced parameter space.
Multimedia content analysis and indexing: evaluation of a distributed and scalable architecture
Hasnain Mandviwala, Scott Blackwell, Chris Weikart, et al.
Multimedia search engines facilitate the retrieval of documents from large media content archives now available via intranets and the Internet. Over the past several years, many research projects have focused on algorithms for analyzing and indexing media content efficiently. However, special system architectures are required to process large amounts of content from real-time feeds or existing archives. Possible solutions include dedicated distributed architectures for analyzing content rapidly and for making it searchable. The system architecture we propose implements such an approach: a highly distributed and reconfigurable batch media content analyzer that can process media streams and static media repositories. Our distributed media analysis application handles media acquisition, content processing, and document indexing. This collection of modules is orchestrated by a task flow management component, exploiting data and pipeline parallelism in the application. A scheduler manages load balancing and prioritizes the different tasks. Workers implement application-specific modules that can be deployed on an arbitrary number of nodes running different operating systems. Each application module is exposed as a web service, implemented with industry-standard interoperable middleware components such as Microsoft ASP.NET and Sun J2EE. Our system architecture is the next generation system for the multimedia indexing application demonstrated by www.speechbot.com. It can process large volumes of audio recordings with minimal support and maintenance, while running on low-cost commodity hardware. The system has been evaluated on a server farm running concurrent content analysis processes.
Hypervideo summaries
Andreas Girgensohn, Frank Shipman, Lynn D. Wilcox
Hypervideo is a form of interactive video that allows users to follow links to other video. A simple form of hypervideo, called “detail-on-demand video,” provides at most one link from one segment of video to another, supporting a single-button interaction. Detail-on-demand video is well suited for interactive video summaries, because the user can request a more detailed summary while watching the video. Users interact with the video through a special hypervideo player that displays keyframes with labels indicating when a link is available. While detail-on-demand summaries can be authored manually, doing so is a time-consuming task. To address this issue, we developed an algorithm to automatically generate multi-level hypervideo summaries. The highest level of the summary consists of the most important clip from each take or scene in the video. At each subsequent level, more clips from each take or scene are added in order of their importance. We give one example in which a hypervideo summary is created for a linear training video. We also show how the algorithm can be modified to produce a hypervideo summary for home video.
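A minimal sketch of the multi-level construction described above: level 1 keeps the single most important clip per take or scene, and each further level adds the next clip in importance order. The clip scores and data layout are assumptions, not the authors' code.

def summary_levels(scenes, n_levels):
    """scenes: list of lists of (clip_id, importance) pairs."""
    levels = []
    for level in range(1, n_levels + 1):
        level_clips = []
        for scene in scenes:
            ranked = sorted(scene, key=lambda c: c[1], reverse=True)
            level_clips.extend(clip for clip, _ in ranked[:level])
        levels.append(level_clips)
    return levels

scenes = [[("a1", 0.9), ("a2", 0.4), ("a3", 0.7)],
          [("b1", 0.5), ("b2", 0.8)]]
for i, clips in enumerate(summary_levels(scenes, 2), start=1):
    print("level", i, clips)
# level 1: ['a1', 'b2']   level 2: ['a1', 'a3', 'b2', 'b1']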
Motion trajectory representations for efficient storage and search
Grzegorz Galinski, Wladyslaw Skarbek
Novel motion trajectory representations are introduced for efficient storage, transmission, and search. A reduction in bit size is obtained by specializing the motion trajectory temporal models to each spatial coordinate, in contrast to the MPEG-7 approach, where the temporal models are the same for all spatial dimensions. On trajectories of the MPEG-7 experimentation model (XM), the average bit gain is about 17%. For motion trajectory matching in search applications, the proposed similarity measure is based on an exact integral formula for the piecewise polynomial representation. This approach is not only more accurate but on average at least one order of magnitude faster than the vector distance for discrete dense interpolation of trajectories proposed in the MPEG-7 XM. For long or complex-shaped trajectories, matching in large databases can be too slow. Therefore we propose two acceleration techniques that significantly reduce the number of trajectory comparisons: the distance of trajectory centroids and the distance of trajectory dispersion vectors. Using both techniques allows about 95% of trajectory comparisons to be skipped when the trajectories are stored in an X-tree or M-tree.
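The two pruning tests can be sketched as cheap comparisons of trajectory centroids and dispersion vectors that decide whether the expensive full match is still needed; the thresholds and the toy data are illustrative assumptions.

import numpy as np

def centroid(traj):
    return traj.mean(axis=0)

def dispersion(traj):
    return traj.std(axis=0)

def candidate(query, stored, c_thresh=1.0, d_thresh=1.0):
    """Cheap pruning test: True means the full comparison is still needed."""
    return (np.linalg.norm(centroid(query) - centroid(stored)) < c_thresh and
            np.linalg.norm(dispersion(query) - dispersion(stored)) < d_thresh)

rng = np.random.default_rng(1)
query = np.cumsum(rng.normal(size=(50, 2)), axis=0)       # random-walk trajectory
database = [query + rng.normal(scale=0.1, size=query.shape),
            query + 100.0]                       # one near match, one far away
print([candidate(query, t) for t in database])   # [True, False]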
Applications
QCTVA: quality-controlled temporal video adaptation
Klaus Leopold, Hermann Hellwagner, Michael Kropfberger
Multimedia streaming is becoming more and more popular. Seamless video streaming in heterogeneous networks like the Internet turns out to be almost impossible due to varying network conditions -- streams must be adapted to the current network QoS. Temporal scalability is one of the most reasonable adaptation techniques because it is fast and easy to perform. Today's approaches simply drop frames out of a video without spending much effort on finding an intelligent dropping behavior. This usually leads to good adaptation results in terms of bandwidth consumption, but also to suboptimal video quality within the given bounds. Our approach analyzes video streams to achieve the qualitatively best temporal scalability. For this purpose, we introduce a data structure called a modification lattice, which represents all frame-dropping combinations within a sequence of frames. On the basis of the modification lattice, quality estimates for frame sequences can be computed. Moreover, a heuristic for fast and efficient quality computation in a modification lattice is presented. Experimental results illustrate that temporal video adaptation based on QCTVA information leads to better video quality compared to "usual" frame-dropping approaches. Furthermore, QCTVA provides frame priority lists for videos. Based on these priorities, numerous adaptation techniques can increase their overall performance when using QCTVA.
Enhanced MHT encryption scheme for chosen plaintext attack
Efficient multimedia encryption algorithms play a key role in multimedia security protection. One multimedia encryption algorithm known as the MHT (Multiple Huffman Tables) method was recently developed by Wu and Kuo. Even though MHT has many desirable properties, it is vulnerable to the chosen-plaintext attack (CPA). An enhanced MHT algorithm is proposed in this work to overcome this drawback. It is proved mathematically that the proposed algorithm is secure against the chosen plaintext attack.
QoS Support and Resource Management
Providing end-to-end QoS for multimedia applications in 3G wireless networks
Katherine Guo, Samapth Rangarajan, M. Ali Siddiqui, et al.
As the usage of wireless packet data services increases, wireless carriers today are faced with the challenge of offering multimedia applications with QoS requirements within current 3G data networks. End-to-end QoS requires support at the application, network, link, and medium access control (MAC) layers. We discuss the existing CDMA2000 network architecture and show the shortcomings that prevent it from supporting multiple classes of traffic at the Radio Access Network (RAN). We then propose changes in the RAN, within the standards framework, that enable support for multiple traffic classes. In addition, we discuss how the Session Initiation Protocol (SIP) can be augmented with QoS signaling to support end-to-end QoS. We also review state-of-the-art scheduling algorithms at the base station and provide possible extensions to these algorithms to support different classes of traffic as well as different classes of users.
Dynamic management of QoS multimedia multicasting
Mehul Vora, Sunil Kumar
The interaction between multicasting and real-time multimedia streams poses various new and interesting problems in research on communication protocols and architectures. For example, the display systems of heterogeneous users in the same session may have different or even varying latency, display resolution, and/or processing capabilities. In this paper, we propose a receiver-oriented resource reservation mechanism, called Dynamic Management of QoS with Priority (DMQP), for multimedia multicasting over the Internet that provides QoS guarantees with service differentiation for heterogeneous users. In DMQP, a real-time application requests QoS by specifying a range of bandwidth values and a delay, and the network tries to reserve resources for it within its bandwidth range. Service differentiation is achieved by classifying the end-users into normal users and prioritized users. When the number of end-users increases, the new end-users are not simply rejected. Instead, all nodes, including receiver node(s), sender node(s), and intermediate node(s), readjust their reserved resources dynamically to admit more end-users, as long as their minimum bandwidth requirements are met. By treating the bandwidth requirements as ranges, DMQP provides the flexibility needed for operation in the dynamic Internet environment. It has the significant benefit of allowing more flexible sharing of available resources among applications. We have conducted simulations to evaluate the performance of the proposed mechanism.
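A toy sketch of the range-based reservation idea follows: each user declares a [min, max] bandwidth range, and existing reservations are scaled back toward their minima to admit a newcomer instead of rejecting it. The capacity value and the sharing rule are assumptions; the actual DMQP mechanism also handles priorities and delay.

def admit(reservations, new_user, capacity):
    """reservations: list of dicts with 'min' and 'max' bandwidths (kbps)."""
    trial = reservations + [new_user]
    if sum(u["min"] for u in trial) > capacity:
        return None                      # even the minima do not fit: reject
    # Grant every user its minimum, then share the leftover in proportion
    # to each user's flexibility (max - min), clamped to its maximum.
    leftover = capacity - sum(u["min"] for u in trial)
    flexibility = sum(u["max"] - u["min"] for u in trial) or 1
    return [min(u["max"],
                u["min"] + leftover * (u["max"] - u["min"]) / flexibility)
            for u in trial]

users = [{"min": 200, "max": 800}, {"min": 300, "max": 600}]
print(admit(users, {"min": 250, "max": 500}, capacity=1000))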
Objective quality measurement for audio time-scale modification
The recent ITU-T Recommendation P.862, known as the Perceptual Evaluation of Speech Quality (PESQ), is an objective end-to-end speech quality assessment method for telephone networks and speech codecs through the measurement of received audio quality. To ensure that certain network distortions will not affect the estimated subjective measurement determined by PESQ, the algorithm takes into account packet loss as well as short-term and long-term time warping resulting from delay variation. However, PESQ does not work well for time-scale audio modification or temporal clipping. We investigated the factors that impact perceived quality when time-scale modification is involved. An objective measurement of time-scale modification is proposed in this research, where the cross-correlation values obtained from time-scale modification synchronization are used to evaluate the quality of a time-scaled audio sequence. The proposed objective measure has been verified by a subjective test.
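A minimal sketch of the measurement idea, under simplifying assumptions: frames of the original and time-scaled signals are aligned by searching for the lag of maximum normalized cross-correlation, and the correlation peaks are averaged into a quality score. Frame size, hop, and the averaging rule are illustrative, not the exact proposed measure.

import numpy as np

def peak_correlation(frame_a, frame_b, max_lag=64):
    """Largest normalized cross-correlation over a small lag search range."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(frame_b, lag)
        c = np.dot(frame_a, shifted) / (np.linalg.norm(frame_a) *
                                        np.linalg.norm(shifted) + 1e-9)
        best = max(best, float(c))
    return best

def tsm_quality(original, stretched, frame=1024, hop=512):
    """Average per-frame correlation peak between original and stretched audio."""
    ratio = len(stretched) / len(original)
    scores = []
    for start in range(0, len(original) - frame, hop):
        s = int(start * ratio)
        if s + frame > len(stretched):
            break
        scores.append(peak_correlation(original[start:start + frame],
                                       stretched[s:s + frame]))
    return float(np.mean(scores))

sr = 8000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
stretched = np.interp(np.linspace(0, len(tone) - 1, int(1.1 * len(tone))),
                      np.arange(len(tone)), tone)
print(round(tsm_quality(tone, stretched), 3))   # close to 1.0 for a pure tone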
QoS considerations in wireless sensor networks for telemedicine
The integration of telemedicine with medical micro-sensor technology (Mobile Sensor Networks for Telemedicine applications -- MSNT) provides a promising approach to improving the quality of people's lives. This type of network can truly implement the goal of providing health-care services anytime and anywhere. Our research in this field has generated the following outcomes, which are reported in this paper: (1) we propose a mobile sensor network infrastructure to support third-generation telemedicine applications; (2) an energy-efficient query resolution mechanism in large-scale mobile sensor networks is used for critical medical data collection; (3) to provide guaranteed mobile QoS for arriving multimedia calls, a new multi-class call admission control mechanism is proposed, based on dynamically forming a reservation pool for handoff requests. We used a discrete-event simulation model built in OPNET to verify our scheme. The simulation results show that our system can satisfy the adaptive QoS requirements of large-scale telemedicine sensor networks.
QoS-aware call admission control for multimedia over CDMA networks
Diverse multimedia services are at hand on 3G-and-beyond multi-service CDMA systems. Based on the four traffic types defined in the Universal Mobile Telecommunications System (UMTS), each multimedia service needs to be admitted and allocated appropriate resources while taking care of the efficient utilization of limited resources. In this paper, we focus on a priority-based, QoS (quality of service)-aware CAC (call admission control) scheme that is aware of both the QoS requirement per traffic type and the time-varying CDMA capability, and that allows admission discrimination according to traffic type in order to minimize the probability of QoS violation. The CAC also needs to combine resource allocation schemes such as complete sharing, complete partitioning, and priority sharing in order to provide fairness and service differentiation among traffic types. The proposed CAC adopts the total received power as the cell load estimate, estimates the resource usage for each traffic type, and applies a different threshold to each traffic type (i.e., each having a different SIR requirement). The performance of the proposed CAC in combination with the resource allocation scheme is evaluated through extensive computer simulations.
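The per-class admission test can be sketched as comparing the estimated cell load plus the new call's load increment against a class-specific threshold; the threshold values and load increments below are invented numbers, not UMTS parameters.

# Illustrative priority-aware admission check: lower-priority traffic classes
# are blocked earlier than conversational traffic as the cell load rises.
LOAD_THRESHOLDS = {          # fraction of pole capacity at which to block
    "conversational": 0.85,
    "streaming":      0.75,
    "interactive":    0.65,
    "background":     0.55,
}

def admit_call(current_load, traffic_class, load_increment):
    """Return True if the new call keeps the load below its class threshold."""
    return current_load + load_increment <= LOAD_THRESHOLDS[traffic_class]

for cls in LOAD_THRESHOLDS:
    print(cls, admit_call(current_load=0.6, traffic_class=cls,
                          load_increment=0.05))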
Implementation of on-demand QoS allocation system over IP for multimedia applications
Dongwook Lee, Donghoon Yi, Yoonyoung Kim, et al.
To support the diverse transmission requirements of multimedia applications, Quality of Service (QoS) should be provided in the Internet, where only best-effort service is available. Based on the bandwidth broker model for realizing IETF differentiated services (DiffServ), in this paper we describe our recent effort on the implementation and verification of an extensible and flexible QoS allocation and resource management system. Focusing on the bandwidth issue within a single administrative domain, the implemented system provides real-time resource reservation and allocation, delayed call admission control, simple QoS negotiation between server and user, and simple resource monitoring. The implemented system is verified by evaluating the performance of a resource-intensive application over a real-world testbed network.
Resource management techniques for dynamic multimedia content adaptation in heterogeneous networks
Eugenia Nikolouzou, Petros Sampatakos, V. Kosmatos, et al.
Moving towards an explosion of wireless and wired technologies, coupled with an evolution of wireless terminal devices and different types of services, there is a growing need to decouple service provision from platform support. This need is particularly significant if we consider the influx of multimedia streaming content, the number of devices with different requirements and capabilities, the different service platforms, and the countless software programs that need to be downloaded in order to present and manipulate the desired content. The architecture described in this paper presents a model for nomadic multimedia environments aimed at providing an “on-the-fly” adaptation of the delivery system. Taking advantage of the innovative capabilities offered by current reconfigurable devices (FPGAs), intelligent agent-based service provision, and advanced simulation techniques, it allows a hardware-independent delivery of content, where the end devices can be reconfigured to optimally represent the multimedia content. Specifically, this paper focuses on the Simulation Server module of the architecture, which provides a dynamic and efficient approach for the optimal adjustment of the communication profile and adoption of the reconfiguration strategy according to the user/network combination and device capabilities.
Poster Session
Architectural framework for resource management optimization over heterogeneous wireless networks
Nikos Tselikas, Sofia Kapellaki, Eleftherios Koutsoloukas, et al.
The main goal of the wireless telecommunication world can be briefly summarized as “communication anywhere, anytime, with any media, and principally at high data rates.” On the other hand, this goal is in conflict with the co-existence of many different current and emerging wireless systems covering almost the whole world, since each one follows its own architecture and is built on its own particular foundations. This results in a heterogeneous picture of the overall set of wireless communication systems. The scope of this paper is to present a highly innovative and scalable architectural framework that allows different wireless systems to be interconnected in a common way, able to achieve resource management optimization, improved network performance, and maximum utilization of the networks. It describes a hierarchical management system covering GSM, GPRS, UMTS, and WLAN networks individually, as well as a unified, wide wireless telecommunication system including all of them, in order to provide enhanced capacity and quality via the resulting network interworking. The main idea is to monitor all the resources using distributed monitoring components, with the intention of feeding an additional centralized system with alarms, so that a set of management techniques can be selected and applied where needed. In parallel, the centralized system will be able to combine the aforementioned alarms with business models for the efficient use of the available networks according to the type of user, the type of application, and the user’s location.
Generating panorama photos
Photo and video mosaicing has drawn a lot of interest in the research community in recent years. Most of the existing work, however, focuses on how to match the images or video frames. This paper presents techniques to handle some practical issues when generating panorama photos. We have found from experiments that a simple translational motion model gives more robust results than an affine model for horizontally panned image sequences. Recognizing that there will always be some misalignment between two images no matter how well the matching is done, we propose a stitching method that finds a line of best agreement between two images to make the misalignments less visible. Also shown in this paper are methods for correcting camera exposure changes and for blending the stitching line between the images. We show panorama photos generated from both still images and video.
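Two of the steps mentioned above can be sketched under simplifying assumptions: a purely horizontal offset is estimated by exhaustive search over the overlap width, and the stitching column with the smallest mean absolute difference is chosen as the line of best agreement. Grayscale input and the search range are assumptions for illustration.

import numpy as np

def estimate_overlap(left, right, max_overlap=100):
    """Best horizontal overlap (in columns) between two panned images."""
    best_overlap, best_err = 10, np.inf
    for overlap in range(10, max_overlap):
        err = np.mean(np.abs(left[:, -overlap:] - right[:, :overlap]))
        if err < best_err:
            best_overlap, best_err = overlap, err
    return best_overlap

def best_seam_column(left, right, overlap):
    """Column inside the overlap where the two images agree most."""
    diff = np.mean(np.abs(left[:, -overlap:] - right[:, :overlap]), axis=0)
    return int(np.argmin(diff))

rng = np.random.default_rng(2)
scene = rng.random((120, 300))
left, right = scene[:, :200], scene[:, 160:]            # 40-column true overlap
overlap = estimate_overlap(left, right)
print(overlap, best_seam_column(left, right, overlap))  # overlap close to 40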
MPEG content summarization based on compressed domain feature analysis
Masaru Sugano, Yasuyuki Nakajima, Hiromasa Yanagihara
This paper addresses automatic summarization of MPEG audiovisual content in the compressed domain. By analyzing semantically important low-level and mid-level audiovisual features, our method universally summarizes MPEG-1/-2 content in the form of a digest or a highlight. The former is a shortened version of the original, while the latter is an aggregation of important or interesting events. In our proposal, the incoming MPEG stream is first segmented into shots and the above features are derived from each shot. The features are then evaluated adaptively in an integrated manner, and finally the qualified shots are aggregated into a summary. Since all the processing is performed entirely in the compressed domain, summarization is achieved at very low computational cost. The experimental results show that news highlights and sports highlights in TV baseball games can be successfully extracted using simple shot-transition models. As for digest extraction, subjective evaluation proves that meaningful shots are extracted from content without a priori knowledge, even if it contains multiple genres of programs. Our method also has the advantage of generating an MPEG-7-based description, such as summary and audiovisual segments, in the course of summarization.
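The final aggregation step can be sketched as scoring each shot from already-extracted features and concatenating the top-scoring shots in temporal order until a duration budget is met; the feature names, weights, and budget are illustrative assumptions.

def summarize(shots, target_seconds, weights):
    """shots: list of dicts with 'start', 'duration', and feature scores."""
    def score(shot):
        return sum(weights[name] * shot.get(name, 0.0) for name in weights)

    ranked = sorted(shots, key=score, reverse=True)
    chosen, total = [], 0.0
    for shot in ranked:
        if total + shot["duration"] > target_seconds:
            continue
        chosen.append(shot)
        total += shot["duration"]
    return sorted(chosen, key=lambda s: s["start"])   # keep temporal order

shots = [{"start": 0,  "duration": 8, "audio_energy": 0.2, "motion": 0.1},
         {"start": 8,  "duration": 6, "audio_energy": 0.9, "motion": 0.7},
         {"start": 14, "duration": 7, "audio_energy": 0.6, "motion": 0.8}]
weights = {"audio_energy": 0.5, "motion": 0.5}
print([s["start"] for s in summarize(shots, target_seconds=15, weights=weights)])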
MPEG-7-based video annotation and browsing
Michael Hoeynck, Thorsten Auweiler, Jens Wellhausen
The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionality. Since manual video annotation can be time consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7-based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Having determined single shot positions as well as the visual highlights, the information is stored jointly with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries that can be utilized in a user interface to provide content-based access to the video stream, as well as for media browsing on a streaming server.
Octree-based progressive geometry encoder
Among progressive 3D mesh compression algorithms, the kd-tree-based algorithm proposed by Gandoin and Devillers is one of the state-of-the-art algorithms. Based on the observation that this geometry coder has a large amount of overhead at high kd-tree levels, we propose an octree-based geometry coder that requires fewer coding bits at high octree levels by applying selective cell subdivision at high tree levels, leading to better rate-distortion performance for low-bit-rate coding. Experimental results show that, compared with the kd-tree-based coder, the proposed 3D geometry coder performs better for an expanded tree of a level less than or equal to 8, but slightly worse for the full tree expansion with 12-bit quantization.
Adaptive distributed multimedia streaming server in Internet settings
Roland Tusch, Christian Spielvogel, Markus Kroepfl, et al.
We present an adaptive distributed multimedia server architecture (ADMS) that builds upon the idea of offensive adaptivity, where the server proactively controls its layout through replication or migration of server components to recommended hosts. Proactive actions are taken when network or server resources become critical while fulfilling client demands. Recommendations are provided by a so-called "host recommender," which is an integral part of Vagabond2 -- the middleware used for component distribution. Recommendations are based on measured or estimated server and network resource availabilities. Network distance and host resource metrics -- obtained from network and host resource services, respectively -- may be communicated as MPEG-21 DIA descriptors. Finally, we evaluate our architecture in a real-world streaming scenario.
An analysis of packaging formats for complex digital objects: review of principles
Jeroen L.N. Bekaert, Patrick Hochstenbach, Emiel De Kooning, et al.
During recent years, the number of organizations making digital information available has increased massively. This evolution has encouraged the development of standards for packaging and encoding digital representations of complex objects (such as digital music albums or digitized books and photograph albums). The primary goal of this article is to offer a method to compare these packaging standards and best practices, tailored to the needs of the digital library community and emerging digital preservation programs. The contribution of this paper is the definition of an integrated reference model, based on both the OAIS framework and some additional significant properties that affect the quality, usability, encoding, and behavior of digital objects.