Proceedings Volume 4676

Storage and Retrieval for Media Databases 2002

Minerva M. Yeung, Chung-Sheng Li, Rainer W. Lienhart
View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 19 December 2001
Contents: 11 Sessions, 41 Papers, 0 Presentations
Conference: Electronic Imaging 2002
Volume Number: 4676

Table of Contents

  • Image Feature Extraction and Segmentation
  • Indexing, Classification, and Relevance Feedback
  • Search and Retrieval of Image Databases I
  • Search and Retrieval of Image Databases II
  • Video Segmentation
  • Video Processing
  • Video Indexing
  • Video and Audio Retrieval
  • MPEG-7 and Related Topics
  • Video Summarization
  • Special Session
Image Feature Extraction and Segmentation
Structural segmentation for multimedia content-based information retrieval
Marco Carli, Alberto Degli Esposti, Alessandro Micarelli, et al.
In this contribution we propose a novel semantic-based architecture for managing multimedia data. Our approach introduces an abstraction level that studies the relationships among low-level attributes, such as color and motion, in a systematic way before the visual image content is estimated. The aim of this analysis is to unify the descriptor information and to gather it into structures that we call over-regions, which represent particular configurations of the objects to be recognized. This step enables effective object-based or event-based image recognition at the higher abstraction level. The case-based reasoning paradigm is used in our approach for the high-level analysis.
Seeded image segmentation for content-based image retrieval application
Jianping Fan, Mathurin Body, Xingquan Zhu, et al.
The seeded region growing (SRG) algorithm is very attractive for semantic image segmentation, but it suffers from the problems of pixel sorting order for labeling and of automatic seed selection. We design an automatic SRG algorithm, along with a boundary-oriented parallel pixel labeling technique and an automatic seed selection method. To support more efficient image access over large-scale databases, we suggest a multi-level image database management structure. This framework also supports concept-oriented image classification via a probabilistic approach. Hierarchical image indexing and summarization are also discussed.
Evaluation of shape correspondence using ordinal measures
Faouzi Alaya Cheikh, Bogdan Cramariuc, Mari Partio, et al.
In this paper we present a novel approach to shape similarity estimation based on ordinal correlation. The proposed method operates in three steps: object alignment, contour to multilevel image transformation and similarity evaluation. This approach is suitable for use in CBIR, shape classification and performance evaluation of segmentation algorithms. The proposed technique produced encouraging results when applied on the MPEG-7 test data.
Indexing, Classification, and Relevance Feedback
Automatic classification of images on the Web
Alexander Hartmann, Rainer W. Lienhart
Numerous research works on the extraction of low-level features from images and videos have been published. However, only recently has the focus shifted to exploiting low-level features to classify images and videos automatically into semantically meaningful and broad categories. In this paper, novel classification algorithms are presented for three broad and general-purpose categories. In detail, we present algorithms for distinguishing photo-like images from graphical images, true photos from photo-like but artificial images, and presentation slides from comics. On a large image database, our classification algorithm achieved an accuracy of 97.3% in separating photo-like images from graphical images. In the subset of photo-like images, true photos could be separated from ray-traced/rendered images with an accuracy of 87.3%, while the subset of graphical images was partitioned into presentation slides and comics with an accuracy of 93.2%.
Angle-Tree: a new index structure for high-dimensional point data
Daoguo Dong, Xiangyang Xue, Hangzai Luo, et al.
Many multi-dimensional index structures, such as R-Tree, R*-Tree, X-Tree, SS-Tree, VA-File, etc., have been proposed to support similarity search with the l1, l2 or l∞ distance as the similarity measure. However, they cannot support similarity search with the cosine as the similarity measure. In this paper, an index structure called Angle-Tree is introduced to resolve this problem. It first projects all the high-dimensional points onto the unit hyper-spherical surface, i.e., it normalizes each original vector in the database into a unit vector. Then an index structure similar to the R-Tree is built for the projected points. The experimental results show that the Angle-Tree can decrease the cost of disk I/O and support fast similarity search.
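The projection step described in this abstract can be illustrated with a short Python sketch. It is not the Angle-Tree itself: it only checks the standard identity that, for unit vectors, the squared Euclidean distance equals 2 - 2·cos, which is why a Euclidean index over normalized points can answer cosine queries. The function names and sample vectors are hypothetical.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in v]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def squared_euclidean(u, v):
    """Squared l2 distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical feature vectors, used only for illustration.
u, v = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
cos_uv = cosine_similarity(u, v)
d2 = squared_euclidean(normalize(u), normalize(v))
# For unit vectors: ||u - v||^2 = 2 - 2 * cos(u, v)
```

Because the distance on the sphere is a decreasing function of the cosine, a nearest-neighbor query in the projected space returns the same ranking as a highest-cosine query in the original space.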
Logistic regression model for relevance feedback in content-based image retrieval
Geert Caenen, Eric J. Pauwels
We introduce logistic regression to model the dependence between image-features and the relevance that is implicitly defined by user-feedback. We assume that while browsing, the user can single out images as either examples or counter-examples of the sort of picture he is looking for. Based on this information, the system will construct logistic regression models that generalize this relevance probability to all images in the database. This information is then used to iteratively bias the next sample from the database. Furthermore, the diagnostics that are an integral part of the regression procedure can be harnessed for adaptive feature selection by removing features that have low predictive power.
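A minimal Python sketch of the idea follows, assuming two-dimensional image features and plain batch gradient descent. The feature values, learning rate and function names are hypothetical, and the regression diagnostics used for feature selection are not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def relevance(x, weights, bias):
    """Estimated probability that an image with features x is relevant."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical feedback: 1 = example, 0 = counter-example.
feedback_features = [[0.9, 0.2], [0.8, 0.4], [0.2, 0.3], [0.1, 0.5]]
feedback_labels = [1, 1, 0, 0]
weights, bias = fit_logistic(feedback_features, feedback_labels)

# Score unseen images to bias the next sample shown to the user.
likely_relevant = relevance([0.85, 0.3], weights, bias)
likely_irrelevant = relevance([0.15, 0.4], weights, bias)
```

In a feedback loop, the database images would be re-scored after each round and the next sample drawn with preference for high relevance scores.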
Search and Retrieval of Image Databases I
Image retrieval using texture features BDIP and BVLC
Sang Yong Seo, Young Deok Chun, Dae Sung Kim, et al.
In this paper, we first propose new texture features, BDIP (block difference of inverse probabilities) and BVLC (block variation of local correlation coefficients), for content-based image retrieval (CBIR) and then present an image retrieval method based on the combination of BDIP and BVLC moments. BDIP uses the local probabilities in image blocks to measure the variation of brightness. BVLC uses the variations of local correlation coefficients in image blocks to measure texture smoothness. Experimental results show that the presented retrieval method yields about 12% better performance than the method using only BDIP or BVLC moments and about 10% better performance than the method using wavelet moments.
Novel image retrieval technique using salient edges
The rapidly increasing usage of multimedia environments has led to a greater demand for image retrieval. In this paper, we propose a method for image database retrieval based on salient edges. It achieves both the desired efficiency and accuracy using a three-stage process: in the first stage, we extract edge points from the original image and link them into edge curves; in the second stage, we select salient edges according to their lengths, and compute the rotational angle histogram (RAH) and the corners' average frequency (CAF) for every salient edge; in the last stage, a feature vector is generated from those RAHs and CAFs. We have tested this technique using an image database containing more than 4000 images, and all results show that our scheme can perform retrieval efficiently. When an image database is on the order of tens of thousands of images, suitable indexing methods become critical for efficient query processing. This paper also presents a new indexing method called tree structured triangle inequality (TSTI), which combines the triangle inequality indexing method with a tree structured indexing technique. The experiments provide evidence that our proposed method can improve the retrieval speed without reducing its accuracy.
Compressed domain indexing of losslessly compressed images
Image retrieval and image compression have been pursued separately in the past. Little research has been done on a synthesis of the two that allows image retrieval to be performed directly in the compressed domain of images without the need to uncompress them first. In this paper, methods for image retrieval in the compressed domain of losslessly compressed images are introduced. While most image compression techniques are lossy, i.e., they discard visually less significant information, lossless techniques are still required in fields like medical imaging or in situations where images must not be changed for legal reasons. The algorithms in this paper are based on predictive coding methods where a pixel is encoded based on the pixel values of its (already encoded) neighborhood. The first method is based on an understanding that predictively coded data is itself indexable and represents a textural description of the image. The second method operates directly on the entropy encoded data by comparing codebooks of images. Experiments show good image retrieval results for both approaches.
Extensible feature management engine for image retrieval
Simone Santini, Amarnath Gupta
In this paper we describe the general architecture and specific topics of a database system for multimedia (specifically, image) data. We analyze the conditions that must be met in order to achieve goals such as efficient schema design, query optimization, and extensibility. We assume that the engine uses the services of a traditional database (in the specific instance, a relational database), with the addition of user-defined indexing schemes. The paper presents general architectural issues and focuses on two problems: schema normalization using computational properties of the features, and the definition of feature algebras, in particular, the definition of a wavelet algebra for query expression.
Search and Retrieval of Image Databases II
RFLPRetriever: a content-based retrieval system for biological image databases
Chi-Ren Shyu, Seth A. Havermann, Kristopher Stice, et al.
The reliance of life science researchers on computer systems has increased dramatically with the progress of the various genome projects. Currently, the demand exists for a biological image database system for the storage and retrieval of RFLP (Restriction Fragment Length Polymorphism) images. To fulfill this need, we have developed a content-based image retrieval (CBIR) system - RFLPRetriever - for the biological research community. Implementing a CBIR system for storage and retrieval of RFLP images provides several advantages. For example, when a biologist clones a gene, he or she must discover whether the gene has previously been cloned. To do this, the researcher needs to look through thousands of images in search of an RFLP image that contains a similar set of biologically meaningful features as a related image created in the lab. One advantage of using a CBIR system is that by taking the extracted feature vectors, an efficient database indexing structure can be created, providing the system with a rapid retrieval rate. In addition, a CBIR system allows the user to search an image database by submitting a query image in which the user wants to find a matching RFLP image based on a combination of biologically relevant attributes contained in the query image.
Object-oriented query based on belief fusion: application to dermatological databases
Mohamed-Chaker Larabi, Noel Richard, Olivier Colot, et al.
This paper is dedicated to computer-aided diagnosis (CAD) for skin cancers, to help the expert (dermatologist) diagnose a dermatological lesion as benign or malignant. The need for this kind of tool has been widely expressed because of the difficulty experts have in distinguishing a benign lesion from melanoma. One way to help the expert without performing a classification is to find and display the images (lesions) most similar to the query (the patient's lesion). The similarity must be measured using features, and a representation of them, inspired by medical diagnosis rules. In fact, the diagnosis rules known as the ABCD mnemonic are very interesting because they describe a lesion using color, texture and shape. To bring the system closer to reality, we build it as a content-based image retrieval (CBIR) scheme. Images are represented as an object model that includes the features, their representation, and a set of belief degrees. The aim is to combine, on the one hand, the experts' analysis, which includes their knowledge and experience but also their subjectivity, inexactness, uncertainty, etc., and, on the other hand, the ground truth based on biopsy results for all the lesions in the database. This combination gives the system autonomy and lets it evolve without needing relevance feedback.
New shape representation and similarity measure for efficient shape indexing
Konstantin Y. Kupeev, Zohar Sivan
Efficient search and retrieval of similar shapes in large databases places two hardly compatible demands on shape representations. On the one hand, shape similarity conveys similarity of the spatial relations of the shape parts. Thus, the representation should embed a kind of graph description of the shape, and allow estimation of the (inexact) correspondence between these descriptions. On the other hand, the representation should enable fast retrieval in large databases. Current shape indexing solutions do not satisfy both demands simultaneously. The G-graphs have been introduced as shape descriptors conveying structural and quantitative shape information. In the current work we define a representation of the G-graphs by strings consisting of symbols from a four-letter alphabet such that two G-graphs are isomorphic as G-graphs if and only if their string representations are identical. This allows us to represent shapes by vectors consisting of strings and to introduce a shape representation satisfying both of the above demands. Experimental results are presented.
Feature element theory for image recognition and retrieval
Traditional image retrieval systems work as black boxes: the concrete process and the resulting data or coefficients are not the real point of concern. As a consequence, these systems cannot serve general semantic applications. To overcome this problem, we turn to psychology and neuroscience to study the cognition mechanism of the human brain. Based on an analysis of experiments and evidence, a new hypothesis, element presence theory, is proposed to explain visual cognition as a whole. Feature element theory, its basic level, which deals with low-level feature data from a camera or retina, is illustrated in detail. In addition, the evaluation of feature elements is discussed, and an image retrieval system based on feature element theory is illustrated.
Hierarchical architecture for content-based image retrieval of paleontology images
This article presents research in the field of content-based multiresolution indexing and retrieval of images. Our method uses multiresolution decomposition of images using wavelets in the HSV colorspace to extract parameters at multiple scales, allowing a progressive (coarse-to-fine) retrieval process. Features are automatically classified into several clusters with the K-means algorithm. A model image is computed for each cluster in order to represent all the images of this cluster. The process is repeated, and each cluster is sub-divided into sub-clusters. The model images are stored in a tree which is offered to users for browsing the database. The nodes of the tree are the families and the leaves are the images of the database. A paleontology image database is used to test the proposed technique. This kind of approach makes it possible to build a visual interface that is easy to use. Our main contribution is the building of the tree with multiresolution indexing and retrieval of images and the generation of model images to be offered to users.
Video Segmentation
New video shot change detection algorithm based on accurate motion and illumination estimation
Shang-Hong Lai, Wei-Kuang Lee
Temporal segmentation of a video sequence into different shots is fundamental to a number of video retrieval and analysis applications. Motion estimation has been widely used in many applications of video processing, since it provides the most essential information for an image sequence. In this paper, we explore the possibility of exploiting motion and illumination estimation in a video sequence to detect various types of shot changes. Optical flow is the motion vector computed at each pixel in an image sequence from intensity variation. Traditionally, optical flow computational algorithms were derived from the brightness constancy assumption. In this paper, we employ a generalized optical flow constraint that includes an illumination parameter to model local illumination changes. An iterative optical flow and illumination estimation algorithm is developed to refine the generalized optical flow constraints step by step, thus leading to a very accurate estimation of the optical flow and illumination parameters. Two robust measures are defined from the mean and standard deviation of the estimated intensity compensation values for all the blocks in the same image. Either of these two measures corresponds significantly to various types of shot changes. We show the usefulness of these two measures through experiments.
Video segmentation for post-production
Ciaran Wills
Specialist post-production is an industry that has much to gain from the application of content-based video analysis techniques. However the types of material handled in specialist post-production, such as television commercials, pop music videos and special effects are quite different in nature from the typical broadcast material which many video analysis techniques are designed to work with; shots are short and highly dynamic, and the transitions are often novel or ambiguous. We address the problem of scene change detection and develop a new algorithm which tackles some of the common aspects of post-production material that cause difficulties for past algorithms, such as illumination changes and jump cuts. Operating in the compressed domain on Motion JPEG compressed video, our algorithm detects cuts and fades by analyzing each JPEG macroblock in the context of its temporal and spatial neighbors. Analyzing the DCT coefficients directly we can extract the mean color of a block and an approximate detail level. We can also perform an approximated cross-correlation between two blocks. The algorithm is part of a set of tools being developed to work with an automated asset management system designed specifically for use in post-production facilities.
Efficient algorithm for scene change detection and camera motion characterization using the approach of heterogeneous video transcoding on MPEG compressed videos
Jin-Hau Kuo, Ja-Ling Wu
Due to the increasing amount of information carried by video, video analysis that divides video into scene changes or key-frames becomes essential for efficient video indexing. In this paper, we propose a compressed domain scene change detection and camera motion characterization algorithm. We believe that the most vital inherent information hidden in the MPEG bitstream, which can aid scene shot and sub-shot detection, is the motion vector and macroblock type statistics. We evaluate the results of the scene change detection and camera motion characterization to obtain accurate shot and sub-shot locations.
Video Processing
Data reduction procedure for principal cast and other talking head detection
We describe a technique for reducing the data set for principal cast and other talking head detection in broadcast news content using the spatial attributes of the MPEG-7 Motion Activity descriptor. The fact that these descriptors are easy to extract from the compressed domain and also work well when used for matching talking head sequences motivated us to utilize them for rapidly pruning the data set for subsequent sophisticated face detection techniques. We are thus able to speed up the process of finding the principal cast from broadcast news content by reducing the number of segments on which computationally more expensive face detection and recognition is employed. We present the experimental results of two from the centroid of ground truth set and is computationally less expensive. The second clustering procedure is based on multiple templates, which are the mean feature vectors of the component Gaussians of a Gaussian Mixture Model (GMM) trained to best fit the training data. We are able to save 50% on computation, measured in terms of number of rejected shots to total number of shots, while missing 25% of talking head shots in the news program. We also observe that the second clustering procedure, while being slightly more computationally intensive, allows for higher pruning factors with more accuracy.
Logo detection and classification in a sport video: video indexing for sponsorship revenue control
Bohumil Kovar, Alan Hanjalic
This paper presents a novel approach to detecting and classifying a trademark logo in frames of a sport video. In view of the fact that we attempt to detect and recognize a logo in a natural scene, the algorithm developed in this paper differs from traditional techniques for logo detection and classification that are applicable either to well-structured general text documents (e.g. invoices, memos, bank cheques) or to specialized trademark logo databases, where logos appear isolated on a clear background and where their detection and classification is not disturbed by the surrounding visual detail. Although the development of our algorithm is still in its starting phase, experimental results performed so far on a set of soccer TV broadcasts are very encouraging.
Comparison of sequence matching techniques for video copy detection
Video copy detection is a complementary approach to watermarking. As opposed to watermarking, which relies on inserting a distinct pattern into the video stream, video copy detection techniques match content-based signatures to detect copies of video. Existing typical content-based copy detection schemes have relied on image matching. This paper proposes two new sequence-matching techniques for copy detection and compares the performance with one of the existing techniques. Motion, intensity and color-based signatures are compared in the context of copy detection. Results are reported on detecting copies of movie clips.
Video Indexing
Event detection and summarization in American football broadcast video
Baoxin Li, M. Ibrahim Sezan
We propose a framework for event detection and summary generation in football broadcast video. First, we formulate summarization as a play detection problem, with a play being defined as the most basic segment of time during which the ball is being played. Then we propose both deterministic and probabilistic approaches to the detection of the plays. The detected plays are concatenated to generate a compact, time-compressed summary of the original video. Such a summary is complete in the sense that it contains every meaningful action of the underlying game, and it also serves as a much better starting point for higher-level summarization and other analyses than the original video does. Based on the summary, we also propose an audio-based hierarchical summarization method. Experimental results show the proposed methods work very well on consumer grade platforms.
Indexing technique for similarity matching in large video databases
Sanghyun Park, June-Suh Cho, Ki-Ho Hyun
Similarity matching in video databases is of growing importance in many new applications such as video clustering and digital video libraries. In order to provide efficient access to relevant data in large databases, there have been many research efforts in video indexing with diverse spatial and temporal features. However, most of the previous works relied on sequential matching methods or memory-based inverted file techniques, thus making them unsuitable for a large volume of video databases. In order to resolve this problem, this paper proposes an effective and scalable indexing technique using a trie, originally proposed for string matching, as an index structure. For building an index, we convert each frame into a symbol sequence using a window order heuristic and build a disk-resident trie from a set of symbol sequences. For query processing, we perform a depth-first search on the trie and execute a temporal segmentation. To verify the superiority of our approach, we perform several experiments with real and synthetic data sets. The results reveal that our approach consistently outperforms the sequential scan method, and the performance gain is maintained even with a large volume of video databases.
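The trie-based index can be sketched in Python as follows. This is a simplified in-memory illustration, not the disk-resident structure of the paper: the conversion of frames to symbols (the window order heuristic) is assumed to have been done already, the mismatch budget is a hypothetical stand-in for the similarity criterion, and the temporal segmentation step is omitted.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # symbol -> TrieNode
        self.video_ids = set()  # videos whose sequence passes through this node

def insert(root, symbols, video_id):
    """Add one symbol sequence to the trie."""
    node = root
    for symbol in symbols:
        node = node.children.setdefault(symbol, TrieNode())
        node.video_ids.add(video_id)

def search(root, query, max_mismatches=0):
    """Depth-first search for indexed sequences whose prefix matches the
    query with at most max_mismatches differing symbols."""
    matches = set()
    stack = [(root, 0, 0)]  # (node, depth, mismatches so far)
    while stack:
        node, depth, mismatches = stack.pop()
        if depth == len(query):
            matches |= node.video_ids
            continue
        for symbol, child in node.children.items():
            cost = mismatches + (symbol != query[depth])
            if cost <= max_mismatches:  # prune branches over budget
                stack.append((child, depth + 1, cost))
    return matches

# Hypothetical symbol sequences, one per video.
root = TrieNode()
insert(root, "ABCA", "v1")
insert(root, "ABDA", "v2")
insert(root, "CCBA", "v3")

exact = search(root, "ABC")
approximate = search(root, "ABC", max_mismatches=1)
```

The pruning inside the loop is what lets the search skip whole subtrees, in contrast to a sequential scan that compares the query against every stored sequence.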
Content-based analysis and indexing of sports video
Ming Luo, Xuesheng Bai, Guang-you Xu
An explosion of on-line image and video data in digital form is already well underway. With the exponential rise in interactive information exploration and dissemination through the World-Wide Web, the major inhibitors of rapid access to on-line video data are the management of capture and storage, and content-based intelligent search and indexing techniques. This paper proposes an approach for content-based analysis and event-based indexing of sports video. It includes a novel method to organize shots - classifying shots as close shots and far shots, an original idea of blur extent-based event detection, and an innovative local mutation-based algorithm for caption detection and retrieval. Results on extensive real TV programs demonstrate the applicability of our approach.
Efficient video sequence matching using the Cauchy function and the modified Hausdorff distance
Sang Hyun Kim, Rae-Hong Park
To manipulate large video databases, effective video indexing and retrieval are required. While most algorithms for video retrieval can be used for frame-wise user queries or video content queries, video sequence matching has not been investigated much. In this paper, we propose an efficient algorithm to match video sequences using the Cauchy function of histograms between successive frames and the modified Hausdorff distance. To match the video sequences effectively and to reduce the computational complexity, we use the key frames extracted by the cumulative measure, and compare the sets of key frames using the modified Hausdorff distance. Experimental results show that the proposed video sequence matching algorithms using the Cauchy function and the modified Hausdorff distance yield higher accuracy and performance than conventional algorithms such as the histogram difference and directed divergence methods.
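As an illustration of the set comparison step, here is a small Python sketch of the modified Hausdorff distance in its commonly used form (the maximum of the two directed mean-of-minimum distances). Whether the paper uses exactly this variant, and which key-frame features and point distance it uses, is not stated in the abstract; the Euclidean distance and the sample feature vectors are assumptions.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def directed_distance(set_a, set_b):
    """Mean, over points in set_a, of the distance to the nearest point in set_b."""
    return sum(min(euclidean(a, b) for b in set_b) for a in set_a) / len(set_a)

def modified_hausdorff(set_a, set_b):
    """Symmetric distance between two non-empty sets of feature vectors."""
    return max(directed_distance(set_a, set_b), directed_distance(set_b, set_a))

# Hypothetical key-frame feature vectors for two video sequences.
key_frames_a = [(0.0, 0.0), (2.0, 0.0)]
key_frames_b = [(0.0, 0.0)]
distance = modified_hausdorff(key_frames_a, key_frames_b)
```

Averaging instead of taking the maximum over points makes the measure less sensitive to a single outlying key frame than the classical Hausdorff distance.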
Video and Audio Retrieval
Learning to annotate video databases
A model-based approach to video retrieval requires ground-truth data for training the models. This leads to the development of video annotation tools that allow users to annotate each shot in the video sequence as well as to identify and label scenes, events, and objects by applying the labels at the shot level. The annotation tool considered here also allows the user to associate the object labels with an individual region in a key-frame image. However, the abundance of video data and diversity of labels make annotation a difficult and overly expensive task. To combat this problem, we formulate the task of annotation in the framework of supervised training with partially labeled data by viewing it as an exercise in active learning. In this scenario, one first trains a classifier with a small set of labeled data, and subsequently updates the classifier by selecting the most informative, or most uncertain, subset of the available data set. Consequently, propagation of labels to as yet unlabeled data is automatically achieved as well. The purpose of this paper is twofold. The first is to describe a video annotation tool that has been developed for the purpose of annotating generic video sequences in the context of a recent video-TREC benchmarking exercise. The tool is semi-automatic in that it automatically propagates labels to similar shots, which requires the user to confirm or reject the propagated labels. The second purpose is to show how an active learning strategy can potentially be implemented in this context to further improve the performance of the annotation tool. While many versions of active learning could be thought of, we specifically report results on experiments with support vector machine classifiers with polynomial kernels.
Distribution of shot lengths for video analysis
In this paper we investigate the distribution of shot lengths for video sequences containing diverse content. Accurate models for shot lengths are important for modeling video both for content-based retrieval applications and for performing queuing analysis for the design of video buffers in multimedia networks. Using a large dataset collected from CSPAN programs, we have analyzed the Pareto, Weibull, and gamma distributions as possible models for the shot length distribution. We have compared the goodness of fit of these possible distribution models using the Kolmogorov-Smirnov statistic.
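The goodness-of-fit comparison can be sketched in Python as below. The Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution function and a candidate model CDF. The shot lengths here are synthetic values placed at Weibull quantiles purely for illustration, and the distribution parameters are hypothetical; none of these numbers come from the paper.

```python
import math

def ks_statistic(samples, cdf):
    """Largest absolute gap between the empirical CDF and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap

def weibull_cdf(shape, scale):
    def cdf(x):
        return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0
    return cdf

def pareto_cdf(alpha, x_min):
    def cdf(x):
        return 1.0 - (x_min / x) ** alpha if x >= x_min else 0.0
    return cdf

# Synthetic "shot lengths": exact quantiles of a Weibull(shape=1, scale=2).
n = 10
shot_lengths = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

d_weibull = ks_statistic(shot_lengths, weibull_cdf(shape=1.0, scale=2.0))
d_pareto = ks_statistic(shot_lengths, pareto_cdf(alpha=1.5, x_min=1.0))
```

A smaller statistic indicates a closer fit, so on this synthetic data the Weibull model is preferred over the Pareto model, as expected from how the data were generated.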
Automatic categorization design for broadcast news
Huitao Luo, Qian Huang
This paper discusses our work on automatic categorization of broadcast news based on closed caption texts. The multimedia news data under study are first segmented into story units based on video and audio signals with our previously developed algorithms. Based on the time stamp information, closed caption texts are segmented into text units corresponding to each story unit. A Bayes network is then trained to automatically classify the story units into fourteen categories. The major contribution of this paper is the idea of category, which represents a higher level of semantic generalization as compared with traditional topics. We discuss in detail the administrated bottom-up clustering algorithm used to generate a semantically meaningful category framework, as well as the training procedures used to build the belief network that covers the large broadcast news data set. Using LDC (Linguistic Data Consortium)'s CSR LM 1996 data set, we designed a number of experiments to discuss the relationship between categorization design and classification performance.
Motion feature extraction scheme for content-based video retrieval
Chuan Wu, Yuwen He, Li Zhao, et al.
This paper proposes an extraction scheme for global motion and object trajectory in a video shot for content-based video retrieval. Motion is the key feature representing the temporal information of videos, and it is more objective and consistent than other features such as color, texture, etc. Efficient motion feature extraction is an important step for content-based video retrieval. Some approaches have been taken to extract camera motion and motion activity in video sequences. When dealing with the problem of object tracking, algorithms are always proposed on the basis of a known object region in the frames. In this paper, a whole picture of the motion information in the video shot is achieved by analyzing the motion of background and foreground separately and automatically. A 6-parameter affine model is utilized as the motion model of background motion, and a fast and robust global motion estimation algorithm is developed to estimate the parameters of the motion model. The object region is obtained by means of global motion compensation between two consecutive frames. Then the center of the object region is calculated and tracked to get the object motion trajectory in the video sequence. Global motion and object trajectory are described with the MPEG-7 parametric motion and motion trajectory descriptors, and valid similarity measures are defined for the two descriptors. Experimental results indicate that our proposed scheme is reliable and efficient.
Comparison of dictionary-based approaches to automatic repeating melody extraction
Hsuan-Huei Shih, Shrikanth S. Narayanan, C.-C. Jay Kuo
Automatic melody extraction techniques can be used to index and retrieve songs in music databases. Here, we consider a piece of music consisting of numerical music scores (e.g. the MIDI file format) as the input. Segmentation is done based on the tempo information, and a music score is decomposed into bars. Each bar is indexed, and a bar index table is built accordingly. The authors recently proposed two approaches to find repeating patterns. In the first approach, an adaptive dictionary-based algorithm known as Lempel-Ziv 78 (LZ78) was modified and applied to melody extraction; it is called the modified LZ78 algorithm, or MLZ78. In the second approach, a sliding window is applied to generate the pattern dictionary. It is called the Exhaustive Search with Progressive LEngth algorithm, or ESPLE. Dictionaries generated by both approaches need to be pruned to remove non-repeating patterns. Each iteration of either MLZ78 or ESPLE is followed by pruning of the updated dictionaries generated in the previous cycle until the dictionaries converge. Experiments are performed on MIDI files to evaluate the performance of the proposed algorithms. In this research, we compare results obtained from these two systems in terms of complexity, accuracy and efficiency. Their relative merits and shortcomings are discussed in detail.
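To make the dictionary idea concrete, here is a minimal sketch of a standard LZ78 parse over a sequence of bar indices, followed by a simple pruning step that keeps only phrases occurring more than once. This is plain LZ78, not the authors' MLZ78 or ESPLE; the function names, the `min_len` parameter, and the pruning rule are assumptions for illustration only.

```python
def lz78_phrases(symbols):
    """Standard LZ78 parse: return dictionary phrases in insertion order."""
    dictionary = {}  # phrase (tuple of bar indices) -> dictionary index
    phrases = []
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in dictionary:
            current = candidate  # keep extending a phrase already seen
        else:
            dictionary[candidate] = len(dictionary) + 1
            phrases.append(candidate)
            current = ()  # start a new phrase
    return phrases


def count_occurrences(symbols, phrase):
    """Count (possibly overlapping) occurrences of phrase in symbols."""
    n = len(phrase)
    return sum(
        1 for i in range(len(symbols) - n + 1) if tuple(symbols[i:i + n]) == phrase
    )


def repeating_patterns(symbols, min_len=2):
    """Prune the dictionary to phrases that actually repeat in the input."""
    return [
        p for p in lz78_phrases(symbols)
        if len(p) >= min_len and count_occurrences(symbols, p) > 1
    ]
```

For the bar index sequence `[1, 2, 1, 2, 1, 2, 3]`, the parse yields the phrases `(1,)`, `(2,)`, `(1, 2)` and `(1, 2, 3)`, and only `(1, 2)` survives pruning.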
MPEG-7 and Related Topics
Rapid generation of sports video highlights using the MPEG-7 motion activity descriptor
Kadir A. Peker, Romain Cabasson, Ajay Divakaran
We present a technique for rapidly generating highlights of sports videos using temporal patterns of motion activity extracted in the compressed domain. The basic hypothesis of this work is that temporal patterns of motion activity are related to the grammar of the sports video. We present experimental verification of this hypothesis. By using very simple rules depending on the type of sport, we are thus able to provide highlights by skipping over the uninteresting parts of the video and identifying interesting events characterized, for instance, by a falling edge or rising edge in the activity domain. Moreover, the compressed domain extraction of motion activity intensity is much simpler than color-based summarization calculations. Other compressed domain features or more complex rules can be used to further improve the accuracy.
Three-dimensional browsing environment for MPEG-7 image databases
Thomas Meiers, Thomas Sikora, Ivo Keller
In this paper we address user navigation through large volumes of image data. Similarity measures based on different MPEG-7 descriptors are introduced, and multidimensional scaling is employed to display images in three dimensions according to their mutual similarities. With such a view the user can easily see similarity relations between images and understand the structure of the database. In order to cope with large volumes of images, a k-means clustering technique is introduced which identifies representative image samples for each cluster. Representative images (up to 100) are then displayed in three dimensions using multidimensional scaling. The proposed clustering technique produces a hierarchical structure of clusters, similar to street maps with various levels of detail. The user can zoom into various cluster levels to obtain more or less detail as required. Further, a new query refinement method is introduced. The retrieval process is controlled by learning from positive examples supplied by the user, often called relevance feedback. The combination of the three techniques (3D visualization, relevance feedback and the hierarchical structure of the image database) leads to an intuitive browsing environment. The results obtained verify the attractiveness of the approach for navigation and retrieval applications.
TVFind (TM): an MPEG-7-based video management system over Internet
Jian Huang, Li Zhao, Shiqiang Yang
The projected growth of digital video databases, and hence the importance of management strategies, creates a challenging task for researchers. Index and search technology must enable access to humanly meaningful segments within video data streams if this increase of content is to be at all useful. The fundamental task in video analysis and retrieval, then, is to facilitate the human-computer interface by bridging the worlds of low-level physical features in media and high-level human description. In this paper, we discuss the key issues in designing and implementing a video management system based on MPEG-7. From this, we introduce the Tsinghua Video Find-It System (TVFind), a prototype platform for retrieving and browsing within a video database system over the web.
Application of MPEG-7 descriptors for temporal video segmentation
Due to the rapidly growing multimedia content available on the internet it is highly desirable to index multimedia data automatically and to provide content based search and retrieval functionalities. The first step in order to describe and annotate video data is to split the sequences into sub-shots which are related to semantic units. This paper addresses unsupervised scene change detection and keyframe selection of video sequences. Unlike other methods this is performed by using a standardized multimedia content description of the video data. We apply the MPEG-7 scalable color descriptor and the edge histogram descriptor for shot boundary detection and show that this method performs well. Furthermore, we propose to store the output data of our system in a video segment description scheme to provide simple but efficient search and retrieval functionalities for video scenes based on color features.
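The general idea of descriptor-based shot boundary detection can be sketched as a comparison of per-frame histograms. The example below is a generic histogram-difference detector under stated assumptions: each frame is already summarized as a normalized histogram (standing in for a descriptor such as scalable color), the distance is a plain L1 difference, and the threshold is fixed. None of these choices is taken from the paper, and no MPEG-7 extraction is performed.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def shot_boundaries(histograms, threshold):
    """Return indices i where frame i starts a new shot.

    A boundary is declared when the distance between frame i-1 and
    frame i exceeds `threshold` (a hypothetical, hand-chosen value).
    """
    return [
        i for i in range(1, len(histograms))
        if l1_distance(histograms[i - 1], histograms[i]) > threshold
    ]
```

With four two-bin frames `[[0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [1.0, 0.0]]` and a threshold of 0.5, the only boundary reported is at index 2.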
Video summarization and personalization for pervasive mobile devices
We have designed and implemented a video semantic summarization system, which includes an MPEG-7 compliant annotation interface, a semantic summarization middleware, a real-time MPEG-1/2 video transcoder on PCs, and an application interface on color/black-and-white Palm-OS PDAs. We designed a video annotation tool, VideoAnn, to annotate semantic labels associated with video shots. Videos are first segmented into shots based on their visual-audio characteristics. They are played back using an interactive interface, which facilitates and speeds up the annotation process. Users can annotate the video content in units of temporal shots or spatial regions. The annotated results are stored in the MPEG-7 XML format. We also designed and implemented a video transmission system, Universal Tuner, for wireless video streaming. This system transcodes MPEG-1/2 videos or live TV broadcasts to BW or indexed color Palm OS devices. In our system, the complexity of multimedia compression and decompression algorithms is adaptively partitioned between the encoder and decoder. At the client end, users can access the summarized video based on their preferences, time, keywords, as well as the transmission bandwidth and the remaining battery power on the pervasive devices.
Video Summarization
Automated video summarization using speech transcripts
Cuneyt M. Taskiran, Arnon Amir, Dulce B. Ponceleon, et al.
Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments, and second, ranking the resulting segments. Our proposed algorithm scores segments based on word frequency analysis of speech transcripts. A summary is then generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the summaries generated. Our proposed algorithm is compared with a random segment selection scheme based on statistical analysis of the user study results. Finally, we discuss various issues that arise in the evaluation of automatically generated video summaries.
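The selection step described in this abstract can be sketched as a greedy ranking by score-to-duration ratio. The scoring rule used here (summing the collection-wide counts of the distinct words in each segment), the time budget, and all function names are assumptions for illustration; the abstract does not give the actual word frequency analysis used by the authors.

```python
from collections import Counter


def score_segments(segments):
    """Score each (duration_seconds, transcript_text) segment.

    Assumed rule: a segment's score is the sum, over its distinct words,
    of how often each word occurs across all transcripts.
    """
    counts = Counter(w for _, text in segments for w in text.lower().split())
    return [
        sum(counts[w] for w in set(text.lower().split())) for _, text in segments
    ]


def summarize(segments, budget):
    """Greedily pick segments by score/duration until `budget` seconds are used.

    Durations must be positive. Returns the chosen segment indices in
    temporal order, ready to be concatenated into a summary.
    """
    scores = score_segments(segments)
    order = sorted(
        range(len(segments)),
        key=lambda i: scores[i] / segments[i][0],
        reverse=True,
    )
    chosen, used = [], 0.0
    for i in order:
        duration = segments[i][0]
        if used + duration <= budget:
            chosen.append(i)
            used += duration
    return sorted(chosen)
```

For example, with segments `[(10, "goal goal team"), (5, "goal team"), (20, "weather")]` and a 15-second budget, the first two segments are selected and the third is left out.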
Extracting movie scenes based on multimodal information
This research addresses the problem of automatically extracting semantic video scenes from daily movies using multimodal information. A 3-stage scene detection scheme is proposed. In the first stage, we use pure visual information to extract a coarse-level scene structure based on generated shot sinks. In the second stage, the audio cue is integrated to further refine scene detection results by considering various kinds of audio scenarios. Finally, in the third stage, we allow users to directly interact with the system so as to fine-tune the detection results to their own satisfaction. The generated scene structure can provide a compact yet meaningful abstraction of the video data, which facilitates content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results.
Hierarchical video summarization for medical data
Xingquan Zhu, Jianping Fan, Ahmed K. Elmagarmid, et al.
To provide users with an overview of medical video content at various levels of abstraction which can be used for more efficient database browsing and access, a hierarchical video summarization strategy has been developed and is presented in this paper. To generate an overview, the key frames of a video are preprocessed to extract special frames (black frames, slides, clip art, sketch drawings) and special regions (faces, skin or blood-red areas). A shot grouping method is then applied to merge the spatially or temporally related shots into groups. The visual features and knowledge from the video shots are integrated to assign the groups into predefined semantic categories. Based on the video groups and their semantic categories, video summaries for different levels are constructed by group merging, hierarchical group clustering and semantic category selection. Based on this strategy, a user can select the layer of the summary to access. The higher the layer, the more concise the video summary; the lower the layer, the greater the detail contained in the summary.
Summarizing motion contents of the video clip using moving edge overlaid frame (MEOF)
How to quickly and effectively exchange video information with the user is a major task for a video search engine's user interface. In this paper, we propose to use a Moving Edge Overlaid Frame (MEOF) image to summarize both the local object motion and the global camera motion information of a video clip in a single image. MEOF supplements the motion information that is generally dropped by the key frame representation, and it enables faster perception for the user than viewing the actual video. The key technology of our MEOF generating algorithm is global motion estimation (GME). In order to extract a precise global motion model from general video, our GME module works in two stages: a match-based initial GME and a gradient-based GME refinement. The GME module also maintains a sprite image that is aligned with the new input frame in the background after the global motion compensation transform. The difference between the aligned sprite and the new frame is used to extract the masks that help to pick out the moving objects' edges. The sprite is updated with each input frame, and the moving edges are extracted at a constant interval. After all the frames are processed, the extracted moving edges are overlaid on the sprite according to their global motion displacement relative to the sprite and their temporal distance from the last frame, thus creating our MEOF image. Experiments show that the MEOF representation of the video clip helps the user acquire the motion knowledge much faster and is compact enough to serve the needs of online applications.
Special Session
Benchmarks for storage and retrieval in multimedia databases
David A. Forsyth
There is a substantial body of research on computer methods for managing collections of images and videos. There is little evidence that this research has had an important impact on any community yet. I use an invitation to speak on a topic on which I am not expert to air some opinions about evaluating image retrieval research. In my opinion, there is little to be gained in measuring current solutions with reference collections, because these solutions differ so widely from user needs that the exercise becomes empty. The user studies literature is not well enough read by the image retrieval community. As a result, we tend to study somewhat artificial problems. A study of the user needs literature suggests that we will need to solve deep problems to produce useful solutions to image retrieval problems, but that there may be a need for a number of technologies that can be built in practice. I believe we should concentrate on these issues, rather than on measuring the performance of current systems.
Storage, data management, and retrieval in bioinformatics
The evolution of biology into a large-scale quantitative molecular science has been paralleled by concomitant advances in computer storage systems, processing power, and data-analysis algorithms. The application of computer technologies to molecular biology data has given rise to a new system-based approach to biological research. Bioinformatics addresses problems related to the storage, retrieval and analysis of information about biological structure, sequence and function. Its goals include the development of integrated storage systems and analysis tools to interpret molecular biology data in a biologically meaningful manner in normal and disease processes and in efforts for drug discovery. This paper reviews recent developments in data management, storage, and retrieval that are central to the effective use of structural and functional genomics in fulfilling these goals.