Proceedings Volume 3229

Multimedia Storage and Archiving Systems II

C.-C. Jay Kuo, Shih-Fu Chang, Venkat N. Gudivada
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 6 October 1997
Contents: 9 Sessions, 37 Papers, 0 Presentations
Conference: Voice, Video, and Data Communications 1997
Volume Number: 3229

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Video Indexing
  • Video Segmentation and Retrieval
  • Media Server
  • Text and Audio Retrieval
  • Image Indexing and Retrieval I
  • Image Indexing and Retrieval II
  • Image Retrieval and Segmentation
  • Image/Video Compression
  • Poster Session
  • Text and Audio Retrieval
Video Indexing
Image and video indexing in the compressed domain
Image and video indexing techniques are crucial in multimedia applications. A number of indexing techniques which operate in the pixel domain have been reported in the literature. The advent of compression standards has led to the proliferation of indexing techniques in the compressed domain. In this paper, we present a critical review of the compressed domain indexing techniques proposed in the literature. These include transform domain techniques using the Fourier transform, cosine transform, Karhunen-Loeve transform, subbands and wavelets; and spatial domain techniques using vector quantization. Temporal indexing techniques using motion vectors are also discussed.
Improvement of shot detection methods based on dynamic threshold selection
Mohsen Ardebilian Fard, Xiaowei Tu, Liming Chen
Currently, most shot detection methods proposed in the literature are based on well-chosen static thresholds, on which the quality of the result largely depends. In this paper, we present a method for dynamic threshold selection based on clustering a set of N points on a comparison curve, which we use for characteristic feature comparison across images in a video sequence to detect shots. In this method we recursively choose N successive values from the curve. Then, by using the clustering method on them, we partition this set into two parts: larger values in E1, and smaller values in E2. We model the form of the curve as a bimodal one and try to find a threshold around a valley area. Using the above clustering analysis, we apply the color histogram (CH) and double Hough transformation (DHT) methods that we reported in our previous work to 90 minutes of video sequence. The experimental results show that dynamic-threshold-based methods improve on static-threshold-based ones, reducing false and missed detections, and that dynamic-threshold-based DHT is more robust than dynamic-threshold-based CH.
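A minimal sketch of the clustering step, assuming a simple 1-D two-means split of the N successive difference values into a high cluster E1 and a low cluster E2, with the threshold placed between the two cluster centers; the names and the two-means choice are illustrative assumptions, not the authors' exact procedure:

```python
# Illustrative dynamic threshold selection: split N successive values from
# the frame-comparison curve into larger (E1) and smaller (E2) clusters
# with 1-D two-means, then threshold near the valley between them.
def dynamic_threshold(values, iters=20):
    e1, e2 = max(values), min(values)              # initial cluster centers
    for _ in range(iters):
        E1 = [v for v in values if abs(v - e1) <= abs(v - e2)]  # larger values
        E2 = [v for v in values if abs(v - e1) > abs(v - e2)]   # smaller values
        if not E1 or not E2:
            break
        e1, e2 = sum(E1) / len(E1), sum(E2) / len(E2)
    return (e1 + e2) / 2.0                         # threshold in the valley

diffs = [0.10, 0.20, 0.15, 0.90, 0.12, 0.18, 0.85, 0.11]   # toy curve values
cuts = [i for i, d in enumerate(diffs) if d > dynamic_threshold(diffs)]
```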
Multiscale content extraction and representation for video indexing
Ahmet Mufit Ferman, A. Murat Tekalp
This paper presents a general multiscale framework for extraction and representation of video content. The approach exploits the inherent multiscale nature of many TV and film productions to delineate an input stream effectively and to construct consistent scenes reliably. The method first utilizes basic signal processing techniques and unsupervised clustering to determine shot boundaries in the video sequence. Similarity comparison using shot-representative histograms and clustering is then carried out within each shot to automatically select representative key frames. Finally, a model that takes into account the filmic structure of the input stream is discussed and developed to efficiently merge individual shots into coherent, meaningful segments, i.e., scenes.
Video data hiding for video-in-video and other applications
Mitchell D. Swanson, Bin Zhu, Ahmed H. Tewfik
We introduce a scheme for hiding supplementary data in digital video by directly modifying the pixels in the video frames. The technique requires no separate channel or bit interleaving to transmit the extra information. The data is invisibly embedded using a perception-based projection and quantization algorithm. The data hiding algorithm supports user-defined levels of accessibility and security. We provide several examples of video data hiding, including real-time video-in-video and audio-in-video. We also demonstrate the robustness of the data hiding procedure to video degradation and distortions, e.g., those that result from additive noise and compression.
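The abstract does not detail the perception-based projection and quantization algorithm itself; as a stand-in, the sketch below illustrates the general flavor of quantization-based embedding (quantization index modulation), which hides one bit by snapping a pixel projection onto one of two interleaved quantizer lattices. This is a generic technique, not the authors' method:

```python
# Generic quantization-index-modulation sketch: the hidden bit selects
# which of two offset lattices the value is quantized onto.
STEP = 8.0  # quantizer step: larger values are more robust but more visible

def embed_bit(x, bit):
    offset = 0.0 if bit == 0 else STEP / 2.0
    return round((x - offset) / STEP) * STEP + offset

def extract_bit(y):
    # Decide which lattice y lies closer to.
    return 0 if abs(y - embed_bit(y, 0)) <= abs(y - embed_bit(y, 1)) else 1

marked = embed_bit(123.0, 1)       # embed a 1 into a pixel projection
assert extract_bit(marked) == 1    # survives as long as noise < STEP/4
```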
Video Segmentation and Retrieval
Hierarchical temporal video segmentation and content characterization
This paper addresses the segmentation of a video sequence into shots, the specification of edit effects, and the subsequent characterization of shots in terms of color and motion content. The proposed scheme uses DC images extracted from MPEG compressed video and performs unsupervised clustering for the extraction of camera shots. The specification of edit effects, such as fade-in/out and dissolve, is based on the analysis of the distribution of the mean value of the luminance component. This step is followed by the representation of the visual content of temporal segments in terms of key frames selected by similarity analysis of mean color histograms. For characterization of similar temporal segments, motion and color characteristics are classified into different categories using a set of features derived from motion vectors of triangular meshes and mean histograms of video shots.
Video indexing based on image and sound
Pascal Faudemay, Claude Montacie, Marie-Jose Caraty
Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel and multi-modal query interface, to query sound, image and script through 'pull' and 'push' queries. We then summarize the segmentation phase, which needs information from the image channel. Detection of critical segments is proposed; it should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech. We present experimental results for these various techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was tested on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, 'Ivanhoe'. For good quality sound segments, error rates are low enough for use in indexing applications. Major issues are the processing of sound segments with noise or music, and performance improvement through the use of appropriate, low-cost architectures or networks of workstations.
Shot-level description and matching of video content
Remi Ronfard
This paper discusses representational issues for the description and matching of video content at the level of the individual shot, using a thesaurus of keywords representing people, places and actions. We argue that a correct representation of shots must include temporal segments with roles such as location, character, camera and action, which can be filled by keywords.
Video segmentation and camera motion characterization using compressed data
Ruggero Milanese, Frederic Deguillaume, Alain Jacot-Descombes
We address the problem of automatically extracting visual indexes from videos, in order to provide sophisticated access methods to the contents of a video server. We focus on two tasks, namely the decomposition of a video clip into uniform segments, and the characterization of each shot by camera motion parameters. For the first task we use a Bayesian classification approach to detecting scene cuts by analyzing motion vectors. For the second task a least-squares fitting procedure determines the pan/tilt/zoom camera parameters. In order to guarantee the highest processing speed, all techniques directly process and analyze MPEG-1 motion vectors, without any need for video decompression. Experimental results are reported for a database of news video clips.
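For the second task, a least-squares fit of a simple pan/tilt/zoom model to macroblock motion vectors might look as follows; the three-parameter model u = pan + zoom·(x − cx), v = tilt + zoom·(y − cy) and all names here are assumptions for illustration, not the paper's exact formulation:

```python
# Fit pan/tilt/zoom to MPEG-1 block motion vectors by least squares.
import numpy as np

def fit_pan_tilt_zoom(centers, vectors, cx, cy):
    A, b = [], []
    for (x, y), (u, v) in zip(centers, vectors):
        A.append([1, 0, x - cx]); b.append(u)   # u = pan  + zoom * (x - cx)
        A.append([0, 1, y - cy]); b.append(v)   # v = tilt + zoom * (y - cy)
    (pan, tilt, zoom), *_ = np.linalg.lstsq(
        np.array(A, float), np.array(b, float), rcond=None)
    return pan, tilt, zoom
```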
Media Server
Fast data placement scheme for video server with zoned-disks
Yu-Chung Wang, Shiao-Li Tsao, Ray-I Chang, et al.
Recently, the zoning technique has been applied to disks to increase disk capacities. As a side effect, data transfer rates from the outer zones of a hard disk are much higher than those from the inner zones. Unfortunately, previous studies on the placement of VBR video streams on a zoned disk have neglected either the VBR nature of video streams or the effects of disk zoning. Our objective is to minimize server buffer size and to maximize disk utilization subject to the capacity constraints of disk zones. To solve the problem, we adopt the concept of constant read time (CRT), in which a constant period of time is allocated to retrieve a variable-sized disk block; blocks retrieved from the same disk zone have the same size. This problem is then formulated as a constrained combinatorial optimization problem. In a previous paper, we presented an optimum algorithm to solve the data placement problem based on dynamic programming. In this paper, we present suboptimum heuristics to reduce time and space complexities. The algorithms are implemented in the C language and run under the Linux operating system on a Pentium Pro 200. Preliminary experimental results show that our solutions are very effective. For example, our approach guarantees 100 percent disk storage efficiency and bandwidth utilization, and its buffer size requirement is no more than 3 disk blocks for practical examples. We also ran our program on the MPEG-1 encoded movie 'Star Wars'; the optimized buffer size is slightly more than 2 disk blocks, e.g., 500 KBytes for 140-220 KBytes variable-sized disk blocks, with 70 percent utilization. Preliminary performance studies also show that the proposed CRT scheme is very promising in maximizing system throughput.
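The constant-read-time idea reduces to simple arithmetic: every retrieval gets the same time slice, so the block stored in a zone is sized to that zone's transfer rate, and outer zones hold larger blocks. The zone rates below are invented for illustration (they happen to reproduce the 140-220 KByte block range cited above):

```python
# Back-of-the-envelope CRT block sizing with made-up zone transfer rates.
T = 0.010                                        # fixed read time per block (s)
zone_rate = {0: 22.0e6, 1: 18.0e6, 2: 14.0e6}    # bytes/s, outer to inner zone

block_size = {z: rate * T for z, rate in zone_rate.items()}
for z, size in block_size.items():
    print(f"zone {z}: {size / 1e3:.0f} KB per {T * 1e3:.0f} ms slot")
# zone 0: 220 KB ... zone 2: 140 KB
```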
Dynamic load adjustment on distributed video server system
Shiao-Li Tsao, Meng-Chang Chen, Jan-Ming Ho, et al.
In this paper, a novel initial video allocation scheme and a load shifting algorithm are proposed to reduce the request fail rate for a distributed video server. The initial allocation scheme determines the maximum number of requests that can be served, and the proposed load shifting algorithm migrates in-progress requests among servers to accommodate more users and reduce the request fail rate under skewed request patterns. According to the simulation results, the proposed algorithms reduce the request fail rate by 50 percent compared with the SCAN allocation algorithm, and by 25 percent compared with the least-load-first initial allocation scheme with the load shifting procedure. In terms of shifting steps, the proposed algorithms require 30 to 50 percent fewer than the DASD dancing algorithm.
Intelligent user interface for intelligent multimedia repository
Phill-Kyu Rhee, Yong-Hwan Kim, B. S. Sim, et al.
Recently, much effort has been made to improve the efficiency of user interfaces, since the assumption of expert or well-trained users is no longer valid: today's computer users include ordinary people, and the amount of network-accessible information resources in various media forms increases rapidly every day. The primary goal of the intelligent multimedia repository (IMR) is to assist users in accessing multimedia information efficiently. The primary users of the IMR are assumed to be novices, even though the system can serve users at different levels of expertise. Since users are not well trained in using computer systems, the semantic gap between users and the system must be reduced mainly from the system side. The technology of intelligent user interfaces is adopted to minimize this semantic gap, and an intelligent user interface (IUI) has been designed and developed. Machine learning technologies have been employed to provide user adaptation and intelligent capabilities to the system. The IUI of the IMR consists of a user interface manager (UIM) and a user model (UM). The UIM performs the function of managing the intelligent user interface. The UM stores the behavioral knowledge of the user, including the history of query and response interactions, to absorb communication errors due to semantic gaps between the user and the IMR. The UM is implemented by decision-tree-based case-based reasoning and back-propagation neural networks. Experimental results show that the IUI can improve the performance of the IMR.
Object-relational database infrastructure for interactive multimedia service
Michael Junke Hu, Miao Chunyan
Modern interactive multimedia services, such as video-on-demand and electronic libraries, tend to involve large-scale media archives of audio records, video clips, image banks, and text documents. These services therefore impose many challenges on designing and implementing new-generation database systems. In this paper, we first introduce a new multimedia data model, which can accommodate sophisticated media types as well as complex relationships among different media entities. Thereafter, an object-relational database infrastructure is proposed to support applications of the data model developed in our project. The infrastructure is designated both as a framework for designing and implementing multimedia databases, and as a reference model to compare and evaluate different database systems. Features of the proposed infrastructure, as well as its implementation in a prototype multimedia database system, are also discussed in the paper.
Text and Audio Retrieval
Content-based retrieval of music and audio
Though many systems exist for content-based retrieval of images, little work has been done on the audio portion of the multimedia stream. This paper presents a system to retrieve audio documents by acoustic similarity. The similarity measure is based on statistics derived from a supervised vector quantizer, rather than matching simple pitch or spectral characteristics. The system is thus able to learn distinguishing audio features while ignoring unimportant variation. Both theoretical and experimental results are presented, including quantitative measures of retrieval performance. Retrieval was tested on a corpus of simple sounds as well as a corpus of musical excerpts. The system is purely data-driven and does not depend on particular audio characteristics. Given a suitable parameterization, this method may thus be applicable to image retrieval as well.
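A hedged sketch of retrieval by quantizer statistics: each audio document is reduced to a histogram of nearest-codeword counts over its feature frames, and documents are ranked by histogram similarity. The paper's quantizer is supervised (trained to discriminate); the plain nearest-neighbor quantization below is only a stand-in:

```python
import numpy as np

def codeword_histogram(frames, codebook):
    # frames: (n, d) feature vectors; codebook: (k, d) codeword centers.
    dist = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    counts = np.bincount(dist.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()                 # normalized codeword usage

def similarity(h1, h2):
    # Cosine similarity between two codeword histograms.
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12))
```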
Automated document content characterization for a multimedia document retrieval system
Maija Koivusaari, Jaakko J. Sauvola, Matti Pietikaeinen
We propose a new approach to automating document image layout extraction for object-oriented database feature population, using rapid low-level feature analysis, preclassification and predictive coding. The layout information, comprising region location and classification data, is transformed into 'feature objects'. The information is then fed into an intelligent document image retrieval (IDIR) system to be utilized in document retrieval schemes. The IDIR system consists of a user interface, an object-oriented database and a variety of document image analysis algorithms. In this paper the object-oriented storage model and the database system are presented in formal and functional domains. Moreover, the graphical user interface and a visual document image browser are described. The document analysis techniques used for document characterization are also presented. In this context the documents consist of text, picture and other media data. Documents are stored in the database as document, page and region objects. Our test system has been implemented and tested using a document database of 10,000 documents.
Image Indexing and Retrieval I
Peano key rediscovery for content-based retrieval of images
Currently, most content-based retrieval methods for images are based on global features like histograms. Few methods have considered spatial information for indexing and query purposes. In this paper we present an efficient multi-dimensional spatial indexing method based on the Peano key ordering of the spatial locality of regions. The Peano order gives a direct mapping between an integer and its corresponding element in the multi-dimensional space. The position in the ordering of each region in an image can be simply determined by interleaving the bits of the x and y coordinates of the region. In our method, global features of the query image, like color histograms, are first used to eliminate images in the database which are not similar. Then the query is decomposed into a quadtree in order to extract characteristics, for instance predominant colors, associated with each square. This spatial information is identified by a list of Peano keys, which constitutes a spatial signature of the query image. The spatial signature is then searched for in the candidate images. For a given candidate image, each Peano key of the signature precisely indicates the spatial region whose characteristics are compared to the ones associated with the Peano key. The main advantages of our method are twofold: first, its generality, since it allows spatial information to be associated with any kind of image characteristic; second, its efficiency, because there is no need to pre-extract characteristics from images in the database.
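The bit-interleaving step is simple to implement; a minimal sketch of the Peano key (the usual Morton/Z-order construction) for a region at coordinates (x, y):

```python
def peano_key(x, y, bits=16):
    # Interleave the bits of x and y: x bits occupy even positions,
    # y bits odd positions, so spatially close regions get close keys.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

assert peano_key(0b11, 0b00) == 0b0101   # x bits land in even positions
assert peano_key(0b00, 0b11) == 0b1010   # y bits land in odd positions
```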
Pruned octree feature for interactive retrieval
Xia Wan, C.-C. Jay Kuo
A new color feature obtained from the pruned octree color representation of an image for interactive image retrieval is proposed in this work. The pruned octree is constructed based on a natural color clustering of a given image rather than a fixed bucket color quantization of the entire database. In this new framework, we integrate commonly used features such as color width, color depth, average color and multi-resolution color distributions. As a result, it supports flexible filtering, in which the similarity matching procedure can be gradually refined. Examples are given to illustrate the query image analysis and sequential filtering strategy based on the octree color feature.
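For reference, the standard octree color indexing underlying such representations descends the tree by forming a 3-bit child index from one bit of each of R, G, and B at every level; the image-adaptive pruning criterion itself is the paper's contribution and is not shown:

```python
def octree_path(r, g, b, depth=8):
    # Root-to-leaf child indices for an 8-bit-per-channel color.
    path = []
    for d in range(depth):
        shift = 7 - d                        # most significant bits first
        child = (((r >> shift) & 1) << 2 |
                 ((g >> shift) & 1) << 1 |
                 ((b >> shift) & 1))
        path.append(child)
    return path

print(octree_path(255, 0, 128, depth=3))     # -> [5, 4, 4]
```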
Content-based image indexing and retrieval system in ImageRoadMap
Forouzan Golshani, Youngchoon Park
ImageRoadMap is a new content-based retrieval system for retrieval of images by visual information. The system provides full capabilities for indexing and retrieval of images, their visual features and many other diverse data types. We introduce a combination of effective indexing methods based on a novel spatial color distribution model. By utilizing the Self-Organizing Feature Map and other indexing methods, spatial color distribution, dominant color set, number of objects and other visual features may be computed. The system also provides capabilities for similarity measurement and similarity-based indexing. ImageRoadMap includes a powerful multi-faceted querying mechanism, which allows queries to be formulated and presented in several different ways. Depending on the characteristics and the nature of the query, the user may choose Query by Example, Query by Spatial Color Distribution, Query by Color Contents, Query by Sketch, Query by Concept, or a combination of any of the above. The current interface supports iterative multi-modal query formulation in which the user presents whatever relevant information is available through appropriate windows.
Wavelet-compressed image retrieval using successive approximation quantization (SAQ) features
Kai-Chieh Liang, C.-C. Jay Kuo
An integrated wavelet-based image storage and indexing system is proposed in this research. The indexing features under consideration include the color, brightness, texture, and spatial information of images. These features can be naturally extracted during the successive approximation quantization stage of the wavelet compression process. Extensive experiments are performed to demonstrate the retrieval performance of the new approach.
Image Indexing and Retrieval II
Feature-based approach for image retrieval by sketch
Yin Chans, Zhibin Lei, Daniel P. Lopresti, et al.
In this paper, we introduce a feature-based approach for image retrieval by sketch. Using edge segments as features which are modeled by implicit polynomials (IPs), we hope to provide a similarity computation method that is robust towards user query sketch distortions. We report some preliminary results of the first phase of our work in this paper. From these results, we could see that the feature-based method, which currently uses edge features modeled by first degree IPs, generally performs better than a pixel-based method that has been adopted by a number of well-known content-based image indexing and retrieval systems such as the IBM QBIC project. The feature-based method appears more robust and tolerant towards distortions in query sketches. We attribute this quality to the fact that it uses more structural information to compute the similarities between images. We will also describe a prototype built upon the Java technology that allows queries over the WWW. Finally we will discuss the promise and limitations of the feature-based method and conclude the paper with a look at areas for future work.
Object signature curve and invariant shape patches for geometric indexing into pictorial databases
Zhibin Lei, Tolga Tasdizen, David B. Cooper
Implicit polynomials (IPs) are among the most effective representations for modeling and recognition of complex geometric shape structures because of their stability, robustness and invariant characteristics. In this paper, we describe an approach for geometric indexing into pictorial databases using IP representations. We discuss in detail a breakthrough in the invariant decomposition of a complex object shape into manageable pieces or patches. The self and mutual invariants of those invariant patches can then be used as geometric indexing feature vectors. The new concept of an invariant signature curve for complex shapes is developed, which captures the semi-global algebraic structure of the object and has the advantage of being able to deal with multiple scales and object occlusion.
MALCBR: content-based retrieval of image databases at multiple abstraction levels
Vittorio Castelli, Chung-Sheng Li, Lawrence D. Bergman
Content-based search of large image databases has received significant attention recently. In this paper, we propose a new framework, multiple abstraction level content-based retrieval (MALCBR), for specifying and processing content-based retrieval queries on databases of images, time series, or video data. This framework allows search targets to be expressed in an object-based fashion that permits the extensible specification of arbitrarily complex queries. In our approach, the search targets are either simple objects, specified at multiple levels of abstraction, or composite objects, defined as collections of relations on the elements of a set of simple objects. During the search, simple objects at the semantic level are retrieved from database tables, feature-level objects are computed using pre-extracted, appropriately indexed features, and pixel-level objects are extracted from the raw data. Composite objects are computed at query execution time. This framework provides a powerful mechanism for specifying complicated search targets and enables efficient processing and filtering of the search results.
Inverted image indexing and compression
Inverted file indexing and its compression have proved to be highly successful for free-text retrieval. Although the 'inverted' nature of the data structure provides an efficient mechanism for searching key words or terms in large documents, for image retrieval, applying inverted files to the title, caption, or description of the images is not sufficient: one must be able to index and retrieve images based on their visual contents. Many content-based image retrieval techniques treat the image as a whole picture. Analogous to free-text retrieval, a novel technique, called inverted image indexing and compression, is proposed in this paper. Similar to words in a document, each image can have multiple areas which are perceived to be meaningful visual contents. These areas are selected by users and then undergo two processes: automatic signature generation based on wavelet signatures, and user specification of high-level contents using a ternary fact model. The contents in compressed form are inserted into an inverted image file. The concept of a composite bitplane signature is also introduced.
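A minimal sketch of the inverted image file idea, with hypothetical wavelet-signature hashes and user-assigned labels standing in as the indexable "terms":

```python
from collections import defaultdict

index = defaultdict(set)                     # term -> set of image ids

def add_image(image_id, signature_terms, labels):
    for term in list(signature_terms) + list(labels):
        index[term].add(image_id)

def query(terms):
    # Intersect posting sets, exactly as in inverted text retrieval.
    postings = [index[t] for t in terms if t in index]
    return set.intersection(*postings) if postings else set()

add_image("img001", ["sig:3fa2", "sig:91bc"], ["sunset", "beach"])
add_image("img002", ["sig:3fa2"], ["beach"])
print(query(["sig:3fa2", "beach"]))          # -> {'img001', 'img002'}
```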
ASIMM: a framework for automatic synthesis of query interfaces for multimedia databases
Lawrence D. Bergman, Jerald Schoudt, Vittorio Castelli, et al.
With the advent of access to digital libraries via the Internet and the addition of non-traditional data, such as imagery, the need for flexible, natural language query environments has become more urgent. This paper describes a new query interface based on the combination of natural language and visual programming techniques. The interface, entitled Drag and Drop English, or DanDE, has two components. The first component is an easy-to-use flexible interface that has the feel of a natural language interface, but has more structure and gives a user more guidance in constructing a query without sacrificing flexibility. The second component is a definition facility that allows the interface designer to specify the structure of a query language. The definition facility allows the designer to specify the syntactic structure of the language in a variation of Backus-Naur Form. The definition facility also provides the ability to specify some of the semantics of the query domain. Lastly, the definition facility allows the designer to specify the interactions between the interface and the query system.
Image Retrieval and Segmentation
Fixed attribute-length linear quadtree representations for storing similar images
Tsong-Wuu Lin
In many applications, such as computer graphics, image processing, and geographic information systems, the pictures to be processed are consecutive images. These consecutive images form a gradually changing data set, and a large amount of memory space is required to store them if the similarity relation among them is ignored. Based on linear quadtree structures and overlapping concepts, a fixed attribute-length quadtree representation is proposed for storing a sequence of similar binary images. Unfortunately, it is inefficient when the overlapping percentages of the images are high; we therefore modify the representation to eliminate this drawback. The modified structures work well, and their space reductions are greater than the overlapping percentage of the consecutive images. Experiments are made to compare our representation with other overlapping structures; from these experiments, our representation is shown to be better than the others.
Shock-based approach for indexing of image databases using shape
Benjamin B. Kimia, Jackie Chan, Dale Bertrand, et al.
The projected shape of objects is a significant cue for fast and robust selection of images in response to visual content queries. Yet existing methods for indexing into digital image libraries do not take full advantage of shape, primarily because this ultimately involves fundamentally difficult vision problems such as segmentation and recognition. These problems are difficult because realistic images depict objects in occlusion relationships, boundaries of objects are not clearly defined and often become ambiguous among spurious boundaries, boundaries change relative orientation in articulated objects, etc. We present (i) a symmetry-based and contour-based representation of shape, (ii) a propagation-based method to extract partial symmetries from real images, (iii) transforms to augment this representation with that of its 'neighboring' shapes, (iv) the generation of user-sketched queries using intuitive shape deformations, and (v) multistage inexact graph matching to extract and rank images containing objects similar to the user-defined query.
Outstanding-objects-oriented color image segmentation using fuzzy logic
Rina Hayasaka, Jiying Zhao, Yutaka Matsushita
This paper presents a novel fuzzy-logic-based color image segmentation scheme focusing on objects that stand out to human eyes. The scheme first segments the image into rough fuzzy regions, chooses visually significant regions, and conducts fine segmentation on the chosen regions. It not only reduces the computational load, but also makes contour detection easy, because the rough object outlines have been determined in advance. The scheme reflects human perception, and it can be used efficiently in automatic extraction of image retrieval keys, robot vision and region-adaptive image compression.
Image/Video Compression
Novel wavelet coder for color image compression
Houng-Jyh Mike Wang, C.-C. Jay Kuo
A new still image compression algorithm based on the multi-threshold wavelet coding (MTWC) technique is proposed in this work. It is an embedded wavelet coder in the sense that its compression ratio can be controlled depending on the bandwidth requirement of image transmission. At low bit rates, MTWC avoids the blocking artifacts of JPEG, resulting in better reconstructed image quality. A subband decision scheme is developed based on rate-distortion theory to enhance the image fidelity. Moreover, a new quantization sequence order is introduced based on our analysis of error energy reduction in significance and refinement maps. Experimental results are given to demonstrate the superior performance of the proposed algorithm in its high reconstruction quality for color and gray-level image compression and its low computational complexity. Generally speaking, it gives a better rate-distortion tradeoff and performs faster than most existing state-of-the-art wavelet coders.
Fast software-only H.263 video codec
H.263 is one of the most efficient video compression standards available today, and it is expected that H.263 will replace H.261 in many applications. This paper analyzes the computational complexity of H.263 and presents a set of methods for maximizing the performance of this codec on Digital's 64-bit Alpha processor. The optimization problem is approached from two directions: algorithmic enhancement and efficient software implementation. Performance comparisons are evaluated for the default mode and the full-option mode. Finally, this paper provides a brief description of the multimedia instructions supported by a new generation of the Alpha CPU, which is designed to provide the most cost-effective solution for software-only video compression and other multimedia applications.
Predictive modeling: least squares method for compression of time-series data
Saraswathi Mukherjee, Justin Zobel
Time-series data form a major class of numerical data stored in statistical databases. In an earlier paper, we instantiated a framework to automate the process of compression by designing comparative predictive models for time-dependent data sources. In this paper, we add one more model for compression of time-series data to this framework. This model uses the method of least squares, and its parameters are optimized by an offline process; it allows the data to be efficiently encoded using a combination of Golomb and gamma coding techniques. We achieve enhanced compression performance compared to our previous models, and the model performs better than existing compression techniques as well. We apply the model to real-world data sources such as astrophysical, geographical and business data.
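A sketch of the described pipeline under simplifying assumptions: a linear predictor is fitted offline by least squares, and the integer residuals are entropy-coded. Plain Rice/Golomb coding with a power-of-two parameter is shown; the paper combines Golomb and gamma codes, and the model order chosen here is arbitrary:

```python
import numpy as np

def fit_predictor(x, order=2):
    # Least-squares linear predictor: x[i] ~ w . x[i-order:i]
    A = np.array([x[i - order:i] for i in range(order, len(x))], float)
    b = np.array(x[order:], float)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def golomb(n, m):
    # Rice code (Golomb with m a power of two): unary quotient, binary rest.
    q, r = divmod(n, m)
    return "1" * q + "0" + format(r, "0{}b".format(m.bit_length() - 1))

x = [10, 12, 15, 19, 24, 30, 37]
w = fit_predictor(x)
residuals = [int(round(x[i] - w @ x[i - 2:i])) for i in range(2, len(x))]
bits = "".join(golomb(2 * abs(r) + (r < 0), 4) for r in residuals)  # signed -> unsigned
```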
Effective wavelet-based compression method with adaptive quantization threshold and zerotree coding
Artur Przelaskowski, Marian Kazubek, Tomasz Jamrogiewicz
An efficient image compression technique, intended especially for medical applications, is presented. Dyadic wavelet decomposition using the Antonini and Villasenor filter banks is followed by adaptive space-frequency quantization and zerotree-based entropy coding of the wavelet coefficients. Threshold selection and uniform quantization are based on a spatial variance estimate built on the lowest-frequency subband data set. The threshold value for each coefficient is evaluated as a linear function of a 9th-order binary context. After quantization, zerotree construction, pruning and arithmetic coding are applied for efficient lossless data coding. The presented compression method is less complex than the most effective EZW-based techniques but achieves comparable compression efficiency. Specifically, our method has efficiency similar to SPIHT for MR image compression, slightly better for CT images, and significantly better for US image compression. Thus the compression efficiency of the presented method is competitive with the best algorithms published in the literature across diverse classes of medical images.
Standard (H.324M) compatible video decoding for mobile multimedia systems over error-prone channel
Dong-Seek Park, Jong Dae Kim, Yoon-Soo Kim
Motivated by the error-robust mobile multiplexer known as H.223/Annex A, we investigate restricting H.263 functionality in H.324 mobile terminals operating over error-prone channels. Although H.223/A employs strong error protection schemes such as rate-compatible punctured convolutional codes and automatic repeat request, it cannot detect errors in the encoded video bit stream. In order to further increase the performance of the overall H.324/M system, it is desirable to restrict the functionality of H.263 so that the H.263 decoder can easily locate errors in the incoming H.263 stream, which H.223/A alone cannot make possible. We suggest one possible way to achieve H.263-compatible, reduced functionality incorporated with a new mobile multiplexer for wireless links, aiming at enhanced visual quality at the receiver.
Poster Session
Keyword spotting for multimedia document indexing
Philippe Gelin, Christian J. Wellekens
We tackle the problem of multimedia indexing using keyword spotting on the spoken part of the data. Word spotting systems for indexing have to meet very hard specifications: short response times to queries, speaker-independent mode, and an open vocabulary in order to be able to track any keyword. To meet these constraints, keyword models should be built according to their phonetic spelling, and the process should be divided into two parts: preprocessing of the speech signal and querying over a lattice of hypotheses. Different classification criteria have been studied for hypothesis generation: frame labeling, maximum likelihood and maximum a posteriori (MAP). The hypothesis probability is computed either through a standard Gaussian model or through a hybrid Hidden Markov Model-Neural Network. The training of the phonemic models is based either on Viterbi alignment or on recursive estimation and maximization of a posteriori probabilities; in the latter, discriminant properties between phonemes are enforced. Tests have been conducted on the TIMIT database as well as on TV news soundtracks. Interesting results have been obtained in time saving for the documentalist. The ultimate goal is to couple the soundtrack indexing with tools for video indexing in order to enhance the robustness of the system.
Multiresolution image retrieval using B-splines
Mitchell D. Swanson, Ahmed H. Tewfik
We propose a technique to search through large image collections. Each database image is stored as a combination of potential query terms and non-query terms. The query terms are represented by affine-invariant B-spline moments and wavelet transform subbands. The dual representation supports a two-stage image retrieval system. A user-posed query is first mapped to a dictionary of prototype object contours represented by B-spline moments. The B-spline mapping reduces the query search space to a subset of the original database. Furthermore, it provides an estimate of the affine transformation between the query and the prototypes. The second stage consists of a set of embedded VQ dictionaries of multiresolution subbands of image objects. The estimated affine transformation is employed as a correction factor for the multiresolution VQ mapping. A simple bit string matching algorithm compares the resulting query VQ codewords with the codewords of the database images for retrieval.
Image compression based on motion segmentation
Humans subjectively evaluate the content of a scene. For content-based indexing and retrieval, we index and retrieve scenes containing moving objects, because they remain in our memory longer than static scenes. The importance of processing moving objects has been demonstrated in image compression, content-based data processing, and a variety of video processing techniques. This paper proposes a method for segmenting and then compressing images that include moving objects. An image scene is usually composed of a motion region (MR) and a static region (SR). For simplicity, the camera motion region is assigned to the SR, because that region has characteristics similar to the SR. The MR is extracted by our segmentation technique: a line-scan-based segmentation method composed of motion estimation and label assignment. The dominant region, owning the largest number of blocks with the same label, is classified as the SR; the region excluding the SR is the MR. The MR is then processed by lossless compression, or lossy compression with a low compression ratio, to preserve high quality, and the SR by a lossy method with a high compression ratio. Rather than applying separate methods to the MR and SR, we use a hybrid compression method based on the DCT. Experiments on test video clips show an increase in compression ratio with respect to lossless compression and better visualization of the moving objects compared with lossy compression.
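An illustrative (not the authors') version of the labeling step: label each block by its inter-frame motion activity and classify the dominant label as the SR, the rest as the MR:

```python
import numpy as np

def motion_region_mask(prev, curr, block=16, thresh=8.0):
    # prev, curr: grayscale frames as 2-D arrays of equal shape.
    h, w = curr.shape
    labels = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            sl = np.s_[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            diff = np.abs(curr[sl].astype(float) - prev[sl].astype(float))
            labels[by, bx] = 1 if diff.mean() > thresh else 0
    sr_label = np.bincount(labels.ravel()).argmax()   # dominant label -> SR
    return labels != sr_label                         # True where MR
```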
Texture features for image classification and retrieval
Mohamed Borchani, Georges Stamon
In this paper, we present an approach to texture-based image retrieval using image similarity on the basis of the matching of selected texture features. Image texture features are generated via gray level co-occurrence matrix, run-length matrix, and image histogram. Since they are computed over gray levels, color images of the database are first converted to 256 gray levels. For each image of the database, a set of texture features is extracted. They are derived from a modified form of the gray level co-occurrence matrix over several angles and distances, from a modified form of the run-length matrix over several angles, and from the image histogram. A sequential forward search is performed on all these features to reduce the dimensionality of the feature space. A supervised classifier is then applied to this reduced feature space in order to classify images into well separated classes. For measuring the similarity between two images a distance between two texture feature vectors is calculated. First experiments with multiple queries in a large image database give good results in terms of both speed and classification rate.
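For reference, the classic (unmodified) gray-level co-occurrence matrix for a single angle/distance pair, with two standard Haralick features derived from it:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=256):
    # Co-occurrence counts for a non-negative displacement (dx, dy).
    p = np.zeros((levels, levels), float)
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            p[img[y, x], img[y + dy, x + dx]] += 1
    return p / max(p.sum(), 1)                   # normalize to probabilities

def haralick_contrast_energy(p):
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum(), (p ** 2).sum()
```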
Image morphing with snake model and thin-plate spline interpolation
Aboul Ella Hassanien, Masayuki Nakajima
The aim of this study is to propose a new solution to the following image morphing problems: feature specification, image warping, and cross-dissolve between two deformed images. First, we adopt a semi-automatic algorithm based on the Ziplock snake to specify the feature correspondence between two given images. It allows a user to extract a contour that defines facial features such as lips, mouth, profile, etc., by specifying only the endpoints of the contour around the feature, which serve as the extremities of the contour. We then use these two points as anchor points and automatically compute the image information around them to provide boundary conditions. Next, we optimize the contour by taking this information into account, at first only close to its extremities. During the iterative optimization process, the image forces move progressively from the contour's extremities towards its center to define features. This helps the user easily define the exact position of features, and it may also reduce the time taken to establish feature correspondence between two images. For the second image morphing problem, this paper presents a warping algorithm using the thin-plate spline, a well-known scattered data interpolation method which has several advantages: it is efficient in time complexity, and it produces smoothly interpolated morphed images with only a remarkably small number of specified feature points. It allows each feature point to be mapped to the corresponding feature point in the warped image. Once the images are warped to align the positions of the features and their shapes, the in-between animation from the two given images can be defined by cross-dissolving the positions of corresponding features and their shapes and colors. We describe an efficient cross-dissolve algorithm to generate the in-between images.
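For completeness, the thin-plate spline interpolant referred to above has the standard textbook form (not specific to this paper):

```latex
f(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\big(\lVert P_i - (x, y) \rVert\big),
\qquad U(r) = r^2 \log r^2 ,
```

where the weights w_i and affine coefficients a_j are obtained by solving the linear system given by the interpolation conditions f(P_i) = v_i together with the side conditions sum_i w_i = sum_i w_i x_i = sum_i w_i y_i = 0.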
Text and Audio Retrieval
Architecture for biomedical multimedia information delivery on the World Wide Web
L. Rodney Long, Gin-Hua Goh, Leif Neve, et al.
Research engineers at the National Library of Medicine are building a prototype system for the delivery of multimedia biomedical information on the World Wide Web. This paper discuses the architecture and design considerations for the system, which will be used initially to make images and text from the third National Health and Nutrition Examination Survey (NHANES) publicly available. We categorized our analysis as follows: (1) fundamental software tools: we analyzed trade-offs among use of conventional HTML/CGI, X Window Broadway, and Java; (2) image delivery: we examined the use of unconventional TCP transmission methods; (3) database manager and database design: we discuss the capabilities and planned use of the Informix object-relational database manager and the planned schema for the HNANES database; (4) storage requirements for our Sun server; (5) user interface considerations; (6) the compatibility of the system with other standard research and analysis tools; (7) image display: we discuss considerations for consistent image display for end users. Finally, we discuss the scalability of the system in terms of incorporating larger or more databases of similar data, and the extendibility of the system for supporting content-based retrieval of biomedical images. The system prototype is called the Web-based Medical Information Retrieval System. An early version was built as a Java applet and tested on Unix, PC, and Macintosh platforms. This prototype used the MiniSQL database manager to do text queries on a small database of records of participants in the second NHANES survey. The full records and associated x-ray images were retrievable and displayable on a standard Web browser. A second version has now been built, also a Java applet, using the MySQL database manager.