Proceedings Volume 3312

Storage and Retrieval for Image and Video Databases VI

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 23 December 1997
Contents: 10 Sessions, 40 Papers, 0 Presentations
Conference: Photonics West '98 Electronic Imaging 1998
Volume Number: 3312

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Image Retrieval I
  • Video Representation
  • Image Retrieval II
  • Image Retrieval III
  • Video Segmentation
  • Poster Session
  • Intelligent Tools
  • Similarity Search
  • Image and Video Authentication
  • Video Storage and Delivery
Image Retrieval I
Sequential processing for content-based retrieval of composite objects
Chung-Sheng Li, John R. Smith, Lawrence D. Bergman, et al.
It is becoming increasingly important for multimedia databases to provide capabilities for content-based retrieval of composite objects. Composite objects consist of several simple objects that have feature, spatial, temporal, and semantic attributes, as well as spatial and temporal relationships between them. A content-based composite object query is satisfied by evaluating a program of content-based rules (e.g., color, texture), spatial and temporal rules (e.g., east, west), fuzzy conjunctions (e.g., appears similar AND is spatially near), and database lookups (e.g., semantics). We propose a new sequential processing method for efficiently computing content-based queries of composite objects. The proposed method evaluates the composite object queries by (1) defining an efficient ordering of the sub-goals of the query, which involve spatial, temporal, content-based and fuzzy rules, (2) developing a query block management strategy for generating, evaluating, and caching intermediate sub-goal results, and (3) conducting a best-first dynamic programming-based search with intelligent back-tracking. The method is guaranteed to find the optimal answer to the query and reduces the query time by avoiding the exploration of unlikely candidates.
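The pruning idea in this abstract can be illustrated with a small sketch. It is not the authors' dynamic programming method: it assumes that fuzzy AND is modeled as the minimum of sub-goal scores in [0, 1], so that a partial score is an upper bound on the final score, and it uses a plain best-first search over a priority queue. The names `best_match`, `color`, and `near` are hypothetical.

```python
import heapq

def best_match(candidates, subgoals):
    """Best-first evaluation of a fuzzy conjunction of sub-goals.

    Assumption: fuzzy AND is min(), so a candidate's partial score can only
    decrease as more sub-goals are evaluated. The first fully evaluated
    candidate popped from the heap is therefore optimal.
    Returns (candidate, score, number_of_subgoal_evaluations).
    """
    # Heap entries: (-partial_score, candidate_index, subgoals_evaluated)
    heap = [(-1.0, i, 0) for i in range(len(candidates))]
    heapq.heapify(heap)
    evaluations = 0
    while heap:
        neg_score, idx, done = heapq.heappop(heap)
        if done == len(subgoals):
            return candidates[idx], -neg_score, evaluations
        # Evaluate only the next sub-goal of the currently most promising candidate.
        score = min(-neg_score, subgoals[done](candidates[idx]))
        evaluations += 1
        heapq.heappush(heap, (-score, idx, done + 1))
    return None, 0.0, evaluations

# Hypothetical precomputed similarity scores for three candidate objects.
candidates = [
    {"name": "a", "color": 0.9, "near": 0.8},
    {"name": "b", "color": 0.4, "near": 0.95},
    {"name": "c", "color": 0.7, "near": 0.2},
]
subgoals = [lambda c: c["color"], lambda c: c["near"]]
winner, score, evaluations = best_match(candidates, subgoals)
```

In this example the second sub-goal is never evaluated for candidates "b" and "c", because their partial scores already fall below the winner's final score.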
Segmentation-based image retrieval
Andreas Siebert
We have been developing an image retrieval system, called MIPS (multiscalar image processing and retrieval system), for use in uncontrolled environments. On insertion into the image database, the images are automatically segmented into homogeneous regions. Generic features are computed and stored for each segment. Specifically, we maintain not only geometric and photometric attributes but also simple spatial information for each extracted region. This approach asks the user to construct queries in terms of the given primitives, i.e., regions and their spatial relations. Preliminary results show that the success of the system depends on how well the images can be modeled by homogeneous regions, on how useful the generic features are for the given application, and on the knowledge that the user puts into the formulation of the queries. A fully automatic segmentation algorithm is of paramount importance. We have designed an algorithm called perceptual region growing that combines region growing, edge detection, and perceptual organization principles, without resorting to any kind of high level knowledge or interactive user intervention. Decision thresholds and quality measures are directly derived from the image data, based on image statistics. Search through critical parameter spaces is the key idea to cope with noise in uncontrolled environments. The dynamics of the region growing process are constantly monitored and exploited.
Relevance feedback techniques in interactive content-based image retrieval
Yong Rui, Thomas S. Huang, Sharad Mehrotra
Content-based image retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Specifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: (1) the gap between high level concepts and low level features; (2) subjectivity of human perception of visual content. This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user's high level query and perception subjectivity are captured by dynamically updated weights based on the user's relevance feedback. The experimental results show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely.
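The abstract does not state how the weights are updated. As a minimal sketch of the general idea, the example below assumes one common heuristic: a feature that varies little across the images the user marked as relevant receives a larger weight. The function names and the sample data are hypothetical.

```python
from statistics import pstdev

def update_weights(relevant_vectors, eps=1e-6):
    """Weight each feature by the inverse of its spread over the relevant images.

    Assumption: a feature that is consistent across relevant images matters
    more to this user. Weights are normalized to sum to 1.
    """
    dims = len(relevant_vectors[0])
    raw = []
    for i in range(dims):
        spread = pstdev([v[i] for v in relevant_vectors])
        raw.append(1.0 / (spread + eps))
    total = sum(raw)
    return [r / total for r in raw]

def weighted_distance(query, item, weights):
    """Weighted L1 distance between two feature vectors."""
    return sum(w * abs(q - x) for w, q, x in zip(weights, query, item))

# Images the user marked as relevant: feature 0 is consistent, feature 1 is not.
relevant = [[0.20, 0.1], [0.22, 0.9], [0.18, 0.5]]
weights = update_weights(relevant)

query = [0.2, 0.5]
item_a = [0.21, 0.95]   # close in feature 0, far in feature 1
item_b = [0.60, 0.50]   # far in feature 0, identical in feature 1
```

With equal weights `item_b` ranks ahead of `item_a`; after the update the order is reversed, because the feedback indicates that feature 0 is the one the user cares about.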
Implementation and performance evaluation of a progressive image retrieval system
Kai-Chieh Liang, C.-C. Jay Kuo
An integrated wavelet-based image storage and indexing system is proposed in this research. The indexing features under consideration include the texture, color, and spatial information of images. These features can be naturally extracted during the successive approximation quantization (SAQ) stage of the wavelet compression process. Extensive experiments are performed to demonstrate the retrieval performance of the new approach.
Video Representation
Automatic dominant camera motion annotation for video retrieval
Wei Xiong, John Chung-Mong Lee
An efficient method is derived to classify the dominant camera motions in video shots. Various 3-D camera motions including camera pan, tilt, zoom, Z-rotation, and translations are detected. The method analyzes the optical flow in a decomposed manner. Images are divided into sub-regions according to our camera model. The projected x and y components of optical flow are analyzed separately in the different sub-regions of the images. Different camera motions are recognized by comparing the computed result with the prior known patterns. The optical flow is computed by using the Lucas-Kanade method, which is quite efficient due to its non-iterative computation. Our method is efficient and effective because only some mean values and standard deviations are used. The analysis and detailed description of our method are given in this paper. Experimental results are presented to show the effectiveness of our method.
Classification, simplification, and dynamic visualization of scene transition graphs for video browsing
Boon-Lock Yeo, Minerva M. Yeung
The scene transition graph (STG) is a directed graph structure that compactly captures both image content and temporal flow of video. An STG offers a condensed view of the story content, serves as the summary of the clip represented, and allows nonlinear access to its story element. It can serve as a valuable tool for both the analysis of video structure and presentation of high level visual summary for video browsing applications. In this paper, we study new techniques for classification and simplification of the STG, and present better means of visualizing the graph through dynamic visual display and simplified structures. In other words, our techniques improve significantly the existing graph structure to enable more succinct presentation of the graphs which leads to more efficient utilization of the screen spaces. In addition, a technique that captures and presents visually the temporal dynamics of the video sequence is described. We have tested the graph visualization techniques on various programming types and the new tools are found to effectively handle video from a wider variety than the existing STG structure.
Editing cues for content-based analysis and summarization of motion pictures
Ahmet Mufit Ferman, A. Murat Tekalp
This paper introduces techniques that exploit common film editing practices to perform content-based analysis and summarization of video programs. By observing certain editing conventions we determine the intended associations between shots that constitute a coherent sequence, and utilize this information to generate meaningful semantic decompositions of streams. Dynamic composition, shot pacing, motion continuity, and shot transitions are among the editing tools that we consider for high-level analysis. We also develop techniques for detecting establishing shots in a video program, and demonstrate how they can be used for efficient query processing and summary generation. The proposed framework facilitates such queries as finding shots that occur at the same location or within the same time frame; it also provides a powerful tool for semi-automated EDL and script generation.
Developing high-level representations of video clips using VideoTrails
Vikrant Kobla, David Scott Doermann, Christos Faloutsos
A high-level representation of a video clip comprising information about its physical and semantic structure is necessary for providing appropriate processing, indexing and retrieval capabilities for video databases. We describe a novel technique which reduces a sequence of MPEG encoded video frames to a trail of points in a low dimensional space. In our earlier work, we presented techniques applicable in 3-D, but in this paper, we describe techniques that can be extended to higher dimensions where improved performance is expected. In the low-dimensional space, we can cluster frames, analyze transitions between clusters and compute properties of the resulting trail. Portions of the trail can be classified as either stationary or transitional, leading to high-level descriptions of the video. Tracking the interaction of clusters over time, we lay the groundwork for the complete analysis and representation of the video's physical and semantic structure.
Spatially reduced image extraction from MPEG-2 video: fast algorithms and applications
Junehwa Song, Boon-Lock Yeo
The MPEG-2 video standards are targeted for high-quality video broadcast and distribution, and are optimized for efficient storage and transmission. However, it is difficult to process MPEG-2 for video browsing and database applications without first decompressing the video. Yeo and Liu have proposed fast algorithms for the direct extraction of spatially reduced images from MPEG-1 video. Reduced images have been demonstrated to be effective for shot detection, shot browsing and editing, and temporal processing of video for video presentation and content annotation. In this paper, we develop new tools to handle the extra complexity in MPEG-2 video for extracting spatially reduced images. In particular, we propose new classes of discrete cosine transform (DCT) domain and DCT inverse motion compensation operations for handling the interlaced modes in the different frame types of MPEG-2, and design new and efficient algorithms for generating spatially reduced images of an MPEG-2 video. We also describe key video applications on the extracted reduced images.
Video hypermedia authoring using automatic object tracking
Junshiro Kanda, Koji Wakimoto, Hironobu Abe, et al.
Video hypermedia systems enable users to retrieve information related to an object by selecting it directly in a video sequence. In video hypermedia systems, users must locate an anchor position according to the motion of the object, but positioning an anchor to follow the moving object is very laborious. We have proposed a new automatic object tracking method and implemented it in the system. A feature of this method is that it includes various automatic error correction algorithms. We evaluated the system's effectiveness in reducing human operations. As a result, the number of operations was reduced to 30.3% of that of the former method, and the operation time was reduced to 60.1% of that of the former method.
Image Retrieval II
MetaSEEk: a content-based metasearch engine for images
Mandis Beigi, Ana Belen Benitez, Shih-Fu Chang
Search engines are the most powerful resources for finding information on the rapidly expanding World Wide Web (WWW). Finding the desired search engines and learning how to use them, however, can be very time consuming. The integration of such search tools enables the users to access information across the world in a transparent and efficient manner. These systems are called meta-search engines. The recent emergence of visual information retrieval (VIR) search engines on the web is leading to the same efficiency problem. This paper describes and evaluates MetaSEEk, a content-based meta-search engine used for finding images on the Web based on their visual information. MetaSEEk is designed to intelligently select and interface with multiple on-line image search engines by ranking their performance for different classes of user queries. User feedback is also integrated in the ranking refinement. We compare MetaSEEk with a baseline version of the meta-search engine, which does not use the past performance of the different search engines in recommending target search engines for future queries.
Image retrieval using the directional detail histogram
Dimitrios Androutsos, Konstantinos N. Plataniotis, Anastasios N. Venetsanopoulos
We present a technique for indexing the directional detail and smoothness present in an image. By directional detail we imply strong directional activity in the horizontal, vertical and diagonal direction present in areas of detail and texture. By smoothness we refer to the smooth or low frequency areas of the image which do not contain prominent edge or texture activity. We map the directional information into 3-d vectors, which are then used to build N-d histograms. These histograms can then be used as database indices which can be queried using histogram techniques.
Image Retrieval III
Color-WISE: a system for image similarity retrieval using color
Ishwar K. Sethi, Ioana L. Coman, Brian Day, et al.
Color is one of the most widely used features for image similarity retrieval. Most of the existing image similarity retrieval schemes employ either global or local color histogramming. In this paper, we explore the use of localized dominant hue and saturation values for color-based image similarity retrieval. This scheme results in a relatively compact representation of color images for similarity retrieval. Experimental results comparing the proposed representation with global and local color histogramming are presented to show the efficacy of the suggested scheme.
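A rough sketch of what a localized dominant hue and saturation signature might look like is given below. The block size, the number of hue bins, and the use of the mean saturation of the winning hue bin are all assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
import colorsys

def dominant_hue_sat(pixels, hue_bins=10):
    """Return (dominant hue bin, mean saturation of the pixels in that bin).

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    Note: gray pixels have hue 0 in colorsys and so fall into bin 0.
    """
    counts = [0] * hue_bins
    sat_totals = [0.0] * hue_bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        k = min(int(h * hue_bins), hue_bins - 1)
        counts[k] += 1
        sat_totals[k] += s
    best = max(range(hue_bins), key=lambda k: counts[k])
    return best, sat_totals[best] / counts[best]

def block_signature(image, block=2, hue_bins=10):
    """Compact signature: one (hue bin, saturation) pair per image block.

    image: 2-D list (rows) of (r, g, b) tuples.
    """
    rows, cols = len(image), len(image[0])
    signature = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            pixels = [image[r][c]
                      for r in range(r0, min(r0 + block, rows))
                      for c in range(c0, min(c0 + block, cols))]
            signature.append(dominant_hue_sat(pixels, hue_bins))
    return signature

RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [[RED, RED, BLUE, BLUE],
         [RED, RED, BLUE, BLUE]]
signature = block_signature(image)
```

Each block is reduced to two numbers, which is what makes the representation compact compared with storing a full color histogram per block.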
Updates to the QBIC system
Carlton Wayne Niblack, Xiaoming Zhu, James Lee Hafner, et al.
QBIC™ (Query By Image Content) is a set of technologies and associated software that allows a user to search, browse, and retrieve image, graphic, and video data from large on-line collections. This paper discusses current research directions of the QBIC project such as indexing for high-dimensional multimedia data, retrieval of gray level images, and storyboard generation suitable for video. It describes aspects of QBIC software including scripting tools, application interfaces, and available GUIs, and gives examples of applications and demonstration systems using it.
Fast texture database retrieval using extended fractal features
The increase in the number of multimedia databases consisting of images has created a need for a quick method to search these databases for a particular type of image. An image retrieval system will output images from the database similar to the query image in terms of shape, color, and texture. For the scope of our work, we study the performance of multiscale Hurst parameters as texture features for database image retrieval over a database consisting of homogeneous textures. These extended Hurst features represent a generalization of the Hurst parameter for fractional Brownian motion (fBm) where the extended parameters quantize the texture roughness of an image at various scales. We compare the retrieval performance of the extended parameters against traditional Hurst features and features obtained from the Gabor wavelet. Gabor wavelets have previously been suggested for image retrieval applications because they can be tuned to obtain texture information for a number of different scales and orientations. In our experiments, we form a database combining textures from the Bonn, Brodatz, and MIT VisTex databases. Over the hybrid database, the extended fractal features were able to retrieve a higher percentage of similar textures than the Gabor features. Furthermore, the fractal features are faster to compute than the Gabor features.
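To make the idea of a multiscale Hurst feature concrete, here is a simplified one-dimensional sketch. It assumes the standard fBm scaling relation, in which the mean absolute increment at lag d grows like d^H, and estimates one local log-log slope per pair of consecutive lags. The paper works on 2-D textures and its exact feature definition is not given in the abstract, so the function names and lag choices are hypothetical.

```python
import math

def mean_abs_increment(signal, lag):
    """Average |x[i + lag] - x[i]| over the signal."""
    diffs = [abs(signal[i + lag] - signal[i]) for i in range(len(signal) - lag)]
    return sum(diffs) / len(diffs)

def multiscale_hurst(signal, lags=(1, 2, 4, 8)):
    """One Hurst-like estimate per pair of consecutive lags.

    Each estimate is the local slope of log(mean increment) against log(lag).
    A single global Hurst estimate would fit one slope through all points.
    Assumes the signal is not constant (the logarithm of 0 is undefined).
    """
    points = [(math.log(d), math.log(mean_abs_increment(signal, d))) for d in lags]
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

# A perfectly smooth ramp: increments grow linearly with lag, so every slope is 1.
ramp = list(range(64))
features = multiscale_hurst(ramp)
```

For a rough texture the slopes would be smaller and could differ from scale to scale, which is the extra information a multiscale feature vector carries over a single Hurst value.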
Video Segmentation
Multiresolution video segmentation using wavelet transformation
Hong Heather Yu, Wayne H. Wolf
This paper describes a new fade and dissolve detection methodology that utilizes wavelet transformation. This approach takes advantage of the production aspects of video as well as mimicking human perception. Each frame of the video is first decomposed into a low-resolution component and a high-resolution component using wavelet transformation. Possible gradual changes are first detected with the edge spectrum average (ESA) feature, which is obtained from the high-resolution component; meanwhile, the changing statistics of the ESA are studied to identify fades. Double chromatic difference is later applied to the low-resolution component to identify the dissolve transitions.
Illumination-invariant video segmentation by hierarchical robust thresholding
Jie Wei, Mark S. Drew, Ze-Nian Li
Many methods for video segmentation rely upon the setting and tuning of thresholds for classifying interframe distances under various difference measures. An approach that has been used with some success has been to establish statistical measures for each new video and identify camera cuts as difference values far from the mean. For this type of strategy the mean and dispersion for some interframe distance measure must be calculated for each new video as a whole. Here we eliminate this statistical characterization step and at the same time allow for segmentation of streaming video by introducing a preprocessing step for illumination-invariance that concomitantly reduces input values to a uniform scale. The preprocessing step provides a solution to the problem that simple changes of illumination in a scene, such as an actor emerging from a shadow, can trigger a false positive transition, no matter whether intensity alone or chrominance is used in a distance measure. Our means of discounting lighting change for color constancy consists of the simple yet effective operation of normalizing each color channel to length 1 (when viewed as a long, length-N vector). We then reduce the dimensionality of color to two-dimensional chromaticity, with values which are in 0..1. Chromaticity histograms can be treated as images, and effectively low-pass filtered by wavelet-based reduction, followed by DCT and zonal coding. This results in an indexing scheme based on only 36 numbers, and lends itself to a binary search approach to transition detection. To this end we examine distributions for intra-clip and inter-clip distances separately, characterizing each using robust statistics, for temporal intervals from 32 frames to 1 frame by powers of 2. Then combining transition and non-transition distributions for each frame interval, we seek the valley between them, again robustly, for each threshold. Using the present method, values of precision and recall are increased over previous methods. Moreover, illumination change produces very few false positives.
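The channel normalization step described in this abstract can be sketched as follows. The sketch covers only that step and the reduction to chromaticity; the histogramming, wavelet reduction, DCT, and thresholding stages are omitted. The chromaticity definition used here (r and g divided by the channel sum) is an assumption, and the function names are hypothetical.

```python
import math

def normalize_channels(pixels):
    """Scale each color channel so that, viewed as a length-N vector, it has unit L2 norm.

    pixels: list of (r, g, b) tuples. An all-zero channel is left unchanged.
    """
    norms = [math.sqrt(sum(p[c] ** 2 for p in pixels)) or 1.0 for c in range(3)]
    return [tuple(p[c] / norms[c] for c in range(3)) for p in pixels]

def chromaticity(pixel):
    """Reduce a color to two values in 0..1 (assumed here to be r and g over the sum)."""
    total = sum(pixel)
    if total == 0:
        return (0.0, 0.0)
    return (pixel[0] / total, pixel[1] / total)

frame = [(120, 80, 40), (30, 200, 90), (60, 60, 220), (10, 150, 150)]
# Same scene under a different illuminant, modeled as a per-channel scaling.
relit = [(r * 0.5, g * 0.8, b * 0.3) for r, g, b in frame]

chroma_frame = [chromaticity(p) for p in normalize_channels(frame)]
chroma_relit = [chromaticity(p) for p in normalize_channels(relit)]
```

Because a per-channel scale factor cancels in the normalization, the two frames yield the same chromaticities, which is why a lighting change of this kind does not register as a transition.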
NeTra-V: toward an object-based video representation
There is a growing need for new representations of video that allow not only compact storage of data but also content-based functionalities such as search and manipulation of objects. We present here a prototype system, called NeTra-V, that is currently being developed to address some of these content related issues. The system has a two-stage video processing structure: a global feature extraction and clustering stage, and a local feature extraction and object-based representation stage. Key aspects of the system include a new spatio-temporal segmentation and object-tracking scheme, and a hierarchical object-based video representation model. The spatio-temporal segmentation scheme combines the color/texture image segmentation and affine motion estimation techniques. Experimental results show that the proposed approach can handle large motion. The output of the segmentation, the alpha plane as it is referred to in the MPEG-4 terminology, can be used to compute local image properties. This local information forms the low-level content description module in our video representation. Experimental results illustrating spatio-temporal segmentation and tracking are provided.
Poster Session
Fuzzy-logic approach to digital video segmentation
A fuzzy logic system for the detection of shot boundaries in video sequences is presented. It integrates multiple metrics and knowledge of editing procedures to detect shot boundaries. Furthermore, the system is capable of classifying the editing process employed to create the shot boundary into one of the following categories: abrupt cut, fade-in, fade-out, or dissolve.
Comparison and improvement of color-based image retrieval techniques
Yujin Zhang, Zhong Wei Liu, Yun He
With the increasing popularity of content-based image manipulation, many color-based image retrieval techniques have been proposed in the literature. A systematic and comparative study of 8 representative techniques is first presented in this paper, using a database of 200 images of flags and trademarks. These techniques were chosen to cover the variations in the color models used, the characteristic color features employed, and the distance measures calculated for judging the similarity of color images. The results of this comparative study are presented both as lists of retrieved images for subjective visual inspection and as retrieval ratios computed for objective judgement. All of these results show that the cumulative-histogram-based techniques using Euclidean distance measures in two perception-related color spaces give the best results among the 8 techniques under consideration. Starting from the best-performing techniques, work toward further improving their retrieval capability was then carried out, resulting in 2 new techniques that use local cumulative histograms. The new techniques have been tested using a database of 400 images of real flowers, which are quite complicated in color content. Satisfactory results, compared to those obtained using existing cumulative-histogram-based techniques, are obtained and presented.
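A small sketch may help show why a cumulative histogram with Euclidean distance behaves differently from a plain histogram. It uses a single 8-bit channel and eight bins for brevity; the paper's color spaces, bin counts, and local variants are not reproduced here, and the function names are hypothetical.

```python
import math
from itertools import accumulate

def histogram(values, bins=8, max_value=256):
    """Normalized histogram of integer values in 0..max_value-1."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    return [c / len(values) for c in counts]

def cumulative_histogram(values, bins=8, max_value=256):
    """Running sum of the normalized histogram; the last entry is always 1."""
    return list(accumulate(histogram(values, bins, max_value)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

dark = [10] * 4          # all pixels in bin 0
slightly_lighter = [40] * 4   # all pixels in bin 1
bright = [250] * 4       # all pixels in bin 7

plain_near = euclidean(histogram(dark), histogram(slightly_lighter))
plain_far = euclidean(histogram(dark), histogram(bright))
cum_near = euclidean(cumulative_histogram(dark), cumulative_histogram(slightly_lighter))
cum_far = euclidean(cumulative_histogram(dark), cumulative_histogram(bright))
```

With plain histograms the slightly lighter image and the bright image are equally far from the dark one, since the bins simply do not overlap. With cumulative histograms the distance grows with the size of the shift, so similar colors that fall into neighboring bins still compare as similar.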
Image indexing technique using entropy measures with multilevel multiresolution approach
Tae-Hee Kim, Dong-Seok Jeong
In this paper, we propose a new content-based indexing algorithm that utilizes pixel-wise entropy and extracts features such as color and entropy from an image as indices. We propose a technique that fulfills both global and regional searching. The global searching scheme utilizes entropy features with a multilevel-multiresolution approach. As the resolution of the image is reduced, additional information about the image is revealed. As the number of gray levels of the image is reduced, we see how large the gray-level differences are between neighboring pixels. Regional searching utilizes color features that are extracted from regions separated by entropy measures. Our algorithm provides not only the automated extraction of entropy-based regions but also the representation of their color contents. Thus, we can classify images using entropy and multiresolution multi-level based features. Various experiments show the promising future of the proposed algorithm.
WebMIRS: web-based medical information retrieval system
L. Rodney Long, Stanley R. Pillemer, Reva C. Lawrence, et al.
At the Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), we are developing a prototype multimedia database system to provide World Wide Web access to biomedical databases. WebMIRS (Web-based Medical Information Retrieval System) will allow access to databases containing text and images and will allow database query by standard SQL, by image content, or by a combination of the two. The system is being developed in the form of Java applets, which will communicate with the Informix DBMS on an NLM Sun workstation running the Solaris operating system. The system architecture will allow access from any hardware platform that supports a Java-enabled Web browser, such as Netscape or Internet Explorer. Initial databases will include data from two national health surveys conducted by the National Center for Health Statistics (NCHS), and will include x-ray images from those surveys. In addition to describing in-house research in database access systems, this paper describes ongoing work toward querying by image content. Image content search capability will include capability to search for x-ray images similar to an input image with respect to vertebral morphometry used to characterize features such as fractures and disc space narrowing.
Detection of gradual scene changes for parsing of video data
Samuel Moon-Ho Song, Tae-Hoon Kwon, Woonkyung Michael Kim, et al.
The automatic video parser, a necessary tool for the development and maintenance of a video library, must accurately detect video scene changes so that the resulting video clips can be indexed in some fashion and stored in a video database. With the current existing algorithms, abrupt scene changes are detected fairly well; however, gradual scene changes, including fade-ins, fade-outs, and dissolves, are often missed. In this paper, we propose a new gradual scene change detection algorithm. In particular, we focus on fade-ins, fade-outs, and dissolves. The proposed algorithm is based on the chromatic video edit model. The video edit model indicates that, for sequences without motion, the second partial derivative with respect to time is zero during fade-ins, fade-outs, and dissolves. However, it is also zero for static scenes. Thus, the proposed algorithm computes the first (to disregard static scenes) and second partial derivatives, and if the norm of the second derivative is 'small' relative to the norm of the first derivative, the algorithm declares a gradual scene change. The efficacy of our algorithm is demonstrated using a number of video clips and some performance comparisons are made with other existing approaches.
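The derivative test described in this abstract can be sketched with finite differences over three consecutive frames. The threshold values `ratio` and `min_first` are hypothetical, as are the function names; frames are flattened to lists of pixel intensities, and motion is assumed absent, as in the model.

```python
def norm(vector):
    return sum(x * x for x in vector) ** 0.5

def temporal_derivatives(prev, cur, nxt):
    """Central-difference estimates of the first and second time derivatives per pixel."""
    first = [(n - p) / 2.0 for p, n in zip(prev, nxt)]
    second = [n - 2 * c + p for p, c, n in zip(prev, cur, nxt)]
    return first, second

def is_gradual_change(prev, cur, nxt, ratio=0.1, min_first=1.0):
    """Declare a gradual change when the second derivative is small relative to the first.

    min_first rejects static scenes, where both derivatives are (near) zero.
    Both thresholds are illustrative values, not taken from the paper.
    """
    first, second = temporal_derivatives(prev, cur, nxt)
    n1, n2 = norm(first), norm(second)
    if n1 < min_first:
        return False
    return n2 <= ratio * n1

scene_a = [0, 0, 0, 0]
scene_b = [100, 200, 50, 80]

def blend(alpha):
    """Frame of a linear dissolve from scene_a to scene_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(scene_a, scene_b)]

dissolve = is_gradual_change(blend(0.25), blend(0.5), blend(0.75))
static = is_gradual_change(scene_b, scene_b, scene_b)
cut = is_gradual_change(scene_a, scene_a, scene_b)
```

During the linear dissolve every pixel changes at a constant rate, so the second difference vanishes while the first does not. An abrupt cut produces a large second difference, and a static scene fails the first-derivative check.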
Video segmentation using combined cues
Mark Steven Toller, Paul H. Lewis, Mark S. Nixon
The use of video data in the multimedia environment is increasing rapidly, and so tools to handle large volumes of video data are required. One of the first steps towards creating more versatile video tools is to segment the video data, i.e. partition it into its component shots. This paper presents a novel combination of algorithms for video segmentation, utilizing histogram comparison, motion vector information (focus of expansion), and edge information to detect transitions between shots. Our method can reliably detect transitions such as camera breaks, fades, dissolves and wipes, in video compressed to the MPEG-I standard.
Extracting image features from MPEG-2 compressed stream
Chee Sun Won, Dong Kwon Park, Seong-Joon Yoo
In this paper, we propose a new image feature extraction algorithm in the compression domain. To minimize the decompression process the proposed feature extraction algorithm executes only the parsing process to the compressed bit stream. Then, by just decoding dct_dc_size in the MPEG-2 bit stream, we can determine if there exists any abrupt brightness change between two DCT blocks. According to the Huffman table for the MPEG-2 encoder, as the difference of the dc values between two succeeding DCT blocks increases, it yields longer coded bits. That is, the length of the coded dc value is proportional to the brightness change between two succeeding DCT blocks. Therefore, one can detect an edge feature between DCT blocks by just decoding the information regarding the number of bits assigned to the difference of the dc values. To demonstrate the usefulness of the proposed feature extraction method, we apply the detected edge features to find scene changes in the MPEG-2 compressed bit stream.
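As a rough illustration of the idea, the sketch below computes the size category of a DC differential as the number of bits needed for its magnitude, which is how `dct_dc_size` is defined in MPEG-2, and flags an edge wherever that size reaches a threshold. It works on already-known DC values instead of parsing a bit stream, ignores predictor resets, and uses a hypothetical threshold `min_size`; the function names are also hypothetical. Note that under this definition the size grows with the logarithm of the difference.

```python
def dc_size(dc_diff):
    """Number of bits needed to represent |dc_diff| (0 when the difference is 0)."""
    return abs(dc_diff).bit_length()

def edge_flags(dc_values, min_size=6):
    """Flag a block boundary as an edge when the DC differential needs many bits.

    dc_values: DC coefficients of successive blocks (same color component).
    min_size: illustrative threshold on the size category, not from the paper.
    """
    return [dc_size(cur - prev) >= min_size
            for prev, cur in zip(dc_values, dc_values[1:])]

dc_values = [100, 102, 101, 180, 181]
sizes = [dc_size(b - a) for a, b in zip(dc_values, dc_values[1:])]
flags = edge_flags(dc_values)
```

In a real decoder the size would be read directly from the variable-length code, so the differential itself never needs to be decoded to make this decision.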
Novel image viewer providing fast object delineation for content-based retrieval and navigation
Stephen T. Perry, Paul H. Lewis
In this paper we describe a novel interactive image viewer incorporating a range of image processing techniques that allows inexperienced users to quickly and easily delineate objects or shapes from a wide range of real world images. The viewer is specifically designed to be easily extensible, and this extensibility is demonstrated with the implementation of an iterative user guided segmentation tool. Using this tool objects can be efficiently extracted from images and used as the basis for navigation and retrieval within MAVIS, the Multimedia Architecture for Video, Image, and Sound.
Synthesizing parallel imaging applications using the CAP computer-aided parallelization tool
Benoit A. Gennart, Marc Mazzariol, Vincent Messerli, et al.
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Intelligent Tools
Models for automatic classification of video sequences
Giridharan Iyengar, Andrew B. Lippman
In this paper, we explore a technique for automatic classification of video sequences (such as TV broadcasts and movies). This technique analyzes the incoming video sequences and classifies them into categories. It can be viewed as an on-line parser for video signals. We present two techniques for automatic classification. In the first technique, the incoming video sequence is analyzed to extract the motion information. This information is optimally projected onto a single dimension. This projection information is then used to train Hidden Markov Models (HMMs) that efficiently and accurately classify the incoming video sequence. Preliminary results with 50 different test sequences (25 Sports and 25 News sequences) indicate a classification accuracy of 90% by the HMM models. In the second technique, 24 full-length motion picture trailers are classified using HMMs. This classification is compared with the internet movie database and we find that they correlate well. Only two out of 24 trailers were classified incorrectly.
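The classification step can be sketched as scoring an observation sequence under one HMM per category and picking the category with the highest likelihood. The sketch below assumes the one-dimensional motion projection has been quantized into two symbols (0 for low motion, 1 for high motion), and it uses made-up model parameters where the paper would use trained ones. Training is omitted, and all names and numbers are hypothetical.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm.

    start[s]: initial probability of state s
    trans[r][s]: probability of moving from state r to state s
    emit[s][o]: probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        # Rescale at every step to avoid underflow; the log of the scale
        # factors sums to the log-likelihood of the whole sequence.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def classify(obs, models):
    """Return the name of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical two-state models; symbol 1 means high motion.
start = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
models = {
    "sports": (start, trans, [[0.2, 0.8], [0.4, 0.6]]),
    "news": (start, trans, [[0.9, 0.1], [0.7, 0.3]]),
}
high_motion = [1, 1, 0, 1, 1, 1]
low_motion = [0, 0, 0, 1, 0, 0]
```

Calling `classify(high_motion, models)` selects the "sports" model and `classify(low_motion, models)` selects "news", since each sequence is far more probable under the model whose states favor its dominant symbol.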
Agent-like support for content-based retrieval and navigation
Periasamy Radhakrishnan, Paul H. Lewis
In this paper we show how agent-like processes may be used to support content-based retrieval and navigation in MAVIS, a multimedia architecture for video, image and sound. The processes provide automatic classification of feature vectors from users' selections when a multimedia thesaurus (MMT) is present, and feature vector clustering in the absence of an MMT.
Video retrieval: content analysis by ImageMiner
Peter Alshuth, Thorsten Hermes, Lutz Voigt, et al.
In this paper, videos are analyzed to obtain a content-based description of the video. The structure of a given video is useful for indexing long videos efficiently and automatically. A comparison between shots gives an overview of cut frequency, cut pattern, and scene bounds. After shot detection, the shots are grouped into clusters based on their visual similarity. A time-constrained clustering procedure is used to compare only those shots that are positioned inside a time range. Shots from different areas of the video (e.g., beginning/end) are not compared. With this cluster information, which contains a list of shots and their clusters, it is possible to calculate scene bounds. Labeling all clusters describes the cut pattern. It is then easy to distinguish a dialogue from an action scene. The final content analysis is done by the ImageMiner™ system. The ImageMiner system, developed at the Image Processing Department of the Center for Computing Technology at the University of Bremen, realizes content-based image retrieval for still images through a novel combination of methods and techniques of computer vision and artificial intelligence. The ImageMiner system consists of three analysis modules for computer vision, namely for color, texture, and contour analysis. Additionally, there is a module for object recognition. The output of the object recognition module can be indexed by a text retrieval system. Thus, concepts like forest scene may be searched for. We combine the still image analysis with the results of the video analysis in order to retrieve shots or scenes.
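The time-constrained clustering idea can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by the authors: each shot is reduced to a hypothetical `(start_time, feature_vector)` pair, similarity is an L1 distance, and `max_gap` and `max_dist` are made-up thresholds. A shot joins an existing cluster only if that cluster was last seen within the time window and is visually close enough.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def time_constrained_clusters(shots, max_gap, max_dist):
    """Assign a cluster label to each shot.

    shots    : list of (start_time, feature_vector), sorted by start_time
    max_gap  : largest allowed time distance to a cluster's most recent shot
    max_dist : largest allowed L1 distance to a cluster's reference feature
    """
    clusters = []  # each entry: {"feature": vector, "last_time": time}
    labels = []
    for start_time, feature in shots:
        best = None  # (distance, cluster index) of the closest eligible cluster
        for idx, cluster in enumerate(clusters):
            if start_time - cluster["last_time"] > max_gap:
                continue  # outside the time range: never compared
            dist = l1_distance(feature, cluster["feature"])
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, idx)
        if best is None:
            clusters.append({"feature": feature, "last_time": start_time})
            labels.append(len(clusters) - 1)
        else:
            clusters[best[1]]["last_time"] = start_time
            labels.append(best[1])
    return labels
```

An alternating label sequence such as 0, 1, 0, 1 is the kind of cut pattern that would suggest a dialogue, while a shot far away in time starts a new cluster even if it looks similar to an earlier one.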
Similarity Search
S-STIR: similarity search through iterative refinement
Chung-Sheng Li, John R. Smith, Vittorio Castelli
Similarity retrieval of images based on texture and color features has generated a lot of interest recently. Most of these similarity retrievals are based on the computation of the Euclidean distance between the target feature vector and the feature vectors in the database. Euclidean distance, however, does not necessarily reflect the relative similarity required by the user. In this paper, a method based on nonlinear multidimensional scaling is proposed to provide a mechanism for the user to dynamically adjust the similarity measure. The results show that a significant improvement on the precision versus recall curve has been achieved.
Multiresolution subimage similarity matching for large image databases
Kai-Sang Leung, Raymond T. Ng
Many database management systems support whole-image matching. However, users may only remember certain subregions of the images. In this paper, we develop Padding and Reduction Algorithms to support subimage queries of arbitrary size based on local color information. The idea is to estimate the best-case lower bound to the dissimilarity measure between the query and the image. By making use of multiresolution representation, this lower bound becomes tighter as the scale becomes finer. Because image contents are usually pre-extracted and stored, a key issue is how to determine the number of levels used in the representation. We address this issue analytically by estimating the CPU and I/O costs, and experimentally by comparing the performance and accuracy of the outcomes of various filtering schemes. Our findings suggest that a 3-level hierarchy is preferred. We also study three strategies for searching multiple resolutions. Our studies indicate that the hybrid strategy with horizontal filtering on the coarse level and vertical filtering on remaining levels is the best choice when using Padding and Reduction Algorithms in the preferred 3-level multiresolution representation. The best 10 desired images can be retrieved efficiently and effectively from a collection of a thousand images in about 3.5 seconds.
VisualGREP: a systematic method to compare and retrieve video sequences
In this paper, we consider the problem of similarity between video sequences. Three basic questions are raised and (partially) answered. Firstly, at what temporal duration can video sequences be compared? The frame, shot, scene and video levels are identified. Secondly, given some image or video feature, what are the requirements on its distance measure and how can it be 'easily' transformed into the visual similarity desired by the inquirer? Thirdly, how can video sequences be compared at different levels? A general approach based on either a set or sequence representation with variable degrees of aggregation is proposed and applied recursively over the different levels of temporal resolution. It allows the inquirer to fully control the importance of temporal ordering and duration. Promising experimental results are presented.
Image and Video Authentication
Adaptive public watermarking of DCT-based compressed images
Matthew J. Holliman, Nasir D. Memon, Boon-Lock Yeo, et al.
We propose an adaptive scheme to embed watermark information in DCT blocks of compressed image data. Our scheme is designed to avoid artifacts and prevent an increase in the bit rate of the compressed watermarked image. This is essentially done by judicious selection of appropriate blocks for watermark insertion. We also introduce a block-dependent seed generation algorithm to determine the specific coefficients to modify in a particular block. Our proposed technique is simple to implement in software, and hence suitable for real-time software-only insertion of invisible watermarks in JPEG compressed images, and MPEG or MJPEG video streams. We also demonstrate experimentally that our proposed watermarking scheme actually results in bit rate reduction of JPEG compressed images. The proposed technique can also complement existing DCT-based watermarking schemes for copy protection and detection applications.
Robust image authentication method surviving JPEG lossy compression
Image authentication verifies the originality of an image by detecting malicious manipulations. This goal is different from that of image watermarking, which embeds into the image a signature surviving most manipulations. Existing methods for image authentication treat all types of manipulation equally (i.e., as unacceptable). However, some applications demand techniques that can distinguish acceptable manipulations (e.g., compression) from malicious ones. In this paper, we describe an effective technique for image authentication which can prevent malicious manipulations but allow JPEG lossy compression. The authentication signature is based on the invariance of the relationship between DCT coefficients of the same position in separate blocks of an image. This relationship will be preserved when these coefficients are quantized in a JPEG compression process. Our proposed method can distinguish malicious manipulations from JPEG lossy compression regardless of how high the compression ratio is. We also show that, in different practical cases, the design of the authenticator depends on the number of recompressions and on whether the image is decoded into integral values in the pixel domain during the recompression process. Theoretical and experimental results indicate that this technique is effective for image authentication.
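The invariance mentioned above follows from the fact that dividing by a common positive quantization step and rounding is a non-decreasing operation, so the order of two coefficients can collapse to equality but is never reversed. The sketch below checks this numerically. It is an illustration only, not the authors' authenticator: the helper names, the coefficient range, and the range of quantization steps are assumptions made for the example.

```python
import random

def quantize(coeff, step):
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return round(coeff / step)

def relation_preserved(a, b, step):
    """True if quantizing a and b with the same step does not reverse their order.

    a and b stand for DCT coefficients at the same position in two different
    blocks, so they share the same quantization step.
    """
    qa, qb = quantize(a, step), quantize(b, step)
    if a >= b:
        return qa >= qb
    return qa <= qb

def check_invariance(trials=10000, seed=0):
    """Randomly test the order-preservation property (hypothetical value ranges)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.uniform(-1024, 1024)
        b = rng.uniform(-1024, 1024)
        step = rng.randint(1, 255)
        if not relation_preserved(a, b, step):
            return False
    return True
```

A manipulation that changes one block's coefficient enough to flip such a relation would, by contrast, be detectable by comparing against the stored relation.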
Robust embedded data from wavelet coefficients
Jong Jin Chae, B. S. Manjunath
An approach to embedding gray scale images using a discrete wavelet transform is proposed. The proposed scheme enables using signature images that could be as much as 25% of the host image data and hence could be used both in digital watermarking and in image/data hiding. In digital watermarking, the primary concern is the recovery of, or checking for, the signature even when the embedded image has been changed by image processing operations. Thus the embedding scheme should be robust to typical operations such as low-pass filtering and lossy compression. In contrast, for data hiding applications it is important that there should not be any visible changes to the host data that is used to transmit a hidden image. In addition, in both data hiding and watermarking, it is desirable that it is difficult or impossible for unauthorized persons to recover the embedded signatures. The proposed scheme provides a simple control parameter that can be tailored to either hiding or watermarking purposes, and is robust to operations such as JPEG compression. Experimental results demonstrate that high quality recovery of the signature data is possible.
Video Storage and Delivery
Prototype system of secure VOD
Harumi Minemura, Tomohisa Yamaguchi
Secure digital content delivery systems aim to provide copyright protection and a charging mechanism for the secure delivery of digital content. Encrypted content delivery and history (log) management are the means to accomplish this purpose. Our final target is to realize a video-on-demand (VOD) system that can prevent illegal usage of video data and manage user history data, in order to achieve a secure video delivery system on the Internet or an intranet. So far, mainly targeting client-server systems connected by an enterprise LAN, we have implemented and evaluated a prototype system based on our investigation into delivery methods for encrypted video content.
Wide-area-distributed disaster-tolerant file system for multimedia data storage
Setsuko Murata, Shigetaro Iwatsu, Masahiro Ueno, et al.
We propose a new disaster-tolerant file system for multimedia data storage. To protect against natural disasters by eliminating the possibility of simultaneous failure in multiple disks constituting a RAID (redundant array of inexpensive disks) system, our proposed file system features a RAID system with member disks spatially distributed over a wide area. The Fiber Channel standard has the potential to allow member disks to be separated by up to 10 km by using single-mode optical fiber cables. A prototype wide-area distributed file (WDF) system consisting of a layered structure of RAID level 1 and RAID level 0 was tested. It keeps its entire data on a local disk and its mirrored data striped across two disks connected through Fiber Channel over optical fiber cables. The effect of underlying software modules and hardware components on its read/write performance was investigated. Compared with a file system using a local disk, the prototype WDF system shows equivalent read performance by directing all reads to its local disk, and its write performance is almost 20% lower. An on-line transaction processing (OLTP) benchmark was also executed. The WDF system achieved a throughput (transactions per second) similar to that of the file system using a local disk.
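The layered layout described above (a full local copy, mirrored onto two remote disks in stripes) can be illustrated with a small in-memory sketch. It is not the WDF implementation: the class name, the whole-file write interface, and the 4-byte stripe size are hypothetical, and the "disks" are plain byte arrays.

```python
class MirroredStripedStore:
    """Toy model of RAID 1 over RAID 0: one local copy plus a striped mirror."""

    def __init__(self, stripe_size=4):
        self.stripe_size = stripe_size
        self.local = bytearray()                    # complete copy on the local disk
        self.remote = [bytearray(), bytearray()]    # mirror, striped over two disks

    def write(self, data):
        """Store data locally and stripe the mirror copy across the remote disks."""
        self.local = bytearray(data)
        self.remote = [bytearray(), bytearray()]
        for offset in range(0, len(data), self.stripe_size):
            chunk = data[offset:offset + self.stripe_size]
            disk = (offset // self.stripe_size) % 2  # alternate stripes between disks
            self.remote[disk].extend(chunk)

    def read(self):
        """Serve reads from the local disk only, as in the described prototype."""
        return bytes(self.local)

    def recover_from_mirror(self):
        """Rebuild the data from the remote stripes, e.g. after losing the local disk."""
        total = len(self.remote[0]) + len(self.remote[1])
        offsets = [0, 0]
        disk = 0
        out = bytearray()
        while len(out) < total:
            chunk = self.remote[disk][offsets[disk]:offsets[disk] + self.stripe_size]
            out.extend(chunk)
            offsets[disk] += len(chunk)
            disk = 1 - disk
        return bytes(out)
```

Every write touches both the local disk and a remote disk, while reads touch only the local one, which is consistent with the reported pattern of unchanged read performance and somewhat lower write performance.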
File caching in video-on-demand servers
Fu-Ching Wang, Shin-Hung Chang, Chi-Wei Hung, et al.
This paper studies the file caching issue in video-on-demand (VOD) servers. Because the characteristics of video files are very different from those of conventional files, different types of caching algorithms must be developed. For VOD servers, the goal is to optimize resource allocation and the tradeoff between memory and disk bandwidth. This paper first proves that optimizing resource allocation and the tradeoff between memory and disk bandwidth is an NP-complete problem. Then, a heuristic algorithm, called the generalized relay mechanism, is introduced and a simulation-based optimization procedure is conducted to evaluate the effects of applying the generalized relay mechanism.
Disk system design for guaranteeing quality of service in interactive video-on-demand systems
Chih-Yuan Cheng, Yen-Jen Oyang, Meng-Huang Lee
This paper discusses disk system design that supports VCR-like interactive video-on-demand (VOD) with a good quality-of-service (QoS) guarantee. The guarantee of QoS is based on a practice that requires no higher data bandwidth to support a stream switching from the normal playback mode to a fast search mode while providing a VCR-like visual experience. The main issue studied in this paper is how to determine the I/O bandwidth required in order to achieve a certain level of QoS. This paper presents a queueing model to tackle this problem and uses simulation to verify the validity of the queueing model. This paper also addresses the related implementation issues.