Multimedia Storage and Archiving Systems III

Content-based image object retrieval by scale-space representation of shapes

Mark E. Hoffman, Edward K. Wong

Vast amounts of inexpensive storage and cost-effective input devices have promoted a rapid increase in the amount of stored digital images and video. Retrieving desired images from image databases is a challenging problem, not easily solved by existing database methods. Shape is one popular feature used to automate retrieval of images by content. Example shape features include the Fourier descriptor for describing shape boundaries, and area or compactness for describing shape regions, among others. We present a new shape-representation method based on extracting an image surface ridge-line called the Most Prominent Ridge-Line (MPRL) in scale-space. The MPRL is extracted by minimizing the second spatial derivative orthogonal to the ridge-line direction. The scale of the MPRL point is proportional to the image object's width. Matching query and database image MPRLs is shown to be an effective method for retrieving images based on shape. A unique feature of the proposed method is that the scale dimension may be weighted to allow for any desired amount of shape details in image retrieval.

Efficient interactive image retrieval with multiple seed images

Xia Wan, Zijun Yang, C.-C. Jay Kuo

An interactive image query system based on multiple seed images is proposed in this work. With the proposed system, the query can be refined so that the meaning of similarity becomes clear during the query process. One particular way to achieve interactive query is implemented: adaptive filtering with multiple low-level indexing features based on the user's feedback. The proposed query system consists of the following building blocks. First, browsing, image sketching and feature editing are employed for query formation. The combination of the three methods gives users high flexibility in obtaining the desired query images. In the system, images can also be selected from the candidate image set. By using a query set composed of multiple seed images instead of a single image, we can improve query performance with more accurate similarity information. Two key procedures, initial guess and further refinement, are used to achieve highly efficient queries. At the initial stage, we try to expand the candidate image set to include as many features as possible. In the refinement process, users are able to initiate a more complex filtering strategy by using the feedback, and the relative weighting of the different features is then decided. Extensive examples are used to illustrate the proposed interactive query process and the corresponding retrieval performance.

Indexing based on edit-distance matching of shape graphs

Srikanta Tirthapura, Daniel Sharvit, Philip Klein, et al.

We are investigating a graph matching approach for indexing into pictorial databases using shock graphs, a symmetry-based representation of shape. Each shape (or a collection of edge elements) is represented by a shock graph. Indexing of a query into a pictorial database is accomplished by comparing the corresponding shock graph to the graphs representing database elements and selecting the best match. This paper introduces a new metric for comparing shock graphs.

Data representation and handling for large image browsing

Omid E. Kia, Alexandre Schaff, Jaakko J. Sauvola

In this paper we present results of work done on large image browsing. Advances in compression and transmission have allowed efficient usage of storage and transmission resources; however, much work needs to be done in order to integrate such concepts into the processing and handling of data. One such problem is image browsing. Large images, on the order of hundreds of megabytes to gigabytes, not only strain the storage resources of some computers but also make transmission almost impossible over dial-up networks or even some digital networks. By exploiting the fact that receiving nodes need to view only limited regions of the image, a data organization and handling mechanism is devised to serve the appropriate data in a timely fashion. A wavelet-based approach is pursued with desirable compression and progressive characteristics along with scalar and spatial access to pixel data.

Personalized image retrieval with user's preference model

Yong-Hwan Kim, K. E. Lee, K. S. Choi, et al.

Recently, the information resources available in various media have increased rapidly. Many retrieval systems for multimedia information resources have been developed with a focus only on efficiency and performance. As a result, they cannot deal well with users' preferences and interests. In this paper, we present the framework design of a personalized image retrieval system (PIRS) which can reflect a user's preferences and interests incrementally. The prototype of PIRS consists of two major parts: the user's preference model (UPM) and the retrieval module (RM). The UPM refines the user's query to meet the user's needs. The RM retrieves the proper images for the refined query by computing the similarity between each image and the refined query, and the retrieved images are ordered by these similarities. In this paper, we mainly discuss the UPM. Incremental machine learning technologies are employed to give the system a user-adaptable and intelligent capability. The UPM is implemented by a decision tree based on incremental tree induction and an adaptive resonance theory network. The user's feedback is returned to the UPM and modifies its internal structure. The user's iterative retrieval activities with PIRS cause the UPM to be revised according to the user's preferences and interests. Therefore, PIRS can adapt to the user's preferences and interests. We have achieved encouraging results through experiments.

RIME: a replicated image detector for the World Wide Web

Edward Y. Chang, James Ze Wang, Chen Li, et al.

This paper describes RIME (Replicated IMage dEtector), an alternative approach to watermarking for detecting unauthorized image copying on the Internet. RIME profiles internet images and stores the feature vectors of the images and their URLs in its repository. When a copy detection request is received, RIME matches the requested image's feature vector with the vectors stored in the repository and returns a list of suspect URLs. RIME characterizes each image using Daubechies' wavelets. The wavelet coefficients are stored as the feature vector. RIME uses a multidimensional extensible hashing scheme to index these high-dimensional feature vectors. Our preliminary result shows that it can detect image copies effectively. It can find the top suspects and copes well with image format conversion, resampling, and requantization.

Hierarchical clustering techniques for image database organization and summarization

Asha Vellaikal, C.-C. Jay Kuo

This paper investigates clustering techniques as a method of organizing image databases to support popular visual management functions such as searching, browsing and navigation. Different types of hierarchical agglomerative clustering techniques are studied as a method of organizing feature space as well as summarizing image groups by the selection of a few appropriate representatives. Retrieval performance is evaluated using both single-level and multiple-level hierarchies, and the algorithms show an interesting relationship between the top k correct retrievals and the number of comparisons required. Some arguments are given to support the use of such cluster-based techniques for managing distributed image databases.
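
As an illustration of the kind of technique this abstract refers to, the following is a minimal sketch of one agglomerative variant. The abstract does not say which linkage rule or which representative-selection rule the authors use, so single linkage and the medoid are assumptions here, and the function names are hypothetical. Feature vectors are assumed to be plain tuples of numbers.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def single_link(a: List[Point], b: List[Point]) -> float:
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points: List[Point], k: int) -> List[List[Point]]:
    """Repeatedly merge the two closest clusters until k clusters remain."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def medoid(cluster: List[Point]) -> Point:
    """Representative of a cluster: the member closest to all other members."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))
```

Under these assumptions, a query could first be compared against the medoids only and then against the members of the best-matching cluster, which is one way the number of comparisons can be traded against retrieval accuracy.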

Learning a similarity-based distance measure for image database organization from human partitionings of an image set

David McG. Squire

In this paper our goal is to employ human judgments of image similarity to improve the organization of an image database for content-based retrieval. We first derive a statistic, K_B, for measuring the agreement between two partitionings of an image set into unlabeled subsets. This measure can be used to measure both the degree of agreement between pairs of human subjects and that between human and machine partitionings of an image set. It also allows a direct comparison of database organizations, as opposed to the indirect measure available via precision and recall measurements. This provides a rigorous means of selecting between competing image database organization systems, and assessing how close the performance of such systems is to that which might be expected from a database organization done by hand.

Matching pursuit: contents featuring for image indexing

Yong Man Ro

We propose an image featuring technique for image indexing. To achieve image indexing, the matching pursuit algorithm is employed. The image domain is classified into several groups, which are determined by atoms in the dictionary of the matching pursuit algorithm. Since the image is decomposed into groups of atoms, the atoms can describe the contents of the image. In this paper, a fast matching pursuit of images in Radon space (RMP) is proposed. Further, a content featuring algorithm by RMP with a dictionary consisting of Gabor functions is proposed. Texture and color featuring are shown as examples of content featuring. The computer simulations and their corresponding experimental results show that the proposed method can achieve content featuring as well as image compression.

Using browsing to improve content-based image retrieval

Jesse S. Jin, Ruth Kurniawati, Guang-you Xu, et al.

Many content-based methods have been proposed to retrieve images from multimedia databases. Current index structures, such as R*-tree, SS-tree and SS⁺-tree, have a large overlapping area among their nodes, especially at the high level of the indexing tree. The overlapping area causes the search engine to compare a large number of nodes and hence, it is very inefficient to retrieve at very high levels of the index tree. We present a scheme to combine browsing with retrieval in searching for images. The browser uses the content-based index structure of image databases. It provides users with a visual tool to narrow the search quickly to a small region and to avoid a large number of comparisons. Combined with retrieval, it produces a very efficient content-based retrieval method.

Interactive classification and indexing of still and motion pictures in VideoRoadMap

Youngchoon Park, Forouzan Golshani, Sethuraman Panchanathan, et al.

The large scale proliferation of multimedia data necessitates the use of sophisticated techniques for accessing the information based on the content. VideoRoadMap is a new content-based video indexing system for retrieving video clips and images from multimedia databases. The system indexes the audio-visual information using spatio-temporal features and information modeling methods. The proposed system employs adaptive similarity measurements based on the contents of media objects, resulting in more accurate retrievals. Principal component analysis and second order statistical analysis are employed to determine the appropriate combination of weight values in similarity search. In addition, VideoRoadMap includes a powerful multi-faceted querying mechanism which allows queries to be formulated and presented in a variety of modes, including query by example (image and/or video), query by sketch, and query by object motion trajectory.

Embedded mixture modeling for efficient probabilistic content-based indexing and retrieval

Nuno Miguel Vasconcelos, Andrew B. Lippman

By formulating content-based retrieval as a problem of Bayesian inference we have previously developed a retrieval framework with various interesting properties: (1) allows the incorporation of prior beliefs about image relevance in the retrieval process, (2) leads to simple and intuitive mechanisms for combining information from several modalities, such as images, audio, and text during retrieval, (3) provides support for the development of interfaces that learn from user interaction, (4) allows retrieval directly from compressed bitstreams, and (5) lends itself to the construction of indexing structures which can also be computed as a side effect of the compression process.

CSVD: approximate similarity searches in high-dimensional spaces using clustering and singular value decomposition

Alexander Thomasian, Vittorio Castelli, Chung-Sheng Li

Many data-intensive applications, such as content-based retrieval of images or video from multimedia databases and similarity retrieval of patterns in data mining, require the ability to perform similarity queries efficiently. Unfortunately, the performance of nearest neighbor (NN) algorithms, the basis for similarity search, quickly deteriorates with the number of dimensions. In this paper we propose a method called Clustering with Singular Value Decomposition (CSVD), combining clustering and singular value decomposition (SVD) to reduce the number of index dimensions. With CSVD, points are grouped into clusters that are more amenable to dimensionality reduction than the original dataset. Experiments with texture vectors extracted from satellite images show that CSVD achieves significantly higher dimensionality reduction than SVD alone for the same fraction of total variance preserved. Conversely, for the same compression ratio CSVD results in an increase in preserved total variance with respect to SVD (e.g., a 70% increase for a 20:1 compression ratio). Approximate NN queries are then processed more efficiently, as quantified through experimental results.

Similarity search in content-based multimedia retrieval using spatial grid files

Adil Alpkocak, Esen Ozkarahan

A multimedia object is represented by several kinds of feature vectors which are (semi-)automatically extracted. Therefore, a multimedia information system must employ multiple search and indexing mechanisms to integrate multiple search modalities, such as querying and indexing by keywords and by other content-based feature vectors extracted from multimedia objects, such as color, texture, shape or sketch. However, today there is no indexing method for content-based retrieval of such feature combinations, and there is an increasing need for multiple-feature indexing in content-based multimedia retrieval systems. In this paper, we introduce a new file structure called Spatial Grid File (SGF). This file structure makes it possible to index multimedia objects by different and independent high-dimensional feature vectors. Moreover, with it, complex queries involving combined retrieval can be performed efficiently. Its unique feature is that it combines a set of search modes, each having different properties and similarity metrics. Detailed descriptions of SGF and a summary of the results obtained from the experiments are included.

VOD data management on tape-based tertiary storage systems

Jihad Boulos, Kinji Ono

Video-on-Demand servers are becoming feasible. These servers have voluminous data to store and manage. If only disk-based secondary storage systems are used to store and manage this huge amount of data, the system cost would be excessively high. A tape-based tertiary storage system seems to be a reasonable solution for lowering the cost of storing and managing this continuous data. However, using a tertiary storage system to store large continuous data introduces several issues. These are mainly the replacement policy on disks, the decomposition and placement of continuous data chunks on tapes, and the scheduling of multiple requests for materializing objects from tapes to disks. In this paper we address these issues and propose solutions based on heuristics that we tested in a simulator. We first extend a replacement policy that has been proposed for a single-user environment to a multi-user one with several servicing streams. We then study different policies for continuous object decomposition and chunk placement on tapes under different characteristics of the tertiary storage drives. Finally, we propose a scheduling algorithm for object materialization; this algorithm guarantees the materialization on disks of all chunks of an object by their service deadlines in a pipelined service. We present the results of simulations we ran to measure the impact of our proposed algorithms on the average latency of the system.

Design cost analysis of a video-on-demand server with different rejection rates

Meng-Huang Lee, Shin-Hung Chang, Ching Hai Chou

This paper builds on the file caching mechanism proposed in our previous work, called the generalized relay mechanism, to explore the relationship between system rejection rates and VOD server design cost. According to traditional VOD design methodology, the amount of system bandwidth required depends on the number of clients that the VOD system wants to support. However, in practice it is unlikely that all the system clients will request VOD services at the same time. Designing the system bandwidth based only on the number of system clients is therefore not cost effective. If a certain rejection rate is allowed in the system, the VOD server design cost can be reduced accordingly. Our simulation results allow a VOD server designer to trade off system cost against system rejection rate.
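
The trade-off between rejection rate and provisioned capacity can be illustrated with the Erlang B loss formula, a standard teletraffic model. This is a swapped-in textbook model, not the authors' generalized relay mechanism or their simulator. The sketch assumes Poisson request arrivals, an offered load expressed in Erlangs (arrival rate times mean viewing time), and that a request arriving when all streams are busy is rejected; the function names are hypothetical.

```python
def erlang_b(offered_load: float, streams: int) -> float:
    """Rejection probability for a given offered load and number of streams.

    Uses the recursion B(0) = 1, B(m) = A*B(m-1) / (m + A*B(m-1)).
    """
    if offered_load < 0 or streams < 0:
        raise ValueError("offered_load and streams must be non-negative")
    blocking = 1.0  # with zero streams every request is rejected
    for m in range(1, streams + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


def min_streams(offered_load: float, max_rejection: float) -> int:
    """Smallest number of streams whose rejection rate is at most max_rejection."""
    if not 0.0 < max_rejection <= 1.0:
        raise ValueError("max_rejection must be in (0, 1]")
    streams = 0
    blocking = 1.0
    while blocking > max_rejection:
        streams += 1
        blocking = offered_load * blocking / (streams + offered_load * blocking)
    return streams
```

Under this model, calling `min_streams` with a looser `max_rejection` returns a smaller stream count, which is the qualitative relationship between rejection rate and server cost that the abstract describes.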

Data striping scheme of VBR video on zoned disk array

Shiao-Li Tsao, Meng-Chang Chen, Jan-Ming Ho, et al.

In this paper, we study the striping of VBR videos on a zoned disk array. We apply a constant read time approach and a round robin block permutation scheme to solve this problem. By using a dynamic programming technique, we can obtain an optimal data striping and disk layout. Moreover, we adopt an admission control policy to dispatch requests to hard disks in order to provide guaranteed services. Simulation results show that our method achieves load balancing across the hard disks of a disk array, efficiently utilizes zoned-disk bandwidth, and obtains a significant improvement over the traditional data striping scheme.

Integrated multimedia information system on interactive CATV network

Meng-Huang Lee, Shin-Hung Chang

Current CATV system architectures provide one-way delivery of a common menu of entertainment to all homes through the cable network. As the technology evolves, interactive (two-way) services can be provided in cable TV systems. They can supply customers with individualized programming and support real-time two-way communications. As the service type changes from one-way delivery to two-way interaction, "on-demand services" become a distinct feature of multimedia systems. In this paper, we present our work on building an integrated multimedia system on an interactive CATV network at Shih Chien University. Besides providing the traditional analog TV programming from the cable operator, we filter some channels to reserve them as our campus information channels. In addition to the analog broadcasting channels, the system also provides interactive digital multimedia services, e.g. Video-On-Demand (VOD), Virtual Reality, BBS, World-Wide-Web, and Internet Radio Station. These two kinds of services are integrated in a CATV network by allocating separate frequency bands to the analog broadcasting service and the digital interactive services. Our ongoing work is to port our previous work, a VOD system on an Ethernet network that conforms to the DAVIC standard (for interoperability), into the current system.

VideoZoom: a spatiotemporal video browser for the Internet

John R. Smith

We describe a system for browsing and interactively retrieving video sequences over the Internet at multiple spatial and temporal resolutions. Each video sequence is decomposed into a hierarchy of video view elements that are retrieved in a progressive fashion. The client browser builds the views of the video sequence by retrieving, caching and assembling the view elements, as needed. This allows the user to quickly browse the video over the Internet by starting with a coarse, low-resolution view and by selectively zooming-in along the temporal and spatial dimensions. We demonstrate that the video view element method is able to represent and deliver the video in a compact form while significantly speeding up the access and progressive retrieval over the Internet.

Semantically controlled content-based retrieval of video sequences

Giridharan Iyengar, Andrew B. Lippman

In this paper, we present a technique for automatic classification of movies based on their content. This technique analyses shot duration and motion energy of movie trailers to characterize them as Action/Character movies. This approach is then combined with a features-based technique for content-based retrieval of video. Experiments indicate a high retrieval accuracy (greater than 96%) together with semantic-control (Action vs. Character) with this combined approach.

VORTEX: video retrieval and tracking from compressed multimedia databases--template matching from MPEG-2 video compression standard

Dan Schonfeld, Dan Lelescu

In this paper, a novel visual search engine for video retrieval and tracking from compressed multimedia databases is proposed. Our approach exploits the structure of video compression standards in order to perform object matching directly on the compressed video data. This is achieved by utilizing motion compensation--a critical prediction filter embedded in video compression standards--to estimate and interpolate the desired method for template matching. Motion analysis is used to implement fast tracking of objects of interest on the compressed video data. Being presented with a query in the form of template images of objects, the system operates on the compressed video in order to find the images or video sequences where those objects are presented and their positions in the image. This in turn enables the retrieval and display of the query-relevant sequences.

Video retrieval method using shotID for copyright protection systems

Ryoji Muranoi, Jiying Zhao, Rina Hayasaka, et al.

In this paper, we define a shot identifier, which we call shotID, that easily represents the characteristics of each shot. The shotID is calculated from positional and color information. We propose a video retrieval method using the shotID. The proposed method is robust to compression, format conversion, frame dropping, and noise such as watermarks. Furthermore, based on our retrieval method, we realize a copyright protection system for digital video with a watermarking technique that uses ideas from spread spectrum communications.

TV program skimming method using similarity of scene descriptions

Ichiro Yamada, Hideki Sumiyoshi, Yeun-Bae Kim, et al.

In a broadcasting station, it is important to establish an environment that facilitates the reuse of TV programs and legacy materials. Therefore it is necessary to build efficient video archives that allow for the accumulation of a large number of TV programs. A video skimming system is very useful because it helps us quickly grasp the contents of the TV programs accumulated in video archives. In this paper, we propose a video skimming method based on information about a TV program, namely natural language annotations that describe the contents of the program. Using these annotations, we can generate skimmed TV programs by extracting representative video scenes. We performed a series of skimming experiments on TV programs and obtained good results.

Video segmentation in the wavelet domain

Mrinal K. Mandal, Sethuraman Panchanathan

Automatic video indexing is an important feature in the design of video databases. Recently, compressed domain techniques have become popular due to their inherent advantages of efficiency and reduced complexity. The wavelet transform is emerging as a powerful tool for efficient compression of visual information. A variety of wavelet-based video compression techniques have been reported in the literature. However, there has been little work done in the area of video indexing in the wavelet domain. In this paper, we present video segmentation techniques in a wavelet-based compression framework. These techniques employ wavelet coefficients, their distribution, and the motion vectors estimated by the associated video coder. Simulation results show that the proposed techniques provide good indexing performance at low complexity.

Online scene change detection of multicast (MBone) video

Wensheng Zhou, Ye Shen, Asha Vellaikal, et al.

Many multimedia applications, such as multimedia data management systems and communication systems, require efficient representation of multimedia content. Thus semantic interpretation of video content has been a popular research area. Currently, most content-based video representation involves the segmentation of video based on key frames, which are generated using scene change detection techniques as well as camera/object motion. Video features can then be extracted from the key frames. However, most such research performs off-line video processing in which the whole video scope is known a priori, which allows multiple scans of the stored video files during processing. In comparison, relatively little research has been done in the area of on-line video processing, which is crucial in video communication applications such as on-line collaboration, news broadcasts and so on. Our research investigates on-line real-time scene change detection of multicast video over the Internet. Our on-line processing system is designed to meet the requirements of real-time video multicasting over the Internet and to utilize the successful video parsing techniques available today. The proposed algorithms extract key frames from video bitstreams sent through the MBone network, and the extracted key frames are multicast as annotations or metadata over a separate channel to assist in content filtering, such as that anticipated to be in use by on-line filtering proxies in the Internet. The performance of the proposed algorithms is demonstrated and discussed in this paper.
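
The on-line constraint described here, a single pass with no access to future frames, can be illustrated with a minimal sketch. It uses gray-level histogram differencing, a common baseline for scene change detection, and is not the authors' algorithm. Frames are assumed to be non-empty flat sequences of 0-255 pixel values; the bin count, the threshold, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of a frame of 0-255 pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_scene_changes(
    frames: Iterable[Sequence[int]],
    threshold: float = 0.5,
    bins: int = 16,
) -> Iterator[int]:
    """Yield the index of each frame that differs sharply from its predecessor.

    Only the previous histogram is kept, so the stream is scanned once.
    """
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            # L1 distance between normalized histograms lies in [0, 2].
            difference = sum(abs(a - b) for a, b in zip(previous, current))
            if difference > threshold:
                yield index
        previous = current
```

Because `detect_scene_changes` is a generator, each detected index is available as soon as its frame arrives, so the corresponding frame could be forwarded as a key frame without buffering the rest of the stream.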

Dominant motion estimation and video partitioning with a 1D signal approach

Fabrice Coudert, Jenny Benois-Pineau, Dominique Barba

This paper presents a novel approach for an automatic partitioning of video sequences based on scene change detection and global motion estimation. The method is based on a 1D representation of images, the Bin transform, which is a discrete version of the Radon transform. Analysis of the motion and detection of the scene change are realized in the transform domain using online statistical techniques. The analysis of a 1D signal rather than the mostly used 2D image signal limits computational complexity by itself and permits fast algorithms.

Head gesture recognition technique for visual user interface

Su-Hwan Kim, Hyunil Choi, Ji-Beom Yoo, et al.

This paper addresses a technique for recognizing head gestures. The proposed system is composed of eye tracking and head motion decision. The eye tracking step is divided into face detection and eye location. Face detection obtains the face region using a neural network and mosaic image representation. Eye location extracts the location of the eyes from the detected face region. For real-time eye tracking, eye location is performed in the region close to the pair of eyes. If a pair of eyes is not located, face detection is performed again. After eye tracking is performed, the coordinates of the detected eyes are transformed into a normalized vector of the x-coordinate and the y-coordinate. Three methods are tested for head motion decision: head gesture recognition with direct observation, head gesture recognition using two HMMs, and head gesture recognition using three HMMs. A head gesture can be recognized by direct observation of the variation of the vector, but the variation of the vector can be observed by two HMMs for more accurate recognition. However, because this method does not recognize neutral head gestures, three HMMs trained on a directional vector are adopted. The directional vector represents the direction of head movement. The vector is input to the HMMs to determine neutral gestures as well as positive and negative gestures. Combined head gesture recognition using the above three methods is also discussed. The experimental results are reported.

Modeling and analysis of hierarchical storage system for massive-scale multimedia server

Youjip Won, Jaideep Srivastava

In this article, we investigate the performance behavior of the hierarchical storage architecture. To maximize the performance of the hierarchical storage system, the capacity and bandwidth of the individual storage hierarchies need to be well balanced. Another important factor which governs the performance of a hierarchical storage system is the user access pattern. We use two orthogonal performance metrics, namely expected service time and system congestion and blocking probability, to evaluate the server performance. We establish an analytical formulation for the given performance metrics and investigate how these metrics are affected by the parameters of the individual storage hierarchy. This analytical formulation enables us to investigate the effects of different configurations of the storage hierarchy and different data access patterns; it also provides a framework to determine the optimal configuration of storage hierarchies. The results of a simulation-based performance evaluation are also presented.

Caching Web objects using Zipf's law

Dimitrios N. Serpanos, Wayne H. Wolf

We assume that Zipf's law governs the accesses in the Web. Based on this assumption, we show that caching Web objects can lead to very high hit rates. Furthermore, we can calculate the size of the cache that is required to achieve a given hit ratio. Assuming that Zipf's law continually applies to accesses of Internet objects, we argue that a cache replacement policy based on the concept of the Least Frequently Used object suffices to obtain the theoretically achievable results, assuming a large enough cache. Using this theory, we show how results available in the literature can be explained. We also show how a designer can make decisions and develop caching architectures for networks based on the profiles of average users. We propose clip-based caching of video objects rather than caching of complete objects. Clips are disseminated in video library and browsing environments and are small enough to be properly managed by a cache scheme based on Zipf's law.
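
The cache sizing calculation mentioned in this abstract can be sketched as follows. The sketch assumes the classic form of Zipf's law, in which the i-th most popular of N equally sized objects is requested with probability proportional to 1/i, and an idealized frequency-based cache that holds exactly the C most popular objects. The exponent of 1, the equal object sizes, and the function names are assumptions, not details taken from the paper.

```python
def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def hit_ratio(cache_size: int, num_objects: int) -> float:
    """Hit ratio when the cache holds the cache_size most popular objects.

    With request probabilities p_i = (1/i) / H_N, the hit ratio is H_C / H_N.
    """
    if not 0 <= cache_size <= num_objects:
        raise ValueError("cache_size must be between 0 and num_objects")
    return harmonic(cache_size) / harmonic(num_objects)


def min_cache_size(target: float, num_objects: int) -> int:
    """Smallest number of cached objects whose hit ratio reaches target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be in [0, 1]")
    if target == 0.0:
        return 0
    total = harmonic(num_objects)
    partial = 0.0
    for size in range(1, num_objects + 1):
        partial += 1.0 / size  # running value of H_size
        if partial / total >= target:
            return size
    return num_objects
```

Because harmonic numbers grow only logarithmically, the ratio H_C / H_N rises quickly for small C under these assumptions, which is consistent with the abstract's claim that caching can yield high hit rates.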

Distributed multimedia data storage for dependability, scalability, and high performance

Qutaibah M. Malluhi, Gwang S. Jung

In this paper, we describe a method for enabling efficient and dependable multimedia data storage and transfer in a large-scale distributed computing and communication environment such as the Web. The proposed method has several advantages over traditional ones: high data rates, scalability, availability, reliability, and seamless system-controlled load balancing.

Selective placement and replication strategies for storing audio clips in a naval application

Cyrus Shahabi, Latifur Khan

Reduced manning in the naval system requires that a given number of tasks for time-critical and externally paced missions be performed accurately and on time by fewer personnel on a warship. The MultiModal WatchStation (MMWS) in navy ships provides a mechanism to aid in this endeavor. The focus of this study is on the storage and management of audio clips generated by MMWS operators within and across navy ships. The generated audio clips must be stored in persistent storage (i.e., secondary and tertiary storage) in order to facilitate future queries on past messages. Since the persistent storage is distributed across many sites (i.e., ships), the query response time might be intolerable if the requested clips are not stored local to the party submitting the query. In order to improve the query response time, traditional solutions tend to replicate data across all the sites. One problem that prevents such over-replication is the overhead associated with ensuring data consistency when information is modified. Within the MMWS application, however, audio clips are large and hence it is not cost effective to replicate them at every site. Furthermore, the audio clips are never modified once stored, and hence consistency is not an issue. Finally, the types of queries on stored audio clips and their frequency of submission are known a priori. This knowledge can be exploited for selective replication, which can improve query response time. Therefore, focusing on the MMWS application, we propose a dynamic placement/replacement strategy which intelligently stores large, read-only audio clips by considering the query workload. We compared the performance of our dynamic strategy with alternative static strategies using a simulation experiment. Our results indicate that the dynamic strategy is promising with regard to query response time.

Multilayered storage and transmission for animated 3D polygonal meshes

K. Selcuk Candan, Michael G. Wagner

Highly detailed geometric models are rapidly becoming commonplace in computer graphics and multimedia. MPEG-4 is introducing standards for 3D polygon meshes. As with video, highly complex polygon meshes challenge not only rendering performance but also transmission bandwidth and storage capacities. Current work focuses only on the compression, progressive refinement, and transmission of static polygon meshes. Geometric content, however, is no longer static; it allows user interaction and complex changes within the geometry. Since this emerging problem has to be addressed in any future multimedia standard, this paper aims to open the research by exploring similarities and differences in the transmission and storage of dynamically changing polygon meshes and geometry compared to other stream data, such as video. We propose techniques for multi-layered storage and transmission of animated geometry in low-bandwidth environments. Our approach extends existing technologies for multi-layered representation of static polygon meshes to dynamic architectures.

Dynamic storage in resource-scarce browsing multimedia applications

Herman Elenbaas, Nevenka Dimitrova

In the convergence of information and entertainment, there is a conflict between the consumer's expectation of fast access to high-quality multimedia content through narrow-bandwidth channels and the size of this content. During retrieval and presentation of information in a multimedia application, two problems have to be solved: the limited bandwidth during transmission of the retrieved multimedia content and the limited memory for temporary caching. In this paper we propose an approach for latency optimization in information browsing applications. We propose a method for flattening hierarchically linked documents in a manner convenient for network transport over slow channels, to minimize browsing latency. Flattening of the hierarchy involves linearization, compression, and bundling of the document nodes. After the transfer, the compressed hierarchy is stored on a local device, where it can be partly unbundled to fit the caching limits at the local site while still giving the user access to the content.

Managing compressed multimedia data in a memory hierarchy: fundamental issues and basic solutions

Jari Veijalainen, Eetu Ojanen

The purpose of the work is to discuss the fundamental issues and solutions in managing compressed and uncompressed multimedia data, especially voluminous continuous media types (video, audio) and text, in a memory hierarchy with four levels (main memory, magnetic disk, (optical or magnetic) on-line/near-line low-speed memory, and slow off-line memory, i.e. archive). We view the multimedia data in such a database as being generated, (compressed), and stored into the memory hierarchy (at the lowest non-archiving level), and subsequently retrieved, (decompressed), and presented. If unused, the data either travels down in the memory hierarchy or is compressed and stored at the same level. We first discuss the general prerequisites of the memory hierarchy, like program locality and the decreasing storage costs and performance of each deeper level. To discuss the issues in greater depth, a schematic four-level memory hierarchy model is presented.

Hierarchical system for content-based audio classification and retrieval

Tong Zhang, C.-C. Jay Kuo

A hierarchical system for audio classification and retrieval based on audio content analysis is presented in this paper. The system consists of three stages. The audio recordings are first classified and segmented into speech, music, several types of environmental sounds, and silence, based on morphological and statistical analysis of temporal curves of the energy function, the average zero-crossing rate, and the fundamental frequency of audio signals. This first stage is called the coarse-level audio classification and segmentation. Then, environmental sounds are classified into finer classes such as applause, rain, and bird sounds, which is called the fine-level audio classification. This second stage is based on time-frequency analysis of audio signals and the use of the hidden Markov model (HMM) for classification. In the third stage, query-by-example audio retrieval is implemented, where similar sounds can be found according to an input sample audio. The way of modeling audio features with the hidden Markov model, the procedures of audio classification and retrieval, and the experimental results are described. It is shown that, with the proposed system, audio recordings can be automatically segmented and classified into basic types in real time with an accuracy higher than 90%. Examples of fine-level audio classification and audio retrieval with the proposed HMM-based method are also provided.
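
Two of the temporal features named in this abstract, the energy function and the zero-crossing rate, can be illustrated with a minimal Python sketch. The frame length, the use of non-overlapping frames, and the function name `frame_features` are assumptions made here for illustration; the abstract does not give the authors' exact definitions or parameters.

```python
def frame_features(samples, frame_len=256):
    """Return a list of (short-time energy, zero-crossing rate) per frame.

    Frames are non-overlapping and frame_len is a hypothetical default;
    both are illustrative choices, not taken from the paper.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((energy, zcr))
    return features
```

Under these definitions, silence yields energy and zero-crossing rate near zero, while noisy or unvoiced segments tend to show a high zero-crossing rate, which is one reason such curves are useful for coarse segmentation.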

Multimedia content customization for universal access

Rakesh Mohan, John R. Smith, Chung-Sheng Li

Content delivery over the Internet, in order to allow universal access, needs to address both the multimedia nature of the content and the capabilities of the diverse client platforms the content is being delivered to. We present a system that tailors multimedia content to optimally match the capabilities of the client device requesting it. This system has three key components: (1) a representation scheme called the InfoPyramid, (2) a set of transcoders for converting modality or resolution, and (3) a customizer that selects the best content representation to meet the client capabilities while delivering the most value.

Data declustering for efficient range and similarity searching

Sunil Prabhakar, Divyakant Agrawal, Amr El Abbadi

Advances in processor and network technologies have catalyzed the growth of data-intensive applications such as image repositories and digital libraries. The lack of commensurate improvements in storage systems has resulted in I/O becoming a major bottleneck in modern systems. The use of parallel I/O from multiple devices is a well-known technique for improving I/O performance. A key factor in exploiting parallel I/O is knowledge of the access pattern: the sets of data items that are likely to be accessed concurrently should be declustered across the disks. Range and nearest-neighbor (similarity) queries are the most important classes of queries for multimedia databases. Declustering schemes tailored for improving the performance of range-only or similarity-only queries have been proposed in the literature. The problem of declustering for combined range and similarity queries has not been addressed in the literature.
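
To make the idea of declustering concrete, the following Python sketch uses the classic Disk Modulo rule, which assigns a grid cell to the disk given by the sum of its coordinates modulo the number of disks. This is a well-known baseline from the declustering literature, not the scheme proposed in this paper; the function names and the 2-D grid are assumptions for illustration.

```python
def disk_modulo(cell, num_disks):
    """Assign a grid cell (tuple of integer coordinates) to a disk.

    Disk Modulo baseline: disk = (sum of coordinates) mod num_disks.
    """
    return sum(cell) % num_disks


def range_query_load(lo, hi, num_disks):
    """Count how many cells each disk serves for a 2-D range query.

    lo and hi are inclusive (row, column) corners of the query rectangle.
    """
    load = [0] * num_disks
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            load[disk_modulo((i, j), num_disks)] += 1
    return load
```

With parallel I/O, the response time of a query is governed by the most heavily loaded disk, so `max(range_query_load(...))` is the quantity a declustering scheme tries to keep small.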

Web object collection: here or there?

Ruth Kurniawati, Jesse S. Jin, John A. Shepherd

Although the advantages of performing the collection of web objects at the site where the objects reside seem obvious, most of the popular search engines use their own centralized or coarsely distributed collectors (robots). Sending each object to the collector is almost always the worst option with respect to resource usage. An alternative is to distribute the collection process by sending the collector to the source site, which has the obvious advantage of distributing the significant computational load involved in cataloguing, as well as giving opportunities for summarizing and compression. In this paper, we propose a system for distributed object cataloguing over the World Wide Web via lightweight collector agents. This approach differs from previous approaches such as Harvest in using small Java-based collectors that can be very easily deployed on the site being indexed, thus allowing much finer-grained distribution of the collection task.

Database technology and the management of multimedia data in the Mirror project

Arjen P. de Vries, H. M. Blanken

Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representations of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participation through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.

Improved region growing method for lossless image compression

Tsong-Wuu Lin

Lossless compression techniques are essential for archiving and transmitting critical data such as medical images. A segmentation-based lossless image coding (SLIC) method has been proposed, which is based on a simple region growing procedure. The embedded region growing procedure produces a discontinuity index map, an error image data part, and a seed data part. Both the error image data and the discontinuity index map data are encoded by the Joint Bi-level Image Experts Group (JBIG) method. The SLIC method achieved, on average, lossless compression to about 1.6 b/pixel from 8 b, and to about 2.9 b/pixel from 10 b. In this paper, we propose an efficient region growing method that improves the speed while preserving lossless compression. Our method takes only about 35% of the execution time required by the original.

Fitting coding scheme for image wavelet representation

Artur Przelaskowski

An efficient coding scheme for image wavelet representation in a lossy compression scheme is presented. The spatial-frequency hierarchical structure of the quantized coefficients and their statistics are analyzed to reduce redundancy. We apply a context-based linear magnitude predictor to fit a first-order conditional probability model, used in arithmetic coding of significant coefficients, to local data characteristics and to eliminate spatial and inter-scale dependencies. Sign information is also encoded by inter- and intra-band prediction and entropy coding of the prediction errors. The main feature of our algorithm, however, is the way zerotree structures are encoded. An additional zerotree-root symbol is included in the magnitude data stream. Moreover, four neighboring zerotree roots with a significant parent node are included in an extended high-order context model of zerotrees. This significant parent is marked as a significant zerotree root, and information about the distribution of these roots is coded separately. The efficiency of the presented coding scheme was tested in a dyadic wavelet decomposition scheme with two quantization procedures: a simple uniform scalar quantizer and a more complex space-frequency quantizer with adaptive data thresholding. The final results seem promising and competitive with the most effective wavelet compression methods.

Image compression based on genealogical relation of the TSVQ indices

Jamshid Shanbehzadeh, Philip O. Ogunbona, Abdolhosain Sarafzadeh

The indices obtained by tree-structured vector quantization (TSVQ) have an interesting property that enables them to give information about the correlation between two image blocks. If two image blocks are highly correlated, they may have an identical index, or the same ancestors. The high inter-block correlation in natural images results in neighboring blocks having the same genealogy. This characteristic can be used to compress the indices. This paper introduces a novel method to exploit the genealogical relation between the image block indices obtained from a TSVQ. The performance of this scheme in terms of PSNR versus average rate was compared with that of some similar image coders. The results show that this scheme has better compression capability, in terms of objective and subjective quality, than these schemes at bit rates below 0.3 bpp.
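
The genealogy property described here can be sketched in Python under one assumption: that the TSVQ codebook is a binary tree and each index is written as the string of branch decisions from the root to a leaf. Two indices then share ancestors exactly as far as their bit strings share a prefix. The function names and the relative coding step are hypothetical illustrations, not the coder proposed in the paper.

```python
def shared_ancestor_depth(index_a, index_b):
    """Number of leading branch decisions two TSVQ path strings share."""
    depth = 0
    for bit_a, bit_b in zip(index_a, index_b):
        if bit_a != bit_b:
            break
        depth += 1
    return depth


def relative_index(previous, current):
    """Describe `current` relative to a neighboring block's index.

    Returns (shared depth, remaining bits). Illustrative only: when
    neighbors share most of their ancestry, few remaining bits are needed.
    """
    depth = shared_ancestor_depth(previous, current)
    return depth, current[depth:]
```

For example, the indices `"0110"` and `"0111"` share three ancestors below the root, so only the final branch decision distinguishes them.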

Semantic search aspects of a MHEG-5-based multimedia object server

Marilde Terezinha Prado Santos, Marina Teresa Pires Vieira, Mauro Biajiz

Research effort is currently being made to identify what is needed to support the representation and exchange of multimedia applications. To address these points, ISO/IEC developed the MHEG-5 international standard. For the SMmD project (Distributed Multimedia System: Support, Structure and Applications), the MHEG-5 standard was adopted as a reference for the representation of multimedia information. As part of this project, an MHEG-5 object server, named SOM-SMmD, was developed.

Vector-based approach to color image retrieval

Dimitrios Androutsos, Konstantinos N. Plataniotis, Anastasios N. Venetsanopoulos

In this paper we present a novel technique for image retrieval based on color. Our system is based on color segmentation where only a small number of representative color vectors are extracted from each image and used to build image indices. These vectors are then used with vector distance measures to determine similarity between a query color and a database image. We test numerous popular vector distance measures in our system, along with popular histogram techniques, and find that angular directional measures using our technique provide more accurate and perceptually relevant retrievals.
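
An angular measure between two color vectors can be sketched in Python as follows. This is the plain angle between RGB vectors, given as a generic illustration; the abstract does not specify which angular measures the authors tested, and the function name and the handling of black (zero-length) vectors are assumptions.

```python
import math


def angular_distance(color_a, color_b):
    """Angle in radians between two color vectors, e.g. (R, G, B) tuples.

    Raises ValueError for a zero vector, which has no direction
    (an illustrative choice, not taken from the paper).
    """
    dot = sum(a * b for a, b in zip(color_a, color_b))
    norm_a = math.sqrt(sum(a * a for a in color_a))
    norm_b = math.sqrt(sum(b * b for b in color_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for a zero-length color vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)
```

Because the angle depends only on direction, a color and a brighter or darker version of it, such as (10, 20, 30) and (20, 40, 60), are at distance close to zero, whereas pure red and pure green are a quarter turn apart.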

Hybrid subband coder

David S. Choi, Sos S. Agaian, Joseph P. Noonan

In this article, we introduce a new subband structure called the hybrid subband coder, which uses both linear and morphological filters at each analysis and synthesis stage. This new subband coder significantly reduces the ringing-effect distortions inherent to the standard linear coders and the smearing-effect distortions associated with the nonlinear, morphological subband coders. The performance of this new structure is compared to the linear and nonlinear subband coder results using a set of four grayscale images. The results show a modest improvement in PSNR and in visual quality.
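
PSNR, the objective quality measure cited in this abstract, can be computed with the standard definition, 10 log10(peak^2 / MSE). The Python sketch below assumes 8-bit grayscale images passed as flat sequences of pixel values of equal length; the function name and the default peak of 255 are illustrative assumptions.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10 * math.log10(peak * peak / mse)
```

Higher values mean the reconstruction is closer to the original; a uniform error of one gray level on an 8-bit image corresponds to roughly 48.13 dB.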

Correlated image set compression system based on new fast efficient algorithm of Karhunen-Loeve transform

Yurij S. Musatenko, Vitalij N. Kurashov

The paper presents an improved version of our new method for compression of correlated image sets, Optimal Image Coding using Karhunen-Loeve transform (OICKL). It is known that the Karhunen-Loeve (KL) transform is the optimal representation for such a purpose. The approach is based on the fact that every KL basis function gives the maximum possible average contribution to every image, and that this contribution decreases most quickly among all possible bases. We therefore lossily compress every KL basis function by Embedded Zerotree Wavelet (EZW) coding, with a loss that differs substantially from function to function and depends on the function's contribution to the images. The paper presents a new fast, low-memory algorithm for KL basis construction for compression of correlated image ensembles, which enables our OICKL system to work on common hardware. We also present a procedure for determining the optimal losses of the KL basis functions caused by compression. It uses a modified EZW coder that produces the whole PSNR (bitrate) curve in a single compression pass.

Fuzzy logic for multiregion growing

Xavier Jove-Boix, Francesc Tarres-Ruiz, Eva Fresco

The paper focuses on region segmentation that is unsupervised and completed in one step. That is, the algorithm has to grow the regions in the feature space at the same time as it performs the segmentation (clustering), and it has only one opportunity to do so. Moreover, the algorithm also uses pixel position in the image. Every pixel in the image is checked only once. The system has two blocks: (1) initialization of a region; (2) growing all regions together. Because we want a one-iteration unsupervised algorithm, we must have several options at each algorithm step. Moreover, we implement a multi-region growing system so that every pixel in the image is checked once. Our work develops a quasi-optimal solution for an unsupervised one-step algorithm based on multi-region growing. Some algorithm details are explained below.

Fractal zooming of thumbnails for progressive image coding

Marcella Ancis, W.-P. Buchwald, Daniel D. Giusto, et al.

Many multimedia applications rely on progressive image transmission to reduce channel bandwidth and to allow for interactivity between users and service providers. In this framework, the paper presents a novel two-source coding scheme in which thumbnail data are interpolated by fractal zooming to obtain higher-resolution images; the relevant residual information is then coded by vector quantizing the wavelet decomposition coefficients. The performance of this scheme is evaluated on a variety of pictures, confirming its validity in both quantitative and qualitative terms (absence of block distortion).

Indexing and retrieval of multimedia objects at different levels of granularity

Pascal Faudemay, Gwenael Durand, Claude Seyrat, et al.

Intelligent access to multimedia databases for 'naive users' should probably be based on query formulation by 'intelligent agents'. These agents should 'understand' the semantics of the contents, learn user preferences, and deliver to the user a subset of the source contents for further navigation. The goal of such systems should be to enable 'zero-command' access to the contents while preserving the user's freedom of choice. Such systems should interpret multimedia contents in terms of multiple audiovisual objects (from video to visual or audio objects), and in terms of actions and scenarios.

Querying and navigating of multimedia objects

Miao Chunyan, Michael Junke Hu

The rapidly growing interest in building multimedia applications has created a need for applying database technology to multimedia systems, to support the efficient access, query, and retrieval of complex multimedia information. In this paper, an Object Relational Multimedia Data Model (ORMD) is proposed for modeling multimedia objects. The ORMD model supports not only various relationships but also hyperlinks among multimedia objects, which facilitates querying and navigating them. Based on this data model, an infrastructure design for a multimedia database system is presented. Multimedia extensions of data types and the query language have been designed to facilitate various queries on multimedia objects. We are currently implementing a prototype, called the Multimedia Information Retrieval (MIR) System, to support efficient querying and navigating of multimedia objects. The development of the MIR System shows the potential for extending it to diverse multimedia applications, such as digital libraries and Internet video servers.

Interactive progressive encoding system for transmission of complex images

Borko Furht, Yingli Wang, Joseph Celli

In this paper, we describe an interactive progressive JPEG-based encoding technique, which is suitable for efficient network transmission of complex images. This system is intended for applications that require fast transmission of complex, high-quality, high-resolution images over networks with limited communication bandwidth. These applications include transmission of medical images, space and earth exploration applications, as well as Internet applications. The proposed technique can also be applied to image archive and browsing systems.

Fast content-based multimedia retrieval technique using compressed data

Borko Furht, Pornvit Saksobhavivat

In this paper, we present a novel technique that can be used for fast similarity-based indexing and retrieval of both image and video databases in distributed environments. We assume that image or video databases are stored in the compressed form using standard techniques such as JPEG for images, and M-JPEG or MPEG for videos. The existing techniques, proposed in the literature, use computationally intensive features and cost functions for content-based image and video retrieval and indexing. The proposed algorithm uses an innovative approach based on histograms of DC coefficients only, and therefore is computationally less expensive than the other approaches.
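
The idea of indexing by histograms of DC coefficients can be sketched in Python as follows. The sketch assumes the DC coefficients of the 8x8 blocks have already been read from the compressed stream, and that they lie in roughly [-1024, 1024), which holds for 8-bit baseline JPEG after level shifting. The bin count, the L1 distance used for comparison, and both function names are illustrative assumptions; the abstract does not give the authors' exact similarity measure.

```python
def dc_histogram(dc_values, num_bins=16, lo=-1024, hi=1024):
    """Normalized histogram of block DC coefficients.

    num_bins, lo and hi are hypothetical defaults chosen for illustration.
    """
    hist = [0] * num_bins
    width = (hi - lo) / num_bins
    for dc in dc_values:
        # Clamp out-of-range values into the first or last bin.
        bin_index = min(max(int((dc - lo) / width), 0), num_bins - 1)
        hist[bin_index] += 1
    total = len(dc_values)
    return [count / total for count in hist] if total else hist


def histogram_distance(hist_a, hist_b):
    """L1 distance between two histograms (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

Since each DC coefficient is proportional to the mean intensity of its block, such a histogram summarizes a coarse version of the image without a full decode, which is where the computational saving comes from.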
