Proceedings Volume 5558

Applications of Digital Image Processing XXVII


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 2 November 2004
Contents: 10 Sessions, 93 Papers, 0 Presentations
Conference: Optical Science and Technology, the SPIE 49th Annual Meeting 2004
Volume Number: 5558

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Applications of Neural Networks and Fuzzy Systems
  • Algorithmic Issues
  • Advanced Applications
  • JPEG 2000: Theory and Applications
  • Multimedia Networking and Architectures
  • Advances in the New Emerging Standard: H.264/AVC I
  • Advances in the New Emerging Standard: H.264/AVC II
  • Imaging and Representation I
  • Imaging and Representation II
  • Poster Session
Applications of Neural Networks and Fuzzy Systems
Dynamic segmentation of gray-scale images in a computer model of the mammalian retina
Garrett T. Kenyon, Neal R. Harvey, Gregory J. Stephens, et al.
Biological studies suggest that neurons in the mammalian retina accomplish a dynamic segmentation of the visual input. When activated by large, high contrast spots, retinal spike trains exhibit high frequency oscillations in the upper gamma band, between 60 and 120 Hz. Despite random phase variations over time, the oscillations recorded from regions responding to the same spot remain phase locked with zero lag, whereas the phases recorded from regions activated by separate spots rapidly become uncorrelated. Here, a model of the mammalian retina is used to explore the segmentation of high contrast, gray-scale images containing several well-separated objects. Frequency spectra were computed from lumped spike trains of 2×2 clusters of neighboring retinal output neurons. Cross-correlation functions were computed between all cell clusters exhibiting significant peaks in the upper gamma band. For each pair of oscillatory cell clusters, the cross-correlation between the lumped spike trains was used to estimate a functional connectivity, given by the peak amplitude in the upper gamma band of the associated frequency spectra. There was a good correspondence between the largest eigenvalues/eigenvectors of the resulting sparse functional connectivity matrix and the individual objects making up the original image, yielding an overall segmentation comparable to that generated by a standard watershed algorithm.
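As a companion to the eigenvector step described above, the following is a minimal numpy sketch, assuming synthetic phase-locked test signals in place of the retinal model's spike trains: pairwise gamma-band cross-spectral peaks form a functional connectivity matrix, and its leading eigenvectors group the cell clusters into objects. The segment-averaging coherence proxy and all signal parameters are illustrative assumptions, not the authors' retina simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, seg_len, n_seg = 1000, 512, 32                  # Hz, samples per segment, segments
t = np.arange(seg_len) / fs
freqs = np.fft.rfftfreq(seg_len, 1 / fs)
band = (freqs >= 60) & (freqs <= 120)               # upper gamma band

# Stand-in for lumped spike trains of 6 cell clusters: clusters 0-2 follow one
# 80 Hz oscillation (object A), clusters 3-5 another (object B).  Each object's
# phase drifts randomly between segments, but clusters of the same object stay
# phase locked with zero lag.
k, amp = 6, (1.0, 0.7)
segs = np.zeros((k, n_seg, seg_len))
for s in range(n_seg):
    ph = rng.uniform(0, 2 * np.pi, 2)               # independent phase per object
    for i in range(k):
        obj = 0 if i < 3 else 1
        segs[i, s] = amp[obj] * np.sin(2 * np.pi * 80 * t + ph[obj]) \
                     + 0.3 * rng.standard_normal(seg_len)

# Functional connectivity: gamma-band peak of the segment-averaged cross-spectrum
# (phase-locked pairs add coherently across segments, unrelated pairs cancel).
spec = np.fft.rfft(segs, axis=-1)                   # (k, n_seg, n_freq)
conn = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        cross = (spec[i] * np.conj(spec[j])).mean(axis=0)
        conn[i, j] = conn[j, i] = np.abs(cross[band]).max()

# The largest eigenvectors of the sparse connectivity matrix pick out the objects;
# each cluster is labelled by the eigenvector it loads on most strongly.
w, v = np.linalg.eigh(conn)
labels = np.argmax(np.abs(v[:, -2:]), axis=1)
print(labels)                                       # same-object clusters share a label
```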
A neural network model of optical gyros drift errors with application to vehicular navigation
Inertial navigation systems (INS) incorporating three mutually orthogonal accelerometers and three mutually orthogonal gyroscopes are integrated with global positioning systems (GPS) to provide reliable and accurate positioning information for vehicular navigation. Because of their high reliability and accuracy, ring laser gyroscopes (RLG) and fiber optic gyroscopes (FOG) are usually utilized inside most present INS. However, bias drift at the output of these optical gyroscopes may deteriorate the performance of the overall INS/GPS navigation system. This paper introduces a method to enhance the performance of optical gyros in two phases. The first phase utilizes wavelet multi-resolution analysis to band-limit the gyro measurement and improve its signal-to-noise ratio. The second phase employs radial-basis function (RBF) neural networks to predict drift errors. The drift model provided by the RBF network is established using the gyro raw measurement and time as inputs and provides the drift error at its output. RBF neural networks are used in this study because they generally have simpler architectures and faster training procedures than other neural network types. The proposed method is applied to the E-core 2000 FOG (KVH Industries Inc., Rhode Island, USA).
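The sketch below illustrates the second phase in the spirit of the description above: a small Gaussian radial-basis-function regressor mapping (time, band-limited measurement) to drift error, fitted by ridge-regularised least squares. The synthetic drift signal, the moving-average stand-in for the wavelet pre-filtering, the kernel width, and the use of a known reference drift (as would be available from a static calibration run) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a static FOG record: slowly varying bias drift plus noise.
t = np.linspace(0.0, 600.0, 3000)                        # seconds
true_drift = 0.02 * np.sin(2 * np.pi * t / 300) + 1e-4 * t
raw = true_drift + 0.05 * rng.standard_normal(t.size)    # measured rate (deg/s)

# Phase 1 stand-in: band-limit the raw measurement (the paper uses wavelet
# multi-resolution analysis; a moving average keeps this sketch self-contained).
win = 25
smooth = np.convolve(raw, np.ones(win) / win, mode="same")

# Phase 2: Gaussian RBF model with inputs (time, band-limited measurement).
X = np.column_stack([t, smooth])
X = (X - X.mean(0)) / X.std(0)                           # normalise the inputs
centers = X[:: X.shape[0] // 40]                         # ~40 centres along the record
width, lam = 1.0, 1e-3                                   # kernel width, ridge term (assumed)

def design(Xin):
    d2 = ((Xin[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

Phi = design(X)
# In a static calibration test the reference drift is known (true rate is zero);
# here the synthetic true_drift plays that role.
wgt = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ true_drift)

predicted = design(X) @ wgt
corrected = raw - predicted                              # drift-compensated output
rmse = np.sqrt(np.mean((predicted - true_drift) ** 2))
print("drift prediction RMS error (deg/s):", round(float(rmse), 4))
```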
Application of genetic algorithms to processing of reflectance spectra of semiconductor compounds
Ivan S. Zaharov, Alexey V. Kochura, Alexandr Y. Kurkin, et al.
The basic task in the mathematical processing of reflectance spectra is the calculation of the dielectric function of the substance, which describes the response of a crystal to an external electromagnetic field. The most modern and promising approach to this task is dispersion analysis (DA) based on the Lorentz model (LM). However, the LM requires a large amount of computation to select the optimum phonon parameters. Although rapid advances in computing power help to overcome this difficulty, DA of composite reflectance spectra remains practically impossible without effective optimization methods. The efficiency of a genetic algorithm (GA) depends strongly on details such as the solution coding method, the implementation of the genetic operations, the selection mechanism, the tuning of the other algorithm parameters, and the success criterion. In this paper we propose a modified GA for the reflectance spectra processing problem and present results obtained with the resulting algorithm.
Genetic algorithms for geometry optimization in lithographic imaging systems
This paper illustrates the use of genetic algorithms (GA) in optimizing mask and illumination source geometries for lithographic imaging systems. The main goal of the proposed optimization process is to find optimum conditions for the generation of certain features like lines and spaces patterns or arrays of contact holes by optical projection lithography. Therefore, different optical resolution enhancement techniques, such as optical proximity correction (OPC) by sub-resolution assists, phase shift masks, and off-axis illumination techniques are combined and mutually optimized. This paper focuses on improving both the genetic algorithm's settings and the representation of the mask and source geometries. It is shown that these two issues have a significant impact on the convergence behavior of the GA. Different representation types for the mask and source geometry are introduced, and their advantages and problems are discussed. One of the most critical tasks in formulating the optimization problem is to set up an appropriate fitness function. In our case, the fitness function consists of five sub-functions, which ensure valid geometries, correct feature dimensions, a stable process for different defocus settings, the mask's manufacturability and inspectability, and that no other features besides the specified target are printed. In order to obtain a stable and fast convergence these criteria have to be assessed. Different weight settings are introduced and their impact on the convergence behavior is discussed. Several results show the potential of the proposed approach and directions for further improvements.
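To make the fitness construction above concrete, here is a brief sketch, assuming placeholder sub-scores: five weighted sub-functions (geometry validity, feature dimension, defocus stability, manufacturability, and absence of extra printing) combined into the single scalar a GA would maximise. The sub-functions and weights are illustrative stand-ins; in the real system each would evaluate a simulated aerial image of the candidate mask/source geometry.

```python
import numpy as np

# Placeholder sub-scores in [0, 1]; in the real system each would evaluate a
# simulated aerial image of the candidate mask/source geometry.
def valid_geometry(ind):       return 1.0 if ind["min_feature"] >= 40 else 0.0
def feature_dimension(ind):    return np.exp(-abs(ind["cd"] - ind["target_cd"]) / 5.0)
def defocus_stability(ind):    return np.exp(-np.std(ind["cd_through_focus"]) / 3.0)
def manufacturability(ind):    return 1.0 / (1.0 + ind["assist_count"] / 10.0)
def no_extra_printing(ind):    return 0.0 if ind["sidelobe_prints"] else 1.0

WEIGHTS = {                     # weight settings to be tuned, as discussed in the paper
    valid_geometry: 4.0,
    feature_dimension: 2.0,
    defocus_stability: 2.0,
    manufacturability: 1.0,
    no_extra_printing: 3.0,
}

def fitness(individual):
    """Weighted sum of the five sub-functions; the GA maximises this value."""
    return sum(w * f(individual) for f, w in WEIGHTS.items())

candidate = {"min_feature": 45, "cd": 92.0, "target_cd": 90.0,
             "cd_through_focus": [88.0, 90.0, 91.5], "assist_count": 4,
             "sidelobe_prints": False}
print(fitness(candidate))
```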
A learning-based codebook design for vector quantization using evolution strategies
This paper presents a learning-based codebook design algorithm for vector quantization of digital images using evolution strategies (ES). The technique embeds evolution strategies into the standard competitive learning vector quantization algorithm (CLVQ) and efficiently overcomes its problems of neuron under-utilization and initial-codebook dependency. Embedding ES greatly increases the algorithm's ability to avoid local minima, leading toward global optimization. Experimental results demonstrate significant improvement over CLVQ and other comparable algorithms in image compression applications. In comparison with the FSLVQ and KSOM algorithms, the new technique is computationally more efficient and requires less training time.
Soft computing and small system integration
In this paper, we discuss two important topics: the first, soft computing, is directly related to digital image processing, while the second, Small Systems Integration (SSI), is a broad-range extension of the first, dealing with interdisciplinary subjects such as optics, electronics, mechanics, materials, and heat management.
Algorithmic Issues
Reliable band-to-band registration of Multispectral Thermal Imager data using multivariate mutual information and cyclic consistency
In multispectral imaging, automated cross-spectral (band-to-band) image registration is difficult to achieve with a reliability approaching 100%. This is particularly true when registering infrared to visible imagery, where contrast reversals are common and similarity is often lacking. Algorithms that use mutual information as a similarity measure have been shown to work well in the presence of contrast reversal. However, weak similarity between the long-wave infrared (LWIR) bands and shorter wavelengths remains a problem. A method is presented in this paper for registering multiple images simultaneously rather than one pair at a time using a multivariate extension of the mutual information measure. This approach improves the success rate of automated registration by making use of the information available in multiple images rather than a single pair. This approach is further enhanced by including a cyclic consistency check, for example registering band A to B, B to C, and C to A. The cyclic consistency check provides an automated measure of success allowing a different combination of bands to be used in the event of a failure. Experiments were conducted using imagery from the Department of Energy’s Multispectral Thermal Imager satellite. The results show a significantly improved success rate.
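A minimal sketch of the cyclic consistency check described above, assuming the pairwise registration step returns a translational offset per band pair: composing the A→B, B→C, and C→A estimates should return (nearly) to the origin, and the size of the residual serves as the automated success measure.

```python
import numpy as np

def cyclic_residual(t_ab, t_bc, t_ca):
    """Return the length of the composed shift A->B->C->A.

    For pure translations the cycle should close exactly; a large residual
    flags a failed pairwise registration so another band combination can be
    tried.  (t_xy is the estimated (row, col) shift taking band X onto Y.)
    """
    closure = np.asarray(t_ab) + np.asarray(t_bc) + np.asarray(t_ca)
    return float(np.hypot(*closure))

# Example: one consistent cycle and one inconsistent cycle (values are illustrative).
good = cyclic_residual((1.2, -0.5), (0.3, 2.1), (-1.5, -1.6))   # ~0 pixels
bad  = cyclic_residual((1.2, -0.5), (0.3, 2.1), (-4.0,  3.0))   # several pixels
TOLERANCE = 0.5                                                  # pixels (assumed)
print(good < TOLERANCE, bad < TOLERANCE)                         # True False
```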
An algebraic restoration method for estimating fixed-pattern noise in infrared imagery from a video sequence
The inherent nonuniformity in the photoresponse and readout circuitry of the individual detectors in infrared focal-plane-array imagers results in the notorious fixed-pattern noise (FPN). FPN generally degrades the performance of infrared imagers and is particularly problematic in the mid-wavelength and long-wavelength infrared regimes. In many applications, employing signal-processing techniques to combat FPN may be preferred over hard calibration (e.g., two-point calibration), as they are less expensive and, more importantly, do not require halting the operation of the camera. In this paper, a new technique that uses knowledge of global motion in a video sequence to restore the true scene in the presence of FPN is introduced. In the proposed setting, the entire video sequence is regarded as the output of a motion-dependent linear transformation, which acts collectively on the true scene and the unknown bias elements (which represent the FPN) in each detector. The true scene is then estimated from the video sequence according to a minimum mean-square-error criterion. Two modes of operation are considered. First, we consider non-radiometric restoration, in which case the true scene is estimated by performing a regularized minimization, since the problem is ill-posed. The other mode of operation is radiometric, in which case we assume that only the perimeter detectors have been calibrated. This latter mode does not require regularization and therefore avoids compromising the radiometric accuracy of the restored scene. The algorithm is demonstrated through preliminary results from simulated and real infrared imagery.
Efficient computing architectures for processing digital image data
Recursive partitioned architectures in the spatial domain and block-based discrete transform domain algorithms are developed to retrieve information from digital image data. In the former case, the data is represented in forms which facilitate efficient optimization of the objective criterion. In the latter case, discrete, real, and circular Fourier transforms of the data blocks are filtered by coefficients chosen in the discrete frequency domain. This has led to improvements in the feature detection and localization process. In this paper, recursive and iterative approaches are applied to achieve the restoration of digital images from linearly degraded samples. In addition, block-based algorithms are employed to segment images.
Correlation-based rotational signature of planar binary objects
In this paper we introduce the affine spatial overlap operation, defined as a generalization of the two-dimensional convolution and cross-correlation operations. Our attention is focused on the rotational overlap operation, some of its mathematical properties, and its application to shape description. Based on the auto-rotational overlap operation, a one-dimensional signature is proposed as a shape representation for planar binary objects with bounded support. We provide illustrative examples of its digital computation using different binary objects. In addition, physical realization of the 2-D rotational overlap operation is demonstrated with a hybrid optical-digital system for real-time processing. The experimental setup uses a microcomputer-controlled high-precision rotary stage for performing analog rotations at the input plane of an incoherent two-lens correlator architecture.
Pattern recognition based on rank correlations
Adaptive nonlinear filters based on nonparametric Spearman’s correlation between ranks of an input scene computed in a moving window and ranks of a target for illumination-invariant pattern recognition are proposed. Several properties of the correlations are investigated. Their performance for detection of noisy objects is compared to the conventional linear correlation in terms of noise robustness and discrimination capability. Computer simulation results for a test image corrupted by mixed additive and impulsive noise are provided and discussed.
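The following is a small reference sketch of rank-correlation matching in the spirit of the abstract: Spearman's correlation between the ranks of a target and each moving-window patch of the scene, computed here with scipy's spearmanr in a plain exhaustive scan. It is not the authors' adaptive filter; the test image and the gain/offset change are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_correlation_map(scene, target):
    """Spearman correlation between the target and every window of the scene."""
    th, tw = target.shape
    sh, sw = scene.shape
    out = np.full((sh - th + 1, sw - tw + 1), -1.0)
    tflat = target.ravel()
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = scene[r:r + th, c:c + tw].ravel()
            rho, _ = spearmanr(window, tflat)
            out[r, c] = rho
    return out

rng = np.random.default_rng(2)
target = rng.random((8, 8))
scene = rng.random((64, 64))
scene[20:28, 30:38] = 0.3 + 0.6 * target          # embedded, illumination-changed copy
corr = rank_correlation_map(scene, target)
peak = np.unravel_index(np.argmax(corr), corr.shape)
print(peak)                                       # (20, 30): rank correlation is
                                                  # invariant to the gain/offset change
```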
Multiple-description video coding through adaptive segmentation
Multiple description video coding is one method that can be used to reduce detrimental effects caused by transmission over lossy packet networks. In a multiple description system, a video sequence is segmented into two or more complementary streams in such a way that each stream is independently decodable. When combined, the streams provide the highest level of quality, yet if one of the streams is lost or delivered late the video can be played out with only a slight reduction in overall quality. Each approach to multiple description coding consists of a tradeoff between compression efficiency and robustness. How efficiently each method achieves this tradeoff depends on the level of quality and robustness desired and on the characteristics of the video itself. Previous approaches to multiple description coding have made the assumption that a single segmentation method would be used for an entire sequence. Yet, the optimal method of segmentation can vary depending on the goals of the system, it can change over time, and it can vary within a frame. This work introduces a unique approach to multiple description coding through the use of adaptive segmentation. By selecting from a set of segmentation methods, the system adapts to the local characteristics of the video and maximizes tradeoff efficiency. We present an overview of this system and analyze its performance on real video sequences.
Complexity scalable motion-compensated temporal filtering
Tom Clerckx, Fabio Verdicchio, Adrian Munteanu, et al.
Computer networks and the internet have taken an important role in modern society. Together with their development, the need for digital video transmission over these networks has grown. To cope with the user demands and limitations of the network, compression of the video material has become an important issue. Additionally, many video-applications require flexibility in terms of scalability and complexity (e.g. HD/SD-TV, video-surveillance). Current ITU-T and ISO/IEC video compression standards (MPEG-x, H.26-x) lack efficient support for these types of scalability. Wavelet-based compression techniques have been proposed to tackle this problem, of which the Motion Compensated Temporal Filtering (MCTF)-based architectures couple state-of-the-art performance with full (quality, resolution, and frame-rate) scalability. However, a significant drawback of these architectures is their high complexity. The computational and memory complexity of both spatial domain (SD) MCTF and in-band (IB) MCTF video codec instantiations are examined in this study. Comparisons in terms of complexity versus performance are presented for both types of codecs. The paper indicates how complexity scalability can be achieved in such video-codecs, and analyses some of the trade-offs between complexity and coding performance. Finally, guidelines on how to implement a fully scalable video-codec that incorporates quality, temporal, resolution and complexity scalability are proposed.
Generalized and optimized classification framework for textural imagery
A wide range of image processing studies are based on the extraction of texture features, the analysis of input data, and the identification and design of appropriate classifiers for a particular application, for instance in the fields of industrial inspection, remote sensing, medicine, or biology, amongst others. In this paper, we introduce a novel generalized classification framework for texture imagery based on a building-blocks system architecture and present the advantages of such a system for tackling a variety of image analysis problems while obtaining good classification performance. First, an overview of the system architecture is described, from the texture feature extraction module to the data analysis and classification building blocks. The result is an optimized and generic classification framework that is highly flexible, thanks to its scalable building-blocks approach, and that makes it easy to extend the study from textural images to other kinds of imagery. The results of this generalized classification framework are validated using imagery from two application fields in which texture plays a key role: remote sensing for agricultural crop classification, and non-destructive industrial inspection.
Advanced Applications
New applications for mathematical morphology in urban feature extraction from high-resolution satellite imagery
Xiaoying Jin, Curt H. Davis
Recently available commercial high-resolution satellite imaging sensors provide an important source for urban remote sensing applications. The high spatial image resolution reveals very fine details in urban areas and greatly facilitates the extraction of urban-related features such as roads, buildings, and vehicles. Since many urban land cover types have significant spectral overlap, structural information obtained using mathematical morphologic operators can provide complementary information to improve discrimination of different urban features. Here we present research demonstrating new applications of mathematical morphology for urban feature extraction from high-resolution satellite imagery. For image preprocessing, an alternating sequential filter is used to eliminate small spatial-scale disturbances to facilitate the extraction of larger-scale structures. For road extraction, directional morphological filtering is exploited to mask out those structures shorter than the distance of a typical city block. For building extraction, a recently introduced concept called the differential morphological profile (DMP) is used to generate building and shadow hypotheses. For vehicle detection, a morphological shared-weight neural network is used to classify image pixels on roads into target and non-target. Thus, mathematical morphology has a wide variety of useful applications for urban feature extraction from high-resolution satellite imagery.
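Two of the morphological building blocks mentioned above can be sketched with scipy.ndimage as follows: an alternating sequential filter for pre-processing and a differential morphological profile built from openings of increasing size. The structuring-element sizes and the toy scene are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy import ndimage

def alternating_sequential_filter(img, max_size=7):
    """Opening followed by closing with structuring elements of growing size."""
    out = img.astype(float)
    for size in range(3, max_size + 1, 2):
        out = ndimage.grey_opening(out, size=(size, size))
        out = ndimage.grey_closing(out, size=(size, size))
    return out

def differential_morphological_profile(img, sizes=(3, 7, 11, 15)):
    """Stack of differences between openings at successive scales.

    Bright structures (e.g. building roofs) respond strongly at the scale where
    the opening removes them; the DMP records that response per pixel.
    """
    profile = [img.astype(float)]
    profile += [ndimage.grey_opening(img.astype(float), size=(s, s)) for s in sizes]
    return np.stack([profile[i] - profile[i + 1] for i in range(len(sizes))])

rng = np.random.default_rng(3)
scene = rng.normal(100, 5, (128, 128))
scene[40:50, 50:62] += 60                 # a bright 10x12 "building"
filtered = alternating_sequential_filter(scene)
dmp = differential_morphological_profile(filtered)
print(dmp.shape, dmp.argmax(axis=0)[45, 55])   # the roof responds between the
                                               # 7- and 11-pixel openings
```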
Detection and delineation of buildings from airborne ladar measurements
Yoram Swirski, Karni Wolowelsky, Renen Adar, et al.
Automatic delineation of buildings is very attractive for both civilian and military applications. Such applications include general mapping, detection of unauthorized constructions, change detection, etc. For military applications, high demand exists for accurate building change updates, covering large areas, and over short time periods. We present two algorithms coupled together. The height image algorithm is a fast coarse algorithm operating on large areas. This algorithm is capable of defining blocks of buildings and regions of interest. The point-cloud algorithm is a fine, 3D-based, accurate algorithm for building delineation. Since buildings may be separated by alleys, whose width is similar or narrower than the LADAR resolution, the height image algorithm marks those crowded buildings as a single object. The point-cloud algorithm separates and accurately delineates individual building boundaries and building sub-sections utilizing roof shape analysis in 3D. Our focus is on the ability to cover large areas with accuracy and high rejection of non-building objects, like trees. We report a very good detection performance with only few misses and false alarms. It is believed that LADAR measurements, coupled with good segmentation algorithms, may replace older systems and methods that require considerable manual work for such applications.
A methodology for determining the resolvability of multiple vehicle occlusion in a monocular traffic image sequence
Clement Chun Cheong Pang, Nelson Hon Ching Yung
This paper proposes a knowledge-based methodology for determining the resolvability of N occluded vehicles seen in a monocular image sequence. The resolvability of each vehicle is determined by, firstly, deriving the relationship between the camera position and the number of vertices of a projected cuboid in the image; secondly, finding the direction of the edges of the projected cuboid in the image; and thirdly, modeling the maximum number of occluded cuboid edges beyond which the occluded cuboid is irresolvable. The proposed methodology has been tested rigorously on a number of real-world monocular traffic image sequences that involve multiple vehicle occlusions, and it successfully determines the number of occluded vehicles as well as the resolvability of each vehicle. We believe the proposed methodology will form the foundation for more accurate traffic flow estimation and recognition systems.
Wavelet-based feature indices as a data mining tool for hyperspectral imagery exploitation
Advances in hyperspectral sensor technology increasingly provide higher resolution and higher quality data for the accurate generation of terrain categorization/classification (TERCAT) maps. The generation of TERCAT maps from hyperspectral imagery can be accomplished using a variety of spectral pattern analysis algorithms; however, the algorithms are sometimes complex, and the training of such algorithms can be tedious. Further, hyperspectral imagery contains a voluminous amount of data with contiguous spectral bands being highly correlated. These highly correlated bands tend to provide redundant information for classification/feature extraction computations. In this paper, we introduce the use of wavelets to generate a set of Generalized Difference Feature Index (GDFI) measures, which transforms a hyperspectral image cube into a derived set of GDFI bands. A commonly known special case of the proposed GDFI approach is the Normalized Difference Vegetation Index (NDVI) measure, which seeks to emphasize vegetation in a scene. Numerous other band-ratio measures that emphasize other specific ground features can be shown to be a special case of the proposed GDFI approach. Generating a set of GDFI bands is fast and simple. However, the number of possible bands is capacious and only a few of these “generalized ratios” will be useful. Judicious data mining of the large set of GDFI bands produces a small subset of GDFI bands designed to extract specific TERCAT features. We extract/classify several terrain features and we compare our results with the results of a more sophisticated neural network feature extraction routine.
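A compact numpy sketch of the band-ratio idea underlying the GDFI family is shown below, using plain normalized differences between band pairs rather than the wavelet-based construction developed in the paper; NDVI appears as the special case of a near-infrared and a red band. The cube and band positions are placeholders.

```python
import numpy as np

def normalized_difference(cube, i, j, eps=1e-9):
    """(band_i - band_j) / (band_i + band_j) for a (bands, rows, cols) cube."""
    bi, bj = cube[i].astype(float), cube[j].astype(float)
    return (bi - bj) / (bi + bj + eps)

def all_pairwise_indices(cube):
    """Generate every normalized-difference band pair -- the 'capacious' set
    from which data mining would select a small TERCAT-specific subset."""
    nb = cube.shape[0]
    for i in range(nb):
        for j in range(i + 1, nb):
            yield (i, j), normalized_difference(cube, i, j)

rng = np.random.default_rng(4)
cube = rng.random((32, 100, 100))              # toy 32-band image cube
RED, NIR = 10, 25                              # placeholder band positions
ndvi = normalized_difference(cube, NIR, RED)   # classical NDVI special case
n_pairs = sum(1 for _ in all_pairwise_indices(cube))
print(ndvi.shape, n_pairs)                     # (100, 100) 496
```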
Combining message-passing and inter-process communication in SMP-hybrid cluster for efficient parallel medical image analysis
Sean Chiew Seng Tan, Bertil Schmidt
Efficient analysis of medical images to assist physician’s decision making is an important task. However, the analysis of such images often requires sophisticated segmentation and classification algorithms. An approach to speed up these time consuming operations is to use parallel processing. In this paper a new parallel system for medical image analysis is presented. The system combines distributed and shared memory architectures using MPI and the inter-process communication switching mechanism (IPC). MPI is used to communicate between nodes and shared-memory IPC is used to perform shared memory operations among processors within a node. We show how to map a clinical endoscopic image analysis algorithm efficiently onto this architecture. This results in an implementation with significant runtime savings.
Adaptive 3D target tracking and surveillance scheme based on pan/tilt-embedded stereo camera system
Jung-Hwan Ko, Jun-Ho Lee, Eun-Soo Kim
In this paper, a new intelligent surveillance system for robust detection and tracking of a moving person using a pan/tilt-embedded stereo camera system is proposed and implemented. In the proposed system, the face coordinates of a target person are detected from the sequential input stereo image pairs using the YCbCr color model and phase-type correlation methods; then, using these data together with the geometric information of the stereo tracking system, the distance from the stereo camera to the target and the 3-dimensional location of the target person are extracted. Based on these extracted data, the pan/tilt system embedded in the stereo camera is controlled to adaptively track the moving person, and as a result the moving trajectory of the target person can be obtained. Experiments using 780 frames of sequential stereo image pairs show that the standard deviation of the target's position displacement in the horizontal and vertical directions after tracking remains very low, 1.5 and 0.42 on average over the 780 frames, and that the error ratio between the measured and computed 3D coordinate values of the target is also very low, 0.5% on average. These experimental results suggest that a stereo target tracking system with a high degree of accuracy and a very fast response time can be implemented with the proposed algorithm.
Comparison of registration techniques for speckle suppression in 2D ladar image sequences
Adam MacDonald, Ernest Armstrong, Stephen C. Cain
Registration of individual images remains a significant problem in the generation of accurate images collected using coherent imaging systems. An investigation of the performance of eight distinct image registration algorithms was conducted using data collected from a coherent optical imaging system developed by the Air Force Research Laboratories, Sensors Division, AFRL/SNJT. A total of 400 images of three distinct scenes were collected by AFRL/SNJT and made available to the Air Force Institute of Technology (AFIT) for this study. Scenes of wheeled vehicles supporting resolution and uniform target boards were collected at ranges of 3 and 10 kilometers. The algorithms under study were developed by scientists and engineers at AFRL and had varying levels of performance in terms of image mis-registration and execution time. These eight algorithms were implemented on a general-purpose computer running the MATLAB simulation environment. The algorithms compared included block-match, cross-correlation, cross-search, directional-search, gradient-based, hierarchical-block, three-step, and vector-block methods. It was found that the cross-correlation, gradient-based, and vector-block search techniques typically had the lowest error metric. The vector-block and cross-correlation methods proved to have the fastest execution times, while not suffering significant error degradation when estimating the registration shift of the test images.
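For reference, a minimal FFT-based sketch of the cross-correlation registration approach that fared well in this comparison: it estimates the whole-pixel shift between two frames from the peak of their circular cross-correlation. This is a generic implementation for illustration, not the AFRL code evaluated in the paper.

```python
import numpy as np

def crosscorr_shift(ref, img):
    """Integer (row, col) shift to apply to img (e.g. with np.roll) to align it to ref."""
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    dims = np.array(ref.shape)
    shift = np.array(peak)
    wrap = shift > dims // 2                 # map large positive lags to negative ones
    shift[wrap] -= dims[wrap]
    return tuple(int(s) for s in shift)

rng = np.random.default_rng(5)
ref = rng.random((128, 128))
img = np.roll(ref, (5, -3), axis=(0, 1)) + 0.1 * rng.standard_normal((128, 128))
print(crosscorr_shift(ref, img))             # (-5, 3): rolling img by this undoes the offset
```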
Deconvolution of laser pulse profiles from 3D ladar temporal returns
3-D imaging LADAR systems that are capable of rapid frame acquisition may suffer from a loss of range resolution due to the duration of the pulse transmitted to the target. Because of the tradeoff between the requirement to produce sufficient illumination and the desire to obtain high range resolution, these systems may sacrifice range resolution in favor of improved signal to noise ratio of the detected signal. In this paper, deconvolution techniques are employed in order to obtain improved range resolution from a sequence of laser radar return images collected at extremely high speeds. The study pursued in this paper explores the degree to which range resolution can be improved in the presence of photon and speckle noise. Noise amplification in the deconvolution process serves to degrade the signal to noise ratio of the reconstructed laser radar return images. The performance of the reconstruction algorithm is quantified through the estimation of the probability of detection. It will be shown that both the probability of detection and range resolution can be improved in imaging laser radar systems.
PCI bus content-addressable-memory (CAM) implementation on FPGA for pattern recognition/image retrieval in a distributed environment
Dalila B. Megherbi, Yin Yan, Parikh Tanmay, et al.
Surveillance and automatic target recognition (ATR) applications are increasing as the cost of the computing power needed to process the massive amount of information continues to fall. This computing power has been made possible in part by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-art electro-optical imaging systems that provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g., telescopes, precise optics, cameras, and image/computer vision algorithms, which can be geographically distributed or share distributed resources) into programmable systems and DSP systems. Additionally, pattern recognition techniques and fast information retrieval are often important components of intelligent systems. The aim of this work is to use an embedded FPGA as a fast, configurable, and synthesizable search engine for fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and demonstrate a low-cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture with real-time recognition and computing capabilities for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers an order-of-magnitude performance advantage over a RAM-based (Random Access Memory) search architecture for implementing high-speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM-based embedded architecture are described here. Other SOPC solutions and design issues are also covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low-cost reconfigurable fast image pattern recognition/retrieval at the hardware/software co-design level.
Development of a diagnostic system for bilirubin detection in cerebral spinal fluid
Prashant R. Bhadri, Vasant Anil Salgaonkar, Anindya Majumdar, et al.
A weakened portion of an artery in the brain leads to a medical condition known as a cerebral aneurysm. A subarachnoid hemorrhage (SAH) occurs when an aneurysm ruptures. For those individuals suspected of having a SAH, a computerized tomography (CT) scan of the brain usually demonstrates evidence of the bleeding. However, in a considerable portion of people, the CT scan is unable to detect the blood that has escaped from the blood vessel. Recent studies have indicated nearly 30% of patients with a SAH are initially misdiagnosed. For circumstances when a SAH is suspected despite a normal CT scan, physicians make the diagnosis of SAH by performing a spinal tap. A spinal tap uses a needle to sample the cerebrospinal fluid (CSF) collected from the patient’s lumbar spine. However, it is also possible for blood to be introduced into the CSF as a result of the spinal tap procedure. Therefore, an effective solution is required to help medical personnel differentiate between the blood that results from a tap and that from a ruptured aneurysm. In this paper, the development of a prototype is described which is sensitive and specific for measuring bilirubin in CSF, hemorrhagic-CSF and CSF-like solutions. To develop this instrument a combination of spectrophotometric analysis, custom data analysis software and other hardware interfaces are assembled that lay the foundation for the development of portable and user-friendly equipment suitable for assisting trained medical personnel with the diagnosis of a ruptured cerebral aneurysm.
Photon transfer methods and results for electron multiplication CCDs
Optical systems designed for some defense, environmental, and commercial remote-sensing applications must simultaneously have a high dynamic range, high sensitivity, and low noise-equivalent contrast. We have adapted James Janesick’s photon transfer technique for characterizing the noise performance of an electron multiplication CCD (EMCCD), and we have developed methods for characterizing performance parameters in a lab environment. We have defined a new figure of merit to complement the traditionally used dynamic range that quantifies the usefulness of EMCCD imagers. We use the results for EMCCDs to predict their performance with hyperspectral and multispectral imaging systems.
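A small sketch of the core photon transfer computation that the paper adapts to EMCCDs: conversion gain and read noise estimated from pairs of flat-field and dark frames. The synthetic camera parameters are assumptions, and EMCCD multiplication (excess) noise is deliberately not modelled here.

```python
import numpy as np

def photon_transfer_point(flat1, flat2, dark1, dark2):
    """Conversion gain K (e-/DN) and read noise (DN) from two flats and two darks.

    Differencing two frames at the same exposure removes fixed-pattern noise;
    the variance of the difference is twice the temporal noise variance.
    """
    signal = 0.5 * (flat1.mean() + flat2.mean()) - 0.5 * (dark1.mean() + dark2.mean())
    shot_var = np.var(flat1 - flat2) / 2.0 - np.var(dark1 - dark2) / 2.0
    read_var = np.var(dark1 - dark2) / 2.0
    gain = signal / shot_var                 # e-/DN for a shot-noise-limited flat
    return gain, np.sqrt(read_var)

rng = np.random.default_rng(6)
K_true, electrons, read_e = 2.0, 20000.0, 10.0      # assumed camera parameters
def frame(mean_e):                                   # Poisson shot noise + read noise, in DN
    e = rng.poisson(mean_e, (512, 512)) + rng.normal(0, read_e, (512, 512))
    return e / K_true
gain, read_dn = photon_transfer_point(frame(electrons), frame(electrons),
                                      frame(0.0), frame(0.0))
print(round(float(gain), 2), round(float(read_dn), 2))   # ~2.0 e-/DN, ~5 DN
```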
JPEG 2000: Theory and Applications
Optimal rate allocation for joint compression and classification in JPEG 2000
We present a framework for optimal rate allocation to image subbands to minimize the distortion in the joint compression and classification of JPEG2000-compressed images. The distortion due to compression is defined as a weighted linear combination of the mean-square error (MSE) and the loss in the Bhattacharyya distance (BD) between the class-conditional distributions of the classes. Lossy compression with JPEG2000 is accomplished via deadzone uniform quantization of wavelet subbands. Neglecting the effect of the deadzone, expressions are derived for the distortion in the case of two classes with generalized Gaussian distributions (GGDs), based on the high-rate analysis of Poor. In this regime, the distortion function takes the form of a weighted MSE (WMSE) function, which can be minimized using reverse water-filling. We present experimental results based on synthetic data to evaluate the efficacy of the proposed rate allocation scheme. The results indicate that by varying the weight factor balancing the MSE and the Bhattacharyya distance, we can control the trade-off between these two terms in the distortion function.
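A short sketch of reverse water-filling, the high-rate allocation mechanism referred to above: subbands whose weighted variance exceeds the water level theta receive 0.5*log2(weighted variance / theta) bits, the rest receive none, and theta is found by bisection to meet the total budget. The variances and budget below are placeholders.

```python
import numpy as np

def reverse_waterfill(weighted_vars, total_rate, iters=60):
    """Allocate bits/sample to subbands to minimise a weighted MSE.

    Subbands whose (weighted) variance falls below the water level theta get
    zero rate; the rest get 0.5 * log2(var / theta).  Bisection on theta.
    """
    lo, hi = 1e-12, max(weighted_vars)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        rates = np.maximum(0.0, 0.5 * np.log2(np.asarray(weighted_vars) / theta))
        if rates.sum() > total_rate:
            lo = theta          # spending too many bits -> raise the water level
        else:
            hi = theta
    return rates

# Placeholder weighted subband variances (MSE + Bhattacharyya weighting applied).
wvars = [900.0, 400.0, 120.0, 40.0, 9.0, 2.0]
alloc = reverse_waterfill(wvars, total_rate=6.0)
print(np.round(alloc, 2), round(float(alloc.sum()), 2))
```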
Video surveillance using JPEG 2000
This paper describes a video surveillance system which is composed of three key components, smart cameras, a server, and clients, connected through IP-networks in wired or wireless configurations. The system has been designed so as to protect the privacy of people under surveillance. Smart cameras are based on JPEG 2000 compression where an analysis module allows for events detection and regions of interest identification. The resulting regions of interest can then be encoded with better quality and scrambled. Compressed video streams are scrambled and signed for the purpose of privacy and data integrity verification using JPSEC compliant methods. The same bitstream may also be protected for robustness to transmission errors based on JPWL compliant methods. The server receives, stores, manages and transmits the video sequences on wired and wireless channels to a variety of clients and users with different device capabilities, channel characteristics and preferences. Use of seamless scalable coding of video sequences prevents any need for transcoding operations at any point in the system.
Lossless coding of floating point data with JPEG 2000 Part 10
Manuel Noronha Gamito, Miguel Salles Dias
JPEG 2000 Part 10 is a new work part of the ISO/IEC JPEG Committee dealing with the extension of JPEG 2000 technologies to three-dimensional data. One of the issues in Part 10 is the ability to encode floating point datasets. Many Part 10 use cases come from the scientific and engineering communities, where floating point data is often produced either from numerical simulations or from remote sensing instruments. This paper presents the technologies that are currently being developed to accommodate this Part 10 requirement. The coding of floating point datasets with JPEG 2000 requires two changes to the coding pipeline. Firstly, the wavelet transformation stage is optimized to correctly decorrelate data represented with the IEEE 754 floating point standard. Special IEEE 754 floating point values like Infinities and NaNs are signaled beforehand as they do not correlate well with other floating point values. Secondly, computation of distortion measures on the encoder side is performed in floating point space, rather than in integer space, in order to correctly perform rate allocation. Results will show that these enhancements to the JPEG 2000 coding pipeline lead to better compression results than Part 1 encoding where the floating point data had been retyped as integers.
Variable resolution coding with JPEG 2000 Part 10
Manuel Noronha Gamito, Miguel Salles Dias
JPEG 2000 Part 10 is a new work part of the ISO/IEC JPEG Committee dealing with the extension of JPEG 2000 technologies to three-dimensional data. One of the issues in Part 10 is the ability to encode non-uniform data grids having variable resolution across their domains. Some parts of the grid can be more finely sampled than others in accordance with some pre-specified criteria. Of particular interest to the scientific and engineering communities are variable resolution grids resulting from a process of adaptive mesh refinement of the grid cells. This paper presents the technologies that are currently being developed to accommodate this Part 10 requirement. The coding of adaptive mesh refinement grids with JPEG 2000 works as a two step process. In the first pass, the grid is scanned and its refinement structure is entropy coded. In the second pass, the grid samples are wavelet transformed and quantized. The difference with Part 1 is that wavelet transformation must be done over regions of irregular shape. Results will be shown for adaptive refinement grids with cell-centered or corner-centered samples. It will be shown how the Part 10 coding of an adaptive refinement grid is backwards compatible with a Part 1 decoder.
JPEG vs. JPEG 2000: an objective comparison of image encoding quality
Farzad Ebrahimi, Matthieu Chamik, Stefan Winkler
This paper describes an objective comparison of the image quality of different encoders. Our approach is based on estimating the visual impact of compression artifacts on perceived quality. We present a tool that measures these artifacts in an image and uses them to compute a prediction of the Mean Opinion Score (MOS) obtained in subjective experiments. We show that the MOS predictions by our proposed tool are a better indicator of perceived image quality than PSNR, especially for highly compressed images. For the encoder comparison, we compress a set of 29 test images with two JPEG encoders (Adobe Photoshop and IrfanView) and three JPEG2000 encoders (JasPer, Kakadu, and IrfanView) at various compression ratios. We compute blockiness, blur, and MOS predictions as well as PSNR of the compressed images. Our results show that the IrfanView JPEG encoder produces consistently better images than the Adobe Photoshop JPEG encoder at the same data rate. The differences between the JPEG2000 encoders in our test are less pronounced; JasPer comes out as the best codec, closely followed by IrfanView and Kakadu. Comparing the JPEG- and JPEG2000-encoding quality of IrfanView, we find that JPEG has a slight edge at low compression ratios, while JPEG2000 is the clear winner at medium and high compression ratios.
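Two of the ingredients named above can be sketched directly: PSNR and a simple blockiness measure comparing gradient activity across 8-pixel block boundaries with activity inside blocks. The particular blockiness formula and the toy "block-coded" image are generic illustrative choices, not necessarily what the authors' MOS-prediction tool computes.

```python
import numpy as np

def psnr(ref, deg, peak=255.0):
    mse = np.mean((np.asarray(ref, float) - np.asarray(deg, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def blockiness(img, block=8):
    """Ratio of mean absolute differences across 8-pixel block boundaries to
    those inside blocks; values well above 1 indicate visible blocking."""
    d = np.abs(np.diff(np.asarray(img, float), axis=1))
    cols = np.arange(d.shape[1])
    boundary = (cols % block) == block - 1
    return d[:, boundary].mean() / d[:, ~boundary].mean()

# Toy example: a smooth ramp image and a crudely "block-coded" version of it.
ref = np.add.outer(np.arange(64.0), np.arange(64.0)) * 2.0
deg = ref.copy()
for r in range(0, 64, 8):
    for c in range(0, 64, 8):
        blockmean = ref[r:r + 8, c:c + 8].mean()
        deg[r:r + 8, c:c + 8] = 0.25 * ref[r:r + 8, c:c + 8] + 0.75 * blockmean
print(f"PSNR {psnr(ref, deg):.1f} dB, "
      f"blockiness {blockiness(deg):.1f} vs {blockiness(ref):.1f} for the original")
```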
JPWL: JPEG 2000 for wireless applications
In this paper, we present the current status of the JPWL standardization work item. JPWL is an extension of the JPEG 2000 baseline specification in order to enable the efficient transmission of JPEG 2000 codestream over an error-prone network. In particular, JPWL supports a set of tools and methods for error protection and correction such as Forward Error Correcting (FEC) codes, Unequal Error Protection (UEP), and data partitioning and interleaving. We then evaluate the performance of the JPWL Error Protection Block (EPB) tool. We consider two configurations of EPB: to protect the Main and Tile-part headers, or to protect the whole codestream using UEP. Experimental results show a significant quality improvement when using EPB compared to baseline JPEG 2000.
JPSEC for secure imaging in JPEG 2000
In this paper, we first review the on-going JPSEC standardization activity. Its goal is to extend the baseline JPEG 2000 specification to provide a standardized framework for secure imaging, in order to support tools needed to secure digital images, such as content protection, data integrity check, authentication, and conditional access control. We then present two examples of JPSEC tools. The first one is a technique for secure scalable streaming and secure transcoding. It allows the protected JPSEC codestream to be transcoded while preserving the protection, i.e. without requiring unprotecting (e.g. decrypting) the codestream. The second one is a technique for conditional access control. It can be used for access control by resolution or quality, but also by regions of interest.
A rate control method for motion JPEG 2000
Woei Chan, Axel Becker
In Motion JPEG2000, each frame of a video is encoded with JPEG2000. In JPEG2000, the post-coding rate distortion optimisation (PCRD-opt) technique is recommended for rate control. The advantage of using PCRD-opt for rate control is that, at any given rate, the optimum visual quality can be achieved for that rate. However, the optimisation procedure implies that each code-block is over-coded. Since the encoding process in JPEG2000 is one of the slower operations, too much over-coding is undesirable, especially in video applications. Over-coding also requires more memory than necessary for buffering the intermediate compressed data. This paper proposes a technique that minimises the amount of over-coding in the arithmetic encoding step while maintaining image quality. This is achieved by terminating the encoding process of a code-block once terminating conditions have been met. The terminating conditions are a set of rules based on the coding of the current code-block and other heuristics, and they are discussed in detail in this paper. Visually, the image quality using the proposed rate control scheme is comparable to that obtained using PCRD-opt. Furthermore, the proposed rate control strategy produces a fully compliant JPEG2000 codestream.
Multimedia Networking and Architectures
IP-based streaming technology for interactive MPEG-4 contents
In this paper, we present an MPEG-4 content streaming system and propose a priority-based MPEG-4 content streaming scheme. The streaming system, which consists of a server and a client, supports MPEG-4 content compliant with ISO/IEC 14496-1 and enables a user to interact with MPEG-4 content over IP networks. The server consists of a GUI, a Server Management Layer, a Sync Layer, and a Delivery Layer. The client can display MPEG-4 content stored in local storage or received over IP networks. Moreover, we propose a streaming scheme in which the object a user prefers to watch is sent first, by increasing its priority, and objects with low priority are dropped at the server side when network bandwidth is insufficient to transmit all objects that are supposed to appear in the scene. We evaluated the proposed scheme with the presented MPEG-4 content streaming system, and the experimental results are reported in this paper. Using the proposed scheme for MPEG-4 content streaming, a user can watch objects of interest in high quality and objects of less interest in lower quality.
Format-agnostic adaptation using the MPEG-21 DIA framework
Debargha Mukherjee, Huisheng Wang, Amir Said, et al.
Part 7 of MPEG-21, entitled Digital Item Adaptation (DIA), is an emerging metadata standard defining protocols and descriptions enabling content adaptation for a wide variety of networks and terminals, with attention to format-independent mechanisms. The descriptions standardized in DIA provide a standardized interface not only to a variety of format-specific adaptation engines, but also to format-independent adaptation engines for scalable bit-streams. A fully format-independent engine contains a decision-taking module operating in a media-type and context independent manner, cascaded with a bit-stream adaptation module that models the bit-stream adaptation process as an XML transformation operating on a high-level syntax description of the bit-stream, with parameters derived from decisions taken. In this paper, we describe the DIA descriptions and underlying mechanisms that enable such fully format-independent scalable bit-stream adaptation. Further, a new model-based, compact and lightweight transformation language for scalable bit-streams is described for use in the bit-stream adaptation module. Fully format-independent adaptation mechanisms lead to universal adaptation engines that substantially reduce adoption costs for new media types and formats because the same delivery and adaptation infrastructure can be used for different types of scalable media, including proprietary and encrypted content.
UMDVP-controlled post-processing system for compressed video
Yibin Yang, Lilla Boroczky, Kees van Zon
In this paper we outline a post-processing system for compressed video sources, aimed at reducing the visibility of coding artifacts. To achieve optimal video quality for compressed sources, it addresses artifact reduction and video enhancement functions as well as their interdependency. The system is based on the Unified Metric for Digital Video Processing (UMDVP), a quality metric that estimates the level of coding artifacts on a per-pixel basis. Experiments on MPEG-2 encoded video sequences showed significant improvement in picture quality compared to systems that do not have UMDVP control or that do not consider the interdependency between artifact reduction and video enhancement.
Using overlay network architectures for scalable video distribution
Charalampos Z. Patrikakis, Yannis Despotopoulos, Paraskevi Fafali, et al.
In recent years, the enormous growth of Internet-based communication as well as the rapid increase in available processing power has led to the widespread use of multimedia streaming as a means to convey information. This work aims at providing an open architecture designed to support scalable streaming to a large number of clients using application-layer multicast. The architecture is based on media relay nodes that can be deployed transparently on top of any existing media distribution scheme that streams media using the RTP and RTSP protocols. It relies on overlay networks at the application level and features rate adaptation mechanisms for responding to network congestion.
Rate and distortion models for MPEG-4 video encoding
Rate and distortion models can play a very important role in real-time video encoding, since they can be used to obtain near optimal operation performance in terms of the RD tradeoff without the drawback of having to encode the same VOP multiple times to find the best combination of coding parameters. In the context of object-based video encoding, notably in MPEG-4 video encoding, rate and distortion models characterize the relation between the average number of bits/pixel to code a given Video Object Plane (VOP), the average VOP distortion, and the relevant coding parameters. These models are usually defined in terms of rate-quantization (RQ), distortion-quantization (DQ), and rate-distortion (RD) functions. This paper addresses the problem of rate and distortion modeling for Intra and Inter coding in the context of object-based MPEG-4 video encoding. In the case of Intra coding, the VOP to encode does not depend on other past or future VOPs; therefore, its rate and distortion characteristics depend exclusively on the current quantizer parameter(s) and VOP statistics. In the case of Inter coding, the rate and distortion functions depend not only on the current VOP but also on its reference VOP(s); therefore, the rate and distortion functions become bidimensional and consequently more difficult to estimate during encoding. In this paper, a new approach is proposed where the rate and distortion functions for Inter coding are modeled as one-dimensional functions plus an adaptation term.
Scalable motion vector coding
Joeri Barbarien, Adrian Munteanu, Fabio Verdicchio, et al.
Modern video coding applications require transmission of video data over variable-bandwidth channels to a variety of terminals with different screen resolutions and available computational power. Scalable video coding is needed to optimally support these applications. Recently proposed wavelet-based video codecs employing spatial domain motion compensated temporal filtering (SDMCTF) provide quality, resolution and frame-rate scalability while delivering compression performance comparable to that of the state-of-the-art non-scalable H.264-codec. These codecs require scalable coding of the motion vectors in order to support a large range of bit-rates with optimal compression efficiency. Scalable motion vector coding algorithms based on the integer wavelet transform followed by embedded coding of the wavelet coefficients were recently proposed. In this paper, a new and fundamentally different scalable motion vector codec (MVC) using median-based motion vector prediction is proposed. Extensive experimental results demonstrate that the proposed MVC systematically outperforms the wavelet-based state-of-the-art solutions. To be able to take advantage of the proposed scalable MVC, a rate allocation mechanism capable of optimally dividing the available rate among texture and motion information is required. Two rate allocation strategies are proposed and compared. The proposed MVC and rate allocation schemes are incorporated into an SDMCTF-based video codec and the benefits of scalable motion vector coding are experimentally demonstrated.
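A tiny sketch of median-based motion vector prediction as named above: each vector is predicted as the component-wise median of its left, top, and top-right neighbours (the familiar H.263/H.264-style predictor) and only the residual would be passed to the entropy coder. The toy motion field and the plain raster scan are illustrative, not the proposed codec.

```python
import numpy as np

def median_predict_residuals(mv_field):
    """Component-wise median prediction over a (rows, cols, 2) motion vector field.

    Returns the residuals (actual minus predicted) that would go to the entropy
    coder; unavailable neighbours are treated as zero vectors.
    """
    rows, cols, _ = mv_field.shape
    residuals = np.zeros_like(mv_field)
    for r in range(rows):
        for c in range(cols):
            left = mv_field[r, c - 1] if c > 0 else np.zeros(2)
            top  = mv_field[r - 1, c] if r > 0 else np.zeros(2)
            topr = mv_field[r - 1, c + 1] if (r > 0 and c + 1 < cols) else np.zeros(2)
            pred = np.median(np.stack([left, top, topr]), axis=0)
            residuals[r, c] = mv_field[r, c] - pred
    return residuals

# A smooth toy motion field: global pan of (3, 1) plus small local variation.
rng = np.random.default_rng(8)
field = np.tile([3.0, 1.0], (6, 8, 1)) + rng.integers(-1, 2, (6, 8, 2))
res = median_predict_residuals(field)
print(float(np.abs(field).mean()), float(np.abs(res).mean()))   # residual magnitudes shrink
```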
Application analysis and communication aspects for future multimedia architectures
A principal challenge for reducing the cost of complex systems-on-chip is to pursue more generic systems for a broad range of products. For this purpose, we explore three new architectural concepts for state-of-the-art video applications. First, we discuss a reusable scalable hardware architecture employing a hierarchical communication network fitting with the natural hierarchy of the application. In a case study, we show that MPEG streaming in DTV occurs at high level, while subsystems communicate at lower levels. The second concept is a software design that scales over a number of processors to enable reuse over a range of VLSI process technologies. We explore this via an H.264 decoder implementation that scales nearly linearly over up to eight processors by applying data partitioning. The third concept is resource-scalability, which is required to satisfy real-time constraints in a system with a high amount of shared resources. An example complexity-scalable MPEG-2 encoder scales the required cycle budget with a factor of three, in parallel with a smooth degradation of quality.
Performance considerations for efficient multimedia streaming in wireless local area networks
Dilip Krishnaswamy, Robert J. Stacey, Ryan van Alstine, et al.
This paper investigates multimedia streaming over wireless local area networks. Physical-layer sigmoid analytical models are presented for 802.11a/g and for 2x3 MIMO-based 802.11n systems. Performance results in a wireless LAN environment are presented for traffic using UDP and TCP transport mechanisms. Packet losses are observed in WLAN environments, which affect the overall throughput available. Possibilities for performance improvements with the use of 802.11e and MIMO technologies are discussed. System platform architecture performance issues for wireless video conferencing between Intel PXA27x processor-based handheld platforms are presented, and results with retry-limit adaptation are also presented.
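A minimal sketch of a sigmoid physical-layer throughput model of the kind mentioned above: effective throughput as a logistic function of SNR that saturates at the link's rate cap. The rate caps, midpoints, and steepness values below are assumed purely for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid_throughput(snr_db, max_rate_mbps, midpoint_db, steepness):
    """Effective throughput (Mbps) as a logistic function of SNR (dB)."""
    return max_rate_mbps / (1.0 + np.exp(-steepness * (snr_db - midpoint_db)))

snr = np.linspace(0, 40, 9)
# Assumed parameter sets: legacy 802.11a/g vs a 2x3 MIMO 802.11n-class link.
legacy = sigmoid_throughput(snr, max_rate_mbps=24.0, midpoint_db=18.0, steepness=0.35)
mimo = sigmoid_throughput(snr, max_rate_mbps=90.0, midpoint_db=22.0, steepness=0.30)
for s, a, b in zip(snr, legacy, mimo):
    print(f"SNR {s:4.1f} dB: 11a/g ~{a:5.1f} Mbps, 2x3 MIMO 11n ~{b:5.1f} Mbps")
```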
Optimal media sharing policies in peer-to-peer networks
Multimedia content distribution through a distributed system, a peer-to-peer (P2P) network for instance, is attractive since it harnesses the resources available with the numerous peers in the network. Another advantage of such a system is that the potentially available resources scale in proportion to the demand as more and more peers join the system. Recent studies have concentrated mainly on such aspects of these distributed networks as querying, indexing, etc. These studies however take for granted the voluntary contribution of resources by peers in the system. Empirical evidence however points to the contrary, i.e. in existing P2P systems, a substantial fraction of peers do not contribute resources to the system, while benefiting from the services it provides at the expense of the contributing peers. In this paper we analyze a P2P system in a game-theoretic setting in which games involving content exchange are played repeatedly. The model takes into account the manner in which a peer adapts his contribution to the system depending on the benefit he has derived from the system so far and expects to derive in the long run. The model enables us to formulate an optimization problem that yields optimal content sharing strategies that a peer should adopt in order to maximize his net benefit by participating in the system.
Advances in the New Emerging Standard: H.264/AVC I
The H.264/AVC Advanced Video Coding standard: overview and introduction to the fidelity range extensions
H.264/MPEG-4 AVC is the latest international video coding standard. It was jointly developed by the Video Coding Experts Group (VCEG) of the ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. It uses state-of-the-art coding tools and provides enhanced coding efficiency for a wide range of applications, including video telephony, video conferencing, TV, storage (DVD and/or hard disk based, especially high-definition DVD), streaming video, digital video authoring, digital cinema, and many others. The work on a new set of extensions to this standard has recently been completed. These extensions, known as the Fidelity Range Extensions (FRExt), provide a number of enhanced capabilities relative to the base specification as approved in the Spring of 2003. In this paper, an overview of this standard is provided, including the highlights of the capabilities of the new FRExt features. Some comparisons with the existing MPEG-2 and MPEG-4 Part 2 standards are also provided.
Channel adaptive video compression for unmanned aerial vehicles (UAVs)
We examine various issues related to demonstrating real-time channel adaptive video communications for UAVs using the latest-generation H.264 video compression technology. These issues include among others: real-time channel estimation techniques, real-time data rate adaptation techniques in H.264/AVC, latency in encoding, current encoding speeds, transcoding, and scalable video developments in H.264, all as essential steps along the way. These demonstrations will be conducted in a communication laboratory and a limited operational testing environment.
Consideration on intra-prediction for pipeline processing in H.264/MPEG-4 AVC
Kazushi Sato, Yoichi Yagasaki
H.264/MPEG-4 AVC is expected to cover a wide range of applications, not only smaller images like CIF/QCIF but also SDTV/HDTV. Since encoding/decoding of larger images requires more macroblocks to be processed, techniques for speeding up the processing are indispensable for real-time applications. Pipeline processing is an effective tool for hardware implementation. However, when we apply it to intra prediction, additional constraints become necessary, which cause a severe loss in coding efficiency. In this paper we propose modifications to intra prediction that enable pipeline processing with a negligible loss in coding efficiency.
AVC/H.264 patent portfolio license
MPEG LA, LLC recently announced terms of a joint patent license for the AVC (a/k/a H.264) Standard (ISO/IEC IS 14496-10: Information technology -- Coding of audio-visual objects -- Part 10: Advanced Video Coding|ITU-T Rec. H.264: Series H: Audiovisual and Multimedia Systems: Infrastructure of audiovisual services -- Coding of moving video: Advanced video coding for generic audiovisual services). Like MPEG LA’s other licenses, the AVC Patent Portfolio License is offered for the convenience of the marketplace as an alternative enabling users to access essential intellectual property owned by many patent holders under a single license rather than negotiating licenses with each of them individually. The AVC Patent Portfolio License includes essential patents owned by Columbia Innovations Enterprises; Electronics and Communications Research Institute (ETRI); France Télécom, société anonyme; Fujitsu Limited; Koninklijke Philips Electronics N.V.; Matsushita Electric Industrial Co., Ltd.; Microsoft Corporation; Mitsubishi Electric Corporation; Robert Bosch GmbH; Samsung Electronics Co., Ltd.; Sharp Kabushiki Kaisha; Sony Corporation; Toshiba Corporation; and Victor Company of Japan, Limited. MPEG LA’s objective is to provide worldwide access to as much AVC essential intellectual property as possible for the benefit of AVC users. Therefore, any party that believes it has essential patents is welcome to submit them for evaluation of their essentiality and inclusion in the License if found essential.
Subjective testing methodology in MPEG video verification
Charles Fenimore, Vittorio Baroncini, Tobias Oelbaum, et al.
The development of new video processing, new displays, and new modes of dissemination and usage enables a variety of moving picture applications intended for mobile and desktop devices as well as the more conventional platforms. These applications include multimedia as well as traditional video and require novel lighting environments and bit rates previously unplumbed in Moving Picture Experts Group (MPEG) video compression. The migration to new environments poses a methodological challenge to testers of video quality. Both the viewing environment and the display characteristics differ dramatically from those used in well-established subjective testing methods for television. The MPEG Test Committee has adapted the television-centric methodology to the new testing environments. The adaptations that are examined here include: (1) The display of progressive scan pictures in the Common Intermediate Format (CIF at 352x288 pixel/frame) and Quarter CIF (QCIF at 176x144 pixel/frame) as well as other, larger moving pictures requires new ways of testing the subjects including different viewing distances and altered ambient lighting. (2) The advent of new varieties of display technologies suggests there is a need for methods of characterizing them to assure the results of the testing do not depend strongly on the display. (3) The use of non-parametric statistical tests in test data analysis. In MPEG testing these appear to provide rigorous confidence statements more in line with testing experience than those provided by classical parametric tests. These issues have been addressed in a recent MPEG subjective test. Some of the test results are reviewed; they suggest that these adaptations of long-established subjective testing methodology for TV are capable of providing practical and reliable measures of subjective video quality for a new generation of technology.
Boundary-energy sensitive visual de-blocking for H.264/AVC coder
Finding a better parameter set (OffsetA and OffsetB) for the de-blocking process of H.264/AVC can improve visual quality, that is, reduce the resulting blocking artifacts. Identifying which edges belong to blocking regions relies on the perceptual judgment of human observers. In fact, this subjective assessment may not exactly match existing objective assessment measures, and a high PSNR does not always imply fewer blocking artifacts. In this paper, we first introduce a new criterion for measuring block boundary distortion by comparing the source video and the reconstructed video prior to the deblocking process. By jointly optimizing the objective picture quality and the blocky energy, the deblocking parameter decision process can find a good balance between signal matching and blockiness elimination and, therefore, maximize the effect of the built-in deblocking process. In our experiments, the proposed method efficiently picks a better deblocking parameter set from all 169 possibilities for each coded frame and results in better visual quality.
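The parameter search described above can be pictured as an exhaustive scan over the 13 × 13 = 169 admissible (OffsetA, OffsetB) pairs, scoring each candidate by a combination of reconstruction error and a block-boundary energy term. The sketch below is only a schematic reading of the abstract: `deblock`, the weight `lam`, and the boundary-energy definition are placeholders, not the authors' criterion.

```python
import numpy as np

# Schematic grid search over deblocking offsets; `deblock` stands in for the
# codec's in-loop filter applied with the given (OffsetA, OffsetB).

def boundary_energy(img, block=4):
    # Sum of squared differences across vertical and horizontal block boundaries.
    v = np.sum((img[:, block::block] - img[:, block - 1:-1:block]) ** 2)
    h = np.sum((img[block::block, :] - img[block - 1:-1:block, :]) ** 2)
    return v + h

def pick_offsets(source, reconstructed, deblock, lam=0.5):
    best, best_cost = (0, 0), np.inf
    for offset_a in range(-6, 7):
        for offset_b in range(-6, 7):
            filtered = deblock(reconstructed, offset_a, offset_b)
            mse = np.mean((source - filtered) ** 2)
            cost = mse + lam * boundary_energy(filtered)
            if cost < best_cost:
                best, best_cost = (offset_a, offset_b), cost
    return best
```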
The optimization of H.264/AVC baseline decoder on low-cost TriMedia DSP processor
Sung-Wen Wang, Ya-Ting Yang, Chia-Ying Li, et al.
The emerging video coding standard, H.264/AVC, exhibits unprecedented coding performance. Compared to traditional coders, e.g., MPEG-2 and MPEG-4 ASP, roughly half the bit rate is saved in the official verification test. Such outstanding performance makes it the video compression candidate for the upcoming HD-DVD. As a side effect, H.264/AVC has also been criticized for being much more logically complex and requiring more computational power than any of the existing standards. A low-cost and efficient implementation of the international standard hence plays an important role in its success. In this paper, we realize an H.264/AVC baseline decoder on a low-cost DSP processor, Philips' TriMedia TM-1300, and illustrate that the computational demand of H.264/AVC decoding can be made feasible with an effective software core. To this end, we first consider different approaches and take advantage of the SIMD instruction set to optimize critical time-consuming coding modules, such as fractional motion compensation, spatial prediction, and the inverse transform. Next, we also present other optimization approaches for entropy decoding and in-loop deblocking filtering, even though they cannot benefit from SIMD. In our experiments, by exploiting appropriate instruction-level parallelism and efficient algorithms, the decoding speed can be improved by a factor of 8~10; a CIF video sequence can be decoded at up to 19.74~28.97 fps on a 166-MHz TriMedia TM-1300 processor, compared to 2.40~2.98 fps with the standard reference software.
Advances in the New Emerging Standard: H.264/AVC II
Fast intra/inter mode decision for H.264 encoding using a risk-minimization criterion
A fast intra/inter mode decision method using a risk-minimization criterion is proposed in this work to reduce the complexity of the H.264 encoder. The current H.264 reference code employs exhaustive search to find the best mode that optimizes the rate-distortion performance among all possible intra/inter predictive modes. To develop a fast binary mode decision scheme (i.e., whether the inter- or intra-prediction mode is to be used), we consider the risk of choosing the wrong predictive mode. If the cost of choosing the wrong mode, in terms of the averaged rate-distortion (RD) performance loss, is low, then the risk is tolerable. The fast algorithm consists of three steps. First, three features are extracted from the current macroblock to form a 3D feature vector. Second, the feature space is partitioned into three regions, i.e., risk-free, risk-tolerable, and risk-intolerable regions. Finally, depending on the location of the feature vector in the feature space, we apply mechanisms of different complexities for the final mode decision. The proposed algorithm may occasionally select the wrong mode, but with low RD performance degradation. Experimental results demonstrate that the proposed algorithm saves approximately 19-25% of the total encoding time of H.264 (JM7.3a) with little degradation in rate-distortion performance.
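A schematic of the three-step decision flow might look like the following sketch; the feature definitions, thresholds, and the reduced-complexity check are hypothetical stand-ins, since the paper derives its regions from rate-distortion statistics.

```python
import numpy as np

# Illustrative three-region mode decision; features and thresholds are invented
# for the sketch, and `full_rd_search` stands for the exhaustive RD-based search.

def extract_features(mb, predicted_mb):
    residual = mb.astype(np.float64) - predicted_mb
    return np.array([
        np.var(mb),                 # spatial activity of the macroblock
        np.mean(np.abs(residual)),  # temporal prediction error (SAD-like)
        np.var(residual),           # residual energy
    ])

def decide_mode(mb, predicted_mb, full_rd_search,
                low=np.array([10.0, 2.0, 5.0]),
                high=np.array([200.0, 20.0, 150.0])):
    f = extract_features(mb, predicted_mb)
    if np.all(f < low):
        return "inter"                # risk-free region: cheap early decision
    if np.any(f > high):
        return full_rd_search(mb)     # risk-intolerable: full RD-based search
    # Risk-tolerable region: a reduced-complexity check (stand-in rule shown).
    return "intra" if f[0] > 8.0 * f[1] else "inter"
```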
Low-complexity integer transform and high-definition coding
Siwei Ma, Xiaopeng Fan, Wen Gao
In H.264/AVC, an integer 4×4 transform is used instead of the traditional floating-point DCT because of its low complexity and exact reversibility. Combined with the normalization of the integer transform, a division-free quantization scheme is used in H.264/AVC. H.264/AVC is the most outstanding video coding standard so far. However, H.264/AVC initially targeted low bit-rate coding, and almost all experimental results in the proposals for H.264/AVC were obtained at low bit rates. Recently, experimental results have shown that an 8×8 transform can further improve coding efficiency for high-definition (HD) coding. In this paper a family of low-complexity 8×8 integer transforms is studied and corresponding quantization schemes are developed for HD coding. Compared with a 4×4 transform/prediction based coder, the proposed 8×8 based coder achieves better performance on HD coding with much lower encoder/decoder complexity.
A performance evaluation of MPEG-21 BSDL in the context of H.264/AVC
H.264/AVC is a new specification for digital video coding that aims at deployment in a wide range of multimedia applications, such as video conferencing, digital television broadcasting, and Internet streaming. This is, for instance, reflected by the design goals of the standard, which call for an efficient compression scheme and a network-friendly representation of the compressed data. Those requirements have resulted in a very flexible syntax and architecture that is fundamentally different from previous standards for video compression. In this paper, a detailed discussion is provided on how to apply an extended version of the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL) to the Annex B syntax of the H.264/AVC specification. This XML-based language facilitates high-level manipulation of an H.264/AVC bitstream in order to take into account the constraints and requirements of a particular usage environment. Our performance measurements and optimizations show that it is possible to use MPEG-21 BSDL in the context of the current H.264/AVC standard with a feasible computational complexity when exploiting temporal scalability.
MPEG to H.264 transcoding
H.264/MPEG-4 AVC is the latest video-coding standard jointly developed by the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC JTC1. It significantly improves the compression rate compared to previous standards in the field, especially the widely adopted MPEG-2. However, MPEG-2 is at present overwhelmingly used by the industry, and as a consequence most of the existing digital video material is encoded in that standard. This paper discusses efficient ways to convert existing MPEG-2, and beyond it MPEG-1 and MPEG-4 Visual, material to H.264, which is a critical factor for a successful H.264 deployment.
Imaging and Representation I
Bandwidth compression of hyperspectral imagery data using a simplified KLT/JPEG 2000 approach
John A. Saghri, Andrew G. Tescher, Anthony M. Planinac
A viable lossy bandwidth compression for hyperspectral imagery is presented. The algorithm is leveraged on the standard JPEG 2000 technology. The component decorrelation of JPEG 2000 (extension 2) is replaced with a two-level Karhunen-Loeve Transform (KLT) operation, resulting in a reduction in computational complexity. The set of n² hyperspectral images is arranged as an n by n mosaic. Each of the n columns of the mosaic is spectrally decorrelated via a first-level KLT operation. The resulting n principal component (PC) images for each column are placed next to one another to form an n by n mosaic of PC images. A second-level KLT is then applied to the first four rows of the n by n PC mosaic to approximate a full spectral decorrelation. This approach reduces the computational complexity of the KLT spectral decorrelation process of JPEG 2000 since 1) it uses a smaller and computationally more feasible KLT matrix (i.e., an n by n KLT matrix instead of one of size n² by n²) and 2) it reduces the number of computations required for spectral decorrelation by a factor of n/4.
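One plausible reading of the two-level KLT arrangement is sketched below in Python, assuming the n² bands are stacked in an array `cube` ordered column-by-column of the mosaic; the per-row second-level KLT and the mean removal are our own interpretation of the abstract, not the authors' code.

```python
import numpy as np

# Sketch of a two-level KLT spectral decorrelation over an n x n band mosaic.
# `cube` has shape (n*n, rows, cols), bands grouped by mosaic column (assumed).

def klt(bands):
    # bands: (k, rows, cols) -> principal-component images, strongest PC first.
    k = bands.shape[0]
    x = bands.reshape(k, -1).astype(np.float64)
    x -= x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    _, vecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    return (vecs[:, ::-1].T @ x).reshape(bands.shape)

def two_level_klt(cube, n):
    # Level 1: decorrelate each mosaic column (n bands per column) separately.
    pcs = np.concatenate([klt(cube[c * n:(c + 1) * n]) for c in range(n)])
    # Rearrange so "row" i of the mosaic holds the i-th PC of every column.
    rows = pcs.reshape(n, n, *cube.shape[1:]).swapaxes(0, 1).reshape(n * n, *cube.shape[1:])
    # Level 2: apply an n x n KLT to each of the first four PC rows only.
    for r in range(4):
        rows[r * n:(r + 1) * n] = klt(rows[r * n:(r + 1) * n])
    return rows
```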
Multispectral color acquisition and display using commodity hardware
Daniel L. Lau, Ruigang Yang, Andrew M. Tan, et al.
For consumer imaging applications, multi-spectral color refers to capturing and displaying images in more than three primary colors in order to achieve color gamuts significantly larger than those produced by RGB devices. In this paper, we describe the construction of both a multi-camera recording system and a multi-projector display system using off-the-shelf components; unlike existing multi-camera/projector systems that rely on expensive and time-consuming optical alignment of camera/projector views, ours relies on virtual alignment of views performed in software. Once images are properly aligned, the described systems represent recording/display platforms that scale linearly in cost with the number of color primaries, where new colors are added by simply attaching more devices. In this paper, we illustrate frames of the color video produced using a five-camera system as well as an image of the aligned six projectors of the display system.
Automatic processing, analysis, and recognition of images
Victor Sergeyevich Abrukov, Evgeniy Vladimirovich Smirnov, Dmitriy Gennadyevich Ivanov
New approaches and computer codes (A&CC) for automatic processing, analysis, and recognition of images are offered. The A&CC are based on representing an object image as a collection of pixels of various colours and on consecutive automatic painting of the distinguishable parts of the image. The A&CC have technical objectives centred on such directions as: 1) image processing, 2) image feature extraction, 3) image analysis, and some others, in any order and combination. The A&CC allow various geometrical and statistical parameters of an object image and its parts to be obtained. Additional possibilities of A&CC usage involve artificial neural network technologies. We believe that the A&CC can be used in the creation of testing and control systems in various fields of industrial and military application (airborne imaging systems, tracking of moving objects), in medical diagnostics, in the creation of new software for CCDs, in industrial vision and decision-making systems, etc. The capabilities of the A&CC have been tested on image analysis of model fires and plumes of sprayed fluid, ensembles of particles, on the decoding of interferometric images, on the digitization of paper diagrams of electrical signals, on text recognition, on noise elimination and image filtering, and on the analysis of astronomical images and aerial photography, as well as on object detection.
Optimal color segmentation with an application to the PCB industry
Ming-Hwei Perng, Tzu-Chao Chen
This paper presents a novel color indexing technique for segmentation and edge detection of objects against a background whose color appears to be very close to that of the objects. To enhance the discriminability of different colors, each color in the image is first nonlinearly mapped into an enhanced color model in a six-dimensional color space. Then, by solving a linear least squares problem which involves only two multiplications and one inversion of a six by six matrix, the present approach converts a color image into a gray image with an optimally enhanced contrast of the gray level between the object and its background, so that segmentation and edge detection can be performed using conventional techniques that exist for gray-scale images. This considerably saves computational effort, especially when compared to vector order methods, entropy methods, and invariant object recognition. Experiments also show that the presented color segmentation technique has better performance than those operating in any three-dimensional color space. To illustrate one of many possible applications of the present technique to real industrial problems, it is applied to detect missing and misaligned devices on a printed circuit board (PCB) with the aid of morphological operations and our unique design of go/no-go gauges. Experiments show that the present approach is significantly more efficient and effective than the existing industrial algorithms known to the authors.
Segmentation of remote sensing images using multistage unsupervised learning
Murat Sezgin, Okan K. Ersoy, Bingül Yazgan
In this study, we investigate an unsupervised learning algorithm for the segmentation of remote sensing images in which the optimum number of clusters is automatically estimated, and the clustering quality is checked. The computational load is also reduced as compared to a single stage algorithm. The algorithm has two stages. At the first stage of the algorithm, the self-organizing map was used to obtain a large number of prototype clusters. At the second stage, these prototype clusters were further clustered with the K-means clustering algorithm to obtain the final clusters. A clustering validity checking method, Davies-Bouldin validity checking index, was used in the second stage of the algorithm to estimate the optimal number of clusters in the data set.
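The two-stage idea can be sketched as follows; note that the paper's first stage is a self-organizing map, for which a large-k K-means is used here purely as a stand-in, and the Davies-Bouldin index selects the final number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Two-stage clustering sketch: `pixels` is an (n_pixels, n_bands) array of
# spectral vectors. Stage 1 should be a SOM in the paper; a large-k K-means
# is substituted here as an assumption, not the authors' implementation.

def two_stage_segmentation(pixels, n_prototypes=100, k_range=range(2, 11)):
    # Stage 1: reduce the data to a manageable set of prototype vectors.
    stage1 = KMeans(n_clusters=n_prototypes, n_init=5, random_state=0).fit(pixels)
    prototypes = stage1.cluster_centers_

    # Stage 2: cluster the prototypes, choosing k by the Davies-Bouldin index
    # (lower is better).
    best_k, best_db, best_model = None, np.inf, None
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(prototypes)
        db = davies_bouldin_score(prototypes, model.labels_)
        if db < best_db:
            best_k, best_db, best_model = k, db, model

    # Map every pixel to its prototype, then to the final cluster label.
    return best_k, best_model.labels_[stage1.labels_]
```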
Volumetric image fusion using the pseudo-Wigner distribution
Salvador Gabarda, Gabriel Cristobal, Sylvain Fischer, et al.
Image fusion methods provide an enhanced image from a set of source images which present regions with different spatial degradation patterns. Herein a fusion procedure is presented, based on the use of a new pixel-level defocus measure. This measure is defined through a 1-D Pseudo Wigner Distribution Function (PWD) applied to non-overlapping N-pixel window slices of the original image. The process is repeated until the full image size is covered. By taking a low-resolution image as a reference image, which can be defined, e.g., by averaging and blurring the two source images, a pixel-level distance measure of the defocus degree can be obtained from the PWD of each image. This procedure makes it possible to choose, from a focus point of view, the in-focus pixels from each of the given source images. The method is illustrated with different examples. The image fusion approach proposed here can work for any source and any number of available images. Also, evaluation measures, such as mean square error or percentage of correct decisions, show that our framework can outperform current approaches for the analyzed cases. An additional advantage of the present approach is its reduced computational cost in comparison with other methods based on a full 2-D implementation of the PWD.
Imaging and Representation II
Optic flow estimation using the Hermite transform
Boris Escalante-Ramirez, Jose Luis Silvan-Cardenas, Hector Yuen-Zhuo
In this paper we present a spatiotemporal energy based method to estimate motion from an image sequence. A directional energy is defined in terms of the Radon projections of the Hermite transform. The Radon transform provides a suitable representation for image orientation analysis, while the Hermite transform describes image features locally in terms of Gaussian derivatives. These operators have been used in computer vision for feature extraction and are relevant in visual system modeling. A directional response defined from the directional energy is used to estimate local motion as well as to compute a confidence matrix. This matrix provides a confidence measure for our estimate and is used to propagate the velocity information towards directions with high uncertainty. With these results, applications range from motion compensation and tracking of moving objects to segmentation and video compression.
Image registration in the JPEG-compressed domain
A novel approach for image mosaicing in the JPEG compressed domain is presented in this paper. This technique employs the Hausdorff Distance Metric (HDM) to compute the regions of overlap between two JPEG images. The DCT blocks of the two overlapping images having significant activity are identified using a variance measure, and the HDM is employed directly between these DCT blocks to estimate the translation parameter. The results obtained demonstrate a reduction in the time taken for the registration step by a factor of 10-30, as compared to a traditional feature-based approach in the uncompressed domain.
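A schematic of the two compressed-domain steps, variance-based selection of active DCT blocks followed by a Hausdorff-distance scan over candidate translations, is given below; the block layout, threshold, and search range are illustrative assumptions, and SciPy's `directed_hausdorff` stands in for whatever HDM implementation the authors used.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

# Sketch of compressed-domain translation estimation. `blocks_a` and `blocks_b`
# are assumed to be arrays of shape (H/8, W/8, 8, 8) holding DCT coefficients.

def active_block_coords(blocks, var_threshold=50.0):
    # A block is "active" if the variance of its AC coefficients is large.
    ac = blocks.reshape(*blocks.shape[:2], 64)[..., 1:]
    mask = ac.var(axis=-1) > var_threshold
    return np.argwhere(mask).astype(float)   # (row, col) indices of active blocks

def estimate_translation(blocks_a, blocks_b, search=8):
    pts_a = active_block_coords(blocks_a)
    pts_b = active_block_coords(blocks_b)
    best, best_d = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = pts_b + np.array([dy, dx], dtype=float)
            d = max(directed_hausdorff(pts_a, shifted)[0],
                    directed_hausdorff(shifted, pts_a)[0])
            if d < best_d:
                best, best_d = (dy, dx), d
    return best   # translation in units of 8x8 blocks
```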
Poster Session
Automatic video object segmentation and shadow detection for surveillance applications
Hao Yang, Like Zhang, Heng-Ming Tai, et al.
This paper presents an object-based adaptive background update algorithm that is suitable for video surveillance in the dynamically changing environment. The background model is updated using information from not only the pixel change but also the currently detected object. The proposed method is able to deal with such problems as ghost and uncovered background. In addition, a shadow detection scheme is proposed to eliminate the shadow effect. Experimental results for three video sequences with different background situations are given to demonstrate the effectiveness of the proposed algorithm.
Image compression and transmission based on LAN
In this work an embedded system is designed which implements MPEG-2 LAN transmission of a CVBS or S-video signal. The hardware consists of three parts. The first is digitization of the analog CVBS or S-video (Y/C) inputs from TV or VTR sources. The second is MPEG-2 compression coding, primarily performed by a single-chip MPEG-2 audio/video encoder whose output is an MPEG-2 system PS/TS. The third part includes data stream packing, LAN access, and system control based on an ARM microcontroller. It packs the encoded stream into Ethernet data frames and accesses the LAN, and it accepts Ethernet packets bearing control information from the network and decodes the corresponding commands to control digitization, coding, and other operations. In order to increase the network transmission rate to keep up with the MPEG-2 data stream, an efficient TCP/IP network protocol stack is constructed directly on the network hardware provided by the embedded system, instead of using an ordinary embedded operating system. In the design of the network protocol stack, to obtain a high LAN transmission rate on a low-end ARM, a special transmission channel is opened for the MPEG-2 stream. The designed system has been tested on an experimental LAN. The experiment shows a maximum LAN transmission rate of up to 12.7 Mbps with good sound and image quality, and satisfactory system reliability.
Active contour model based edge restriction and attraction field regularization for brain MRI segmentation
H. Luan, Feihu Qi
Constructing 3D models of the object of interest from brain MRI is useful in numerous biomedical imaging applications. The construction of the 3D models is generally carried out from the contours obtained by a 2D segmentation of each MR slice, so the quality of the 3D model strongly depends on the precision of the segmentation process. The active contour model is an effective edge-based method for segmenting an object of interest. However, its application to segmenting the boundaries of anatomical structures in brain MRI encounters many difficulties due to undesirable properties of brain MRI, for example complex background, intensity inhomogeneity, and discontinuous edges. This paper proposes an active contour model to solve the problem of automatically segmenting the object of interest from brain MRI. In the proposed algorithm, a new method of calculating the attraction field has been developed, based on edge restriction and attraction field regularization. Edge restriction introduces prior knowledge about the object of interest to keep contours from being affected by edges of other anatomical structures or by spurious edges, while attraction field regularization enables our algorithm to extract the boundary correctly even where the edge of the object of interest is discontinuous, by diffusing the edge information obtained after edge restriction. When we apply the proposed algorithm to brain MRI, the results show that it can overcome the difficulties mentioned above and converge to the object boundary quickly and accurately.
Adaptive wavelet transform algorithm for lossy image compression
Oleksiy B. Pogrebnyak, Pablo Manrique Ramirez, Marco Antonio Acevedo Mosqueda
A new algorithm for a locally adaptive wavelet transform based on the modified lifting scheme is presented. It adapts the wavelet high-pass filter at the prediction stage to the local image data activity. The proposed algorithm uses the generalized framework for the lifting scheme, which makes it easy to obtain different wavelet filter coefficients in the case of (~N, N) lifting. By changing the wavelet filter order and different control parameters, one can obtain the desired filter frequency response. It is proposed to perform hard switching between different wavelet lifting filter outputs according to a local data activity estimate. The proposed adaptive transform possesses good energy compaction. The designed algorithm was tested on different images. The obtained simulation results show that the visual and quantitative quality of the restored images is high. The distortions are smaller in the vicinity of high-spatial-activity details compared to the non-adaptive transform, which introduces ringing artifacts. The designed algorithm can be used for lossy image compression and in noise suppression applications.
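A minimal 1-D sketch of hard switching between a short and a longer lifting predictor, driven by a local activity estimate computed from the even samples only (so the same decision can be repeated at synthesis), is shown below; the actual (~N, N) filters and activity measure of the paper are not reproduced.

```python
import numpy as np

# Illustrative adaptive lifting step with hard switching between predictors.
# The activity threshold and filter choices are assumptions for the sketch.

def adaptive_lifting_1d(x, activity_threshold=30.0):
    x = np.asarray(x, dtype=np.float64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    n = len(odd)
    detail = np.empty(n)
    for i in range(n):
        left = even[i]
        right = even[i + 1] if i + 1 < len(even) else even[i]
        activity = abs(right - left)          # computed from even samples only
        # Short (Haar-like) prediction near high activity, linear prediction
        # in smooth regions.
        pred = left if activity > activity_threshold else 0.5 * (left + right)
        detail[i] = odd[i] - pred
    # Fixed update step preserving the running average (crude boundary handling).
    approx = even.copy()
    approx[:n] += 0.25 * (detail + np.roll(detail, 1))
    return approx, detail
```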
Digital image reconstruction using the Legendre and Chebyshev moments
Alfonso Padilla-Vivanco, A. Colunga-Ruiz
An orthogonal basis of moments can be used to reconstruct image intensity distributions. A study of image reconstruction using two different orthogonal bases, the Legendre and the Chebyshev polynomials, is presented. The reconstruction differences for grey-level and binary image distributions are analyzed.
The implementation of thermal image visualization by HDL based on pseudo-color
Yong Zhu, JiangLing Zhang
The pseudo-color method, which maps sampled data to intuitively perceived colors, is a powerful visualization technique. A complete system of pseudo-color visualization, including the basic principle, the model, and the HDL (Hardware Description Language) implementation for thermal images, is described in this paper. Thermal images, whose signal is modulated as video, reflect the temperature distribution of the measured object, so they involve large data volumes and real-time constraints. The solution to this problem is as follows: First, a reasonable scheme, i.e., the combination of global pseudo-color visualization and accurate measurement of local areas of special interest, must be adopted. Then, HDL pseudo-color algorithms in a SoC (System on Chip) realize the system to ensure real-time operation. Finally, the key HDL algorithms for direct gray-level connection coding, proportional gray-level map coding, and enhanced gray-level map coding are presented, and their simulation results are shown. The pseudo-color visualization of thermal images implemented by HDL in this paper has effective applications in electric power equipment testing and medical health diagnosis.
A new method for feature extraction and matching using the energy of Fourier basis
In this paper we propose a method, called the Fourier Feature Extractor (FFE), that relies on orthogonal sinusoidal bases for feature extraction. The response generated by projecting image intensity windows onto these sinusoidal bases is used to extract features. We use the energy of the Fourier basis to capture the intensity change around pixel points. By interpreting the amplitude values of the Fourier basis we distinguish different interest points, such as edge points, corner points, and T-junctions. The method offers the flexibility to choose different kinds of interest points depending on the choice of basis function and energy values. The feature points detected are found to be geometrically stable under different transformations. The Fourier measure assigned during extraction can be used for matching feature points between images, and this makes registration efficient.
Virtual prototype model for lunar rover missions
Desheng Lu, Bingrong Hong, Haoxuan Tang
By means of virtual prototype technology, a virtual environment for the lunar rover is established on a graphics workstation. The model of the virtual lunar rover, based on a rocker and redirector, is derived for full six-degree-of-freedom motion, enabling movements in three directions as well as pitch, roll, and yaw rotations. In order to make the operator see and believe the virtual environment of the scene while controlling the rover, a self-calibration system is developed by transforming 3-D CAD models into 2-D graphics and superimposing the model built on the computer onto the image taken by the CCD camera, which calibrates the appearance of the virtual environment. For the purpose of dispelling the impact of time delay on the stability of the teleoperation system during lunar missions, a two-level teleoperation control framework based on virtual reality preview, which keeps the communication time delay out of the control loops, is applied to the simulation system. A man-machine interface based on a six-degree-of-freedom Space ball is developed so as to control the virtual lunar rover efficiently. The effectiveness and feasibility of the model are verified by climbing simulation experiments with the virtual lunar rover.
An efficient motion vector search algorithm based on block classification
In this paper, to avoid reaching a local minimum and to handle a variety of real-world video sequences, we propose an optimized fast search algorithm. With this objective, the search strategy is varied adaptively according to the image properties and the amount of motion in each block. Each block in a frame is classified as a stationary, small-motion, or large-motion block. We also suggest that the motion vector of a stationary block, such as background or still-image areas, be set to zero, so that no search is performed for these blocks. For the other blocks, adaptively exploiting the advantages of conventional search algorithms, we apply the NTSS algorithm to small-motion blocks and the DS algorithm to large-motion blocks. The proposed algorithm gives faster search results and a significant improvement in the quality of motion-compensated frames and in computational complexity.
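The classification-and-dispatch step could be organized roughly as follows; `ntss_search` and `ds_search` are assumed to be implementations of the NTSS and DS algorithms provided elsewhere, and the MAD thresholds are illustrative rather than the paper's values.

```python
import numpy as np

# Sketch of block classification by zero-motion matching error, dispatching
# each block to no search, NTSS, or DS according to its estimated motion size.

def classify_and_search(cur, ref, ntss_search, ds_search,
                        block=16, t_static=2.0, t_small=12.0):
    h, w = cur.shape
    vectors = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            c = cur[y:y + block, x:x + block].astype(np.float64)
            r = ref[y:y + block, x:x + block].astype(np.float64)
            mad = np.mean(np.abs(c - r))        # zero-motion matching error
            if mad < t_static:
                vectors[(y, x)] = (0, 0)        # stationary: skip the search
            elif mad < t_small:
                vectors[(y, x)] = ntss_search(cur, ref, y, x, block)
            else:
                vectors[(y, x)] = ds_search(cur, ref, y, x, block)
    return vectors
```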
Impulsive noise removal with the use of local adaptive nonlinear filter
The goal of many image processing tasks is to recover an ideal high-quality signal from data that are degraded by impulsive noise, because the human visual system is very sensitive to the high amplitude of noise signals, thus noise in an image can result in a subjective loss of information. This work presents an elegant solution to the impulsive noise removal problem. The proposed technique takes into account three important factors for image filtering, i.e. noise attenuation, edge preservation, as well as detail retention. The conventional filtering schemes utilize a fixed shape of the moving window such as rectangle and circle. In contrast, the proposed spatially connected filter works with the moving window of signal-dependent shape. Experimental results show the superior performance of the proposed filtering algorithm compared to the conventional schemes in terms of both subjective and objective evaluations.
Adaptive block-based disparity estimation algorithm using object disparity homogeneity
Hyok Song, Jinwoo Bae, Byeongho Choi, et al.
In this paper, effective algorithms for stereoscopic video coding and multi-view coding, which are used in 3DAV coding, are proposed. In the proposed algorithm, we use the disparity vectors of adjacent blocks as parameters of the predictor for the disparity vector of the current block. Disparity vectors of adjacent blocks extend the usability of this homogeneity: within an object, adjacent block disparities are very similar, and these similar vectors help reduce the computational load. The block matching algorithm sometimes introduces distortion in the reconstructed image; in a multi-view system, this usually happens in occlusion regions or regions around edges. A partially matched block needs to be divided into quad blocks or smaller blocks to reduce this distortion. A coarse-to-fine hierarchical algorithm is able to remove the noise caused by incorrect matching in a pixel-based matching algorithm, and it also reduces the computational complexity of finding the matched block or pixel. There may be an overall luminance difference between multi-view images caused by the viewing positions. This problem can be solved by the block recursive matching algorithm (BRMA). Experimental results show that this algorithm requires less than 80% of the computational load and gives better image quality, with a PSNR gain of over 0.6 dB on tested images, compared with the symmetric BMA.
Vibration estimation in image sequences for detection of temporal-domain signals
Gennady Feldman, Doron Bar, Israel Tugendhaft, et al.
An algorithm is reported for estimation and suppression of small vibration effects in image sequences. Such effects, even of sub-pixel magnitude, may critically degrade power spectrum of temporal-domain signals. The algorithm consists of the following steps: (1) We perform preliminary detection of the presence of vibration and localize its fundamental frequency by estimating and analyzing the two-dimensional signal, composed of micro-displacements caused by vibrations; (2) We approximate this two-dimensional signal by a two-dimensional periodic function, treating it basically the same way as periodic signals. This model depends on a small number of coefficients. These coefficients are determined by direct LS fitting of the data. (3) We eliminate the effects of the vibration using this model function, for each pixel separately. With this algorithm, several image sequences were processed. The vibration image motions were reconstructed with sub-pixel accuracy and were not, usually, reducible to one-dimensional sinusoidal motion. The algorithm appears to be useful for improving detection of periodic signals in image sequences and reducing false alarms. This article continues our work on detection of periodic signals in image sequences.
Wavelet channel analysis of the multichannel iris recognition system and the improvement by wavelet packets
De Cai, Qiaofeng Tan, Yingbai Yan, et al.
Using iris features, iris recognition has attracted a lot of attention in recent years as a new and efficient personal identification technique. Compared with the frequently used methods of Daugman, Boles, et al., the dual multi-channel iris recognition system based on statistical features proposed by Yong Zhu, et al., has a unique and efficient algorithm. The algorithm processes gray-scale iris images, which are suitable for Asian eyes, and makes good use of 2-D wavelet-transformed irises. Moreover, it uses statistical features to represent iris patterns, which makes the system more robust to errors introduced in the image capturing stage. The recognition performance is better than that of the Wildes system and approaches that of the system proposed by Daugman. But this system still has some open questions, such as how wavelet filter channels influence recognition and how to select wavelet channels. In this paper, we try to answer these questions. Through our analysis, it is shown that wavelet feature extraction can improve the identification rate and that more wavelet filter channels result in better recognition. We also investigate the rule for choosing wavelet channels and conclude that high-frequency channels are better than low-frequency ones. Using this rule, we introduce wavelet packet channels to offer more useful information. The efficiency of this modification is shown by the experimental results.
Multifocus image fusion using the Haar wavelet transform
We present multifocus image fusion based on the Haar transform of an image. The rows of the transformation matrix are computed by means of dyadic scaling and translation of the Haar function. The Haar transformation matrix is fast, real, and orthogonal. These properties are advantages for image processing, particularly for image fusion. A multifocus fusion example using the Haar transform is presented.
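A minimal sketch of such a fusion, using PyWavelets for the Haar decomposition and a max-absolute-value rule on the detail bands, is given below; the single-level decomposition and the fusion rule are our illustrative choices rather than the authors' exact procedure.

```python
import numpy as np
import pywt

# Multifocus fusion sketch in the Haar wavelet domain: keep, at each position,
# the detail coefficient with the larger magnitude (the sharper source), and
# average the approximation bands.

def fuse_haar(img_a, img_b):
    ca, (ch_a, cv_a, cd_a) = pywt.dwt2(img_a.astype(np.float64), 'haar')
    cb, (ch_b, cv_b, cd_b) = pywt.dwt2(img_b.astype(np.float64), 'haar')

    def pick(d_a, d_b):
        return np.where(np.abs(d_a) >= np.abs(d_b), d_a, d_b)

    fused = ((ca + cb) / 2.0,
             (pick(ch_a, ch_b), pick(cv_a, cv_b), pick(cd_a, cd_b)))
    return pywt.idwt2(fused, 'haar')
```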
Object tracking for video annotation
This paper describes a system for tracking objects in a video stream obtained from a moving airborne platform, which is applied to annotate video objects automatically. The object to annotate is indicated by a mouse click. The proposed tracking algorithm uses a spatio-temporal segmentation followed by temporal tracking. It differs from existing techniques by the following features. The same algorithm is used in tracking both moving and stationary objects by making the stationary objects "move." It is general enough to handle any objects of various types and sizes including point objects. The system has a fast implementation because all image operations are applied on small image regions only. The effectiveness of the proposed algorithm is demonstrated on a few real video sequences.
Superresolution reconstruction of a video captured by a translational vibrated staggered TDI camera
The time-delay and integration (TDI) scanning imaging technique is used in various applications such as military reconnaissance and industrial product inspection. Its high sensitivity is significant in low light-level imaging and in thermal imaging. Due to physical constraints, the TDI sensor elements may have a staggered structure in which the odd and the even sensors are horizontally separated. The electrical cooling system of the detectors, as well as the camera platform, vibrates the system and causes image distortions such as space-variant comb effects and motion blur. These vibrations are utilized here by means of superresolution in order to create an improved high-resolution sequence from a lower-resolution sequence in two main stages: inter-frame space-variant motion estimation followed by an efficient implementation of the projection onto convex sets (POCS) restoration method. This work generalizes an algorithm for restoration of a single staggered TDI image performed previously. The additional information contained in an image sequence allows a more efficient restoration process and better restoration results. The lack of any assumption about the correlation between the vibrations of the odd and even sensors also enables application to TDI cameras that do not have the staggered structure. Experimental results with real degraded thermal video are provided.
Unsupervised segmentation of high-resolution satellite imagery using local spectral and texture features
QiuXiao Chen, JianCheng Luo, ChengHu Zhou
In many cases, segmentation approaches for remotely sensed imagery only deal with grey values, which makes them inadequate for segmenting high-resolution imagery, on which texture features are displayed more clearly. On the other hand, texture segmentation approaches utilizing both spectral and texture features are very complicated and time-consuming, which prevents their application. Therefore, developing simple and effective segmentation approaches for high-resolution satellite imagery is very important. In this article, a simple unsupervised segmentation approach for high-resolution satellite imagery is proposed. First, wavelet decomposition is utilized to downsample each band of a multiband image. Then a gradient criterion incorporating local spectral and texture features is utilized to produce a gradient feature image in which pixels with high and low values correspond to region boundaries and region interiors, respectively. Subsequently, a watershed segmentation approach is implemented based on the gradient feature image. Finally, by taking a strategy to minimize the overall heterogeneity increase within segments at each merging step, an improved merging process is performed. Experiments on Quickbird images show that the proposed method provides good segmentation results on high-resolution satellite imagery.
Automatic selection of edge detector parameters based on spatial and statistical measures
Raz Koresh, Yitzhak Yitzhaky
The basic and widely used operation of edge detection in an image usually requires a prior step of setting the edge detector parameters (thresholds, blurring extent, etc.). For real-world images this step is usually done subjectively by human observers. Finding the best detector parameters automatically is a problematic challenge because no absolute ground truth exists when real-world images are considered. However, the advantage of automatic processing over manual operation by humans motivates the development of automatic detector parameter selection that produces results agreeable to human observers. In this work we propose an automatic method for detector parameter selection which considers both the statistical correspondence of detection results produced from different detector parameters and the spatial correspondence between detected edge points, represented as saliency values. The method improves a recently developed technique that employs only the statistical correspondence of detection results and depends on the initial range of possible parameters. By incorporating saliency values in the statistical analysis, the detector parameters adaptively converge to the best values. Automatic edge detection results show considerable improvement over the purely statistical method when a wrong initial parameter range is selected.
Mosaicing of MPEG compressed video: a unified approach
N. B. Vineeth, B. Krishnamoorthy, G. V. Prabhakara Rao
The mosaicing system proposed in this paper consists of a temporal segmentation module and a motion estimation module carried out in the MPEG (1&2) domain. The input MPEG video is temporally segmented into shots by the segmentation module and each shot is mosaiced separately. The output of the system is a set of mosaics, each of which captures the panorama of each shot uniquely. The camera motion parameters are computed for each shot using the MPEG motion vectors; frames from a shot are then aligned and integrated into a static mosaic for each shot. The proposed system is 200-300% faster in executing the registration step for our input video sequences.
Multispectral satellite imagery segmentation using a simplified JSEG approach
QiuXiao Chen, JianCheng Luo, ChengHu Zhou
It is a big challenge to segment remote sensing images, especially multispectral satellite imagery, due to their unique features. Considering that satellite imagery is playing an increasingly important role, we conducted research on the segmentation of such imagery. Since multispectral satellite imagery is more similar to natural color images than to other types of images, it is more likely that studies on natural color image segmentation can be extended to multispectral satellite imagery. The obstacle to applying these studies to multispectral satellite imagery lies in their inefficiency when dealing with large images. Therefore, based on a natural color image segmentation approach, JSEG, we propose a more efficient one. First, a grid-based cluster initialization approach is proposed to obtain the initial cluster centers, based on which a fast image quantization approach is implemented. Second, a feature image named the J-image, which describes local homogeneity, is obtained. Then a watershed approach is applied to the J-image, and initial segmentation results are obtained. Finally, based on the histogram similarity of each region, a simplified growth-merging approach is proposed and the final segmentation results are obtained. By comparing the results of the JSEG approach and the proposed one, we found that the latter is rather efficient and accurate. Advice on further studies is also presented.
An algorithm of camera self-calibration
Dong Liang, Lu Wang
A new camera self-calibration algorithm is proposed in this paper for the situation in which the intrinsic camera parameters remain unchanged during image capture. The advantage of this algorithm is that it does not need any prior assumption about the intrinsic camera parameters. We use the conjugate gradient method to estimate the unknown scale factors in the Kruppa equations, and then solve the Kruppa equations linearly with the estimated scale factors to calibrate the intrinsic camera parameters. The validity of the proposed algorithm has been confirmed by experiments.
A secure watermarking scheme for copyright protection
Jiashu Zhang, Lei Tian, Heng-Ming Tai, et al.
This paper presents a chaotic watermarking scheme for copyright protection. The proposed method employs a singular-value-decomposition (SVD) based watermarking scheme and a watermark encrypted using chaotic maps. The rightful owner possesses two secret keys: one related to the owner, the other to the original image. Chaotic maps are used with the keys for watermark encryption so as to enhance the anti-counterfeit and non-invertibility properties of the watermark. Examples are given to demonstrate the robustness and security of the proposed watermarking scheme.
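The two ingredients named in the abstract, a chaotic keystream for watermark encryption and SVD-based embedding, might be combined roughly as in the following sketch; the logistic-map parameters, the XOR encryption, and the embedding strength `alpha` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Sketch of chaotic watermark encryption plus SVD-based embedding. The key use,
# keystream, and embedding rule are illustrative, not the authors' scheme.

def logistic_sequence(x0, n, r=3.99):
    seq = np.empty(n)
    for i in range(n):
        x0 = r * x0 * (1.0 - x0)
        seq[i] = x0
    return seq

def embed(host, watermark, key=0.37, alpha=0.05):
    # Encrypt the (binary) watermark with a chaotic keystream derived from the key.
    k = (logistic_sequence(key, watermark.size) > 0.5).astype(np.uint8)
    encrypted = np.bitwise_xor(watermark.ravel().astype(np.uint8), k)

    # Embed by modulating the singular values of the host image.
    u, s, vt = np.linalg.svd(host.astype(np.float64), full_matrices=False)
    s_marked = s.copy()
    m = min(len(s), encrypted.size)
    s_marked[:m] += alpha * s[:m] * encrypted[:m]
    return u @ np.diag(s_marked) @ vt
```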
High-level feature extraction in JPEG compressed domain
Traditional feature extraction techniques like the KLT, Harris, and wavelet-based detectors work only in the uncompressed domain; hence an additional decompression step is required before any of them can be applied. We propose a two-level technique for extracting high-level feature points directly from JPEG compressed images. At the first level, the Discrete Cosine Transform (DCT) blocks having high activity content are filtered using a variance measure. At the next level, a DCT block centered at every pixel present in the filtered block is constructed from the neighboring DCT blocks. Feature points are then selected by analyzing the AC coefficients of the DCT block centered about each pixel. The proposed method is simple and efficient. The extracted feature points were found to be rich in information content, which could be used for image registration. The results of this technique showed almost the same amount of repeatability between two images with 60% to 70% overlap when compared with techniques available in the uncompressed domain. The features thus extracted can be used directly to calculate the motion parameters between two images in the compressed domain.
Mosaicing MPEG video sequences
Video mosaicing is the process of obtaining a single unified picture from multiple frames of a video sequence. It involves finding the geometric transformation parameters between frames and aligning them to form a single image representing the content of the full video. The proposed mosaicing method takes advantage of the information encoded by MPEG in the form of motion vectors, DCT blocks, and the error information for generating the mosaics. Only the motion vectors corresponding to blocks of high-activity regions are considered for finding the affine parameters. The feasibility of the method has been tested on panoramic views from various MPEG sequences. The performance of the proposed method was evaluated against existing uncompressed-domain methods.
An efficient wavelet-based motion estimation algorithm
Jin-Woo Bae, Seung-Hyun Lee, Ji-Sang Yoo
In this paper, we propose a wavelet-based fast motion estimation algorithm for video sequence encoding with a low bit-rate. By using one of the properties of the wavelet transform, multi-resolution analysis (MRA), and the spatial interpolation of an image, we can simultaneously reduce the prediction error and the computational complexity inherent in video sequence encoding. In addition, by defining a significant block (SB) based on the differential information of the wavelet coefficients between successive frames, the proposed algorithm enables us to make up for the increase in the number of motion vectors when the MRME algorithm is used. As a result, we are not only able to improve the peak signal-to-noise ratio (PSNR), but also reduce the computational complexity by up to 67%.
Simulation of early vision mechanisms and application to object shape recognition
In early stages of vision, the images are processed to generate "maps" or point-by-point distributions of values of various quantities including the edge elements, fields of local motion, depth maps and color constancy, etc. These features are then refined and processed in visual cortex. The next stage is recognition which also leads to simple control of behaviors such as steering and obstacle avoidance, etc. In this paper we present a system for object shape recognition that utilizes the features extracted by use of human vision model. The first block of the system performs processing analogous to that in retina for edge feature extraction. The second block represents the processing in visual cortex, where features are refined and combined to form a stimulus to be presented to the recognition model. We use the normalized distances of the edge pixels from the mean to form a feature vector. The next block that accomplishes the task of recognition consists of a counterpropagation neural network model. We use gray scale images of 3D objects to train and test the performance of the system. The experiments show that the system can recognize the objects with some variations in rotation, scaling and translation.
Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis
Neural networks and fuzzy systems are considered two of the most important artificial intelligence algorithms, providing classification capabilities obtained through different learning schemas which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploit the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with some relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain robust classification performance for a wide range of applications. This paper focuses on non-destructive testing quality control systems, in particular the classification of metallic structures according to corrosion time using a novel cellular neural network architecture, which is explained in detail. Additionally, we compare these results with the ones obtained using the fuzzy C-means clustering algorithm and analyse both classifiers according to their classification capabilities.
Optoelectronic implementation of multilayer perceptron and Hopfield neural networks
Andrzej W. Domanski, Mikolaj K. Olszewski, Tomasz R. Wolinski
In this paper we present an optoelectronic implementation of two networks based on the multilayer perceptron and the Hopfield neural network. We propose two different methods to solve the problem of the lack of negative optical signals that are necessary for connections between layers of the perceptron as well as within the Hopfield network structure. The first method, applied to the construction of the multilayer perceptron, was based on dividing signals into two channels and then using them independently as positive and negative signals. The second, applied to the implementation of the Hopfield model, was based on adding a constant value to the elements of the weight matrix. Both methods of compensating for the lack of negative optical signals were tested experimentally as optoelectronic models of the multilayer perceptron and the Hopfield neural network. Special configurations of optical fiber cables and liquid crystal multicell plates were used. In conclusion, possible applications of the optoelectronic neural networks are briefly discussed.
Imaging and Representation II
A new lifting scheme for lossless image compression
Since its first introduction, the lifting scheme has become a powerful method to perform wavelet transforms and many other orthogonal transforms. Especially for integer-to-integer wavelet transforms, the lifting scheme is an indispensable tool for lossless image compression. Earlier work has shown that the number of lifting steps can have an impact on the transform performance. The fidelity of integer-to-integer transforms depends entirely on how well they approximate their original wavelet transforms. The predominant source of errors is the rounding-off of the intermediate real result to an integer at each lifting step. Hence, a wavelet transform with a large number of lifting steps automatically increases the approximation error. In the case of lossy compression, the approximation error is less important because it is usually masked by the transform coefficient quantization error. However, in the case of lossless compression, the compression performance is certainly affected by the approximation error. Consequently, the number of lifting steps in a wavelet transform is a major concern. The new lifting method presented in this paper reduces the number of lifting steps substantially for lossless data compression. Thus, it also significantly reduces the overall rounding errors incurred in the real-to-integer conversion process at each of the lifting steps. The improvement in the overall rounding errors is more pronounced in integer-to-integer orthogonal wavelet transforms, but the improvement in integer-to-integer biorthogonal wavelet transforms is also significant. In addition, as a dividend, the new lifting method further saves memory space and decreases signal delay. Many examples of popular wavelet transforms are included.
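For reference, a conventional integer-to-integer lifting implementation of the LeGall 5/3 filter, with the per-step rounding the abstract refers to, is sketched below; the paper's reduced-step method itself is not reproduced here, and the boundary handling is simplified.

```python
import numpy as np

# Conventional integer-to-integer lifting for the LeGall 5/3 filter (one level,
# 1-D). Each step rounds to an integer; the inverse repeats the same rounded
# quantities, so reconstruction is exact.

def lift_53_forward(x):
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    ext = np.concatenate([even, even[-1:]])                     # repeat last sample
    d = odd - ((ext[:len(odd)] + ext[1:len(odd) + 1]) >> 1)     # predict + round
    dp = np.concatenate([d[:1], d])                             # d[-1] := d[0]
    s = even[:len(d)] + ((dp[:len(d)] + dp[1:len(d) + 1] + 2) >> 2)  # update + round
    if len(even) > len(d):
        s = np.concatenate([s, even[len(d):]])                  # odd-length input
    return s, d

def lift_53_inverse(s, d):
    dp = np.concatenate([d[:1], d])
    even = s.copy()
    even[:len(d)] = s[:len(d)] - ((dp[:len(d)] + dp[1:len(d) + 1] + 2) >> 2)
    ext = np.concatenate([even, even[-1:]])
    odd = d + ((ext[:len(d)] + ext[1:len(d) + 1]) >> 1)
    x = np.empty(len(even) + len(odd), dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x
```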
A fast adaptive lifting method for lossless hyperspectral data compression
The real advantage of using a wavelet transform for image data compression is the power of adapting to local statistics of the pixels. In hyperspectral data, many but not all spectral planes are well correlated. In each spectral plane, the spatial data is composed of patches of relatively smooth areas segmented by edges. The smooth areas can be well compressed by a relatively long wavelet transform with a large number of vanishing moments. However, for the regions around edges, shorter wavelet transforms are preferable. Despite the fact that the local statistics of both the spectral and spatial data change from pixel to pixel, almost all known image data compression algorithms use only one wavelet transform for the entire dataset. For example, the current international still image data compression standard, JPEG2000, has adopted the 5/3 wavelet transform as the default for lossless compression of all images. There is not a single wavelet filter that performs uniformly better than the others. Thus, it would be beneficial to use many types of wavelet filters based on local activities of the image. The selected wavelet transform can thus be best adapted to the content of the image locally. In this paper, we have derived a fast adaptive lifting scheme that can easily switch wavelet filters from one to the other. The adaptation is performed on a pixel-by-pixel basis, and it does not need any bookkeeping overhead. It is known that the lifting scheme is a fast and powerful tool to implement all wavelet transforms. Especially for integer-to-integer wavelet transforms, the lifting scheme is an indispensable tool for lossless image compression. Taking advantage of our newly developed lossless lifting scheme, the fast adaptive lifting algorithm presented in this paper not only saves two lifting steps but also improves accuracy compared to the conventional lifting scheme for lossless data compression. Moreover, our simulation results for ten two-dimensional images have shown that the fast adaptive lifting scheme outperforms both the lossless wavelet transforms used in JPEG2000 and the S+P transform in the lossless SPIHT algorithm.
Design of a high-definition imaging (HDI) analysis technique adapted to challenging environments
This paper presents a highly automated, more accurate approach to High Definition Imaging (HDI) using low signal-to-noise digital videos recorded at ground-based telescopes. The HDI approach involves the acquisition of a video sequence (10^3 - 10^5 fields) taken through a turbulent atmosphere followed by three-step post-processing. The specific goal is to be able to reproduce expert results, while limiting human interaction, to study both surface features and the atmospheres of planets and moons. The telescopes used here are preferably small and not equipped with Adaptive Optics. The three steps include registration, selection and restoration. First, registration, based on a template, is performed to find the exact position of each object. Then only higher-quality frames are selected by a criterion based on a measure of the blur in a region of interest around that object. The best quality frames are then shifted and added together to create an effective time exposure under ideal observing conditions. The last step is to remove distortions in the image, caused by the atmosphere and the optical equipment, through a regularized deconvolution of instrument and residual atmospheric blur. This procedure is done first in the white light domain, and then the registration information obtained there is applied to spectral data.
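The selection and shift-and-add stages might be prototyped along the following lines, assuming registration has already produced per-field offsets; the variance-of-Laplacian sharpness score and the keep fraction are illustrative choices, not the authors' blur criterion.

```python
import numpy as np
from scipy.ndimage import laplace, shift as nd_shift

# Sketch of the "selection" and "shift-and-add" steps: `frames` is a list of
# 2-D arrays and `offsets` holds the (dy, dx) found during registration.

def sharpness(frame, roi):
    y0, y1, x0, x1 = roi
    return laplace(frame[y0:y1, x0:x1].astype(np.float64)).var()

def select_and_stack(frames, offsets, roi, keep_fraction=0.1):
    scores = np.array([sharpness(f, roi) for f in frames])
    n_keep = max(1, int(len(frames) * keep_fraction))
    best = np.argsort(scores)[-n_keep:]          # highest-sharpness fields
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for i in best:
        acc += nd_shift(frames[i].astype(np.float64), offsets[i], order=1)
    return acc / n_keep                          # effective long exposure
```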
Multimedia Networking and Architectures
Complexity analysis of scalable motion-compensated wavelet video decoders
Scalable wavelet video coders based on Motion Compensated Temporal Filtering (MCTF) have been shown to exhibit good coding efficiency over a large range of bit-rates, in addition to providing spatial, temporal and SNR scalabilities. However, the complexity of these wavelet video coding schemes has not been thoroughly investigated. In this paper, we analyze the computational complexity of a fully-scalable MCTF-based wavelet video decoder that is likely to become part of the emerging MPEG-21 standard. We model the change in computational complexity of various components of the decoder as a function of bit-rate, encoding parameters such as filter types for spatial and temporal decomposition and the number of decomposition levels, and sequence characteristics. A key by-product of our analysis is the observation that fixed-function hardware accelerators are not appropriate for implementing these next generation fully scalable video decoders. The absolute complexity of the various functional units as well as their relative complexity varies depending on the transmission bit-rate, thereby requiring different hardware/software architecture support at different bit-rates. To cope with these variations, a preliminary architecture comprising of a reconfigurable co-processor and a general purpose processor is proposed as an implementation platform for these video decoders. We also propose an algorithm to utilize the co-processor efficiently.