Proceedings Volume 6312

Applications of Digital Image Processing XXIX


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 24 August 2006
Contents: 9 Sessions, 57 Papers, 0 Presentations
Conference: SPIE Optics + Photonics 2006
Volume Number: 6312

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.

View Session
  • Image Models and Processing I
  • Image Coding and Structures
  • Imaging Security
  • Wireless and Mobile Multimedia Systems
  • H.264/AVC and Applications
  • MPEG-7 and Applications
  • Image Models and Processing II
  • Image Models and Processing III
  • Poster Session
Image Models and Processing I
Locally adaptive detection of differences in images
We introduce a method for unsupervised change detection under non-uniform changes of intensity. A locally adaptive normalizing window is correlated with the two images, and morphological processing is then applied to isolate objects that have been added to or removed from the scene. The multiplicative model used represents image intensity changes well when the locations of the light sources are unchanged between images; when the illuminating source changes location between exposures, the model depends on the geometry of the surroundings. Computer simulations show that the method works well when the model is satisfied. An example of detection of camouflaged targets is presented. In real images with changed light sources, artifacts are introduced in the difference image. Results from images taken with a web camera are shown.
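The locally adaptive normalization described above can be sketched roughly as follows (an illustrative reconstruction, not the authors' code): each local window of both images is reduced to zero mean and unit variance, so a multiplicative illumination change cancels out, and low local correlation then flags added or removed objects. The window size and threshold here are arbitrary choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_change_map(img_a, img_b, win=15, thresh=0.6):
    """Flag pixels where the two images disagree after local normalization.

    Within each sliding window both images are zero-meaned, so a
    multiplicative (gain/offset) intensity change cancels out; low
    normalized correlation then marks genuine scene changes.
    """
    pad = win // 2
    a = np.pad(img_a.astype(float), pad, mode="reflect")
    b = np.pad(img_b.astype(float), pad, mode="reflect")
    wa = sliding_window_view(a, (win, win))
    wb = sliding_window_view(b, (win, win))
    wa = wa - wa.mean(axis=(-2, -1), keepdims=True)
    wb = wb - wb.mean(axis=(-2, -1), keepdims=True)
    num = (wa * wb).sum(axis=(-2, -1))
    den = np.sqrt((wa ** 2).sum(axis=(-2, -1)) * (wb ** 2).sum(axis=(-2, -1))) + 1e-9
    corr = num / den
    return corr < thresh  # True where the images locally disagree
```

In a full pipeline this binary map would then be cleaned up with morphological opening/closing, as the abstract indicates.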
Resolution modification and context based image processing for retinal prosthesis
Golshah Naghdy, Chris Beston, Jong-Mo Seo, et al.
This paper focuses on simulating image processing algorithms and exploring issues related to reducing high resolution images to 25 x 25 pixels suitable for the retinal implant. Field of view (FoV) is explored, and a novel method of virtual eye movement discussed. Several issues beyond the normal model of human vision are addressed through context based processing.
Object detection by radial basis neural network filtering of spectral data
Tom G. Thomas Jr., M. Serkan Ozkan, Ye Tung, et al.
An object recognition technique has been developed that allows the rapid screening of multispectral images for objects with known spectral signatures. The technique is based on the configuration of a radial basis neural network (RBN) that is specific for a particular object spectral signature or series of object spectral signatures. The method has been used to identify features in CASI-2 and HYDICE images with results comparable to conventional spatial object recognition techniques with a significant reduction in processing time. Radial basis neural networks have several advantages over the more common backpropagation neural networks, including better selectivity and faster training, resulting in a significant reduction in overall image processing time and greater accuracy.
Analyzing target detection performance with multispectral fused images
J. Lanir, M. Maltz
With the advance in multispectral imaging, the use of image fusion has emerged as a new and important research area. Many studies have considered the advantages of specific fusion methods over the individual input bands in terms of human performance, yet few comparison studies have been conducted to determine which fusion method is preferable to another. This paper examines four different fusion methods, and compares human performance of observers viewing fused images in a target detection task. In the presented experiment, we implemented an approach that has not been generally used in the context of image fusion evaluation: we used the paired comparison technique to qualitatively assess and scale the subjective value of the fusion methods. Results indicated that the false color and average methods showed the best results.
Discrete filters and transforms for digital image data analysis
Block-based discrete transform domain algorithms are developed to retrieve information from digital image data. Specifically, discrete, real, and circular Fourier transforms of the data blocks are filtered by coefficients chosen in the discrete frequency domain to improve feature detection. In this paper, the proposed approach is applied to improve the identification of edge discontinuities in digital image data.
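As a loose illustration of the block-based transform-domain idea (not the paper's specific discrete, real, or circular transforms or filter coefficients), the following sketch scores each block by its non-DC Fourier energy: flat blocks score near zero, while blocks containing an edge discontinuity score high.

```python
import numpy as np

def blockwise_dc_removed_energy(img, bsize=8):
    """Per-block energy after zeroing the DC coefficient of each block's DFT.

    Flat blocks score ~0; blocks containing a discontinuity score high,
    so the map highlights edge locations at block resolution.
    """
    h, w = img.shape
    energy = np.zeros((h // bsize, w // bsize))
    for by in range(0, h, bsize):
        for bx in range(0, w, bsize):
            f = np.fft.fft2(img[by:by + bsize, bx:bx + bsize].astype(float))
            f[0, 0] = 0.0  # discard the mean (DC) coefficient
            energy[by // bsize, bx // bsize] = np.abs(f).sum() / bsize ** 2
    return energy
```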
An overlap-invariant mutual information estimation method for image registration
A class of implementations of mutual information (MI) based image registration estimates MI from the joint histogram of the overlap of two images. The consequence of this approach is that the MI estimate thus obtained is not overlap invariant: its value tends to increase as the overlapped region gets smaller. When the two images are very noisy or are so different that the correct MI peak is very weak, this may lead to incorrect registration results under the maximization of mutual information (MMI) criterion. In this paper, we present a new joint histogram estimation scheme for overlap-invariant MI estimation. The idea is to keep constant the number of samples used for joint histogram estimation. When one image is completely within another, this condition is automatically satisfied. When one image (floating image) partially overlaps another image (reference image) after applying a certain geometric transformation, it is possible that, for a pixel from the floating image, there is no corresponding point in the reference image. In this case, we generate its corresponding point by assuming that its value is a random variable following the distribution of the reference image. In this way, the number of samples utilized for joint histogram estimation is always the same as the number of pixels in the floating image. The efficacy of this joint histogram estimation scheme is demonstrated using several pairs of remote sensing images. Our results show that the proposed method produces a mutual information measure that is less sensitive to the size of the overlap, and the peak found is more reliable for image registration.
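The core idea, filling in missing correspondences by sampling from the reference image's distribution so the sample count stays constant, can be sketched as follows. This is a minimal reconstruction under assumptions: the function name, the use of NaN to mark floating pixels that map outside the reference, and the bin count are all illustrative, not from the paper.

```python
import numpy as np

def mi_overlap_invariant(float_img, ref_vals, ref_all, bins=16, seed=0):
    """Mutual information with a constant number of histogram samples.

    float_img: all floating-image pixel values (1-D array).
    ref_vals:  reference values for those pixels; NaN marks pixels whose
               transformed position falls outside the reference image.
    ref_all:   all reference-image values, used as the distribution from
               which the missing counterparts are drawn.
    """
    rng = np.random.default_rng(seed)
    ref = ref_vals.astype(float).copy()
    missing = np.isnan(ref)
    ref[missing] = rng.choice(ref_all, size=missing.sum())  # fill from ref's distribution
    h, _, _ = np.histogram2d(float_img, ref, bins=bins)
    p = h / h.sum()
    px = p.sum(axis=1, keepdims=True)  # marginal of floating image
    py = p.sum(axis=0, keepdims=True)  # marginal of reference image
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())
```

Because every floating-image pixel contributes exactly one sample, the estimate no longer shrinks or grows with the size of the overlap.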
Millimeter-wave video sequence denoising and enhancement in concealed weapons detection application
Xiaohui Wei, Hua-Mei Chen, Ishfaq Amad
In this paper, we present an adaptive algorithm to improve the quality of millimeter-wave video sequences by separating each video frame into a foreground region and a background region and handling them differently. We separate the foreground from the background using an adaptive Kalman filter. The background is then denoised by both spatial and temporal algorithms. The foreground is denoised by block-based motion-compensated averaging and enhanced by a wavelet-based multi-scale edge representation. Finally, further adaptive contrast enhancement is applied to the reconstructed foreground. The experimental results show that our algorithm produces a sequence with a smoother background, less noise, a more enhanced foreground, and higher contrast in the region of interest.
Image Coding and Structures
Scalable three-dimensional SBHP algorithm with region of interest access and low complexity
A low-complexity three-dimensional image compression algorithm based on wavelet transforms and a set-partitioning strategy is presented. The Subband Block Hierarchical Partitioning (SBHP) algorithm is modified and extended to three dimensions, and applied to every code block independently. The resultant algorithm, 3D-SBHP, efficiently encodes 3D image data by exploiting the dependencies in all dimensions, while enabling progressive SNR and resolution decompression and Region-of-Interest (ROI) access from the same bit stream. The code-block selection method by which random access decoding can be achieved is outlined. The resolution scalable and random access performances are empirically investigated. The results show 3D-SBHP is a good candidate to compress 3D image data sets for multimedia applications.
Reduction of blocking artifacts using side information
Block-based image and video coding systems are used extensively in practice. In low bit-rate applications, however, they suffer from blocking artifacts. Reduction of blocking artifacts can improve visual quality and PSNR. Most methods in the literature that are proposed to reduce blocking artifacts apply post-processing techniques to the compressed image. One major benefit of such methods is that no modification to current encoders is required. In this paper, we propose an approach where blocking artifacts are reduced using side information transmitted by the encoder. One major benefit of this approach is the ability to compare the processed image directly with the original undegraded image to improve performance. For example, we could process an image with different methods and transmit the most effective method as part of the side information. We study our proposed approach, compare it with a post-processing type of system, and illustrate that the proposed approach has the potential to be beneficial in both visual quality and PSNR for some range of coding bit-rates.
Deinterlacing based on motion compensation with variable block sizes
Inho Kim, Taeuk Jeong, Chulhee Lee
In this paper, we propose a new deinterlacing algorithm based on motion estimation and compensation with variable block sizes. Motion compensated methods using a fixed block size tend to produce undesirable artifacts when complicated motion and high-frequency components are present. In the proposed algorithm, the initial block size for motion estimation is determined based on the existence of global motion. Then, the block is further divided depending on block characteristics. Since motion compensated deinterlacing may not always provide satisfactory results, the proposed method also uses intrafield spatial deinterlacing. Experimental results show that the proposed method provides noticeable improvements compared to motion compensated deinterlacing with a fixed block size.
An experimental comparison of block matching techniques for detection of moving objects
The detection of moving objects in complex scenes is the basis of many applications in surveillance, event detection, and tracking. Complex scenes are difficult to analyze due to camera noise and lighting conditions. Currently, moving objects are detected primarily using background subtraction algorithms, with block matching techniques as an alternative. In this paper, we complement our earlier work on the comparison of background subtraction methods by performing a similar study of block matching techniques. Block matching techniques first divide a frame of a video into blocks and then determine where each block has moved from in the preceding frame. These techniques are composed of three main components: block determination, which specifies the blocks; the search method, which specifies where to look for a match; and the matching criterion, which determines when a good match has been found. In our study, we compare various options for each component using publicly available video sequences of a traffic intersection taken under different traffic and weather conditions. Our results indicate that a simple block determination approach is significantly faster with minimal performance reduction, the three-step search method detects more moving objects, and the mean-squared-difference matching criterion provides the best overall performance.
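The three-step search with a mean-squared-difference criterion can be sketched as follows; block size, step schedule, and boundary handling here are illustrative choices, not those of the study. At each step, a 3x3 grid of displacements around the current centre is probed, the centre moves to the best probe, and the step is halved.

```python
import numpy as np

def msd(a, b):
    """Mean squared difference between two equally sized blocks."""
    d = a.astype(float) - b.astype(float)
    return float((d * d).mean())

def three_step_search(ref, cur, top, left, bsize=8, step=4):
    """Find where the block of `cur` at (top, left) came from in `ref`."""
    block = cur[top:top + bsize, left:left + bsize]
    cy = cx = 0
    best = msd(ref[top:top + bsize, left:left + bsize], block)
    while step >= 1:
        bcy, bcx = cy, cx
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = top + cy + dy, left + cx + dx
                if 0 <= y <= ref.shape[0] - bsize and 0 <= x <= ref.shape[1] - bsize:
                    cost = msd(ref[y:y + bsize, x:x + bsize], block)
                    if cost < best:
                        best, bcy, bcx = cost, cy + dy, cx + dx
        cy, cx = bcy, bcx  # re-centre on the best probe of this step
        step //= 2
    return cy, cx  # displacement of the block relative to the previous frame
```

With a step schedule of 4, 2, 1, only 25 candidate positions are probed instead of the 225 an exhaustive ±7 search would require, which is why the study treats it as a fast alternative.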
Hardware acceleration of the motion compensation algorithms in synthetic aperture radar (SAR) platforms
Fernando E. Ortiz, James P. Durbano, Eric J. Kelmelis, et al.
Synthetic Aperture Radar (SAR) techniques employ radar waves to generate high-resolution images in all illumination/weather conditions. The onboard implementation of the image reconstruction algorithms allows for the transmission of real-time video feeds, rather than raw radar data, from unmanned aerial vehicles (UAVs), saving significant communication bandwidth. This in turn saves power, enables longer missions, and allows the transmission of more useful information to the ground. For this application, we created a hardware architecture for a portable implementation of the motion compensation algorithms, which are more computationally intensive than the SAR reconstruction itself, and without which the quality of the SAR images is severely degraded, rendering them unusable.
High-performance lossless and progressive image compression based on an improved integer lifting scheme and Rice coding
Xie Cheng Jun, Yan Su, Zhang Wei
In this paper, a modified algorithm is introduced to improve the Rice coding algorithm, and image compression with the CDF (2,2) wavelet lifting scheme is studied. Our experiments show that its lossless image compression performance is much better than Huffman, Zip, lossless JPEG, and RAR, and slightly better than (or equal to) the well-known SPIHT: the lossless compression rate is improved by about 60.4%, 45%, 26.2%, 16.7%, and 0.4% on average, respectively. The encoder is about 11.8 times faster than SPIHT's, and its time efficiency is improved by 162%; the decoder is about 12.3 times faster, and its time efficiency is raised by about 148%. Instead of requiring the largest number of wavelet transform levels, this algorithm achieves high coding efficiency when the number of wavelet transform levels is greater than 3. For source models with distributions similar to the Laplacian, it improves coding efficiency and enables progressive transmission, coding, and decoding.
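For reference, a plain Rice code (a Golomb code with a power-of-two divisor), the entropy coder the paper builds on, can be sketched as follows; the paper's modified variant is not reproduced here. Each nonnegative value is split into a quotient, sent in unary, and a k-bit remainder.

```python
def rice_encode(values, k):
    """Rice code: unary quotient (q ones then a 0), then k remainder bits, MSB first."""
    bits = []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        bits += [1] * q + [0]                                 # quotient in unary
        bits += [(r >> i) & 1 for i in range(k - 1, -1, -1)]  # remainder
    return bits

def rice_decode(bits, count, k):
    """Decode `count` values from a Rice-coded bit list."""
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == 1:  # count the unary quotient
            q += 1
            i += 1
        i += 1               # skip the terminating 0
        r = 0
        for _ in range(k):   # read the k-bit remainder
            r = (r << 1) | bits[i]
            i += 1
        out.append((q << k) | r)
    return out
```

The parameter k is tuned to the data's magnitude; for Laplacian-like residuals, as the abstract notes, this family of codes is close to optimal.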
Application study of piecewise context-based adaptive binary arithmetic coding combined with modified LZC
An algorithm combining LZC and arithmetic coding for image compression is presented, and both theoretical deduction and simulation results prove the correctness and feasibility of the algorithm. Based on the characteristics of context-based adaptive binary arithmetic coding and entropy, LZC is modified to cooperate with the optimized piecewise arithmetic coding. This algorithm improves the compression ratio without any additional time consumption compared to the traditional method.
Imaging Security
Hierarchical indexing using R-trees for replica detection
Yannick Maret, David Marimón, Frédéric Dufaux, et al.
Replica detection is a prerequisite for the discovery of copyright infringement and detection of illicit content. For this purpose, content-based systems can be an efficient alternative to watermarking. Rather than imperceptibly embedding a signal, content-based systems rely on content similarity concepts. Certain content-based systems use adaptive classifiers to detect replicas. In such systems, a suspected content is tested against every original, which can become computationally prohibitive as the number of original contents grows. In this paper, we propose an image detection approach which hierarchically estimates the partition of the image space where the replicas (of an original) lie by means of R-trees. Experimental results show that the proposed system achieves high performance. For instance, a fraction of 0.99975 of the test images are filtered by the system when the test images are unrelated to any of the originals while only a fraction of 0.02 of the test images are rejected when the test image is a replica of one of the originals.
Toward a secure JPEG
In this paper, we propose Secure JPEG, an open and flexible standardized framework to secure JPEG images. Its goal is to allow the efficient integration and use of security tools enabling a variety of security services such as confidentiality, integrity verification, source authentication, or conditional access. In other words, Secure JPEG aims at accomplishing for JPEG what JPSEC enables for JPEG 2000. We describe in more detail three specific examples of security tools. The first addresses integrity verification using a hash function to compute local digital signatures. The second considers the use of encryption for confidentiality. Finally, the third describes a scrambling technique.
Perceptually driven 3D distance metrics with application to watermarking
Guillaume Lavoué, Elisa Drelie Gelasca, Florent Dupont, et al.
This paper presents an objective structural distortion measure which reflects the visual similarity between 3D meshes and thus can be used for quality assessment. The proposed tool is not linked to any specific application and thus can be used to evaluate any kinds of 3D mesh processing algorithms (simplification, compression, watermarking etc.). This measure follows the concept of structural similarity recently introduced for 2D image quality assessment by Wang et al.1 and is based on curvature analysis (mean, standard deviation, covariance) on local windows of the meshes. Evaluation and comparison with geometric metrics are done through a subjective experiment based on human evaluation of a set of distorted objects. A quantitative perceptual metric is also derived from the proposed structural distortion measure, for the specific case of watermarking quality assessment, and is compared with recent state of the art algorithms. Both visual and quantitative results demonstrate the robustness of our approach and its strong correlation with subjective ratings.
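The SSIM-inspired comparison of curvature statistics over corresponding local windows can be sketched as follows. This is a loose illustration of the idea (combining differences of mean, standard deviation, and covariance of curvature samples), not the paper's exact formula or weighting.

```python
import numpy as np

def curvature_window_distortion(c1, c2, eps=1e-9):
    """Structural distortion between two curvature samples from matched windows.

    Combines normalized differences of mean (luminance-like term),
    standard deviation (contrast-like term), and covariance
    (structure-like term), in the spirit of Wang et al.'s SSIM.
    """
    m1, m2 = c1.mean(), c2.mean()
    s1, s2 = c1.std(), c2.std()
    cov = ((c1 - m1) * (c2 - m2)).mean()
    lum = abs(m1 - m2) / (max(m1, m2) + eps)
    con = abs(s1 - s2) / (max(s1, s2) + eps)
    struc = abs(s1 * s2 - cov) / (s1 * s2 + eps)
    return (lum + con + struc) / 3.0  # 0 for identical windows, larger = more distorted
```

Averaging this quantity over all local windows of the two meshes yields a single distortion score.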
Wireless and Mobile Multimedia Systems
Sparse super-resolution reconstructions of video from mobile devices in digital TV broadcast applications
Choong S. Boon, Onur G. Guleryuz, Toshiro Kawahara, et al.
We consider the mobile service scenario where video programming is broadcast to low-resolution wireless terminals. In such a scenario, broadcasters utilize simultaneous data services and bi-directional communications capabilities of the terminals in order to offer substantially enriched viewing experiences to users by allowing user participation and user tuned content. While users immediately benefit from this service when using their phones in mobile environments, the service is less appealing in stationary environments where a regular television provides competing programming at much higher display resolutions. We propose a fast super-resolution technique that allows the mobile terminals to show a much enhanced version of the broadcast video on nearby high-resolution devices, extending the appeal and usefulness of the broadcast service. The proposed single frame super-resolution algorithm uses recent sparse recovery results to provide high quality and high-resolution video reconstructions based solely on individual decoded frames provided by the low-resolution broadcast.
Intelligent video display to raise quality of experience on mobile devices
Changick Kim, Jaeseung Ko, Ilkoo Ahn, et al.
Mobile devices have been transformed from voice communication tools into advanced tools for consuming multimedia content. The extensive use of such mobile devices entails watching multimedia content on small LCD panels. However, most video sequences are captured for normal viewing on standard TV or HDTV and, for cost reasons, are merely resized and delivered without additional editing. This can make it difficult for small-display viewers to understand what is happening in a scene. For instance, in a soccer video sequence taken with a long-shot camera technique, the tiny objects (e.g., the soccer ball and players) may not be clearly visible on the small LCD panel. Thus, an intelligent display technique needs to be developed to provide small-display viewers with a better experience. To this end, one of the key technologies is to determine the region of interest (ROI) and display the magnified ROI on the screen, where the ROI is the part of the scene that viewers pay more attention to than other regions. In this paper, which is an extension of our prior work, we focus on soccer video display for mobile devices and propose a fully automatic and computationally efficient method. Instead of taking generic approaches utilizing visually salient features, we take a domain-specific approach to exploit the attributes of soccer video. The proposed scheme consists of two stages: shot classification and ROI determination. The experimental results show that the proposed scheme offers useful tools for intelligent video display on multimedia mobile devices.
Effect of parameterization and joint layer control for video streaming over wireless network
Mikhail Pozhenko, Sang-bum Suh, Sungkwan Heo, et al.
The effect of parameterization combined with joint layer control for video streaming over wireless IEEE 802.11 networks is presented in this paper. We describe an architecture that provides cross-layer optimization and prioritization of the video stream. The proposed approach allows us to assign various parameter sets to each priority class independently. We examine the performance of the FEC mechanism available at the application layer for the efficient and robust transmission of MPEG-coded video over IEEE 802.11a WLANs. In addition, we develop a joint-layer control policy to reduce bandwidth consumption and maximize the received video quality by investigating and dynamically selecting the optimal combination of application-layer FEC, interleaving depth, UDP packet size, and MAC retransmission rate. The performance of the proposed concept is shown by carrying out experiments in real wireless networks. The results demonstrate that even a small number of parameters utilized in the optimization process have a significant influence on the received video quality and resource consumption.
H.264/AVC and Applications
Introduction and overview of scalable video coding (SVC)
H.264/MPEG-4 AVC, finalized in May 2003, is now a well-established video-coding standard, and derivative standardization projects are beginning to emerge based on it. The first of these is the so-called Scalable Video Coding (SVC) project. Launched within MPEG about the time that AVC was finishing, it was moved in 2005 to the Joint Video Team (JVT); the JVT is a joint committee of video experts set up by ISO/IEC MPEG and ITU-T VCEG back in 2001 to develop AVC. The SVC project aims to develop a fully scalable video codec based on the AVC codec as its backbone. While several scalable codecs have been standardized before (e.g., in MPEG-2, H.263, and MPEG-4), each has faced barriers to deployment, mainly due to inadequate performance against single-rate coding. SVC, due out in 2007, appears on the brink of overcoming those barriers to finally bring scalable coding to fruition. This paper aims at an elementary, general account of its current status, which seems unavailable in the literature.
Performance comparison of JPEG2000 and H.264/AVC high profile intra-frame coding on HD video sequences
Pankaj Topiwala, Trac Tran, Wei Dai
This paper reconsiders the rate-distortion performance comparison of JPEG2000 with H.264/AVC High Profile I-frame coding for high definition (HD) video sequences. This work is a follow-on to our paper at SPIE 05 [14], wherein we further optimize both codecs. This also extends a similar earlier study involving H.264/AVC Main Profile [2]. Coding simulations are performed on a set of 720p and 1080p HD video sequences, which have been commonly used for H.264/AVC standardization work. As expected, our experimental results show that H.264/AVC I-frame coding offers consistent R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over JPEG2000 color image coding. As in [1, 2], we do not consider scalability or complexity in this study (JPEG2000 is used in non-scalable, but optimal, mode).
On comparing JPEG2000 and intraframe AVC
Mourad Ouaret, Frederic Dufaux, Touradj Ebrahimi
In this work, a performance evaluation of AVC Intra and JPEG2000 in terms of rate-distortion performance is conducted. A rich set of test sequences with different spatial resolutions is used in this evaluation. Furthermore, the comparison is made with both the Main and High profiles of AVC Intra. For high spatial resolution sequences, our results show that JPEG2000 is very competitive with AVC High Profile Intra and outperforms the Main Profile. For intermediate and low spatial resolution sequences, JPEG2000 is outperformed by both profiles of AVC Intra.
AVC/H.264 patent portfolio license
MPEG LA, LLC offers a joint patent license for the AVC (a/k/a H.264) Standard (ISO/IEC IS 14496-10:2004). Like MPEG LA's other licenses, the AVC Patent Portfolio License is offered for the convenience of the marketplace as an alternative enabling users to access essential intellectual property owned by many patent holders under a single license rather than negotiating licenses with each of them individually. The AVC Patent Portfolio License includes essential patents owned by DAEWOO Electronics Corporation; Electronics and Telecommunications Research Institute (ETRI); France Telecom, societe anonyme; Fujitsu Limited; Hitachi, Ltd.; Koninklijke Philips Electronics N.V.; LG Electronics Inc.; Matsushita Electric Industrial Co., Ltd.; Microsoft Corporation; Mitsubishi Electric Corporation; Robert Bosch GmbH; Samsung Electronics Co., Ltd.; Sedna Patent Services, LLC; Sharp Kabushiki Kaisha; Siemens AG; Sony Corporation; The Trustees of Columbia University in the City of New York; Toshiba Corporation; UB Video Inc.; and Victor Company of Japan, Limited. Another patent holder is also expected to join as of August 1, 2006. MPEG LA's objective is to provide worldwide access to as much AVC essential intellectual property as possible for the benefit of AVC users. Therefore, any party that believes it has essential patents is welcome to submit them for evaluation of their essentiality and inclusion in the License if found essential.
An efficient H.264-based video encoder using multiscale recurrent patterns
Nuno M. M. Rodrigues, Eduardo A. B. da Silva, Murilo B. de Carvalho, et al.
We investigate new and efficient methods for coding the motion compensated residues in a hybrid video coder framework that are able to improve upon the performance of the very successful DCT-based adaptive block size integer transform used in H.264/AVC. We use an algorithm based on adaptive block size recurrent pattern matching, which encodes each block of motion compensated predicted data using a scaled pattern stored in an adaptive dictionary. We refer to this algorithm as the Multidimensional Multiscale Parser, or MMP. A video encoding method is presented that uses MMP instead of the integer transform in an H.264/AVC encoder framework. Experimental results show that MMP is capable of achieving consistent gains in final average PSNR when the new encoder is compared with H.264/AVC's high profile.
Quality analysis of requantization transcoding architectures for H.264/AVC
Stijn Notebaert, Jan De Cock, Davy De Schrijver, et al.
Reduction of the bitrate of video content is necessary in order to satisfy the different constraints imposed by networks and terminals. A fast and elegant solution for the reduction of the bitrate is requantization, which has been successfully applied on MPEG-2 bitstreams. Because of the improved intra prediction in the H.264/AVC specification, existing transcoding techniques are no longer suitable. In this paper we compare requantization transcoders for H.264/AVC bitstreams. The discussion is restricted to intra 4x4 macroblocks only, but the same techniques are also applicable to intra 16x16 macroblocks. Besides the open-loop transcoder and the transcoder with mode reuse, two architectures with drift compensation are described, one in the pixel domain and the other in the transform domain. Experimental results show that these architectures approach the quality of the full decode and recode architecture for low to medium bitrates. Because of the reduced computational complexity of these architectures, in particular the transform-domain compensation architecture, they are highly suitable for real-time adaptation of video content.
Comparison of MPEG-2 and AVC coding on synthetic test materials
Charles Fenimore, John Roberts
The resources for evaluation of moving imagery coding include a variety of subjective and objective methods for quality measurement. These are applied to a variety of imagery, ranging from synthetically-generated to live capture. NIST has created a family of synthetic motion imagery (MI) materials providing image elements such as moving spirals, blocks, text, and spinning wheels. Through the addition of a colored noise background, the materials support the generation of graded levels of MI coding impairments such as image blocking and mosquito noise, impairments that are found in imagery coded with Motion Pictures Expert Group (MPEG) and similar codecs. For typical available synthetic imagery, human viewers respond unfavorably to repeated viewings; so in this case, the use of objective (computed) metrics for evaluation of quality is preferred. Three such quality metrics are described: a standard peak-signal-to-noise measure, a new metric of edge-blurring, and another of added-edge-energy. As applied to the NIST synthetic clips, the metrics confirm an approximate doubling [1] of compression efficiency between two commercial codecs, one an implementation of AVC/H.264 and the other of MPEG-2.
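Of the three metrics, the peak-signal-to-noise measure is standard and easy to state; for a frame with 8-bit samples it can be computed as below (the edge-blurring and added-edge-energy metrics are specific to the paper and not sketched here).

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test frame."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak * peak / mse)
```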
MPEG-7 and Applications
MPEG-7 multimedia-based query format
Searching multimedia content for image, audio, and video is getting more attention, especially for personal media content, due to the affordability of consumer electronic devices such as recordable MP3 players, digital cameras, DV camcorders, and well-integrated smart phones. The precise search and retrieval of the content derived from these devices can be a very challenging task. Many leading-edge search engine vendors have been applying sophisticated and advanced indexing and retrieval techniques to various text-based document formats, but when it comes to retrieving multimedia content, searching based on the media clip filename is the most common practice. The result is an imprecise and ineffective user experience when searching multimedia content. This paper presents a new development underway from a joint effort between International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Subcommittee (SC) 29 Working Group (WG) 11 MPEG (Moving Picture Experts Group) and WG 1 JPEG (Joint Photographic Experts Group) for a universal standard query format called the MPEG-7 Query Format (MP7QF) as a means to enable a good user experience for consumers searching multimedia content. It also provides the industry with a unified way to accept and respond to user queries. This paper presents the core requirements for such a universal query format.
Using MPEG-7 audio descriptors for music querying
M. Gruhne, C. Dittmar
Due to the growing amount of digital audio, an increasing need to automatically categorize music and to create self-controlled, suitable playlists has emerged. A few approaches to this task relying on low-level features have been published so far. Unfortunately, the results achieved with those technologies are not yet sufficient. This paper describes how to enhance the results with regard to perceptual similarity using different high-level descriptors and a powerful interaction between the algorithm and the user that takes the user's preferences into account. A successful interaction between server and client requires a powerful standardized query language. This paper describes the tools of the MPEG-7 Audio standard in detail and gives examples of already established query languages. Furthermore, the requirements of a multimedia query language are identified, and its application is exemplified by an automatic audio creation system using such a query language.
Recent advances in MPEG-7 cameras
We propose a smart camera which performs video analysis and generates an MPEG-7 compliant stream. By producing a content-based metadata description of the scene, the MPEG-7 camera extends the capabilities of conventional cameras. The metadata is then directly interpretable by a machine. This is especially helpful in a number of applications such as video surveillance, augmented reality and quality control. As a use case, we describe an algorithm to identify moving objects and produce the corresponding MPEG-7 description. The algorithm runs in real-time on a Matrox Iris P300C camera.
Image Models and Processing II
icon_mobile_dropdown
Exploitation of target shadows in synthetic aperture radar imagery for automatic target recognition
John A. Saghri, Andrew DeKelaita
The utility of target shadows for automatic target recognition (ATR) in synthetic aperture radar (SAR) imagery is investigated. Although target shadow, when available, is not a powerful target discriminating feature, it can effectively increase the overall accuracy of target classification when combined with other target discriminating features such as peaks, edges, and corners. A second and more important utility of target shadow is that it can be used to identify the target pose. Identification of the target pose before the recognition process reduces the number of reference images used for comparison/matching, i.e., the training sets, by at least fifty percent. Since the implementation and computational complexity of the pose detection algorithm are relatively low, the proposed two-step process, i.e., pose detection followed by matching, considerably reduces the complexity of the overall ATR system.
Surface registration technique for close-range mapping applications
Close-range mapping applications such as cultural heritage restoration, virtual reality modeling for the entertainment industry, and anatomical feature recognition for medical activities require 3D data that is usually acquired by high-resolution close-range laser scanners. Since these datasets are typically captured from different viewpoints and/or at different times, accurate registration is a crucial procedure for 3D modeling of mapped objects. Several registration techniques are available that work directly with the raw laser points or with features extracted from the point cloud. Examples include the commonly known Iterative Closest Point (ICP) algorithm and a recently proposed technique based on matching spin-images. This research focuses on developing a surface matching algorithm based on the Modified Iterated Hough Transform (MIHT) and ICP to register 3D data. The proposed algorithm works directly with the raw 3D laser points and does not assume point-to-point correspondence between two laser scans. The algorithm can simultaneously establish correspondence between two surfaces and estimate the transformation parameters relating them. An experiment with two partially overlapping laser scans of a small object was performed with the proposed algorithm and shows successful registration. A high quality of fit between the two scans is achieved, with improvement over the results obtained using the spin-image technique. The results demonstrate the feasibility of the proposed algorithm for registering 3D laser scanning data in close-range mapping applications to help with the generation of complete 3D models.
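For reference, a minimal point-to-point ICP iteration can be sketched as follows (an illustrative sketch only; the authors' MIHT-based method is more elaborate and, unlike this sketch, does not assume point-to-point correspondence):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: match each source point to its nearest
    destination point, solve for the rigid transform, apply it, repeat."""
    cur = src.copy()
    for _ in range(iters):
        # Nearest-neighbour correspondence (brute force for clarity)
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```

Because the nearest-neighbour matching is only an estimate of correspondence, ICP of this form needs a reasonable initial alignment, which is exactly the role the MIHT-based coarse stage plays in the paper.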
Fractal dimension based corneal fungal infection diagnosis
Madhusudhanan Balasubramanian, A. Louise Perkins, Roger W. Beuerman, et al.
We present a fractal measure based pattern classification algorithm for automatic feature extraction and identification of fungus associated with an infection of the cornea of the eye. A white-light confocal microscope image of suspected fungus exhibited locally linear and branching structures. The pixel intensity variation across the width of a fungal element was Gaussian. Linear features were extracted using a set of 2D directional matched Gaussian filters. Portions of fungus profiles that were not in the same focal plane appeared relatively blurred. We use Gaussian filters with a standard deviation slightly larger than the width of a fungus to reduce discontinuities. Cell nuclei of the cornea and nerves also exhibited locally linear structure. Cell nuclei were excluded by their relatively shorter lengths. Nerves in the cornea exhibited less branching compared with the fungus. Fractal dimensions of the locally linear features were computed using a box-counting method. A set of corneal images with fungal infection was used to generate class-conditional fractal measure distributions of fungus and nerves. The a priori class-conditional densities were built using an adaptive-mixtures method to reflect the true nature of the feature distributions and improve the classification accuracy. A maximum-likelihood classifier was used to classify the linear features extracted from test corneal images as 'normal' or 'with fungal infiltrates', using the a priori fractal measure distributions. We demonstrate the algorithm on corneal images with culture-positive fungal infiltrates. The algorithm is fully automatic and will help diagnose fungal keratitis by generating a diagnostic mask of locations of the fungal infiltrates.
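The box-counting estimate of fractal dimension mentioned above can be sketched as follows (an illustrative sketch, not the authors' code): cover the binary feature image with boxes of shrinking size, count the occupied boxes at each scale, and fit the slope of log(count) versus log(1/size):

```python
import numpy as np

def box_count_dimension(img, sizes=(2, 4, 8, 16)):
    """Estimate the fractal dimension of a binary image by box counting."""
    counts = []
    h, w = img.shape
    for s in sizes:
        # Partition the image into s x s boxes; count boxes holding any foreground pixel
        n = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if img[i:i + s, j:j + s].any():
                    n += 1
        counts.append(n)
    # Slope of log(count) against log(1/size) approximates the dimension
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

A straight line yields a dimension near 1, a filled region near 2; branching fungal structures fall in between, which is what makes the measure discriminative.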
Autonomous characterization of plastic-bonded explosives
Kim Dalton Linder, Paul DeRego, Antonio Gomez, et al.
Plastic-Bonded Explosives (PBXs) are a newer generation of explosive compositions developed at Los Alamos National Laboratory (LANL). Understanding the micromechanical behavior of these materials is critical. The size of the crystal particles and porosity within the PBX influences their shock sensitivity. Current methods to characterize the prominent structural characteristics include manual examination by scientists and attempts to use commercially available image processing packages. Both methods are time consuming and tedious. LANL personnel, recognizing this as a manually intensive process, have worked with the Kansas City Plant / Kirtland Operations to develop a system which utilizes image processing and pattern recognition techniques to characterize PBX material. System hardware consists of a CCD camera, zoom lens, two-dimensional, motorized stage, and coaxial, cross-polarized light. System integration of this hardware with the custom software is at the core of the machine vision system. Fundamental processing steps involve capturing images from the PBX specimen, and extraction of void, crystal, and binder regions. For crystal extraction, a Quadtree decomposition segmentation technique is employed. Benefits of this system include: (1) reduction of the overall characterization time; (2) a process which is quantifiable and repeatable; (3) utilization of personnel for intelligent review rather than manual processing; and (4) significantly enhanced characterization accuracy.
Image Models and Processing III
icon_mobile_dropdown
An ultrahigh-speed digitizer for the Harvard College Observatory astronomical plates
R. J. Simcoe, J. E. Grindlay, E. J. Los, et al.
A machine capable of digitizing two 8 inch by 10 inch (203 mm by 254 mm) glass astrophotographic plates or a single 14 inch by 17 inch (356 mm by 432 mm) plate at a resolution of 11 μm per pixel or 2309 dots per inch (dpi) in 92 seconds is described. The purpose of the machine is to digitize the ~500,000 plate collection of the Harvard College Observatory in a five-year time frame. The digitization must meet the requirements for scientific work in astrometry, photometry, and archival preservation of the plates. This paper describes the requirements for and the design of the subsystems of the machine that was developed specifically for this task.
Calibration and stability analysis of medium-format digital cameras
Ayman Habib, Paul Quackenbush, Jennifer Lay, et al.
Recent developments in digital cameras in terms of an increase in the size of charge-coupled device and complementary metal oxide semiconductor arrays, as well as a reduction in costs, are leading to their use for traditional and new photogrammetric, surveying, and mapping functions. Such usage should be preceded by careful calibration of the implemented cameras in order to determine their interior orientation parameters. In addition, the wide diversity of expected users mandates the development of a convenient calibration procedure that does not require professional photogrammetrists and/or surveyors. This paper introduces a methodology for calibrating medium-format digital cameras using a test field consisting of straight lines and a few signalized point targets. A framework for the automatic extraction of the linear features and the point targets from the images, and for their incorporation into the calibration procedure, is presented and tested. In addition, the research introduces an approach for testing camera stability, in which the degree of similarity between the bundles reconstructed from two sets of interior orientation parameters is quantitatively evaluated. Experimental results with real data proved the feasibility of the line-based self-calibration approach. In addition, the analysis of the internal characteristics of the utilized camera estimated from various calibration sessions revealed the camera's stability over a long period.
A hybrid spatiotemporal and Hough-based motion estimation approach applied to magnetic resonance cardiac images
N. Carranza, G. Cristóbal, F. Sroubek, et al.
Myocardial motion analysis and quantification is of utmost importance for analyzing contractile heart abnormalities, which can be a symptom of coronary artery disease. A fundamental problem in processing sequences of images is the computation of the optical flow, which is an approximation to the real image motion. This paper presents a new algorithm for optical flow estimation based on a spatiotemporal-frequency (STF) approach, more specifically on the computation of the Wigner-Ville distribution (WVD) and the Hough Transform (HT) of the motion sequences. The latter is a well-known line and shape detection method very robust against incomplete data and noise. The rationale for using the HT in this context is that it provides a value of the displacement field from the STF representation. In addition, a probabilistic approach based on Gaussian mixtures has been implemented in order to improve the accuracy of the motion detection. Experimental results with synthetic sequences are compared against an implementation of the variational technique for local and global motion estimation, where it is shown that the results obtained here are accurate and robust to noise degradations. Real cardiac magnetic resonance images have also been tested and evaluated with the current method.
Image fusion with the multiscale Hermite transform
The steered multiscale Hermite transform is introduced as a tool for image fusion. It is shown how this transform's particular characteristics, closely related to important visual perception properties, efficiently reproduce relevant image structures in the fused products. Two cases of remote sensing image fusion are presented, namely multispectral with panchromatic fusion and SAR with multispectral fusion. In the latter, a noise reduction algorithm also based on the Hermite transform is incorporated within the fusion scheme so that characteristic SAR image speckle is reduced and thus limited from corrupting fused products.
Poster Session
icon_mobile_dropdown
Registration of large data sets for multimodal inspection
Registration plays a key role in multimodal data fusion to extract synergistic information from multiple non-destructive evaluation (NDE) sources. One of the common techniques for registration of point datasets is the Iterative Closest Point (ICP) algorithm. Modern-day NDE techniques generally generate large datasets, and the conventional ICP algorithm requires a huge amount of time to register such datasets to the desired accuracy. In this paper, we present algorithms to aid in the registration of large 3D NDE data sets in less time with the required accuracy. Various methods of coarse registration, partial registration, and data reduction are used to realize this, and it is shown that registration can be accomplished to the desired accuracy with more than a 90% reduction in time compared to the conventional ICP algorithm. Volumes of interest (VOI) can be defined on the data sets and merged together so that only the features of interest are used in the registration. The proposed algorithm also provides the capability to eliminate noise in the data sets. Registration of Computed Tomography (CT) image data, Coordinate Measuring Machine (CMM) inspection data, and CAD models is discussed in the present work. The algorithm is generic in nature and can be applied to any other NDE inspection data.
Dual-views display: dual-layer LCDs enable high-resolution full-screen viewing
Kunio Sakamoto, Masayuki Yoshigi
This paper describes tabletop display systems utilizing stereoscopic 3D display technology. The authors have previously researched 3D display systems using polarized glasses and liquid crystal shutter glasses, image splitters such as a parallax barrier or a lenticular screen, and holographic optical elements [1-3]. These image splitting technologies for displaying a stereoscopic 3D image are suitable for developing a tabletop display that can provide different images to two users surrounding the system. To separate dual images using polarizer slits, it is necessary to display two orthogonally polarized images. We developed a dual LCD panel using two liquid crystal layers to make the display system thin and compact. This display panel enables observers to view full-screen high-resolution images. This study shows that it is possible to simplify the optical system.
Pseudoscopic-free high-resolution lenticular 3D display
Masataka Nishida, Kunio Sakamoto
The rear-projection lenticular 3D display system using a projector has superior characteristics, such as a large screen with a wide field of view. However, a conventional system has disadvantages such as divided horizontal image resolution. We describe a 3D display system using a lenticular screen with vertically striped polarizer slits attached. This 3D display avoids the problem of the conventional system because it doubles the 3D image resolution. Moreover, we propose a lenticular display using a double polarizer slit to eliminate the pseudoscopic viewing area and thus solve the pseudoscopic image problem.
Passive ranging within a differential framework
Qingguo Yang, Liren Liu, De'an Liu, et al.
Finding the distance of objects in a scene from vision information is an important problem in machine vision. A large number of techniques for passive ranging of unknown objects have been developed over the years (e.g., range from stereo, motion, focus, and defocus). Nearly all such techniques may be framed in terms of a differential formalism. In the case of binocular stereo, two different images are taken from cameras at discrete viewpoints; similarly, differences between consecutive images are often used to determine viewpoint derivatives for structure from motion, and two or more images taken from cameras with different aperture sizes are used to compute the derivative with respect to aperture size for range-from-focus and range-from-defocus methods. All these methods fall into a discrete differentiation category. Farid proposed a continuous differentiation method for range estimation which employs the intensity variation of the images along with the aperture changes to measure the range information. In this paper, we first consider the plenoptic function, which is a powerful mathematical tool for understanding the primary vision problem. We then show an algorithm within a differential framework for range estimation based on the assumption of brightness constancy. Finally, we show several implementations of passive ranging using this differential algorithm.
Adaptive SDF filters for recognition of partially occluded objects
One of the main problems in visual image processing is incomplete information owing to the occlusion of objects by other objects. Since correlation filters mainly use contour information of objects to carry out pattern recognition, conventional correlation filters without training often perform poorly at recognizing partially occluded objects. Adaptive correlation filters based on synthetic discriminant functions for recognition of partially occluded objects embedded in a cluttered background are proposed. The designed correlation filters are adaptive to an input test scene, which is constructed with fragments of the target, false objects, and background to be rejected. These filters are able to suppress sidelobes of the given background as well as false objects. The performance of the adaptive filters in real scenes is compared with that of various correlation filters in terms of discrimination capability and robustness to noise.
Detection and localization of degraded objects
In pattern recognition, two different tasks are distinguished: detection of objects and estimation of their exact positions (localization) in images. Traditional methods for pattern recognition are based on correlation or template matching. These methods are attractive because they can be easily implemented with digital or optical processors. However, they are very sensitive to intensity degradations that are always present in observed images. In this paper we analyze and compare correlation-based methods for reliable detection and localization of degraded objects.
Implementation of the DMV-based 3D target tracking and monitoring system
Jung-Hwan Ko, Jung-Suk Lee
In this paper, a new 3D object tracking system using the disparity motion vector (DMV) is presented. In the proposed method, time-sequential disparity maps are extracted from the sequence of stereo input image pairs, and these disparity maps are used to sequentially estimate the DMV, defined as the disparity difference between two consecutive disparity maps. Similarly to motion vectors in conventional video signals, the DMV provides motion information about a moving target by showing a relatively large change in the disparity values in the target areas. Accordingly, the DMV helps detect the target area and its location coordinates. Based on these location data of a moving target, the pan/tilt embedded in the stereo camera system can be controlled to achieve real-time stereo tracking of a moving target. From the results of experiments with 9 frames of stereo image pairs having 256x256 pixels, it is shown that the proposed DMV-based stereo object tracking system can track the moving target with a relatively low error ratio of about 3.5% on average.
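The core DMV computation described above can be sketched in a few lines (an illustrative sketch; the threshold and the centroid-based localization are our assumptions, not details taken from the paper):

```python
import numpy as np

def dmv_target_location(disp_prev, disp_curr, thresh=1.0):
    """Locate a moving target from the disparity motion vector (DMV):
    the per-pixel difference between two consecutive disparity maps."""
    dmv = disp_curr.astype(float) - disp_prev.astype(float)
    mask = np.abs(dmv) > thresh        # pixels with a large disparity change
    if not mask.any():
        return None                    # no motion detected
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()        # centroid would drive the pan/tilt control
```

The returned centroid is the kind of location coordinate that a pan/tilt controller could servo on, frame by frame.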
Stereo camera-based intelligent UGV system for path planning and navigation
Jung-Suk Lee, Jung-Hwan Ko, Dal-Do Chung
In this paper, a new real-time, intelligent mobile robot system for path planning and navigation using a stereo camera embedded in a pan/tilt system is proposed. In the proposed system, the face area of a moving person is detected from a sequence of stereo image pairs by using the YCbCr color model, and depth information is obtained from the disparity map computed from the left and right images captured by the pan/tilt-controlled stereo camera system. The distance between the mobile robot system and the face of the moving person can then be calculated from the detected depth information. Accordingly, based on the analysis of these data, three-dimensional objects can be detected. Finally, using these detected data, a 2D spatial map is constructed for a visually guided robot that can plan paths, navigate around surrounding objects, and explore an indoor environment. From experiments on target tracking with 480 frames of sequential stereo images, the error ratio between the calculated and measured values of the relative position is found to be a very low value of 1.4% on average. Also, the proposed target tracking system has achieved a high speed of 0.04 sec/frame for target detection and 0.06 sec/frame for target tracking.
Extraction of desirable details with adaptive rank-order filters
In this paper, we use adaptive rank-order filters for localization and extraction of desirable details from images. To improve on the performance of linear correlations, we use novel locally adaptive correlations based on the nonparametric Spearman correlation. These filters are based on correlation between the ranks of the input scene, computed in a moving window, and those of the target. Their performance and noise robustness are compared with those of the conventional linear correlation. Computer simulation results are provided and discussed.
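The underlying rank correlation can be sketched as follows (illustrative only; the paper applies it adaptively over a moving window, which is not reproduced here). Note that because ranks are invariant under monotone intensity changes, the measure is inherently robust to such degradations:

```python
import numpy as np

def rank(x):
    """Ranks of a flattened array (ties broken by argsort order, for brevity)."""
    order = x.ravel().argsort()
    r = np.empty_like(order)
    r[order] = np.arange(order.size)
    return r

def spearman_corr(window, target):
    """Nonparametric Spearman correlation between a scene window and a target."""
    rw = rank(window) - (window.size - 1) / 2.0   # centered ranks
    rt = rank(target) - (target.size - 1) / 2.0
    return float((rw * rt).sum() / np.sqrt((rw ** 2).sum() * (rt ** 2).sum()))
```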
Single encoder and decoder design for multi-view video
Haksoo Kim, Wonkuk Son, Artem Ignatov, et al.
The progress of data transmission technology through the Internet has spread a variety of realistic contents. One such content is multi-view video, acquired from multiple camera sensors. In general, multi-view video processing requires as many encoders and decoders as there are cameras, and the resulting processing complexity makes practical implementation difficult. To solve this problem, this paper considers a simple multi-view system utilizing a single encoder and a single decoder. On the encoder side, input multi-view YUV sequences are combined in GOP units by a video mixer. Then, the mixed sequence is compressed by an H.264/AVC encoder. The decoding is composed of a single decoder and a scheduler controlling the decoding process. The goal of the scheduler is to assign an approximately identical number of decoded frames to each view sequence by estimating the decoder utilization of a GOP and subsequently applying frame skip algorithms. Furthermore, for the frame skip, efficient frame selection algorithms are studied for the H.264/AVC baseline and main profiles based on a cost function related to perceived video quality. Our proposed method has been evaluated on various multi-view test sequences adopted by MPEG 3DAV. Experimental results show that approximately identical decoder utilization is achieved for each view sequence, so that each view sequence is fairly displayed. Finally, the performance of the proposed method is compared with a simulcast encoder in terms of bit-rate and PSNR using a rate-distortion curve.
A robust digital watermarking technique with improved performance under JPEG compression
Fang Fang, Songxin Tan
Digital watermarking is an important technique to protect copyrighted multimedia data. The technique works by hiding secret information in the images; it can therefore be used to discourage illicit copying or distribution of copyrighted materials. In this paper, we propose a robust frequency-domain digital watermarking algorithm for still images based on the discrete cosine transform. Adjustable parameters are introduced during the watermark embedding process, which adaptively change the JPEG quantization factor as well as the depth at which the watermark is embedded. The proposed watermarking technique maintains its validity under common image processing operations such as low-pass filtering, image cropping, etc. Compared with previous methods, however, it has improved performance under Joint Photographic Experts Group (JPEG) compression attack. The extracted watermark maintains its high quality in terms of normalized correlation even under a high JPEG compression ratio.
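To illustrate the general principle of DCT-domain embedding (a generic sketch; the authors' adaptive quantization-factor scheme is not reproduced here, and the coefficient position and strength below are our assumptions), one can force the sign of a mid-frequency coefficient of an 8x8 block:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows index frequency, columns space)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

def embed_bit(block, bit, strength=8.0, pos=(3, 2)):
    """Embed one watermark bit in a mid-frequency DCT coefficient of an
    8x8 image block by forcing the coefficient's sign."""
    C = dct_matrix()
    coeff = C @ block @ C.T                       # forward 2D DCT
    coeff[pos] = strength if bit else -strength   # the sign encodes the bit
    return C.T @ coeff @ C                        # inverse 2D DCT

def extract_bit(block, pos=(3, 2)):
    """Recover the bit from the sign of the embedding coefficient."""
    C = dct_matrix()
    return int((C @ block @ C.T)[pos] > 0)
```

Because the basis is orthonormal, rounding the pixels to integers perturbs the coefficient by at most the Euclidean norm of the rounding error, so a sufficiently large embedding strength keeps the sign, and hence the bit, intact.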
Correlation pattern recognition: optimal parameters for quality standards control of chocolate marshmallow candy
Jorge L. Flores, G. García-Torales, Cristina Ponce Ávila
This paper describes an in situ image recognition system designed to inspect the quality standards of chocolate pops during their production. The essence of the recognition system is the localization of events (i.e., defects) in the input images that affect the quality standards of the pops. To this end, processing modules based on a correlation filter and on image segmentation are employed to measure the quality standards. We designed the correlation filter and defined a set of features from the correlation plane. The desired values for these parameters are obtained by exploiting information about objects to be rejected in order to find the optimal discrimination capability of the system. Based on this set of features, the pop can be correctly classified. The efficacy of the system has been tested thoroughly under laboratory conditions using at least 50 images containing 3 different types of possible defects.
Resolution improvement of computationally reconstructed 3D images by use of intermediate elemental images
Jae-Sung Park, Dong-Choon Hwang, Dong-Hak Shin, et al.
In this paper, a new resolution-enhanced computational integral imaging reconstruction method employing an intermediate-view reconstruction (IVR) technique is presented. In the proposed method, as many intermediate elemental images as required can be synthesized from the limited number of picked-up elemental images by using the IVR technique. With sufficient overlapping of this increased number of elemental images in the reconstruction image plane, a resolution-enhanced 3D image can be displayed. To show the feasibility of the proposed scheme, some experiments were performed and their results are presented as well.
A simple SVC algorithm incorporated with the DMB video codec
I. S. Hwang, I. K. Park, H. J. Kim, et al.
We propose a very simple scalable video coding (SVC) system based on the H.264 baseline profile codec. The proposed SVC algorithm can offer three levels of temporal and spatial scalability - QVGA@15fps, QVGA@30fps, and VGA@30fps. The proposed system achieves the temporal scalability by encoding every other picture as a non-reference P-picture, so that the base layer codec handling the QVGA@15fps sequence is fully compatible with the satellite digital multimedia broadcasting (S-DMB) system in Korea. In addition, the same decoder can reconstruct the QVGA@30fps sequence when it receives the bits representing the non-reference pictures. For the spatial enhancement layer, the encoder follows the standard H.264 baseline profile except for the inter-layer intra prediction. To reduce the computational burden of the encoder, the enhancement layer encoder may skip the motion estimation procedure by interpolating the motion field from that of the base layer. Simulation results show that the proposed system yields less than about 12% loss in reconstructed picture quality compared with the anchor H.264 JM encoder. The proposed SVC system still has room for improvement in coding efficiency by trading off computational complexity, so further work is required.
A fast level set implementation method for image segmentation and object tracking
The high computational complexity of level set methods has excluded them from many real-time applications. The high algorithmic complexity is mainly due to the need to solve partial differential equations (PDEs) numerically. For image segmentation and object tracking applications, it is possible to approximate the level set curve evolution process without solving PDEs, since we are interested in the final object boundary rather than the accurate curve evolution process. This paper proposes a fast parallel method to simplify the curve evolution process using simple binary morphological operations. The proposed fast implementation allows real-time image segmentation and object tracking using level set curve evolution, while preserving the advantage of level set methods of automatically handling topological changes. It can utilize the parallel processing capability of existing embedded hardware, parallel computers, or optical processors for fast curve evolution.
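As a toy illustration of the idea (a sketch under our own simplifying assumptions, not the authors' implementation), one step of outward or inward front motion can be approximated with binary dilation and erosion, switched by the sign of a speed map:

```python
import numpy as np

def dilate(mask):
    """3x3 cross-shaped binary dilation: one step of outward front motion."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def evolve(mask, speed, steps=50):
    """Morphological approximation of level set evolution: grow the region
    where the speed map is positive, shrink it where it is negative."""
    for _ in range(steps):
        grown = dilate(mask)          # front moves outward
        shrunk = ~dilate(~mask)       # erosion: front moves inward
        mask = np.where(speed > 0, grown, shrunk)
    return mask
```

Because each step is a local, per-pixel morphological update rather than a PDE solve, it parallelizes trivially, and splits or merges of the front are handled implicitly, as in true level set methods.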
Image preprocessing for fast multiple-frame super-resolution reconstruction
Super-resolution reconstruction algorithms have been demonstrated to be very effective in enhancing image spatial resolution by combining several low-resolution images to yield a single high-resolution image. However, the high computational complexity has become a major obstacle to the use of super-resolution techniques in real-time applications. Most previous computationally efficient super-resolution techniques have focused on reducing the number of iterations, owing to the iterative nature of most super-resolution algorithms. In this paper, we propose a region-of-interest (ROI) image preprocessing technique to improve the processing speed of super-resolution reconstruction. To better integrate the preprocessing with super-resolution, the proposed ROI extraction technique is developed under the same statistical framework as super-resolution. Simulation results are provided to demonstrate the performance of the proposed method.
Stereovision-based 2D spatial map construction for a safe vehicle driving
Jung-Hwan Ko, Jung-Suk Lee
In this paper, a method for effective and intelligent route decision of an unmanned ground vehicle (UGV) using a 2D spatial map from a stereo camera system is proposed. Depth information and a disparity map are detected from the input images of a parallel stereo camera. The distance between the mobile robot and a detected obstacle is computed, a 2D spatial map is obtained from the location coordinates, and the relative distances between the obstacle and the other objects are then derived from the map. The unmanned ground vehicle moves automatically by making effective and intelligent route decisions using the obtained 2D spatial map. From experiments on robot driving with 24 frames of stereo images, the error ratio between the calculated and measured values of the distance between the objects is found to be a very low value of 1.57% on average.
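The distance computation behind such a 2D spatial map rests on the standard parallel-stereo relation Z = fB/d between depth, focal length, baseline, and disparity (the parameter values below are hypothetical; the paper does not state its camera geometry):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (in meters) of a point seen by a parallel stereo rig:
    Z = f * B / d, with focal length and disparity both in pixels."""
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 800 px focal length, 12 cm baseline, 64 px disparity
z = depth_from_disparity(64, 800, 0.12)
```

Applying this per pixel to the disparity map yields the depth values from which the obstacle coordinates of the 2D spatial map can be laid out.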
Theoretical optimization of artificial ice for Arctic seas
The proposed method offers one possibility to restore the climate in order to avoid overheating. The creation of a supplemental icy cover is considered in the paper. We theoretically investigate the creation of artificial rafts at the border of the water-ice area in the northern seas. Firstly, such artificial rafts or films can be used as an additional mirror for solar energy. Secondly, these rafts can decrease local water vibration so that ice forms more easily in northern regions. And finally, these rafts can be treated as crystallization centers in the supercooled water.
Biological crystal alignment using image processing
Kazimierz J. Gofron, Krzysztof Lazarski, Michael Molitsky, et al.
Crystal location and alignment to the x-ray beam is an enabling technology necessary for automation of macromolecular crystallography at synchrotron beamlines. In the process of crystal structure determination, a small x-ray synchrotron beam with a FWHM as small as 70 μm (bending magnet beamlines) or 20 μm (undulator beamlines) is focused at or downstream of the crystal sample. Protein crystals used in structure determination are becoming smaller, approaching 50 μm or less, and need to be precisely placed in the focused x-ray beam. At the Structural Biology Center, the crystals are mounted on a goniostat, allowing precise crystal xyz positioning and rotation. One low- and two high-magnification cameras integrated into the synchrotron beamline permit imaging of the crystal mounted on the goniostat. The crystals are held near liquid nitrogen temperature using a cryostream to control secondary radiation damage. Image processing techniques are used for automatic and precise placement of protein crystals in the synchrotron beam. Here we discuss the automatic crystal centering process considered for the Structural Biology Center, utilizing several image processing techniques.