Proceedings Volume 6696

Applications of Digital Image Processing XXX


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 12 September 2007
Contents: 9 Sessions, 81 Papers, 0 Presentations
Conference: Optical Engineering + Applications 2007
Volume Number: 6696

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 6696
  • Video and Image Technologies
  • Processing and Implementation Technologies I
  • Interaction Between Image Processing, Optics, and Photonics
  • Mobile Video
  • IDCT
  • Processing and Implementation Technologies II
  • Workshop on Optics in Entertainment
  • Poster Session
Front Matter: Volume 6696
Front Matter: Volume 6696
This PDF file contains the front matter associated with SPIE Proceedings Volume 6696, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Video and Image Technologies
A comparative study of JPEG2000, AVC/H.264, and HD Photo
Francesca De Simone, Mourad Ouaret, Frederic Dufaux, et al.
In this paper, we report a study evaluating the rate-distortion performance of JPEG 2000, AVC/H.264 High 4:4:4 Intra, and HD Photo. A set of ten high-definition color images with different spatial resolutions has been used. Both the PSNR and the perceptual MSSIM index were considered as distortion metrics. Results show that, for the material used to carry out the experiments, the overall compression efficiency of the three coding approaches is quite comparable, within an average range of ±10% in bitrate variation, and all three outperform conventional JPEG.
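As a hedged illustration of the distortion measurement behind such comparisons, the sketch below computes PSNR between a reference and a decoded image with NumPy (the perceptual MSSIM index would typically come from an image-quality library); it is not the authors' evaluation code.

```python
# Illustrative sketch, not the authors' code: PSNR between a reference image
# and a decoded image, the objective distortion metric used in this comparison.
import numpy as np

def psnr(reference: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```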
Complexity modeling for context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC decoder
Szu-Wei Lee, C.-C. Jay Kuo
One way to save power in the H.264 decoder is for the H.264 encoder to generate decoder-friendly bit streams. Following this idea, a decoding complexity model of context-based adaptive binary arithmetic coding (CABAC) for H.264/AVC is investigated in this research. Since different coding modes will have an impact on the number of quantized transformed coefficients (QTCs) and motion vectors (MVs) and, consequently, the complexity of entropy decoding, an encoder with a complexity model can estimate the complexity of entropy decoding and choose the coding mode that yields the best tradeoff among rate, distortion, and decoding complexity. The complexity model consists of two parts: one for source data (i.e., QTCs) and the other for header data (i.e., the macroblock (MB) type and MVs). Thus, the proposed CABAC decoding complexity model of an MB is a function of the QTCs and associated MVs, which is verified experimentally. The proposed CABAC decoding complexity model can provide good estimation results for a variety of bit streams. Practical applications of this complexity model will also be discussed.
Low-complexity MPEG-2 to H.264 transcoding
Jan Lievens, Dieter Van de Walle, Jan De Cock, et al.
In this paper, two systems for low-complexity MPEG-2 to H.264 transcoding are presented. Both approaches reuse the MPEG-2 motion information in order to avoid computationally expensive H.264 motion estimation. In the first approach, inter- and intra-coded macroblocks are treated separately. Since H.264 applies intra-prediction, while MPEG-2 does not, intra-blocks are completely decoded and re-encoded. For inter-coded macroblocks, the MPEG-2 macroblock types and motion vectors are first converted to their H.264 equivalents. Thereafter, the quantized DCT coefficients of the prediction residuals are dequantized and translated to equivalent H.264 IT coefficients using a single-step DCT-to-IT transform. The H.264 quantization of the IT coefficients is steered by a rate-control algorithm enforcing a constant bit-rate. While this system is computationally very efficient, it suffers from encoder-decoder drift due to its open-loop structure. The second transcoding solution eliminates encoder-decoder drift by performing full MPEG-2 decoding followed by rate-controlled H.264 encoding using the motion information present in the MPEG-2 source material. This closed-loop solution additionally allows dyadic resolution scaling by performing downscaling after the MPEG-2 decoding and appropriate MPEG-2 to H.264 macroblock type and motion vector conversion. Experimental results show that, in terms of PSNR, the closed-loop transcoder significantly outperforms the open-loop solution. The latter introduces drift, mainly as a result of the difference in sub-pixel interpolation between H.264 and MPEG-2. Complexity-wise, the closed-loop transcoder requires on average 30% more processing time than the open-loop system. The closed-loop transcoder is shown to deliver compression performance comparable to standard MPEG-2 encoding.
PixonVision real-time video processor
PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1 field (1/60th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the processor, the PV-300, is now in prototype form and can handle high-definition video. Both the PV-200 and PV-300 are FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors include DOD platforms (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement. If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights, night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing, including examples appropriate to border security, battlefield applications, port security, and surveillance from unmanned aerial vehicles.
Performance evaluation of H.264/AVC decoding and visualization using the GPU
Bart Pieters, Dieter Van Rijsselbergen, Wesley De Neve, et al.
The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform from NVIDIA provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model. This paper first presents a custom-developed H.264/AVC renderer, which is capable of executing motion compensation (MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results compare both GPU-enabled decoders, as well as a CPU-only decoder, in terms of speed, complexity, and CPU requirements. Our measurements show that a significant speedup is possible, relative to a CPU-only solution. As an example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based H.264/AVC renderers.
Video error concealment with outer and inner boundary matching algorithms
Low-complexity error concealment techniques for missing macroblock (MB) recovery in mobile video delivery based on the boundary matching principle are extensively studied and evaluated in this work. We first examine the boundary matching algorithm (BMA) and the outer boundary matching algorithm (OBMA) due to their excellent trade-off between complexity and visual quality. Their good performance is explained, and additional experiments are given to identify their strengths and weaknesses. Then, two further extensions of OBMA are presented. One is obtained by extending the search pattern for performance improvement at the cost of additional complexity. The other is based on the use of multiple overlapped outer boundary layers.
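A minimal sketch of the outer-boundary-matching idea is given below, assuming grayscale frames stored as NumPy arrays and candidate motion vectors taken, for example, from correctly received neighboring macroblocks; it illustrates the principle only and is not the implementation evaluated in the paper.

```python
# Sketch of OBMA candidate selection (assumptions: grayscale frames, lost MB
# not at the frame border, candidate MVs supplied by the caller).
import numpy as np

def obma_select(cur, ref, x, y, mb=16, w=2, candidates=((0, 0),)):
    """Pick the candidate motion vector whose outer boundary in the reference
    frame best matches the received pixels surrounding the lost macroblock
    whose top-left corner is at column x, row y."""
    def outer_ring(frame, px, py):
        patch = frame[py - w:py + mb + w, px - w:px + mb + w].astype(np.int64)
        ring = patch.copy()
        ring[w:w + mb, w:w + mb] = 0      # ignore the inner (lost) block
        return ring

    target = outer_ring(cur, x, y)        # ring of correctly received neighbors
    best_mv, best_sad = None, float("inf")
    for dx, dy in candidates:             # e.g. MVs of neighboring macroblocks
        sad = np.abs(target - outer_ring(ref, x + dx, y + dy)).sum()
        if sad < best_sad:
            best_mv, best_sad = (dx, dy), sad
    return best_mv
```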
New quality metrics for digital image resizing
Hongseok Kim, Soundar Kumara
Digital image rescaling by interpolation has been intensively researched over the past decades and still receives constant attention from many applications such as medical diagnosis, super-resolution, image blow-up, nano-manufacturing, etc. However, there are no agreed-upon metrics to objectively assess and compare the quality of resized images. Some existing measures such as peak signal-to-noise ratio (PSNR) or mean-squared error (MSE), widely used in the image restoration area, do not always coincide with viewers' opinions. Enlarged digital images generally suffer from two major artifacts, blurring and zigzagging; these undesirable effects, especially around edges, significantly degrade the overall perceptual image quality. We propose two new image quality metrics to measure the degree of these two major defects, and compare several existing interpolation methods using the proposed metrics. We also evaluate the validity of the image quality metrics by comparing rank correlations.
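The validation step mentioned at the end, comparing rank correlations between metric outputs and subjective opinions, could be sketched as follows; the numbers are invented for illustration and are not taken from the paper.

```python
# Rank-correlation check of a quality metric against subjective scores.
from scipy.stats import spearmanr

metric_scores = [0.82, 0.61, 0.74, 0.55, 0.90]      # hypothetical metric output per image
subjective_scores = [4.1, 2.9, 3.6, 2.5, 4.6]       # hypothetical mean opinion scores

rho, p_value = spearmanr(metric_scores, subjective_scores)
print(f"Spearman rank correlation: {rho:.3f} (p = {p_value:.3f})")
```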
Compressed-domain motion detection for efficient and error-resilient MPEG-2 to H.264 transcoding
Jan Lievens, Peter Lambert, Dieter Van de Walle, et al.
In this paper, a novel compressed-domain motion detection technique, operating on MPEG-2-encoded video, is combined with H.264 flexible macroblock ordering (FMO) to achieve efficient, error-resilient MPEG-2 to H.264 transcoding. The proposed motion detection technique first extracts the motion information from the MPEG-2-encoded bit-stream. Starting from this information, moving regions are detected using a region growing approach. The macroblocks in these moving regions are subsequently encoded separately from those in background regions using FMO. This can be used to increase error resilience and/or to realize additional bit-rate savings compared to traditional transcoding.
HD Photo: a new image coding technology for digital photography
Sridhar Srinivasan, Chengjie Tu, Shankar L. Regunathan, et al.
This paper introduces the HD Photo coding technology developed by Microsoft Corporation. The storage format for this technology is now under consideration in the ITU-T/ISO/IEC JPEG committee as a candidate for standardization under the name JPEG XR. The technology was developed to address end-to-end digital imaging application requirements, particularly including the needs of digital photography. HD Photo includes features such as good compression capability, high dynamic range support, high image quality capability, lossless coding support, full-format 4:4:4 color sampling, simple thumbnail extraction, embedded bitstream scalability of resolution and fidelity, and degradation-free compressed domain support of key manipulations such as cropping, flipping and rotation. HD Photo has been designed to optimize image quality and compression efficiency while also enabling low-complexity encoding and decoding implementations. To ensure low complexity for implementations, the design features have been incorporated in a way that not only minimizes the computational requirements of the individual components (including consideration of such aspects as memory footprint, cache effects, and parallelization opportunities) but results in a self-consistent design that maximizes the commonality of functional processing components.
Performance comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo
Trac D. Tran, Lijie Liu, Pankaj Topiwala
This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video sequences. This work is an extension of our previous comparative study published in previous SPIE conferences [1, 2]. Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of large-format color images captured from mainstream digital cameras and 1080p HD video sequences commonly used for H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding performance at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000, with its wavelet technology, tends to be the best performer on smooth spatial data; H.264/AVC High Profile, with advanced spatial prediction modes, tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over H.264/AVC High Profile intra coding and Microsoft HD Photo. For the 1080p video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 is operated in its non-scalable, but optimal-performance mode).
Processing and Implementation Technologies I
An EO surveillance system for harbor security
Micro USA, Inc. has designed and built an Electro-Optical (EO) system for the detection and surveillance of submerged objects in harbor and open ocean waters. The system consists of a digital camera and advanced image processing software. The camera system uses a custom-designed CMOS light sensor array to facilitate the detection of low-contrast underwater targets. The software system uses wavelet-based image processing to remove atmospheric and ocean reflectance as well as other noise. Target detection is achieved through a suite of optimal channel prediction algorithms. The novelties of the system are (a) a camera calibration routine to remove pixel non-linearities, non-uniformities, and lens effects, and (b) adaptive algorithms for low-contrast detection. Our system has been tested in San Diego harbor waters, and initial results indicate that we can detect small (2 feet diameter), low-reflectance (5%) targets underwater to greater than 2 diffusion depths (DD); this translates to about 25 feet in harbor waters.
Image analysis for the identification of coherent structures in plasma
Turbulence at the edge of the plasma in a nuclear fusion reactor can cause loss of confinement of the plasma. In an effort to study the edge turbulence, the National Spherical Torus Experiment uses a gas puff imaging (GPI) diagnostic to capture images of the turbulence. A gas puff is injected into the torus and visible light emission from the gas cloud is captured by an ultra high-speed camera. Our goal is to detect and track coherent structures in the GPI images to improve our understanding of plasma edge turbulence. In this paper, we present results from various segmentation methods for the identification of the coherent structures. We consider three categories of methods - immersion-based, region-growing, and model-based - and empirically evaluate their performance on four sample sequences. Our preliminary results indicate that while some methods can be sensitive to the settings of parameters, others show promise in being able to detect the coherent structures.
Real-time detection of targets in hyperspectral images using radial basis neural network filtering
A spectral target recognition technique has been developed that detects targets in hyperspectral images in real time. The technique is based on the configuration of a radial basis neural network filter that is specific to a particular target spectral signature or series of target spectral signatures. Detection of targets in actual 36-band CASI and 210-band HYDICE images is compared to existing recognition techniques and results in a considerable reduction in overall image processing time and greater accuracy than existing spectral processing algorithms.
PixonVision real-time Deblurring Anisoplanaticism Corrector (DAC)
DigiVision, Inc. and PixonImaging LLC have teamed to develop a real-time Deblurring Anisoplanaticism Corrector (DAC) for the Army. The DAC measures the geometric image warp caused by anisoplanaticism and removes it to rectify and stabilize (dejitter) the incoming image. Each new geometrically corrected image field is combined into a running-average reference image. The image averager employs a higher-order filter that uses temporal bandpass information to help identify true motion of objects and thereby adaptively moderate the contribution of each new pixel to the reference image. This result is then passed to a real-time PixonVision video processor (see paper 6696-04; note that the DAC also first dehazes the incoming video), where additional blur from high-order seeing effects is removed, the image is spatially denoised, and contrast is adjusted in a spatially adaptive manner. We plan to implement the entire algorithm within a few large modern FPGAs on a circuit board for video use. Obvious applications are within the DOD, surveillance and intelligence, security, and law enforcement communities. Prototype hardware is scheduled to be available in late 2008. To demonstrate the capabilities of the DAC, we present a software simulation of the algorithm applied to real atmosphere-corrupted video data collected by Sandia Labs.
ATR for 3D medical imaging
This paper presents a novel concept of Automatic Target Recognition (ATR) for 3D medical imaging. Such 3D imaging can be obtained from X-ray Computerized Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Ultrasonography (USG), functional MRI, and others. In the case of CT, such 3D imaging can be derived from 3D-mapping of X-ray linear attenuation coefficients, related to 3D Fourier transform of Radon transform, starting from frame segmentation (or contour definition) into an object and background. Then, 3D template matching is provided, based on inertial tensor invariants, adopted from rigid body mechanics, by comparing the mammographic data base with a real object of interest, such as a malignant breast tumor. The method is more general than CAD breast mammography.
Image enhancement methods for the visually impaired
O. Bogillo, U. Efron
A novel image enhancement algorithm which simplifies image content and enhances image contrast and color saturation is proposed. The capability to improve image discriminability by patients with central scotoma was evaluated using computer simulation. Image enhancement and discriminability simulation were based on modeling of the contrast sensitivity loss of a low-vision patient with a central scotoma size of ±10 degrees of the visual field. The results are compared with other methods of image enhancement. The simulation results suggest that the proposed method performs well compared with the other tested algorithms, showing a significant increase in the average image discriminability, measured using the d' parameter.
An efficient method of noise suppression in security systems
This paper is devoted to a denoising technique for video noise removal and deals with an advanced WT (Wavelet Transform) based method of noise suppression for security purposes. Many sources of unwanted distortion exist in a real surveillance system, especially when sensing is done at extremely low light levels. Our goal was to optimize the WT-based algorithm to be applicable to noise suppression in security videos with high computational efficiency. Preprocessing is applied to the output of the sensing system to make the video data more suitable for further denoising. Then a WT-based statistical denoising method is applied. The method uses a BLSE (Bayesian Least Square Error Estimator) of the WT coefficients while utilizing generalized Laplacian PDF modeling and an optimized moment method for parameter estimation. Several tests have been done to verify high noise suppression performance, computational efficiency, and low distortion of important features. Experimental results show that the described method performs well for a wide range of light conditions and respective signal-to-noise ratios.
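The overall structure of such a wavelet-domain denoiser is sketched below with PyWavelets; plain soft thresholding is substituted for the paper's BLSE estimator with a generalized Laplacian prior, so this is only a simplified stand-in under that assumption.

```python
# Simplified stand-in for the wavelet-domain denoising described above:
# decompose a frame, shrink detail coefficients, reconstruct. The BLSE
# shrinkage of the paper is replaced by plain soft thresholding here.
import numpy as np
import pywt

def wavelet_denoise(frame: np.ndarray, wavelet="db4", levels=3, thr=10.0):
    coeffs = pywt.wavedec2(frame.astype(np.float64), wavelet, level=levels)
    denoised = [coeffs[0]]                               # keep approximation band
    for detail_bands in coeffs[1:]:
        denoised.append(tuple(pywt.threshold(band, thr, mode="soft")
                              for band in detail_bands))
    return pywt.waverec2(denoised, wavelet)
```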
Toward a tongue-based task triggering interface for computer interaction
Luis R. Sapaico, Masayuki Nakajima
A system able to detect the existence of the tongue and locate its relative position within the surface of the mouth by using video information obtained from a web camera is proposed in this paper. The system consists of an offline phase, prior to the operation by the final user, in which a 3-layer cascade of SVM learning classifiers is trained using a database of 'tongue vs. not-tongue' images, which correspond to segmented images containing our region of interest, the mouth, with the tongue in three possible positions: center, left, or right. The first training stage discerns whether the tongue is present or not, giving the output data to the next stage, in which the presence of the tongue in the center of the mouth is evaluated; finally, in the last stage, a left vs. right position detection is assessed. Due to the novelty of the proposed system, a database needed to be created using information gathered from different people of distinct ethnic backgrounds. While the system has yet to be tested in an online stage, results obtained from the offline phase show that it is feasible to achieve real-time performance in the near future. Finally, diverse applications of this prototype system are introduced, demonstrating that the tongue can be effectively used as an alternative input device by a broad range of users, including people with physical disabilities.
Pattern recognition and signal analysis in a Mach-Zehnder type phasing sensor
I. Surdej, H. Lorch, L. Noethe, et al.
The primary mirror of future Extremely Large Telescopes will be composed of hundreds of individual segments. Misalignments in piston and tip-tilt of such segments must be reduced to a small fraction of the observing wavelength in order not to affect the image quality of these telescopes. In the framework of the Active Phasing Experiment carried out at ESO, new phasing techniques based on the concept of pupil plane detection will be tested. The misalignments of the segments produce amplitude variations at locations on a CCD detector corresponding to the locations of the segment edges. The position of the segment edges on a CCD image must first be determined with pixel accuracy in order to localize the signals which can be analyzed in a second phase with a robust signal analysis algorithm. A method to retrieve the locations of the edges and a phasing algorithm to measure the misalignments between the segments with an accuracy of a few nanometers have been developed. This entire phasing procedure will be presented. The performance of the pattern recognition algorithm will be studied as a function of the number of photons, the amplitude of the segment misalignments and their distribution. Finally, the accuracy achieved under conditions similar to the ones met during observation will be discussed.
Exploitation of hyperspectral imagery using adaptive resonance networks
Hyperspectral imagery consists of a large number of spectral bands and is typically modeled in a high-dimensional spectral space by exploitation algorithms. This high-dimensional space usually causes no inherent problems with simple classification methods that use Euclidean distance or spectral angle as a metric of class separability. However, classification methods that use quadratic metrics of separability, such as Mahalanobis distance, in high-dimensional space are often unstable and often require dimension reduction methods to be effective. Methods that use supervised neural networks or manifold learning methods are often very slow to train. Implementations of Adaptive Resonance Theory, such as fuzzy ARTMAP and distributed ARTMAP, have been successfully applied to single-band imagery, multispectral imagery, and various other low-dimensional data sets. They also appear to converge quickly during training. This effort investigates the behavior of ARTMAP methods on high-dimensional hyperspectral imagery without resorting to dimension reduction. Realistic-sized scenes are used, and the analysis is supported by ground truth knowledge of the scenes. ARTMAP methods are compared to a back-propagation neural network, as well as simpler Euclidean distance and spectral angle methods.
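For reference, the two simple per-pixel baselines mentioned above, minimum Euclidean distance and minimum spectral angle to a set of class mean spectra, can be sketched as follows; the class means are assumed to be given, and this is not code from the study.

```python
# Per-pixel baseline classifiers: minimum Euclidean distance or minimum
# spectral angle to a list of class mean spectra (one spectrum per class).
import numpy as np

def classify_pixel(pixel, class_means, metric="angle"):
    pixel = np.asarray(pixel, dtype=np.float64)
    scores = []
    for mean in class_means:
        mean = np.asarray(mean, dtype=np.float64)
        if metric == "euclidean":
            scores.append(np.linalg.norm(pixel - mean))
        else:  # spectral angle mapper
            cos = pixel @ mean / (np.linalg.norm(pixel) * np.linalg.norm(mean))
            scores.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return int(np.argmin(scores))  # index of the best-matching class
```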
Vegetation classification using hyperspectral remote sensing and singular spectrum analysis
B. Hu, Qingmou Li
In this study, classification was investigated based on the seasonal variation of the state parameters of vegetation canopies as inferred from visible and near-infrared spectral bands. This analysis was carried out on data collected over agricultural fields with the hyperspectral CHRIS (Compact High Resolution Imaging Spectrometer) in May, June, and July 2004. Singular spectrum analysis (SSA) was used to remove noise in each reflectance spectrum of the whole image. Decision tree classification was performed on different features, such as reflectance, vegetation indices, and principal components acquired by PCA (Principal Component Analysis) and MNF (Minimum Noise Fraction). The results demonstrated that noise removal using SSA increased classification accuracy by 3-6 percentage points depending on the features used. Classification using MNF components was shown to provide the highest accuracy, followed by that using vegetation indices.
Rate adaptive live video communications over IEEE 802.11 wireless networks
W. Dai, Sachin Patil, Pankaj Topiwala, et al.
Rate adaptivity is an important concept in data-intensive applications such as video communications. In this paper, we present an example of a rate-adaptive, live video communications system over IEEE 802.11 wireless networks, which integrates channel estimation, rate-adaptive video encoding, wireless transmission, reception, and playback. The video stream over the wireless network conforms to the RTP payload format for H.264 video, and can be retrieved and displayed by many popular players such as QuickTime and VLC. A live, video-friendly, packet-dispersion-based algorithm is used to estimate the available bandwidth, which is then used to trigger rate control to achieve rate-adaptive video coding on the fly.
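The packet-dispersion principle behind such bandwidth estimation is sketched below under the usual simplifying assumption that probe packets are sent back to back; the timings in the example are invented, and the paper's actual estimator is not reproduced.

```python
# Packet-dispersion sketch: for back-to-back probe packets, the receiver-side
# gap between arrivals approximates packet_size / available_bandwidth.
def dispersion_bandwidth(packet_size_bytes, arrival_times):
    """Estimate available bandwidth (bits/s) from probe-packet arrival times."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    avg_gap = sum(gaps) / len(gaps)            # average seconds between packets
    return packet_size_bytes * 8 / avg_gap

# e.g. dispersion_bandwidth(1400, [0.0, 0.0023, 0.0046, 0.0070])  # ~4.8 Mbit/s
```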
Interaction Between Image Processing, Optics, and Photonics
Wavelet-based denoising for 3D OCT images
Optical coherence tomography produces high-resolution medical images based on the spatial and temporal coherence of the optical waves backscattered from the scanned tissue. However, the same coherence introduces speckle noise as well; this degrades the quality of the acquired images. In this paper we propose a technique for noise reduction in 3D OCT images, where the 3D volume is considered as a sequence of 2D images, i.e., 2D slices in the depth-lateral projection plane. In the proposed method we first perform recursive temporal filtering along the estimated motion trajectory between the 2D slices using a noise-robust motion estimation/compensation scheme previously proposed for video denoising. The temporal filtering scheme reduces the noise level and adapts the motion compensation to it. Subsequently, we apply a spatial filter for speckle reduction in order to remove the remaining noise in the 2D slices. In this scheme the spatial (2D) speckle nature of the noise in OCT is modeled and used for spatially adaptive denoising. Both the temporal and the spatial filter are wavelet-based techniques, where two resolution scales are used for the temporal filter and four for the spatial one. The evaluation of the proposed denoising approach is done on demodulated 3D OCT images from different sources and of different resolutions. Phantom OCT images were used to optimize the parameters for the best denoising performance. The denoising performance of the proposed method was measured in terms of SNR, edge sharpness preservation, and contrast-to-noise ratio. A comparison was made to state-of-the-art methods for noise reduction in 2D OCT images, and the proposed approach was shown to be advantageous in terms of both objective and subjective quality measures.
Improved invariant optical correlations for 3D target detection
Pascuala García-Martínez, José J. Vallés, Javier García, et al.
An invariant optical correlation method for 3D target detection is addressed. Tridimensionality is expressed in terms of range images. The recognition model is based on a vector space representation using an orthonormal image basis. The proposed recognition method is based on the calculation of the angle between the vector associated with a certain 3D object and a vector subspace. Scale- and rotation-invariant 3D target detection is obtained using the phase Fourier transform (PhFT) of the range images. In fact, when the 3D object is scaled, the PhFT becomes a distribution multiplied by a constant factor. On the other hand, a rotation of the 3D object around the z axis implies a shift in the PhFT. So, changes of scale and rotation of 3D objects are replaced by changes of intensity and position of PhFT distributions. We apply intensity-invariant correlation methods for recognition. In addition to tolerance to scale and rotation, high discrimination against false targets is also achieved.
Multidimensional illumination and image processing techniques in the W-band for recognition of concealed objects
Active millimeter wave imaging technology is emerging, and it has the potential to yield much more information when one has control over the illumination parameters. Image processing for this kind of imagery is almost nonexistent in the literature. In this paper, we propose multidimensional illumination techniques to improve mm-wave image quality. Multi-angle, multi-frequency, and cross-polarization illuminations were implemented to obtain multidimensional images. Principal Component Analysis (PCA) and clustering analysis were applied to process the results.
Object specific compressed sensing
Compressed sensing holds the promise of radically novel sensors that can perfectly reconstruct images using considerably fewer samples of data than required by the otherwise general Shannon sampling theorem. In surveillance systems, however, it is also desirable to cue regions of the image where objects of interest may exist. Thus in this paper, we are interested in imaging interesting objects in a scene, without necessarily seeking perfect reconstruction of the whole image. We show that our goals are achieved by minimizing a modified L2-norm criterion, with good results when the reconstruction of only specific objects is of interest. The method yields a simple closed-form analytical solution that does not require iterative processing. Objects can be meaningfully sensed in considerable detail while heavily compressing the scene elsewhere. Essentially, this embeds the object detection and clutter discrimination function in the sensing and imaging process.
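In the same spirit, a generic weighted-L2 reconstruction with a closed-form solution looks like the sketch below; the paper's exact criterion and weighting are not reproduced, so the per-pixel weight w is purely illustrative (small inside the object region of interest, large elsewhere).

```python
# Weighted-L2 reconstruction with a closed-form (non-iterative) solution:
# minimize ||A x - y||^2 + ||diag(sqrt(w)) x||^2 via the normal equations.
# A: sensing matrix, y: measurements, w: per-pixel penalty weights.
import numpy as np

def weighted_l2_reconstruct(A, y, w):
    W = np.diag(np.asarray(w, dtype=np.float64))
    return np.linalg.solve(A.T @ A + W, A.T @ y)
```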
Mobile Video
Fast super-resolution reconstructions of mobile video using warped transforms and adaptive thresholding
Multimedia services for mobile phones are becoming increasingly popular thanks to capabilities brought about by location awareness, customized programming, interactivity, and portability. With the mounting attraction of these services, there is a desire to seamlessly expand the mobile multimedia experience to stationary environments where high-resolution displays can offer significantly better viewing conditions. In this paper, we propose a fast, high-quality super-resolution algorithm that enables high-resolution display of low-resolution video. The proposed algorithm, SWAT, accomplishes sparse reconstructions using directionally warped transforms and spatially adaptive thresholding. Comparisons are made with some existing techniques in terms of PSNR and visual quality. Simulation examples show that SWAT significantly outperforms these techniques while staying within a limited computational complexity envelope.
Complex function estimation using a stochastic classification/regression framework: specific applications to image superresolution
Karl Ni, Truong Q. Nguyen
A stochastic framework combining classification with nonlinear regression is proposed. The performance evaluation is tested in terms of a patch-based image superresolution problem. Assuming a multi-variate Gaussian mixture model for the distribution of all image content, unsupervised probabilistic clustering via expectation maximization allows segmentation of the domain. Subsequently, for the regression component of the algorithm, a modified support vector regression provides per class nonlinear regression while appropriately weighting the relevancy of training points during training. Relevancy is determined by probabilistic values from clustering. Support vector machines, an established convex optimization problem, provide the foundation for additional formulations of learning the kernel matrix via semi-definite programming problems and quadratically constrained quadratic programming problems.
The intensity reduction of ground shadow to deliver better viewing experiences of soccer videos
In this paper, we present a method for reducing the intensity of shadows cast on the ground in outdoor sports videos to provide TV viewers with a better viewing experience. In the case of soccer videos taken with a long-shot camera technique, it is difficult for viewers to discriminate tiny objects (i.e., the soccer ball and players) from the ground shadows. The algorithm proposed in this paper comprises three modules: long-shot detection, shadow region extraction, and shadow intensity reduction. We detect the shadow region on the ground by using the relationship between the Y and U values in YUV color space and then reduce the shadow components depending on the strength of the shadows. Experimental results show that the proposed scheme offers useful tools to provide a more comfortable viewing environment and is amenable to real-time performance even in a software-based implementation.
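A rough sketch of this kind of Y/U-based shadow module is given below; the decision rule, the constant k, and the gain are hypothetical placeholders and do not reproduce the paper's actual thresholds or reduction curve.

```python
# Hypothetical Y/U shadow test and intensity lift (illustration only).
import numpy as np

def shadow_mask(rgb: np.ndarray, k: float = 1.5) -> np.ndarray:
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    y = 0.299 * r + 0.587 * g + 0.114 * b           # BT.601 luma
    u = -0.147 * r - 0.289 * g + 0.436 * b + 128.0  # chroma U, offset to ~[0, 255]
    return u > k * y                                 # candidate ground-shadow pixels

def attenuate_shadows(rgb, mask, gain=1.3):
    out = rgb.astype(np.float64)
    out[mask] *= gain                                # lift intensity where shadowed
    return np.clip(out, 0, 255).astype(np.uint8)
```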
Real-time high definition H.264 video decode using the Xbox 360 GPU
Juan Carlos Arevalo Baeza, William Chen, Eric Christoffersen, et al.
The Xbox 360 is powered by three dual pipeline 3.2 GHz IBM PowerPC processors and a 500 MHz ATI graphics processing unit. The Graphics Processing Unit (GPU) is a special-purpose device, intended to create advanced visual effects and to render realistic scenes for the latest Xbox 360 games. In this paper, we report work on using the GPU as a parallel processing unit to accelerate the decoding of H.264/AVC high-definition (1920x1080) video. We report our experiences in developing a real-time, software-only high-definition video decoder for the Xbox 360.
A cross-layer adaptive handoff algorithm in wireless multimedia environments
Tsungnan Lin, Chiapin Wang
Providing multimedia services in wireless networks depends on the performance of handoff algorithms because of the irretrievable nature of real-time data delivery. To lessen unnecessary handoffs and handoff latencies, which can cause media disruption perceived by users, we present in this paper a cross-layer handoff algorithm based on link quality. Neural networks are used to learn the cross-layer correlation between a link quality estimator, such as packet success rate, and the corresponding context metric indicators, e.g., the transmitted packet length, received signal strength, and signal-to-noise ratio. Based on a pre-processed learning of the link quality profile, our approach makes handoff decisions intelligently and efficiently by evaluating link quality instead of comparing relative signal strengths. The experiment and simulation results show that the proposed method outperforms RSS-based handoff algorithms in a transmission scenario of VoIP applications.
Low latency adaptive streaming of HD H.264 video over 802.11 wireless networks with cross-layer feedback
Andrew Patti, Wai-tian Tan, Bo Shen
Streaming video in consumer homes over wireless IEEE 802.11 networks is becoming commonplace. Wireless 802.11 networks pose unique difficulties for streaming high definition (HD), low latency video due to their error-prone physical layer and media access procedures which were not designed for real-time traffic. HD video streaming, even with sophisticated H.264 encoding, is particularly challenging due to the large number of packet fragments per slice. Cross-layer design strategies have been proposed to address the issues of video streaming over 802.11. These designs increase streaming robustness by imposing some degree of monitoring and control over 802.11 parameters from application level, or by making the 802.11 layer media-aware. Important contributions are made, but none of the existing approaches directly take the 802.11 queuing into account. In this paper we take a different approach and propose a cross-layer design allowing direct, expedient control over the wireless packet queue, while obtaining timely feedback on transmission status for each packet in a media flow. This method can be fully implemented on a media sender with no explicit support or changes required to the media client. We assume that due to congestion or deteriorating signal-to-noise levels, the available throughput may drop substantially for extended periods of time, and thus propose video source adaptation methods that allow matching the bit-rate to available throughput. A particular H.264 slice encoding is presented to enable seamless stream switching between streams at multiple bit-rates, and we explore using new computationally efficient transcoding methods when only a high bit-rate stream is available.
Coding and optimization of a fully scalable motion model
Meng-Ping Kao, Truong Nguyen
Motion information scalability is an important requirement for a fully scalable video codec, especially for decoding scenarios with low bit rates or small image sizes. So far, several scalable coding techniques for motion information have been proposed, including progressive motion vector precision coding and motion vector field layered coding. However, due to compatibility issues in interacting with the other scalabilities, i.e., spatial, temporal, and quality, a complete solution that integrates most of the desirable features of motion scalability has not yet been seen. In order to solve this problem, we have recently proposed a fully scalable motion model which offers full functionality for motion scalability with no compatibility issues. In this paper, we further investigate coding algorithms for the proposed scalable motion model. The purpose is to minimize the coding overhead introduced by motion scalability. Simulation results are presented to verify the significant improvements obtained using the proposed coding and optimization algorithms.
IDCT
Standardization of IDCT approximation behavior for video compression: the history and the new MPEG-C parts 1 and 2 standards
This paper presents the history of international standardization activities and specifications relating to the precision of inverse discrete cosine transform (IDCT) approximations used in video compression designs. The evolution of issues relating to IDCT precision and the "drift" effects of IDCT mismatch between encoder modeling of decoder behavior is traced, starting with the initial requirements specified for ITU-T H.261 and continuing through the MPEG-1, H.262/MPEG-2, H.263, MPEG-4 Part 2, and H.264/MPEG-4 Part 10 AVC projects. Finally, the development of the new MPEG-C Part 1 and Part 2 standards is presented. MPEG-C Part 1 contains a centralized repository for specification of IDCT precision conformance requirements for the various MPEG video coding standards prior to MPEG-4 Part 10. MPEG-C Part 2 specifies one particular IDCT approximation for adoption in industry implementations of existing standards. The use of MPEG-C Part 2 by encoders can eliminate IDCT mismatch drift error when decoded using an MPEG-C Part 2 conforming decoder, resulting in a deterministic decoded result. MPEG-C Part 2 also provides an example for guiding implementers on how to design a decoder with high precision and without excessive computational resource requirements.
From 16-bit to high-accuracy IDCT approximation: fruits of single architecture affiliation
Lijie Liu, Trac D. Tran, Pankaj Topiwala
In this paper, we demonstrate an effective unified framework for high-accuracy approximation of the irrational-coefficient floating-point IDCT by a single integer-coefficient fixed-point architecture. Our framework is based on a modified version of Loeffler's sparse DCT factorization, and the IDCT architecture is constructed via a cascade of dyadic lifting steps and butterflies. We illustrate that simply varying the accuracy of the approximating parameters yields a large family of standard-compliant IDCTs, from rare 16-bit approximations catering to portable computing to ultra-high-accuracy 32-bit versions that virtually eliminate any drifting effect when paired with the 64-bit floating-point IDCT at the encoder. Drift performance of the proposed IDCTs, along with existing popular IDCT algorithms in H.263+, MPEG-2, and MPEG-4, is also demonstrated.
Analysis and encoder prevention techniques for pathological IDCT drift accumulation in static video scenes
This paper discusses the problem of severe pathological encoder-decoder IDCT drift in video compression coding when using a very small quantization step size and typical encoding techniques to encode video sequences with areas of completely static source content. We suggest that there are two ways to try to address the problem: 1) using a high-accuracy IDCT (or an encoder-matched IDCT) in a decoder design, and 2) using encoder techniques to avoid such drift build-up. The primary problem is asserted to be the encoder's behavior. Effective encoder techniques to avoid the problem are described, including a simple "generalized Morris drift test", which is suggested as being superior to the test currently described in the MPEG-2 video specification. Experimental results are reported to show that performing this test in an encoder will completely solve the problem. Other encoding techniques to address the problem are also discussed.
Drift analysis for integer IDCT
This paper analyzes the drift phenomenon that occurs between video encoders and decoders that employ different implementations of the Inverse Discrete Cosine Transform (IDCT). Our methodology utilizes MPEG-2, MPEG-4 Part 2, and H.263 encoders and decoders to measure drift occurring at low QP values for CIF resolution video sequences. Our analysis is conducted as part of the effort to define specific implementations for the emerging ISO/IEC 23002-2 Fixed-Point 8x8 IDCT and DCT standard. Various IDCT implementations submitted as proposals for the new standard are used to analyze drift. Each of these implementations complies with both the IEEE Standard 1180 and the new MPEG IDCT precision specification ISO/IEC 23002-1. Reference implementations of the IDCT/DCT, and implementations from well-known video encoders/decoders are also employed. Our results indicate that drift is eliminated entirely only when the implementations of the IDCT in both the encoder and decoder match exactly. In this case, the precision of the IDCT has no influence on drift. In cases where the implementations are not identical, then the use of a highly precise IDCT in the decoder will reduce drift in the reconstructed video sequence only to the extent that the IDCT used in the encoder is also precise.
Multiplier-less approximation of the DCT/IDCT with low complexity and high accuracy
This paper presents a straightforward multiplier-less approximation of the forward and inverse Discrete Cosine Transform (DCT) with low complexity and high accuracy. The implementation, design methodology, complexity, and performance tradeoffs are discussed. In particular, the proposed IDCT implementations, in spite of their simplicity, comply with and can reach far beyond the MPEG IDCT accuracy specification ISO/IEC 23002-1, and also reduce drift favorably compared to other existing IDCT implementations.
An accurate fixed-point 8×8 IDCT algorithm based on 2D algebraic integer representation
Ihab Amer, Wael Badawy, Vassil Dimitrov, et al.
This paper proposes an algorithm that is based on the application of Algebraic Integer (AI) representation of numbers on the AAN fast Inverse Discrete Cosine Transform (IDCT) algorithm. AI representation allows for maintaining an error-free representation of IDCT until the last step of each 1-D stage of the algorithm, where a reconstruction step from the AI domain to the fixed precision binary domain is required. This delay in introducing the rounding error prevents the accumulation of error throughout the calculations, which leads to the reported high-accuracy results. The proposed algorithm is simple and well suited for hardware implementation due to the absence of computationally extensive multiplications. The obtained results confirm the high accuracy of the proposed algorithm compared to other fixed-point implementations of IDCT.
Efficient fixed-point approximations of the 8×8 inverse discrete cosine transform
This paper describes fixed-point design methodologies and several resulting implementations of the Inverse Discrete Cosine Transform (IDCT) contributed by the authors to MPEG's work on defining the new 8x8 fixed point IDCT standard - ISO/IEC 23002-2. The algorithm currently specified in the Final Committee Draft (FCD) of this standard is also described herein.
A full 2D IDCT with extreme low complexity
Antonio Navarro, Antonio Silva, Yuriy Reznik
In the context of a Call for Proposals for integer IDCTs issued by MPEG in July 2005, a full 2D integer IDCT based on previous work by Feig and Winograd has been proposed. It achieves high precision by meeting all IEEE 1180 conditions and is suitable for hardware implementation since it can be performed with only shifts and additions. Furthermore, it can be useful in high video resolution scenarios such as 720p/1080i/p due to its feedforward operation mode, without the loops usual in row-column implementations. The proposed transformation can be implemented without changing other functional blocks at either the encoder or the decoder, or alternatively as a scaled version incorporating the scaling factors into the dequantization stage. Our algorithm uses only 1328 operations for 8x8 blocks, including scaling factors.
Low complexity 1D IDCT for 16-bit parallel architectures
This paper shows that, using the Loeffler, Ligtenberg, and Moschytz factorization of the 8-point one-dimensional (1-D) IDCT [2] as a fast approximation of the Discrete Cosine Transform (DCT) and using only 16-bit numbers, it is possible to create an IEEE 1180-1990 compliant, multiplierless algorithm with low computational complexity. Owing to its structure, this algorithm is efficiently implemented on parallel high-performance architectures and, due to its low complexity, is also sufficient for a wide range of other architectures. An additional constraint on this work was the requirement of compliance with the existing MPEG standards. Hardware implementation complexity and low resource usage were also part of the design criteria for this algorithm. The implementation is also compliant with the precision requirements described in the MPEG IDCT precision specification ISO/IEC 23002-1. Complexity analysis is performed as an extension to the simple measure of shifts and adds for the multiplierless algorithm; additional operations are included in the complexity measure to better describe the actual transform implementation complexity.
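To make the "shifts and adds only" constraint concrete, the toy example below replaces multiplication by one irrational DCT constant with a dyadic shift-add approximation; it illustrates the general technique, not the specific constants or structure adopted in the paper or the standard.

```python
# Multiplierless toy example: x * cos(pi/4) ~= x * 181/256, built from shifts and adds.
def mul_cos_pi_4(x: int) -> int:
    # 181 = 128 + 32 + 16 + 4 + 1, so 181*x is a sum of shifted copies of x.
    return ((x << 7) + (x << 5) + (x << 4) + (x << 2) + x) >> 8

print(mul_cos_pi_4(1000))   # 707, vs. round(1000 * 0.70711) = 707
```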
Processing and Implementation Technologies II
Regularization for designing spectral matched filter target detectors
This paper describes a new adaptive spectral matched filter that incorporates the idea of regularization (shrinkage) to penalize and shrink the filter coefficients to a range of values. The regularization has the effect of restricting the possible matched filters (models) to a subset that is more stable and has better performance than the non-regularized adaptive spectral matched filters. The effect of regularization depends on the form of the regularization term, and the amount of regularization is controlled by a so-called regularization coefficient. In this paper the sum of squares of the filter coefficients is used as the regularization term, and several different values of the regularization coefficient are tested. A Bayesian-based derivation of the regularized matched filter is also provided. Experimental results for detecting targets in hyperspectral imagery are presented for regularized and non-regularized spectral matched filters.
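With a sum-of-squares penalty on the filter coefficients, a regularized matched filter takes the familiar ridge-style form of diagonal loading of the background covariance; the sketch below shows that form under the usual unit-response constraint and is not the paper's derivation or code.

```python
# Ridge-regularized spectral matched filter sketch: R is the estimated
# background covariance, s the target spectrum, lam the regularization coefficient.
import numpy as np

def regularized_matched_filter(R, s, lam):
    Rreg = R + lam * np.eye(R.shape[0])   # shrink the covariance toward the identity
    w = np.linalg.solve(Rreg, s)
    return w / (s @ w)                    # normalize so the filter responds with 1 to s

def detect(w, pixels):
    return pixels @ w                     # detection statistic per pixel (rows = spectra)
```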
A rectangular-fit classifier for synthetic aperture radar automatic target recognition
John A. Saghri, Daniel A. Cary
The utility of a rectangular-fit classifier for Synthetic Aperture Radar Automatic Target Recognition (SAR ATR) is examined. The target is fitted with and modeled as the rectangle that best approximates its boundary. The rectangular fit procedure involves 1) a preprocessing phase to remove the background clutter and noise, 2) a pose detection phase to establish the alignment of the rectangle via a least-squares straight-line fitting algorithm, and 3) a size determination phase that stretches the width and height of the rectangle in order to encapsulate a pre-specified fraction, e.g., 90%, of the points in the target. A training set composed of approximately half of the images in the MSTAR public imagery database is used to obtain and record the statistical variations in the width and height of the resulting rectangles for each potential target. The remaining half of the images is then used to assess the performance of this classifier. Preliminary results using minimum Euclidean and Mahalanobis distance classifiers show overall accuracies of 44% and 42%, respectively. Although the classification accuracy is relatively low, this technique can be successfully used in combination with other classifiers, such as peak, edge, corner, and shadow-based classifiers, to enhance their performance. A unique feature of the rectangular-fit classifier is that it is rotation invariant in its present form. However, observation of the dataset reveals that in general the shapes of the targets in SAR imagery are not fully rotation invariant. Thus, the classification accuracy is expected to improve considerably using multiple training sets, i.e., one training set generated and used for each possible pose. The tradeoff is increased computational complexity, which tends to be offset by the ever-increasing efficiency and speed of processing hardware and software. The rectangular-fit classifier can also be used as a pose detection routine and/or in conjunction with other ATR schemes, such as shadow-based ATR, that require an initial pose detection phase prior to matching.
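The final decision step described above, assigning a fitted rectangle to the class with the nearest recorded (width, height) statistics, can be sketched as follows; per-class means and covariances are assumed to have been estimated from the training half of MSTAR, and this is not the authors' code.

```python
# Minimum-Mahalanobis-distance classification of a fitted rectangle's (width, height).
import numpy as np

def mahalanobis_classify(feature, class_stats):
    """feature: (width, height); class_stats: list of (mean, covariance) per class."""
    feature = np.asarray(feature, dtype=np.float64)
    dists = []
    for mean, cov in class_stats:
        d = feature - np.asarray(mean, dtype=np.float64)
        dists.append(float(d @ np.linalg.solve(cov, d)))   # squared Mahalanobis distance
    return int(np.argmin(dists))
```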
Ship detection and classification from overhead imagery
Heidi Buck, Elan Sharghi, Keith Bromley, et al.
This paper presents a sequence of image-processing algorithms suitable for detecting and classifying ships from nadir panchromatic electro-optical imagery. Results are shown of techniques for overcoming the presence of background sea clutter, sea wakes, and non-uniform illumination. Techniques are presented to measure vessel length, width, and direction-of-motion. Mention is made of the additional value of detecting identifying features such as unique superstructure, weaponry, fuel tanks, helicopter landing pads, cargo containers, etc. Various shipping databases are then described as well as a discussion of how measured features can be used as search parameters in these databases to pull out positive ship identification. These are components of a larger effort to develop a low-cost solution for detecting the presence of ships from readily-available overhead commercial imagery and comparing this information against various open-source ship-registry databases to categorize contacts for follow-on analysis.
Identification of degraded fingerprints using PCA- and ICA-based features
Many algorithms have been developed for fingerprint identification. The main challenge in many of the applications remains in the identification of degraded images in which the fingerprints are smudged or incomplete. Fingerprints from the FVC2000 databases have been utilized in this project to develop and implement feature extraction and classification algorithms. Besides the degraded images in the database, artificially degraded images have also been used. In this paper we use features based on PCA (principal component analysis) and ICA (independent component analysis) to identify fingerprints. PCA and ICA reduce the dimensionality of the input image data. PCA- and ICA-based features do not contain redundancies in the data. Different multilayer neural network architectures have been implemented as classifiers. The performance of different features and networks is presented in this paper.
Building verification from geometrical and photometric cues
Damage assessment, change detection, and geographical database updating are traditionally performed by experts looking for objects in images, a task which is costly, time consuming, and error prone. Automatic solutions for building verification are particularly welcome but suffer from illumination and perspective changes. On the other hand, semi-automatic procedures intend to speed up image analysis while limiting human intervention to doubtful cases. We present a semi-automatic approach to assess the presence of buildings in airborne images from geometrical and photometric cues. For each polygon of the vector database representing a building, a score is assigned, combining geometrical and photometric cues. Geometrical cues relate to the proximity, parallelism, and coverage of linear edge segments detected in the image, while the photometric factor measures shadow evidence based on intensity levels in the vicinity of the polygon. The human operator interacts with this automatic scoring by setting a threshold to highlight buildings poorly assessed by the image's geometrical and photometric features. After image inspection, the operator may decide to mark the polygon as changed or to update the database, depending on the application.
Automatic identification of vehicle license plates
A new algorithm for automatic identification of vehicle license plates is proposed in this paper. The proposed algorithm uses image segmentation and morphological operations to accurately identify the location of the license plate under various background illuminations. The license plate is identified in two steps. First, the original image is segmented using edge detection and morphological operations. Then, the power spectrum (PS) is analyzed in the horizontal and vertical directions to identify the license plate. The magnitude of the power spectrum shows special characteristics corresponding to the license plate segment. The proposed algorithm is tested with different gray-level car images from different angles of view, and the results are all consistent. The proposed algorithm is fast and can effectively identify license plates under various illumination conditions with high accuracy.
Speckle reduction from digital holograms by simulating temporal incoherence
Speckle is an inherent characteristic of coherent imaging systems. Often, as in the case of Ultrasound, Synthetic Aperture Radar, Laser Imaging and Holography, speckle is a source of noise and degrades the reconstructed image. Various methods exist for the removal of speckle in such images. One method, which has received attention for the removal of speckle from coherent imaging, is to use a temporally incoherent source. We create a novel digital signal processing technique for the reduction of speckle from digital holograms by simulating temporal incoherence during the digital reconstruction process. The method makes use of the discrete implementation of the Fresnel Transform, which calculates the reconstructed image for a range of different wavelengths. These different spectral components can be weighted to suit a temporally incoherent source and the intensities from each wavelength are added together. The method is examined using the speckle index metric.
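A simplified sketch of that digital process is given below: the hologram is reconstructed with a single-FFT discrete Fresnel transform at several nearby wavelengths and the weighted intensities are summed. Sampling and scaling subtleties of the discrete Fresnel transform are ignored, so this only illustrates the idea rather than reproducing the paper's method.

```python
# Simulated temporal incoherence: sum weighted intensities of Fresnel
# reconstructions computed at several wavelengths (illustrative only).
import numpy as np

def fresnel_reconstruct(hologram, wavelength, z, pitch):
    ny, nx = hologram.shape
    x = (np.arange(nx) - nx // 2) * pitch
    y = (np.arange(ny) - ny // 2) * pitch
    X, Y = np.meshgrid(x, y)
    chirp = np.exp(1j * np.pi * (X**2 + Y**2) / (wavelength * z))
    return np.fft.fftshift(np.fft.fft2(hologram * chirp))

def incoherent_average(hologram, wavelengths, weights, z, pitch):
    fields = [fresnel_reconstruct(hologram, wl, z, pitch) for wl in wavelengths]
    return sum(w * np.abs(f) ** 2 for w, f in zip(weights, fields))
```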
Workshop on Optics in Entertainment
Optical systems in entertainment
Olha Malinochka, Volodymyr Kojemiako
The information revolution that took place in the nineteenth century considerably raised human intelligence through the rapid development and accumulation of humankind's base of knowledge. Volumes of knowledge and information now grow in a geometric progression, which requires humankind to continually improve its knowledge, which in turn improves the technologies it develops to maintain a comfortable and productive way of life. For a certain time humankind subjectively separated the concepts of energy and information, but objective development presents knowledge as a single whole in which everything is interdependent and interconnected. Such an understanding raises humankind to a new plane of intelligence, which poses new tasks and opens new prospects.
Performance improvements in back panel display lighting using near-Lambertian diffuse high-reflectance materials
Bryn Waldwick, James E. Leland, Christina Chase, et al.
LCD backlighting applications require diffuse illumination over an extended area of a display unit while maintaining high luminance levels. Since such applications involve multiple reflections within a reflective cavity, the efficiency of the cavity can be affected significantly by relatively small changes in the reflectance of the cavity material. Materials with diffuse rather than specular (or mirror-like) reflectance scatter light, averaging out hot spots and providing a uniform field of illumination. Reflectors with specular components tend to propagate non-uniformities in the illuminator system. The result is a spatial variation in brightness visible to the viewer of the display. While the undesirability of specular materials for such applications has been widely recognized, some diffuse materials in common use exhibit a significant specular component. This paper describes a method for measuring the specular component of such materials, and presents a simple approach to evaluating the effect of such secondary specular behavior on the performance of a backlight cavity. It is demonstrated that significant differences exist among available diffuse reflectance materials, and that these differences can lead to significant differences in the performance of the displays in which these materials are used.
Tele-counseling and social-skill trainings using JGNII optical network and a mirror-interface system
Sayuri Hashimoto, Nobuyuki Hashimoto, Akira Onozawa, et al.
"Tele-presence" communication using JGNII - an exclusive optical-fiber network system - was applied to social-skills training in the form of child-rearing support. This application focuses on internet counseling and social training skills that require interactive verbal and none-verbal communications. The motivation for this application is supporting local communities by constructing tele-presence education and entertainment systems using recently available, inexpensive IP networks. This latest application of tele-presence communication uses mirror-interface system which provides to users in remote locations a shared quasi-space where they can see themselves as if they were in the same room by overlapping video images from remote locations.
Examples of subjective image quality enhancement in multimedia
Miloš Klíma, Jiří Pazderák, Karel Fliegel
Subjective image quality is an important issue in all multimedia imaging systems, with a significant impact on QoS (Quality of Service). For a long time the image fidelity criterion was widely applied in technical systems, especially in television and image source compression, but optimizing subjective perceptual quality and optimizing fidelity (e.g. minimizing MSE) are very different goals. The paper presents experimental testing of several digital techniques for subjective image quality enhancement, well known from digital photography and video: color saturation, edge enhancement, denoising operators and noise addition. The evaluation has been done for extensive operator parameterization, and the results are summarized and discussed. It is demonstrated that certain image corrections improve the subjective perception of the image to some extent. The above-mentioned techniques have been tested on five test images with significantly different characteristics (fine details, large saturated color areas, high color contrast, easy-to-remember colors, etc.). The experimental results point the way to an optimized use of image-enhancing operators. Finally, the concept of impressiveness as a possible new expression of subjective quality improvement is presented and discussed.
Poster Session
Optical resources for highly secure remote object authentication
We review the potential of optical techniques in security tasks and propose to combine some of them for automatic authentication. More specifically, we propose to combine visible and near-infrared imaging, optical decryption, distortion-invariant ID tags, optoelectronic devices, a coherent image processor, optical correlation, and multiple authenticators. A variety of images and signatures, including biometric and random sequences, can be combined in an optical ID tag for multifactor identification. Encrypting the information encoded in the ID tag increases security and deters unauthorized usage of optical tags. The identification process encompasses several steps, such as detection, information decoding and verification, which are all detailed in this work. The design of rotation- and scale-invariant ID tags is taken into account to achieve correct authentication even if the ID tag is captured in different positions. Resistance to some noise and degradation of the tag is analyzed. Examples and experiments are provided and the results discussed.
Bayesian approach to the thermally generated charge elimination
It is generally known that every astronomical image acquired by a CCD sensor has to be corrected with a dark frame. The dark frame maps the thermally generated charge of the CCD. It may happen that the dark-frame image is not available, so the astronomical images cannot be corrected directly; such uncorrected images are not suitable for subsequent investigation. Simple nonlinear filtering methods exist, e.g. median filtering, but the results they give are not satisfactory. In recent years, algorithms for the elimination of the thermally generated charge have been proposed. All of these algorithms use the Discrete Wavelet Transform (DWT), which decomposes the image into different frequency bands. The histogram of the wavelet coefficients is modeled by a generalized Laplacian probability density function (PDF), whose parameters are estimated by the method of moments using a derived system of equations. Images with the thermally generated charge suppressed are then estimated using Bayesian estimators. The algorithm will be improved in the future, but it can already be regarded as a promising elimination method.
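As a simplified stand-in for the scheme described above, the sketch below decomposes an image with the DWT, shrinks the detail coefficients, and reconstructs. A plain soft threshold replaces the paper's generalized-Laplacian/Bayesian estimator; the wavelet, number of levels and threshold rule are assumptions made only for illustration.

    # Sketch: wavelet-domain suppression of dark-current-like structure.
    # A soft threshold stands in for the Laplacian-prior Bayesian estimator.
    import numpy as np
    import pywt

    def suppress_thermal_charge(img, wavelet="db4", levels=3):
        coeffs = pywt.wavedec2(img.astype(float), wavelet, level=levels)
        out = [coeffs[0]]                                   # keep the approximation band
        for detail in coeffs[1:]:
            # Robust per-level noise estimate from the diagonal band (MAD / 0.6745).
            sigma = np.median(np.abs(detail[-1])) / 0.6745
            thr = sigma * np.sqrt(2 * np.log(detail[-1].size))
            out.append(tuple(pywt.threshold(d, thr, mode="soft") for d in detail))
        return pywt.waverec2(out, wavelet)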
Make it easy: Automatic pictogram generation system enables everybody to design illustrations by computer-aided technology
Mariko Adachi, Takashi Ishihara, Kunio Sakamoto
We developed a prototype design support system for generating pictogram illustrations. A pictogram is a symbol representing a concept, object, activity, place or event by illustration. A pictogram is normally produced by a designer who gives the illustration careful consideration and completes it after repeated trial and error. Designing pictograms is therefore a complicated process, and it is difficult for the general public to produce a new design. This paper describes an automatic pictogram design system with which everybody can easily design a pictogram by combining basic illustrations.
Development of air touch interface for floating 3D image in the air
Hiroyuki Fukuda, Hiroyuki Morimoto, Kunio Sakamoto
We developed a prototype virtual air-touch interface system for interaction in a virtual 3D space. The spatial imaging display system presents virtual 3D objects to the observer. These 3D images float in the air, and the user can directly touch the objects or virtual images. To enable such interaction, the interface system must recognize when the user moves a hand near the virtual objects. A conventional touch-panel system detects the user's operation on the display screen, but the touch point then differs from the space where the image is actually displayed, so it is important that the user can operate in the same space as the image. A typical solution is computer vision. In this paper, the authors propose an interface system using a theremin, a musical instrument with the unusual property of being controlled by the performer's hand motions near its antennas.
Video viewing browser enables to playback movie contents reproduced by using scene scenario in real-time
Takashi Ishihara, Koji Uchida, Kunio Sakamoto
The authors developed a prototype video viewing browser. Our video viewer can play back movies on the WWW according to a playing scenario. This scenario composes new scenes from the original movies. Our video browser is built around this scene scenario, with which one can arrange a movie's video clips, insert transition effects, apply colored backgrounds, or add captions and titles. Video movie content on the WWW is copyrighted, so the browser cannot alter web movie content in the way that conventional video editing software adds effects to the original. Editing software produces reproductions, but our browser does not: it adds effects according to the scenario and only shows the viewer a new scene. The scene scenario is written in an XML-like script, and the video browser applies effects according to the operations in the scenario.
Pattern recognition with an adaptive generalized SDF filter
Most captured images present degradations due to blurring and additive noise; moreover, objects of interest can be geometrically distorted. Classical methods for pattern recognition based on correlation are very sensitive to intensity degradations and geometric distortions. In this work, we propose an adaptive generalized filter based on the synthetic discriminant function (SDF). With the help of computer simulation we analyze and compare the performance of the adaptive correlation filter with that of common correlation filters in terms of discrimination capability and accuracy of target location when input scenes are degraded and the target is geometrically distorted.
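For orientation, the sketch below synthesizes a conventional equal-correlation-peak SDF filter from a set of training views and correlates it with a scene in the frequency domain; the adaptive, generalized constraints that distinguish the proposed filter are not reproduced here.

    # Sketch: conventional SDF filter synthesis and frequency-domain correlation.
    # `train` is a list of same-sized 2-D training views (assumed inputs).
    import numpy as np

    def sdf_filter(train, peaks=None):
        X = np.column_stack([t.ravel().astype(float) for t in train])
        u = np.ones(X.shape[1]) if peaks is None else np.asarray(peaks, float)
        # h = X (X^T X)^{-1} u  enforces <h, x_i> = u_i for every training view.
        h = X @ np.linalg.solve(X.T @ X, u)
        return h.reshape(train[0].shape)

    def correlate(scene, h):
        # Circular cross-correlation of the scene with the SDF filter via FFT.
        S = np.fft.fft2(scene)
        H = np.fft.fft2(h, s=scene.shape)
        return np.real(np.fft.ifft2(S * np.conj(H)))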
Research of the camera calibration based on digital image processing
Lingjia Gu, Shuxu Guo, Dan Zhang, et al.
The paper discusses the key problem of measurement accuracy in a precise photoelectronic measurement system. By combining a camera calibration method based on computer active vision with digital image processing technology, a method for calibrating the camera's internal parameters is proposed. The method combines a high-accuracy optical theodolite with CCD subdivision measurement, and the least-squares method is used to determine the camera's internal parameters under optimal conditions. The experimental results indicate that, compared with traditional camera calibration methods, the proposed method is simple to operate, fast to calibrate and broadly applicable, and that it handles the distortion of the CCD camera well.
The new methods for registration and integration of range images
Alongside our group's improvements in range image acquisition by optical metrology, we have developed a novel method for the registration and integration of range images. The registration approach is based on texture-feature recognition. Texture-feature pairs in two texture images are identified by cross-correlation, and validity checking is implemented through Hausdorff distance comparison. The correspondence between the texture image and the range image is used to acquire range point-pairs, and the initial transformation between the two range images is computed by a least-squares technique. With this initial transformation, fine registration is achieved by the ICP algorithm. The integration of the registered range images is based on ray casting. An axis-aligned bounding box for all range images is computed. Three bundles of uniformly distributed rays are cast through the faces of the box along the three orthogonal coordinate axes. The intersections between the rays and the range images are computed and stored in dexels. A KD-tree structure is used to accelerate the computation. Data points in the overlapped region are identified with specific criteria based on distance and the angle between normals, and a complete, non-redundant digital model is obtained after removing the overlapped points. The experimental results illustrate the efficiency of the method in reconstructing whole three-dimensional objects.
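The least-squares initial alignment from matched range point-pairs can be written compactly as the classical SVD (Kabsch) solution, sketched below; the ICP refinement and the ray-casting/dexel integration described above are not reproduced.

    # Sketch: least-squares rigid transform (R, t) from matched 3-D point pairs.
    import numpy as np

    def rigid_transform(P, Q):
        """Return R, t minimizing sum ||R p_i + t - q_i||^2 for Nx3 arrays P, Q."""
        cp, cq = P.mean(axis=0), Q.mean(axis=0)
        H = (P - cp).T @ (Q - cq)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = cq - R @ cp
        return R, t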
Pattern recognition with adaptive nonlinear filters
In this paper, adaptive nonlinear correlation-based filters for pattern recognition are presented. The filters are based on a sum of minima correlations. To improve the recognition performance of the filters in the presence of false objects and geometric distortions, information about the objects is used to synthesize the filters. The performance of the proposed filters is compared with that of linear synthetic discriminant function filters in terms of noise robustness and discrimination capability. Computer simulation results are provided and discussed.
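The building block named above, a minimum-type correlation in which each output sample is the sum of pixel-wise minima between the target and the local scene window, can be sketched as follows; the brute-force loops are for illustration only, and the adaptive synthesis from object information is not shown.

    # Sketch: sum-of-minima (nonlinear) correlation between a scene and a target.
    import numpy as np

    def min_correlation(scene, target):
        th, tw = target.shape
        sh, sw = scene.shape
        out = np.zeros((sh - th + 1, sw - tw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                window = scene[i:i + th, j:j + tw]
                out[i, j] = np.minimum(window, target).sum()   # pixel-wise minima, summed
        return out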
Color component cross-talk pixel SNR correction method for color imagers
A simple multi-channel imager restoration method is presented in this paper. A method is developed to correct channel-dependent cross-talk of a Bayer color filter array sensor with signal-dependent additive noise. We develop separate cost functions (weakened optimization) for each color channel-to-color channel component. Regularization is applied to each color component instead of on the standard per-color-channel basis. This separation of color components allows us to calculate regularization parameters that take advantage of the differing magnitudes of each color component's cross-talk blurring. Due to the large variation in the amount of blurring for each color component, this separation can result in an improved trade-off between inverse filtering and noise smoothing. The regularization parameters of the restoration solution are determined by maximizing the developed local pixel SNR estimates (an HVS detection constraint). Local pixel adaptivity is applied. The total error in the corrected signal estimate (from bias error and amplified noise variance) is used in the local pixel SNR estimates. A priori sensor characterization information is utilized. The method is geared towards implementation in the on-chip digital logic of low-cost CMOS sensors. Performance data for the proposed correction method are presented using color images captured from low-cost embedded imaging CMOS sensors.
Holographic and weak-phase projection system for 3D shape reconstruction using temporal phase unwrapping
C. A. González, A. Dávila, G. Garnica
Two projection systems that use an LCoS phase modulator are proposed for 3D shape reconstruction. The LCoS is used either as a holographic system or as a weak-phase projector; both configurations project a set of fringe patterns that are processed with the technique known as temporal phase unwrapping. To minimize the influence of camera sampling and of speckle noise in the projected fringes, a speckle-noise reduction technique is applied to the speckle patterns generated by the holographic optical system. Experiments with 3D shape reconstruction of an ophthalmic mold and other test specimens show the viability of the proposed techniques.
Imagery-derived modulation transfer function and its applications for underwater imaging
Weilin Hou, Alan D. Weidemann, Deric J. Gray, et al.
The main challenge in working with underwater imagery results from both the rapid decay of signals due to absorption, which leads to poor signal-to-noise returns, and the blurring caused by strong scattering from the water itself and the constituents within it, especially particulates. The modulation transfer function (MTF) of an optical system gives detailed and precise information about the system's behavior. Underwater imagery can be better restored with knowledge of the system MTF or of the point spread function (PSF), its Fourier-transform equivalent, extending the performance range as well as the information retrieval from underwater electro-optical systems. This is critical in many civilian and military applications, including target and especially mine detection, search and rescue, and diver visibility. This effort utilizes test imagery obtained with the Laser Underwater Camera Imaging Enhancer (LUCIE) from Defense Research and Development Canada (DRDC) during an April-May 2006 trial experiment in Panama City, Florida. Images of a standard resolution chart with various spatial frequencies were taken underwater in a controlled optical environment at varying distances. In-water optical properties during the experiment were measured, including the absorption and attenuation coefficients, particle size distribution, and volume scattering function. The resulting images were preprocessed to enhance the signal-to-noise ratio by averaging multiple frames and to remove uneven illumination at the target plane. The MTF of the medium was then derived from measurements of the above imagery, subtracting the effect of the camera system. PSFs converted from the measured MTF were then used to restore the blurred imagery by different deconvolution methods. The effects of polarization from source to receiver on the resulting MTFs were examined, and we demonstrate that matching polarizations do enhance system transfer functions. This approach also shows promise in deriving medium optical properties, including absorption and attenuation.
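A standard way to use such a measured transfer function is Wiener-type deconvolution, sketched below as one of the possible deconvolution methods referred to above; the MTF array and the regularization constant K are assumed inputs, not values from the experiment.

    # Sketch: Wiener-type restoration of a blurred frame given the medium MTF.
    import numpy as np

    def wiener_restore(blurred, mtf, K=0.01):
        # `mtf` is the transfer function sampled on the same frequency grid as
        # np.fft.fft2(blurred); K regularizes frequency bins with low SNR.
        F = np.fft.fft2(blurred)
        H = np.asarray(mtf, dtype=complex)
        restored = np.conj(H) / (np.abs(H) ** 2 + K) * F
        return np.real(np.fft.ifft2(restored))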
Local adaptive image processing in a sliding transform domain
A local adaptive image processing method in a sliding discrete transform domain is presented. The local restoration technique is performed by pointwise modification of the local discrete transform coefficients. To provide image processing at a high rate, a fast recursive algorithm for computing the sliding transform is utilized. The algorithm is based on a recursive relationship between three subsequent local spectra. Computer simulation results using real images are provided and compared with those of common restoration techniques.
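The sketch below shows the basic non-recursive version of such processing: for every pixel a local window is transformed with the DCT, the coefficients are modified pointwise (a hard threshold is used here as one possible rule), and the central sample of the inverse transform is kept. The fast recursive update of the spectrum exploited in the paper is omitted, and the window size and threshold are illustrative assumptions.

    # Sketch: pointwise coefficient modification in a sliding DCT domain
    # (brute-force version, without the recursive spectrum update).
    import numpy as np
    from scipy.fft import dctn, idctn

    def sliding_dct_filter(img, win=8, thr=10.0):
        h = win // 2
        padded = np.pad(img.astype(float), h, mode="reflect")
        out = np.empty(img.shape, dtype=float)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                block = padded[i:i + win, j:j + win]
                spec = dctn(block, norm="ortho")
                dc = spec[0, 0]
                spec[np.abs(spec) < thr] = 0.0                 # pointwise modification
                spec[0, 0] = dc                                # keep local mean unchanged
                out[i, j] = idctn(spec, norm="ortho")[h, h]    # keep the central sample
        return out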
Compressed domain statistical snake segmentation for real-time tracking of objects in airborne videos
We present a new compressed-domain method for tracking objects in airborne videos. In the proposed scheme, a statistical snake is used for object segmentation in I-frames, and motion vectors extracted from P-frames are used for tracking the object detected in the I-frames. It is shown that the energy function of the statistical snake can be obtained directly from the compressed DCT coefficients without the need for full decompression. The number of snake deformation iterations can also be significantly reduced in the compressed-domain implementation. The computational cost is significantly reduced by using compressed-domain processing, while the performance is comparable to that of pixel-domain processing. The proposed method is tested on several UAV video sequences, and experiments show that the tracking results are satisfactory.
Hyperspectral endmember detection based on strong lattice independence
Advances in imaging spectroscopy have been applied to Earth observation at different wavelengths of the electromagnetic spectrum using aircraft or satellite systems. This technology, known as hyperspectral remote sensing, has found many applications in agriculture, mineral exploration and environmental monitoring, since the images acquired by these devices register the constituent materials in hundreds of spectral bands. Each pixel in the image contains the spectral information of its zone. However, processing these images can be a difficult task because the spatial resolution of each pixel is on the order of meters, an area large enough to be composed of different materials. This research presents an alternative methodology to detect the pixels in the image that best represent the spectrum of one material with as little contamination from any other as possible. The detection of these pixels, also called endmembers, is the first step for image segmentation and is based on morphological autoassociative memories and the property of strong lattice independence between patterns. Morphological associative memories and strong lattice independence are concepts based on lattice algebra. Our procedure subdivides a hyperspectral image into regions, looking for sets of strongly lattice-independent pixels. These patterns are identified as endmembers and are used for the construction of abundance maps.
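The lattice (morphological) auto-associative memories on which the strong-lattice-independence test rests can be constructed as below, with W_XX[i,j] = min over patterns of (x_i - x_j) and M_XX[i,j] = max over patterns of (x_i - x_j); the subsequent SLI check and endmember selection are not reproduced in this sketch.

    # Sketch: lattice auto-associative memories for a set of candidate spectra.
    # X holds one spectrum per column; in practice X would be a small candidate
    # subset of pixels, not the whole image, to keep memory use reasonable.
    import numpy as np

    def lattice_memories(X):
        """X: (n_bands, n_pixels) array of spectra; returns W_XX, M_XX."""
        diff = X[:, None, :] - X[None, :, :]      # diff[i, j, k] = x_i^k - x_j^k
        return diff.min(axis=2), diff.max(axis=2)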
Comparison of different illumination arrangements on capillary image quality in nail-fold
Chih-Chieh Wu, Kang-Ping Lin, Being-Tau Chung
The purpose of this study was to investigate which illumination arrangement provides the highest image quality when using a non-invasive cutaneous imaging system to observe capillaries in the nail-fold. We captured microcirculation images with and without a band-pass filter, whose wavelength is 556 ± 10 nm, in front of a 150 W halogen lamp. Furthermore, we varied the illumination angle from 90 degrees (co-axial light) to 20 degrees to evaluate the image quality under different light-source arrangements. An image registration function was used to solve the image movement problem caused by breathing or slight movements of the volunteers or the imaging device. The contrast-to-noise ratio (CNR) was used as the evaluation factor to quantify image quality. A dynamic search method was used to find the skeleton of a vessel and to define the foreground and background parameters for calculating contrast. A Gaussian smoothing filter was applied to the original images, and noise was estimated from the difference between the coefficients of variation (CV) of the original and processed images. As a result, using a green filter in front of the lamp gives the highest contrast-to-noise ratio when the illumination angle is 20 degrees. Normalizing the highest CNR to 1, the CNRs of the other illumination conditions are 0.49 (without filter, 20 degrees), 0.39 (with filter, 90 degrees) and 0.28 (without filter, 90 degrees), respectively. It is concluded that using a green light source with an illumination angle of 20 degrees provides better image quality than the other arrangements.
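The evaluation factor itself reduces to a short computation, sketched here under the assumption that boolean foreground (vessel) and background masks and a noise estimate are already available from the skeleton search and CV-difference steps described above.

    # Sketch: contrast-to-noise ratio from foreground/background masks.
    import numpy as np

    def cnr(img, fg_mask, bg_mask, noise_std):
        contrast = img[fg_mask].mean() - img[bg_mask].mean()   # vessel vs. surrounding skin
        return np.abs(contrast) / noise_std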
Removing foreground objects by using depth information from multi-view images
In this paper, we present a novel method for removing foreground objects from multi-view images. Unlike conventional methods, which locate the foreground objects interactively, we aim to develop an automated system. The proposed algorithm consists of two modules: 1) object detection and removal, and 2) a detected-foreground filling stage. The depth information of the multi-view images is the critical cue adopted in this algorithm. By multi-view images we do not mean a multi-camera system; we use only one digital camera and take the photos by hand. Although this may cause poor matching results, the coarse depth information is sufficient to detect and remove the foreground object. The experimental results indicate that the proposed algorithm provides an effective tool that can be used in applications for digital cameras, photo-realistic scene generation, digital cinema and so on.
Still image compression using cubic spline interpolation with bit-plane compensation
Tsung-Ching Lin, Shi-Huang Chen, Trieu-Kien Truong, et al.
In this paper, a modified image compression algorithm using cubic spline interpolation (CSI) and bit-plane compensation is presented for low bit-rate transmission. The CSI is developed in order to subsample image data with minimal distortion and to achieve image compression. It has been shown in the literature that the CSI can be combined with the JPEG or JPEG2000 algorithm to develop a modified JPEG or JPEG2000 codec, which obtains a higher compression ratio and better quality of reconstructed images than the standard JPEG and JPEG2000 codecs in the low bit-rate range. This paper implements the modified JPEG algorithm, applies bit-plane compensation and tests several images. Experimental results show that the proposed scheme can increase the compression ratio of the original JPEG data compression system by 25-30% with similar visual quality in the low bit-rate range. This system can reduce the load on telecommunication networks and is well suited for low bit-rate transmission.
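The core CSI idea, subsampling the image and later reconstructing it by cubic-spline interpolation, can be sketched as the round trip below; the JPEG codec in between and the bit-plane compensation step of the modified scheme are not included, and the subsampling factor is an illustrative assumption.

    # Sketch: cubic-spline subsampling and interpolation round trip.
    import numpy as np
    from scipy.ndimage import zoom

    def csi_roundtrip(img, factor=2):
        small = zoom(img.astype(float), 1.0 / factor, order=3)   # cubic-spline subsample
        scale = np.array(img.shape) / np.array(small.shape)
        return zoom(small, scale, order=3)                       # cubic-spline reconstruction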
Blind image quality assessment considering blur, noise, and JPEG compression distortions
Erez Cohen, Yitzhak Yitzhaky
The quality of images may be severely degraded in various situations, such as imaging during motion, sensing through a diffusive medium, high compression rates and low signal-to-noise ratios. Often in such cases the ideal, un-degraded image is not available (no reference exists). This paper overviews past methods that dealt with no-reference (NR) image quality assessment and then proposes a new NR method for the identification of image distortions and the quantification of their impact on image quality. The proposed method considers both noise and blur distortion types, which may exist in the image individually or simultaneously. The impact of the distortions on image quality is evaluated in the spatial frequency domain, while the noise power is further estimated in the spatial domain. Specific distortions addressed here include additive white noise, Gaussian blur, de-focus blur, and JPEG compression. Estimation results are compared to the true distortion quantities over a set of 75 different images.
2D to 3D stereoscopic conversion: depth-map estimation in a 2D single-view image
With increasing demand for 3D content, the conversion of existing two-dimensional content to three-dimensional content has gained wide interest in 3D image processing. It is important to estimate the relative depth map of a single-view image for the 2D-to-3D conversion technique. In this paper, we propose an automatic conversion method that estimates the depth information of a single-view image based on the degree of focus of segmented regions and then generates a stereoscopic image. First, we conduct image segmentation to partition the image into homogeneous regions. Then, we construct a higher-order statistics (HOS) map, which represents the spatial distribution of the high-frequency components of the input image. The HOS is known to be well suited for detection and classification problems because it can suppress Gaussian noise and preserve some non-Gaussian information. We estimate a relative depth map from these two cues and then refine the depth map by post-processing. Finally, a stereoscopic image is generated by calculating the parallax values of each region from the generated depth map and the input image.
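The final rendering step can be sketched as below: each pixel is shifted horizontally in proportion to its relative depth to synthesize the second view. The segmentation, HOS-map and depth-refinement stages are not shown, occlusion holes are left unfilled, and the maximum disparity is an illustrative assumption.

    # Sketch: synthesize a second (right) view from a grayscale image and a
    # relative depth map by horizontal, depth-proportional pixel shifts.
    import numpy as np

    def render_right_view(img, depth, max_disp=16):
        h, w = depth.shape
        disp = (depth / max(depth.max(), 1e-6) * max_disp).astype(int)
        right = np.zeros_like(img)
        cols = np.arange(w)
        for r in range(h):
            target = np.clip(cols - disp[r], 0, w - 1)   # nearer pixels shift more
            right[r, target] = img[r, cols]              # occlusion holes remain unfilled
        return right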
Contribution of image analysis to the definition of explosibility of fine particles resulting from waste recycling process
In waste recycling processes, the development of comminution technologies is one of the main ways to improve the quality of recycled products. This involves an increase in fine-particle production, which could affect the explosibility properties of the materials. This paper reports the results of experiments carried out to examine the explosibility of the fine particles resulting from a waste recycling process. Tests have been conducted on products derived from milling processes operated under different conditions. In particular, the comminution tests were executed varying the milling temperature by means of refrigerant agents. The materials used in the explosibility tests were different types of plastics coming from waste products (PET, ABS and PP), characterized by particle sizes below 1 mm. The results of the explosibility tests, carried out by means of a Hartmann apparatus, have been compared with data derived from an image analysis procedure aimed at measuring the morphological characteristics of the particles. For each type of material, the propensity to explode appears to be correlated not only with particle size but also with morphological properties linked to the operating conditions of the milling process.
Watershed data aggregation for mean-shift video segmentation
Object segmentation is considered an important step in video analysis and has a wide range of practical applications. In this paper we propose a novel video segmentation method based on a combination of watershed segmentation and mean-shift clustering. The proposed method segments video by clustering spatio-temporal data in a six-dimensional feature space, where the features are spatio-temporal coordinates and spectral attributes. The main novelty is an efficient data aggregation method employing watershed segmentation and local feature averaging. The experimental results show that the proposed algorithm significantly reduces the processing time of the mean-shift algorithm and results in superior video segmentation, where video objects are well defined and tracked over time.
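A single-frame sketch of the aggregation idea follows: a watershed over-segmentation, local feature averaging per region, and mean-shift clustering of the much smaller set of region features. The spatio-temporal six-dimensional clustering over several frames used in the paper is simplified here to spatial coordinates plus gray level.

    # Sketch: watershed over-segmentation + per-region feature averaging,
    # then mean-shift clustering of the aggregated region features.
    import numpy as np
    from scipy import ndimage
    from skimage.filters import sobel
    from skimage.segmentation import watershed
    from sklearn.cluster import MeanShift

    def segment_frame(gray):
        regions = watershed(sobel(gray))                      # over-segmentation, labels 1..n
        idx = np.arange(1, regions.max() + 1)
        rows, cols = np.indices(gray.shape)
        feats = np.column_stack([
            ndimage.mean(rows, labels=regions, index=idx),    # mean row per region
            ndimage.mean(cols, labels=regions, index=idx),    # mean column per region
            ndimage.mean(gray, labels=regions, index=idx),    # mean intensity per region
        ])
        labels = MeanShift(bin_seeding=True).fit_predict(feats)
        return labels[regions - 1]                            # map cluster labels back to pixels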
Image blur analysis for the subpixel-level measurement of in-plane vibration parameters of MEMS resonators
Ha Vu Le, Michele Gouiffes, Fabien Parrain, et al.
The objective of this work is to develop a reliable image processing technique to measure the vibration parameters of every part of a MEMS resonator using microscopic images of the vibrating device. Images of resonators vibrating at high frequencies are characterized by blur whose point spread function (PSF) is expressed in a parametric form with two parameters: vibration orientation and magnitude. We find it necessary to use a reference image (an image of the still object) when analyzing the blurred image in order to achieve subpixel-level accuracy. The orientation of the vibration is identified by applying the Radon transform to the difference between the reference image and the blurred image. A blurred image is usually modeled as the convolution of the vibration PSF with the reference image plus additive noise, assuming uniform vibration across the view. The vibration magnitude can then be recovered by using a minimum mean-squared error (MMSE) estimator to find the optimal PSF with the identified orientation. However, in real images only parts of the image belong to the vibrating object, and the vibration may not be uniform over all its parts. To overcome this problem, we use local optimization with a mean of weighted squared errors (MWSE) as the cost function instead of the MSE; it is capable of suppressing non-vibrating high-frequency components of the image. Sensitivity analysis and experiments on real images have been performed.
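The orientation step can be sketched as below: the Radon transform of the difference image is computed over a range of angles and the angle whose projection has the largest variance is reported. Choosing the variance as the selection statistic, the angular sampling, and whether the returned angle is the blur direction or its normal under a given Radon convention are assumptions of this sketch rather than details taken from the paper; the MMSE/MWSE magnitude recovery is not shown.

    # Sketch: vibration-orientation estimate from the Radon transform of the
    # difference between the still (reference) image and the blurred image.
    import numpy as np
    from skimage.transform import radon

    def vibration_orientation(reference, blurred):
        diff = blurred.astype(float) - reference.astype(float)
        angles = np.arange(0.0, 180.0, 0.5)
        sinogram = radon(diff, theta=angles, circle=False)
        # The angle with the most "peaked" projection flags the blur axis; verify
        # against a known test case whether it is the direction or its normal.
        return angles[np.argmax(sinogram.var(axis=0))]   # degrees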
Validation of training set approaches to hyperparameter estimation for Bayesian tomography
Since algorithms based on Bayesian approaches contain hyperparameters associated with the mathematical model for the prior probability, the performance of such algorithms usually depends crucially on the values of these parameters. In this work we consider an approach to hyperparameter estimation for Bayesian methods used in the medical imaging application of emission computed tomography (ECT). We address spline models as Gibbs smoothing priors for our own application to ECT reconstruction. The problem of hyperparameter (or, in our case, smoothing-parameter) estimation can be stated as follows: given a likelihood and prior model, and given a realization of noisy projection data from a patient, compute some optimal estimate of the smoothing parameter. Among the variety of approaches used to attack this problem in ECT, we base our maximum-likelihood (ML) estimates of the smoothing parameters on observed training data, and we argue the motivation for this approach. To validate our ML approach, we first perform closed-loop numerical experiments using images created by Gibbs sampling from the given prior probability with the smoothing parameter known. We then evaluate the performance of our method using mathematical phantoms and show that the optimal estimates yield good reconstructions. Our initial results indicate that the hyperparameters obtained from training data perform well with regard to the percentage error metric.
Local bivariate Cauchy distribution for video denoising in 3D complex wavelet domain
Hossein Rabbani, Mansur Vafadust, Ivan Selesnick
In this paper, we present a new video denoising algorithm using a bivariate Cauchy probability density function (pdf) with a local scaling factor for the distribution of the wavelet coefficients in each subband. The bivariate pdf takes into account the statistical dependency among wavelet coefficients, and the local scaling factor models the empirically observed correlation between coefficient amplitudes. Using the maximum a posteriori (MAP) estimator and the minimum mean-squared error (MMSE) estimator, we describe two methods for video denoising which rely on bivariate Cauchy random variables with high local correlation. Because separable 3-D transforms, such as the ordinary 3-D wavelet transform (DWT), have artifacts that degrade their performance for denoising, we implement our algorithms in the 3-D complex wavelet transform (DCWT) domain. In addition, we use our denoising algorithm in the 2-D DCWT domain, where the 2-D transform is applied to each frame individually. The simulation results show that our denoising algorithms achieve better performance than several published methods, both visually and in terms of peak signal-to-noise ratio (PSNR).
Local area signal-to-noise ratio (LASNR) algorithm for image segmentation
Laura Mascio Kegelmeyer, Philip W. Fong, Steven M. Glenn, et al.
Many automated image-based applications need to find small spots in a variably noisy image. For humans, it is relatively easy to distinguish objects from their local surroundings no matter what else may be in the image. We attempt to capture this distinguishing capability computationally by calculating a measurement that estimates the strength of the signal within an object versus the noise in its local neighborhood. First, we hypothesize various sizes for the object and the corresponding background areas. Then, we compute the Local Area Signal to Noise Ratio (LASNR) at every pixel in the image, resulting in a new image with a LASNR value for each pixel. All pixels exceeding a pre-selected LASNR value become seed pixels, or initiation points, and are grown to include the full areal extent of the object. Since growing the seed is a separate operation from finding the seed, each object can be any size and shape. Thus, the overall process is a two-stage segmentation method that first finds object seeds and then grows them to find the full extent of the object. This algorithm was designed and optimized for, and is in daily use for, the accurate and rapid inspection of optics from a large laser system (the National Ignition Facility (NIF), Lawrence Livermore National Laboratory, Livermore, CA), whose images include background noise, ghost reflections, differing illumination and other sources of variation.
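A compact sketch of the seed-finding and growing stages follows: the local signal is estimated with an object-sized mean filter, the noise with the mean and standard deviation of a larger neighborhood, pixels above a LASNR threshold become seeds, and seeds are grown by keeping the connected components of a looser mask that contain at least one seed. Window sizes and thresholds are illustrative assumptions, not NIF production settings.

    # Sketch: LASNR-style two-stage segmentation (find seeds, then grow them).
    import numpy as np
    from scipy import ndimage

    def lasnr_segment(img, obj_size=3, bg_size=15, snr_thresh=4.0):
        img = img.astype(float)
        local_sig = ndimage.uniform_filter(img, obj_size)          # object-scale mean
        bg_mean = ndimage.uniform_filter(img, bg_size)              # neighborhood mean
        bg_sq = ndimage.uniform_filter(img ** 2, bg_size)
        bg_std = np.sqrt(np.maximum(bg_sq - bg_mean ** 2, 1e-12))   # neighborhood std
        lasnr = (local_sig - bg_mean) / bg_std

        seeds = lasnr > snr_thresh                                   # seed pixels
        grown = lasnr > snr_thresh / 2                               # looser support mask
        labels, _ = ndimage.label(grown)
        keep = np.unique(labels[seeds])                              # components containing a seed
        return np.isin(labels, keep[keep > 0])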
Recovery of data from damaged CD/DVD
Dan E. Tamir, Wilbon Davis, Larry Wolfe, et al.
This paper presents a novel system for automatic data recovery from a damaged CD (or DVD). The system acquires a sequence of optically magnified, high-resolution digital images of partially overlapping CD-surface regions. Next, advanced image processing and pattern recognition techniques are used to extract the data encoded on the CD from the image frames. Finally, forensic data recovery techniques are applied to provide the maximum amount of usable CD data. Using the CD's error correction information, the entire data content of a non-damaged CD can be extracted with 100% accuracy. However, if an image frame covers a damaged area, then the data encoded in that frame and in some of its eight neighboring image frames may be compromised. Nevertheless, the combined effect of frame overlapping, error correction codes, forensic data recovery, and data fusion techniques can maximize the amount of data extracted from compromised frames. The paper analyzes low-level image processing techniques, compromised-frame scenarios, and data recovery results. An analytical model backed by an experimental setup shows that there is a high probability of recovering data despite certain damages. The current results should be of high interest to law enforcement and homeland-security agencies, and they merit further research to cover additional applications of image processing of CD frames, such as encryption, parallel access, and zero seek time.