- Front Matter: Volume 6696
- Video and Image Technologies
- Processing and Implementation Technologies I
- Interaction Between Image Processing, Optics, and Photonics
- Mobile Video
- IDCT
- Processing and Implementation Technologies II
- Workshop on Optics in Entertainment
- Poster Session
Front Matter: Volume 6696
This PDF file contains the front matter associated with SPIE
Proceedings Volume 6696, including the Title Page, Copyright
information, Table of Contents, and the
Conference Committee listing.
Video and Image Technologies
A comparative study of JPEG2000, AVC/H.264, and HD photo
In this paper, we report a study evaluating rate-distortion performance between JPEG 2000, AVC/H.264 High 4:4:4 Intra
and HD Photo. A set of ten high definition color images with different spatial resolutions has been used. Both the PSNR
and the perceptual MSSIM index were considered as distortion metrics. Results show that, for the material used to carry out the experiments, the overall performance, in terms of compression efficiency, is quite comparable for the three coding approaches, within an average range of ±10% in bitrate variation, with all three outperforming conventional JPEG.
Complexity modeling for context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC decoder
One way to reduce power consumption in the H.264 decoder is for the H.264 encoder to generate decoder-friendly bit streams. Following this idea, a decoding complexity model of context-based adaptive binary
arithmetic coding (CABAC) for H.264/AVC is investigated in this research. Since different coding modes will
have an impact on the number of quantized transformed coefficients (QTCs) and motion vectors (MVs) and, consequently, on the complexity of entropy decoding, an encoder equipped with a complexity model can estimate the complexity of entropy decoding and choose the coding mode that yields the best tradeoff between rate, distortion,
and decoding complexity performance. The complexity model consists of two parts: one for source data (i.e.
QTCs) and the other for header data (i.e. the macro-block (MB) type and MVs). Thus, the proposed CABAC
decoding complexity model of a MB is a function of QTCs and associated MVs, which is verified experimentally.
The proposed CABAC decoding complexity model provides good estimation results for a variety of bit streams.
Practical applications of this complexity model will also be discussed.
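A minimal sketch of how such a two-part complexity model could be evaluated per macroblock and fed into a rate-distortion-complexity mode decision; the linear form and the weights below are illustrative assumptions, not the model parameters fitted in the paper.

```python
# Hypothetical linear CABAC decoding-complexity model for one macroblock (MB).
# The weights below are illustrative placeholders; the paper fits its own
# parameters from measured decoding behavior.

def mb_decoding_complexity(num_qtc, num_mv, w_qtc=1.0, w_mv=2.5, w_hdr=10.0):
    """Estimate the CABAC decoding cost of a macroblock.

    num_qtc : number of non-zero quantized transform coefficients (source part)
    num_mv  : number of motion vectors signalled for the MB (header part)
    w_hdr   : fixed per-MB overhead (MB type, skip flags, etc.)
    """
    source_part = w_qtc * num_qtc          # cost of decoding residual data
    header_part = w_mv * num_mv + w_hdr    # cost of decoding MB header data
    return source_part + header_part


def rd_c_cost(rate, distortion, complexity, lam=0.85, mu=0.05):
    """Rate-distortion-complexity cost an encoder could minimize per mode."""
    return distortion + lam * rate + mu * complexity


if __name__ == "__main__":
    # Compare two hypothetical coding modes for the same MB.
    modes = {
        "inter_16x16": dict(rate=120, distortion=40.0, num_qtc=25, num_mv=1),
        "inter_8x8":   dict(rate=150, distortion=32.0, num_qtc=40, num_mv=4),
    }
    for name, m in modes.items():
        c = mb_decoding_complexity(m["num_qtc"], m["num_mv"])
        print(name, rd_c_cost(m["rate"], m["distortion"], c))
```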
Low-complexity MPEG-2 to H.264 transcoding
In this paper, two systems for low-complexity MPEG-2 to H.264 transcoding are presented. Both approaches reuse the
MPEG-2 motion information in order to avoid computationally expensive H.264 motion estimation. In the first approach,
inter- and intra-coded macroblocks are treated separately. Since H.264 applies intra-prediction, while MPEG-2 does not,
intra-blocks are completely decoded and re-encoded. For inter-coded macroblocks, the MPEG-2 macroblock types and
motion vectors are first converted to their H.264 equivalents. Thereafter, the quantized DCT coefficients of the
prediction residuals are dequantized and translated to equivalent H.264 IT coefficients using a single-step DCT-to-IT
transform. The H.264 quantization of the IT coefficients is steered by a rate-control algorithm enforcing a constant bit-rate.
While this system is computationally very efficient, it suffers from encoder-decoder drift due to its open-loop
structure.
The second transcoding solution eliminates encoder-decoder drift by performing full MPEG-2 decoding followed by
rate-controlled H.264 encoding using the motion information present in the MPEG-2 source material. This closed-loop
solution additionally allows dyadic resolution scaling by performing downscaling after the MPEG-2 decoding and
appropriate MPEG-2 to H.264 macroblock type and motion vector conversion.
Experimental results show that, in terms of PSNR, the closed-loop transcoder significantly outperforms the open-loop
solution. The latter introduces drift, mainly as a result of the difference in sub-pixel interpolation between H.264 and
MPEG-2. Complexity-wise, the closed-loop transcoder requires on average 30% more processing time than the open-loop
system. The closed-loop transcoder is shown to deliver compression performance comparable to standard MPEG-2
encoding.
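The DCT-to-IT step above maps each 8x8 MPEG-2 DCT residual block to four 4x4 H.264 integer-transform blocks. Below is a small NumPy sketch of the two-stage reference operation (inverse 8x8 DCT, then a forward 4x4 integer core transform per quadrant); the single-step variant mentioned in the abstract simply folds these two linear operators into one precomputed transform. The matrices are standard, but the code is only an illustration, not the transcoder's implementation.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal 8x8 DCT-II basis matrix (MPEG-2 style floating-point DCT)."""
    c = np.array([np.sqrt(1.0 / n)] + [np.sqrt(2.0 / n)] * (n - 1))
    k = np.arange(n)
    return c[:, None] * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))

# H.264 4x4 forward integer ("IT") core transform matrix.
H = np.array([[1, 1, 1, 1],
              [2, 1, -1, -2],
              [1, -1, -1, 1],
              [1, -2, 2, -1]], dtype=float)

D8 = dct_matrix(8)

def dct8_to_it4(dct_block):
    """Convert one 8x8 MPEG-2 DCT residual block into four 4x4 H.264 IT blocks.

    Two-stage reference implementation: inverse 8x8 DCT to the spatial residual,
    then a forward 4x4 integer transform on each quadrant. A single-step
    transform pre-combines these linear operators into one kernel.
    """
    spatial = D8.T @ dct_block @ D8            # inverse 2-D DCT
    out = np.empty((2, 2, 4, 4))
    for by in range(2):
        for bx in range(2):
            sub = spatial[4 * by:4 * by + 4, 4 * bx:4 * bx + 4]
            out[by, bx] = H @ sub @ H.T        # forward 4x4 integer transform
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    residual = rng.integers(-32, 32, size=(8, 8)).astype(float)
    coeffs = D8 @ residual @ D8.T              # "MPEG-2" DCT of the residual
    print(dct8_to_it4(coeffs).shape)           # (2, 2, 4, 4)
```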
PixonVision real-time video processor
PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on
the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast
enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1
field (1/60th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial
resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the
processor, the PV-300, is now in prototype form and can handle high definition video. Both the PV-200 and PV-300 are
FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors are in the DOD (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement.
If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights,
night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing,
including examples appropriate to border security, battlefield applications, port security, and surveillance from
unmanned aerial vehicles.
Performance evaluation of H.264/AVC decoding and visualization using the GPU
The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has
limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with
powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be
addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as
generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform of NVIDIA
provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In
CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model.
This paper first presents our own H.264/AVC renderer, which is capable of executing motion compensation
(MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined
with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new
CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results
compare both GPU-enabled decoders, as well as a CPU-only decoder in terms of speed, complexity, and CPU
requirements. Our measurements show that a significant speedup is possible, relative to a CPU-only solution. As an
example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based
H.264/AVC renderers.
Video error concealment with outer and inner boundary matching algorithms
Low-complexity error concealment techniques for missing macroblock (MB) recovery in mobile video delivery based on the boundary matching principle are extensively studied and evaluated in this work. We first examine the boundary
matching algorithm (BMA) and the outer boundary matching algorithm (OBMA) due to their excellent trade-off in
complexity and visual quality. Their good performance is explained, and additional experiments are given to identify
their strengths and weaknesses. Then, two more extensions of OBMA are presented. One is obtained by extending the
search pattern for performance improvement at the cost of additional complexity. The other is based on the use of
multiple overlapped outer boundary layers.
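A minimal NumPy sketch of the outer boundary matching idea for a lost 16x16 macroblock: each candidate motion vector is scored by the SAD between the one-pixel ring just outside the lost block in the current frame and the corresponding ring around the candidate block in the reference frame. The ring width, candidate set, and grayscale frames are simplifying assumptions.

```python
import numpy as np

MB = 16  # macroblock size

def outer_ring(frame, y, x, size=MB, width=1):
    """Pixels in a ring of the given width just outside a size x size block."""
    top    = frame[y - width:y,               x - width:x + size + width]
    bottom = frame[y + size:y + size + width, x - width:x + size + width]
    left   = frame[y:y + size,                x - width:x]
    right  = frame[y:y + size,                x + size:x + size + width]
    return np.concatenate([top.ravel(), bottom.ravel(), left.ravel(), right.ravel()])

def obma_conceal(cur, ref, y, x, candidates):
    """Pick the candidate MV whose reference-frame outer ring best matches the
    outer ring around the lost MB in the current frame (minimum SAD), then
    copy the corresponding reference block into the lost position."""
    target = outer_ring(cur, y, x).astype(np.int32)
    best_mv, best_sad = None, None
    for dy, dx in candidates:
        cand = outer_ring(ref, y + dy, x + dx).astype(np.int32)
        sad = np.abs(target - cand).sum()
        if best_sad is None or sad < best_sad:
            best_mv, best_sad = (dy, dx), sad
    dy, dx = best_mv
    cur[y:y + MB, x:x + MB] = ref[y + dy:y + dy + MB, x + dx:x + dx + MB]
    return best_mv

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
    cur = ref.copy()
    cur[24:40, 24:40] = 0                       # simulate a lost 16x16 MB
    mvs = [(dy, dx) for dy in range(-4, 5) for dx in range(-4, 5)]
    print(obma_conceal(cur, ref, 24, 24, mvs))  # best candidate motion vector
```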
New quality metrics for digital image resizing
Digital image rescaling by interpolation has been intensively researched over past decades, and still receives constant attention from many applications such as medical diagnosis, super-resolution, image blow-up, nano-manufacturing, etc. However, there are no agreed-upon metrics to objectively assess and compare the quality of resized images. Some existing
measures such as peak-signal-to-noise ratio (PSNR) or mean-squared error (MSE), widely used in image restoration
area, do not always coincide with the opinions of viewers. Enlarged digital images generally suffer from two major artifacts, blurring and zigzagging, and these undesirable effects, especially around edges, significantly degrade the overall perceptual image quality. We propose two new image quality metrics to measure the degree of the two major defects,
and compare several existing interpolation methods using the proposed metrics. We also evaluate the validity of image
quality metrics by comparing rank correlations.
Compressed-domain motion detection for efficient and error-resilient MPEG-2 to H.264 transcoding
In this paper, a novel compressed-domain motion detection technique, operating on MPEG-2-encoded video, is
combined with H.264 flexible macroblock ordering (FMO) to achieve efficient, error-resilient MPEG-2 to H.264
transcoding. The proposed motion detection technique first extracts the motion information from the MPEG-2-encoded
bit-stream. Starting from this information, moving regions are detected using a region growing approach. The
macroblocks in these moving regions are subsequently encoded separately from those in background regions using FMO.
This can be used to increase error resilience and/or to realize additional bit-rate savings compared to traditional
transcoding.
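A sketch of the region-growing step on a per-macroblock motion-vector field, with moving macroblocks labeled so they could be assigned to a separate FMO slice group; the magnitude threshold and minimum region size are illustrative placeholders, not the paper's parameters.

```python
import numpy as np
from collections import deque

def moving_regions(mv_field, mag_thresh=1.5, min_size=4):
    """Label connected regions of 'moving' macroblocks in an MPEG-2 MV field.

    mv_field : (H, W, 2) array of per-macroblock motion vectors (in pixels).
    Returns an (H, W) label map; 0 = background, 1..N = moving regions.
    Simple 4-connected region growing from seeds above the magnitude threshold.
    """
    mag = np.hypot(mv_field[..., 0], mv_field[..., 1])
    moving = mag > mag_thresh
    labels = np.zeros(moving.shape, dtype=int)
    next_label = 1
    for sy, sx in zip(*np.nonzero(moving)):
        if labels[sy, sx]:
            continue
        queue, members = deque([(sy, sx)]), [(sy, sx)]
        labels[sy, sx] = next_label
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < moving.shape[0] and 0 <= nx < moving.shape[1]
                        and moving[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    queue.append((ny, nx))
                    members.append((ny, nx))
        if len(members) < min_size:              # discard tiny, noisy regions
            for y, x in members:
                labels[y, x] = 0
        else:
            next_label += 1
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    field = rng.normal(0, 0.3, size=(18, 22, 2))     # mostly static background
    field[5:9, 6:12] += (4.0, 0.0)                   # a moving object
    labels = moving_regions(field)
    # Moving MBs (labels > 0) would be placed in their own FMO slice group.
    print(np.unique(labels))
```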
HD Photo: a new image coding technology for digital photography
This paper introduces the HD Photo coding technology developed by Microsoft Corporation. The storage format for this
technology is now under consideration in the ITU-T/ISO/IEC JPEG committee as a candidate for standardization under
the name JPEG XR. The technology was developed to address end-to-end digital imaging application requirements,
particularly including the needs of digital photography. HD Photo includes features such as good compression capability,
high dynamic range support, high image quality capability, lossless coding support, full-format 4:4:4 color sampling,
simple thumbnail extraction, embedded bitstream scalability of resolution and fidelity, and degradation-free compressed domain
support of key manipulations such as cropping, flipping and rotation. HD Photo has been designed to optimize
image quality and compression efficiency while also enabling low-complexity encoding and decoding implementations.
To ensure low complexity for implementations, the design features have been incorporated in a way that not only
minimizes the computational requirements of the individual components (including consideration of such aspects as
memory footprint, cache effects, and parallelization opportunities) but results in a self-consistent design that maximizes
the commonality of functional processing components.
Performance comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo
This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and
H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video
sequences. This work is an extension of our previous comparative studies published at earlier SPIE conferences [1, 2].
Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of
large-format color images captured from mainstream digital cameras and 1080p HD video sequences commonly used for
H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding
performances at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000 with the
wavelet technology tends to be the best performer with smooth spatial data; H.264/AVC High-Profile with advanced
spatial prediction modes tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most
consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance gains (around 0.2 to
1 dB in peak signal-to-noise ratio) over H.264/AVC High-Profile intra coding and Microsoft HD Photo. For the 1080p
video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 is operated in its non-scalable, optimal-performance mode).
Processing and Implementation Technologies I
An EO surveillance system for harbor security
Micro USA, Inc. has designed and built an Electro-Optical (EO) system for the detection and surveillance of submerged
objects in harbor and open ocean waters. The system consists of a digital camera and advanced image processing
software. The camera system uses a custom designed CMOS light sensor array to facilitate the detection of low contrast
underwater targets. The software system uses wavelet-based image processing to remove atmospheric and ocean
reflectance as well as other noise. Target detection is achieved through a suite of optimal channel prediction algorithms.
The novelties of the system are (a) a camera calibration routine to remove pixel non-linearities, non-uniformities, and lens effects, and (b) adaptive algorithms for low-contrast detection. Our system has been tested in San Diego harbor waters, and initial results indicate that we can detect small (2-foot diameter), low-reflectance (5%) targets underwater at depths greater than 2 diffusion depths (DD), which translates to about 25 feet in harbor waters.
Image analysis for the identification of coherent structures in plasma
Turbulence at the edge of the plasma in a nuclear fusion reactor can cause loss of confinement of the plasma. In
an effort to study the edge turbulence, the National Spherical Torus Experiment uses a gas puff imaging (GPI)
diagnostic to capture images of the turbulence. A gas puff is injected into the torus and visible light emission
from the gas cloud is captured by an ultra high-speed camera. Our goal is to detect and track coherent structures
in the GPI images to improve our understanding of plasma edge turbulence. In this paper, we present results
from various segmentation methods for the identification of the coherent structures. We consider three categories
of methods - immersion-based, region-growing, and model-based - and empirically evaluate their performance on
four sample sequences. Our preliminary results indicate that while some methods can be sensitive to the settings
of parameters, others show promise in being able to detect the coherent structures.
Real-time detection of targets in hyperspectral images using radial basis neural network filtering
A spectral target recognition technique has been developed that detects targets in hyperspectral images in real time. The
technique is based on the configuration of a radial basis neural network filter that is specific for a particular target
spectral signature or series of target spectral signatures. Detection of targets in actual 36-band CASI and 210-band
HYDICE images is compared to existing recognition techniques and results in considerable reduction in overall image
processing time and greater accuracy than existing spectral processing algorithms.
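For illustration only, a toy Gaussian radial-basis scoring of pixel spectra against target signatures; the actual radial basis neural network filter configuration in the paper is more elaborate, and the band count, sigma, and threshold here are assumptions.

```python
import numpy as np

def rbf_target_scores(cube, target_sigs, sigma=0.05):
    """Score each pixel of a hyperspectral cube against target signatures using
    Gaussian radial basis functions.

    cube        : (rows, cols, bands) reflectance cube
    target_sigs : (n_targets, bands) target spectral signatures
    Returns a (rows, cols) detection map; higher = closer to some target.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    # Squared Euclidean distance from every pixel to every target signature.
    d2 = ((pixels[:, None, :] - target_sigs[None, :, :]) ** 2).sum(axis=2)
    scores = np.exp(-d2 / (2.0 * sigma ** 2)).max(axis=1)   # best-matching RBF unit
    return scores.reshape(rows, cols)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    bands = 36                                   # e.g. a CASI-like band count
    background = rng.uniform(0.1, 0.3, size=(50, 50, bands))
    target = rng.uniform(0.4, 0.6, size=bands)
    cube = background.copy()
    cube[20:23, 30:33] = target + rng.normal(0, 0.01, size=(3, 3, bands))
    det = rbf_target_scores(cube, target[None, :], sigma=0.1)
    print((det > 0.5).sum())                     # rough count of detected pixels
```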
PixonVision real-time Deblurring Anisoplanaticism Corrector (DAC)
DigiVision, Inc. and PixonImaging LLC have teamed to develop a real-time Deblurring Anisoplanaticism Corrector
(DAC) for the Army. The DAC measures the geometric image warp caused by anisoplanaticism and removes it to
rectify and stabilize (dejitter) the incoming image. Each new geometrically corrected image field is combined into a
running-average reference image. The image averager employs a higher-order filter that uses temporal bandpass
information to help identify true motion of objects and thereby adaptively moderate the contribution of each new pixel to
the reference image. This result is then passed to a real-time PixonVision video processor (see paper 6696-04; note that the DAC also first dehazes the incoming video), where additional blur from high-order seeing effects is removed, the image
is spatially denoised, and contrast is adjusted in a spatially adaptive manner. We plan to implement the entire algorithm
within a few large modern FPGAs on a circuit board for video use. Obvious applications are within the DOD,
surveillance and intelligence, security and law enforcement communities. Prototype hardware is scheduled to be
available in late 2008. To demonstrate the capabilities of the DAC, we present a software simulation of the algorithm
applied to real atmosphere-corrupted video data collected by Sandia Labs.
ATR for 3D medical imaging
This paper presents a novel concept of Automatic Target Recognition (ATR) for 3D medical imaging. Such 3D imaging
can be obtained from X-ray Computerized Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission
Tomography (PET), Ultrasonography (USG), functional MRI, and others. In the case of CT, such 3D imaging can be
derived from 3D-mapping of X-ray linear attenuation coefficients, related to 3D Fourier transform of Radon transform,
starting from frame segmentation (or contour definition) into an object and background. Then, 3D template matching is
provided, based on inertia tensor invariants adopted from rigid body mechanics, by comparing a mammographic database with a real object of interest, such as a malignant breast tumor. The method is more general than CAD breast
mammography.
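A small sketch of the rigid-body part of the idea: computing the inertia tensor of a segmented 3D object and reducing it to translation- and rotation-invariant quantities that could be matched against a template database. The toy volume and the specific invariants reported are illustrative.

```python
import numpy as np

def inertia_tensor_invariants(volume):
    """Rotation-invariant features of a segmented 3D object from its inertia
    tensor (as in rigid-body mechanics).

    volume : 3D array; nonzero voxels belong to the object.
    Returns the sorted principal moments (eigenvalues), which are invariant to
    translation and rotation, plus the classical tensor invariants
    (trace, sum of principal 2x2 minors, determinant).
    """
    coords = np.argwhere(volume > 0).astype(float)
    weights = volume[volume > 0].astype(float)
    centroid = np.average(coords, axis=0, weights=weights)
    r = coords - centroid                                  # translation invariance
    x, y, z = r[:, 0], r[:, 1], r[:, 2]
    Ixx = np.sum(weights * (y**2 + z**2))
    Iyy = np.sum(weights * (x**2 + z**2))
    Izz = np.sum(weights * (x**2 + y**2))
    Ixy = -np.sum(weights * x * y)
    Ixz = -np.sum(weights * x * z)
    Iyz = -np.sum(weights * y * z)
    I = np.array([[Ixx, Ixy, Ixz],
                  [Ixy, Iyy, Iyz],
                  [Ixz, Iyz, Izz]])
    eigvals = np.sort(np.linalg.eigvalsh(I))               # rotation invariance
    invariants = (np.trace(I),
                  0.5 * (np.trace(I) ** 2 - np.trace(I @ I)),
                  np.linalg.det(I))
    return eigvals, invariants

if __name__ == "__main__":
    vol = np.zeros((32, 32, 32))
    vol[10:22, 12:18, 14:20] = 1.0                         # a toy "lesion"
    eigvals, invs = inertia_tensor_invariants(vol)
    # Matching would compare these invariants against a database of templates.
    print(eigvals, invs)
```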
Image enhancement methods for the visually impaired
A novel image enhancement algorithm which simplifies image content and enhances image contrast and color saturation
is proposed. The capability to improve the image discriminability by patients with central scotoma was evaluated using
computer simulation. Image enhancement and discriminability simulation were based on modeling of contrast sensitivity
loss for the low vision patient with central scotoma size of ±10 degrees of the visual field. The results are compared with
other methods of image enhancement. The simulation results suggest that the proposed method performs well compared
with other tested algorithms, showing significant increase in the average image discriminability, measured using the d'
parameter.
An efficient method of noise suppression in security systems
This paper is devoted to a denoising technique for video noise removal and deals with an advanced WT (Wavelet Transform) based method of noise suppression for security purposes. Many sources of unwanted distortion exist in real surveillance systems, especially when the sensing is done under extremely low light level conditions. Our goal was to optimize
the WT based algorithm to be applicable for the noise suppression in security videos with high computational
efficiency. Preprocessing is applied to the output of the sensing system to make the video data more suitable for further
denoising. Then a WT based statistical denoising method is applied. The method uses BLSE (Bayesian Least Square
Error Estimator) of WT coefficients while utilizing generalized Laplacian PDF modeling and optimized moment method
for parameters estimation. Several tests have been done to verify high noise suppression performance, computational
efficiency, and low distortion of important features. Experimental results show that the described method performs well for a wide range of light conditions and the corresponding signal-to-noise ratios.
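As a rough stand-in for the BLSE estimator with a generalized-Laplacian prior, the sketch below applies generic wavelet-domain soft thresholding with a universal threshold (using the PyWavelets package); it illustrates the overall WT denoising pipeline rather than the paper's statistical method.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_soft_denoise(frame, wavelet="db4", level=3):
    """Generic wavelet-shrinkage denoiser (soft thresholding, universal
    threshold). A simple stand-in for the BLSE estimator described above."""
    coeffs = pywt.wavedec2(frame.astype(float), wavelet, level=level)
    # Robust noise estimate from the finest diagonal subband.
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(frame.size))
    new_coeffs = [coeffs[0]]
    for cH, cV, cD in coeffs[1:]:
        new_coeffs.append(tuple(pywt.threshold(c, thr, mode="soft")
                                for c in (cH, cV, cD)))
    return pywt.waverec2(new_coeffs, wavelet)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    clean = np.outer(np.hanning(128), np.hanning(128)) * 255.0
    noisy = clean + rng.normal(0, 15.0, clean.shape)   # simulated low-light noise
    denoised = wavelet_soft_denoise(noisy)
    print(np.sqrt(np.mean((denoised - clean) ** 2)))   # RMSE after denoising
```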
Toward a tongue-based task triggering interface for computer interaction
A system able to detect the existence of the tongue and locate its relative position within the surface of the mouth by
using video information obtained from a web camera is proposed in this paper. The system consists of an offline phase,
prior to operation by the final user, in which a 3-layer cascade of SVM learning classifiers is trained using a database of 'tongue vs. not-tongue' images, which correspond to segmented images containing our region of interest, the
mouth with the tongue in three possible positions: center, left or right. The first training stage discerns whether the
tongue is present or not, giving the output data to the next stage, in which the presence of the tongue in the center of the
mouth is evaluated; finally, in the last stage, a left vs. right position detection is assessed. Due to the novelty of the
proposed system, a database needed to be created by using information gathered from different people of distinct ethnic
backgrounds. While the system has yet to be tested in an online stage, results obtained from the offline phase show that it
is feasible to achieve a real-time performance in the near future. Finally, diverse applications to this prototype system are
introduced, demonstrating that the tongue can be effectively used as an alternative input device by a broad range of users, including people with physical disabilities.
Pattern recognition and signal analysis in a Mach-Zehnder type phasing sensor
The primary mirror of future Extremely Large Telescopes will be composed of hundreds of individual segments.
Misalignments in piston and tip-tilt of such segments must be reduced to a small fraction of the observing
wavelength in order not to affect the image quality of these telescopes. In the framework of the Active Phasing
Experiment carried out at ESO, new phasing techniques based on the concept of pupil plane detection will
be tested. The misalignments of the segments produce amplitude variations at locations on a CCD detector
corresponding to the locations of the segment edges. The position of the segment edges on a CCD image must
first be determined with pixel accuracy in order to localize the signals which can be analyzed in a second phase
with a robust signal analysis algorithm. A method to retrieve the locations of the edges and a phasing algorithm
to measure the misalignments between the segments with an accuracy of a few nanometers have been developed.
This entire phasing procedure will be presented. The performance of the pattern recognition algorithm will
be studied as a function of the number of photons, the amplitude of the segment misalignments and their
distribution. Finally, the accuracy achieved under conditions similar to the ones met during observation will be
discussed.
Exploitation of hyperspectral imagery using adaptive resonance networks
Hyperspectral imagery consists of a large number of spectral bands that is typically modeled in a high dimensional
spectral space by exploitation algorithms. This high dimensional space usually causes no inherent problems with simple
classification methods that use Euclidean distance or spectral angle for a metric of class separability. However,
classification methods that use quadratic metrics of separability, such as Mahalanobis distance, in high dimensional
space are often unstable, and often require dimension reduction methods to be effective. Methods that use supervised
neural networks or manifold learning methods are often very slow to train. Implementations of Adaptive Resonance
Theory, such as fuzzy ARTMAP and distributed ARTMAP have been successfully applied to single band imagery,
multispectral imagery, and other various low dimensional data sets. They also appear to converge quickly during
training. This effort investigates the behavior of ARTMAP methods on high dimensional hyperspectral imagery without
resorting to dimension reduction. Realistic-sized scenes are used and the analysis is supported by ground truth
knowledge of the scenes. ARTMAP methods are compared to a back-propagation neural network, as well as simpler
Euclidean distance and spectral angle methods.
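For reference, the simple spectral-angle classification that the ARTMAP methods are compared against can be written in a few lines; the dimensions and data below are synthetic.

```python
import numpy as np

def spectral_angle(pixels, references):
    """Spectral Angle Mapper: angle (radians) between each pixel spectrum and
    each reference class spectrum; a smaller angle means a better match.

    pixels     : (n_pixels, bands)
    references : (n_classes, bands)
    """
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    cosines = np.clip(p @ r.T, -1.0, 1.0)
    return np.arccos(cosines)                      # (n_pixels, n_classes)

def classify(pixels, references):
    """Assign each pixel to the class with the smallest spectral angle."""
    return spectral_angle(pixels, references).argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    bands, classes = 200, 4                        # hyperspectral-like dimensions
    refs = rng.uniform(0.1, 0.9, size=(classes, bands))
    truth = rng.integers(0, classes, size=500)
    # Class spectra scaled by a random illumination factor plus noise.
    pixels = refs[truth] * rng.uniform(0.8, 1.2, size=(500, 1)) \
             + rng.normal(0, 0.02, size=(500, bands))
    pred = classify(pixels, refs)
    print((pred == truth).mean())                  # classification accuracy
```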
Vegetation classification using hyperspectral remote sensing and singular spectrum analysis
In this study, classification was investigated based on seasonal variation of the state parameters of vegetation canopies as
inferred from visible and near-infrared spectral bands. This analysis was carried out on data collected over agricultural
fields with the hyperspectral CHRIS (Compact High Resolution Imaging Spectrometer) in May, June and July, 2004.
The singular spectrum analysis was used to remove noise in each reflectance spectrum of the whole image. Decision tree
classification was performed on different features, such as reflectance, vegetation indices, and principal components
acquired by PCA (Principal Component Analysis) and MNF (Minimum Noise Fraction). The results demonstrated
that noise removal using SSA increased classification accuracy by 3-6 percentage points, depending on the features used.
Classification using MNF components was shown to provide the highest accuracy followed by that using vegetation
indices.
Rate adaptive live video communications over IEEE 802.11 wireless networks
Rate adaptivity is an important concept in data intensive applications such as video communications. In this paper, we
present an example rate-adaptive, live video communications system over IEEE 802.11 wireless networks, which integrates channel estimation, rate-adaptive video encoding, wireless transmission, reception, and playback. The video
stream over the wireless network conforms to the RTP payload format for H.264 video, and can be retrieved and
displayed by many popular players such as QuickTime and VLC. A live, video-friendly, packet-dispersion-based
algorithm will be used to estimate the available bandwidth, which is then used to trigger rate control to achieve rate
adaptive video coding on the fly.
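A minimal sketch of packet-dispersion bandwidth estimation and of how the estimate might trigger a bit-rate change; the probe-train format, safety margin, and bit-rate ladder are hypothetical, not the paper's algorithm parameters.

```python
def dispersion_bandwidth(arrival_times, packet_size_bytes):
    """Estimate available bandwidth (bits/s) from the dispersion of a
    back-to-back packet train: each gap between consecutive arrivals carries
    one packet of known size.

    arrival_times     : receive timestamps (seconds) of a probe train
    packet_size_bytes : size of each probe packet
    """
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return packet_size_bytes * 8 / mean_gap


def pick_encoder_bitrate(bandwidth_bps, ladder=(256e3, 512e3, 1e6, 2e6, 4e6),
                         safety=0.8):
    """Choose the highest encoder bit-rate below a safety fraction of the estimate."""
    usable = bandwidth_bps * safety
    candidates = [b for b in ladder if b <= usable]
    return candidates[-1] if candidates else ladder[0]


if __name__ == "__main__":
    # Hypothetical arrival times of a 5-packet probe train (1400-byte packets).
    arrivals = [0.0000, 0.0023, 0.0045, 0.0069, 0.0091]
    bw = dispersion_bandwidth(arrivals, 1400)
    print(round(bw / 1e6, 2), "Mbit/s ->", pick_encoder_bitrate(bw) / 1e3, "kbit/s")
```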
Interaction Between Image Processing, Optics, and Photonics
Wavelet-based denoising for 3D OCT images
Optical coherence tomography produces high resolution medical images based on spatial and temporal coherence
of the optical waves backscattered from the scanned tissue. However, the same coherence introduces
speckle noise as well; this degrades the quality of acquired images.
In this paper we propose a technique for noise reduction of 3D OCT images, where the 3D volume is
considered as a sequence of 2D images, i.e., 2D slices in depth-lateral projection plane. In the proposed
method we first perform recursive temporal filtering through the estimated motion trajectory between
the 2D slices using noise-robust motion estimation/compensation scheme previously proposed for video
denoising. The temporal filtering scheme reduces the noise level and adapts the motion compensation on
it. Subsequently, we apply a spatial filter for speckle reduction in order to remove the remainder of noise
in the 2D slices. In this scheme the spatial (2D) speckle-nature of noise in OCT is modeled and used for
spatially adaptive denoising. Both the temporal and the spatial filter are wavelet-based techniques, where
for the temporal filter two resolution scales are used and for the spatial one four resolution scales.
The evaluation of the proposed denoising approach is done on demodulated 3D OCT images from different sources and of different resolutions. Phantom OCT images were used to optimize the parameters for best denoising performance. The denoising performance of the proposed method was measured in terms of
SNR, edge sharpness preservation and contrast-to-noise ratio. A comparison was made to the state-of-the-art
methods for noise reduction in 2D OCT images, where the proposed approach was shown to be advantageous
in terms of both objective and subjective quality measures.
Improved invariant optical correlations for 3D target detection
An invariant optical correlation method for 3D target detection is addressed. Three-dimensionality is expressed in terms of
range images. The recognition model is based on a vector space representation using an orthonormal image basis. The
recognition method proposed is based on the calculation of the angle between the vector associated with a certain 3D
object and a vector subspace. Scale and rotation invariant 3D target detection are obtained using the phase Fourier
transform (PhFT) of the range images. In fact, when the 3D object is scaled, the PhFT becomes a distribution multiplied
by a constant factor. On the other hand, a rotation of the 3D object around the z axis implies a shift in the PhFT. So,
changes of scale and rotation of 3D objects are replaced by changes of intensity and position of PhFT distributions. We
applied intensity invariant correlation methods for recognition. In addition to tolerance to scale and rotation, high
discrimination against false targets is also achieved.
Multidimensional illumination and image processing techniques in the W-band for recognition of concealed objects
Active millimeter wave imaging technology is emerging, which has the potential to yield much more information when
one has control over the illumination parameters. Image processing for this kind of imagery is almost nonexistent in the literature. In this paper, we propose multidimensional illumination techniques to improve the mm-wave image quality.
Multi-angle, multi-frequency, and cross-polarization illuminations were implemented to obtain multidimensional images.
Principal Component Analysis (PCA) and clustering analysis were applied to process the results.
Object specific compressed sensing
Compressed sensing holds the promise for radically novel sensors that can perfectly reconstruct
images using considerably fewer data samples than required by the otherwise general Shannon sampling theorem. In surveillance systems, however, it is also desirable to cue regions of the image where objects of interest may exist. Thus, in this paper, we are interested in imaging interesting
objects in a scene, without necessarily seeking perfect reconstruction of the whole image. We show
that our goals are achieved by minimizing a modified L2-norm criterion with good results when the
reconstruction of only specific objects is of interest. The method yields a simple closed form
analytical solution that does not require iterative processing. Objects can be meaningfully sensed in
considerable detail while heavily compressing the scene elsewhere. Essentially, this embeds the
object detection and clutter discrimination function in the sensing and imaging process.
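One plausible reading of the closed-form criterion, for illustration: a weighted L2 (Tikhonov-style) reconstruction in which pixels inside the cued object region are penalized less than the background, solved in a single linear solve. The weighting scheme and regularization constant are assumptions, not necessarily the authors' exact formulation.

```python
import numpy as np

def weighted_l2_reconstruction(A, y, weights, lam=0.1):
    """Closed-form reconstruction minimizing
        ||A x - y||^2 + lam * ||W x||^2,
    where W = diag(weights) penalizes pixels outside the region of interest
    more heavily, so object pixels are favoured during reconstruction.
    (A hypothetical weighted-L2 criterion in the spirit of the paper.)
    """
    W2 = np.diag(weights ** 2)
    return np.linalg.solve(A.T @ A + lam * W2, A.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n, m = 64, 24                                  # 64-pixel signal, 24 measurements
    x_true = np.zeros(n)
    x_true[20:28] = rng.uniform(1.0, 2.0, 8)       # the "object" of interest
    A = rng.normal(size=(m, n)) / np.sqrt(m)       # random compressive sensing matrix
    y = A @ x_true + rng.normal(0, 0.01, m)
    weights = np.ones(n)
    weights[20:28] = 0.05                          # low penalty inside the object cue
    x_hat = weighted_l2_reconstruction(A, y, weights, lam=1.0)
    print(np.linalg.norm(x_hat[20:28] - x_true[20:28]))   # error on the object
```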
Mobile Video
Fast super-resolution reconstructions of mobile video using warped transforms and adaptive thresholding
Multimedia services for mobile phones are becoming increasingly popular thanks to capabilities brought about
by location awareness, customized programming, interactivity, and portability. With the mounting attraction of these services, there is a desire to seamlessly expand the mobile multimedia experience to stationary environments where high-resolution displays can offer significantly better viewing conditions. In this paper, we propose a
fast, high quality super-resolution algorithm that enables high resolution display of low-resolution video. The
proposed algorithm, SWAT, accomplishes sparse reconstructions using directionally warped transforms and spatially
adaptive thresholding. Comparisons are made with some existing techniques in terms of PSNR and visual
quality. Simulation examples show that SWAT significantly outperforms these techniques while staying within
a limited computational complexity envelope.
Complex function estimation using a stochastic classification/regression framework: specific applications to image superresolution
A stochastic framework combining classification with nonlinear regression is proposed. Its performance is evaluated on a patch-based image superresolution problem. Assuming a multi-variate Gaussian
mixture model for the distribution of all image content, unsupervised probabilistic clustering via expectation
maximization allows segmentation of the domain. Subsequently, for the regression component of the algorithm,
a modified support vector regression provides per class nonlinear regression while appropriately weighting the
relevancy of training points during training. Relevancy is determined by probabilistic values from clustering.
Support vector machines, an established convex optimization problem, provide the foundation for additional formulations
of learning the kernel matrix via semi-definite programming problems and quadratically constrained
quadratic programming problems.
The intensity reduction of ground shadow to deliver better viewing experiences of soccer videos
In this paper, we present a method for reducing the intensity of shadows cast on the ground in outdoor sports videos to
provide TV viewers with a better viewing experience. In the case of soccer videos taken by a long-shot camera
technique, it is difficult for viewers to discriminate the tiny objects (i.e., soccer ball and player) from the ground
shadows. The algorithm proposed in this paper comprises three modules: long-shot detection, shadow region extraction, and shadow intensity reduction. We detect the shadow region on the ground by using the relationship between
Y and U values in YUV color space and then reduce the shadow components depending on the strength of the shadows.
Experimental results show that the proposed scheme offers useful tools to provide a more comfortable viewing
environment and is amenable to real-time performance even in a software based implementation.
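An illustrative sketch of the shadow-reduction step on a YUV long-shot frame: flag ground pixels whose Y and U values fall in a shadow-like range, then boost their luma in proportion to shadow strength. The specific Y/U rule and thresholds are placeholders; the paper derives its own relationship between the two components.

```python
import numpy as np

def reduce_ground_shadow(y_plane, u_plane, y_thresh=90, u_thresh=120, gain=1.4):
    """Illustrative shadow-reduction step on a YUV long-shot frame.

    Pixels whose luma (Y) is low while chroma U stays in a range typical of
    the ground are treated as ground shadow, and their luma is boosted in
    proportion to shadow strength. Thresholds here are placeholders.
    """
    y = y_plane.astype(np.float32)
    shadow = (y < y_thresh) & (u_plane < u_thresh)          # candidate shadow mask
    strength = np.clip((y_thresh - y) / y_thresh, 0.0, 1.0) # darker => stronger
    boosted = y * (1.0 + (gain - 1.0) * strength)
    out = np.where(shadow, boosted, y)
    return np.clip(out, 0, 255).astype(np.uint8), shadow

if __name__ == "__main__":
    Y = np.full((72, 128), 150, dtype=np.uint8)
    U = np.full((72, 128), 100, dtype=np.uint8)
    Y[:, 60:] = 60                                          # right half in shadow
    out, mask = reduce_ground_shadow(Y, U)
    print(int(out[:, 60:].mean()), "mean luma after boost,",
          int(mask.mean() * 100), "% of pixels flagged")
```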
Real-time high definition H.264 video decode using the Xbox 360 GPU
The Xbox 360 is powered by three dual pipeline 3.2 GHz IBM PowerPC processors and a 500 MHz ATI graphics
processing unit. The Graphics Processing Unit (GPU) is a special-purpose device, intended to create advanced visual
effects and to render realistic scenes for the latest Xbox 360 games. In this paper, we report work on using the GPU as a
parallel processing unit to accelerate the decoding of H.264/AVC high-definition (1920x1080) video. We report our
experiences in developing a real-time, software-only high-definition video decoder for the Xbox 360.
A cross-layer adaptive handoff algorithm in wireless multimedia environments
Providing multimedia services in wireless networks raises concerns about the performance of handoff algorithms because of the irretrievable nature of real-time data delivery. To lessen unnecessary handoffs and handoff latencies, which can cause media disruption perceived by users, we present in this paper a cross-layer handoff algorithm based on link quality.
Neural networks are used to learn the cross-layer correlation between the link quality estimator such as packet success
rate and the corresponding context metric indicators, e.g. the transmitted packet length, received signal strength, and
signal to noise ratio. Based on a pre-processed learning of link quality profile, our approach makes handoff decisions
intelligently and efficiently with evaluations of link quality instead of comparisons of relative signal strength. The experimental and simulation results show that the proposed method outperforms RSS-based handoff
algorithms in a transmission scenario of VoIP applications.
Low latency adaptive streaming of HD H.264 video over 802.11 wireless networks with cross-layer feedback
Streaming video in consumer homes over wireless IEEE 802.11 networks is becoming commonplace. Wireless 802.11 networks
pose unique difficulties for streaming high definition (HD), low latency video due to their error-prone physical layer and media
access procedures which were not designed for real-time traffic. HD video streaming, even with sophisticated H.264 encoding, is
particularly challenging due to the large number of packet fragments per slice. Cross-layer design strategies have been proposed
to address the issues of video streaming over 802.11. These designs increase streaming robustness by imposing some degree of
monitoring and control over 802.11 parameters from application level, or by making the 802.11 layer media-aware. Important
contributions are made, but none of the existing approaches directly take the 802.11 queuing into account. In this paper we take
a different approach and propose a cross-layer design allowing direct, expedient control over the wireless packet queue, while
obtaining timely feedback on transmission status for each packet in a media flow. This method can be fully implemented on a
media sender with no explicit support or changes required to the media client. We assume that due to congestion or deteriorating
signal-to-noise levels, the available throughput may drop substantially for extended periods of time, and thus propose video
source adaptation methods that allow matching the bit-rate to available throughput. A particular H.264 slice encoding is presented
to enable seamless stream switching between streams at multiple bit-rates, and we explore using new computationally efficient
transcoding methods when only a high bit-rate stream is available.
Coding and optimization of a fully scalable motion model
Motion information scalability is an important requirement for a fully scalable video codec, especially for decoding
scenarios of low bit rate or small image size. So far, several scalable coding techniques on motion information
have been proposed, including progressive motion vector precision coding and motion vector field layered coding.
However, due to the interactive compatibility issue with other scalabilities, i.e. spatial, temporal, and quality,
a complete solution that integrates most of the desirable features of motion scalability is not yet seen. In order
to solve the problem, we have recently proposed a fully scalable motion model which offers full functionalities
for motion scalability with no compatibility issues. In this paper, we further investigate some coding algorithms
for the proposed scalable motion model. The purpose is to minimize the coding overhead introduced by motion scalability. Simulation results will be presented to verify the significant improvements obtained using the proposed coding and optimization algorithms.
IDCT
Standardization of IDCT approximation behavior for video compression: the history and the new MPEG-C parts 1 and 2 standards
This paper presents the history of international standardization activities and specifications relating to the precision of
inverse discrete cosine transform (IDCT) approximations used in video compression designs. The evolution of issues
relating to IDCT precision and the "drift" effects of IDCT mismatch between encoder modeling of decoder behavior is
traced, starting with the initial requirements specified for ITU-T H.261 and continuing through the MPEG-1,
H.262/MPEG-2, H.263, MPEG-4 Part 2, and H.264/MPEG-4 Part 10 AVC projects. Finally, the development of the new
MPEG-C Part 1 and Part 2 standards is presented. MPEG-C Part 1 contains a centralized repository for specification of
IDCT precision conformance requirements for the various MPEG video coding standards prior to MPEG-4 Part 10.
MPEG-C Part 2 specifies one particular IDCT approximation for adoption in industry implementations of existing
standards. The use of MPEG-C Part 2 by encoders can eliminate IDCT mismatch drift error when decoded using an
MPEG-C Part 2 conforming decoder, resulting in a deterministic decoded result. MPEG-C Part 2 also provides an
example for guiding implementers on how to design a decoder with high precision and without excessive computational
resource requirements.
From 16-bit to high-accuracy IDCT approximation: fruits of single architecture affiliation
In this paper, we demonstrate an effective unified framework for high-accuracy approximation of the irrational-coefficient floating-point IDCT by a single integer-coefficient fixed-point architecture. Our framework is based on a modified version of Loeffler's sparse DCT factorization, and the IDCT architecture is constructed
via a cascade of dyadic lifting steps and butterflies. We illustrate that simply varying the accuracy of the
approximating parameters yields a large family of standard-compliant IDCTs, from rare 16-bit approximations
catering to portable computing to ultra-high-accuracy 32-bit versions that virtually eliminate any drifting effect
when paired with the 64-bit floating-point IDCT at the encoder. Drifting performances of the proposed IDCTs
along with existing popular IDCT algorithms in H.263+, MPEG-2 and MPEG-4 are also demonstrated.
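A toy illustration of the cascade-of-dyadic-lifting-steps idea: a single rotation from the Loeffler factorization implemented as three lifting steps whose multipliers are rounded to dyadic rationals, with accuracy improving as the word length of the constants grows. The angle, word lengths, and test values are illustrative, not the proposed architecture's actual constants.

```python
import math

def dyadic(value, bits):
    """Round a real constant to a dyadic rational k / 2**bits (a shift-add multiply)."""
    return round(value * (1 << bits)) / (1 << bits)

def lifted_rotation(x, y, theta, bits):
    """Rotate (x, y) by theta using three lifting steps whose multipliers are
    dyadic rationals. The same structure, run with negated steps in reverse
    order, is its exact inverse, whatever the precision 'bits' is."""
    p = dyadic((math.cos(theta) - 1.0) / math.sin(theta), bits)
    u = dyadic(math.sin(theta), bits)
    x = x + p * y          # lifting step 1
    y = y + u * x          # lifting step 2
    x = x + p * y          # lifting step 3
    return x, y

if __name__ == "__main__":
    theta = 3 * math.pi / 16          # a rotation angle from the Loeffler factorization
    x, y = 181.0, -97.0
    exact = (x * math.cos(theta) - y * math.sin(theta),
             x * math.sin(theta) + y * math.cos(theta))
    for bits in (5, 8, 12, 16):       # coarser vs. finer dyadic approximations
        approx = lifted_rotation(x, y, theta, bits)
        err = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"{bits:2d}-bit constants: max error = {err:.6f}")
```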
Analysis and encoder prevention techniques for pathological IDCT drift accumulation in static video scenes
This paper discusses the problem of severe pathological encoder-decoder IDCT drift in video compression coding when
using a very small quantization step size and typical encoding techniques to encode video sequences with areas of
completely static source content. We suggest that there are two ways to try to address the problem: 1) using a high-accuracy IDCT (or an encoder-matched IDCT) in a decoder design, and 2) using encoder techniques to avoid such drift
build-up. The primary problem is asserted to be the encoder's behavior. Effective encoder techniques to avoid the
problem are described, including a simple "generalized Morris drift test", which is suggested as being superior to the test
currently described in the MPEG-2 video specification. Experimental results are reported to show that performing this test
in an encoder will completely solve the problem. Other encoding techniques to address the problem are also discussed.
Drift analysis for integer IDCT
This paper analyzes the drift phenomenon that occurs between video encoders and decoders that employ different
implementations of the Inverse Discrete Cosine Transform (IDCT). Our methodology utilizes MPEG-2, MPEG-4
Part 2, and H.263 encoders and decoders to measure drift occurring at low QP values for CIF resolution video
sequences. Our analysis is conducted as part of the effort to define specific implementations for the emerging ISO/IEC
23002-2 Fixed-Point 8x8 IDCT and DCT standard. Various IDCT implementations submitted as proposals for the new
standard are used to analyze drift. Each of these implementations complies with both the IEEE Standard 1180 and the
new MPEG IDCT precision specification ISO/IEC 23002-1. Reference implementations of the IDCT/DCT, and
implementations from well-known video encoders/decoders are also employed. Our results indicate that drift is
eliminated entirely only when the implementations of the IDCT in both the encoder and decoder match exactly. In this
case, the precision of the IDCT has no influence on drift. In cases where the implementations are not identical, then the
use of a highly precise IDCT in the decoder will reduce drift in the reconstructed video sequence only to the extent that
the IDCT used in the encoder is also precise.
Multiplier-less approximation of the DCT/IDCT with low complexity and high accuracy
This paper presents a straightforward multiplier-less approximation of the forward and inverse Discrete Cosine Transform
(DCT) with low complexity and high accuracy. The implementation, design methodology, complexity and performance
tradeoffs are discussed. In particular, the proposed IDCT implementations, in spite of their simplicity, comply with and reach far beyond the MPEG IDCT accuracy specification ISO/IEC 23002-1, and also reduce drift favorably compared to other existing IDCT implementations.
An accurate fixed-point 8×8 IDCT algorithm based on 2D algebraic integer representation
This paper proposes an algorithm that is based on the application of Algebraic Integer (AI) representation of numbers on
the AAN fast Inverse Discrete Cosine Transform (IDCT) algorithm. AI representation allows for maintaining an error-free
representation of IDCT until the last step of each 1-D stage of the algorithm, where a reconstruction step from the AI
domain to the fixed precision binary domain is required. This delay in introducing the rounding error prevents the
accumulation of error throughout the calculations, which leads to the reported high-accuracy results. The proposed
algorithm is simple and well suited for hardware implementation due to the absence of computationally extensive
multiplications. The obtained results confirm the high accuracy of the proposed algorithm compared to other fixed-point
implementations of IDCT.
Efficient fixed-point approximations of the 8×8 inverse discrete cosine transform
This paper describes fixed-point design methodologies and several resulting implementations of the Inverse
Discrete Cosine Transform (IDCT) contributed by the authors to MPEG's work on defining the new 8x8 fixed
point IDCT standard - ISO/IEC 23002-2. The algorithm currently specified in the Final Committee Draft (FCD)
of this standard is also described herein.
A full 2D IDCT with extreme low complexity
In the context of a Call for Proposals for integer IDCTs issued by MPEG in July 2005, a full 2D integer IDCT based on previous work by Feig and Winograd has been proposed. It achieves high precision by meeting all IEEE 1180 conditions and is suitable for implementation in hardware since it can be performed with only shifts and additions. Furthermore, it is useful in high-resolution video scenarios such as 720p/1080i/p due to its feed-forward operation mode, which avoids the loops usual in row-column implementations. The proposed transformation can be implemented without changing other
functional blocks either at the encoder or at the decoder or alternatively as a scaled version incorporating the scaling
factors into the dequantization stage. Our algorithm uses only 1328 operations for 8x8 blocks, including scaling factors.
Low complexity 1D IDCT for 16-bit parallel architectures
This paper shows that, using the one-dimensional (1-D) Loeffler, Ligtenberg, and Moschytz factorization of the 8-point IDCT [2] as a fast approximation of the Discrete Cosine Transform (DCT) and using only 16-bit numbers, it is possible to create an IEEE 1180-1990 compliant, multiplierless algorithm with low computational complexity. Owing to its structure, this algorithm is efficiently implemented on parallel high-performance architectures and, due to its low complexity, is also suitable for a wide range of other architectures. An additional constraint on this work was the requirement of compliance with the existing MPEG standards. Hardware implementation complexity and low resource usage were also part of the design criteria for this algorithm. This implementation is also compliant with the
precision requirements described in MPEG IDCT precision specification ISO/IEC 23002-1. Complexity analysis is
performed as an extension to the simple measure of shifts and adds for the multiplierless algorithm as additional
operations are included in the complexity measure to better describe the actual transform implementation complexity.
Processing and Implementation Technologies II
Regularization for designing spectral matched filter target detectors
This paper describes a new adaptive spectral matched filter that incorporates the idea of regularization (shrinkage) to
penalize and shrink the filter coefficients to a range of values. The regularization has the effect of restricting the possible
matched filters (models) to a subset which are more stable and have better performance than the non-regularized
adaptive spectral matched filters. The effect of regularization depends on the form of the regularization term and the amount of regularization is controlled by so called regularization coefficient. In this paper the sum-of-squares of the filter coefficients is used as the regularization term and several different values for the regularization coefficient are tested. A Bayesian-based derivation of the regularized matched filter is also provided. Experimental results for detecting targets in hyperspectral imagery are presented for regularized and non-regularized spectral matched filters.
A rectangular-fit classifier for synthetic aperture radar automatic target recognition
The utility of a rectangular-fit classifier for Synthetic Aperture Radar Automatic Target Recognition (SAR ATR) is examined. The target is fitted with, and modeled as, the rectangle that best approximates its boundary. The rectangular fit procedure involves 1) a preprocessing phase to remove the background clutter and noise, 2) a pose detection phase to establish the alignment of the rectangle via a least-squares straight-line fitting algorithm, and 3) a size determination phase that stretches the width and height of the rectangle in order to encapsulate a pre-specified fraction, e.g., 90%, of the points in the target. A training set composed of approximately half the total images in the MSTAR public imagery database is used to obtain and record the statistical variations in the width and height of the resulting rectangles for each potential target. The remaining half of the images is then used to assess the performance of this classifier. Preliminary results using minimum Euclidean and Mahalanobis distance classifiers show overall accuracies of 44% and 42%, respectively. Although the classification accuracy is relatively low, this technique can be successfully used in combination with other classifiers, such as peak-, edge-, corner-, and shadow-based classifiers, to enhance their performance. A unique feature of the rectangular-fit classifier is that it is rotation invariant in its present form. However, observation of the dataset reveals that in general the shapes of the targets in SAR imagery are not fully rotation invariant. Thus, the classification accuracy is expected to improve considerably using
multiple training sets, i.e., one training set generated and used for each possible pose. The
tradeoff is increased computational complexity, which tends to be offset by the ever-increasing efficiency and speed of processing hardware and software. The rectangular-fit classifier can also
be used as a pose detection routine and/or in conjunction with other ATR schemes, such as
shadow-based ATR, that require an initial pose detection phase prior to matching.
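A simplified sketch of steps 2 and 3 (pose via a least-squares line fit, size via symmetric quantiles covering 90% of the target pixels), assuming preprocessing has already produced a clean binary target mask; it is meant only to illustrate the geometry, not to reproduce the reported accuracies.

```python
import numpy as np

def rectangular_fit(mask, coverage=0.90):
    """Fit an oriented rectangle to a segmented SAR target.

    1) pose: least-squares straight-line fit through the target pixel coordinates,
    2) size: rotate pixels into the line's frame and take symmetric quantiles so
       the rectangle encapsulates the requested fraction (e.g. 90%) of pixels.
    Returns (angle_radians, width, height); width and height feed the classifier.
    """
    ys, xs = np.nonzero(mask)
    xs = xs - xs.mean()
    ys = ys - ys.mean()
    a = (xs @ ys) / (xs @ xs)               # least-squares slope through the centroid
    angle = np.arctan(a)
    c, s = np.cos(angle), np.sin(angle)
    along = c * xs + s * ys                 # coordinates along the fitted line
    across = -s * xs + c * ys               # coordinates across the fitted line
    lo, hi = (1.0 - coverage) / 2.0, 1.0 - (1.0 - coverage) / 2.0
    width = np.quantile(along, hi) - np.quantile(along, lo)
    height = np.quantile(across, hi) - np.quantile(across, lo)
    return angle, width, height

if __name__ == "__main__":
    # Toy target: a tilted 40x12 rectangle of "bright" pixels inside a 128x128 chip.
    yy, xx = np.mgrid[0:128, 0:128]
    u = (xx - 64) * np.cos(0.4) + (yy - 64) * np.sin(0.4)
    v = -(xx - 64) * np.sin(0.4) + (yy - 64) * np.cos(0.4)
    img = (np.abs(u) < 20) & (np.abs(v) < 6)
    print(rectangular_fit(img))             # angle ~0.4 rad, width ~36, height ~11
```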
Ship detection and classification from overhead imagery
This paper presents a sequence of image-processing algorithms suitable for detecting and classifying ships from nadir
panchromatic electro-optical imagery. Results are shown of techniques for overcoming the presence of background sea
clutter, sea wakes, and non-uniform illumination. Techniques are presented to measure vessel length, width, and
direction-of-motion. Mention is made of the additional value of detecting identifying features such as unique
superstructure, weaponry, fuel tanks, helicopter landing pads, cargo containers, etc.
Various shipping databases are then described as well as a discussion of how measured features can be used as search
parameters in these databases to pull out positive ship identification. These are components of a larger effort to develop a
low-cost solution for detecting the presence of ships from readily-available overhead commercial imagery and
comparing this information against various open-source ship-registry databases to categorize contacts for follow-on
analysis.
Identification of degraded fingerprints using PCA- and ICA-based features
Many algorithms have been developed for fingerprint identification. The main challenge in many of the applications
remains in the identification of degraded images in which the fingerprints are smudged or incomplete. Fingerprints from
the FVC2000 databases have been utilized in this project to develop and implement feature extraction and classification
algorithms. Besides the degraded images in the database, artificially degraded images have also been used. In this paper
we use features based on PCA (principal component analysis) and ICA (independent component analysis) to identify
fingerprints. PCA and ICA reduce the dimensionality of the input image data. PCA- and ICA-based features do not
contain redundancies in the data. Different multilayer neural network architectures have been implemented as classifiers.
The performance of different features and networks is presented in this paper.
Building verification from geometrical and photometric cues
Damage assessment, change detection or geographical database update are traditionally performed by experts looking for
objects in images, a task which is costly, time consuming and error prone. Automatic solutions for building verification
are particularly welcome but suffer from illumination and perspective changes. On the other hand, semi-automatic
procedures intend to speed up image analysis while limiting human intervention to doubtful cases. We present a semi-automatic
approach to assess the presence of buildings in airborne images from geometrical and photometric cues. For
each polygon of the vector database representing a building, a score is assigned, combining geometrical and photometric
cues. Geometrical cues relate to the proximity, parallelism and coverage of linear edge segments detected in the image
while the photometric factor measures shadow evidence based on intensity levels in the vicinity of the polygon. The human
operator interacts with this automatic scoring by setting a threshold to highlight buildings poorly assessed by image
geometrical and photometric features. After image inspection, the operator may decide to mark the polygon as changed
or to update the database, depending on the application.
Automatic identification of vehicle license plates
A new algorithm for automatic identification of vehicle license plates is proposed in this paper. The proposed algorithm uses image segmentation and morphological operations to accurately identify the location of the license plate under various background illuminations. The license plate is identified in two steps. First, the original image is segmented using edge detection and morphological operations. Then, the power spectrum (PS) is analyzed in the horizontal and vertical
directions to identify the license plate. The magnitude of the power frequency spectrum shows
special characteristics corresponding to the license plate segment. The proposed algorithm is tested
with different gray level car images from different angles of view and the results are all consistent.
The proposed algorithm is fast and can effectively identify license plates under various illumination
conditions with high accuracy.
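A small sketch of the power-spectrum cue: rows that cross the character strokes of a plate show elevated energy in a mid-frequency band of the horizontal spectrum, which can localize the plate vertically. The frequency band and threshold are illustrative placeholders, not the paper's values.

```python
import numpy as np

def plate_band_profile(gray, f_lo=0.08, f_hi=0.35):
    """Row-wise horizontal power-spectrum energy in a mid-frequency band.

    Plate characters produce strong horizontal intensity oscillations, so rows
    crossing the plate show elevated energy in this band. The band limits are
    fractions of the sampling frequency and are only illustrative.
    """
    rows = gray.astype(float) - gray.mean()
    spectrum = np.abs(np.fft.rfft(rows, axis=1)) ** 2       # per-row power spectrum
    freqs = np.fft.rfftfreq(gray.shape[1])
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spectrum[:, band].sum(axis=1)

def locate_plate_rows(gray, rel_thresh=0.5):
    """Return the row range whose band energy exceeds a fraction of the peak."""
    profile = plate_band_profile(gray)
    strong = np.nonzero(profile > rel_thresh * profile.max())[0]
    return int(strong.min()), int(strong.max())

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    img = rng.normal(120, 5, size=(120, 320))               # smooth car body
    xx = np.arange(320)
    stripes = 120 + 80 * np.sign(np.sin(2 * np.pi * xx / 12))
    img[70:90, 60:260] = stripes[60:260]                    # character-like stripes
    print(locate_plate_rows(img))                           # expect roughly (70, 89)
```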
Speckle reduction from digital holograms by simulating temporal incoherence
Speckle is an inherent characteristic of coherent imaging systems. Often, as in the case of Ultrasound, Synthetic Aperture
Radar, Laser Imaging and Holography, speckle is a source of noise and degrades the reconstructed image. Various
methods exist for the removal of speckle in such images. One method, which has received attention for the removal of
speckle from coherent imaging, is to use a temporally incoherent source. We create a novel digital signal processing
technique for the reduction of speckle from digital holograms by simulating temporal incoherence during the digital
reconstruction process. The method makes use of the discrete implementation of the Fresnel Transform, which calculates
the reconstructed image for a range of different wavelengths. These different spectral components can be weighted to
suit a temporally incoherent source and the intensities from each wavelength are added together. The method is
examined using the speckle index metric.
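A sketch of the simulated temporal incoherence idea: reconstruct the hologram with a single-FFT discrete Fresnel transform at several wavelengths spanning an assumed source bandwidth, weight the intensities by a smooth spectrum, and sum them; the decorrelated speckle patterns average down while the object structure remains. The sensor geometry, wavelengths, and weighting are assumptions.

```python
import numpy as np

def fresnel_reconstruct(hologram, wavelength, distance, pixel_pitch):
    """Single-FFT discrete Fresnel transform: reconstruct the complex field of a
    digital hologram at the given wavelength and propagation distance."""
    ny, nx = hologram.shape
    y = (np.arange(ny) - ny / 2) * pixel_pitch
    x = (np.arange(nx) - nx / 2) * pixel_pitch
    X, Y = np.meshgrid(x, y)
    chirp = np.exp(1j * np.pi * (X**2 + Y**2) / (wavelength * distance))
    return np.fft.fftshift(np.fft.fft2(hologram * chirp))

def speckle_reduced_intensity(hologram, center_wl, distance, pixel_pitch,
                              n_wl=7, bandwidth=20e-9):
    """Average the reconstructed intensities over several wavelengths spanning a
    simulated source bandwidth; speckle decorrelates between wavelengths while
    the object structure stays put, lowering the speckle contrast."""
    wls = np.linspace(center_wl - bandwidth / 2, center_wl + bandwidth / 2, n_wl)
    weights = np.hanning(n_wl + 2)[1:-1]              # a smooth source spectrum
    weights /= weights.sum()
    total = np.zeros(hologram.shape)
    for wl, w in zip(wls, weights):
        field = fresnel_reconstruct(hologram, wl, distance, pixel_pitch)
        total += w * np.abs(field) ** 2
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    # Synthetic "hologram": random-phase field recorded on a 256x256 sensor.
    holo = rng.normal(size=(256, 256)) + 1j * rng.normal(size=(256, 256))
    single = np.abs(fresnel_reconstruct(holo, 633e-9, 0.25, 10e-6)) ** 2
    averaged = speckle_reduced_intensity(holo, 633e-9, 0.25, 10e-6)
    contrast = lambda img: img.std() / img.mean()     # speckle contrast metric
    print(round(contrast(single), 3), round(contrast(averaged), 3))
```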
Workshop on Optics in Entertainment
Optical systems in entertainment
The information revolution that began in the nineteenth century has considerably raised human intelligence through the rapid development and accumulation of mankind's base of knowledge. As time passes, the volumes of knowledge and information grow in geometric progression, which requires mankind to continually improve that knowledge, which in turn improves the technologies it develops to maintain a comfortable and productive life. For a certain time, mankind subjectively separated the concepts of energy and information. Objective development, however, presents knowledge as a single whole in which everything is interdependent and interconnected. Such an understanding raises mankind to a new plane of intelligence, which generates new tasks and opens new prospects.
Performance improvements in back panel display lighting using near-Lambertian diffuse high-reflectance materials
Bryn Waldwick,
James E. Leland,
Christina Chase,
et al.
Show abstract
LCD backlighting applications require diffuse illumination over an extended area of a display unit while maintaining
high luminance levels. Since such applications involve multiple reflections within a reflective cavity, the efficiency of
the cavity can be affected significantly by relatively small changes in the reflectance of the cavity material. Materials
with diffuse rather than specular (or mirror-like) reflectance scatter light, averaging out hot spots and providing a
uniform field of illumination. Reflectors with specular components tend to propagate non-uniformities in the illuminator
system. The result is a spatial variation in brightness visible to the viewer of the display. While the undesirability of
specular materials for such applications has been widely recognized, some diffuse materials in common use exhibit a
significant specular component. This paper describes a method for measuring the specular component of such
materials, and presents a simple approach to evaluating the effect of such secondary specular behavior on the
performance of a backlight cavity. It is demonstrated that significant differences exist among available diffuse
reflectance materials, and that these differences can lead to significant differences in the performance of the displays in
which these materials are used.
Tele-counseling and social-skill trainings using JGNII optical network and a mirror-interface system
Show abstract
"Tele-presence" communication using JGNII - an exclusive optical-fiber network system - was applied to social-skills
training in the form of child-rearing support. This application focuses on internet counseling and social training skills
that require interactive verbal and non-verbal communication. The motivation for this application is supporting local
communities by constructing tele-presence education and entertainment systems using recently available, inexpensive IP
networks. This latest application of tele-presence communication uses a mirror-interface system that provides users in
remote locations with a shared quasi-space in which they can see themselves as if they were in the same room, by
overlapping the video images from the remote locations.
Examples of subjective image quality enhancement in multimedia
Show abstract
Subjective image quality is an important issue in all multimedia imaging systems, with a significant impact on QoS
(Quality of Service). For a long time the image fidelity criterion was widely applied in technical systems, especially in
television and image source compression, but optimizing subjective perceptual quality and optimizing fidelity (e.g.
minimizing the MSE) are very different goals. The paper presents experimental testing of several digital techniques for
subjective image quality enhancement - color saturation, edge enhancement, denoising operators and noise addition -
well known from digital photography and video. The evaluation covers an extensive range of operator parameterizations,
and the results are summarized and discussed. It is demonstrated that certain types of image corrections improve, to
some extent, the subjective perception of the image. The techniques were tested on five test images with significantly
different characteristics (fine details, large saturated color areas, high color contrast, easy-to-remember colors, etc.).
The experimental results indicate how to make optimized use of image-enhancing operators. Finally, the concept of
impressiveness is presented and discussed as a possible new expression of subjective quality improvement.
Poster Session
Optical resources for highly secure remote object authentication
Show abstract
We review the potential of optical techniques in security tasks and propose to combine some of them in automatic
authentication. More specifically, we propose to combine visible and near infrared imaging, optical decryption,
distortion-invariant ID tags, optoelectronic devices, coherent image processor, optical correlation, and multiple
authenticators. A variety of images and signatures, including biometric and random sequences, can be combined in an
optical ID tag for multifactor identification. Encryption of the information encoded in the ID tag increases security and
deters unauthorized usage of optical tags. The identification process encompasses several steps, such as
detection, information decoding and verification which are all detailed in this work. Design of rotation and scale
invariant ID tags is taken into account to achieve a correct authentication even if the ID tag is captured in different
positions. Resistance to some noise and degradation of the tag is analyzed. Examples and experiments are provided and
the results discussed.
Bayesian approach to the thermally generated charge elimination
Show abstract
It is generally known that every astronomical image acquired by a CCD sensor has to be corrected with a dark frame,
which maps the thermally generated charge of the CCD. It may happen that a dark frame is not available, so the
astronomical images cannot be corrected directly, and uncorrected images are not suitable for subsequent investigation.
Simple nonlinear filtering methods such as median filtering exist, but the results they produce are not satisfactory.
Recently, algorithms for eliminating the thermally generated charge have been proposed. All of these algorithms use the
Discrete Wavelet Transform (DWT), which decomposes the image into different frequency bands. The histogram of the
wavelet coefficients is modeled by a generalized Laplacian probability density function (PDF), whose parameters are
estimated by the method of moments using a derived system of equations. The images with suppressed thermally
generated charge are then estimated using Bayesian estimators. The algorithm will be further improved, but it already
represents a promising approach to thermally generated charge elimination.
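As a much-simplified stand-in for the method sketched above (the paper fits a generalized Laplacian by the method of moments and applies Bayesian estimators), a plain wavelet soft-thresholding in Python conveys the DWT-domain suppression idea; the wavelet, decomposition level and threshold are assumed.
```python
# Simplified illustration only: decompose the frame with a DWT and shrink
# the detail coefficients with a soft threshold instead of the paper's
# generalized-Laplacian Bayesian estimators.
import numpy as np
import pywt

def suppress_dark_current(image, wavelet="db4", levels=3):
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    approx, details = coeffs[0], coeffs[1:]
    # Robust noise estimate from the finest diagonal subband.
    sigma = np.median(np.abs(details[-1][2])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(image.size))
    shrunk = [tuple(pywt.threshold(d, thr, mode="soft") for d in level)
              for level in details]
    return pywt.waverec2([approx] + shrunk, wavelet)
```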
Make it easy: Automatic pictogram generation system enables everybody to design illustrations by computer-aided technology
Show abstract
We developed a prototype design support system for generating pictogram illustrations. A pictogram is a symbol
representing a concept, object, activity, place or event by illustration. A pictogram is normally produced by a designer,
who gives the illustration careful consideration and completes it after repeated trial and error. Designing pictograms is
therefore a complicated process, and it is difficult for the general public to produce a new design. This paper describes
an automatic pictogram design system that lets everybody design pictograms easily by combining basic illustrations.
Development of air touch interface for floating 3D image in the air
Show abstract
We developed a prototype virtual air-touch interface system for interaction in virtual 3D space. The spatial imaging
display system presents virtual 3D objects to the observer. These 3D images float in the air, and the user can directly
touch the objects or virtual images. To enable such interaction, an interface is needed that can recognize when the user
moves a hand near the virtual objects. A conventional touch-panel system detects the user's operation on the display
screen, where the touching point differs from the actual display space, so it is important that the user can operate in the
same space as the floating image. A typical approach is to use computer vision. In this paper, the authors propose an
interface system based on a theremin, a musical instrument with the unusual property of being controlled by the
performer's hand motions near its antennas.
Video viewing browser enables to playback movie contents reproduced by using scene scenario in real-time
Show abstract
The authors developed a prototype video viewing browser. The viewer can play back movies on the WWW according
to a playing scenario that creates new scenes from the original movies. With this scene scenario, one can arrange a
movie's video clips, insert transition effects, apply colored backgrounds, or add captions and titles. Video contents on
the WWW are copyrighted, so the browser cannot alter them in the way that conventional video editing software adds
effects to the original. Editing software produces reproductions; our browser does not. Instead, it applies effects
according to the scenario and only displays the resulting new scene. The scene scenario is written in an XML-like
script, and the browser applies effects according to the operations described in the scenario.
Pattern recognition with an adaptive generalized SDF filter
Show abstract
Most captured images present degradations due to blurring and additive noise; moreover, objects of interest can be
geometrically distorted. The classical methods for pattern recognition based on correlation are very sensitive to intensity
degradations and geometric distortions. In this work, we propose an adaptive generalized filter based on synthetic
discriminant function (SDF). With the help of computer simulation we analyze and compare the performance of the adaptive correlation filter with that of common correlation filters in terms of discrimination capability and accuracy of target location when input scenes are degraded and a target is geometrically distorted.
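For reference, a conventional SDF filter can be synthesized from a set of training images as in the Python sketch below; the adaptive variant described in the paper additionally updates the training set with information from the input scene.
```python
# Sketch of conventional SDF filter synthesis and FFT-based correlation.
import numpy as np

def sdf_filter(train_images, peaks=None):
    """train_images: list of equally sized 2-D arrays (distorted views of the
    target and, optionally, objects to reject); peaks: desired correlation
    values at the origin for each image (1 = accept, 0 = reject)."""
    X = np.stack([im.ravel() for im in train_images], axis=1)   # N x K
    u = np.ones(X.shape[1]) if peaks is None else np.asarray(peaks, float)
    h = X @ np.linalg.solve(X.conj().T @ X, u)   # h = X (X^H X)^-1 u
    return h.reshape(train_images[0].shape)

def correlate(scene, h):
    # FFT-based linear correlation used to locate the target in the scene.
    return np.real(np.fft.ifft2(np.fft.fft2(scene) *
                                np.conj(np.fft.fft2(h, s=scene.shape))))
```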
Research of the camera calibration based on digital image processing
Show abstract
The paper addresses the key problem of measurement accuracy in a precise photoelectronic measurement system. By
combining a camera calibration method based on active computer vision with digital image processing technology, a
method for calibrating the camera's internal parameters is proposed. The method combines a high-accuracy optical
theodolite with CCD subdivision measurement, and the least-squares method is used to determine the camera's internal
parameters under optimal conditions. The experimental results indicate that, compared with traditional camera
calibration methods, the proposed method is simple to operate, fast to calibrate and broadly applicable, and it handles
CCD camera distortion well.
The new methods for registration and integration of range images
Show abstract
Along with our group's improvements in range image acquisition by optical metrology, we also developed a novel method
for the registration and integration of range images. The registration approach is based on texture-feature recognition.
Texture-feature pairs in two texture images are identified by cross-correlation, and the validity-checking is implemented
through Hausdorff distance comparison. The correspondence between the texture image and range image helped acquire
the range point-pairs, and the initial transformation of two range images was computed by least-squares technique. With
this initial transformation, the fine registration was achieved by ICP algorithm. The integration of the registered range
images is based on ray casting. An axis-aligned bounding box for all range images is computed. Three bundles of
uniformly distributed rays are cast and pass through the faces of the box along three orthogonal coordinate axes
respectively. The intersections between the rays and the range images are computed and stored in Dexels. The KD-tree
structure is used to accelerate computation. Data points in the overlapping regions are identified with specific criteria
based on distance and the angle between normals. A complete, non-redundant digital model is obtained after removing
the overlapping points. The experimental results illustrate the efficiency of the method in reconstructing whole
three-dimensional objects.
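The coarse alignment step, a least-squares rigid transform computed from the matched range point-pairs, can be sketched as follows; the Kabsch/SVD solution shown is one common least-squares formulation, and ICP then refines the result.
```python
# Least-squares rigid transform (rotation R, translation t) from matched
# 3-D point pairs obtained through texture-feature matching.
import numpy as np

def rigid_transform_3d(src, dst):
    """src, dst: (N, 3) matched points.  Returns R (3x3), t (3,) minimising
    sum ||R @ src_i + t - dst_i||^2 (Kabsch / SVD solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```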
Pattern recognition with adaptive nonlinear filters
Show abstract
In this paper, adaptive nonlinear correlation-based filters for pattern recognition are presented. The filters are based on a
sum of minima correlations. To improve the recognition performance of the filters in the presence of false objects and
geometric distortions, information about the objects is used to synthesize the filters. The performance of the proposed
filters is compared to that of the linear synthetic discriminant function filters in terms of noise robustness and
discrimination capability. Computer simulation results are provided and discussed.
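A minimal Python sketch of the underlying sum-of-minima correlation is given below; the paper's adaptive filters combine several such correlations and incorporate information about the objects to be rejected.
```python
# Naive sum-of-minima nonlinear correlation between a scene and a target.
import numpy as np

def min_correlation(scene, target):
    th, tw = target.shape
    sh, sw = scene.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = scene[i:i + th, j:j + tw]
            out[i, j] = np.minimum(patch, target).sum()
    return out   # peaks where the scene locally "covers" the target
```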
Color component cross-talk pixel SNR correction method for color imagers
Show abstract
A simple multi-channel imager restoration method is presented in this paper. The method corrects channel-dependent
cross-talk of a Bayer color filter array sensor with signal-dependent additive noise. We develop separate cost
functions (weakened optimization) for each color channel-to-color channel component. Regularization is applied to each
color component, instead of the standard per color channel basis. This separation of color components allows us to
calculate regularization parameters that take advantage of the differing magnitudes of each color component cross-talk
blurring. Due to a large variation in the amount of blurring for each color component, this separation can result in an
improved trade-off between inverse filtering and noise smoothing. The restoration solution has its regularization
parameters determined by maximizing the developed local pixel SNR estimations (HVS detection constraint). Local
pixel adaptivity is applied. The total error in the corrected signal estimate (from bias error and amplified noise variance)
is used in the local pixel SNR estimates. A priori information from sensor characterization is utilized. The method is geared
towards implementation into the on-chip digital logic of low-cost CMOS sensors. Performance data of the proposed
correction method is presented using color images captured from low cost embedded imaging CMOS sensors.
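As a generic illustration of per-component regularization (not the sensor-specific method), each color-to-color cross-talk component can be given its own Tikhonov-regularized inverse filter, with the regularization weight playing the role of the locally estimated pixel SNR.
```python
# One cross-talk component deconvolved with its own regularisation weight,
# so a strongly blurred component can receive heavier smoothing than a mild one.
import numpy as np

def regularised_inverse(channel, psf, lam):
    """Tikhonov-regularised deconvolution of one cross-talk component;
    lam would be chosen from a local pixel-SNR estimate."""
    H = np.fft.fft2(psf, s=channel.shape)
    filt = np.conj(H) / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * filt))
```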
Holographic and weak-phase projection system for 3D shape reconstruction using temporal phase unwrapping
Show abstract
Two projection systems that use an LCoS phase modulator are proposed for 3D shape reconstruction. The LCoS is used
either as a holographic system or as a weak-phase projector; both configurations project a set of fringe patterns that are
processed with the technique known as temporal phase unwrapping. To minimize the influence of camera sampling and
of speckle noise in the projected fringes, a speckle noise reduction technique is applied to the speckle patterns generated
by the holographic optical system. Experiments with 3D shape reconstruction of an ophthalmic mold and other test
specimens show the viability of the proposed techniques.
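The temporal phase-unwrapping step applied to the projected fringe sequence can be sketched as below, assuming fringe frequencies that increase through the sequence.
```python
# Temporal phase unwrapping: each wrapped phase map is unwrapped using the
# prediction scaled from the previous (lower) fringe frequency.
import numpy as np

def temporal_unwrap(wrapped, freqs):
    """wrapped: list of wrapped phase maps (radians), one per fringe
    frequency; freqs: the corresponding fringe frequencies (increasing)."""
    unwrapped = wrapped[0].copy()        # lowest frequency assumed unwrapped
    for k in range(1, len(wrapped)):
        predicted = unwrapped * freqs[k] / freqs[k - 1]
        unwrapped = wrapped[k] + 2 * np.pi * np.round(
            (predicted - wrapped[k]) / (2 * np.pi))
    return unwrapped
```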
Imagery-derived modulation transfer function and its applications for underwater imaging
Show abstract
The main challenge working with underwater imagery results from both rapid decay of signals due to absorption, which
leads to poor signal to noise returns, and the blurring caused by strong scattering by the water itself and constituents
within, especially particulates. The modulation transfer function (MTF) of an optical system gives the detailed and
precise information regarding the system behavior. Underwater imagery can be better restored with knowledge of the
system MTF or of the point spread function (PSF), its Fourier-transform equivalent, extending the performance range as
well as the information retrieval of underwater electro-optical systems. This is critical in many civilian and
military applications, including target and especially mine detection, search and rescue, and diver visibility. This effort
utilizes test imagery obtained with the Laser Underwater Camera Imaging Enhancer (LUCIE) from Defense Research
and Development Canada (DRDC), during an April-May 2006 trial experiment in Panama City, Florida. Images of a
standard resolution chart with various spatial frequencies were taken underwater in a controlled optical environment at
varying distances. In-water optical properties during the experiment were measured, which included the absorption and
attenuation coefficients, particle size distribution, and volume scattering function. Resulting images were preprocessed
to enhance signal to noise ratio by averaging multiple frames, and to remove uneven illumination at target plane. The
MTF of the medium was then derived from measurements of the above imagery, subtracting the effect of the camera
system. PSFs converted from the measured MTFs were then used to restore the blurred imagery with different
deconvolution methods. The effects of polarization from source to receiver on resulting MTFs were examined and we
demonstrate that matching polarizations do enhance system transfer functions. This approach also shows promise in
deriving medium optical properties including absorption and attenuation.
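Once the medium MTF has been isolated, a Wiener-type deconvolution is one of the restoration options; a minimal Python sketch, with an assumed noise-to-signal ratio, is given below.
```python
# Wiener-type restoration of a degraded image from a measured MTF.
import numpy as np

def wiener_restore(blurred, mtf, nsr=0.01):
    """blurred: degraded image; mtf: measured medium MTF sampled on the same
    frequency grid (DC at index [0, 0]); nsr: assumed noise-to-signal ratio."""
    H = mtf.astype(complex)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * W))
```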
Local adaptive image processing in a sliding transform domain
Show abstract
A local adaptive image processing technique based on a sliding discrete transform is presented. The local restoration technique is
performed by pointwise modification of local discrete transform coefficients. To provide image processing at a high rate,
a fast recursive algorithm for computing the sliding transform is utilized. The algorithm is based on a recursive
relationship between three subsequent local spectra. Computer simulation results using real images are provided and
compared with those of common restoration techniques.
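A direct (non-recursive) illustration of the pointwise local-spectrum modification is sketched below; the paper replaces the per-window transform with the fast recursive update linking three successive local spectra.
```python
# Hard-threshold the local DCT spectrum of every sliding window and keep the
# filtered centre pixel (DC term preserved).  Direct form for clarity only.
import numpy as np
from scipy.fft import dctn, idctn

def sliding_dct_filter(image, win=8, threshold=10.0):
    pad = win // 2
    padded = np.pad(image.astype(float), pad, mode="reflect")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            block = padded[i:i + win, j:j + win]
            coeffs = dctn(block, norm="ortho")
            dc = coeffs[0, 0]
            coeffs[np.abs(coeffs) < threshold] = 0.0   # pointwise modification
            coeffs[0, 0] = dc
            out[i, j] = idctn(coeffs, norm="ortho")[pad, pad]
    return out
```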
Compressed domain statistical snake segmentation for real-time tracking of objects in airborne videos
Show abstract
We present a new compressed domain method for tracking objects in airborne videos. In the
proposed scheme, a statistical snake is used for object segmentation in I-frames, and motion
vectors extracted from P-frames are used for tracking the object detected in I-frames. It is shown
that the energy function of the statistical snake can be obtained directly from the compressed
DCT coefficients without the need for full decompression. The number of snake deformation
iterations can also be significantly reduced in the compressed domain implementation. The
computational cost is significantly reduced by using compressed domain processing, while the
performance is competitive with that of pixel domain processing. The proposed method is tested
using several UAV video sequences, and experiments show that the tracking results are
satisfactory.
Hyperspectral endmember detection based on strong lattice independence
Show abstract
Advances in imaging spectroscopy have been applied to Earth observation at different wavelengths of the
electromagnetic spectrum using aircraft or satellite systems. This technology, known as hyperspectral
remote sensing, has found many applications in agriculture, mineral exploration and environmental monitoring,
since images acquired by these devices register the constituent materials in hundreds of spectral bands. Each pixel
in the image contains the spectral information of the corresponding zone. However, processing these images can be a
difficult task because the spatial resolution of each pixel is on the order of meters, an area large enough to be composed
of different materials. The following research presents an alternative methodology to detect pixels in the image
that best represent the spectrum of one material with as little contamination of any other as possible. The
detection of these pixels, also called endmembers, represents the first step for image segmentation and is based
on morphological autoassociative memories and the property of strong lattice independence between patterns.
Morphological associative memories and strong lattice independence are concepts based on lattice algebra. Our
procedure subdivides a hyperspectral image into regions looking for sets of strong lattice independent pixels.
These patterns will be identified as endmembers and will be used for the construction of abundance maps.
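The lattice-algebra building block, the min and max autoassociative memories computed from a set of pixel spectra, can be sketched as follows; the strong-lattice-independence test that selects endmember candidates from these memories is omitted here.
```python
# Min (erosive) and max (dilative) lattice autoassociative memories.
import numpy as np

def lattice_memories(X):
    """X: (bands, n_pixels) matrix of spectra.  Returns W_XX and M_XX with
    W[i, j] = min_k (X[i, k] - X[j, k]) and M[i, j] = max_k (X[i, k] - X[j, k]).
    (A loop over k would avoid the large intermediate for many bands.)"""
    diff = X[:, None, :] - X[None, :, :]       # (bands, bands, n_pixels)
    return diff.min(axis=2), diff.max(axis=2)
```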
Comparison of different illumination arrangements on capillary image quality in nail-fold
Show abstract
The purpose of this study was to investigate which illumination arrangement can provide the highest image quality when
using a non-invasive cutaneous imaging system to observe capillaries in the nail-fold. We captured the microcirculation
images with and without a band-pass filter (center wavelength 556 ± 10 nm) in front of a 150 W halogen lamp.
Furthermore, we varied the illumination angle from 90 degrees (co-axial light) to 20 degrees to evaluate the image
quality under different light source arrangements. An image registration function was used to compensate for image
movement caused by breathing or slight movement of the volunteers or the imaging device. The contrast-to-noise ratio
(CNR) is used as an evaluation factor to quantify image quality. A dynamic search method was used to find the
skeleton of a vessel and to define the foreground and background parameters for calculating contrast. A Gaussian
smoothing filter was applied to the original images, and the noise was estimated from the difference between the
coefficients of variation (CV) of the original and processed images. As a result, using a green filter in front of the lamp
yields the highest contrast-to-noise ratio when the illumination angle is 20 degrees. Normalizing the highest CNR to 1,
the CNRs of the other illumination conditions are 0.49 (without filter, 20 degrees), 0.39 (with filter, 90 degrees) and
0.28 (without filter, 90 degrees), respectively. It is concluded that a green light source with an illumination angle of
20 degrees provides better image quality than the other arrangements.
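A minimal sketch of the CNR evaluation used above, assuming vessel and background masks obtained from the skeleton search.
```python
# Contrast-to-noise ratio from vessel (foreground) and surrounding
# (background) pixels and a separately estimated noise level.
import numpy as np

def cnr(image, vessel_mask, background_mask, noise_std):
    contrast = image[vessel_mask].mean() - image[background_mask].mean()
    return abs(contrast) / noise_std
```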
Removing foreground objects by using depth information from multi-view images
Show abstract
In this paper, we present a novel method for removing foreground objects in multi-view images. Unlike conventional
methods, which locate the foreground objects interactively, we intend to develop an automated system. The proposed
algorithm consists of two modules: 1) object detection and removal, and 2) filling of the detected foreground region.
The depth information of multi-view images is a critical cue adopted in this algorithm. By multi-view images we do not
mean a multi-camera system; we use only one digital camera and take the photos by hand. Although this may cause
poor matching results, coarse depth information is sufficient to detect and remove the foreground object. The
experimental results indicate that the proposed algorithm provides an effective tool that can be used in applications
such as digital cameras, photo-realistic scene generation, digital cinema and so on.
Still image compression using cubic spline interpolation with bit-plane compensation
Show abstract
In this paper, a modified image compression algorithm using cubic spline interpolation (CSI) and bit-plane compensation
is presented for low bit-rate transmission. The CSI is developed in order to subsample image data with minimal
distortion and to achieve image compression. It has been shown in the literature that the CSI can be combined with the
JPEG or JPEG2000 algorithm to develop a modified JPEG or JPEG2000 CODEC, which obtains a higher compression
ratio and better quality of reconstructed images than the standard JPEG and JPEG2000 CODECs in the low bit-rate
range. This paper implements the modified JPEG algorithm, applies bit-plane compensation and tests it on several
images. Experimental results show that the proposed scheme can increase the compression ratio of the original JPEG
compression system by 25-30% with similar visual quality in the low bit-rate range. The system can reduce the load on
telecommunication networks and is well suited for low bit-rate transmission.
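A rough illustration of the CSI subsample/interpolate pair (without the bit-plane compensation the paper adds) in Python; the subsampling factor is an assumed value.
```python
# Subsample before coding, interpolate back with a cubic spline after decoding.
import numpy as np
from scipy.interpolate import RectBivariateSpline

def csi_downsample(image, factor=2):
    return image[::factor, ::factor]

def csi_upsample(small, full_shape, factor=2):
    ys = np.arange(small.shape[0]) * factor
    xs = np.arange(small.shape[1]) * factor
    spline = RectBivariateSpline(ys, xs, small.astype(float), kx=3, ky=3)
    return spline(np.arange(full_shape[0]), np.arange(full_shape[1]))
```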
Blind image quality assessment considering blur, noise, and JPEG compression distortions
Show abstract
The quality of images may be severely degraded in various situations such as imaging during motion, sensing through a
diffusive medium, high compression rates and low signal-to-noise ratios. Often in such cases, the ideal undegraded image is not
available (no reference exists). This paper overviews past methods that dealt with no-reference (NR) image quality
assessment, and then proposes a new NR method for the identification of image distortions and quantification of their
impacts on image quality. The proposed method considers both noise and blur distortion types that individually or
simultaneously exist in the image. Distortion impacts on image quality are evaluated in the spatial frequency domain,
while noise power is further estimated in the spatial domain. Specific distortions addressed here include additive white
noise, Gaussian blur, de-focus blur, and JPEG compression. Estimation results are compared to the true distortion
quantities, over a set of 75 different images.
2D to 3D stereoscopic conversion: depth-map estimation in a 2D single-view image
Show abstract
With the increasing demand for 3D content, the conversion of existing two-dimensional content to three-dimensional
content has gained wide interest in 3D image processing. Estimating the relative depth map of a single-view image is
important for the 2D-to-3D conversion technique. In this paper, we propose an automatic conversion method that
estimates the depth information of a single-view image based on degree of focus of segmented regions and then
generates a stereoscopic image. Firstly, we conduct image segmentation to partition an image into homogeneous regions.
Then, we construct a higher-order statistics (HOS) map, which represents the spatial distribution of high-frequency
components of the input image. The HOS is known to be well suited for detection and classification problems
because it can suppress Gaussian noise and preserve some of non-Gaussian information. We can estimate a relative depth
map with these two cues and then refine the depth map by post-processing. Finally, a stereoscopic image is generated by
calculating the parallax values of each region using the generated depth-map and the input image.
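A simplified sketch of the focus-based depth cue, using a per-region high-frequency energy measure as a stand-in for the HOS map; the filter settings are assumed, and the refinement and parallax generation steps are not shown.
```python
# Relative depth from degree of focus of segmented regions.
import numpy as np
from scipy.ndimage import gaussian_filter

def relative_depth(image, labels):
    """image: grayscale frame; labels: integer segmentation map."""
    highpass = image.astype(float) - gaussian_filter(image.astype(float), 3)
    depth = np.zeros_like(highpass)
    for region in np.unique(labels):
        m = labels == region
        focus = np.mean(np.abs(highpass[m]) ** 4)   # fourth-order statistic
        depth[m] = focus
    return depth / (depth.max() + 1e-12)            # 1 = near (in focus)
```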
Contribution of image analysis to the definition of explosibility of fine particles resulting from waste recycling process
Show abstract
In waste recycling processes, the development of comminution technologies is one of the main actions to improve the
quality of recycled products. This involves a rise in fine particles production, which could have some effects on
explosibility properties of materials. This paper reports the results of experiments done to examine the explosibility of
the fine particles resulting from waste recycling process. Tests have been conducted for the products derived from
milling processes operated in different operative conditions. In particular, the comminution tests have been executed
varying the milling temperature by refrigerant agents. The materials utilized in explosibility tests were different
typologies of plastics coming from waste products (PET, ABS and PP), characterized by size lower than 1 mm. The
results of explosibility tests, carried out by mean of a Hartmann Apparatus, have been compared with the data derived
from image analysis procedure aimed to measure the morphological characteristics of particles. For each typology of
material, the propensity to explode appears to be correlated not only to particle size, but also to morphological properties,
linked to the operative condition of the milling process.
Watershed data aggregation for mean-shift video segmentation
Show abstract
Object segmentation is considered an important step in video analysis and has a wide range of practical
applications. In this paper we propose a novel video segmentation method, based on a combination of watershed
segmentation and mean-shift clustering. The proposed method segments video by clustering spatio-temporal data
in a six-dimensional feature space, where the features are spatio-temporal coordinates and spectral attributes.
The main novelty is an efficient data aggregation method employing watershed segmentation and local feature
averaging. The experimental results show that the proposed algorithm significantly reduces the processing time
by mean-shift algorithm and results in superior video segmentation where video objects are well defined and tracked throughout the time.
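The aggregation idea can be sketched as follows: watershed regions supply per-region averages of the six-dimensional features (spatial coordinates, time and spectral values), and mean-shift then clusters these far fewer samples; the bandwidth and feature scaling below are assumed.
```python
# Watershed over-segmentation followed by mean-shift clustering of
# per-region six-dimensional feature averages.
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed
from sklearn.cluster import MeanShift

def segment_frame(frame_gray, frame_color, t, bandwidth=0.5):
    """frame_gray: 2-D luminance; frame_color: (H, W, 3) array; t: frame index."""
    labels = watershed(sobel(frame_gray))          # over-segmentation
    feats = []
    for r in np.unique(labels):
        ys, xs = np.nonzero(labels == r)
        color = frame_color[ys, xs].mean(axis=0)   # average spectral attributes
        feats.append(np.r_[xs.mean(), ys.mean(), float(t), color])
    feats = np.asarray(feats)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    clusters = MeanShift(bandwidth=bandwidth).fit_predict(feats)
    return labels, clusters    # watershed label map + cluster id per region
```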
Image blur analysis for the subpixel-level measurement of in-plane vibration parameters of MEMS resonators
Show abstract
The objective of this work is to develop a reliable image processing technique to measure the vibration parameters
on every part of MEMS resonators using microscopic images of the vibrating devices. Images of resonators
vibrating in high frequencies are characterized by the blurs whose point spread functions (PSFs) are expressed
in a parametric form with two parameters - vibration orientation and magnitude. We find it necessary to use the
reference image (image of the still object) when analyzing the blur image, to achieve a subpixel-level accuracy.
The orientation of the vibration is identified by applying the Radon transform on the difference between the reference image and the blur image. A blur image is usually modeled as a convolution of the PSF of the vibration with the reference image and added noise terms, assuming uniform vibration across the view. The vibration magnitude could then be recovered by using a minimum mean-squared error (MMSE) estimator to find the optimal PSF with the identified orientation. However, in real images only parts of the image belong to the vibrating object and the vibration may not be uniform over all parts of it. To overcome that problem, we use local optimization with a mean of weighted squared errors (MWSE) as the cost function instead of MSE. Indeed, it is capable of suppressing non-vibrating high-frequency components of the image. Sensitivity analysis and experiments on real images have been performed.
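The orientation-identification step can be illustrated with the Radon transform as below; the selection rule used here (the projection angle with the largest variance) is an assumption made for the sketch and may differ from the paper's exact criterion.
```python
# Estimate the vibration (blur) direction from the reference/blur pair.
import numpy as np
from skimage.transform import radon

def vibration_orientation(reference, blurred):
    diff = blurred.astype(float) - reference.astype(float)
    angles = np.arange(0.0, 180.0, 0.5)
    sinogram = radon(diff, theta=angles, circle=False)
    # Assumed criterion: the projection that varies most is taken to be
    # aligned with the vibration direction.
    return angles[np.argmax(sinogram.var(axis=0))]
```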
Validation of training set approaches to hyperparameter estimation for Bayesian tomography
Show abstract
Since algorithms based on Bayesian approaches contain hyperparameters associated with the mathematical
model for the prior probability, the performance of algorithms usually depends crucially on the values of these
parameters. In this work we consider an approach to hyperparameter estimation for Bayesian methods used in
the medical imaging application of emission computed tomography (ECT). We address spline models as Gibbs
smoothing priors for our own application to ECT reconstruction. The problem of hyperparameter (or smoothing
parameter in our case) estimation can be stated as follows: Given a likelihood and prior model, and given a
realization of noisy projection data from a patient, compute some optimal estimate of the smoothing parameter.
Among the variety of approaches used to attack this problem in ECT, we base our maximum-likelihood (ML)
estimates of smoothing parameters on observed training data, and argue the motivation for this approach. To
validate our ML approach, we first perform closed-loop numerical experiments using the images created by Gibbs
sampling from the given prior probability with the smoothing parameter known. We then evaluate performance
of our method using mathematical phantoms and show that the optimal estimates yield good reconstructions.
Our initial results indicate that the hyperparameters obtained from training data perform well with regard to
a percentage-error metric.
Local bivariate Cauchy distribution for video denoising in 3D complex wavelet domain
Show abstract
In this paper, we present a new video denoising algorithm using bivariate Cauchy probability density function (pdf) with
a local scaling factor for the distribution of wavelet coefficients in each subband. The bivariate pdf takes into account
the statistical dependency among wavelet coefficients, and the local scaling factor models the empirically observed
correlation between coefficient amplitudes. Using a maximum a posteriori (MAP) estimator and a minimum
mean-squared-error (MMSE) estimator, we describe two methods for video denoising which rely on bivariate Cauchy
random variables with high local correlation. Because separable 3-D transforms, such as the ordinary 3-D wavelet
transform (DWT), have artifacts that degrade their denoising performance, we implement our algorithms in the 3-D
complex wavelet transform (DCWT) domain. In addition, we apply our denoising algorithm in the 2-D DCWT domain,
where the 2-D transform is applied to each frame individually. The simulation results show that our denoising algorithms achieve better
performance than several published methods both visually and in terms of peak signal-to-noise ratio (PSNR).
Local area signal-to-noise ratio (LASNR) algorithm for image segmentation
Show abstract
Many automated image-based applications need to find small spots in a variably noisy image. For
humans, it is relatively easy to distinguish objects from local surroundings no matter what else may be in
the image. We attempt to capture this distinguishing capability computationally by calculating a
measurement that estimates the strength of signal within an object versus the noise in its local
neighborhood. First, we hypothesize various sizes for the object and corresponding background areas.
Then, we compute the Local Area Signal to Noise Ratio (LASNR) at every pixel in the image, resulting in
a new image with LASNR values for each pixel. All pixels exceeding a pre-selected LASNR value
become seed pixels, or initiation points, and are grown to include the full area extent of the object. Since
growing the seed is a separate operation from finding the seed, each object can be any size and shape. Thus,
the overall process is a 2-stage segmentation method that first finds object seeds and then grows them to
find the full extent of the object.
This algorithm was designed, optimized and is in daily use for the accurate and rapid inspection of optics
from a large laser system (National Ignition Facility (NIF), Lawrence Livermore National Laboratory,
Livermore, CA), which includes images with background noise, ghost reflections, different illumination
and other sources of variation.
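A sketch of the two-stage idea under assumed window sizes and thresholds: build a local-area SNR map, threshold it for seeds, and grow each seed over a lower-threshold support region.
```python
# LASNR-style seed detection followed by region growing; the growing step
# here (connected support regions that contain a seed) is a simple stand-in.
import numpy as np
from scipy import ndimage

def lasnr_segment(image, obj_size=5, bg_size=25, seed_thresh=3.0, grow_thresh=1.0):
    img = image.astype(float)
    obj_mean = ndimage.uniform_filter(img, obj_size)
    bg_mean = ndimage.uniform_filter(img, bg_size)
    bg_sq = ndimage.uniform_filter(img ** 2, bg_size)
    bg_std = np.sqrt(np.maximum(bg_sq - bg_mean ** 2, 1e-12))
    lasnr = (obj_mean - bg_mean) / bg_std          # signal vs. local noise
    seeds = lasnr > seed_thresh
    support = lasnr > grow_thresh
    labels, _ = ndimage.label(support)             # connected support regions
    keep = np.unique(labels[seeds])                # regions containing a seed
    return np.isin(labels, keep[keep > 0]), lasnr
```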
Recovery of data from damaged CD/DVD
Show abstract
This paper presents a novel system for automatic data recovery from damaged CD (or DVD). The system acquires a
sequence of optically magnified high resolution digital images of partially overlapping CD-surface regions. Next,
advanced image processing and pattern recognition techniques are used to extract the data encoded on the CD from
image frames. Finally, forensic data recovery techniques are applied to provide the maximal usable CD data.
Using the CD's error correction information, the entire data of a non-damaged CD can be extracted with 100% accuracy.
However, if an image frame covers a damaged area, then the data encoded in the frame and some of its eight neighbor
image frames may be compromised. Nevertheless, the combined effect of frame overlapping, error correction codes,
forensic data recovery, and data fusion techniques can maximize the amount of data extracted from compromised frames.
The paper analyzes low-level image processing techniques, compromised-frame scenarios, and data recovery results. An
analytical model backed by an experimental setup shows that there is a high probability of recovering data despite
certain types of damage. The current results should be of high interest to law enforcement and homeland security
agencies. They merit further research and investigation to cover additional applications of image processing of CD
frames, such as encryption, parallel access, and zero seek time.