- Front Matter: Volume 7798
- Image Signal Processing I
- Image Signal Processing II
- Image Signal Processing III
- Digital Cultural Heritage
- Visual Search I
- Visual Search II
- Compression and Transforms for Images and Video I
- Compression and Transforms for Images and Video II
- Computational Imaging I: Joint Session with Conference 7800
- Computational Imaging II: Joint Session with Conference 7800
- Perceptual Coding of Still and Motion Images I
- Perceptual Coding of Still and Motion Images II
- Mobile Video: Processing, Communications, Display, and Applications I
- Mobile Video: Processing, Communications, Display, and Applications II
- Optics, Photonics and Digital Image Processing
- Poster Session
Front Matter: Volume 7798
Front Matter: Volume 7798
This pdf file contains the front matter associated with SPIE Proceedings Volume 7798, including Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Image Signal Processing I
Multi-scale edge detection with local noise estimate
The (unrealistic) assumption that noise can be modeled as independent, additive, and uniform can lead to
problems when edge detection methods are applied to real or natural images. The main reason is that the
filter scale and gradient threshold are difficult to determine at a regional or local scale when the noise
estimate is global. A filter with one global scale might under-smooth areas of high noise but over-smooth
less noisy areas. Similarly, a static, global threshold may not be appropriate for the entire image because
different regions contain different degrees of detail. Thus, some methods use more than one filter for
detecting edges and discard thresholding for edge discrimination. A multi-scale description of the image
mimics the receptive fields of neurons in the early visual cortex of animals: at small scales, details can
be reliably detected, while at larger scales the contours or overall frame receive more attention, so image
features can be fully represented by combining a range of scales. The proposed multi-scale edge detection
algorithm utilizes this hierarchical organization to detect and localize edges. Furthermore, instead of one
default global threshold, a local dynamic threshold is introduced to discriminate edges from non-edges. Based
on a critical value function, the local dynamic threshold for each scale is determined using a novel local
noise estimation (LNE) method. Additionally, the proposed algorithm performs connectivity analysis on the
edge map to remove small, disconnected edges. Experiments in which this method is applied to a sequence of
images of the same scene with different signal-to-noise ratios (SNR) show the method to be robust to noise.
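The abstract does not specify the LNE method itself; as an illustrative sketch of the general idea, a blockwise noise level can be estimated from the median absolute deviation (MAD) of horizontal first differences, which is robust to image structure. The function name and block size below are hypothetical, not the authors':

```python
import numpy as np

def local_noise_sigma(img, block=16):
    """Blockwise noise estimate from horizontal first differences.
    For Gaussian noise, sigma ~= MAD / 0.6745; the difference of two
    noisy pixels carries twice the noise variance, hence the sqrt(2)."""
    d = np.diff(img.astype(float), axis=1)
    rows, cols = d.shape[0] // block, d.shape[1] // block
    sig = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            blk = d[i * block:(i + 1) * block, j * block:(j + 1) * block]
            sig[i, j] = np.median(np.abs(blk - np.median(blk))) / 0.6745
    return sig / np.sqrt(2)

rng = np.random.default_rng(1)
noisy = 100 + rng.normal(0, 5, (64, 64))   # flat image plus sigma=5 noise
print(local_noise_sigma(noisy).mean())      # roughly 5
```

A per-block gradient threshold can then be set to a multiple of the local sigma at each scale.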
Correlation filter design using a single cluttered training image for detecting a noisy target in a nonoverlapping scene
Classical correlation filters for object detection and location estimation are designed under the assumption that
the shape and intensity values of the object of interest are explicitly known. In this work we assume that
the target is given at unknown coordinates in a reference image with a cluttered background corrupted by
additive noise. We consider the nonoverlapping signal model for both the reference image and the input scene.
Optimal correlation filters, with respect to signal-to-noise ratio and peak-to-output energy, for object detection
and location estimation are derived. Estimation techniques are proposed for the parameters required for filter
design. Computer simulation results obtained with the proposed filters are presented and compared with those
of common correlation filters.
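The optimal filters derived in the paper depend on the nonoverlapping signal model and estimated parameters; as a simpler baseline illustration of correlation-based detection, a classical matched filter locates a known template via FFT cross-correlation. The function and test values below are hypothetical:

```python
import numpy as np

def matched_filter_detect(scene, template):
    """Locate a known template in a scene by FFT-based cross-correlation.
    The filter is the conjugate of the template's spectrum, which maximizes
    SNR under additive white noise (a stand-in for the paper's filters)."""
    H = np.conj(np.fft.fft2(template, s=scene.shape))   # zero-padded template
    corr = np.real(np.fft.ifft2(np.fft.fft2(scene) * H))
    return np.unravel_index(np.argmax(corr), corr.shape)

# Embed a small target at a known offset in a cluttered scene and recover it.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
scene = 0.1 * rng.random((64, 64))
scene[20:28, 30:38] += target
print(matched_filter_detect(scene, target))  # expected at (20, 30)
```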
Denoising point clouds using pulling-back method
We propose a method for denoising a point cloud by pulling every noisy point back to its presumed position. In R2,
suppose the points all lie on a presumed curve, except that some points are displaced from the curve by noise.
For every point, the presumed curve is approximated within a small neighborhood by an osculating circle, and the
point is pulled to the circle, i.e., its new position is its projection onto the circle. In R3, the 2-D osculating
circle is replaced by the Dupin indicatrix. The Dupin indicatrix is attached to the noisy point and therefore moves
with it as the point moves along its normal direction. Along the normal direction, the distance of each of the
k nearest points to its projection on the Dupin indicatrix is computed. The noisy point's new position is the
place where the sum of squared distances of all k nearest points reaches a minimum. The method is examined on
point cloud data and the results are found satisfactory.
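In the planar (R2) case, the pull-back step can be sketched as fitting a least-squares circle to a point's neighborhood and projecting the point onto it. The Kasa algebraic fit below is one common choice, not necessarily the authors' construction of the osculating circle:

```python
import numpy as np

def fit_circle(pts):
    """Least-squares (Kasa) circle fit: (x-a)^2 + (y-b)^2 = r^2 is linear
    in (a, b, c) with c = r^2 - a^2 - b^2, so solve 2ax + 2by + c = x^2+y^2."""
    A = np.column_stack([2 * pts[:, 0], 2 * pts[:, 1], np.ones(len(pts))])
    rhs = (pts ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.array([a, b]), np.sqrt(c + a ** 2 + b ** 2)

def pull_back(point, neighbors):
    """Pull a noisy 2-D point onto the circle fit to its neighborhood."""
    center, r = fit_circle(neighbors)
    d = point - center
    return center + r * d / np.linalg.norm(d)   # radial projection

theta = np.linspace(0.0, np.pi / 2, 8)
neighbors = np.column_stack([np.cos(theta), np.sin(theta)])  # arc of unit circle
print(pull_back(np.array([1.2, 0.1]), neighbors))  # lands on the unit circle
```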
Image Signal Processing II
Image restoration based on multiple PSF information with applications to phase-coded imaging system
Conventional image restoration techniques generally use one point-spread function (PSF), corresponding to a single
object distance (OD) and viewing angle (VA), in filter design. However, for imaging systems that require a balanced
restoration, or a new tradeoff, across a range of ODs or VAs, the conventional design might be insufficient to give
satisfactory results. In this paper, an extension of the minimum mean square error (MMSE) method is proposed. The
proposed method defines a cost function as a linear combination of multiple mean square errors (MSEs). Each MSE
measures the restoration performance at a specific OD and VA and can be computed from the restored image and its
corresponding target image. Since the MSEs for different ODs are lumped into one cost function, the resulting filter
provides a better balance in restoration compared with the conventional design. The method is applied to an
extended depth-of-field (EDoF) imaging system, and computer simulations are performed to verify its effectiveness.
Motion-compensated compressed sensing for dynamic imaging
The recently introduced Compressed Sensing (CS) theory explains how sparse or compressible signals can be
reconstructed from far fewer samples than was previously believed possible. The CS theory has attracted
significant attention for applications such as Magnetic Resonance Imaging (MRI) where long acquisition times have
been problematic. This is especially true for dynamic MRI applications where high spatio-temporal resolution is needed.
For example, in cardiac cine MRI, it is desirable to acquire the whole cardiac volume within a single breath-hold in order
to avoid artifacts due to respiratory motion. Conventional MRI techniques do not allow reconstruction of high resolution
image sequences from such a limited amount of data.
Vaswani et al. recently proposed an extension of the CS framework to problems with partially known support (i.e.
sparsity pattern). In their work, the problem of recursive reconstruction of time sequences of sparse signals was
considered. Under the assumption that the support of the signal changes slowly over time, they proposed using the
support of the previous frame as the "known" part of the support for the current frame. While this approach works well
for image sequences with little or no motion, motion causes significant change in support between adjacent frames. In
this paper, we illustrate how motion estimation and compensation techniques can be used to reconstruct more accurate
estimates of support for image sequences with substantial motion (such as cardiac MRI). Experimental results using
phantoms as well as real MRI data sets illustrate the improved performance of the proposed technique.
Using enhancement data to deinterlace 1080i HDTV
When interlaced scan (IS) is used for television transmission, the received video must be deinterlaced to be displayed on
progressive scan (PS) displays. To achieve good performance, the deinterlacing operation is typically computationally
expensive. We propose a receiver-compatible approach that performs the deinterlacing operation inexpensively, with
good performance. At the transmitter, the system analyzes the video and transmits an additional low bit-rate stream.
Existing receivers ignore this information. New receivers utilize this stream and perform a deinterlacing operation
inexpensively with good performance. Results indicate that this approach can improve the digital television standard in a
receiver-compatible manner.
Image Signal Processing III
Multispectral MRI-based virtual cystoscopy
Bladder cancer is the fifth leading cause of cancer deaths in the United States. Virtual cystoscopy (VC) can serve
as a screening means for early detection of the cancer using non-invasive imaging and computer graphics technologies.
Previous research has mainly focused on spiral CT (computed tomography), which invasively introduces air into the
bladder lumen through a small catheter to provide contrast against the bladder wall. However, the tissue contrast
around the bladder wall is still limited in CT-based VC. In addition, the CT-based technique involves additional
radiation. We have investigated a procedure to achieve the screening task with MRI (magnetic resonance imaging).
It utilizes two unique features of MRI: (1) urine has distinct T1 and T2 relaxation times compared to its
surrounding tissues, and (2) MRI has the potential to obtain good tissue contrast around the bladder wall. The
procedure is fully non-invasive and easy to implement. In this paper, we propose an MRI-based VC system for
computer-aided detection (CAD) of bladder tumors. The proposed VC system integrates partial volume-based
segmentation incorporating texture information with fast marching-based CAD employing geometrical features for
detecting bladder tumors. The accuracy and efficiency of the integrated VC system are evaluated by testing the
diagnoses against a database of patients.
Computational architecture for image processing on a small unmanned ground vehicle
Man-portable Unmanned Ground Vehicles (UGVs) have been fielded on the battlefield with limited computing power.
This limitation constrains their use primarily to teleoperation control mode for clearing areas and bomb defusing. In
order to extend their capability to include the reconnaissance and surveillance missions of dismounted soldiers, a
separate processing payload is desired. This paper presents a processing architecture and the design details of the
payload module that enables the PackBot to perform sophisticated, real-time image processing algorithms using data
collected from its onboard imaging sensors including LADAR, IMU, visible, IR, stereo, and the Ladybug spherical
cameras. The entire payload is constructed from currently available Commercial off-the-shelf (COTS) components
including an Intel multi-core CPU and a Nvidia GPU. The result of this work enables a small UGV to perform
computationally expensive image processing tasks that once were only feasible on a large workstation.
Automatic activity estimation based on object behaviour signature
Automatic estimation of human activities is a widely studied topic. However, the process becomes difficult when
activities must be estimated from a video stream, because human activities are dynamic and complex. Furthermore,
the sheer amount of information that images provide makes modelling and estimating activities hard work. In this
paper we propose a method for activity estimation based on object behaviour. Objects are located in a delimited
observation area and their handling is recorded with a video camera. Activity estimation can then be done
automatically by analyzing the video sequences. The proposed method is called "signature recognition" because it
considers a space-time signature of the behaviour of objects that are used in particular activities (e.g. patients'
care in a healthcare environment for elderly people with restricted mobility). A pulse is produced when an object
appears in or disappears from the observation area, i.e., a change from zero to one or vice versa. These changes
are produced by identifying the objects with a bank of nonlinear correlation filters. Each object is processed
independently and produces its own pulses; hence we are able to recognize several objects with different patterns
at the same time. The method is applied to estimate three healthcare-related activities of elderly people with
restricted mobility.
Digital Cultural Heritage
Signal processing and analyzing works of art
In examining paintings, art historians use a wide variety of physico-chemical methods to determine, for example, the paints,
the ground (canvas primer) and any underdrawing the artist used. However, the art world has been little touched by signal
processing algorithms. Our work develops algorithms to examine x-ray images of paintings, not to analyze the artist's
brushstrokes but to characterize the weave of the canvas that supports the painting. The physics of radiography indicates
that linear processing of the x-rays is most appropriate. Our spectral analysis algorithms have an accuracy superior to
human spot-measurements and have the advantage that, through "short-space" Fourier analysis, they can be readily applied
to entire x-rays. We have found that variations in the manufacturing process create a unique pattern of horizontal and
vertical thread density variations in the bolts of canvas produced. In addition, we measure the thread angles, providing
a way to determine the presence of cusping and to infer the location of the tacks used to stretch the canvas on a frame
during the priming process. We have developed weave matching software that employs a new correlation measure to find
paintings that share canvas weave characteristics. Using a corpus of over 290 paintings attributed to Vincent van Gogh, we
have found several weave match cliques that we believe will refine the art historical record and provide more insight into
the artist's creative processes.
Texton-based analysis of paintings
Laurens J. P. van der Maaten,
Eric O. Postma
The visual examination of paintings is traditionally performed by skilled art historians using their eyes. Recent advances in intelligent systems may support art historians in determining the authenticity or date of creation of paintings. In this paper, we propose a technique for the examination of brushstroke structure that views the wildly overlapping brushstrokes as texture. The analysis of the painting texture is performed with the help of a texton codebook, i.e., a codebook of small prototypical textural patches. The texton codebook can be learned from a collection of paintings. Our textural analysis technique represents paintings in terms of histograms that measure the frequency by which the textons in the codebook occur in the painting (so-called texton histograms). We present experiments that show the validity and effectiveness of our technique for textural analysis on a collection of digitized high-resolution reproductions of paintings by Van Gogh and his contemporaries.
As texton histograms cannot easily be interpreted by art experts, the paper proposes two approaches to visualize the results of the textural analysis. The first approach visualizes the similarities between the histogram representations of paintings by employing a recently proposed dimensionality reduction technique called t-SNE. We show that t-SNE reveals a clear separation between paintings created by Van Gogh and those created by other painters. In addition, the period of creation is faithfully reflected in the t-SNE visualizations. The second approach visualizes the similarities and differences between paintings by highlighting regions in a painting in which the textural structure is unusual. We illustrate the validity of this approach by means of an experiment in which we highlight regions in a painting by Monet that are not very "Van Gogh-like". Taken together, we believe the tools developed in this study are well suited to assist art historians in their study of paintings.
Multispectral imaging for digital painting analysis: a Gauguin case study
This paper is an introduction to the analysis of multispectral recordings of paintings. First, we give an
overview of the advantages of multispectral image analysis over more traditional techniques: the bands
residing in the visible domain provide an accurate measurement of the color information, which can be used for
analysis but also for conservation and archival purposes (i.e. preserving the artistic patrimony by building a
digital library); moreover, inspection of the multispectral imagery by art experts and art conservators has shown
that combining the information present in the spectral bands residing inside and outside the visible domain can
lead to a richer analysis of paintings. In the remainder of the paper, practical applications of multispectral
analysis are
demonstrated, where we consider the acquisition of thirteen different, high resolution spectral bands. Nine of
these reside in the visible domain, one in the near ultraviolet and three in the infrared. The paper will illustrate
the promising future of multispectral analysis as a non-invasive tool for acquiring data which cannot be acquired
by visual inspection alone and which is highly relevant to art preservation, authentication and restoration. The
demonstrated applications include detection of restored areas and detection of aging cracks.
Attenuating hue identification and color estimation for underpainting reconstruction from x-ray synchrotron imaging data
Anila Anitha,
Shannon M. Hughes
This paper discusses two new developments in methods for virtually reconstructing paintings that have been
painted over, from X-ray synchrotron imaging data of their canvases. First, X-ray synchrotron data often contain
areas of information loss, in which signal from underlayers was unable to penetrate particularly thick or X-ray-
absorbent surface features. We present a new method for automatically identifying these areas so that they may
be inpainted. Second, we present preliminary results in which we reconstruct the colors of the underpainting
directly from the X-ray synchrotron imaging data. This is, to our knowledge, the first attempt at accurate color
reconstruction from this type of data.
Visual Search I
Keypoint clustering for robust image matching
A number of popular image matching algorithms such as Scale Invariant Feature Transform (SIFT)1 are based on local
image features. They first detect interest points (or keypoints) across an image and then compute descriptors based on
patches around them. In this paper, we observe that in textured or feature-rich images, keypoints typically appear in
clusters following patterns in the underlying structure. We show that this clustering phenomenon can be used to:
1) enhance recall and precision performance of the descriptor matching process, and 2) improve convergence rate of the
RANSAC algorithm used in the geometric verification stage.
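The convergence benefit for geometric verification follows from the standard RANSAC trial-count formula: the number of iterations needed to draw at least one all-inlier sample with probability p is N = log(1-p)/log(1-w^s), for inlier ratio w and minimal sample size s. A quick check (the ratios below are illustrative, not the paper's measurements) shows why anything that raises the effective inlier ratio speeds convergence sharply:

```python
import math

def ransac_trials(inlier_ratio, sample_size, confidence=0.99):
    """Smallest N with 1 - (1 - w**s)**N >= confidence."""
    return math.ceil(math.log(1 - confidence)
                     / math.log(1 - inlier_ratio ** sample_size))

print(ransac_trials(0.3, 4))  # low inlier ratio: 567 trials needed
print(ransac_trials(0.6, 4))  # doubled inlier ratio: only 34 trials
```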
Fast quantization and matching of histogram-based image features
We review the construction of a Compressed Histogram of Gradients (CHoG) image feature descriptor and study
the quantization problem that arises in its design. We explain our choice of algorithms for solving it, addressing
both complexity and performance aspects. We also study the design of algorithms for decoding and matching
compressed descriptors, and offer several techniques for speeding up these operations.
Permutable descriptors for orientation-invariant image matching
Orientation-invariant feature descriptors are widely used for image matching. We propose a new method of
computing and comparing Histogram of Gradients (HoG) descriptors which allows for re-orientation through
permutation. We do so by moving the orientation processing into the distance comparison rather than the
descriptor computation. This improves upon prior work by increasing spatial distinctiveness. Our method
allows for very fast descriptor computation, which is advantageous since many mobile applications of
HoG descriptors require fast descriptor computation on hand-held devices.
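The idea of handling orientation at comparison time can be sketched as follows: rather than re-orienting the patch before computing its histogram, compare two orientation histograms under all cyclic bin permutations and keep the minimum distance. This toy single-cell histogram is an illustration of the principle, not the paper's full descriptor:

```python
import numpy as np

def permuted_distance(h1, h2):
    """Rotation-invariant histogram distance: a global rotation of the patch
    cyclically permutes the orientation bins, so take the minimum L2 distance
    over all cyclic shifts of h2."""
    return min(np.linalg.norm(h1 - np.roll(h2, k)) for k in range(len(h2)))

h = np.array([5., 1., 0., 0., 2., 0., 0., 1.])  # 8 orientation bins
rotated = np.roll(h, 3)                          # same patch, rotated 3 bins
print(permuted_distance(h, rotated))             # 0.0: matched by permutation
```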
Object tracking in real environments
Modern tracking methods typically rely on features to track objects and function best with objects containing
distinguishable features. Previously we proposed a graph cuts approach that utilizes intensity changes and the
likelihood that the RGB intensities associated with a pixel belong to the object. We now propose a new method
that models the RGB tuple as a single random variable. This allows for more robust segmentation, but requires
more data to construct the color model. The results show the method's ability to track in a variety of
environments and with a large variety of objects.
Visual Search II
Three-dimensional target modeling with synthetic aperture radar
Conventional Synthetic Aperture Radar (SAR) offers high-resolution imaging of a target region in the range and
cross-range dimensions along the ground plane. Little or no data is available in the range-altitude dimension,
however, and target functions and models are limited to two-dimensional images. This paper first investigates some
existing methods for the computation of target reflectivity data in the deficient elevation domain, and a new method is
then proposed for three-dimensional (3-D) SAR target feature extraction.
Simulations are implemented to test the decoupled least-squares technique for high-resolution spectral estimation of
target reflectivity, and the accuracy of the technique is assessed. The technique is shown to be sufficiently accurate at
resolving targets in the third axis, but is limited in practicality due to restrictive requirements on the input data.
An attempt is then made to overcome some of the practical limitations inherent in the current 3-D SAR methods by
proposing a new technique based on the direct extraction of 3-D target features from arbitrary SAR image inputs. The
radar shadow present in SAR images of MSTAR vehicle targets is extracted and used in conjunction with the radar
beam depression angle to compute physical target heights along the range axis. Multiple inputs of elevation data are
then merged to forge rough 3-D target models.
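The shadow-based height computation described above reduces to simple trigonometry: a radar shadow of ground-range length L cast at beam depression angle α implies a scatterer of height h = L·tan(α). A minimal sketch with illustrative values:

```python
import math

def height_from_shadow(shadow_len_m, depression_deg):
    """Target height from radar-shadow length and beam depression angle:
    h = L * tan(alpha)."""
    return shadow_len_m * math.tan(math.radians(depression_deg))

print(height_from_shadow(10.0, 15.0))  # ~2.68 m for a 10 m shadow at 15 deg
```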
A Bayesian network-based approach for identifying regions of interest utilizing global image features
An image-understanding algorithm for identifying Regions-of-Interest (ROI) in digital images is proposed. Global and
regional features that characterize relations between image segments are fused in a probabilistic framework to generate
ROI for an arbitrary image. Features are introduced as maps for spatial position, weighted similarity, and weighted
homogeneity for image regions. The proposed methodology includes modules for image segmentation, feature
extraction, and probabilistic reasoning. It differs from prior art by using machine learning techniques to discover the
optimum Bayesian Network structure and probabilistic inference. It also eliminates the necessity for semantic
understanding at intermediate stages. Experimental results show competitive performance in comparison with
state-of-the-art techniques, with an accuracy rate of ~80% on a set of ~20,000 publicly available color images.
Applications of
the proposed algorithm include content-based image retrieval, image indexing, automatic image annotation, mobile
phone imagery, and digital photo cropping.
Low-cost asset tracking using location-aware camera phones
Maintaining an accurate and up-to-date inventory of one's assets is a labor-intensive, tedious, and costly operation.
To ease this difficult but important task, we design and implement a mobile asset tracking system for
automatically generating an inventory by snapping photos of the assets with a smartphone. Since smartphones
are becoming ubiquitous, construction and deployment of our inventory management solution is simple and cost-effective.
Automatic asset recognition is achieved by first segmenting individual assets out of the query photo
and then performing bag-of-visual-features (BoVF) image matching on the segmented regions. The smartphone's
sensor readings, such as digital compass and accelerometer measurements, can be used to determine the location
of each asset, and this location information is stored in the inventory for each recognized asset.
As a special case study, we demonstrate a mobile book tracking system, where users snap photos of books
stacked on bookshelves to generate a location-aware book inventory. It is shown that segmenting the book spines
is very important for accurate feature-based image matching into a database of book spines. Segmentation
also provides the exact orientation of each book spine, so more discriminative upright local features can be
employed for improved recognition. This system's mobile client has been implemented for smartphones running
the Symbian or Android operating systems. The client enables a user to snap a picture of a bookshelf and to
subsequently view the recognized spines in the smartphone's viewfinder. Two different pose estimates, one from
BoVF geometric matching and the other from segmentation boundaries, are both utilized to accurately draw the
boundary of each spine in the viewfinder for easy visualization. The BoVF representation also allows matching
each photo of a bookshelf rack against a photo of the entire bookshelf, and the resulting feature matches are
used in conjunction with the smartphone's orientation sensors to determine the exact location of each book.
Propagation of geotags based on object duplicate detection
Peter Vajda,
Ivan Ivanov,
Jong-Seok Lee,
et al.
In this paper, we consider the use of object duplicate detection for the propagation of geotags from a small set of
images with location names (IPTC) to a large set of non-tagged images. The motivation behind this idea is that
images of individual locations usually contain specific objects such as monuments, buildings or signs. Therefore,
object duplicate detection can be used to establish the correspondence between tagged and non-tagged images.
Our recent graph-based object duplicate detection approach is adapted for this task. The effectiveness of the
approach is demonstrated through a set of experiments considering various locations.
Compression and Transforms for Images and Video I
Design of high-performance fixed-point transforms using the common factor method
Fixed-point implementations of transforms such as the Discrete Cosine Transform (DCT) remain as fundamental
building blocks of state-of-the-art video coding technologies. Recently, the 16x16 DCT has received focus as a
transform suitable for the high efficiency video coding project currently underway in the Joint Collaborative Team on Video Coding. By its definition, the 16x16 DCT is inherently more complex than transforms of traditional sizes such as
4x4 or 8x8 DCTs. However, scaled architectures such as the one employed in the design of the 8x8 DCTs specified in
ISO/IEC 23002-2 can also be utilized to mitigate the complexity of fixed-point approximations of higher-order
transforms such as the 16x16 DCT. This paper demonstrates the application of the Common Factor method to design
two scaled implementations of the 16x16 DCT. One implementation can be characterized by its exceptionally low
complexity, while the other can be characterized by its relatively high precision. We review the Common Factor method
as a method to arrive at fixed-point implementations that are optimized in terms of complexity and precision for such
high performance transforms.
Recent developments in standardization of high efficiency video coding (HEVC)
This paper reports on recent developments in video coding standardization, particularly focusing on the Call for
Proposals (CfP) on video coding technology made jointly in January 2010 by ITU-T VCEG and ISO/IEC MPEG and the
April 2010 responses to that Call. The new standardization initiative is referred to as High Efficiency Video Coding
(HEVC) and its development has been undertaken by a new Joint Collaborative Team on Video Coding (JCT-VC)
formed by the two organizations. The HEVC standard is intended to provide significantly better compression capability
than the existing AVC (ITU-T H.264 | ISO/IEC MPEG-4 Part 10) standard. The results of the CfP are summarized, and
the first steps towards the definition of the HEVC standard are described.
Efficient large size transforms for high-performance video coding
This paper describes the design of transforms for extended block sizes for video coding. The proposed transforms
are orthogonal integer transforms, based on a simple recursive factorization structure, that allow very compact and
efficient implementations. We discuss the techniques used for finding the integer and scale factors in these
transforms, and describe our final design. We evaluate the efficiency of the proposed transforms in VCEG's
H.265/JMKTA framework, and show that they achieve nearly identical performance compared to much more complex
transforms in the current test model.
Compression and Transforms for Images and Video II
Low-complexity lossless codes for image and video coding
We describe the design of lossless block codes for geometric, Laplacian, and similar distributions frequently
arising in image and video coding. The proposed codes can be understood as a generalization of Golomb codes,
allowing more precise adaptation to the parameter values of the distributions and resulting in lower redundancy.
The design of universal block codes for a class of geometric distributions is also studied.
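For reference, a plain Golomb code (the scheme these block codes generalize) encodes a nonnegative integer n with parameter m as a unary-coded quotient followed by a truncated-binary remainder; for a geometric source, m is matched to the distribution parameter. A minimal sketch:

```python
def golomb_encode(n, m):
    """Golomb code of n >= 0 with parameter m >= 1: unary quotient,
    then a truncated-binary remainder."""
    q, r = divmod(n, m)
    code = "1" * q + "0"          # unary quotient, terminated by 0
    b = (m - 1).bit_length()      # ceil(log2(m))
    if b == 0:                    # m == 1: no remainder bits needed
        return code
    cutoff = (1 << b) - m
    if r < cutoff:                # short codeword: b-1 bits
        return code + format(r, "b").zfill(b - 1)
    return code + format(r + cutoff, "b").zfill(b)  # long codeword: b bits

print(golomb_encode(9, 4))  # "11001": quotient 2 in unary, remainder 1 in 2 bits
```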
Embedded memory compression for video and graphics applications
We describe the design of a low-complexity lossless and near-lossless image compression system with random access,
suitable for embedded memory compression applications. This system employs a block-based DPCM coder using
variable-length encoding for the residual. As part of this design, we propose to use non-prefix (one-to-one) codes for
coding of residuals, and show that they offer improvements in compression performance compared to conventional
techniques, such as Golomb-Rice and Huffman codes.
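A minimal version of such a block-based DPCM coder can be sketched with a left-neighbor predictor, a zigzag map for signed residuals, and Golomb-Rice codes for the variable-length part. Note this is a simplification: ordinary Rice codes stand in here for the paper's non-prefix (one-to-one) codes.

```python
def rice(n, k):
    """Golomb-Rice code (m = 2**k): unary quotient, then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, "b").zfill(k) if k else "")

def dpcm_encode(row, k=2):
    """Left-neighbor DPCM: Rice-code each residual after mapping signed
    values to nonnegative ones (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    bits, prev = [], 0
    for px in row:
        res = px - prev
        bits.append(rice(2 * res if res >= 0 else -2 * res - 1, k))
        prev = px
    return "".join(bits)

print(dpcm_encode([10, 12, 11]))  # residuals 10, 2, -1
```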
Self-derivation of motion estimation techniques to improve video coding efficiency
This paper presents techniques to self-derive motion vectors (MVs) at the video decoder side to improve the coding
efficiency of B pictures. With the MV information derived at the decoder side, transmission of these self-derived
MVs from the encoder to the decoder is skipped, and thus better coding efficiency can be achieved. Our proposed
techniques derive the block-based MVs at the decoder side by considering the temporal correlation among the
available pixels in the previously decoded reference pictures. Utilizing the decoder-derived MVs can be added as
one of the coding-mode candidates at the encoder, where the encoder can utilize this new mode during coding-mode
selection to better trade off rate-distortion performance and improve coding efficiency. Experiments on the
ITU-T/VCEG Key Technology Area (KTA) reference software demonstrate an overall BD-bitrate improvement of about
7% with the hierarchical IbBbBbBbP coding structure under the common test conditions of the joint Call for
Proposals for new video coding technology issued by ISO/MPEG and ITU-T in January 2010.
Variable length coding for binary sources and applications in video compression
This article introduces a lossless encoding scheme for interleaved input from a fixed number of binary sources, each
one characterized by a known probability value. The algorithm achieves compression performance close to the
entropy, providing very fast encoding and decoding speed. The algorithm can efficiently benefit from independent
parallel decoding units, and it is demonstrated to have significant advantages in hardware implementations over
previous technologies.
Subjective evaluation of next-generation video compression algorithms: a case study
Show abstract
This paper describes the details and the results of the subjective quality evaluation performed at EPFL, as a
contribution to the effort of the Joint Collaborative Team on Video Coding (JCT-VC) for the definition of the
next-generation video coding standard. The performance of 27 coding technologies has been evaluated with
respect to two H.264/MPEG-4 AVC anchors, considering high definition (HD) test material. The test campaign
involved a total of 494 naive observers and took place over a period of four weeks. While similar tests have
been conducted as part of the standardization process of previous video coding technologies, the test campaign
described in this paper is by far the most extensive in the history of video coding standardization. The obtained
subjective quality scores show high consistency and support an accurate comparison of the performance of the
different coding solutions.
Computational Imaging I: Joint Session with Conference 7800
High dynamic range video with ghost removal
Stephen Mangiat,
Jerry Gibson
Show abstract
We propose a new method for ghost-free high dynamic range (HDR) video taken with a camera that captures
alternating short and long exposures. These exposures may be combined using traditional HDR techniques;
however, motion in a dynamic scene will lead to ghosting artifacts. Due to occlusions and fast-moving objects, a
gradient-based optical flow motion compensation method will fail to eliminate all ghosting. As such, we perform
simpler block-based motion estimation and refine the motion vectors in saturated regions using color similarity in
the adjacent frames. The block-based search allows motion to be calculated directly between adjacent frames over
a larger search range, yet at the cost of decreased motion fidelity. To address this, we investigate a new method
to fix registration errors and block artifacts using a cross-bilateral filter to preserve the edges and structure of
the original frame while retaining the HDR color information. Results show promising dynamic range expansion
for videos with fast local motion.
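The block-based search the authors describe can be illustrated with a bare exhaustive SAD (sum of absolute differences) matcher. This is a generic sketch of block matching, not the paper's refined method; block size, search range, and the test frames are illustrative assumptions:

```python
import numpy as np

def block_motion_search(ref, cur, block, bsize=8, search=4):
    """Exhaustive block matching: find the (dy, dx) that best aligns a
    bsize x bsize block of `cur` at position (y, x) with the reference
    frame, by minimizing the sum of absolute differences (SAD)."""
    y, x = block
    target = cur[y:y + bsize, x:x + bsize].astype(np.int64)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bsize > ref.shape[0] or xx + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[yy:yy + bsize, xx:xx + bsize].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

On a frame pair related by a pure global shift, the search recovers that shift exactly (SAD of zero at the true offset).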
ECME hard thresholding methods for image reconstruction from compressive samples
Show abstract
We propose two hard thresholding schemes for image reconstruction from compressive samples. The measurements
follow an underdetermined linear model, where the regression-coefficient vector is a sum of an unknown
deterministic sparse signal component and a zero-mean white Gaussian component with an unknown variance.
We derive an expectation-conditional maximization either (ECME) iteration that converges to a local maximum
of the likelihood function of the unknown parameters for a given image sparsity level. Here, we present
and analyze a double overrelaxation (DORE) algorithm that applies two successive overrelaxation steps after
one ECME iteration step, with the goal to accelerate the ECME iteration. To analyze the reconstruction accuracy,
we introduce minimum sparse subspace quotient (minimum SSQ), a more flexible measure of the sampling
operator than the well-established restricted isometry property (RIP). We prove that, if the minimum SSQ is
sufficiently large, the DORE algorithm achieves perfect or near-optimal recovery of the true image, provided
that its transform coefficients are sparse or nearly sparse, respectively. We then describe a multiple-initialization
DORE algorithm (DOREMI) that can significantly improve DORE's reconstruction performance. We present
numerical examples where we compare our methods with existing compressive sampling image reconstruction
approaches.
A survey of image retargeting techniques
Daniel Vaquero,
Matthew Turk,
Kari Pulli,
et al.
Show abstract
Advances in imaging technology have made the capture and display of digital images ubiquitous. A variety
of displays are used to view them, ranging from high-resolution computer monitors to low-resolution mobile
devices, and images often have to undergo changes in size and aspect ratio to adapt to different screens. Also,
displaying and printing documents with embedded images frequently entail resizing of the images to comply with
the overall layout. Straightforward image resizing operators, such as scaling, often do not produce satisfactory
results, since they are oblivious to image content. In this work, we review and categorize algorithms for content-aware
image retargeting, i.e., resizing an image while taking its content into consideration to preserve important
regions and minimize distortions. This is a challenging problem, as it requires preserving the relevant information
while maintaining an aesthetically pleasing image for the user. The techniques typically start by computing an
importance map which represents the relevance of every pixel, and then apply an operator that resizes the image
while taking into account the importance map and additional constraints. We intend this review to be useful to
researchers and practitioners interested in image retargeting.
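The two-stage pattern the survey describes — an importance map followed by a resizing operator — can be sketched with the common gradient-magnitude baseline and a deliberately crude operator that drops low-importance columns. This is a toy stand-in for real operators such as seam carving, not any surveyed method:

```python
import numpy as np

def importance_map(img):
    """Gradient-magnitude importance map: L1 norm of forward differences
    (the common baseline importance model mentioned in the survey)."""
    gy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    gx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    return gx + gy

def retarget_width(img, new_w):
    """Toy content-aware narrowing: repeatedly delete the whole column
    with the lowest total importance until the target width is reached."""
    img = img.astype(float)
    while img.shape[1] > new_w:
        cost = importance_map(img).sum(axis=0)
        img = np.delete(img, int(np.argmin(cost)), axis=1)
    return img
```

On an image with a single high-contrast stripe, only flat background columns are removed and the stripe survives intact.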
Objective and subjective measurement and modeling of image quality: a case study
Show abstract
The image quality resulting from several CMOS pixel structures (conventional, backside-illuminated, and
diagonally oriented) has been compared using three complementary techniques: (1) objective measurements of noise
equivalent quanta (NEQ) as a function of spatial frequency; (2) perceptual modeling of the multivariate quality loss from
blur and noise in units of just noticeable differences (JNDs); and (3) subjective measurement with the softcopy quality
ruler, also producing results in JNDs. The results of the perceptual modeling and subjective measurement were in good
quantitative agreement. NEQ is not perceptually uniform and so could only be correlated qualitatively with the other
methods, but it was helpful in understanding how performance might vary by application, given the spatial frequencies at
which the curves crossed. The strengths and weaknesses of each approach are compared; all three have potential utility
in evaluating computational imaging systems.
Computationally efficient deblurring of shift-variant highly defocused images
Show abstract
A localized and efficient iterative approach is presented for deblurring images that are highly defocused with
arbitrary shift-variant point spread functions (PSFs). This approach extends a recently proposed local technique,
the RT technique, which handles only medium levels of blur, to significantly higher levels of blur.
The RT technique is used to localize the blurring kernel at each pixel, and a region around the pixel with size
comparable to the size of the blurring kernel is divided into several smaller regions (intervals). The blurred image
in each interval is modeled separately by truncated Taylor-series polynomials. This step improves the accuracy
of the image model for low order truncated Taylor-series expansions. The blurred image value at each pixel is
expressed as the sum of multiple partial blur integrals with each integral term corresponding to one interval.
Then an expression is derived for the focused image value at a pixel in terms of the derivatives of the blurred
image in the central image region and solutions in the surrounding regions. This expression is solved iteratively
at each pixel in parallel to obtain a focused image. The starting solution is assumed to be either zero or the
blurred image itself. It is found that this new technique can effectively invert large blurs for which the original
RT method failed. In our experiments the truncated Taylor series expansion was limited to third order. Theory
and algorithms as well as experimental results on both simulation and real data are presented.
Computational Imaging II: Joint Session with Conference 7800
The restoration of large blur image based on POCS algorithm
Show abstract
Static images captured by a very high-speed camera can exhibit large blur, where the PSF may exceed half the image
resolution; we call this ultra-half-length blur. All experiments are based on images of 256 by 256 pixels. First, we
consider horizontal blur lengths between 50 and 60 pixels; we then increase the horizontal blur to 100 pixels, close
to half the horizontal resolution of the image. These blurred images show some pixel aliasing, and the POCS algorithm
can still restore them. For ultra-half-length blur of more than 150 pixels, details can no longer be distinguished in
the blurred images, and the traditional POCS algorithm cannot restore the high-resolution image. We therefore
establish a PSF model for ultra-half-length blur: we first use different interpolations for the super-resolution (SR)
estimate, and then apply two POCS algorithms based on this PSF model, one pixel-by-pixel and the other
every-other-pixel. Finally, we characterize the ultimate performance of the POCS algorithm in these large-blur
experiments.
Generating highly realistic 3D animation video with depth-of-field and motion blur effects
Show abstract
A computationally efficient algorithm is described for generating shift-variant defocus and motion blur effects
for animation video. This algorithm precisely models rigid-body motion of 3-D objects, including arbitrary
translational and rotational motion. Camera parameters such as aperture diameter, focal length, and the location
of image detector are used to calculate the blur circle radius of point spread functions (PSFs) modeled by Gaussian
and Cylindrical functions. In addition, a novel and simple method similar to image inpainting is described for
filling missing pixels that arise due to object motion, round-off errors, interpolation or changes in magnification.
Performance of the algorithms is demonstrated on a set of 3D shapes such as spheres, cylinders, and cones. The
software tool developed in this research is also useful in computer vision and image processing research. It can
be used for simulating test data with known ground truth in the testing and evaluation of depth-from-defocus
and image/video de-blurring algorithms.
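The blur-circle computation from camera parameters mentioned above follows directly from the thin-lens law; a minimal sketch (units and parameter values below are illustrative assumptions):

```python
def blur_circle_radius(f, D, u, s):
    """Blur-circle radius (same units as f) for a point at distance u,
    imaged by a thin lens of focal length f and aperture diameter D,
    with the detector at distance s behind the lens.

    From the thin-lens law 1/f = 1/u + 1/v, the point focuses at v;
    the defocused cone intersects the detector with radius
    r = (D / 2) * |s - v| / v."""
    v = 1.0 / (1.0 / f - 1.0 / u)   # ideal image distance
    return (D / 2.0) * abs(s - v) / v
```

When the detector sits exactly at the in-focus distance the radius is zero, and it grows as the point moves away from the focused plane; the radius then parameterizes the Gaussian or cylindrical PSF models the abstract mentions.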
Perceptual Coding of Still and Motion Images I
Evaluation of MPEG4-SVC for QoE protection in the context of transmission errors
Show abstract
Scalable Video Coding (SVC) provides a way to encapsulate several video layers with increasing quality and resolution in a
single bitstream. Thus it is particularly adapted to address heterogeneous networks and a wide variety of decoding devices.
In this paper, we evaluate the interest of SVC in a different context, which is error concealment after transmission on
networks subject to packet loss. The encoded scalable video streams contain two layers with different spatial and temporal
resolutions designed for mobile video communications with medium size and average to low bitrates. The main idea is
to use the base layer to conceal errors in the higher layers if they are corrupted or lost. The base layer is first upscaled
either spatially or temporally to reach the same resolution as the layer to conceal. Two error-concealment techniques
using the base layer are then proposed for the MPEG-4 SVC standard, involving frame-level concealment and pixel-level
concealment. These techniques are compared to the upscaled base layer as well as to a classical single-layer
MPEG-4 AVC/H.264 error-concealment technique. The comparison is carried out through a subjective experiment, in order to
evaluate the Quality-of-Experience of the proposed techniques. We study several scenarios involving various bitrates
and resolutions for the base layer of the SVC streams. The results show that SVC-based error concealment can provide
significantly higher visual quality than single-layer-based techniques. Moreover, we demonstrate that the resolution and
bitrate of the base layer have a strong impact on the perceived quality of the concealment.
Rate allocation as quality index performance test
Thomas Richter
Show abstract
In a recent work,16 the author proposed to study the performance of a still-image quality index such as the SSIM by using
it as the objective function of a rate allocation algorithm. The outcome of that work was not only a multi-scale SSIM-optimal
JPEG 2000 implementation, but also a first-order approximation of the MS-SSIM that is surprisingly similar to more
traditional contrast-sensitivity and visual masking based approaches. It will be seen in this work that the only difference
between the latter works and the MS-SSIM index is the choice of the exponent of the masking term, and furthermore, that
a slight modification of the SSIM definition that reproduces more traditional exponents is able to improve the correlation
with subjective tests and also improves the performance of the SSIM optimized JPEG 2000 code. That is, understanding
the duality of quality indices and rate allocation helps to improve both the visual performance and the performance of the
index.
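For reference, the SSIM index discussed above has the closed form below; this is the bare single-window formula with the conventional stabilizing constants, whereas practical implementations (and the MS-SSIM variant the paper analyzes) evaluate it over local windows and multiple scales:

```python
import numpy as np

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global (single-window) SSIM between two 8-bit images:
    a luminance term (means) times a contrast/structure term
    (variances and covariance), each stabilized by a small constant."""
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

The index equals 1 for identical images and decreases toward 0 as distortion grows, which is what makes it usable as a rate-allocation objective.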
A compressive sensing approach to perceptual image coding
Show abstract
The human visual system (HVS) has limitations that allow images and video to be reconstructed using fewer
bits for the same perceived image quality. In this paper we will review the basis of spatial masking at edges and show a
new method for generating a just-noticeable distortion (JND) threshold. This JND threshold is then used in a spatial
noise shaping algorithm using a compressive sensing technique to provide a perceptual coding approach for JPEG2000
coding of images. Results of subjective tests show that the new spatial noise shaping framework can provide significant
savings in bit-rate compared to the standard approach. The algorithm also allows much more precise control of distortion
than existing spatial domain techniques and is fully compliant with part 1 of the JPEG2000 standard.
Perceptual Coding of Still and Motion Images II
Perceptually optimized quantization tables for H.264/AVC
Show abstract
The H.264/AVC video coding standard currently represents the state-of-the-art in video compression technology. The
initial version of the standard only supported a single quantization step size for all the coefficients in a transformed
block. Later, support for custom quantization tables was added, which allows the quantization step size to be
specified independently for each coefficient in a transformed block. In this way, different quantization can be
applied to the high-frequency and low-frequency coefficients, reflecting the human visual system's different sensitivity to high-frequency
and low-frequency spatial variations in the signal.
In this paper, we design custom quantization tables taking into account the properties of the human visual system as well
as the viewing conditions. Our proposed design is based on a model for the human visual system's contrast sensitivity
function, which specifies the contrast sensitivity as a function of the spatial frequency of the signal. By calculating the
spatial frequencies corresponding to each of the transform's basis functions, taking into account viewing distance and dot
pitch of the screen, the sensitivity of the human visual system to variations in the transform coefficient corresponding to
each basis function can be determined and used to define the corresponding quantization step size. Experimental results,
whereby the video quality is measured using VQM, show that the designed quantization tables yield improved
performance compared to uniform quantization and to the default quantization tables provided as a part of the reference
encoder.
Open source database of images DEIMOS: high dynamic range images
Show abstract
Efficient development of image processing techniques requires a database of suitable test images for performance
verification, optimization, and other related purposes. In this paper the open-source DEIMOS database is described,
including its structure and interface, together with a selected application example on high-dynamic-range content
that illustrates the database's features. This HDR image database contains a variety of natural scenes captured with
a digital single-lens reflex (DSLR) camera under different conditions. The important capture parameters as well as
the relevant characteristics of the camera are part of the database, so that the creation of each image is well
documented. The DEIMOS database is being built up gradually from the contributions of team members.
Research of color distribution index in CIE L*a*b* color space
Show abstract
An index for evaluating a display's color-reproduction ability is required. The color distribution index (CDI) was
proposed to assess the distribution of reproduced colors in the CIE Lu'v' color space. A just-noticeable-difference
(JND) cell for luminance and chromaticity (u'v') was proposed to decide whether the reproduced colors fall within a
given region of the display's color volume. The human eye perceives fewer colors at low luminance; however, the
chromaticity (u'v') JND scale at low luminance is the same as at other luminances, so the CDI becomes distorted at
low luminance. In this paper, to account for what is actually perceptible at low luminance, we replace the
chromaticity (u'v') JND with a chromaticity (a*b*) JND and discuss the color distribution in the CIE L*a*b* color
space. We find that the CDI at low luminance is higher in CIE L*a*b* space than in CIE Lu'v' space, and that
different gamma curves and bit depths also affect the CDI. As displays keep approaching 100% true color
reproduction, such an index for evaluating color-reproduction ability is required.
Mobile Video: Processing, Communications, Display, and Applications I
Video quality management for mobile video application
Show abstract
This paper first briefly reviews sources of visual quality degradation during video compression, as well as
different video quality assessment techniques. It then extends the discussion beyond video compression to the
other modules of a video application pipeline. Each video application is composed of different processing modules,
such as the sensor, video encoder, and display, and visual experience is not determined by any single module.
Hence, the way visual experience is quantified should vary across applications. Furthermore, users have very
different expectations of visual experience in each application, so the quality assessment approach should be
chosen according to those expectations.
Low-complexity H.264/AVC motion compensated prediction for mobile video applications
Show abstract
The performance of the motion-compensated prediction (MCP) in video coding is degraded by aliasing due to
spatial sampling. To alleviate this problem in H.264/AVC, a low-pass filter is used by the fractional-pel motion
estimation (FME) to suppress the aliasing component. However, the FME imposes higher computational
complexity on H.264/AVC encoding. In this work, we first perform a joint quantization and aliasing analysis
on the H.264/AVC MCP process and show that the impact of the aliasing component can be alleviated by the
quantization process. Then, we propose a fast motion estimation (ME) algorithm that uses the FME and the
integer-pel motion estimation (IME) adaptively. The adaptive FME/IME algorithm examines the coding modes
of the reference block for the current coding block, and then decides whether the FME or the IME should be
applied to the current coding block. Experimental results show that the proposed adaptive FME/IME algorithm
can help the encoder generate a bit stream at much lower computational complexity with small degradation in
the coding gain as compared with a pure FME algorithm.
Decoder friendly H.264/AVC deblocking filter design
Show abstract
The complexity model of the H.264 deblocking filter (DBF) is studied in this work. The DBF process consists of three main modules: 1) boundary strength computation, 2) edge detection, and 3) low-pass filtering. The complexities of all three are considered in the proposed model. DBF-based decoding complexity control is also investigated. It is shown experimentally that the proposed complexity model provides reasonably accurate estimates. Moreover, an H.264 encoder equipped with the complexity model and the DBF-based decoding complexity control algorithm can generate bit streams that save a significant amount of decoding complexity while offering quality similar to that of streams generated by a typical H.264 encoder.
Mobile Video: Processing, Communications, Display, and Applications II
Postprocessing and denoising of video using sparse multiresolutional transforms
Show abstract
This paper describes the construction of a set of sparsity-distortion-optimized orthonormal transforms designed for
wavelet-domain image denoising. The optimization operates over sub-bands of a given orientation and exploits
intra-scale dependencies of wavelet coefficients across image singularities. When applied on top of standard wavelet
transforms, the resulting sparse representation provides compaction that can be exploited in transform-domain
denoising via cycle-spinning.1 Our construction deviates from the literature, which mainly focuses on model-based
methods, by offering a data-driven optimization of wavelet representations. The proposed method consistently
outperforms translation-invariant denoising with the original wavelet representation and can reach up to 3 dB of
improvement.
Image retargeting for small display devices
Show abstract
In this paper, we propose a novel image importance model for image retargeting. The most widely used importance
model in existing retargeting methods is the L1- or L2-norm of the gradient magnitude. It works well in uncluttered
scenes; however, gradient-magnitude importance models often lead to severe visual distortions when the scene is
cluttered or the background is complex. In contrast to most previous approaches, we exploit gradient domain
statistics (GDS) for more effective image retargeting, rather than the gradient magnitude itself. Our retargeting is
developed from the standpoint of human visual perception: we assume that human vision is highly adaptive and more
sensitive to structural than to non-structural information in an image. We do not model image structure explicitly,
since image structure has many diverse aspects; instead, our method captures the structural information in an image
implicitly by exploiting gradient domain statistics. Experimental results show that the proposed method is more
effective than previous image retargeting methods.
Adaptive image backlight compensation for mobile phone users
Show abstract
User-friendliness and cost-effectiveness have contributed to the growing popularity of mobile phone cameras.
However, images captured by such mobile phone cameras are easily distorted by a wide range of factors, such
as backlight, over-saturation, and low contrast. Although several approaches have been proposed to solve the
backlight problems, most of them still suffer from distorted background colors and high computational complexity.
Thus, they are not deployable in mobile applications requiring real-time processing with very limited resources. In
this paper, we present a novel framework to compensate image backlight for mobile phone applications based on
an adaptive pixel-wise gamma correction which is computationally efficient. The proposed method is composed
of two sequential stages: 1) illumination condition identification and 2) adaptive backlight compensation. Input
images are first classified into facial and non-facial images to provide prior knowledge for identifying the
illumination condition. We then further categorize the facial images into backlight and non-backlight
images based on local image statistics obtained from the corresponding face regions. We finally compensate
the image backlight using an adaptive pixel-wise gamma correction method while preserving global and local
contrast effectively. To show the superiority of our algorithm, we compare our proposed method with other
state-of-the-art methods in the literature.
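The pixel-wise gamma-correction idea can be sketched as below: darker pixels receive a stronger brightening exponent, so a backlit foreground is lifted while already-bright regions stay nearly untouched. The gamma mapping and `strength` parameter are illustrative assumptions, not the paper's calibrated method:

```python
import numpy as np

def adaptive_gamma(img, strength=0.6):
    """Pixel-wise gamma correction sketch for backlight compensation.

    Each pixel gets its own exponent in [1 - strength, 1]: a dark pixel
    (x near 0) gets a small exponent, which brightens it strongly; a
    bright pixel (x near 1) gets an exponent near 1 and barely changes."""
    x = img.astype(float) / 255.0
    gamma = 1.0 - strength * (1.0 - x)   # per-pixel exponent
    return np.clip(255.0 * x ** gamma, 0, 255).astype(np.uint8)
```

Because the exponent is computed per pixel from the pixel itself, the operation is a single vectorized pass, consistent with the real-time, low-resource constraint the abstract emphasizes.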
Remote gaming on resource-constrained devices
Show abstract
Games have become important applications on mobile devices. A mobile gaming approach known as remote gaming is being developed to support games on low cost mobile devices. In the remote gaming approach, the responsibility of rendering a game and advancing the game play is put on remote servers instead of the resource constrained mobile devices. The games rendered on the servers are encoded as video and streamed to mobile devices. Mobile devices gather user input and stream the commands back to the servers to advance game play. With this solution, mobile devices with video playback and network connectivity can become game consoles. In this paper we present the design and development of such a system and evaluate the performance and design considerations to maximize the end user gaming experience.
Optics, Photonics and Digital Image Processing
Multivariate image analysis of laser-induced photothermal imaging used for detection of caries tooth
Show abstract
Time-resolved photothermal imaging has been investigated as a means of characterizing teeth, with the aim of
discriminating between normal and carious areas of the hard tissue using a thermal camera. Ultrasonic
thermoelastic waves were generated in the hard tissue by the absorption of fiber-coupled Q-switched Nd:YAG laser
pulses operating at 1064 nm, and a laser-induced photothermal technique was used to detect the thermal radiation
waves for diagnosis of the human tooth. The concepts behind the use of photothermal techniques for off-line
detection of dental caries were presented by our group in earlier work. This paper illustrates the application of
multivariate image analysis (MIA) techniques to detect the presence of dental caries. MIA is used to rapidly
detect the presence and extent of common caries features as the teeth are scanned by high-resolution color (RGB)
thermal cameras. Multivariate principal component analysis decomposes the acquired three-channel tooth images
into a two-dimensional principal component (PC) space. Masking score-point clusters in the score space and
highlighting the corresponding pixels in the image space of the two dominant PCs enables isolation of
caries-defect pixels based on contrast and color information. The technique provides a qualitative result that can
be used for early-stage caries detection, and could potentially be applied on-line, in real time, to prescreen for
caries with vision-based systems such as a real-time thermal camera. Experimental results on a large number of
extracted teeth, as well as on a thermal image panorama of a volunteer's teeth, are investigated and presented.
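The PC-score decomposition of a three-channel image that the abstract relies on reduces to an eigen-decomposition of the 3 x 3 channel covariance; a minimal sketch, not the authors' pipeline:

```python
import numpy as np

def pca_score_images(img):
    """Project an H x W x 3 image onto its principal components.

    Pixels are treated as 3-vectors; the 3 x 3 channel covariance is
    eigen-decomposed and each pixel is projected onto the eigenvectors,
    giving per-component score images sorted by explained variance (the
    2-D score space of the two dominant PCs is what the abstract masks)."""
    h, w, c = img.shape
    X = img.reshape(-1, c).astype(float)
    X -= X.mean(axis=0)                      # center each channel
    cov = X.T @ X / (X.shape[0] - 1)         # 3 x 3 channel covariance
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(vals)[::-1]           # sort descending
    scores = X @ vecs[:, order]
    return scores.reshape(h, w, c), vals[order]
```

For a grayscale-like image whose channels are identical, all variance collapses into the first component and the remaining eigenvalues are (numerically) zero.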
A multi-pedestrian detection and counting system using fusion of stereo camera and laser scanner
Show abstract
Automated vehicle counting technology has been in use for many years, but developments in automated pedestrian
counting technology have been limited. Pedestrians are more difficult to detect, track and count because their paths are
much less constrained. In this paper, we present an advanced pedestrian counting system using a stereo camera and a
laser scanner. A mapping algorithm has been developed to map the detection locations in the laser scanner coordinates to
the stereo-image coordinates. For pedestrian tracking, we apply nonparametric statistical hypothesis tests such as
the Kolmogorov-Smirnov test for association of close tracks, and incorporate pedestrian image features such as SIFT
(Scale-Invariant Feature Transform) into a Kalman filter for multi-pedestrian tracking. Test results based on data collected at
a street intersection have demonstrated that this pedestrian counting system can accurately detect, track and count
multiple pedestrians walking in a large group.
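The two-sample Kolmogorov-Smirnov statistic used for track association above is small when two samples plausibly come from the same distribution and approaches 1 when they do not; a minimal numpy sketch (applying it to track features is the paper's contribution, not shown here):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical gap
    between the empirical CDFs of samples a and b, evaluated at every
    observed point."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```

Identical samples give a statistic of 0, while samples with disjoint supports give 1; a threshold on this value can then decide whether two nearby track fragments should be merged.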
Defect detection and classification of machined surfaces under multiple illuminant directions
Show abstract
Continuous improvement of product quality is crucial to a successful and competitive automotive manufacturing industry in the 21st century. Surface porosity located on flat machined surfaces such as cylinder heads/blocks and transmission cases may allow leaks of coolant, oil, or combustion gas between critical mating surfaces, causing damage to the engine or transmission; 100% inline inspection therefore plays an important role in improving product quality. Although image processing and machine vision techniques have been applied to machined-surface inspection and much improved over the past 20 years, in today's automotive industry surface porosity inspection is still done by skilled humans, which is costly, tedious, time consuming, and not capable of reliably detecting small defects. In our study, an automated defect detection and classification system for flat machined surfaces has been designed and constructed. In this paper, the importance of the illuminant direction in a machine vision system is first emphasized, and a surface defect inspection system using multiple directional illuminations is designed and constructed. Image processing algorithms were then developed to detect and classify five types of 2D or 3D surface defects (pores, 2D blemishes, residual dirt, scratches, and gouges). The steps of image processing are: (1) image acquisition and contrast enhancement; (2) defect segmentation and feature extraction; and (3) defect classification. An artificial machined surface and an actual automotive part (a cylinder head surface) were tested; as a result, microscopic surface defects can be accurately detected and assigned to a surface defect class. The cycle time of this system is fast enough that implementation of 100% inline inspection is feasible. The field of view of this system is 150 mm × 225 mm, and surfaces larger than the field of view can be stitched together in software.
Comparison between two different methods to obtain the wavefront aberration function
Show abstract
The analysis and measurement of the wavefront aberration function are very important tools that allow us to evaluate the
performance of any specified optical system. This technology has been adopted in visual optics for the analysis of optical
aberrations in the human eye, before and after being subjected to laser refractive surgery. We have been working in the
characterization and evaluation of the objective performance of human eyes that have been subjected to two different
surface ablation techniques known as ASA and PASA.1 However, optical aberrations in the human eye are time-dependent2
and, hence, difficult to analyze. In order to obtain a static profile of the post-operative wavefront
aberration function, we applied these ablation techniques directly to hard contact lenses. In this work we compare
two different methods of obtaining the wavefront aberration function from a reference refractive
surface, in order to generalize the method and be able to fully characterize hard contact lenses that have been
subjected to the ablation techniques typically used in refractive surgery for vision correction. The first method
uses a Shack-Hartmann wavefront sensor, and the second a Mach-Zehnder-type interferometer.
We show the preliminary results of this characterization.
Refractive power maps of the anterior surface of the cornea according to different models
Lucerito Morales-Tellez,
Marco A. Rosales,
Estela López-Olazagasti,
et al.
Show abstract
In order to explore and analyze the effect of an ablation performed on the anterior corneal surface, it is useful to
calculate the refractive power maps of the original and the treated corneas. The optical characteristics of the anterior
corneal surfaces are typically simulated with different models, according to different degrees of simplification. To
predict which ablation would improve the refractive power of such a cornea, which is directly related to the spherical
aberration associated with the shape of the anterior corneal surface, it is important to analyze those simplifications. Such
information is displayed in a refractive power map, which yields the true refractive power of the corneal surface, point
by point, expressing this power in diopters. The aim of the present work is twofold: different corneal models are
simulated so as to compare the spherical aberration produced by each one. On the other hand, simulations are made in
such a way that permits to foresee how the visual performance of an eye can be achieved by modifying the anterior
surface of its cornea through the corresponding power maps.
Rapid ideal template creation for the inspection of MEMS based on self-similarity characteristics
Show abstract
Surface metrology of MEMS requires high resolution sensors due to their fine structures. An automated multiscale
measurement system with multiple sensors at multiple scales enables fast acquisition of the surface data by utilizing high
resolution sensors only at the locations where they are required. We propose a technique that exploits the fact that MEMS
often have features (e.g., combs) that repeat across the surface. These features can be segmented and fused to generate an ideal
template. We present an automated similarity search approach based on feature detection, rotation invariant matching,
and sum of absolute differences to find similar structures on the specimen. Then, similar segments are fused and replaced
in the original image to generate an ideal template.
Automatic alignment of multi-temporal images of planetary nebulae using local optimization
Show abstract
Automatic alignment of time-separated astronomical images has historically proven difficult. The main reason for
this difficulty is the amount of sporadic and unpredictable noise associated with astronomical images. A few examples
of these effects are: image distortion due to optics, cosmic ray hits, transient background sources (supernovae), and
various artifact sources associated with the CCD imager itself. In this paper, a new automated image registration method
is introduced for aligning two time-separated images while minimizing the inherent errors and unpredictabilities. Using
local optimization, the two images are aligned when the root mean square of the difference between the two images is
minimized. The dataset consists of images of galactic planetary nebulae acquired by the Hubble Space Telescope. The
aligned centroids inferred by the suggested method agree with the results from previously aligned images by inspection
with high confidence. It is also demonstrated that the method is robust, does not require extensive user input,
and is sensitive enough to detect minor misalignments.
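The core of the registration step, minimizing the root-mean-square (RMS) difference between the two images over candidate shifts via local search, can be sketched as follows. This is an illustrative Python sketch under assumed names (`rms_difference`, `align`), not the authors' implementation, and it searches integer pixel shifts only:

```python
import math

def rms_difference(a, b, dx, dy):
    """RMS of pixel differences between image a and image b sampled at a
    shift of (dx, dy), computed over the overlapping region only."""
    h, w = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                d = a[y][x] - b[sy][sx]
                total += d * d
                count += 1
    return math.sqrt(total / count)

def align(a, b, search=3):
    """Local search over integer shifts; return the (dx, dy) minimizing RMS."""
    return min(
        ((dx, dy) for dx in range(-search, search + 1)
                  for dy in range(-search, search + 1)),
        key=lambda s: rms_difference(a, b, *s),
    )
```

In practice the search would be iterated at sub-pixel resolution and combined with outlier rejection for cosmic-ray hits, but the objective being minimized is the same.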
Poster Session
The development of an automatic scanning path generation method for the spinneret test
Show abstract
An automatic scanning path generation method is developed. The method is based on a 3-axis automatic inspection
system which is used to detect the clearance ratio of spinneret plate. The user can rely on this method to automatically
generate the scanning path for an unknown spinneret plate in the spinneret test. The scanning path can then be learned by
the inspection system and repeated for other identical spinnerets. Two types of spinneret are used in this paper to
describe the automatic scanning path generation method. The 3-axis automatic inspection system consists of a
3-axis motorized linear stage, a telecentric lens, a top light source, a bottom light source, a CCD camera, and a control
PC.
Meteor automatic imager and analyzer: analysis of noise characteristics and possible noise suppression
Show abstract
This paper is devoted to noise analysis and noise suppression in a system for double-station observation of
meteors, now known as MAIA (Meteor Automatic Imager and Analyzer). The noise analysis is based on
acquisition of testing video sequences at different light conditions and their further analysis. The main goal is to
find a suitable noise model and subsequently determine whether the noise is signal-dependent. The noise and image
models in the wavelet domain are based on the Gaussian mixture model (GMM) or the Generalized Laplacian
Model (GLM), with the model parameters estimated by the method of moments. GMM and GLM allow various types
of probability density functions to be modeled. Finally, an advanced de-noising algorithm using a Bayesian
estimator will be applied.
Correlation-based nonlinear composite filters applied to image recognition
Show abstract
Correlation-based pattern recognition has been an area of extensive research in the past few decades. Recently,
composite nonlinear correlation filters invariant to translation, rotation, and scale were proposed. The design of the
filters is based on logical operations and nonlinear correlation. In this work nonlinear filters are designed and applied to
non-homogeneously illuminated images acquired with an optical microscope. The images are embedded in a cluttered
background, non-homogeneously illuminated, and corrupted by random noise, which makes the recognition task difficult.
The performance of the nonlinear composite filters is compared with that of other composite correlation filters in terms
of discrimination capability.
Automated tracking of yeast cell lineages
Show abstract
We propose a cell progeny tracking method that sequentially employs image alignment, chamber cropping, cell
segmentation, per-cell feature measurement, and progeny (lineage) tracking modules. It enables biologists to keep track
of phenotypic patterns not only over time but also over multiple generations. Yeast cells encapsulated in chambers of a
polydimethylsiloxane (PDMS) microfluidic device were imaged over time to monitor changes in fluorescence levels. We
implemented our method in an automated cell image analysis tool, CellProfiler, and performed initial testing. Once
refined and validated, the approach could be adapted/used in other cell segmentation and progeny tracking experiments.
Center location error correction of circular targets
Show abstract
Circular targets are commonly used in vision measurement and photogrammetry. Due to the asymmetry of perspective projection, the geometric centroid of the projected ellipse and the true projection of the target center are not identical, which leads to a systematic center location error. A method to correct this center location error is presented in this paper. The surface normal directions of the circular targets are first determined by camera calibration. Then the correction values for the geometric centroids are calculated using space analytic geometry. The experimental results show that an improvement in accuracy is achieved after error correction by our method.
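The systematic bias addressed here can be reproduced numerically: project a tilted circle through a pinhole model and compare the centroid of the projected (elliptical) contour with the projection of the true circle center. The following Python sketch is illustrative only (function names and the geometric parameters are assumptions, not taken from the paper):

```python
import math

def project(p):
    # Pinhole projection with unit focal length: (X, Y, Z) -> (X/Z, Y/Z).
    x, y, z = p
    return (x / z, y / z)

def polygon_centroid(pts):
    # Area centroid of a closed polygon (shoelace formula); for a densely
    # sampled ellipse contour this converges to the ellipse center.
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))

def center_bias(tilt_deg=40.0, dist=5.0, radius=1.0, n=2000):
    """Difference between the projected circle center and the centroid of
    the projected contour, for a circle tilted by tilt_deg about the x-axis."""
    t = math.radians(tilt_deg)
    boundary = []
    for k in range(n):
        th = 2 * math.pi * k / n
        p3 = (radius * math.cos(th),
              radius * math.sin(th) * math.cos(t),
              dist + radius * math.sin(th) * math.sin(t))
        boundary.append(project(p3))
    true_center = project((0.0, 0.0, dist))
    ellipse_center = polygon_centroid(boundary)
    return (ellipse_center[0] - true_center[0],
            ellipse_center[1] - true_center[1])
```

For a fronto-parallel circle (zero tilt) the bias vanishes; for a tilted circle the centroid is displaced along the tilt direction, which is exactly the error the paper's calibration-based correction removes.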
Performance of visual tasks from contour information
Show abstract
A recently proposed visual aid for patients with a restricted visual field (tunnel vision) combines a see-through head-mounted display (HMD) with a simultaneous minified contour view of the wide-field image of the environment. Such a widening of the effective visual field is helpful for tasks such as visual search, mobility, and orientation. The sufficiency of contours (outlines of the objects in the image) for performing everyday visual tasks by human observers is of major importance for this application, as well as for other applications and for a basic understanding of human vision. Because of their efficiency as object descriptors, contours are widely used in computer vision applications, and many methods have therefore been developed for extracting them automatically from an image. The purpose of this research is to examine and compare the use of different types of automatically created contours, and contour representations, for practical everyday visual operations using commonly observed images. The visual operations include searching for items such as keys, a remote control, etc. Considering different recognition levels, identification of an object is distinguished from detection (when the object is not clearly identified). Some new, non-conventional visual-based contour representations were developed for this purpose. Experiments were performed with normal-vision subjects by superposing contours of the wide field of the scene over a narrow-field (see-through) background. The results show that about 85% success is obtained for searched-object identification when the best contour versions are employed.
Augmented reality system
Chien-Liang Lin,
Yu-Zheng Su,
Min-Wei Hung,
et al.
Show abstract
In recent years, Augmented Reality (AR) [1][2][3] has become very popular in universities and research organizations. AR
technology has been widely used in Virtual Reality (VR) fields such as sophisticated weapons, flight vehicle
development, data model visualization, virtual training, entertainment, and the arts. AR enhances the
display output of a real environment with specific user-interactive functions or specific object recognition. It can be used
in medical treatment, anatomy training, precision instrument casting, warplane guidance, engineering, and remote robot
control, and it has many advantages over VR. The system developed here combines sensors, software, and imaging algorithms to
make the augmented content feel real and present to the user. The imaging algorithms include a gray-level method, an image
binarization method, and a white balance method, in order to achieve accurate image recognition and overcome the effects of lighting.
Image restoration with local adaptive methods
Show abstract
Local adaptive processing in sliding transform domains for image restoration and noise removal with preservation of
edges and detail boundaries represents a substantial advance in the development of signal and image processing
techniques, thanks to its robustness to signal imperfections and local adaptivity (context sensitivity). Local filters in the
domain of orthogonal transforms at each position of a moving window modify the orthogonal transform coefficients of a
signal to obtain only an estimate of the central pixel of the window. A minimum mean-square error estimator in the
domain of sliding discrete cosine and sine transforms for noise removal and restoration is derived. This estimator is
based on fast inverse sliding transforms. To provide image processing at a high rate, fast recursive algorithms for
computing the sliding sinusoidal transforms are utilized. The algorithms are based on a recursive relationship between
three subsequent local spectra. Computer simulation results using synthetic and real images are provided and discussed.
Improvement of visual perception in cloudy environments
Show abstract
A new iterative algorithm for the improvement of visual perception in cloudy environments is presented. The proposed
approach is based on a heuristic search algorithm used to estimate the depth map of a scene captured under bad
weather conditions. Using the suggested algorithm, the undegraded signal can be estimated locally in an iterative
way, increasing the confidence of decision making in computer vision applications for human assistance. Computer
simulation results obtained with the proposed algorithm are provided and discussed in terms of performance metrics and
computational complexity.
Performance test of optical and electronic image stabilizer for digital imaging system
Qi Li,
Zhihai Xu,
Huajun Feng,
et al.
Show abstract
We designed and fabricated a test apparatus to analyze the performance characteristics of optical and electronic image
stabilization. The imaging system (a digital video camera with an image stabilization function) was fixed on a platform;
the vibration frequency of the platform varies with the input voltage of the electrical motor, and the vibration amplitude of
the platform is changed by adjusting the position of the motor shaft. We start the vibration platform
and acquire an ordinary image sequence; we then turn on the stabilizer and record an image sequence under optical
stabilization. Afterwards, the optical stabilizer is turned off, and motion detection and compensation are applied to
the acquired image frames to obtain an image sequence with electronic image stabilization. We
analyzed and processed the two kinds of image sequences from the test apparatus and drew some conclusions
about the performance characteristics of the image stabilizers. The electronic image stabilization effect is better at low
frequencies and the optical image stabilization effect is better at high frequencies. Furthermore, the improvement
in the degree of image stability caused by the electronic image stabilization is basically not related to the vibration
frequency, while the improvement in the degree of image stability caused by optical image stabilization increases
significantly with the increase in vibration frequency.
Meteor automatic imager and analyzer: system design and its parameters
Show abstract
A system for double-station observation of meteors, now known as MAIA (Meteor Automatic Imager and Analyzer), is introduced in the paper. This system is an evolution of the current analog solution. The system is based on two stations with Gigabit Ethernet cameras, sensitive image intensifiers, and automatic processing of the recorded image data. The aim of this design is to capture and analyze images of meteors down to masses of fractions of a gram. This paper presents the measured electro-optical characteristics of the particular components and the overall performance of the new digital system in comparison to the current analog solution. First, the optimal settings of various parameters for each subsystem (primary lens, image intensifier, secondary lens, and camera) are determined. Then a set of test images is captured and analyzed. The analysis of the images captured with both artificial and real targets verifies the suitability of the selected system design.
Analysis of the selection of overlapping region of sectioned restoration for images with space-variant point spread function
Show abstract
Classical image restoration is mostly based on image deconvolution under the assumptions of a linear system
transformation, stationary signal statistics, and stationary, signal-independent noise. Unfortunately, these assumptions are
not always accurate in real problems. For example, the optical aberrations, local defocus, local motion blur, temperature
variation, flexible medium, and non-stationary platform all cause the uncertain different degradation in different area of
the images. Therefore, overlapping-region sectioned restoration is suggested to reconstruct such blurred images with
space-variant point spread function (SVPSF). First, the full image is divided into several sub-sections, in each of which the
PSF is nominally space-invariant (SI). After restoration with an SI algorithm, the sub-frames are spliced to construct the
composite full-frame. Moreover, overlapping extension is employed to isolate edge-ringing effects from circular
convolution between the different restored sub-frames. In this paper, with the help of SSIM (Structural Similarity) and
GRM (Gradient Ringing Matrix) image quality assessment approaches, we discussed the selection of overlapping region
of the sectioned restoration with different algorithms, for images with a signal-to-noise ratio (SNR) from 25 dB to 40 dB.
Our investigation shows that the restored image quality is best when the overlapping region is as wide as the
energy-distribution area of the degradation function.
Image restoration of nonuniformly illuminated images with camera microscanning
Show abstract
Various techniques have been proposed for image recovery from degraded observed images. Most of the methods deal with
linear degradations and carry out signal processing using a single observed image. In this paper, multiplicative, additive,
and impulsive image degradations are investigated. We propose restoration algorithms based on three observed degraded
images obtained from a microscanning camera. It is assumed that the degraded images contain information about the original
image, an illumination function, and noise. From the three degraded images and a mathematical model of the degradation, a set of
equations is formed. By solving this system of equations with the help of an iterative algorithm, the original image is
recovered.
Vertex-based marching algorithms for finding multidimensional geometric intersections
Show abstract
This article is a survey of the current state of the art in vertex-based marching algorithms for solving systems of
nonlinear equations and multidimensional intersection problems. It also addresses ongoing research and
future work on the topic. Among the new topics discussed here for the first time is the problem of characterizing
the type of singularities of piecewise affine manifolds, which are the numerical approximations to the solution
manifolds, as generated by the most advanced of the considered vertex-based algorithms: the Marching-Simplex
algorithm. Several approaches are proposed for solving this problem, all of which are related to modifications,
extensions and generalizations of the Morse lemma in differential topology.
Calibration of a dual-PTZ camera system for stereo vision
Show abstract
In this paper, we propose a calibration process for the intrinsic and extrinsic parameters of dual-PTZ camera systems.
The calibration is based on a complete definition of six coordinate systems fixed at the image planes, and the pan and tilt
rotation axes of the cameras. Misalignments between estimated and ideal coordinates of image corners are formed into
cost values to be solved by the Nelder-Mead simplex optimization method. Experimental results show that the system is
able to obtain 3D coordinates of objects with a consistent accuracy of 1 mm when the distance between the dual-PTZ camera
set and the objects is from 0.9 to 1.1 meters.
Utilization of consumer level digital cameras in astronomy
Show abstract
This paper presents a study of possible utilization of digital single-lens reflex (DSLR) cameras in astronomy.
DSLRs have a great advantage over professional equipment in cost efficiency, with comparable
usability for selected purposes. The quality of the electro-optical system in a DSLR camera determines the areas
where it can be used with acceptable precision. First, a set of camera parameters important for astronomical
use is introduced in the paper. The color filter array (CFA) structure, the demosaicing algorithm, the image sensor
spectral properties, and the noise and transfer characteristics are among the most important of these parameters,
and they are analyzed further in the paper. Compression of astronomical images using the KLT approach
is also described. The potential impact of these parameters on positional and photometric measurement
is presented based on the analysis and measurements with the wide-angle lens. The prospective utilization of
consumer DSLR camera as a substitute for expensive devices is discussed.
Use of the EM algorithm in image registration in a scene captured by a moving camera
Show abstract
This paper presents the use of the Expectation-Maximization (EM) method for image motion registration in a scene
captured by a moving camera. In [1] we presented a new iterative algorithm for the correction of geometrical
distortion caused by global motion in a scene. A binary hypothesis test was subsequently established to classify the
pixels in the corrected image as either locally moving (object motion) or not moving (stationary). There were some
unknown parameters, such as noise variance and motion variance, in the developments that needed to be estimated.
This paper presents the use of the EM algorithm to estimate these parameters. We present experiments with real
image sequences to validate the analytical developments.
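A minimal sketch of the kind of estimation described, EM for a zero-mean two-component Gaussian mixture that separates a narrow noise component from a wide motion component, is shown below. The function name, initialization, and iteration count are illustrative assumptions, not the authors' formulation:

```python
import math

def gauss(x, var):
    # Zero-mean Gaussian density with variance var.
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_variances(data, iters=50):
    """EM for a zero-mean two-component Gaussian mixture: one narrow
    component (sensor noise) and one wide component (object motion).
    Returns (noise variance, motion variance, mixing weight)."""
    mean_sq = sum(d * d for d in data) / len(data)
    var1, var2, w = 0.25 * mean_sq, 4.0 * mean_sq, 0.5
    for _ in range(iters):
        # E-step: responsibility of the narrow component for each sample.
        r = [w * gauss(d, var1) /
             (w * gauss(d, var1) + (1 - w) * gauss(d, var2)) for d in data]
        # M-step: responsibility-weighted variance and mixing estimates.
        s1 = sum(r)
        s2 = len(data) - s1
        var1 = sum(ri * d * d for ri, d in zip(r, data)) / s1
        var2 = sum((1 - ri) * d * d for ri, d in zip(r, data)) / s2
        w = s1 / len(data)
    return var1, var2, w
```

In the registration setting, `data` would be per-pixel residuals after global motion correction; the two recovered variances then drive the binary hypothesis test for locally moving versus stationary pixels.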
Scene kinetics mitigation using factor analysis with derivative factors
Show abstract
Line of sight jitter in staring sensor data combined with scene information can obscure critical information for change
analysis or target detection. Consequently, the jitter effects must be significantly reduced before the data analysis.
Conventional principal component analysis (PCA) has been used to obtain basis vectors for background estimation;
however, PCA requires image frames that contain the jitter variation that is to be modeled. Since jitter is usually chaotic
and asymmetric, a data set containing all the variation without the changes to be detected is typically not available. An
alternative approach, Scene Kinetics Mitigation, first obtains an image of the scene. Then it computes derivatives of that
image in the horizontal and vertical directions. The basis set for estimation of the background and the jitter consists of
the image and its derivative factors. This approach has several advantages including: 1) only a small number of images
are required to develop the model, 2) the model can estimate backgrounds with jitter different from the input training
images, 3) the method is particularly effective for sub-pixel jitter, and 4) the model can be developed from images before
the change detection process. In addition the scores from projecting the factors on the background provide estimates of
the jitter magnitude and direction for registration of the images. In this paper we will present a discussion of the
theoretical basis for this technique, provide examples of its application, and discuss its limitations.
Novel gray coded pattern for unwrapping phase in fringe projection based 3D profiling
Show abstract
A method to reliably extract object profiles even with height discontinuities (which lead to 2nπ phase jumps) is
proposed. This method uses Fourier transform profilometry to extract wrapped phase, and an additional image formed by
illuminating the object of interest by a novel gray coded pattern for phase unwrapping. Simulation results suggest that
the proposed approach not only retains the advantages of the original method, but also contributes significantly in the
enhancement of its performance. The fundamental advantage of this method stems from the fact that both the extraction of
the wrapped phase and its unwrapping are done with gray-scale images. Hence, unlike methods that use colors, the
proposed method does not require a color CCD camera and is ideal for profiling objects with multiple colors.
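The binary-reflected Gray code that such coded patterns are typically built from can be sketched as follows. This is a generic illustration (the paper's specific novel pattern is not reproduced here); the recovered codeword gives the fringe order n, and the absolute phase is the wrapped phase plus 2πn:

```python
import math

def gray_encode(n):
    # Binary-reflected Gray code: adjacent codes differ in exactly one bit,
    # so a one-bit decoding error shifts the fringe order by at most one.
    return n ^ (n >> 1)

def gray_decode(g):
    # Invert the Gray code by cascading XORs of the shifted value.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def unwrap(wrapped_phase, fringe_order):
    """Absolute phase from the wrapped phase (in [0, 2*pi)) and the fringe
    order recovered from the projected Gray-code pattern."""
    return wrapped_phase + 2 * math.pi * fringe_order
```

The one-bit-per-transition property is what makes Gray codes robust at fringe boundaries, where a plain binary code could misread several bits at once and jump by many fringe orders.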
A comparison between intensity and depth images for extracting features related to wear labels in carpets
Show abstract
Carpet manufacturers certify their products with labels corresponding to the capability of the carpets in retaining
the original appearance. Traditionally, these labels are subjectively defined by reference cases where
human experts evaluate the degree of wear, which is quantified by a number called the wear label. Industry is
very interested in converting these traditional standards to automated objective standards. With this purpose,
research has been conducted using image analysis with either depth or intensity data. In this paper, we present
a comparison of texture features extracted from both types of images. For this, we scanned 3D data and photographed
eight carpet types provided by the EN1471 standard. The features are extracted by comparing the
distributions of Local Binary Patterns (LBPs) computed from images of the original and the changed appearance. We
assess how well we can arrange the features in the order of the wear labels and count the number of consecutive
wear labels that can be statistically distinguished. We found that two of the eight carpet types are properly
described using depth data and five using intensity data while one type could not be described. These results
suggest that the two types of images can be used in a complementary way to represent the wear labels. This can lead to
an automated and universal labeling system for carpets.
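The basic LBP operator underlying these features can be sketched as follows. This is a generic 8-neighbour implementation for illustration, not the exact variant used in the study:

```python
def lbp_image(img):
    """Basic 8-neighbour Local Binary Patterns: each interior pixel is
    encoded by thresholding its 8 neighbours against the centre value."""
    # Neighbour offsets, enumerated clockwise from the top-left.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

def lbp_histogram(img):
    # Normalized 256-bin distribution of LBP codes, used as a texture feature.
    hist = [0] * 256
    for row in lbp_image(img):
        for c in row:
            hist[c] += 1
    total = sum(hist)
    return [v / total for v in hist]
```

Comparing two carpets then reduces to comparing their normalized LBP histograms (e.g. with a chi-square or Kullback-Leibler distance), which is the kind of distribution comparison the abstract describes.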