Proceedings Volume 7245

# Image Processing: Algorithms and Systems VII

View the digital version of this volume at the SPIE Digital Library.

## Volume Details

Date Published: 9 February 2009
Contents: 9 Sessions, 43 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2009
Volume Number: 7245

Sessions:
• Transform Methods
• Image Restoration
• Image Processing Algorithms I
• Image Processing Algorithms II
• Image Processing Algorithms III
• Image Processing Systems
• Interactive Paper Session
• Neural Networks Application in Image Processing I
• Neural Networks Application in Image Processing II
## Transform Methods

### Discrete integer Fourier transform in real space: elliptic Fourier transform
Artyom M. Grigoryan, Merughan M. Grigoryan
The concept of the N-point DFT is generalized by considering it in real space rather than complex space. The multiplication by twiddle coefficients is considered in matrix form, as the Givens transformation. Such a block-wise representation of the DFT matrix is effective. A transformation called the T-generated N-block discrete transform, or N-block T-GDT, is introduced. For each N-block T-GDT, an inner product is defined with respect to which the rows (and columns) of the matrices X are orthogonal. By using different parameterized matrices T, we define metrics in the real space of vectors. The parameters can be selected among the integers only, which leads to integer-valued metrics. We also propose a new representation of the discrete Fourier transform in the real space R^2N. This representation is not integer-valued and is based on a 2×2 matrix C which is not a rotation but a root of the unit matrix. Under the group of motions generated by C, the point (1, 0) moves not around the unit circle but along the perimeter of an ellipse. The N-block C-GDT is therefore called the N-block elliptic Fourier transform (EFT). These orthogonal transformations are parameterized; their properties are described and examples are given.
### Reversible integer 2D Fourier transform
This paper describes the 2-D reversible integer discrete Fourier transform (RiDFT), which is based on the concept of the paired representation of the 2-D signal or image. The Fourier transform is split into a minimum set of short transforms. By means of the paired transform, the 2-D signal is represented as a set of 1-D signals which carry the spectral information of the signal at disjoint sets of frequency-points. The paired-transform-based 2-D DFT involves a small number of multiplications that can be approximated by integer transforms. Such one-point transforms with one control bit are applied for calculating the 2-D DFT. 24 real multiplications and 24 control bits are required to perform the 8×8-point RiDFT, and 264 real multiplications and 168 control bits for the 16×16-point 2-D RiDFT of real inputs. The computational complexity of the proposed 2-D RiDFTs is comparable with that of the fast 2-D DFT.
### On the use of the Stockwell transform for image compression
In this paper, we investigate the use of the Stockwell transform for image compression. The proposed technique uses the Discrete Orthogonal Stockwell Transform (DOST), an orthogonal version of the Discrete Stockwell Transform (DST). These mathematical transforms provide a multiresolution spatial-frequency representation of a signal or image. First, we give a brief introduction to the Stockwell transform and the DOST. Then we outline a simple compression method based on setting the smallest coefficients to zero. In an experiment, we use this compression strategy on three different transforms: the fast Fourier transform, the Daubechies wavelet transform, and the DOST. The results show that the DOST outperforms the other two transforms.
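The zero-the-smallest-coefficients strategy described above can be sketched in a few lines. No DOST implementation ships with NumPy, so the 2-D FFT stands in as the transform here; this is an illustrative assumption, not the paper's method:

```python
import numpy as np

def compress_by_thresholding(image, keep_fraction=0.1):
    """Keep only the largest-magnitude transform coefficients.

    The paper compares several transforms (FFT, Daubechies wavelets,
    DOST); the FFT is used here purely as a stand-in.
    """
    coeffs = np.fft.fft2(image)
    flat = np.abs(coeffs).ravel()
    k = max(1, int(keep_fraction * flat.size))
    # Threshold = magnitude of the k-th largest coefficient.
    thresh = np.partition(flat, -k)[-k]
    coeffs[np.abs(coeffs) < thresh] = 0
    return np.real(np.fft.ifft2(coeffs))

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
rec = compress_by_thresholding(img, keep_fraction=0.5)
err = np.sqrt(np.mean((img - rec) ** 2))
```

By Parseval's relation, discarding the smallest half of the coefficients removes at most half of the signal energy, so the reconstruction error stays below the signal's standard deviation.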
### B-term approximation using tree-structured Haar transforms
Hsin-Han Ho, Karen O. Egiazarian, Sanjit K. Mitra
We present a heuristic solution for B-term approximation using Tree-Structured Haar (TSH) transforms. Our solution consists of two main stages: best basis selection and greedy approximation. In addition, when approximating the same signal with different B constraints or error metrics, our solution provides the flexibility of trading storage space for less overall running time. We adopt a lattice structure to index basis vectors, so that a single index value fully specifies a basis vector. Based on the fast computation of the TSH transform by a butterfly network, we also developed an algorithm for directly deriving the butterfly parameters and incorporated it into our solution. Results show that, when the error metric is the normalized ℓ1-norm or the normalized ℓ2-norm, our solution has approximation quality comparable to (and sometimes better than) prior data synopsis algorithms.
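A minimal sketch of B-term approximation with an ordinary (non-tree-structured) orthonormal Haar basis; the paper's best-basis search over TSH transforms and its lattice indexing are omitted:

```python
import numpy as np

def haar_1d(x):
    """Orthonormal 1-D Haar transform (length must be a power of two)."""
    out = x.astype(float).copy()
    n = len(out)
    while n > 1:
        a = (out[:n:2] + out[1:n:2]) / np.sqrt(2)  # averages
        d = (out[:n:2] - out[1:n:2]) / np.sqrt(2)  # details
        out[:n // 2] = a
        out[n // 2:n] = d
        n //= 2
    return out

def inv_haar_1d(c):
    """Inverse of haar_1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a = c[:n].copy()
        d = c[n:2 * n].copy()
        c[0:2 * n:2] = (a + d) / np.sqrt(2)
        c[1:2 * n:2] = (a - d) / np.sqrt(2)
        n *= 2
    return c

def b_term_approx(x, B):
    """Keep only the B largest-magnitude Haar coefficients."""
    c = haar_1d(np.asarray(x, dtype=float))
    c[np.argsort(np.abs(c))[:-B]] = 0  # zero all but the B largest
    return inv_haar_1d(c)

# A piecewise-constant signal is represented exactly by 2 Haar terms.
x = np.array([4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0])
approx = b_term_approx(x, B=2)
```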
## Image Restoration

### Local adaptive filtering of images corrupted by nonstationary noise
Vladimir Vasilyevich Lukin, Dmitriy V. Fevralev, Nikolay N. Ponomarenko, et al.
In various practical situations of remote sensing image processing, it is assumed that the noise is nonstationary and that no a priori information is available on the dependence of the noise on the local mean or on the local properties of the noise statistics. It is shown that in such situations it is difficult to find a proper filter for effective image processing, i.e., for noise removal with simultaneous edge/detail preservation. To deal with such images, a local adaptive filter based on the discrete cosine transform in overlapping blocks is proposed. A threshold is set locally, based on a noise standard deviation estimate obtained for each block. Several other operations that improve the performance of the locally adaptive filter are proposed and studied. The effectiveness of the designed filter is demonstrated on simulated data as well as on real-life radar remote sensing and marine polarimetric radar images.
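The block-wise DCT thresholding can be sketched as follows. The local noise estimator used here (median absolute deviation of the high-frequency coefficients) is an assumption for illustration; the paper's estimator and its additional performance-improving operations are not reproduced:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(n)
    M[1:] *= np.sqrt(2 / n)
    return M

def denoise_block_dct(img, block=8, factor=2.7):
    """Hard-threshold DCT coefficients block by block, with a locally
    estimated noise standard deviation (hypothetical estimator)."""
    D = dct_matrix(block)
    out = np.zeros_like(img, dtype=float)
    for i in range(0, img.shape[0] - block + 1, block):
        for j in range(0, img.shape[1] - block + 1, block):
            patch = img[i:i + block, j:j + block]
            C = D @ patch @ D.T
            # Local sigma from the median absolute high-frequency coeff.
            sigma = np.median(np.abs(C[1:, 1:])) / 0.6745
            dc = C[0, 0]
            C[np.abs(C) < factor * sigma] = 0
            C[0, 0] = dc  # always keep the DC term
            out[i:i + block, j:j + block] = D.T @ C @ D
    return out

rng = np.random.default_rng(0)
noisy = 10.0 + rng.normal(0.0, 1.0, (32, 32))
out = denoise_block_dct(noisy)
```

The paper uses overlapping blocks and averages the results; non-overlapping blocks are used above only to keep the sketch short.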
### Noise reduction using multi-resolution edge detection
Bo Jiang, Zia-ur Rahman
In this paper, a new noise reduction algorithm is proposed. In general, edges, the high-frequency information in an image, are filtered or suppressed by image smoothing: the noise is attenuated, but the image loses its sharpness, which makes post-processing harder. The new algorithm performs connectivity analysis on edge data so that only isolated edge information, which represents noise, is filtered out, preserving the overall edge structure of the original image. The steps of the new algorithm are as follows. First, find the edges in the noisy image by multi-resolution analysis. Second, use connectivity analysis to direct a mean filter that suppresses the noise while preserving the edge information. For the first step, we propose a new algorithm for finding edges in a very noisy image, based on the analysis of a group of multi-resolution images obtained by processing the original noisy image with different Gaussian filters. When applied to a sequence of images of the same scene with different signal-to-noise ratios (SNR), this method robustly removes noise while keeping the edges. Moreover, statistical analysis shows the regularity that the parameters of the algorithm remain constant across different images at the same SNR.
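The connectivity analysis in the second step can be sketched as follows; a minimal illustration on a binary edge map, not the authors' multi-resolution pipeline:

```python
import numpy as np
from collections import deque

def prune_isolated_edges(edge_map, min_size=4):
    """Connected-component analysis on a binary edge map: edge pixels in
    8-connected components smaller than min_size are treated as noise
    and removed; larger components are kept as true edges."""
    h, w = edge_map.shape
    seen = np.zeros_like(edge_map, dtype=bool)
    keep = np.zeros_like(edge_map, dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if edge_map[sy, sx] and not seen[sy, sx]:
                comp = [(sy, sx)]
                seen[sy, sx] = True
                q = deque(comp)
                while q:  # breadth-first flood fill
                    y, x = q.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and edge_map[ny, nx]
                                    and not seen[ny, nx]):
                                seen[ny, nx] = True
                                comp.append((ny, nx))
                                q.append((ny, nx))
                if len(comp) >= min_size:
                    for y, x in comp:
                        keep[y, x] = True
    return keep

edges = np.zeros((10, 10), dtype=bool)
edges[5, 1:9] = True   # an 8-pixel edge segment (kept)
edges[1, 1] = True     # an isolated noise pixel (removed)
clean = prune_isolated_edges(edges, min_size=4)
```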
### Image restoration regularized by a fourth-order PDE
Wenhua Ma, Yu-Li You, M. Kaveh
A fourth-order PDE is proposed as the regularization operator for image restoration, in order to alleviate the "blocky" effects that frequently mar restored images regularized with anisotropic diffusion (a second-order PDE). This is motivated by its desirable property of evolving toward an image consisting of piecewise planar areas, which is less blocky and a better approximation to natural images. In order to mitigate the speckle artifacts that it frequently brings about, the image gradient magnitude is added as an additional variable of the nonlinearity function that controls its behavior. A numerical implementation method is presented, and simulation results indicate that the proposed method tends to produce restored images that are smoother in smooth areas and sharper in feature-rich areas. However, speckle artifacts still need to be carefully addressed.
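An explicit iteration of a fourth-order diffusion of this general type (the You-Kaveh form, without the gradient-magnitude modification the paper adds) might look like the following sketch:

```python
import numpy as np

def laplacian(u):
    """5-point Laplacian with replicated (Neumann) borders."""
    p = np.pad(u, 1, mode='edge')
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u

def fourth_order_step(u, k=1.0, dt=0.01):
    """One explicit step of u_t = -Lap( c(|Lap u|) * Lap u ),
    with c(s) = 1 / (1 + (s/k)^2). The small dt is needed for the
    stability of the explicit fourth-order scheme."""
    L = laplacian(u)
    c = 1.0 / (1.0 + (np.abs(L) / k) ** 2)
    return u - dt * laplacian(c * L)

rng = np.random.default_rng(1)
u0 = 5.0 + 0.1 * rng.standard_normal((16, 16))  # flat image + noise
u = u0.copy()
for _ in range(50):
    u = fourth_order_step(u)
```

A flat region stays flat under this flow (its Laplacian is zero), while high-frequency noise is damped, which is the piecewise-planar behavior the abstract describes.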
### Texture preservation in de-noising UAV surveillance video through multi-frame sampling
Yi Wang, Ronald A. Fevig, Richard R. Schultz
Image de-noising is a widely used technology in modern real-world surveillance systems. Without direct knowledge of the noise model, methods can seldom do both de-noising and texture preservation well. Most neighborhood fusion-based de-noising methods tend to over-smooth the images, which causes a significant loss of detail. Recently, a non-local means method has been developed which is based on the similarities among different pixels. This technique preserves textures well; however, it also causes some artifacts. In this paper, we utilize the scale-invariant feature transform (SIFT) [1] to find the corresponding regions between different images, and then reconstruct the de-noised images by a weighted sum of these corresponding regions. Both hard and soft criteria are chosen in order to minimize the artifacts. Experiments on real unmanned aerial vehicle thermal infrared surveillance video show that our method is superior to popular methods in the literature.
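Once corresponding regions have been found (the SIFT matching step, omitted here), the reconstruction reduces to a similarity-weighted average; a hypothetical sketch in the spirit of non-local means:

```python
import numpy as np

def fuse_corresponding_patches(patches, h=10.0):
    """Weighted sum of corresponding patches from several frames.

    Weights fall off with mean-squared distance from the reference
    (first) patch. The SIFT step that finds the correspondences in the
    paper is assumed to have been done already; h is a hypothetical
    decay parameter.
    """
    ref = patches[0]
    w = np.array([np.exp(-np.mean((p - ref) ** 2) / h ** 2)
                  for p in patches])
    w /= w.sum()
    return np.tensordot(w, np.asarray(patches), axes=1)

rng = np.random.default_rng(2)
truth = np.full((16, 16), 5.0)
# Three noisy observations of the same region from different frames.
frames = [truth + rng.normal(0.0, 1.0, truth.shape) for _ in range(3)]
fused = fuse_corresponding_patches(frames)
```

Averaging three independent observations reduces the noise variance by roughly a factor of three, which is why the fused patch is closer to the truth than any single frame.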
## Image Processing Algorithms I

### Fast geodesic distance approximation using mesh decimation and front propagation
Polygon meshes are collections of vertices, edges, and faces defining surfaces in a 3D environment. Computing geometric features on a polygon mesh is of major interest for various applications. Among these features, the geodesic distance is the distance between two vertices along the surface defined by the mesh. In this paper, we propose an algorithm for fast geodesic distance approximation using mesh decimation and front propagation. This algorithm is appropriate when geodesic distances must be computed quickly and high precision is not required.
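Front propagation over a mesh amounts to a Dijkstra-style sweep; a minimal sketch that walks the edge graph only (the paper adds mesh decimation first, and edge-graph distances are themselves only an upper bound on true surface geodesics):

```python
import heapq
import math

def geodesic_distances(vertices, edges, source):
    """Dijkstra front propagation over a mesh's edge graph.

    vertices: list of (x, y, z) tuples; edges: list of (i, j) index
    pairs; returns the distance from `source` to every vertex.
    """
    adj = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        d = math.dist(vertices[a], vertices[b])
        adj[a].append((b, d))
        adj[b].append((a, d))
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in adj[v]:
            nd = d + w
            if nd < dist[u]:
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

# A unit square traversed along its boundary edges.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
dists = geodesic_distances(verts, ring, source=0)
```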
### Image object removal in redundant wavelet transform domain
Yonghui Wang, Suxia Cui, Jian-ao Lian, et al.
In this paper, we present a novel approach to image object removal that extends the subpatch texture synthesis technique into the redundant wavelet transform (RDWT) domain. As an overcomplete wavelet transform, RDWT is shift invariant and is obtained without downsampling. Moreover, each RDWT highpass subband captures one specific orientation of image features: horizontal, vertical, or diagonal. All of this makes RDWT ideal for texture-synthesis-based object removal. In our experiments, subpatch texture synthesis in the RDWT domain is used to remove unwanted objects from digital photographs. Specifically, for each RDWT subband, subpatch texture synthesis along the direction matching the subband orientation is applied independently. Experimental results reveal that our simple algorithm performs better than previous methods.
### Nonlinear mapping of the luminance in dual-layer high dynamic range displays
It has long been known that the human visual system (HVS) has a nonlinear response to luminance. This nonlinearity can be quantified using the concept of just noticeable difference (JND), which represents the minimum amplitude of a specified test pattern an average observer can discern from a uniform background. The JND depends on the background luminance following a threshold versus intensity (TVI) function. It is possible to define a curve which maps physical luminances into a perceptually linearized domain. This mapping can be used to optimize a digital encoding by minimizing the visibility of quantization noise. It is also commonly used in medical applications to display images adapted to the characteristics of the display device. High dynamic range (HDR) displays, which are beginning to appear on the market, can display luminance levels outside the range in which most standard mapping curves are defined. In particular, dual-layer LCD displays are able to extend the gamut of luminance offered by conventional liquid crystals towards the black region; in such areas, suitable and HVS-compliant luminance transformations need to be determined. In this paper we propose a method which is primarily targeted at the extension of the DICOM curve used in medical imaging, but which also has more general application. The method can be modified to compensate for ambient light, which can be significantly greater than the black level of an HDR display and can consequently reduce the visibility of details in dark areas.
### Color enhancement in a high dynamic range environment
Stefano Marsi, Alfredo Restrepo, Gabriele Guarnieri
We present techniques for the processing of color, high-dynamic-range luminance images, both of a type aiming for objectivity and of a type aiming for aesthetic improvement. In the first case we start from camera raw data, propose a variant of white balancing, darken very light spots, and lighten very dark spots. In the second case we use color spaces of the hue-saturation-luminance type; we propose a hue processing method inspired by the Bezold-Brücke effect as well as a luminance-dependent displacement of color saturation.
## Image Processing Algorithms II

### Active contours that grow and compete driven by local region descriptors
Region-based active contours are a variational framework for image segmentation. The framework involves estimating the probability distributions of observed features within each image region. These so-called region descriptors are then used to generate forces that move the contour toward real image boundaries. In this paper, region descriptors are computed from samples within windows centered on contour pixels; they are named local region descriptors (LRDs). With these descriptors we introduce an equation for contour motion with two terms: growing and competing. This equation yields a novel type of active contour that can adjust the behavior of contour pieces to image patches and to the presence of other contours. The quality of the proposed motion model is demonstrated on complex images.
### A fast intensity-based non-rigid 2D-3D registration using statistical shape models with application in radiotherapy
In this paper we present a novel fast method for the non-rigid registration of a few X-ray projections with CT data. The method combines non-parametric non-rigid registration techniques for the difficult 2D-3D case with knowledge of probable deformations modeled as active shape models (ASMs). ASMs allow us to cope with as few as two projections by regularizing the registration process. The model is learned from deformations observed during respiration in a 4D-CT. This method can be applied in motion-compensated radiation therapy to eliminate the need for fiducial implantation. We designed a fast C++ implementation of our method in order to make it practicable. Our tests on real 4D-CT data achieved registration times of 2-4 minutes on a desktop PC.
### Morphological demosaicking
Bayer patterns, in which a single value of red, green, or blue is available at each pixel, are widely used in digital color cameras. The reconstruction of the full color image is often referred to as demosaicking. This paper introduces a new approach: morphological demosaicking. The approach is based on strong edge directionality selection and interpolation, followed by morphological operations that refine the edge directionality selection and reduce color aliasing. Finally, a performance evaluation and examples of color artifact reduction are shown.
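The directional-selection step (before the morphological refinement, which is not reproduced here) can be sketched for the green channel of an RGGB mosaic; this is a standard edge-directed scheme used as an illustrative stand-in, not the paper's algorithm:

```python
import numpy as np

def interpolate_green(bayer):
    """Edge-directed green interpolation on an RGGB Bayer mosaic.

    At each interior red/blue site, both horizontal and vertical
    neighbors are green; interpolate along the direction with the
    smaller gradient so that edges are not interpolated across.
    """
    h, w = bayer.shape
    G = bayer.astype(float).copy()
    green = np.zeros((h, w), dtype=bool)
    green[0::2, 1::2] = True  # G sites on R rows
    green[1::2, 0::2] = True  # G sites on B rows
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not green[y, x]:
                dh = abs(bayer[y, x - 1] - bayer[y, x + 1])
                dv = abs(bayer[y - 1, x] - bayer[y + 1, x])
                if dh <= dv:
                    G[y, x] = (bayer[y, x - 1] + bayer[y, x + 1]) / 2
                else:
                    G[y, x] = (bayer[y - 1, x] + bayer[y + 1, x]) / 2
    return G

# A vertical edge in the green channel: 10 on the left, 50 on the right.
cols = np.arange(8)
green_img = np.tile(np.where(cols < 4, 10.0, 50.0), (8, 1))
mask = np.zeros((8, 8), dtype=bool)
mask[0::2, 1::2] = True
mask[1::2, 0::2] = True
bayer = np.where(mask, green_img, 0.0)  # non-green sites hold R/B data
G = interpolate_green(bayer)
```

At the edge columns the vertical gradient is zero while the horizontal one is large, so the vertical direction is chosen and the edge is reconstructed exactly, which a naive horizontal average would blur.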
### A kernel representation for exponential splines with global tension
Sven Barendt, Bernd Fischer, Jan Modersitzki
Interpolation is a key ingredient in many imaging routines. In this note, we present a thorough evaluation of an interpolation method based on exponential splines in tension. These splines depend on so-called tension parameters, which allow their properties to be tuned. As it turns out, these interpolants have many nice features which are, however, not borne out in the literature. We intend to close this gap. We present for the first time an analytic representation of their kernel, which enables a space- and frequency-domain analysis. It is shown that the exponential splines in tension, as a function of the tension parameter, bridge the gap between linear and cubic B-spline interpolation. For example, with a certain tension parameter one is able to suppress ringing artifacts in the interpolant. On the other hand, the frequency-domain analysis shows a signal reconstruction quality comparable to that known from cubic B-spline interpolation, which, however, suffers from ringing artifacts. With the ability to offer a trade-off between opposing features of interpolation methods, we advocate the use of the exponential spline in tension from a practical point of view and use the new kernel representation to quantify the trade-off.
### Compression of multispectral fluorescence microscopic images based on a modified set partitioning in hierarchical trees
Modern automated microscopic imaging techniques such as high-content screening (HCS), high-throughput screening, 4D imaging, and multispectral imaging are capable of producing hundreds to thousands of images per experiment. For quick retrieval, fast transmission, and storage economy, these images should be saved in a compressed format. A considerable number of techniques based on the interband and intraband redundancies of multispectral images have been proposed in the literature for the compression of multispectral and 3D temporal data. However, these works have been carried out mostly in the fields of remote sensing and video processing. Compression for multispectral optical microscopy imaging, with its own set of specialized requirements, has remained under-investigated. Digital-photography-oriented 2D compression techniques like JPEG (ISO/IEC IS 10918-1) and JPEG2000 (ISO/IEC 15444-1) are generally adopted for multispectral images; these optimize visual quality but do not necessarily preserve the integrity of scientific data, not to mention the suboptimal performance of 2D compression techniques on 3D images. Herein we report our work on a new low-bit-rate wavelet-based compression scheme for multispectral fluorescence biological imaging. The sparsity of significant coefficients in the high-frequency subbands of multispectral microscopic images is found to be much greater than in natural images; therefore a quad-tree concept such as Said et al.'s SPIHT, along with the correlation of insignificant wavelet coefficients, is proposed to further exploit redundancy in the high-frequency subbands. Our work proposes a 3D extension to SPIHT, incorporating a new hierarchical inter- and intra-spectral relationship amongst the coefficients of the 3D wavelet-decomposed image. The new relationship, apart from adopting the parent-child relationship of classical SPIHT, also brings forth a conditional "sibling" relationship relating only the insignificant wavelet coefficients of subbands at the same level of decomposition. The insignificant quadtrees in different subbands of the high-frequency subband class are coded by a combined function to reduce redundancy. A number of experiments conducted on microscopic multispectral images have shown promising results for the proposed method over current state-of-the-art image-compression techniques.
## Image Processing Algorithms III

### Robust measurement of the blocking artefact
A method is presented to measure the intensity of the blocking artefact in compressed pictures or video frames. First, a way is devised to artificially introduce pure blocking which closely resembles the real blocking that follows JPEG compression. Then a modified no-reference measurement is proposed that requires fewer computations than other previously presented methods, makes it possible to take the whole image or frame area into account, and is not affected by interlaced video. First experiments indicate that the measured values relate closely to the introduced blockiness. The robustness of the metric to the influence of other typical JPEG artefacts is also checked. Further, the effect of some enhancement strategies on blockiness is measured: pictures enhanced with the methods introducing the most severe blockiness are found to have the highest value of the proposed metric. Finally, the problem of blockiness measurement in video sequences is addressed. In this case the blocking grid is no longer regular: blocks of different sizes may be used in encoding, and single blocks may be shifted in referenced (P and B) frames due to motion compensation. A method is devised for grid detection.
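A simple no-reference blockiness measure in the spirit of the above can be sketched by comparing luminance jumps across block boundaries with jumps inside blocks (the paper's metric additionally handles interlacing and shifted grids, which this sketch does not):

```python
import numpy as np

def blockiness(img, block=8):
    """Ratio of the mean absolute horizontal luminance jump across
    vertical block boundaries to the mean jump inside blocks.
    A value near 1 means no visible blocking; larger means more."""
    diff = np.abs(np.diff(img.astype(float), axis=1))
    at_boundary = np.zeros(diff.shape[1], dtype=bool)
    at_boundary[block - 1::block] = True  # columns straddling boundaries
    across = diff[:, at_boundary].mean()
    within = diff[:, ~at_boundary].mean()
    return across / (within + 1e-12)

# A smooth horizontal ramp, and the same ramp with per-block offsets.
ramp = np.tile(np.arange(32.0), (16, 1))
blocky = ramp + 8.0 * (np.arange(32) // 8)
```

On the smooth ramp every horizontal difference equals 1, so the ratio is 1; the per-block offsets raise only the boundary differences, so the ratio grows with the introduced blockiness.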
## Image Processing Systems

### Image pixel guided tours: a software platform for non-destructive x-ray imaging
K. P. Lam, R. Emery
Multivariate analysis seeks to describe the relationships between an arbitrary number of variables. To explore high-dimensional data sets, projections are often used for data visualisation, to aid the discovery of structures or patterns that lead to the formation of statistical hypotheses. The basic concept necessitates a systematic search for lower-dimensional representations of the data that might show interesting structure(s). Motivated by recent research on the Image Grand Tour (IGT), which can be adapted to view guided projections by using objective indexes capable of revealing latent structures of the data, this paper presents a signal processing perspective on constructing such indexes under the unifying exploratory frameworks of Independent Component Analysis (ICA) and Projection Pursuit (PP). Our investigation begins with an overview of dimension reduction techniques by means of orthogonal transforms, including the classical procedure of Principal Component Analysis (PCA), and extends to an application of the more powerful techniques of ICA in the context of our recent work on non-destructive testing by element-specific x-ray imaging.
### Ensemble registration: aligning many multi-sensor images simultaneously
Jeff Orchard, Laure Jonchery
To register three or more images together, current approaches involve registering them two at a time. This pairwise approach can lead to registration inconsistencies. It can also result in diminished accuracy because only a fraction of the total data is being used at any given time. We propose a registration method that simultaneously registers the entire ensemble of images. This ensemble registration of multi-sensor datasets is done using clustering in the joint intensity space. Experiments demonstrate that the ensemble registration method overcomes serious issues that hinder pairwise multi-sensor registration methods.
### Substitutive steganography in the generalized Fibonacci domain
In this contribution, the robustness against Chi-square attacks of a novel steganographic scheme based on a generalized Fibonacci sequence is investigated. In essence, an image is first represented in a basis defined by a generalized Fibonacci sequence. The secret data are then inserted by a substitution technique into selected bit planes, preserving the first-order distributions, and finally the inverse Fibonacci decomposition is applied to obtain the stego-image. The secret data are scrambled before embedding to improve the security of the whole system. In order to perform the Chi-square attacks, knowledge of both parameters determining the binary Fibonacci representation of the image is assumed. Experimental results show that no visual impairments are introduced and that the probability of detecting the presence of hidden data is small, even if a modest capacity loss is present.
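The substitution step can be sketched with the plain (ungeneralized) Fibonacci, i.e. Zeckendorf, representation; the paper's generalized sequence, plane selection, scrambling, and distribution-preserving logic are omitted:

```python
# Fibonacci weights for an 8-bit pixel range, least significant first.
FIBS = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

def to_fib(n):
    """Zeckendorf digits of n (greedy, so no two adjacent 1s),
    least significant first."""
    digits = []
    for f in reversed(FIBS):
        if f <= n:
            digits.append(1)
            n -= f
        else:
            digits.append(0)
    return digits[::-1]

def from_fib(digits):
    return sum(d * f for d, f in zip(digits, FIBS))

def embed_bit(pixel, bit, plane=0):
    """Substitute `bit` into Fibonacci bit plane `plane`, skipping
    pixels where the substitution would create two adjacent 1-digits
    (which would break the Zeckendorf form)."""
    d = to_fib(pixel)
    d[plane] = bit
    ok = all(not (d[i] and d[i + 1]) for i in range(len(d) - 1))
    return from_fib(d) if ok else pixel

stego = embed_bit(5, 1)    # 5 = "1001..": setting plane 0 is valid
skipped = embed_bit(10, 1) # 10 = "0100 1..": would create "11", skip
```

Extraction is the mirror image: `to_fib(stego)[plane]` recovers the embedded bit for the non-skipped pixels.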
## Interactive Paper Session

### Decomposition by series direction images: image reconstruction and enhancement
In this paper, we focus on an effective representation of the image, called the paired representation, which reduces the image to a set of independent 1-D signals and splits the 2-D DFT into a minimal number of 1-D DFTs. The paired transform is a frequency-and-time representation of the image. Splitting-signals carry the spectral information of the image in disjoint subsets of frequencies, which allows the image to be enhanced by processing splitting-signals separately and by changing the resolution of the periodic structures composing the image. We present a new effective formula for the inverse 2-D paired transform, which can be used to solve the algebraic system of equations with measurement data for image reconstruction without resorting to the Fourier transform. The image is reconstructed directly from the splitting-signals, which can be calculated from projection data. The same inverse formula can be used for image enhancement, such as the known method of α-rooting. A new concept of direction images is introduced, which defines the decomposition of the image by directions.
### Eye blink detection based on eye contour extraction
Eye blink detection is an important problem in computer vision, with many applications such as face liveness detection and driver fatigue analysis. Existing methods for eye blink detection can be roughly divided into two categories: contour-template-based and appearance-based methods. The former can usually extract eye contours accurately; however, separate templates are needed for closed and open eyes, and these methods are sensitive to illumination changes. In appearance-based methods, image patches of open and closed eyes are collected as positive and negative samples to train a classifier, but eye contours cannot be accurately extracted. To overcome the drawbacks of the existing methods, this paper proposes an effective eye blink detection method based on an improved eye contour extraction technique. In our method, the eye contour model is represented by 16 landmarks and can therefore describe both open and closed eyes. Each landmark is accurately located by a fast classifier trained on the appearance around that landmark. Eye contour extraction experiments have been conducted on YALE and on another large data set consisting of frontal face images. The experimental results show that the proposed method affords accurate eye localization, is robust for closed eyes, and performs well under illumination variations. The average time cost of our method is about 140 ms on a 2.8 GHz Pentium IV PC with 1 GB RAM, which satisfies the real-time requirement for face video sequences. The method has also been applied in a face liveness detection system, with promising results.
### Precision feature point tracking method using a drift-correcting template update strategy
Xiaoming Peng, Qian Ma, Qiheng Zhang, et al.
In this paper, we present a drift-correcting template update strategy for precisely tracking a feature point in 2D image sequences. The proposed strategy greatly extends the template tracking strategy of Matthews et al. [I. Matthews, T. Ishikawa, and S. Baker, The template update problem, IEEE Trans. PAMI 26 (2004) 810-815] by incorporating a robust non-rigid image registration step used in medical imaging. Matthews et al.'s strategy uses the first template to correct drifts in the current template; however, drift still builds up if, as tracking continues, the first template becomes quite different from the current one. In our strategy, the first template is promptly updated when it differs too much from the current one, and the updated first template can henceforth be used to correct template drifts in subsequent frames. The method based on the proposed strategy yields sub-pixel-accuracy tracking results, as measured with the commercial software REALVIZ(R) MatchMover(R) Pro 4.0. Our method runs fast on a desktop PC (3.0 GHz Pentium(R) IV CPU, 1 GB RAM, Windows(R) XP Professional, Microsoft Visual C++ 6.0), taking about 0.03 seconds on average to track the feature point in a frame (under a general affine transformation model, with a 61×61-pixel template) and, when required, less than 0.1 seconds to update the first template. We also propose an architecture for implementing our strategy in parallel.
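A toy sketch of the drift-correcting logic, using a brute-force normalized-cross-correlation tracker; the paper's non-rigid registration step is replaced here by outright template replacement, an assumption made purely for illustration:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def track(frame, template, center, radius=5):
    """Integer-displacement search around `center` maximizing NCC."""
    th, tw = template.shape
    best, best_pos = -2.0, center
    cy, cx = center
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue  # search window fell off the frame
            s = ncc(patch, template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best

def update_templates(frame, pos, first_tmpl, cur_tmpl, sim_thresh=0.7):
    """Drift-correcting update: the current template is refreshed every
    frame; the first template is replaced only once it has become too
    different from the current appearance."""
    th, tw = cur_tmpl.shape
    patch = frame[pos[0]:pos[0] + th, pos[1]:pos[1] + tw]
    new_first = patch if ncc(patch, first_tmpl) < sim_thresh else first_tmpl
    return new_first, patch

# A bright square tracked across a (2, 1)-pixel shift between frames.
frame1 = np.zeros((16, 16))
frame1[5:9, 5:9] = 1.0
template = frame1[4:10, 4:10]
frame2 = np.zeros((16, 16))
frame2[7:11, 6:10] = 1.0
pos, score = track(frame2, template, center=(4, 4))
first, cur = update_templates(frame2, pos, template, template)
```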
### A generalized set of kernels for edge and line detection
Edge detection is an important image processing task which has been used extensively in object detection and recognition. Over the years, many edge detection algorithms have been established, most of them largely based on linear convolution operations. In such methods, smaller kernel sizes have generally been used to extract fine edge detail, but they suffer from low noise tolerance. Larger kernels are known to benefit edge detection, as they generate coarser-scale edges; this suppresses noise and proves to be particularly important for detection and recognition systems. This paper presents a generalized set of mutually orthogonal kernels for edge and line detection, yielding n×n kernels for any odd dimension n. Some of the kernels can also be generalized to form m×n rectangular kernels. In doing so, the method unifies small- and large-kernel approaches in order to reap the benefits of both. It is also seen that the Frei and Chen orthogonal kernel set is a single instance of this new generalization. Experimental results show that the new generalized set of kernels can improve edge detection results by combining the usefulness of both smaller and larger kernels.
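The Frei-Chen basis that the paper generalizes is the n = 3 instance: nine mutually orthogonal 3×3 kernels spanning all 3×3 patches, with edge strength measured as the energy fraction projected onto the four edge kernels:

```python
import numpy as np

s2 = np.sqrt(2)
_raw = [
    np.array([[1, s2, 1], [0, 0, 0], [-1, -s2, -1]]),  # edge (horizontal)
    np.array([[1, 0, -1], [s2, 0, -s2], [1, 0, -1]]),  # edge (vertical)
    np.array([[0, -1, s2], [1, 0, -1], [-s2, 1, 0]]),  # edge (ripple)
    np.array([[s2, -1, 0], [-1, 0, 1], [0, 1, -s2]]),  # edge (ripple)
    np.array([[0, 1, 0], [-1, 0, -1], [0, 1, 0]]),     # line
    np.array([[-1, 0, 1], [0, 0, 0], [1, 0, -1]]),     # line
    np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]]),   # line
    np.array([[-2, 1, -2], [1, 4, 1], [-2, 1, -2]]),   # line
    np.ones((3, 3)),                                   # average
]
# Normalize so the nine kernels form an orthonormal basis of R^(3x3).
FREI_CHEN = [k / np.linalg.norm(k) for k in _raw]
basis_matrix = np.array([k.ravel() for k in FREI_CHEN])

def edge_measure(patch):
    """Fraction of the patch's energy in the 4-kernel edge subspace."""
    proj = [float((patch * k).sum()) ** 2 for k in FREI_CHEN]
    return sum(proj[:4]) / (sum(proj) + 1e-12)

# A clean horizontal step projects entirely onto the edge subspace.
step = np.array([[1.0, 1, 1], [0, 0, 0], [-1, -1, -1]])
e = edge_measure(step)
```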
### Comparative study of methods for recognition of an unknown person's action from a video sequence
Takayuki Hori, Jun Ohya, Jun Kurumisawa
This paper proposes a tensor-decomposition-based method that can recognize an unknown person's action from a video sequence, where the unknown person is not included in the database (tensor) used for the recognition. The tensor consists of persons, actions, and time-series image features. For the observed unknown person's action, one of the actions stored in the tensor is assumed. Using the motion signature obtained from this assumption, the unknown person's actions are synthesized, and the actions of one of the persons in the tensor are replaced by the synthesized actions. Then the core tensor of the replaced tensor is computed. This process is repeated over the actions and persons, and at each iteration the difference between the replaced and original core tensors is computed. The assumption that gives the minimal difference is the action recognition result. As the time-series image feature stored in the tensor and extracted from the observed video sequence, a feature based on the contour shape of the human body silhouette is used. To show its validity, our proposed method is experimentally compared with the Nearest Neighbor rule and a Principal Component Analysis-based method. Experiments on seven kinds of actions performed by 33 persons show that our proposed method achieves better recognition accuracies for the seven actions than the other methods.
### Efficient detection of ellipses from an image by a guided modified RANSAC
Yingdi Xie, Jun Ohya
In this paper, we propose a novel ellipse detection method based on a modified RANSAC, with automatic sampling guidance from the edge orientation difference curve. The Hough Transform family is among the most popular methods for shape detection, but the Standard Hough Transform loses its computational efficiency when the dimension of the parameter space is high. The Randomized Hough Transform, an improved version of the Standard Hough Transform, has difficulty detecting shapes in complicated, cluttered scenes because of its random sampling process. As a pre-process for the random selection of the five pixels used to build the ellipse's equation, we propose a two-step algorithm: (1) region segmentation and contour detection by the mean shift algorithm; (2) contour splitting based on the edge orientation difference curve obtained from the contour of each region. Within each contour segment obtained by step (2), five pixels are randomly selected and the modified RANSAC is applied to them so that an accurate ellipse model is obtained. Experimental results show that the proposed method achieves high accuracy and low computational cost in detecting multiple ellipses in an image.
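The five-pixel fitting inside the RANSAC loop can be sketched as a conic fit through five points; unguided sampling is used below, whereas the paper restricts sampling to the contour segments obtained in step (2):

```python
import numpy as np

def fit_conic(pts):
    """Conic a*x^2 + b*xy + c*y^2 + d*x + e*y + f = 0 through 5 points,
    taken as the null space of the 5x6 design matrix."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]  # unit-norm coefficient vector

def is_ellipse(c):
    return c[1] ** 2 - 4 * c[0] * c[2] < 0  # conic discriminant test

def ransac_ellipse(points, iters=200, tol=1e-2, rng=None):
    """Plain RANSAC over 5-point samples, keeping the ellipse with the
    most inliers by algebraic residual."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best, best_inl = None, 0
    x, y = points[:, 0], points[:, 1]
    for _ in range(iters):
        sample = points[rng.choice(len(points), 5, replace=False)]
        c = fit_conic(sample)
        if not is_ellipse(c):
            continue
        resid = np.abs(c[0] * x * x + c[1] * x * y + c[2] * y * y
                       + c[3] * x + c[4] * y + c[5])
        inl = int((resid < tol).sum())
        if inl > best_inl:
            best, best_inl = c, inl
    return best, best_inl

# 30 points on the ellipse x^2/4 + y^2 = 1, plus 10 random outliers.
t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
on_curve = np.stack([2 * np.cos(t), np.sin(t)], axis=1)
outliers = np.random.default_rng(3).uniform(-3, 3, size=(10, 2))
points = np.vstack([on_curve, outliers])
conic, n_inliers = ransac_ellipse(points, rng=np.random.default_rng(7))
```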
Adaptive image restoration using a proximity measure to boundary
In this study, an iterative maximum a posteriori (MAP) approach using a Bayesian model of a Markov random field (MRF) is proposed for image restoration, to reduce or remove the noise resulting from imperfect sensing. The image process is assumed to combine the random fields associated with the observed intensity process and the underlying image texture process. The objective measure for determining the optimal restoration of this "double compound stochastic" image process is based on Bayes' theorem, and the MAP estimation employs Point-Jacobian iteration to obtain the optimal solution. In the proposed algorithm, the MRF is used to quantify spatial interaction probabilistically, that is, to provide prior information on the image texture, and a neighbor window of any size can be defined to supply contextual information on a local region. However, a window of fixed size would draw wrong information from adjacent regions with different characteristics at pixels close to or on a boundary. To overcome this problem, the new method is designed to use less information from more distant neighbors the closer a pixel is to a boundary, which reduces the chance of involving pixel values from adjacent regions with different characteristics. The proximity to the boundary is estimated using a non-uniformity measurement based on edge value, standard deviation, entropy, and the 4th moment of the intensity distribution. The new scheme was evaluated using simulated data, and the experimental results show a considerable improvement in image restoration.
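The Jacobi-style fixed-point iteration described above can be illustrated under a simplified Gaussian MRF prior with uniform 4-neighbor weights; the paper's boundary-proximity weighting is deliberately omitted, and the names and parameters are ours:

```python
import numpy as np

def map_mrf_denoise(y, lam=1.0, iters=50):
    """Jacobi-style fixed-point iteration for the MAP estimate under a
    Gaussian likelihood and a Gaussian MRF prior over 4-neighbors:
        minimize  sum (x - y)^2 + lam * sum over neighbor pairs (x_i - x_j)^2
    Each sweep applies the pixelwise stationarity condition:
        x = (y + lam * sum of 4 neighbors) / (1 + 4 * lam)
    """
    x = y.astype(float).copy()
    for _ in range(iters):
        padded = np.pad(x, 1, mode='edge')   # clamp at image borders
        nsum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                padded[1:-1, :-2] + padded[1:-1, 2:])
        x = (y + lam * nsum) / (1.0 + 4.0 * lam)
    return x
```

The iteration matrix has spectral radius 4λ/(1+4λ) < 1, so the sweep converges; the paper's refinement amounts to shrinking the neighbor weights as the non-uniformity measure indicates a nearby boundary.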
Multichannel image processing by use of median M-type L-filter
In this paper, we introduce the Vector Median M-type L (VMML) filter to remove impulsive and Gaussian noise from color images and color video sequences. This filter combines a vector approach and the Median M-type (MM) estimator with different influence functions in the filtering scheme of the L-filter. We also introduce impulsive noise detectors to improve noise suppression and detail preservation in the proposed filtering scheme at both low and high densities of impulsive noise. To demonstrate the performance of the proposed filtering scheme in real applications, we applied it to the filtering of SAR images, which naturally contain speckle noise. Simulation results indicate that the proposed filter consistently outperforms other color image filters by balancing the tradeoff between noise suppression, detail preservation, and color retention.
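The classic vector median, on which vector filters of this family build, can be sketched for RGB images as follows. This is the plain vector median only, not the proposed MM-estimator-based VMML scheme; names and window handling are our choices:

```python
import numpy as np

def vector_median_filter(img, radius=1):
    """Classic vector median filter: replace each pixel's color vector with
    the window vector that minimizes the sum of L2 distances to all other
    vectors in the window, so impulses (far from everything) never win."""
    h, w, c = img.shape
    out = img.copy()
    p = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode='edge')
    for i in range(h):
        for j in range(w):
            win = p[i:i + 2 * radius + 1,
                    j:j + 2 * radius + 1].reshape(-1, c).astype(float)
            # pairwise distances, then total distance from each vector
            d = np.linalg.norm(win[:, None, :] - win[None, :, :], axis=2).sum(axis=1)
            out[i, j] = win[np.argmin(d)]
    return out
```

Because the output is always one of the input vectors, the filter never introduces new colors, which is the property behind the color-retention claim above.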
Robustness and security assessment of image watermarking techniques by a stochastic approach
V. Conotter, G. Boato, C. Fontanari, et al.
In this paper we propose to evaluate both the robustness and the security of digital image watermarking techniques by considering the perceptual quality of un-marked images in terms of Weighted PSNR. The proposed tool is based on genetic algorithms and is suitable for researchers evaluating the robustness of developed watermarking methods. Given a combination of selected attacks, the proposed framework searches for a parameterization of them that keeps the perceptual quality of the un-marked image below a given threshold; correspondingly, a novel metric for robustness assessment is introduced. On the other hand, the tool also proves useful in scenarios where an attacker tries to remove the watermark to circumvent copyright protection. Security assessment is provided by a stochastic search for the minimum degradation that must be introduced to obtain an un-marked version of the image as close as possible to the given one. Experimental results show the effectiveness of the proposed approach.
Template matching based on quadtree Zernike decomposition
Alessandro Neri, Marco Carli, Veronica Palma, et al.
In this paper a novel technique for rotation-independent template matching via quadtree Zernike decomposition is presented. Both the template and the target image are decomposed using a complex polynomial basis. The template is analyzed in a block-based manner by means of a quadtree decomposition, which allows the system to better identify object features. Searching for a complex pattern in a large multimedia database is then based on a sequential procedure that verifies whether the candidate image contains each square of the ranked quadtree list, refining the location and orientation estimates step by step.
A distributed coding approach for stereo sequences in the tree structured Haar transform domain
M. Cancellaro, M. Carli, A. Neri
In this contribution, a novel method for distributed video coding of stereo sequences is proposed. The system encodes the left and right frames of the stereoscopic sequence independently, and the decoder exploits side information to achieve the best reconstruction of the correlated video streams. In particular, a syndrome coder based on a lifted Tree Structured Haar wavelet scheme has been adopted. The experimental results show the effectiveness of the proposed scheme.
Hyper-spectral image segmentation using spectral clustering with covariance descriptors
Olcay Kursun, Fethullah Karabiber, Cemalettin Koc, et al.
Image segmentation is an important and difficult computer vision problem. Hyper-spectral images pose even more difficulty due to their high dimensionality. Spectral clustering (SC) is a recently popular clustering/segmentation algorithm. In general, SC lifts the data to a high-dimensional space (the kernel trick), derives eigenvectors in this new space, and finally partitions the data into clusters using these new dimensions. We demonstrate that SC works efficiently when combined with covariance descriptors, which assess pixelwise similarities, rather than operating in the high-dimensional Euclidean space. We present the formulations and some preliminary results of the proposed hybrid image segmentation method for hyper-spectral images.
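A minimal sketch of the descriptor-plus-spectral-clustering pipeline follows: a plain covariance descriptor over raw spectral bands, a Gaussian affinity on Frobenius distances, and a two-way split by the Fiedler vector of the normalized graph Laplacian. The authors' exact features, affinity, and partitioning are not specified in the abstract, so all of those choices are our assumptions:

```python
import numpy as np

def covariance_descriptor(patch):
    """Region covariance descriptor: covariance of the per-pixel feature
    vectors (here the raw spectral bands) over a patch of shape (h, w, bands)."""
    f = patch.reshape(-1, patch.shape[-1]).astype(float)
    return np.cov(f, rowvar=False)

def spectral_bipartition(descriptors, sigma=1.0):
    """Minimal spectral clustering into two groups: Gaussian affinity on
    pairwise Frobenius distances between descriptors, then a split by the
    sign of the Fiedler vector of the normalized graph Laplacian."""
    n = len(descriptors)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.linalg.norm(descriptors[i] - descriptors[j])
    W = np.exp(-(D / sigma) ** 2)
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                # eigenvalues ascending
    fiedler = vecs[:, 1]                          # second-smallest eigenvector
    return (fiedler > 0).astype(int)
```

The point of the hybrid is that the affinity is computed between compact covariance descriptors instead of raw high-dimensional spectral vectors.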
Neural Networks Application in Image Processing I
Minimization of color halftone texture visibility using three-dimensional error diffusion neural network
Previously we have shown that error diffusion neural networks (EDNs) find local minima of the frequency-weighted error between a binary halftone output and the corresponding smoothly varying input, an ideal framework for solving halftoning problems. An extension of our work to color halftoning employs a three-dimensional (3D) interconnect scheme. We cast color halftoning as four related sub-problems: the first three are to compute good binary halftones for each primary color, and the fourth is to simultaneously minimize the frequency-weighted error in the luminosity of the composite result. We have shown that an EDN with a 3D interconnect scheme can solve all four problems in parallel. This paper shows that our 3D EDN algorithm not only shapes the error into frequencies to which the Human Visual System (HVS) is least sensitive but also shapes the error into colors to which the HVS is least sensitive. Correlating the color planes through luminosity reduces the formation of high-contrast pixels, such as the black and white pixels that often constitute color noise, resulting in a smoother and more homogeneous appearance and a closer resemblance to the continuous-tone image. The texture visibility of color halftone patterns is evaluated in two ways: (1) by computing the radially averaged power spectrum and (2) by computing a visual cost function.
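For contrast with the EDN formulation, classic serial error diffusion on a single channel can be sketched as follows. This uses the standard Floyd-Steinberg weights, not the paper's 3D network; it only illustrates the error-shaping idea that the EDN solves in parallel:

```python
import numpy as np

def floyd_steinberg(gray):
    """Classic serial error diffusion: quantize each pixel to 0/1 in raster
    order and push the quantization error onto unprocessed neighbors with
    the Floyd-Steinberg weights (7, 3, 5, 1)/16, so low-frequency content
    is preserved and the error is shaped into high ('blue-noise') frequencies."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return out
```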
Application of SGRBF for level set based image segmentation
Yingxuan Zhu, Miyoung Shin, Amrit L. Goel
In this study, the radial basis function based SG algorithm (SGRBF) is applied to the evolution of level sets in image segmentation. Implementing the level set method in image processing often involves solving partial differential equations (PDEs), and an implicit finite-difference scheme is the prevalent way to solve the PDE governing the evolution of level sets. Instead of finite differences, SGRBF is used in our study for evolving level sets. SGRBF is a mathematical framework developed for function approximation using Gaussian RBFs, in which the number and centers of the basis functions are determined in a systematic and mathematically sound way using a purely algebraic approach. The numerical results show that, in addition to providing a continuous representation of both the implicit function and its level sets, the algorithm introduced here can reduce computational cost by selecting the most contributive centers for the radial basis functions.
Concurrent grammar inference machines for 2-D pattern recognition: a comparison with the level set approach
K. P. Lam, P. Fletcher
Parallel processing promises scalable and effective computing power which can handle the complex data structures of knowledge representation languages efficiently. Past and present sequential architectures, despite the rapid advances in computing technology, have yet to provide such processing power or to offer a holistic solution to the problem. This paper presents a fresh attempt at formulating alternative techniques for grammar learning, based upon the parallel and distributed model of connectionism, to facilitate the more cognitively demanding task of pattern understanding. The proposed method has been compared with the contemporary approach of shape modelling based on level sets, and has demonstrated its potential as a prototype for constructing robust networks on high-performance parallel platforms.
Robust image retrieval from noisy inputs using lattice associative memories
Lattice associative memories, also known as morphological associative memories, are fully connected feedforward neural networks with no hidden layers, whose computation at each node is carried out with lattice algebra operations. These networks are a relatively recent development in the field of associative memories that has proven to be an alternative way to work with sets of pattern pairs, for which the storage and retrieval stages use minimax algebra. Different associative memory models have been proposed to cope with the problem of pattern recall under input degradations, such as occlusions or random noise, where input patterns can be composed of binary or real-valued entries. In comparison to these and other artificial neural network memories, lattice algebra based memories display better storage and recall capability; however, the computational techniques devised for this purpose require additional processing or achieve only partial success when inputs are presented with undetermined noise levels. The robust retrieval capability of an associative memory model is usually expressed as a high percentage of perfect recalls from non-perfect input. The procedure described here uses noise masking, defined by simple lattice operations together with appropriate metrics such as the normalized mean squared error or the signal-to-noise ratio, to boost the recall performance of either the min or the max lattice auto-associative memory. Using a single lattice associative memory, illustrative examples are given that demonstrate the enhanced retrieval of correct gray-scale image associations from inputs corrupted with random noise.
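The canonical min and max lattice auto-associative memories, and their minimax-algebra recall, can be sketched as follows. These are the standard constructions from the literature; the noise-masking step described in the abstract is not included, and the function names are ours:

```python
import numpy as np

def lattice_memories(X):
    """Build the canonical min (W) and max (M) lattice auto-associative
    memories from patterns stored as the columns of X:
        W[i, j] = min over k of (x_k[i] - x_k[j])
        M[i, j] = max over k of (x_k[i] - x_k[j])"""
    D = X[:, None, :] - X[None, :, :]        # D[i, j, k] = x_k[i] - x_k[j]
    return D.min(axis=2), D.max(axis=2)

def recall_min(W, x):
    """Max-plus recall with the min-memory: out[i] = max_j (W[i, j] + x[j])."""
    return (W + x[None, :]).max(axis=1)

def recall_max(M, x):
    """Min-plus recall with the max-memory: out[i] = min_j (M[i, j] + x[j])."""
    return (M + x[None, :]).min(axis=1)
```

Both memories recall every stored pattern perfectly from uncorrupted input; their differing tolerance to erosive versus dilative noise is what the noise-masking procedure above exploits.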
Neural Networks Application in Image Processing II
Principle and design of a dynamic neural network for efficient and accurate recognition of a time-varying object based on its static patterns and its dynamic pattern variations
Based on our research over the last 17 years (with 68 papers published) on artificial neural networks studied from the point of view of N-dimensional geometry, a novel neural network system, the dynamic neural network, is proposed here for detecting an unknown moving (or time-varying) object: the object is detected not only from its static images, but also from the way it moves, if it follows a constant moving pattern. The system identifies the unknown object by comparing a few time-separated snapshots of the object to a few standard moving objects learned or memorized by the system. The identification is governed by a user-entered accuracy control. It can be very accurate, yet still quite robust and fast (e.g., identification in real time) because of the simplicity of the algorithm. It differs from most other neural network systems because it employs the N-dimensional geometrical concept.
Efficient implementation of neural network deinterlacing
Interlaced scanning has been widely used in most broadcasting systems, but it suffers from undesirable artifacts such as jagged patterns, flickering, and line twitter. Moreover, most recent TV monitors use flat panel display technologies such as LCD or PDP, which require progressive formats. Consequently, conversion of interlaced video into progressive video is required in many applications, and a number of deinterlacing methods have been proposed. Recently, deinterlacing methods based on neural networks have been proposed with good results. On the other hand, with high-resolution video content such as HDTV, the amount of video data to be processed is very large, so processing time and hardware complexity become important issues. In this paper, we propose an efficient implementation of neural network deinterlacing using polynomial approximation of the sigmoid function. Experimental results show that these approximations provide equivalent performance with a considerable reduction in complexity, so that neural network deinterlacing can be efficiently incorporated into hardware implementations.
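The idea of replacing the sigmoid with a fitted polynomial can be illustrated as below. This is a generic least-squares fit with clamping outside the fitting interval; the degree, interval, and clamping strategy are our assumptions, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_poly_sigmoid(degree=5, span=8.0):
    """Fit a least-squares polynomial to the sigmoid on [-span, span] and
    clamp the input outside that interval, where the sigmoid saturates
    to 0/1 anyway. Evaluating the polynomial needs only multiply-adds,
    which is the hardware-friendly part."""
    xs = np.linspace(-span, span, 1001)
    coeffs = np.polyfit(xs, sigmoid(xs), degree)
    def poly_sigmoid(x):
        x = np.clip(x, -span, span)
        return np.clip(np.polyval(coeffs, x), 0.0, 1.0)
    return poly_sigmoid
```

Raising the degree trades a few extra multiply-adds for a smaller approximation error, which mirrors the performance/complexity tradeoff discussed above.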
Optimal input sizes for neural network de-interlacing
Neural network de-interlacing has shown promising results among various de-interlacing methods. In this paper, we investigate the effect of input size when neural networks are used for de-interlacing of various video formats. In particular, we investigate optimal input sizes for the CIF, VGA, and HD video formats.
Edge detection algorithms implemented on Bi-i cellular vision system
Fethullah Karabiber, Sabri Arik
The Bi-i (Bio-inspired) Cellular Vision system is built mainly on Cellular Neural/Nonlinear Network (CNN) type (ACE16k) and Digital Signal Processing (DSP) type microprocessors. CNN theory, proposed by Chua, has advanced properties for image processing applications. In this study, edge detection algorithms are implemented on the Bi-i Cellular Vision System; extracting the edges of an image correctly and quickly is of crucial importance for image processing applications. A threshold-gradient-based edge detection algorithm is implemented on the ACE16k microprocessor. In addition, a pre-processing operation is realized using an image enhancement technique based on the Laplacian operator, and morphological operations are performed as post-processing. A Sobel edge detection algorithm is performed on the DSP by convolving Sobel operators with the image. The performances of the edge detection algorithms are compared using visual inspection and timing analysis. Experimental results show that the ACE16k has great computational power and that the Bi-i Cellular Vision System is well suited to applying image processing algorithms in real time.
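The Sobel step run on the DSP can be sketched as a plain gradient-magnitude threshold. This is a generic reference implementation, not the Bi-i code; the border handling and default threshold are our choices:

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Sobel gradient magnitude via explicit 3x3 sliding-window sums,
    followed by a threshold (the gradient-based edge detection step)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode='edge')  # replicate borders
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            block = p[di:di + h, dj:dj + w]
            gx += kx[di, dj] * block
            gy += ky[di, dj] * block
    mag = np.hypot(gx, gy)
    return mag >= thresh
```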
Interactive Paper Session
Nonlinear manifold based discriminant analysis for face recognition
Manifolds are mathematical spaces whose points have Euclidean neighborhoods but whose global structure may be more complex. A one-dimensional manifold has neighborhoods that resemble a line; a two-dimensional one, a plane. Considering the one-dimensional case, most data neighborhoods cannot be represented optimally by a straight line; a higher-order nonlinear curve is better suited to represent most data. A learning algorithm to model such a pipeline, based on the Fisher Linear Discriminant (FLD) and least squares estimation, is presented in this paper. Face patterns are known to show continuous variability, yet face images of one individual tend to cluster together and can be considered a neighborhood. Such similar patterns form a pipeline in state space that can be used for pattern classification, and multiple patterns can be trained by having separate lines for each pattern. Face points are projected onto a low-dimensional mean nonlinear pipeline, providing an easy, intuitive way to place new points. Given a test face, the classification problem is then simplified to checking the nearest neighbors, which can be done by finding the pipeline at minimum distance from the test point. The proposed representation of a face image results in improved accuracy when compared to the classical point representation.
Semantic home video categorization
Hyun-Seok Min, Young Bok Lee, Wesley De Neve, et al.
Nowadays, a strong need exists for the efficient organization of an increasing amount of home video content. To create an efficient system for the management of home video content, it is necessary to categorize it in a semantic way. A significant amount of research has already been dedicated to semantic video categorization, but conventional approaches often rely on unnecessary concepts and complicated algorithms that are not suited to home video. To overcome this problem, this paper proposes a novel home video categorization method that adopts semantic home photo categorization. To apply home photo categorization to home video, we segment the video content into shots and extract key frames that represent each shot. To extract the semantics from the key frames, we divide each key frame into ten local regions and extract low-level features, based on which the semantics of a particular key frame are predicted. To verify the usefulness of the proposed method, experiments were performed with 70 home video sequences, labeled with concepts that are part of the MPEG-7 VCE2 dataset. For these sequences, the proposed system achieved a recall of 77% and an accuracy of 78%.