Proceedings Volume 7873

Computational Imaging IX

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 7 February 2011
Contents: 10 Sessions, 33 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2011
Volume Number: 7873

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter
  • Inverse Problems
  • Image and Video Analysis
  • Image and Video Analysis II
  • Imaging for Aerospace Applications
  • Image Processing for Mobile Device Applications
  • Special Session: Advance Methods in Tomographic Imaging I
  • Special Session: Advance Methods in Tomographic Imaging II
  • Advanced Methods in Inverse Problems
  • Interactive Paper Session
Front Matter
Front Matter: Volume 7873
This PDF file contains the front matter associated with SPIE Proceedings Volume 7873, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Inverse Problems
Myopic sparse image reconstruction with application to MRFM
Se Un Park, Nicolas Dobigeon, Alfred O. Hero
We propose a solution to the image deconvolution problem where the convolution operator or point spread function (PSF) is assumed to be only partially known. Small perturbations generated from the model are exploited to produce a few principal components explaining the uncertainty in a high dimensional space. Specifically, we assume the image is sparse corresponding to the natural sparsity of magnetic resonance force microscopy (MRFM). Our approach adopts a Bayesian Metropolis-within-Gibbs sampling framework. The performance of our Bayesian myopic algorithm is superior to previously proposed algorithms such as the alternating minimization (AM) algorithm for sparse images. We illustrate our myopic algorithm on real MRFM tobacco virus data.
Seismic imaging of transmission overhead line structure foundations
Denis Vautrin, Matthieu Voorons, Jérôme Idier, et al.
The goal of the presented work is to determine the shape of transmission overhead line structure foundations. A seismic imaging technique is used. It is formulated as an inverse scattering problem where two-dimensional maps of the pressure- and shear-wave velocities are estimated. The inversion amounts to a large-scale, nonlinear programming problem. It is rendered all the more difficult by the large dimensions of the scattering object and the high velocity contrasts. In this context, our goal is to propose an inversion scheme that produces precise images with an acceptable computational effort. This goal is met by combining the following elements: (i) minimization of a penalized least-square criterion with a quasi-Newton algorithm, (ii) frequency domain formulation in order to introduce the measured data progressively and (iii) introduction of a logarithmic change of variables for the quantities to be estimated. The latter point is our main contribution. Its role is to counterbalance the lack of sensitivity of the criterion and its introduction results in a significant acceleration of the inversion process.
Inverse problems for cryo electron microscopy of viruses: randomly oriented projection images of random 3D structures in noise
Qiu Wang, Peter C. Doerschuk
Instances of biological macromolecular complexes that have identical chemical constituents may not have the same geometry due to, for example, flexibility. Cryo electron microscopy provides one noisy projection image of each of many instances of a complex where the projection directions for the different instances are random. The noise is sufficiently severe (SNR << 1) that the projection direction for a particular image cannot be easily estimated from the individual image. The goal is to determine the 3-D geometry of the complex (the 3-D distribution of electron scattering intensity), which requires fusing information from these many images of many complexes. In order to describe the geometric heterogeneity of the complexes, the complex is described as a weighted sum of basis functions where the weights are random. In order to get tractable algorithms, the weights are modeled as Gaussian random variables with unknown statistics and the noise is modeled as additive Gaussian random variables with unknown covariance. The statistics of the weights and the statistics of the noise are jointly estimated by maximum likelihood by a generalized expectation maximization algorithm. An example using these ideas on images of Flock House Virus is described.
Inverse problems arising in different synthetic aperture radar imaging systems and a general Bayesian approach for them
Sha Zhu, Ali Mohammad-Djafari, Xiang Li, et al.
Synthetic Aperture Radar (SAR) imaging systems are nowadays very common imaging techniques in remote sensing and environmental surveying. There are different acquisition modes: spotlight, stripmap, scan; different geometries: mono-, bi- and multi-static; and a variety of specific applications: interferometric SAR (InSAR), polarimetric SAR, etc. In this paper, first a common inverse problem framework for all of them is given, and then the basics of SAR imaging and the classical deterministic inversion methods are presented. To overcome the inadequacies of deterministic methods, a general probabilistic Bayesian estimation method is proposed for solving image reconstruction problems. In particular, two priors which simply allow the automated determination of the hyperparameters in a Type-II likelihood framework are considered. Finally, the performance of the proposed methods on synthetic data is presented.
Medical image enhancement using resolution synthesis
We introduce a post-processing approach to improve the quality of CT reconstructed images. The scheme is adapted from the resolution-synthesis (RS) [1] interpolation algorithm. In this approach, we consider the input image, scanned at a particular dose level, as a degraded version of a high quality image scanned at a high dose level. Image enhancement is achieved by predicting the high quality image by classification-based linear regression. To improve the robustness of our scheme, we also apply the minimum description length principle to determine the optimal number of predictors to use in the scheme, and ridge regression to regularize the design of the predictors. Experimental results show that our scheme is effective in reducing the noise in images reconstructed from filtered back projection without significant loss of image details. Alternatively, our scheme can also be applied to reduce dose while maintaining image quality at an acceptable level.
Image and Video Analysis
An open level set framework for image segmentation and restoration using the Mumford and Shah model
Rami Mohieddine, Luminita A. Vese
In two dimensions, the Mumford and Shah functional for image segmentation and regularization [15] has minimizers (u,K), where u is a piecewise-smooth approximation of the image data f, and K represents the set of discontinuities of u (a union of curves). Theoretically, the edge set K could include both closed and open curves. The current level set and piecewise-smooth Mumford-Shah based segmentation algorithms [4, 23, 24] can only detect objects with closed edges, which are boundaries of open sets. We propose an efficient Mumford-Shah and level set based algorithm for segmenting images with edges which are made up of open curves or crack-tips. By adapting Smereka's open level set formulation [21] to variational problems, we are able to extend the current piecewise-smooth and level-set based image segmentation methods, such as [4, 23, 24], to the case of open curve segmentation. The algorithm retains many of the advantages of using level sets, such as well-defined boundaries and ability to change topology. We solve the resulting Euler-Lagrange equations by Sobolev H^1 gradient descent, avoiding instability and the need for additional regularization of the level set functions, while also accelerating convergence to the reconstructed image. Finally, we present the numerical implementation and experimental results on various noisy images.
Fisher information embedding for video indexing and retrieval
In this paper, we present a novel information embedding based approach for video indexing and retrieval. The high dimensionality of video sequences still poses a major challenge for video indexing and retrieval. Unlike traditional dimensionality reduction techniques such as Principal Component Analysis (PCA), we embed the video data into a low dimensional statistical manifold obtained by applying manifold learning techniques to the information geometry of video feature probability distributions (PDFs). We estimate the PDF of the video features using histogram estimation and Gaussian mixture models (GMM), respectively. By calculating the similarities between the embedded trajectories, we demonstrate on real world data that the proposed approach outperforms traditional approaches to video indexing and retrieval.
Segmentation assisted food classification for dietary assessment
Fengqing Zhu, Marc Bosch, TusaRebecca Schap, et al.
Accurate methods and tools to assess food and nutrient intake are essential for understanding the association between diet and health. Preliminary studies have indicated that the use of a mobile device with a built-in camera to obtain images of the food consumed may provide a less burdensome and more accurate method for dietary assessment. We are developing methods to identify food items using a single image acquired from the mobile device. Our goal is to automatically determine the regions in an image where a particular food is located (segmentation) and correctly identify the food type based on its features (classification or food labeling). Images of foods are segmented using Normalized Cuts based on intensity and color. Color and texture features are extracted from each segmented food region. Classification decisions for each segmented region are made using support vector machine methods. The segmentation of each food region is refined based on feedback from the output of the classifier to provide a more accurate estimate of the quantity of food consumed.
Image and Video Analysis II
Sparse Fisher's linear discriminant analysis
Hasib Siddiqui, Hau Hwang
Fisher's linear discriminant analysis (LDA) is traditionally used in statistics and pattern recognition to linearly project high-dimensional observations from two or more classes onto a low-dimensional feature space before classification. The computational complexity of the linear feature extraction method increases linearly with dimensionality of the observation samples. For high-dimensional signals, high computational cost can render the method unsuitable for implementation in real time. In this paper, we propose sparse Fisher's linear discriminant analysis, which allows one to search for low-dimensional subspaces, spanned by sparse discriminant vectors, in the high-dimensional space of observation samples from two classes. The sparsity constraints on the space of potential discriminant feature vectors are enforced using the sparse matrix transform (SMT) framework, proposed recently for regularized covariance estimation. Classical Fisher's LDA is a special case of sparse Fisher's LDA when the sparsity constraints on the feature vectors in the estimation algorithm are fully relaxed. The number of non-zero components in a discriminant direction estimated using our proposed discriminant analysis technique is tunable; this feature can be used to control the compromise between computational complexity and accuracy of the eventual classification algorithm. The experimental results discussed in the manuscript demonstrate the effectiveness of the new method for low-complexity data-classification applications.
Imaging for Aerospace Applications
Characterization of moving dust particles
A large depth-of-field Particle Image Velocimeter (PIV) has been developed at NASA GSFC to characterize dynamic dust environments on planetary surfaces. This instrument detects and senses lofted dust particles. To characterize a dynamic planetary dust environment, the instrument would have to operate for at least several minutes during an observation period, easily producing more than a terabyte of data per observation. Given current technology, this amount of data would be very difficult to store onboard a spacecraft and downlink to Earth. We have been developing an autonomous image analysis algorithm architecture for the PIV instrument to greatly reduce the amount of data that it has to store and downlink. The algorithm analyzes PIV images and reduces the image information down to only the particle measurement data we are interested in receiving on the ground - typically reducing the amount of data to be handled by more than two orders of magnitude. We give a general description of the PIV algorithms and describe in detail the algorithm for estimating the direction and velocity of the traveling particles, which was done by taking advantage of the optical properties of moving dust particles along with image processing techniques.
A super-resolution algorithm for enhancement of flash lidar data
Alexander Bulyshev, Michael Vanek, Farzin Amzajerdian, et al.
A novel method for enhancement of the spatial resolution of 3-dimensional Flash LIDAR images is proposed for generation of elevation maps of terrain from a moving platform. NASA recognizes the Flash LIDAR technology as an important tool for enabling safe and precise landing in future unmanned and crewed lunar and planetary missions. The ability of the Flash LIDAR to generate 3-dimensional maps of the landing site area during the final stages of the descent phase for detection of hazardous terrain features such as craters, rocks, and steep slopes is under study in the frame of the Autonomous Landing and Hazard Avoidance Technology (ALHAT) project. Since single frames of existing Flash LIDAR systems are not sufficient to build a map of the entire landing site with acceptable spatial resolution and precision, a super-resolution approach utilizing multiple frames has been developed to overcome the instrument's limitations. Performance of the super-resolution algorithm has been analyzed through a series of simulation runs obtained from a high fidelity Flash LIDAR model and a high resolution synthetic lunar elevation map. For each simulation run, a sequence of Flash LIDAR frames is recorded and processed as the spacecraft descends toward the landing site. Simulation runs having different trajectory profiles and varying LIDAR look angles of the terrain are also analyzed. The results show that adequate levels of accuracy and precision are achieved for detecting hazardous terrain features and identifying safe areas of the landing site.
Image registration for stability testing of MEMS
Image registration, or alignment of two or more images covering the same scenes or objects, is of great interest in many disciplines such as remote sensing, medical imaging, astronomy, and computer vision. In this paper, we introduce a new application of image registration algorithms. We demonstrate how through a wavelet based image registration algorithm, engineers can evaluate stability of Micro-Electro-Mechanical Systems (MEMS). In particular, we applied image registration algorithms to assess alignment stability of the MicroShutters Subsystem (MSS) of the Near Infrared Spectrograph (NIRSpec) instrument of the James Webb Space Telescope (JWST). This work introduces a new methodology for evaluating stability of MEMS devices to engineers as well as a new application of image registration algorithms to computer scientists.
Image Processing for Mobile Device Applications
Capacitive touch sensing: signal and image processing algorithms
Capacitive touch sensors have been in use for many years, and recently gained center stage with their ubiquitous use in smartphones. In this work we analyze the most common method of projected capacitive sensing, that of absolute capacitive sensing, together with the most common sensing pattern, that of diamond-shaped sensors. After a brief introduction to the problem, and the reasons behind its popularity, we formulate the problem as a reconstruction from projections. We derive analytic solutions for two simple cases: circular finger on a wire grid, and square finger on a square grid. The solutions give insight into the ambiguities of finding finger location from sensor readings. The main contribution of our paper is the discussion of interpolation algorithms including simple linear interpolation, curve fitting (parabolic and Gaussian), filtering, general look-up-table, and combinations thereof. We conclude with observations on the limits of the present algorithmic methods, and point to possible future research.
Denoising, deblurring, and superresolution in mobile phones
Filip Šroubek, Jan Kamenický, Jan Flusser
Current mobile phones and web cameras are equipped with low-budget digital cameras and very poor optics. Consequently, images acquired by such cameras are deteriorated by noise and blur, and have effective resolution lower than the number of pixels. Recovering a noise-free, sharp and high-resolution image from a single input image is a heavily ill-posed problem. We propose a novel algorithm which takes a set of acquired images from low-budget cameras and simultaneously performs registration, denoising, deblurring and resolution enhancement. The amount of each depends on the characteristics of the input set. In order to achieve all tasks in one framework, we formulate the image restoration as an energy minimization problem. Special attention is paid to implementation, so that a fast algorithm is achieved. We demonstrate performance of the proposed algorithm on a system which comprises a camera in a mobile phone (or web camera) and a PC. The mobile phone acquires images, connects to the PC via a wireless network, sends the images and shows the output after it is calculated on the PC.
Arabic word recognizer for mobile applications
Nitin Khanna, Golnaz Abdollahian, Ben Brame, et al.
When traveling in a region where the local language is not written using a "Roman alphabet," translating written text (e.g., documents, road signs, or placards) is a particularly difficult problem since the text cannot be easily entered into a translation device or searched using a dictionary. To address this problem, we are developing the "Rosetta Phone," a handheld device (e.g., PDA or mobile telephone) capable of acquiring an image of the text, locating the region (word) of interest within the image, and producing both an audio and a visual English interpretation of the text. This paper presents a system targeted for interpreting words written in Arabic script. The goal of this work is to develop an autonomous, segmentation-free Arabic phrase recognizer, with computational complexity low enough to deploy on a mobile device. A prototype of the proposed system has been deployed on an iPhone with a suitable user interface. The system was tested on a number of noisy images, in addition to the images acquired from the iPhone's camera. It identifies Arabic words or phrases by extracting appropriate features and assigning "codewords" to each word or phrase. On a dictionary of 5,000 words, the system uniquely mapped (word-image to codeword) 99.9% of the words. The system has an 82% recognition accuracy on images of words captured using the iPhone's built-in camera.
Volume estimation using food specific shape templates in mobile image-based dietary assessment
Junghoon Chae, Insoo Woo, SungYe Kim, et al.
As obesity concerns mount, dietary assessment methods for prevention and intervention are being developed. These methods include recording, cataloging and analyzing daily dietary records to monitor energy and nutrient intakes. Given the ubiquity of mobile devices with built-in cameras, one possible means of improving dietary assessment is through photographing foods and inputting these images into a system that can determine the nutrient content of foods in the images. One of the critical issues in such an image-based dietary assessment tool is the accurate and consistent estimation of food portion sizes. The objective of our study is to automatically estimate food volumes through the use of food specific shape templates. In our system, users capture food images using a mobile phone camera. Based on information (i.e., food name and code) determined through food segmentation and classification of the food images, our system chooses a particular food template shape corresponding to each segmented food. Finally, our system reconstructs the three-dimensional properties of the food shape from a single image by extracting feature points in order to size the food shape template. By employing this template-based approach, our system automatically estimates food portion size, providing a consistent method for estimating food volume.
Special Session: Advance Methods in Tomographic Imaging I
A hybrid approach to imaging and anomaly characterization from dual energy CT data
In this paper we present a novel polychromatic dual energy algorithm with an emphasis on detection of anomalies whose physical properties are assumed to be known with some level of uncertainty. We assume that material characteristics are defined by energy independent Compton scatter and photoelectric absorption coefficients. Uncertainty in material properties is characterized by elliptical constraint regions in the Compton scatter-photoelectric coefficient space. We employ an image based iterative reconstruction algorithm to produce images of Compton scatter and photoelectric absorption coefficients of the medium. The solution is obtained via a nonlinear optimization process where the prior knowledge about the characteristics of the object of interest is imposed as hard constraints. We also introduce a novel gradient-based similarity regularizer to cope with physics based limitations on accurately reconstructing the photoelectric absorption coefficient component. Our approach is based on a parametric level-set representation of the characteristic function of the object. For the reconstruction of the background we use a basis expansion approach using compactly supported exponential radial basis functions. Numerical results show that the algorithm gives results superior to the conventional filtered back projection (FBP) dual energy method in the presence of noise.
Robust multifrequency inversion in terahertz diffraction tomography
Ke Chen, David A. Castañón
Multi-frequency terahertz imaging has received much attention in recent years due to its ability to observe unique spectral characteristics of chemicals, which can be used in numerous applications such as explosives detection. Short-pulse terahertz sources can provide broadband excitation, but current approaches for image formation based on diffraction tomography construct images independently for each frequency. This results in a lack of resolution at lower frequencies, and lower signal-to-noise reconstructions. In this paper, we explore different techniques for joint image formation using multiple frequencies for enhanced detection. Among these are techniques that use prior information on spectral characteristics of materials of interest to coherently combine information from multiple frequencies, as well as robust techniques that assume incomplete or inaccurate prior knowledge of spectral signatures. We explore the relative performance of these techniques on image reconstruction and object recognition tasks using numerical simulations.
Classification-aware dimensionality reduction methods for explosives detection using multi-energy x-ray computed tomography
Limor Eger, Prakash Ishwar, W. Clem Karl, et al.
Multi-Energy X-ray Computed Tomography (MECT) is a non-destructive scanning technology in which multiple energy-selective measurements of the X-ray attenuation can be obtained. This provides more information about the chemical composition of the scanned materials than single-energy technologies and potential for more reliable detection of explosives. We study the problem of discriminating between explosives and non-explosives using low-dimensional features extracted from the high-dimensional attenuation versus energy curves of materials. We study various linear dimensionality reduction methods and demonstrate that the detection performance can be improved by using more than two features and when using features different than the standard photoelectric and Compton coefficients. This suggests the potential for improved detection performance relative to conventional dual-energy X-ray systems.
Special Session: Advance Methods in Tomographic Imaging II
Constrain static target kinetic iterative image reconstruction for 4D cardiac CT imaging
Iterative image reconstruction offers improved signal to noise properties for CT imaging. A primary challenge with iterative methods is the substantial computation time. This computation time is even more prohibitive in 4D imaging applications, such as cardiac gated or dynamic acquisition sequences. In this work, we propose only updating the time-varying elements of a 4D image sequence while constraining the static elements to be fixed or slowly varying in time. We test the method with simulations of 4D acquisitions based on measured cardiac patient data from a) a retrospective cardiac-gated CT acquisition and b) a dynamic perfusion CT acquisition. We target the kinetic elements with one of two methods: 1) position a circular ROI on the heart, assuming area outside ROI is essentially static throughout imaging time; and 2) select varying elements from the coefficient of variation image formed from fast analytic reconstruction of all time frames. Targeted kinetic elements are updated with each iteration, while static elements remain fixed at initial image values formed from the reconstruction of data from all time frames. Results confirm that the computation time is proportional to the number of targeted elements; our simulations suggest that <30% of elements need to be updated in each frame leading to >3 times reductions in reconstruction time. The images reconstructed with the proposed method have matched mean square error with full 4D reconstruction. The proposed method is amenable to most optimization algorithms and offers the potential for significant computation improvements, which could be traded off for more sophisticated system models or penalty terms.
Kinetic parameter reconstruction for motion compensation in transmission tomography
Model based iterative reconstruction (MBIR) algorithms have recently been applied to computed tomography and demonstrated superior image quality. This algorithmic framework also provides the flexibility to incorporate more sophisticated models of the data acquisition process. In this paper, we present the kinetic parameter iterative reconstruction (KPIR) algorithm, which estimates voxel values as a function of time in the MBIR framework. We introduce a parametric kinetic model for each voxel, and estimate the kinetic parameters directly from the data. Results on a phantom study and clinical data show that the proposed method can significantly reduce motion artifacts in the reconstruction.
Bayesian estimation with Gauss-Markov-Potts priors in optical diffraction tomography
In this paper, Optical Diffraction Tomography (ODT) is considered as an inverse scattering problem. The goal is to retrieve a map of the electromagnetic parameters of an unknown object from measurements of the scattered electric field that results from its interaction with a known interrogating wave. This is done in a Bayesian estimation framework. A Gauss-Markov-Potts prior appropriately translates the a priori knowledge that the object is made of a finite number of homogeneous materials distributed in compact regions. First, we express the a posteriori distributions of all the unknowns and then a Gibbs sampling algorithm is used to generate samples and estimate the posterior mean of the unknowns. Some preliminary results, obtained by applying the inversion algorithm to experimental laboratory controlled data, will illustrate the performances of the proposed method which is compared to the more classical Contrast Source Inversion method (CSI) developed in a deterministic framework.
Advanced Methods in Inverse Problems
Accelerating sparse reconstruction for fast and precomputable system matrix inverses
Signal reconstruction using an l1-norm penalty has proven to be valuable in edge-preserving regularization as well as in sparse reconstruction problems. The developing field of compressed sensing typically exploits this approach to yield sparse solutions in the face of incoherent measurements. Unfortunately, sparse reconstruction generally requires significantly more computation because of the nonlinear nature of the problem and because the most common solutions damage any structure that may otherwise exist in the system matrix. In this work we adopt a majorizing function for the absolute value term that can be used with structured system matrices so that the regularization term in the matrix to be inverted does not destroy the structure of the original matrix. As a result, a system inverse can be precomputed and applied efficiently at each iteration to speed the estimation process. We demonstrate that this method can yield significant computational advantages when the original system matrix can be represented or decomposed into an efficiently applied singular value decomposition.
An expectation maximization solution for fusing 2D and 3D ladar data
LADAR (LAser Detection and Ranging) systems can be used to provide 2-D and 3-D images of scenes. Generally, 2-D images possess superior spatial resolution without range data due to the density of their focal plane arrays. A 3-D LADAR system can produce range to target data at each pixel, but lacks the 2-D system's superior spatial resolution. It is the goal of this work to develop an algorithm using an Expectation Maximization approach for fusing 2-D and 3-D LADAR data. The algorithm developed demonstrates both spatial and range resolution improvement using simulated 2-D and 3-D LADAR data.
Superresolution with the focused plenoptic camera
Todor Georgiev, Georgi Chunev, Andrew Lumsdaine
Digital images from a CCD or CMOS sensor with a color filter array must undergo a demosaicing process to combine the separate color samples into a single color image. This interpolation process can interfere with the subsequent superresolution process. Plenoptic superresolution, which relies on precise sub-pixel sampling across captured microimages, is particularly sensitive to such resampling of the raw data. In this paper we present an approach for superresolving plenoptic images that takes place at the time of demosaicing the raw color image data. Our approach exploits the interleaving provided by typical color filter arrays (e.g., Bayer filter) to further refine plenoptic sub-pixel sampling. Our rendering algorithm treats the color channels in a plenoptic image separately, which improves final superresolution by a factor of two. With appropriate plenoptic capture we show the theoretical possibility for rendering final images at full sensor resolution.
Content-preserving zoom-in view generation for surveillance videos
Kenji Watanabe, Naoko Nitta, Noboru Babaguchi
There are several zoom-in video display methods, including full-zoom and fisheye view, that magnify the regions of interest (ROIs). However, those methods usually discard or deform the remaining regions without considering their content. In this paper, we propose a method for generating a content-preserving zoom-in view which magnifies ROIs and at the same time preserves the content of the remaining regions. Targeting surveillance videos, our method first extracts moving objects from every input frame as ROIs. Then, an importance score is calculated for each pixel in the input frame based on its content to determine where deformation, which may destroy the content, should be avoided. Finally, a mapping problem from the input frame to the zoom-in view with respect to the importance score is formulated to deform less important regions more than the important ones. Experiments are conducted to study the effectiveness of considering the content importance. We also compare the results of our method with those of other methods: fisheye view and a method using uniform scaling and seam carving.
Interactive Paper Session
Colour image compression by grey to colour conversion
Mark S. Drew, Graham D. Finlayson, Abhilash Jindal
Instead of de-correlating image luminance from chrominance, some work has exploited the correlation between the luminance component of an image and its chromatic components, or the correlation between colour components, for colour image compression. In one approach, the Green colour channel was taken as a base, and the other colour channels or their DCT subbands were approximated as polynomial functions of the base inside image windows. This paper points out that we can do better if we introduce an addressing scheme into the image description such that similar colours are grouped together spatially. With a Luminance component base, we test several colour spaces and rearrangement schemes, including segmentation, and settle on a log-geometric-mean colour space. Along with PSNR versus bits-per-pixel, we found that spatially-keyed s-CIELAB colour error better identifies problem regions. Instead of segmentation, we found that rearranging on sorted chromatic components has almost equal performance and better compression. Here, we sort on each of the chromatic components and separately encode windows of each. The result consists of the original greyscale plane plus the polynomial coefficients of windows of rearranged chromatic values, which are then quantized. The simplicity of the method yields a fast scheme for colour image and video compression, with excellent results.
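The idea of approximating a chromatic channel as a polynomial function of a base channel inside a window can be sketched as follows. This is an illustration under stated assumptions, not the authors' codec: it uses a degree-one polynomial (the abstract does not give the degree), takes the window as two flat lists of samples, and omits sorting, rearrangement, and quantization. The names `fit_line` and `encode_window` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept.

    Returns (slope, intercept). If xs is constant, the best fit is the
    horizontal line through the mean of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and equally long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode_window(luma, chroma):
    """Describe one window's chroma by two coefficients plus the worst error.

    luma and chroma are flat lists holding the window's samples in the
    same order. The returned residual is the largest absolute error of
    the linear prediction, a rough indicator of how well the window fits.
    """
    slope, intercept = fit_line(luma, chroma)
    residual = max(abs(slope * x + intercept - y) for x, y in zip(luma, chroma))
    return slope, intercept, residual


if __name__ == "__main__":
    # Hypothetical window in which chroma depends linearly on luminance.
    print(encode_window([0, 1, 2, 3], [1, 3, 5, 7]))
```

Storing two coefficients per window instead of every chromatic sample is where the compression in such a scheme would come from; windows with a large residual are those where a linear model is a poor fit.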
Study of recognizing human motion observed from an arbitrary viewpoint based on decomposition of a tensor containing multiple view motions
Takayuki Hori, Jun Ohya, Jun Kurumisawa
We propose a tensor-decomposition-based algorithm that recognizes an observed action performed by an unknown person and seen from an unknown viewpoint, neither of which is included in the database. Our previous research addressed motion recognition from a single viewpoint. In this paper, we extend our approach to human motion recognition from an arbitrary viewpoint. To achieve this, we build a tensor database, a multi-dimensional array with dimensions corresponding to human models, viewpoint angles, and action classes. The value of the tensor for a given combination of human silhouette model, viewpoint angle, and action class is the series of mesh feature vectors calculated for each frame sequence. To recognize human motion, the actions of one of the persons in the tensor are replaced by the synthesized actions. Then, the core tensor for the replaced tensor is computed. This process is repeated for each combination of action, person, and viewpoint. For each iteration, the difference between the replaced and original core tensors is computed. The assumption that gives the minimal difference is the action recognition result. The recognition results show the validity of our proposed method, which is experimentally compared with the Nearest Neighbor rule. Our proposed method is stable, as each action was recognized with over 75% accuracy.
Visual real-time detection, recognition and tracking of ground and airborne targets
Levente Kovács, Csaba Benedek
This paper presents methods and algorithms for real-time visual target detection, recognition and tracking, both in the case of ground-based objects (surveyed from a moving airborne imaging sensor) and flying targets (observed from a ground-based or vehicle-mounted sensor). The methods are highly parallelized and partially implemented on GPU, with the goal of real-time speeds even in the case of multiple target observations. The focus is on real-time applicability. The methods use single-camera observations, providing a passive and expendable alternative to expensive and/or active sensors. Use cases involve perimeter defense and surveillance situations, where passive detection and observation are a priority (e.g. aerial surveillance of a compound, detection of reconnaissance drones, etc.).
Illuminant color estimation by hue categorization based on gray world assumption
Harumi Kawamura, Shunichi Yonemura, Jun Ohya, et al.
This paper proposes a gray-world-assumption-based method for estimating an illuminant color from an image by hue categorization. The gray world assumption hypothesizes that the average color of all the objects in a scene is gray. However, it is difficult to estimate an illuminant color correctly if the colors of the objects in a scene are dominated by certain colors. To solve this problem, our method uses the opponent-color property that the average of a pair of opponent colors is gray. Thus our method roughly categorizes the colors derived from the image based on hue and selects them one by one from the hue categories until the selected colors satisfy the gray world assumption. In our experiments, we used three kinds of illuminants (i.e., CIE standard illuminants A and D65, and a fluorescent light) and two kinds of data sets. One data set satisfies the gray world assumption, and the other does not. Experimental results show that the estimated illuminants are closer to the correct ones than those obtained with the conventional method, and that the estimation errors of our method for both CIE standard illuminants A and D65 are within the barely noticeable difference in human color perception.
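For reference, the conventional gray world estimate that this abstract builds on can be sketched in a few lines. This is the plain baseline only, not the hue-categorization method the paper proposes. It assumes linear RGB pixels given as a list of 3-tuples, and the function names `gray_world_illuminant` and `white_balance` are illustrative.

```python
def gray_world_illuminant(pixels):
    """Estimate the illuminant as the normalized mean RGB of the image.

    Under the gray world assumption the mean scene reflectance is gray,
    so any color cast in the mean is attributed to the illuminant.
    The result is a chromaticity: its three components sum to 1.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    total = sum(mean)
    if total == 0:
        raise ValueError("image is entirely black")
    return tuple(m / total for m in mean)


def white_balance(pixels, illuminant):
    """Apply diagonal (per-channel) gains that map the estimate to neutral.

    A neutral estimate (1/3, 1/3, 1/3) leaves the pixels unchanged.
    """
    if min(illuminant) <= 0:
        raise ValueError("illuminant components must be positive")
    gains = [1.0 / (3.0 * c) for c in illuminant]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]


if __name__ == "__main__":
    # Two reflectances that average to gray, lit by a reddish light.
    light = (1.0, 0.8, 0.6)
    scene = [(0.2, 0.5, 0.8), (0.8, 0.5, 0.2)]
    observed = [tuple(r * l for r, l in zip(px, light)) for px in scene]
    estimate = gray_world_illuminant(observed)
    print(estimate)
    print(white_balance(observed, estimate))
```

If the scene were dominated by one color, the mean would no longer be gray and this estimate would be biased, which is the failure case the abstract sets out to address.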
Super-resolved refocusing with a plenoptic camera
Zhiliang Zhou, Yan Yuan, Xiangli Bin, et al.
This paper presents an approach to enhance the resolution of refocused images by super-resolution methods. In plenoptic imaging, we demonstrate that the raw sensor image can be divided into a number of low-resolution angular images with sub-pixel shifts between each other. The sub-pixel shift, which defines the super-resolving ability, is mathematically derived by considering the plenoptic camera as an equivalent camera array. We implement a simulation to demonstrate the imaging process of a plenoptic camera. A high-resolution image is then reconstructed using maximum a posteriori (MAP) super-resolution algorithms. Without other degradation effects in simulation, the super-resolved image achieves a resolution as high as predicted by the proposed model. We also build an experimental setup to acquire light fields. With traditional refocusing methods, the image is rendered at a rather low resolution. In contrast, we implement the super-resolved refocusing methods and recover an image with more spatial details. To evaluate the performance of the proposed method, we finally compare the reconstructed images using image quality metrics such as peak signal-to-noise ratio (PSNR).
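The PSNR metric mentioned at the end of this abstract has a standard definition, PSNR = 10 log10(peak^2 / MSE). The sketch below assumes two images flattened into equally long lists of samples and a default peak value of 255 for 8-bit data. The function name `psnr` is illustrative, and nothing here reproduces the paper's measurements.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in decibels between two sample lists.

    Returns math.inf when the two inputs are identical, because the
    mean squared error is then zero.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    truth = [10, 20, 30, 40]
    reconstruction = [11, 19, 31, 39]  # every sample is off by one level
    print(psnr(truth, reconstruction))
```

A higher value indicates a reconstruction closer to the reference, so the metric suits comparisons between a conventionally refocused image and a super-resolved one when a ground-truth image is available.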
Compressive through-focus wavefield imaging
Edwin A. Marengo, Oren Mangoubi
Optical sensing and imaging applications often suffer from a combination of low-resolution object reconstructions and a large number of sensors which, depending on frequency, can be quite expensive or bulky. It is therefore desirable to minimize the number of sensors (which reduces cost) for a given target resolution level (image quality) and permissible total sensor array size (compactness). Equivalently, for given imaging hardware one seeks to maximize image quality, which in turn means fully exploiting the available sensors as well as all priors about the properties of the sought-after objects, such as sparsity properties and others, which can be incorporated into reconstruction schemes. This paper proposes a compressive-sensing-based method to process through-focus optical field data captured at a sensor array. The proposed approach treats in-focus and out-of-focus data as projective measurements for compressive sensing, and assumes that the objects are sparse under known transformations applied to them. The proposed compressive through-focus imaging is illustrated for both coherent and incoherent light. The results illustrate the combined use of through-focus imaging and compressive sensing techniques, and provide insight into the information in in-focus and out-of-focus data for coherent as well as incoherent light.