Accurate estimation of principal point using three mutually orthogonal horizon lines
Author(s):
Abdulrahman S. Alturki;
John S. Loomis
Accurate estimation of the principal point (PP) is critical for many camera calibration applications. Taking the PP as the intersection of the optical axis with the image plane, it is possible to identify this location as the orthocenter of three orthogonal vanishing points. This paper presents a method for accurately identifying the three vanishing points as the intersections of three horizon lines. The technique utilizes groups of images of a checkerboard test pattern collected from three orthogonal planes. Each group consists of images of the checkerboard rotated to different positions in the same plane, which can be used to identify a single horizon line. This is achieved by locating checkerboard corner points as saddle points, and using a Hough transform to group them into rows and columns. The vanishing points generated from the rows and columns lie along the same horizon line. Applying this technique to the rotated images within the group allows accurate estimation of the horizon line. Repeating this for image groups on three orthogonal planes creates horizon lines that effectively intersect at three orthogonal vanishing points, allowing for identification of the PP. The advantage of this technique is that it indirectly finds the three orthogonal vanishing points using horizon lines that are accurately found using fits to multiple vanishing points. Experiments with this technique indicate that it significantly reduces the error in finding the PP.
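As an illustration of the final step described in this abstract, the following Python sketch computes the orthocenter of the triangle formed by three (hypothetical) orthogonal vanishing points, which is then taken as the principal point estimate; the helper name and the example coordinates are ours, not the authors'.

import numpy as np

def orthocenter(v1, v2, v3):
    """Orthocenter of the triangle formed by three 2D points.

    Intersects two altitudes: the altitude through v1 is perpendicular to the
    opposite side (v3 - v2), and similarly for v2.
    """
    v1, v2, v3 = (np.asarray(p, dtype=float) for p in (v1, v2, v3))
    d1 = v3 - v2          # side opposite v1
    d2 = v1 - v3          # side opposite v2
    # Each altitude satisfies d . x = d . vertex; solve the 2x2 system
    A = np.vstack([d1, d2])
    b = np.array([d1 @ v1, d2 @ v2])
    return np.linalg.solve(A, b)

# Hypothetical vanishing points (in pixels) estimated from three horizon lines
vp_x, vp_y, vp_z = (2510.0, 980.0), (-1830.0, 1015.0), (640.0, -3950.0)
print("principal point estimate:", orthocenter(vp_x, vp_y, vp_z))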
Automatic Mexican sign language and digits recognition using normalized central moments
Author(s):
Francisco Solís;
David Martínez;
Oscar Espinosa;
Carina Toxqui
This work presents a framework for automatic Mexican sign language and digits recognition based on a computer vision system using normalized central moments and artificial neural networks. Images are captured by a digital IP camera, with four LED reflectors and a green background used in order to reduce computational costs and avoid the need for special gloves. 42 normalized central moments are computed per frame and fed to a Multi-Layer Perceptron to recognize each database. Four versions per sign and digit were used in the training phase. Recognition rates of 93% and 95% were achieved for Mexican sign language and digits, respectively.
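The abstract reports 42 normalized central moments per frame but does not detail how they are assembled, so the following sketch only illustrates the basic feature-extraction step: OpenCV's cv2.moments returns the seven third-order normalized central moments (nu_pq) of a binary hand silhouette, assuming the green background has already been keyed out. The file name and threshold values are hypothetical.

import cv2
import numpy as np

def normalized_central_moments(mask):
    """Return the third-order normalized central moments nu_pq of a binary mask."""
    m = cv2.moments(mask, binaryImage=True)
    keys = ["nu20", "nu11", "nu02", "nu30", "nu21", "nu12", "nu03"]
    return np.array([m[k] for k in keys])

# Hypothetical frame: segment the hand against the green background first
frame = cv2.imread("sign_frame.png")                    # placeholder file name
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
green = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))  # green-screen range (assumed)
hand = cv2.bitwise_not(green)                           # everything that is not green
features = normalized_central_moments(hand)
print(features)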
An effective hair detection algorithm for dermoscopic melanoma images of skin lesions
Author(s):
Damayanti Chakraborti;
Ravneet Kaur;
Scott Umbaugh;
Robert LeAnder
Dermoscopic images are obtained using the method of skin surface microscopy. Pigmented skin lesions are
evaluated in terms of texture features such as color and structure. Artifacts, such as hairs, bubbles, black
frames, ruler-marks, etc., create obstacles that prevent accurate detection of skin lesions by both clinicians and
computer-aided diagnosis. In this article, we propose a new algorithm for the automated detection of hairs, using an
adaptive, Canny edge-detection method, followed by morphological filtering and an arithmetic addition operation. The
algorithm was applied to 50 dermoscopic melanoma images. In order to ascertain this method’s relative detection
accuracy, it was compared to the Razmjooy hair-detection method [1], using segmentation error (SE), true detection rate
(TDR) and false positioning rate (FPR). The new method produced 6.57% SE, 96.28% TDR and 3.47% FPR, compared
to 15.751% SE, 86.29% TDR and 11.74% FPR produced by the Razmjooy method [1]. Because of the 7.27-9.99%
improvement in those parameters, we conclude that the new algorithm produces much better results for detecting thick, thin, dark and light hairs. The new method proposed here also shows an appreciable difference in the rate of detecting bubbles.
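A minimal sketch of the general pipeline named above (Canny edges followed by morphological filtering and an arithmetic addition), assuming OpenCV; the adaptive thresholding rule and kernel size are our own placeholders, not the parameters used in the paper.

import cv2
import numpy as np

def detect_hairs(gray):
    """Rough hair mask: Canny edges + morphological closing and dilation.

    Thresholds are chosen adaptively from the median intensity; the exact
    adaptation used in the paper is not specified in the abstract.
    """
    med = np.median(gray)
    lo, hi = int(max(0, 0.66 * med)), int(min(255, 1.33 * med))
    edges = cv2.Canny(gray, lo, hi)
    # A small structuring element links the thin edge fragments along hairs
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    return cv2.dilate(closed, kernel, iterations=1)

gray = cv2.imread("dermoscopy.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
hair_mask = detect_hairs(gray)
# "Arithmetic addition" step: overlay the mask on the original for inspection
overlay = cv2.add(gray, hair_mask)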
Comparison of algorithms for automatic border detection of melanoma in dermoscopy images
Author(s):
Sowmya Srinivasa Raghavan;
Ravneet Kaur;
Robert LeAnder
Melanoma is one of the most rapidly accelerating cancers in the world [1]. Early diagnosis is critical to an effective cure.
We propose a new algorithm for more accurately detecting melanoma borders in dermoscopy images. Proper border
detection requires eliminating occlusions like hair and bubbles by processing the original image. The preprocessing step
involves transforming the RGB image to the CIE L*u*v* color space, in order to decouple brightness from color
information, then increasing contrast, using contrast-limited adaptive histogram equalization (CLAHE), followed by
artifact removal using a Gaussian filter. After preprocessing, the Chan-Vese technique segments the preprocessed
images to create a lesion mask which undergoes a morphological closing operation. Next, the largest central blob in the
lesion is detected, after which, the blob is dilated to generate an image output mask. Finally, the automatically-generated
mask is compared to the manual mask by calculating the XOR error [3]. Our border detection algorithm was developed
using training and test sets of 30 and 20 images, respectively. This detection method was compared to the SRM method
[4] by calculating the average XOR error for each of the two algorithms. The average error for test images was 0.10 using the new algorithm and 0.99 using the SRM method. Comparing the average error values produced by the two algorithms, the average XOR error for our technique is clearly lower than that of the SRM method, implying that the new algorithm detects the borders of melanomas more accurately than the SRM algorithm.
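The XOR error used above compares the automatically generated mask with the manual mask; a common formulation normalizes the mismatched area by the manual lesion area, and the sketch below assumes that form (reference [3] may define it slightly differently).

import numpy as np

def xor_error(auto_mask, manual_mask):
    """XOR border-detection error: mismatched area / manual lesion area.

    Both inputs are boolean arrays of the same shape. Normalizing by the manual
    mask area follows the commonly used definition; the exact variant in
    reference [3] may differ.
    """
    auto = np.asarray(auto_mask, dtype=bool)
    manual = np.asarray(manual_mask, dtype=bool)
    return np.logical_xor(auto, manual).sum() / manual.sum()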
3D reconstruction from images taken with a coaxial camera rig
Author(s):
Richard Kirby;
Ross Whitaker
A coaxial camera rig consists of a pair of cameras which acquire images along the same optical axis but at different
distances from the scene using different focal length optics. The coaxial geometry permits the acquisition of image pairs
through a substantially smaller opening than would be required by a traditional binocular stereo camera rig. This is
advantageous in applications where physical space is limited, such as in an endoscope. 3D images acquired through an
endoscope are desirable, but the lack of physical space for a traditional stereo baseline is problematic. While image acquisition along a common optical axis has been known for many years, 3D reconstruction from such image pairs has not been possible in the center region due to the very small disparity between corresponding points. This characteristic of
coaxial image pairs has been called the unrecoverable point problem. We introduce a novel method to overcome the
unrecoverable point problem in coaxial camera rigs, using a variational methods optimization algorithm to map pairs of
optical flow fields from different focal length cameras in a coaxial camera rig. Our method uses the ratio of the optical
flow fields for 3D reconstruction. This results in accurate image pair alignment and produces accurate dense depth maps.
We test our method on synthetic optical flow fields and on real images. We demonstrate our method's accuracy by
evaluating against a ground-truth. Accuracy is comparable to a traditional binocular stereo camera rig, but without the
traditional stereo baseline and with substantially smaller occlusions.
High dynamic range subjective testing
Author(s):
Brahim Allan;
Mike Nilsson
This paper describes a set of subjective tests that the authors have carried out to assess the end user perception of
video encoded with High Dynamic Range technology when viewed in a typical home environment.
Viewers scored individual single clips of content, presented in High Definition (HD) and Ultra High Definition (UHD),
in Standard Dynamic Range (SDR), and in High Dynamic Range (HDR) using both the Perceptual Quantizer (PQ) and
Hybrid Log Gamma (HLG) transfer characteristics, and presented in SDR as the backwards compatible rendering of the
HLG representation.
The quality of SDR HD was improved by approximately equal amounts by either increasing the dynamic range or
increasing the resolution to UHD. A further smaller increase in quality was observed in the Mean Opinion Scores of the
viewers by increasing both the dynamic range and the resolution, but this was not quite statistically significant.
Single-layer HDR video coding with SDR backward compatibility
Author(s):
S. Lasserre;
E. François;
F. Le Léannec;
D. Touzé
The migration from High Definition (HD) TV to Ultra High Definition (UHD) is already underway. In addition to an
increase of picture spatial resolution, UHD will bring more color and higher contrast by introducing Wide Color Gamut
(WCG) and High Dynamic Range (HDR) video. As both Standard Dynamic Range (SDR) and HDR devices will coexist
in the ecosystem, the transition from SDR to HDR will require distribution solutions
supporting some level of backward compatibility. This paper presents a new HDR content distribution scheme, named
SL-HDR1, using a single layer codec design and providing SDR compatibility. The solution is based on a pre-encoding
HDR-to-SDR conversion, generating a backward compatible SDR video, with side dynamic metadata. The resulting
SDR video is then compressed, distributed and decoded using standard-compliant decoders (e.g. HEVC Main 10
compliant). The decoded SDR video can be directly rendered on SDR displays without adaptation. Dynamic metadata of
limited size are generated by the pre-processing and used to reconstruct the HDR signal from the decoded SDR video,
using a post-processing that is the functional inverse of the pre-processing. Both HDR quality and artistic intent are
preserved. Pre- and post-processing are applied independently per picture, do not involve any inter-pixel dependency,
and are codec agnostic. Compression performance and SDR quality are shown to be solidly improved compared to the non-backward-compatible and backward-compatible approaches, which respectively use the Perceptual Quantization (PQ) and Hybrid Log Gamma (HLG) Opto-Electronic Transfer Functions (OETF).
FV10: an efficient single-layer approach to HDR coding, with backward compatibility options
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan
High Dynamic Range and Wide Color Gamut (HDR/WCG) video is now at the forefront of modern broadcast and other
video delivery systems. The efficient transmission and display of such video over diverse networks and systems is an
important problem. This paper presents a novel, state of the art approach in HDR/WCG video coding (called FV10)
which uses a new, fully automatic video data adaptive regrading process, which converts HDR to Standard Dynamic
Range (SDR). Our method differs from one developed recently in standards committees (the Joint Collaborative Team
on Video Coding, or JCT-VC, of ITU|ISO/IEC), based on the HEVC Main10 Profile as the core codec, which is an
HDR10 compliant system (“anchor”). FV10 also works entirely within the framework of HEVC Main10 Profile, but
makes greater use of existing SEI messages. Reconstructed video using our methods shows a subjective visual quality superior to the output of an example HDR10 anchor. Moreover, a usable backwards compatible SDR video is obtained as a byproduct in the processing chain, allowing service efficiencies. Representative objective results for the system are -46.0%, -21.6%, -29.6% and 16.2% for RGB-PSNR, DE100, MD100 and tOSNR-XYZ, respectively.
Analysis of visual quality improvements provided by known tools for HDR content
Author(s):
Jaehwan Kim;
Elena Alshina;
JongSeok Lee;
Youngo Park;
Kwang Pyo Choi
In this paper, the visual quality of different solutions for high dynamic range (HDR) compression is analyzed using MPEG test content. We also simulate a method for efficient HDR compression based on the statistical properties of the signal. The method is compliant with the HEVC specification and is also easily compatible with other, alternative methods that might require HEVC specification changes. It was subjectively tested on commercial TVs and compared with alternative solutions for HDR coding. Subjective visual quality tests were performed using an SUHD TV model (Samsung JS9500) with maximum luminance of up to 1000 nits. The solution based on the statistical properties of the signal shows not only improved objective performance but also improved visual quality compared to other HDR solutions, while remaining compatible with the HEVC specification.
Adaptive reshaper for high dynamic range and wide color gamut video compression
Author(s):
Taoran Lu;
Fangjun Pu;
Peng Yin;
Jaclyn Pytlarz;
Tao Chen;
Walt Husak
High Dynamic Range (HDR) and Wider Color Gamut (WCG) content represents a greater range of luminance levels and
a more complete reproduction of colors found in real-world scenes. The characteristics of HDR/WCG content are very different from those of SDR content, which poses a challenge to compression systems originally designed for SDR content. Recently in MPEG/VCEG, two directions have been taken to improve compression performance for HDR/WCG video using the HEVC Main10 codec. The first direction is to improve HDR-10 using encoder optimization. The second direction is to modify the video signal in pre/post processing to better fit the compression system; this processing is outside the coding loop and does not involve changes to the HEVC specification. Among the many proposals in the second direction, the reshaper has been identified as the key component. In this paper, a novel luma reshaper is presented which re-allocates codewords to help the codec improve subjective quality. In addition, encoder optimization can be performed jointly with reshaping. Experiments are conducted with the ICtCp color difference signal. Simulation results show that when joint optimization of the reshaper and encoder is carried out, there is evidence of improvement over the HDR-10 anchor.
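A minimal sketch of a LUT-based luma reshaper of the kind described above: codewords are re-allocated according to a per-codeword weight curve, and an approximate inverse LUT is derived for the post-decoding stage. The weight curve and LUT construction here are illustrative assumptions, not the paper's actual re-allocation.

import numpy as np

def build_reshaper(weights, n_codewords=1024):
    """Build forward / approximate inverse 10-bit luma reshaping LUTs from a
    per-codeword weight curve (higher weight = more output codewords spent on
    that part of the range). Purely illustrative; the paper's re-allocation is
    derived from the content and the encoder."""
    w = np.asarray(weights, dtype=float)
    assert w.size == n_codewords
    cdf = np.cumsum(w) / w.sum()
    forward = np.round(cdf * (n_codewords - 1)).astype(int)      # input code -> reshaped code
    inverse = np.searchsorted(forward, np.arange(n_codewords))   # reshaped -> input (approx.)
    return forward, np.clip(inverse, 0, n_codewords - 1)

# Example: spend more codewords on dark/mid tones than on bright highlights
weights = np.linspace(2.0, 0.5, 1024)
fwd_lut, inv_lut = build_reshaper(weights)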
HDR video synthesis for vision systems in dynamic scenes
Author(s):
Ivana Shopovska;
Ljubomir Jovanov;
Bart Goossens;
Wilfried Philips
High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result of these efforts a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for the synthesis of HDR video from a single camera using multiple, differently exposed video frames, with circularly alternating exposure times. One of the potential applications of the system is in driver assistance systems and autonomous vehicles, involving significant camera and object movement, non-uniform and temporally varying illumination, and the requirement of real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to solve camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness adapted images in the set. To attain temporal consistency of the camera motion in the consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and using 14-bit raw images taken with a standard camera.
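A minimal sketch of the core fusion step (weighted averaging of aligned radiance maps), assuming the frames have already been aligned by the global projective transform and that exposure times are known; the triangle weighting function is our assumption, not necessarily the weighting used by the authors.

import numpy as np

def merge_radiance(frames, exposures):
    """Weighted average of aligned LDR frames converted to radiance.

    frames:    list of aligned frames, float arrays in [0, 1]
    exposures: matching list of exposure times in seconds
    A triangle ("hat") weight de-emphasizes under- and over-exposed pixels;
    the actual weighting used in the paper may differ.
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(frames, exposures):
        w = 1.0 - np.abs(2.0 * img - 1.0)      # peaks at mid-grey
        acc += w * (img / t)                   # per-frame radiance estimate
        wsum += w
    return acc / np.maximum(wsum, 1e-8)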
Tone compatibility between HDR displays
Author(s):
Cambodge Bist;
Rémi Cozot;
Gérard Madec;
Xavier Ducloux
High Dynamic Range (HDR) is the latest trend in television technology and we expect an influx of HDR-capable consumer TVs in the market. Initial HDR consumer displays will operate at a peak brightness of about 500-1000
nits while in the coming years display peak brightness is expected to go beyond 1000 nits. However, professionally
graded HDR content can range from 1000 to 4000 nits. As with Standard Dynamic Range (SDR) content, we
can expect HDR content to be available in a variety of lighting styles such as low key, medium key and high key
video. This raises concerns over tone-compatibility between HDR displays especially when adapting to various
lighting styles. It is expected that dynamic range adaptation between HDR displays uses similar techniques as
found with tone mapping and tone expansion operators. In this paper, we survey simple tone mapping methods
of 4000 nits color-graded HDR content for 1000 nits HDR displays. We also investigate tone expansion strategies
when HDR content graded in 1000 nits is displayed on 4000 nits HDR monitors. We conclude that the best tone
reproduction technique between HDR displays strongly depends on the lighting style of the content.
HDR color conversion with varying distortion metrics
Author(s):
Andrey Norkin
The paper compares three algorithms, which attenuate artifacts that may appear in HDR video in Y'CbCr 4:2:0 format.
The algorithms attenuate artifacts in colors at the color gamut boundaries while also improving the objective quality.
Two closed form solutions demonstrate the same subjective quality as the iterative approach, while being
computationally simpler. One of the closed form solutions also shows similar objective results to the iterative algorithm.
The choice of the upsampling filter in the pre-processing stage is important and may negatively affect both the objective
and subjective quality if there is a mismatch with the upsampling filter used in the reconstruction.
Application of field dependent polynomial model
Author(s):
Petr Janout;
Petr Páta;
Petr Skala;
Karel Fliegel;
Stanislav Vítek;
Jan Bednář
Extremely wide-field imaging systems have many advantages for capturing large scenes, whether in microscopy, all-sky cameras, or security technologies. The large viewing angle, however, comes at the cost of significant aberrations inherent to these imaging systems. Modeling wavefront aberrations using Zernike polynomials has long been known and is widely used. Our method does not model the system aberrations through the wavefront, but directly models the aberrated Point Spread Function of the imaging system. This is a very complicated task, and with conventional methods it was difficult to achieve the desired accuracy. Our optimization technique, which searches for the coefficients of space-variant Zernike polynomials, can be described as a comprehensive model for ultra-wide-field imaging systems. The advantage of this model is that it describes the whole space-variant system, unlike the majority of models, which treat the system as partly invariant. A challenge for this model is that the modeled Point Spread Function is comparable in size to a pixel. Issues associated with sampling, pixel size and pixel sensitivity profile must therefore be taken into account in the design. The model was verified on a series of laboratory test patterns, test images of laboratory light sources and, subsequently, on real images obtained by an extremely wide-field imaging system, WILLIAM. Results of modeling this system are presented in this article.
Evaluation of color mapping algorithms in different color spaces
Author(s):
Timothée-Florian Bronner;
Ronan Boitard;
Mahsa T. Pourazad;
Panos Nasiopoulos;
Touradj Ebrahimi
The color gamut supported by current commercial displays is only a subset of the full spectrum of colors visible by the human eye. In High-Definition (HD) television technology, the scope of the supported colors covers 35.9% of the full visible gamut. For comparison, Ultra High-Definition (UHD) television, which is currently being deployed on the market, extends this range to 75.8%. However, when reproducing content with a wider color gamut than that of a television, typically UHD content on HD television, some original color information may lie outside the reproduction capabilities of the television. Efficient gamut mapping techniques are required in order to fit the colors of any source content into the gamut of a given display. The goal of gamut mapping is to minimize the distortion, in terms of perceptual quality, when converting video from one color gamut to another. It is assumed that the efficiency of gamut mapping depends on the color space in which it is computed. In this article, we evaluate 14 gamut mapping techniques, 12 combinations of two projection methods across six color spaces as well as R’G’B’ Clipping and wrong gamut interpretation. Objective results, using the CIEDE2000 metric, show that the R’G’B’ Clipping is slightly outperformed by only one combination of color space and projection method. However, analysis of images shows that R’G’B’ Clipping can result in loss of contrast in highly saturated images, greatly impairing the quality of the mapped image.
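The simplest baseline evaluated above, R'G'B' Clipping, amounts to converting the wide-gamut signal into the target RGB space and clamping out-of-gamut values. The sketch below shows this in linear light with the commonly used BT.2020-to-BT.709 primaries matrix; treating it as the paper's exact pipeline (which operates on non-linear R'G'B') is an assumption.

import numpy as np

# Approximate linear-light conversion, BT.2020 primaries -> BT.709 primaries
BT2020_TO_BT709 = np.array([
    [ 1.6605, -0.5876, -0.0728],
    [-0.1246,  1.1329, -0.0083],
    [-0.0182, -0.1006,  1.1187],
])

def rgb_clipping(rgb2020):
    """Map linear BT.2020 RGB (H x W x 3, values in [0, 1]) to BT.709 and clip.

    Out-of-gamut values are simply clamped, which is what can cause the loss of
    contrast in highly saturated images noted in the abstract."""
    rgb709 = rgb2020 @ BT2020_TO_BT709.T
    return np.clip(rgb709, 0.0, 1.0)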
Expert viewing protocol performance study: the case of subjective evaluation of HDR coding
Author(s):
Vittorio Baroncini;
Giacomo Baroncini;
Pankaj Topiwala
This paper examines the results of a subjective evaluation experiment carried out by means of the new Expert Viewing Protocol (EVP), recently approved by ITU-R Study Group 6 [1]. The EVP subjective test was designed and performed to compare different HDR coding technologies during an MPEG meeting (San Diego, CA, February 2016) [2]. Thanks to the wide and enthusiastic participation of MPEG experts in the subjective evaluation experiment, it was possible to collect data from a total of sixteen viewers; this made it possible to perform a sort of “validation” of the performance of the EVP. The ITU-R Recommendation states that a test with nine viewers is sufficient to obtain acceptable results from an EVP experiment. In our case, having data from 16 viewers, it was possible to compute the MOS and Confidence Interval data as if it were a standard subjective assessment experiment (which typically requires more viewers). This allowed a sort of “validation” by comparing the results obtained from only 9 experts against the results obtained using the data from all 16 viewers. The analysis of the raw data showed a rather good convergence of the EVP results towards the results obtained using the full viewers’ data set. The results of the EVP evaluation of MPEG HDR content were described in detail in a previous paper [3], to which we refer for details on the EVP protocol procedure and rules. This paper instead tries to answer a demand for further clarification of the “context” and “limitations of use” of the EVP when performed as an alternative to a formal subjective experiment.
Development and deployment of a tiled full parallax light field display system
Author(s):
Zahir Y. Alpaslan;
Rie Matsubara;
Hussein S. El-Ghoroury
Ostendo’s Quantum Photonic Imager (QPI) is a very small pixel pitch, emissive display with high brightness and low power consumption. We used QPIs to create high-performance light field display tiles with a very small form factor. Using these light field display tiles, various full parallax light field displays demonstrating small form factor, high resolution and focus cues were created. In this paper, we will explain the design choices that were made in creating the
displays and their effects on the display performance. This paper details the system design approach including: hardware
design, software design, compression methods and human factors.
Improved integral images compression based on multi-view extraction
Author(s):
Antoine Dricot;
Joel Jung;
Marco Cagnazzo;
Béatrice Pesquet;
Frédéric Dufaux
Integral imaging is a technology based on plenoptic photography that captures and samples the light-field of a scene through a micro-lens array. It provides views of the scene from several angles and therefore is foreseen as a key technology for future immersive video applications. However, integral images have a large resolution and a structure based on micro-images which is challenging to encode. A compression scheme for integral images based on view extraction has previously been proposed, with average BD-rate gains of 15.7% (up to 31.3%) reported over HEVC when using one single extracted view. As the efficiency of the scheme depends on a tradeoff between the bitrate required to encode the view and the quality of the image reconstructed from the view, it is proposed to increase the number of extracted views. Several configurations are tested with different positions and different number of extracted views. Compression efficiency is increased with average BD-rate gains of 22.2% (up to 31.1%) reported over the HEVC anchor, with a realistic runtime increase.
Accuracy and robustness evaluation in stereo matching
Author(s):
Duc Minh Nguyen;
Jan Hanca;
Shao-Ping Lu;
Peter Schelkens;
Adrian Munteanu
Stereo matching has received a lot of attention from the computer vision community, thanks to its wide range of applications. Despite the large variety of algorithms that have been proposed so far, it is not trivial to select suitable algorithms for the construction of practical systems. One of the main problems is that many algorithms lack sufficient robustness when employed in various operational conditions. This problem is due to the fact that most of the proposed methods in the literature are usually tested and tuned to perform well on one specific dataset. To alleviate this problem, an extensive evaluation in terms of accuracy and robustness of state-of-the-art stereo matching algorithms is presented. Three datasets (Middlebury, KITTI, and MPEG FTV) representing different operational conditions are employed. Based on the analysis, improvements over existing algorithms have been proposed. The experimental results show that our improved versions of cross-based and cost volume filtering algorithms outperform the original versions with large margins on Middlebury and KITTI datasets. In addition, the latter of the two proposed algorithms ranks among the best local stereo matching approaches on the KITTI benchmark. Under evaluations using specific settings for depth-image-based-rendering applications, our improved belief propagation algorithm is less complex than MPEG's FTV depth estimation reference software (DERS), while yielding similar depth estimation performance. Finally, several conclusions on stereo matching algorithms are also presented.
Capturing the plenoptic function in a swipe
Author(s):
Michael Lawson;
Mike Brookes;
Pier Luigi Dragotti
Blur in images, caused by camera motion, is typically thought of as a problem. The approach described in this paper shows instead that it is possible to use the blur caused by the integration of light rays at different positions along a moving camera trajectory to extract information about the light rays present within the scene. Retrieving the light rays of a scene from different viewpoints is equivalent to retrieving the plenoptic function of the scene. In this paper, we focus on a specific case in which the blurred image of a scene, containing a flat plane with a texture signal that is a sum of sine waves, is analysed to recreate the plenoptic function. The image is captured by a single lens camera with shutter open, moving in a straight line between two points, resulting in a swiped image. It is shown that finite rate of innovation sampling theory can be used to recover the scene geometry and therefore the epipolar plane image from the single swiped image. This epipolar plane image can be used to generate unblurred images for a given camera location.
Improved inter-layer prediction for light field content coding with display scalability
Author(s):
Caroline Conti;
Luís Ducla Soares;
Paulo Nunes
Light field imaging based on microlens arrays – also known as plenoptic, holoscopic and integral imaging – has recently emerged as a feasible and promising technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as post-production refocusing and depth-of-field changing. However, to
gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display
scalable coding solution is essential.
In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical
coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D
stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. For
further improving the compression performance, novel exemplar-based inter-layer coding tools are proposed here for the
Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization
algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers.
Experimental results show that the proposed solution performs better than the tested benchmark solutions, including the
authors’ previous scalable codec.
Impact of multi-focused images on recognition of soft biometric traits
Author(s):
V. Chiesa;
J. L. Dugelay
In video surveillance, the estimation of semantic traits such as gender and age has always been a debated topic because of the uncontrolled environment: while light or pose variations have been largely studied, defocused images are still rarely investigated. Recently, the emergence of new technologies such as plenoptic cameras has made it possible to address these problems by analyzing multi-focus images. Thanks to a microlens array arranged between the sensor and the main lens, light field cameras are able to record not only the RGB values but also information related to the direction of light rays: the additional data make it possible to render the image with different focal planes after acquisition. For our experiments,
we use the GUC Light Field Face Database that includes pictures from the First Generation Lytro camera. Taking
advantage of light field images, we explore the influence of defocusing on gender recognition and age estimation
problems.
Evaluations are performed using up-to-date, competitive technologies based on deep learning algorithms. After studying
the relationship between focus and gender recognition and focus and age estimation, we compare the results obtained by
images defocused by Lytro software with images blurred by more standard filters in order to explore the difference
between defocusing and blurring effects. In addition we investigate the impact of deblurring on defocused images with
the goal to better understand the different impacts of defocusing and standard blurring on gender and age estimation.
An improved enhancement layer for octree based point cloud compression with plane projection approximation
Author(s):
Khartik Ainala;
Rufael N. Mekuria;
Birendra Khathariya;
Zhu Li;
Ye-Kui Wang;
Rajan Joshi
Recent advances in point cloud capture and applications in VR/AR have sparked new interest in point cloud data compression. Point clouds are often organized and compressed with octree based structures. The octree subdivision
sequence is often serialized in a sequence of bytes that are subsequently entropy encoded using range coding, arithmetic
coding or other methods. Such octree based algorithms are efficient only up to a certain level of detail as they have an
exponential run-time in the number of subdivision levels. In addition, the compression efficiency diminishes when the
number of subdivision levels increases. Therefore, in this work we present an alternative enhancement layer to the coarse
octree coded point cloud. In this case, the base layer of the point cloud is coded in known octree based fashion, but the
higher level of details are coded in a different way in an enhancement layer bit-stream. The enhancement layer coding
method takes the distribution of the points into account and projects points to geometric primitives, i.e. planes. It then
stores residuals and applies entropy encoding with a learning based technique. The plane projection method is used for
both geometry compression and color attribute compression. For color coding the method is used to enable efficient raster
scanning of the color attributes on the plane to map them to an image grid. Results show that both improved compression
performance and faster run-times are achieved for geometry and color attribute compression in point clouds.
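A minimal sketch of the plane-projection idea described above: fit a least-squares plane to the points falling in a coarse octree cell, project the points onto it, and keep the scalar out-of-plane residuals for the enhancement layer. The entropy coding and the learning-based modeling are not reproduced here.

import numpy as np

def plane_projection_residuals(points):
    """Fit a least-squares plane to an N x 3 block of points and return
    (in-plane 2D coordinates, signed out-of-plane residuals)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Right singular vectors give the plane axes; the last one is the normal
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes, normal = vt[:2], vt[2]
    uv = centered @ axes.T          # 2D coordinates on the fitted plane
    residuals = centered @ normal   # distances to encode in the enhancement layer
    return uv, residuals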
Three-dimensional rendering of computer-generated holograms acquired from point-clouds on light field displays
Author(s):
Athanasia Symeonidou;
David Blinder;
Beerend Ceulemans;
Adrian Munteanu;
Peter Schelkens
Holograms, either optically acquired or simulated numerically from 3D datasets, such as point clouds, have
special rendering requirements for display. Evaluating the quality of hologram generation techniques is not straightforward, since high-quality holographic display technologies are still immature. In this paper we present
a framework for three-dimensional rendering of colour computer-generated holograms (CGHs) acquired from
point-clouds, on high-end light field displays. This allows for the rendering of holographic content with horizontal
parallax and wide viewing angle. We deploy prior work, namely a fast CGH method that inherently handles
occlusion problems to acquire high quality colour holograms from point clouds. Our experiments showed that rendering holograms with the proposed framework provides a 3D effect with depth disparity and horizontal-only parallax over a wide viewing angle. Therefore, it allows for the evaluation of CGH techniques regarding functional properties
such as depth cues and efficient occlusion handling.
Representation and coding of large-scale 3D dynamic maps
Author(s):
Robert A. Cohen;
Dong Tian;
Maja Krivokuća;
Kazuo Sugimoto;
Anthony Vetro;
Koji Wakimoto;
Shunichi Sekiguchi
Large-scale 3D maps are typically created from localization information combined with depth and color measurements of the surrounding environment. Localization could be achieved with GPS, inertial measurement units (IMU), cameras, or combinations of these and other devices, while the depth measurements could be achieved with time-of-flight, radar or laser scanning systems. The resulting 3D maps, which are composed of 3D point clouds with various attributes, could be used for a variety of applications, including finding your way around indoor spaces, navigating vehicles around a city, space planning, topographical surveying or public surveying of infrastructure and roads, augmented reality, immersive online experiences, and much more. This paper discusses application requirements related to the representation and coding of large-scale 3D dynamic maps. In particular, we address requirements related to different types of acquisition environments, scalability in terms of progressive transmission and efficiently rendering different levels of details, as well as key attributes to be included in the representation. Additionally, an overview of recently developed coding techniques is presented, including an assessment of current performance. Finally, technical challenges and needs for future standardization are discussed.
A real-time 3D end-to-end augmented reality system (and its representation transformations)
Author(s):
Donny Tytgat;
Maarten Aerts;
Jeroen De Busser;
Sammy Lievens;
Patrice Rondao Alface;
Jean-Francois Macq
The new generation of HMDs coming to the market is expected to enable many new applications that allow free viewpoint experiences with captured video objects. Current applications usually rely on 3D content that is manually created or captured in an offline manner. In contrast, this paper focuses on augmented reality applications that use live captured 3D objects while maintaining free viewpoint interaction. We present a system that allows live dynamic 3D objects (e.g. a person who is talking) to be captured in real-time. Real-time performance is achieved by traversing a number of representation formats and exploiting their specific benefits. For instance, depth images are maintained for fast neighborhood retrieval and occlusion determination, while implicit surfaces are used to facilitate multi-source aggregation for both geometry and texture. The result is a 3D reconstruction system that outputs multi-textured triangle meshes at real-time rates. An end-to-end system is presented that captures and reconstructs live 3D data and allows for this data to be used on a networked (AR) device. For allocating the different functional blocks onto the available physical devices, a number of alternatives are proposed considering the available computational power and bandwidth for each of the components. As we will show, the representation format can play an important role in this functional allocation and allows for a flexible system that can support a highly heterogeneous infrastructure.
Compressed digital holography: from micro towards macro
Author(s):
Colas Schretter;
Stijn Bettens;
David Blinder;
Béatrice Pesquet-Popescu;
Marco Cagnazzo;
Frédéric Dufaux;
Peter Schelkens
Digital holography has benefited greatly from signal processing methods from software-driven computer engineering and applied mathematics. The compressed
sensing theory in particular established a practical framework for reconstructing the scene content using few linear
combinations of complex measurements and a sparse prior for regularizing the solution. Compressed sensing found
direct applications in digital holography for microscopy. Indeed, the wave propagation phenomenon in free space
mixes in a natural way the spatial distribution of point sources from the 3-dimensional scene. As the 3-dimensional
scene is mapped to a 2-dimensional hologram, the hologram samples form a compressed representation of the
scene as well. This overview paper discusses contributions in the field of compressed digital holography at the
micro scale. Then, an outreach on future extensions towards the real-size macro scale is discussed. Thanks to
advances in sensor technologies, increasing computing power and the recent improvements in sparse digital signal
processing, holographic modalities are on the verge of practical high-quality visualization at a macroscopic scale
where much higher resolution holograms must be acquired and processed on the computer.
Photonics-enhanced smart imaging systems
Author(s):
Heidi Ottevaere;
Gebirie Y. Belay;
Hugo Thienpont
We discuss different photonics-enhanced multichannel multiresolution imaging systems in which the different channels
have different imaging properties, namely a different FOV and angular resolution, over different areas of an image sensor.
This could allow different image processing algorithms to be implemented to process the different images. A basic three-channel multiresolution imaging system was designed at 587.6 nm, where each of the three channels consists of four
aspherical lens surfaces. These lenses have been fabricated in PMMA through ultra-precision diamond tooling and
afterwards assembled with aperture stops, baffles and a commercial CMOS sensor. To reduce the influence of chromatic
aberrations, hybrid lenses, which contain diffractive surfaces on top of refractive ones, have been included within the
previous designs of the three channels. These hybrid lenses have also been fabricated through ultra-precision diamond
tooling, assembled and verified in an experimental demonstration. The three channels with hybrid lenses show better image
quality (both in the simulation and experiment) compared to the purely refractive three channel design. Because of a limited
depth of field of the aforementioned multichannel multiresolution imaging systems, a voltage tunable lens has been
integrated in the first channel to extend the depth of field of the overall system. The refocusing capability has significantly improved the depth of field of the system, which now ranges from 0.25 m to infinity, compared to 9 m to infinity for the aforementioned basic three-channel multiresolution imaging system.
A new approach to subjectively assess quality of plenoptic content
Author(s):
Irene Viola;
Martin Řeřábek;
Touradj Ebrahimi
Plenoptic content is becoming increasingly popular thanks to the availability of acquisition and display devices. Thanks to image-based rendering techniques, a plenoptic content can be rendered in real time in an interactive manner allowing virtual navigation through the captured scenes. This way of content consumption enables new experiences, and therefore introduces several challenges in terms of plenoptic data processing, transmission and consequently visual quality evaluation. In this paper, we propose a new methodology to subjectively assess the visual quality of plenoptic content. We also introduce a prototype software to perform subjective quality assessment according to the proposed methodology. The proposed methodology is further applied to assess the visual quality of a light field compression algorithm. Results show that this methodology can be successfully used to assess the visual quality of plenoptic content.
Physics-inspired image analytics
(Conference Presentation)
Author(s):
Bahram Jalali;
Mohamad Asghari
We describe a new computational approach to image analytics and its application to feature enhancement. The algorithm reveals latent features in the image by a transformation known as the Phase Stretch Transform. This computationally efficient transform emulates the propagation of light through a physical medium followed by detection of light’s complex amplitude. We show that the phase of the transform reveals transitions in image intensity and can be used for edge detection with excellent low light level sensitivity. When the diffractive medium has a warped frequency response, the transform engineers the space-bandwidth product of the image with potential application in data compression. Image processing inspired by optical physics has emerged from the research on Photonic Time Stretch, a time-domain signal processing technique that employs temporal dispersion to slow down, capture, and digitally process fast waveforms in real time. This talk will focus on the Phase Stretch Transform (PST), its extension to machine learning and applications in radiology, astronomy and security image analytics.
Asteroid detection using a single multi-wavelength CCD scan
Author(s):
Jonathan Melton
Asteroid detection is a topic of great interest due to the possibility of diverting potentially dangerous asteroids or
mining potentially lucrative ones. Currently, asteroid detection is generally performed by taking multiple images of
the same patch of sky separated by 10-15 minutes, then subtracting the images to find movement. However, this is
time consuming because of the need to revisit the same area multiple times per night. This paper describes an
algorithm that can detect asteroids using a single CCD camera scan, thus cutting down on the time and cost of an
asteroid survey. The algorithm is based on the fact that some telescopes scan the sky at multiple wavelengths with a
small time separation between the wavelength components. As a result, an object moving with sufficient speed will
appear in different places in different wavelength components of the same image. Using image processing
techniques we detect the centroids of points of light in the first component and compare these positions to the
centroids in the other components using a nearest neighbor algorithm. The algorithm was used on a test set of 49
images obtained from the Sloan telescope in New Mexico and found 100% of known asteroids with only 3 false
positives. This algorithm has the advantage of decreasing the amount of time required to perform an asteroid scan,
thus allowing more sky to be scanned in the same amount of time or freeing a telescope for other pursuits.
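A minimal sketch of the matching step described above, assuming centroids have already been extracted from two wavelength components; SciPy's cKDTree provides the nearest-neighbour query, and the pixel displacement thresholds are illustrative rather than the survey's actual values.

import numpy as np
from scipy.spatial import cKDTree

def moving_candidates(centroids_a, centroids_b, min_shift=1.5, max_shift=20.0):
    """Flag sources in band A whose nearest neighbour in band B lies at a
    displacement consistent with asteroid motion between the two exposures.

    centroids_a, centroids_b: (N, 2) and (M, 2) arrays of pixel coordinates.
    Thresholds (in pixels) are illustrative, not the paper's values."""
    tree = cKDTree(centroids_b)
    dist, idx = tree.query(centroids_a, k=1)
    moving = (dist > min_shift) & (dist < max_shift)
    return np.nonzero(moving)[0], idx[moving]   # indices in A and their matches in B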
Application of phase stretch transform to plate license identification under blur and noise conditions
(Conference Presentation)
Author(s):
Hossein Asghari;
Ofer Hadar;
Bahram Jalali
This paper deals with implementing a new algorithm for edge detection based on the Phase Stretch Transform (PST) for the purpose of car license plate recognition. In the PST edge detection algorithm, the image is first filtered with a spatial kernel, followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function, which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with a higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations.
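A minimal sketch of the PST edge-detection step described in the paragraph above: smooth the spectrum with a localization kernel, apply a frequency-dependent phase, and take the phase of the inverse transform. The kernel form follows the published PST papers, but the parameter values are illustrative assumptions.

import numpy as np

def phase_stretch_transform(img, lpf_sigma=0.2, warp=12.0, strength=0.5):
    """Minimal PST sketch for a 2-D grayscale image: smooth, apply a
    frequency-dependent phase, and return the spatial-domain phase, where
    edges show up as large values."""
    rows, cols = img.shape
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    r = np.sqrt(u**2 + v**2)

    F = np.fft.fft2(img.astype(float))
    F *= np.exp(-(r / lpf_sigma) ** 2)                 # spatial "localization" (low-pass) kernel

    # Warped phase kernel; higher frequencies receive more phase
    kernel = warp * r * np.arctan(warp * r) - 0.5 * np.log1p((warp * r) ** 2)
    kernel = strength * kernel / kernel.max()          # normalize the peak phase to `strength`
    out = np.fft.ifft2(F * np.exp(-1j * kernel))
    return np.angle(out)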
Here we investigate the influence of noise and blur on the ability to recognize the characters on the license plate, by comparing our suggested algorithm with the well-known Canny algorithm.
We use several types of noise distributions, among them Gaussian noise, salt-and-pepper noise and uniformly distributed noise, with several levels of noise variance. The simulated blur is related to the car velocity, and we applied several filters representing different velocities of the car.
Another interesting degradation that we intend to investigate is the case in which a laser shield license plate cover is used to distort the image taken by the authorities.
Our comparison results are presented in terms of true positive, false positive and false negative probabilities.
Quick probabilistic binary image matching: changing the rules of the game
Author(s):
Adnan A. Y. Mustafa
A Probabilistic Matching Model for Binary Images (PMMBI) is presented that predicts the probability of matching
binary images with any level of similarity. The model relates the number of mappings, the amount of similarity between
the images and the detection confidence. We show the advantage of using a probabilistic approach to matching in
similarity space as opposed to a linear search in size space. With PMMBI a complete model is available to predict the
quick detection of dissimilar binary images. Furthermore, the similarity between the images can be measured to a good
degree if the images are highly similar. PMMBI shows that only a few pixels need to be compared to detect dissimilarity
between images, as low as two pixels in some cases. PMMBI is image size invariant; images of any size can be matched
at the same quick speed. Near-duplicate images can also be detected without much difficulty. We present tests on real
images that show the prediction accuracy of the model.
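The core idea, that dissimilarity between binary images can usually be declared after probing only a few randomly mapped pixels, can be sketched as a sequential test; this is a simplification of PMMBI (same-size images, uniform random mapping), not the model itself.

import numpy as np

def quick_dissimilarity_test(img_a, img_b, max_probes=32, rng=None):
    """Probe randomly mapped pixel pairs; stop at the first mismatch.

    Returns (possibly_similar, probes_used). Dissimilar images are usually
    rejected after only a few probes, often just two or three. Both images are
    assumed binary and of equal size here, although the model itself is size
    invariant."""
    rng = np.random.default_rng() if rng is None else rng
    a, b = img_a.ravel(), img_b.ravel()
    for n in range(1, max_probes + 1):
        i = rng.integers(a.size)
        if a[i] != b[i]:
            return False, n          # dissimilarity detected after n probes
    return True, max_probes          # probably similar (or highly similar)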
Anomaly detection of blast furnace condition using tuyere cameras
Author(s):
Naoshi Yamahira;
Takehide Hirata;
Kazuro Tsuda;
Yasuyuki Morikawa;
Yousuke Takata
We present a method of anomaly detection using multivariate statistical process control (MSPC) to detect abnormal behavior of a blast furnace. Tuyere cameras attached circumferentially at the lower side of the blast furnace are used to monitor the inside of the furnace, and the method extracts abnormal behavior of the observed intensities. It is confirmed that our method detects anomalies earlier than operators notice them. In addition, camera misalignment does not affect detection performance, which is an important property in actual use.
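A generic MSPC sketch of the kind referred to above: fit a PCA model on per-camera intensity features collected during normal operation and score new samples with Hotelling's T^2; the actual statistic and features used in the paper are not specified in the abstract.

import numpy as np

class MSPCMonitor:
    """PCA-based multivariate SPC: fit on normal-operation intensity features
    (one row per time step, one column per tuyere camera), then score new
    samples with Hotelling's T^2. A generic sketch, not the paper's model."""

    def __init__(self, n_components=3):
        self.k = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_
        _, s, vt = np.linalg.svd(Xc, full_matrices=False)
        self.components_ = vt[: self.k]
        self.var_ = (s[: self.k] ** 2) / (len(X) - 1)   # variances of PC scores
        return self

    def t2(self, x):
        """Hotelling's T^2 for one new feature vector x."""
        scores = (x - self.mean_) @ self.components_.T
        return float(np.sum(scores**2 / self.var_))

# Usage: raise an alarm when t2(x) exceeds a control limit estimated from training data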
Novel intra prediction modes for VP10 codec
Author(s):
Ariel Shleifer;
Chinmayi Lanka;
Mohit Setia;
Shubham Agarwal;
Ofer Hadar;
Debargha Mukherjee
The demand for high quality video is permanently on the rise and with it the need for more effective compression.
Compression scope can be further expanded due to increased spatial correlation of pixels within a high quality video frame.
One basic feature that takes advantage of pixels’ spatial correlation for video compression is Intra-Prediction, which
determines the codec’s compression efficiency. Intra-Prediction enables significant reduction of the Intra-frame (I-frame)
size and, therefore, contributes to more efficient bandwidth exploitation. It has been observed that the intra frame coding
efficiency of VP9 is not as good as that of H.265/MPEG-HEVC. One possible reason is that HEVC’s Intra-prediction
algorithm uses as many as 35 prediction directions, while VP9 uses only 9 directions including the TM prediction mode.
Therefore, there is high motivation to improve the Intra-Prediction scheme with new, original and proprietary algorithms
that will enhance the overall performance of Google’s future codec and bring its performance closer to that of HEVC. In
this work, instead of using different angles for predictions, we introduce four unconventional Intra-Prediction modes for
the VP10 codec – Weighted CALIC (WCALIC), Intra-Prediction using System of Linear Equations (ISLE), Prediction of
Discrete Cosine Transformations (PrDCT) Coefficients and Reverse Least Power of Three (RLPT). Employed on a selection of eleven (11) typical images with a variety of spatial characteristics, and using Mean Square Error (MSE) as the evaluation criterion, we show that our proposed algorithms (modes) were preferred, and thus selected, for around 57% of the blocks, resulting in an average prediction error (MSE) reduction of 26%. We believe that our proposed techniques will achieve
higher compression without compromising video quality, thus improving the Rate-Distortion (RD) performances of the
compressed video stream.
Perceptually-driven video coding with the Daala video codec
Author(s):
Yushin Cho;
Thomas J. Daede;
Nathan E. Egge;
Guillaume Martres;
Tristan Matthews;
Christopher Montgomery;
Timothy B. Terriberry;
Jean-Marc Valin
The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered
codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches,
many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like
PSNR. This paper documents some of our experiences with these tools, which ones worked and which did not.
We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.
A large-scale video codec comparison of x264, x265 and libvpx for practical VOD applications
Author(s):
Jan De Cock;
Aditya Mavlankar;
Anush Moorthy;
Anne Aaron
Over the last years, we have seen exciting improvements in video compression technology, due to the introduction of HEVC and royalty-free coding specifications such as VP9. The potential compression gains of HEVC over H.264/AVC have been demonstrated in different studies, and are usually based on the HM reference software. For VP9, substantial gains over H.264/AVC have been reported in some publications, whereas others reported less optimistic results. Differences in configurations between these publications make it more difficult to assess the true potential of VP9. Practical open-source encoder implementations such as x265 and libvpx (VP9) have matured, and are now showing high compression gains over x264. In this paper, we demonstrate the potential of these encoder implementations, with settings optimized for non-real-time random access, as used in a video-on-demand encoding pipeline. We report results from a large-scale video codec comparison test, which includes x264, x265 and libvpx. A test set consisting of a variety of titles with varying spatio-temporal characteristics from our catalog is used, resulting in tens of millions of encoded frames, hence larger than test sets previously used in the literature. Results are reported in terms of PSNR, SSIM, MS-SSIM, VIF and the recently introduced VMAF quality metric. BD-rate calculations show that using x265 and libvpx vs. x264 can lead to significant bitrate savings for the same quality. x265 outperforms libvpx in most cases, but the performance gap narrows (or even reverses) at the higher resolutions.
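The BD-rate savings quoted above follow the Bjøntegaard metric; a widely used formulation fits log-bitrate as a cubic polynomial of quality and integrates the gap over the overlapping quality range, as sketched below (this is the generic calculation, not necessarily the exact script used in the study).

import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjøntegaard delta rate (%) of the test codec vs. the reference.

    Each input is a sequence of 4 or more rate-distortion points. Fits log-rate
    as a cubic polynomial of quality and integrates the difference over the
    overlapping quality range; negative values mean bitrate savings."""
    lr_ref, lr_test = np.log(rate_ref), np.log(rate_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100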
Performance evaluation of MPEG internet video coding
Author(s):
Jiajia Luo;
Ronggang Wang;
Kui Fan;
Zhenyu Wang;
Ge Li;
Wenmin Wang
Internet Video Coding (IVC) has been developed in MPEG by combining well-known existing technology elements and
new coding tools with royalty-free declarations. In June 2015, the IVC project was approved as ISO/IEC 14496-33 (MPEG-4 Internet Video Coding). It is believed that this standard can be highly beneficial for video services in the Internet
domain. This paper evaluates the objective and subjective performances of IVC by comparing it against Web Video
Coding (WVC), Video Coding for Browsers (VCB) and AVC High Profile. Experimental results show that IVC’s
compression performance is approximately equal to that of the AVC High Profile for typical operational settings, both
for streaming and low-delay applications, and is better than WVC and VCB.
Recent improvements to Thor with emphasis on perceptual coding tools
Author(s):
Thomas Davies;
Gisle Bjøntegaard;
Arild Fuldseth;
Steinar Midtskogen
Thor supports a number of coding tools that have the potential to improve perceptual as well as objective quality.
Synthetic reference frames may be used to support high frame rate applications in circumstances where encoders might
typically transmit reduced frame rate content. Quantization matrices can be used to give improved visual quality by
allocating bits more closely according to perceptual significance. Thor’s Constrained Low Pass Loop Filter provides
significant subjective benefit in removing coding artefacts such as ringing. Improved colour fidelity can also be obtained
by leveraging luma information. This paper discusses developments in these tools and their objective and subjective
performance.
On transform coding tools under development for VP10
Author(s):
Sarah Parker;
Yue Chen;
Jingning Han;
Zoe Liu;
Debargha Mukherjee;
Hui Su;
Yongzhe Wang;
Jim Bankoski;
Shunyao Li
Google started the WebM Project in 2010 to develop open source, royalty-free video codecs designed specifically for
media on the Web. The second generation codec released by the WebM project, VP9, is currently served by YouTube,
and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the
growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next edition
codec, VP10, that achieves at least a generational improvement in coding efficiency over VP9. Starting from VP9, a set
of new experimental coding tools have already been added to VP10 to achieve decent coding gains. Subsequently,
Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a new codec
AV1. As a result, the VP10 effort is largely expected to merge with AV1. In this paper, we focus primarily on new tools
in VP10 that improve coding of the prediction residue using transform coding techniques. Specifically, we describe tools
that increase the flexibility of available transforms, allowing the codec to handle a more diverse range of residue
structures. Results are presented on a standard test set.
Patent landscape for royalty-free video coding
Author(s):
Cliff Reader
Digital video coding is over 60 years old and the first major video coding standard – H.261 – is over 25 years old, yet
today there are more patents than ever related to, or evaluated as essential to video coding standards. This paper
examines the historical development of video coding standards, from the perspective of when the significant
contributions for video coding technology were made, what performance can be attributed to those contributions and
when original patents were filed for those contributions. These patents have now expired, so the main video coding tools,
which provide the significant majority of coding performance, are now royalty-free. The deployment of video coding
tools in a standard involves several related developments. The tools themselves have evolved over time to become more
adaptive, taking advantage of the increased complexity afforded by advances in semiconductor technology. In most
cases, the improvement in performance for any given tool has been incremental, although significant improvement has
occurred in aggregate across all tools. The adaptivity must be mirrored by the encoder and decoder, and advances have
been made in reducing the overhead of signaling adaptive modes and parameters. Efficient syntax has been developed to
provide such signaling. Furthermore, efficient ways of implementing the tools with limited precision, simple
mathematical operators have been developed. Correspondingly, categories of patents related to video coding can be
defined. Without discussing active patents, this paper provides the timeline of the developments of video coding and lays
out the landscape of patents related to video coding. This provides a foundation on which royalty free video codec
design can take place.
Performance comparison of HEVC reference SW, x265 and VPX on 8-bit 1080p content
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan
Show Abstract
This paper presents a study comparing the coding efficiency performance of three software codecs: (a) the HEVC Main
Profile Reference Software; (b) the x265 codec; and (c) VP10. Note here that we are specifically testing only 8-bit
performance. Performance is tabulated for 1-pass encoding on two fronts: (1) objective performance (PSNR), (2)
informal subjective assessment. Finally, two approaches to coding were used: (i) constant quality; and (ii) fixed bit rate.
Constant quality encoding is performed with all three codecs for an unbiased comparison of the core coding tools,
whereas target bitrate coding is done to study the compression efficiency achieved with rate control, which can and does
have a significant impact. Our general conclusion is that under constant quality coding, the HEVC reference software
appears to be superior to the other two, whereas with rate control and fixed rate coding, these codecs are more on an
equal footing. We remark that this latter result may be partly or mainly due to the maturity of the various rate control
mechanisms in these codecs.
FVP10: enhancements of VPX for SDR/HDR applications
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan
Show Abstract
This paper describes a study to investigate possible ways to improve the VPX codecs in the context of both 8-bit SDR
video and 10-bit HDR video content, for two types of applications: streaming and high quality (near lossless) coding for
content contribution editing. For SDR content, the following tools are investigated: (a) lapped biorthogonal transforms for near lossless applications; and (b) optimized resampling filter pairs for adaptive resolution coding in streaming
applications. For HDR content, a data adaptive grading technique in conjunction with the VP9/VP10 encoder is studied.
Both the objective metrics (measured using BD rate) and informal subjective visual quality assessments are recorded. It is asserted that useful improvements are possible in each of these categories. In particular, substantial value is offered in the coding of HDR content, and especially in creating a coding scheme offering backwards compatibility with SDR.
Optical remote sensing and correlation of office equipment functional state and stress levels via power quality disturbances inefficiencies
Author(s):
Oren Sternberg;
Valerie R. Bednarski;
Israel Perez;
Sara Wheeland;
John D. Rockway
Show Abstract
Non-invasive optical techniques pertaining to the remote sensing of power quality disturbances (PQD) are part of an
emerging technology field typically dominated by radio frequency (RF) and invasive-based techniques. Algorithms and
methods to analyze and address PQD such as probabilistic neural networks and fully informed particle swarms have been
explored in industry and academia. Such methods are tuned to work with RF equipment and electronics in existing power
grids. As both commercial and defense assets are heavily power-dependent, understanding electrical transients and failure
events using non-invasive detection techniques is crucial. In this paper we correlate power quality empirical models to the
observed optical response. We also empirically demonstrate a first-order approach to map household, office and
commercial equipment PQD to user functions and stress levels. We employ a physics-based image and signal processing
approach, which demonstrates measured non-invasive (remote sensing) techniques to detect and map the base frequency
associated with the power source to the various PQD on a calibrated source.
Computer-aided diagnostic approach of dermoscopy images acquiring relevant features
Author(s):
H. Castillejos-Fernández;
A. Franco-Arcega;
O. López-Ortega
Show Abstract
In skin cancer detection, automated analysis of the borders, colors, and structures of a lesion relies upon an accurate segmentation process, which is an important first step in any Computer-Aided Diagnosis (CAD) system. However, irregular and disperse lesion borders, low contrast, artifacts in images, and the variety of colors within the region of interest make the problem difficult. In this paper, we propose an efficient approach to automatic classification which considers specific lesion features. First, for the selection of lesion skin we employ the segmentation algorithm W-FCM.1 Then, in the feature extraction stage we consider several aspects: the area of the lesion, which is calculated by correlating axes, and the value of asymmetry in both axes. For color analysis we employ an ensemble of clusterers including K-Means, Fuzzy K-Means and Kohonen maps, all of which estimate the presence of one or more colors defined in the ABCD rule and the values for each of the segmented colors. Another aspect to consider is the type of structures that appear in the lesion; those are defined by using the well-known GLCM method. During the classification stage we compare several methods in order to determine whether the lesion is benign or malignant. An important contribution of the current approach to the segmentation-classification problem resides in the use of information from all color channels together, as well as the measure of each color in the lesion and the axes correlation. The segmentation and classification measures have been computed using sensitivity, specificity, accuracy and the AUC metric over a set of dermoscopy images from the ISDIS data set.
A dehazing algorithm with multiple simultaneously captured images
Author(s):
José L. López-Martínez;
Vitaly Kober;
Manuel Escalante-Torres
Show Abstract
Recently, many efficient methods have been developed for dehazing using a single observed image. Such dehazing algorithms estimate scene depths and then compute the thickness of haze. However, since the problem is ill-posed, the restored image often contains artificial colors and overstretched contrast. In this work, we use multiple captures of hazed images with three cameras to solve the dehazing problem. The new dehazing method with three images is based on the solution of explicit linear systems of equations derived from optimization of an objective function. The performance of the proposed method is compared with that of common dehazing algorithms in terms of the quality of image restoration.
Improved lossless intra coding for next generation video coding
Author(s):
Rahul Vanam;
Yuwen He;
Yan Ye
Show Abstract
Recently, there have been efforts by the ITU-T VCEG and ISO/IEC MPEG to further improve the compression
performance of the High Efficiency Video Coding (HEVC) standard for developing a potential next generation video
coding standard. The exploratory codec software of this potential standard includes new coding tools for inter and intra
coding. In this paper, we present a new intra prediction mode for lossless intra coding. Our new intra mode derives a
prediction filter for each input pixel using its neighboring reconstructed pixels, and applies this filter to the nearest
neighboring reconstructed pixels to generate a prediction pixel. The proposed intra mode is demonstrated to improve the
performance of the exploratory software for lossless intra coding, yielding a maximum and average bitrate savings of 4.4%
and 2.11%, respectively.
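As a rough illustration of the pixel-wise prediction described above, the following Python sketch derives a three-tap linear predictor from a small causal window of reconstructed pixels and applies it to the nearest neighbors; the window size, tap layout, and least-squares fit are illustrative assumptions, not the authors' exact derivation.

```python
import numpy as np

def predict_pixel(rec, y, x, win=4):
    """Hypothetical sketch of per-pixel prediction-filter derivation.
    Assumes (y, x) is not on the first row/column and win >= 2."""
    # Build training samples from a small causal window of reconstructed pixels.
    A, b = [], []
    for j in range(max(1, y - win), y + 1):
        for i in range(max(1, x - win), x + 1):
            if j == y and i == x:
                continue  # skip the pixel being predicted
            A.append([rec[j, i - 1], rec[j - 1, i], rec[j - 1, i - 1]])
            b.append(rec[j, i])
    # Least-squares fit of the 3-tap filter to the local reconstructed data.
    w, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    # Apply the derived filter to the nearest reconstructed neighbors.
    return float(w @ np.array([rec[y, x - 1], rec[y - 1, x], rec[y - 1, x - 1]], float))
```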
Scene-aware joint global and local homographic video coding
Author(s):
Xiulian Peng;
Jizheng Xu;
Gary J. Sullivan
Show Abstract
Perspective motion is commonly represented in video content that is captured and compressed for various applications
including cloud gaming, vehicle and aerial monitoring, etc. Existing approaches based on an eight-parameter
homography motion model cannot deal with this efficiently, either due to low prediction accuracy or excessive
bit rate overhead. In this paper, we consider the camera motion model and scene structure in such video content
and propose a joint global and local homography motion coding approach for video with perspective motion.
The camera motion is estimated by a computer vision approach, and camera intrinsic and extrinsic parameters
are globally coded at the frame level. The scene is modeled as piece-wise planes, and three plane parameters
are coded at the block level. Fast gradient-based approaches are employed to search for the plane parameters
for each block region. In this way, improved prediction accuracy and low bit costs are achieved. Experimental
results based on the HEVC test model show that up to 9.1% bit rate savings can be achieved (with equal PSNR
quality) on test video content with perspective motion. Test sequences for the example applications showed a
bit rate savings ranging from 3.7 to 9.1%.
Sub-block motion derivation for merge mode in HEVC
Author(s):
Wei-Jung Chien;
Ying Chen;
Jianle Chen;
Li Zhang;
Marta Karczewicz;
Xiang Li
Show Abstract
The new state-of-the-art video coding standard, H.265/HEVC, was finalized in 2013 and achieves roughly 50%
bit rate saving compared to its predecessor, H.264/MPEG-4 AVC. In this paper, two additional merge candidates,
advanced temporal motion vector predictor and spatial-temporal motion vector predictor, are developed to improve
motion information prediction scheme under the HEVC structure. The proposed method allows each Prediction Unit
(PU) to fetch multiple sets of motion information from multiple blocks smaller than the current PU. By splitting a large
PU into sub-PUs and filling motion information for all the sub-PUs of the large PU, signaling cost of motion information
could be reduced. This paper describes above-mentioned techniques in detail and evaluates their coding performance
benefits based on the common test conditions during HEVC development. Simulation results show that a 2.4%
performance improvement over HEVC can be achieved.
Image compression algorithm using wavelet transform
Author(s):
Luis Cadena;
Franklin Cadena;
Konstantin Simonov;
Alexander Zotin;
Grigory Okhotnikov
Show Abstract
Within the framework of multi-resolution analysis, a study of an image compression algorithm using the Haar wavelet has been performed. We have studied the dependence of the image quality on the compression ratio. Also, the variation of the compression level of the studied images has been obtained. It is shown that a compression ratio in the range of 8-10 is optimal for environmental monitoring. Under these conditions the compression level is in the range of 1.7 - 4.2, depending on the type of images. It is shown that the algorithm used is more convenient and has more advantages than WinRAR. The Haar wavelet algorithm has improved the method of signal and image processing.
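For readers who want to experiment with the idea, the following NumPy sketch performs one level of the 2D Haar transform and keeps only the largest coefficients for a chosen compression ratio; the single decomposition level and magnitude thresholding are simplifying assumptions rather than the authors' exact pipeline.

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar transform (sketch; image sides assumed even)."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # horizontal average
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # horizontal detail
    ll = (a[0::2, :] + a[1::2, :]) / 2.0
    lh = (a[0::2, :] - a[1::2, :]) / 2.0
    hl = (d[0::2, :] + d[1::2, :]) / 2.0
    hh = (d[0::2, :] - d[1::2, :]) / 2.0
    return ll, lh, hl, hh

def compress(img, ratio=8):
    """Keep roughly 1/ratio of the Haar coefficients (largest in magnitude)."""
    coeffs = np.concatenate([c.ravel() for c in haar2d(img.astype(float))])
    k = max(1, coeffs.size // ratio)
    thr = np.partition(np.abs(coeffs), coeffs.size - k)[coeffs.size - k]
    return np.where(np.abs(coeffs) >= thr, coeffs, 0.0)
```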
Layer-based buffer aware rate adaptation design for SHVC video streaming
Author(s):
Srinivas Gudumasu;
Ahmed Hamza;
Eduardo Asbun;
Yong He;
Yan Ye
Show Abstract
This paper proposes a layer based buffer aware rate adaptation design which is able to avoid abrupt video quality
fluctuation, reduce re-buffering latency and improve bandwidth utilization when compared to a conventional simulcast
based adaptive streaming system. The proposed adaptation design schedules DASH segment requests based on the
estimated bandwidth, dependencies among video layers and layer buffer fullness.
Scalable HEVC video coding is the latest state-of-the-art video coding technique that can alleviate various issues caused by
simulcast based adaptive video streaming. With scalable coded video streams, the video is encoded once into a number
of layers representing different qualities and/or resolutions: a base layer (BL) and one or more enhancement layers (EL),
each incrementally enhancing the quality of the lower layers. Such a layer-based coding structure allows fine-granularity
rate adaptation for video streaming applications.
Two video streaming use cases are presented in this paper. The first use case is to stream HD SHVC video over a
wireless network where available bandwidth varies, and the performance comparison between proposed layer-based
streaming approach and conventional simulcast streaming approach is provided. The second use case is to stream
4K/UHD SHVC video over a hybrid access network that consists of a 5G millimeter wave high-speed wireless link and a
conventional wired or WiFi network. The simulation results verify that the proposed layer based rate adaptation
approach is able to utilize the bandwidth more efficiently. As a result, a more consistent viewing experience with higher
quality video content and minimal video quality fluctuations can be presented to the user.
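A minimal sketch of how such layer-aware segment scheduling might look is given below; the target buffer level, the dictionary layout, and the fallback to the base layer are assumptions for illustration and not the authors' exact adaptation logic.

```python
def next_request(layers, est_bw, target=10.0):
    """Hypothetical scheduler sketch (not the paper's exact algorithm).
    'layers' is ordered base layer (BL) first; each entry holds 'bitrate' in
    bps and 'buffer' in seconds of buffered media for that layer."""
    cumulative = 0.0
    for i, layer in enumerate(layers):
        cumulative += layer['bitrate']           # layer i plus all layers it depends on
        if layer['buffer'] < target:
            # Fetch this layer if the bandwidth estimate covers its dependency
            # chain; otherwise fall back to keeping the base layer fed.
            return i if (cumulative <= est_bw or i == 0) else 0
    return None                                  # all layer buffers are healthy

# usage sketch (hypothetical values):
# next_request([{'bitrate': 2e6, 'buffer': 4.0}, {'bitrate': 4e6, 'buffer': 1.0}], est_bw=5e6)
```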
High speed, low-complexity image coding for IP-transport with JPEG XS
Author(s):
Thomas Richter;
Siegfried Fößel;
Joachim Keinert;
Antonin Descampe
Show Abstract
The JPEG committee (formally, ISO/IEC SC 29 WG 01) is currently investigating a new work item on near
lossless low complexity coding for IP streaming of moving images. This article discusses the requirements
and use cases of this work item, gives some insight into the anchors that are used for the purpose of
standardization, and provides a short update on the current proposals that reached the committee.
JPEG backward compatible coding of omnidirectional images
Author(s):
Martin Řeřábek;
Evgeniy Upenik;
Touradj Ebrahimi
Show Abstract
Omnidirectional image and video, also known as 360 image and 360 video, are gaining in popularity with the recent growth in availability of cameras and displays that can cope with such type of content. As omnidirectional visual content represents a larger set of information about the scene, it typically requires a much larger volume of information. Efficient compression of such content is therefore important. In this paper, we review the state of the art in compression of omnidirectional visual content, and propose a novel approach to encode omnidirectional images in such a way that they are still viewable on legacy JPEG decoders.
Visibility thresholds for visually lossy JPEG2000
Author(s):
Feng Liu;
Yuzhang Lin;
Eze L. Ahanonu;
Michael W. Marcellin;
Amit Ashok;
Elizabeth A. Krupinski;
Ali Bilgin
Show Abstract
Recently, Han et al. developed a method for visually lossless compression using JPEG2000. In this method, visibility thresholds (VTs) are experimentally measured and used during quantization to ensure that the errors introduced by quantization are below these thresholds. In this work, we extend the work of Han et al. to the visually lossy regime. We propose a framework where a series of experiments is conducted to measure Just-Noticeable-Differences using the quantization distortion model introduced by Han et al. The resulting thresholds are incorporated into a JPEG2000 encoder to yield visually lossy, JPEG2000 Part 1 compliant codestreams.
Prediction of visual saliency in video with deep CNNs
Author(s):
Souad Chaabouni;
Jenny Benois-Pineau;
Ofer Hadar
Show Abstract
Prediction of visual saliency in images and video is a highly researched topic. Target applications include quality assessment of multimedia services in a mobile context, video compression techniques, recognition of objects in video streams, etc. In the framework of mobile and egocentric perspectives, visual saliency models cannot be founded only on bottom-up features, as suggested by feature integration theory. The central bias hypothesis is not respected either. In this case, the top-down component of human visual attention becomes prevalent. Visual saliency can be predicted on the basis of seen data. Deep Convolutional Neural Networks (CNN) have proven to be a powerful tool for prediction of salient areas in still images. In our work we also focus on the sensitivity of the human visual system to residual motion in a video. A deep CNN architecture is designed, where we incorporate input primary maps as color values of pixels and the magnitude of local residual motion. Complementary contrast maps allow for a slight increase of accuracy compared to the use of color and residual motion only. The experiments show that the choice of the input features for the deep CNN depends on the visual task: for interest in dynamic content, the 4K model with residual motion is more efficient, and for object recognition in egocentric video the pure spatial input is more appropriate.
Consecutive pedestrian tracking in large scale space
Author(s):
Jinpeng Lan;
Yi Xu
Show Abstract
Pedestrian tracking is an important and meaningful part of the computer vision topic. Given the position of pedestrian in
the first frame, our goal is to automatically determine the accurate position of the target pedestrian in every frame that
follows. Current tracking methods show good performance in short-term tracking. However, there are still some open
problems in real scenes, e.g. pedestrian re-identification under multi-camera surveillance and pedestrian tracking under
occlusions. In our paper, we propose an efficient method for consecutive tracking, which can deal with challenging
view changes and occlusions. The proposed tracker consists of a short-time tracking mechanism and a consecutive tracking
mechanism. The consecutive tracking mechanism is activated when the target pedestrian is under occlusion or
changes dramatically in appearance. In the consecutive tracking mechanism, the proposed algorithm detects the target
pedestrian using a coarse but fast feature as the first-level classifier and a fine feature as the last-level classifier. After regaining
the accurate position of the target pedestrian, the appearance model of the target pedestrian is updated as historical
information and the short-time tracking mechanism is activated again to continue tracking the target pedestrian.
Experimental results show that the proposed method can handle hard cases and achieve a higher success rate than
existing methods.
No-reference face image assessment based on deep features
Author(s):
Guirong Liu;
Yi Xu;
Jinpeng Lan
Show Abstract
Face quality assessment is important to improve the performance of a face recognition system. For instance, it is required to
select images of good quality to improve the recognition rate for the person of interest. Current methods mostly depend on
traditional image assessment, which uses prior knowledge of the human vision system. As a result, the quality score of face
images shows consistency with human visual perception but deviates from the processing procedure of a real face
recognition system. In fact, state-of-the-art face recognition systems are all built on deep neural networks.
Naturally, an efficient quality scoring method for face images is expected, which should show high consistency
with the recognition rate of face images from current face recognition systems. This paper proposes a no-reference face
image assessment algorithm based on deep features, which is capable of predicting the recognition rate of face images.
The proposed face image assessment algorithm provides a promising tool to filter out the good input images for the real
face recognition system to achieve high recognition rate.
Deep RNNs for video denoising
Author(s):
Xinyuan Chen;
Li Song;
Xiaokang Yang
Show Abstract
Video denoising can be described as the problem of mapping a specific length of noisy frames to a clean one. We propose a deep architecture based on Recurrent Neural Networks (RNN) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as the input and outputs the clean one. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers where each layer receives the hidden state of the previous layer as input. Experiments show that (i) the recurrent architecture through the temporal domain extracts motion information and benefits video denoising, (ii) the deep architecture has large enough capacity for expressing the mapping relation between corrupted videos as input and clean videos as output, and furthermore, (iii) the model generalizes to learn different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
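A minimal PyTorch sketch of a stacked-RNN patch denoiser in this spirit is shown below; the patch size, hidden width, and number of layers are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class DRNNDenoiser(nn.Module):
    """Sketch of a stacked-RNN patch denoiser (layer sizes are assumptions)."""
    def __init__(self, patch_dim=17 * 17, hidden=512, layers=3):
        super().__init__()
        # Each RNN layer receives the hidden state of the previous layer.
        self.rnn = nn.RNN(patch_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, patch_dim)   # map back to a clean patch

    def forward(self, noisy_seq):
        # noisy_seq: (batch, frames, patch_dim) -- a patch tracked across frames
        h, _ = self.rnn(noisy_seq)
        return self.out(h[:, -1])                 # denoised patch for the last frame

# usage sketch: clean = DRNNDenoiser()(torch.randn(8, 7, 17 * 17))
```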
Global velocity constrained cloud motion prediction for short-term solar forecasting
Author(s):
Yanjun Chen;
Wei Li;
Chongyang Zhang;
Chuanping Hu
Show Abstract
Cloud motion is the primary reason for short-term solar power output fluctuation. In this work, a new cloud motion
estimation algorithm using a global velocity constraint is proposed. Compared to the widely used Particle Image Velocimetry
(PIV) algorithm, which assumes the homogeneity of motion vectors, the proposed method can capture the accurate
motion vector for each cloud block, including both the motion tendency and morphological changes. Specifically,
the global velocity derived from PIV is first calculated, and then fine-grained cloud motion estimation can be achieved by
global velocity based cloud block re-searching and multi-scale cloud block matching. Experimental results show that the
proposed global velocity constrained cloud motion prediction achieves comparable performance to the existing PIV and
filtered PIV algorithms, especially in a short prediction horizon.
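The block-level refinement around a global velocity can be sketched as follows; the mean-squared-error matching criterion and the search radius are assumptions for illustration rather than the authors' exact matching scheme.

```python
import numpy as np

def match_block(prev, curr, top_left, size, v_global, search=5):
    """Sketch: refine a cloud block's motion vector by searching a small window
    centred on the global velocity (dy, dx) estimated, e.g., by PIV."""
    y, x = top_left
    block = prev[y:y + size, x:x + size].astype(float)
    best, best_err = tuple(v_global), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + v_global[0] + dy, x + v_global[1] + dx
            if yy < 0 or xx < 0:
                continue                          # candidate leaves the image
            cand = curr[yy:yy + size, xx:xx + size].astype(float)
            if cand.shape != block.shape:
                continue
            err = np.mean((cand - block) ** 2)    # block-matching criterion
            if err < best_err:
                best_err = err
                best = (v_global[0] + dy, v_global[1] + dx)
    return best
```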
Towards an animated JPEG
Author(s):
Joël Theytaz;
Lin Yuan;
David McNally;
Touradj Ebrahimi
Show Abstract
Recently, short animated image sequences have become very popular in social networks. Most animated images are represented in GIF format. In this paper we propose an animated JPEG format, called aJPEG, which allows the standard JPEG format to be extended in a backward compatible way in order to cope with animated images. After presenting the proposed format, we illustrate it using two prototype applications: the first in form of a GIF-to-aJPEG converter on a personal computer and the second in form of an aJPEG viewer on a smart phone. The paper also reports the performance evaluation of aJPEG when compared to GIF. Experimental results show that aJPEG outperforms animated GIF in both file size overhead and image quality.
Super-resolution reconstruction algorithm based on adaptive convolution kernel size selection
Author(s):
Hang Gao;
Qian Chen;
Xiubao Sui;
Junjie Zeng;
Yao Zhao
Show Abstract
Restricted by the detector technology and optical diffraction limit, the spatial resolution of infrared imaging system is
difficult to achieve significant improvement. Super-Resolution (SR) reconstruction algorithm is an effective way to solve
this problem. Among them, the SR algorithm based on multichannel blind deconvolution (MBD) estimates the
convolution kernel only from low-resolution observation images, according to appropriate regularization constraints
introduced by a priori assumptions, to realize high-resolution image restoration. The algorithm has been shown to be
effective when the channel kernels are coprime. In this paper, we use the significant edges to estimate the convolution kernel and
introduce an adaptive convolution kernel size selection mechanism, according to the uncertainty of the convolution
kernel size in MBD processing. To reduce the interference of noise, we amend the convolution kernel in an iterative
process, and finally restore a clear image. Experimental results show that the algorithm can meet the convergence
requirement of the convolution kernel estimation.
Evaluation of color grading impact in restoration process of archive films
Author(s):
Karel Fliegel;
Stanislav Vítek;
Petr Páta;
Petr Janout;
Jiří Myslík;
Josef Pecák;
Marek Jícha
Show Abstract
Color grading of archive films is a very particular task in the process of their restoration. The ultimate goal of color grading here is to achieve the same look of the movie as intended at the time of its first presentation. The role of the expert restorer, expert group and a digital colorist in this complicated process is to find the optimal settings of the digital color grading system so that the resulting image look is as close as possible to the estimate of the original reference release print adjusted by the expert group of cinematographers. A methodology for subjective assessment of perceived differences between the outcomes of color grading is introduced, and results of a subjective study are presented. Techniques for objective assessment of perceived differences are discussed, and their performance is evaluated using ground truth obtained from the subjective experiment. In particular, a solution based on a calibrated digital single-lens reflex camera and subsequent analysis of image features captured from the projection screen is described. The system based on our previous work is further developed so that it can be used for the analysis of projected images. It allows assessing color differences in these images and predicting their impact on the perceived difference in image look.
Facial landmark detection in real-time with correlation filtering
Author(s):
Viridiana Contreras;
Víctor H. Díaz-Ramírez
Show Abstract
An algorithm for facial landmark detection based on template matched filtering is presented. The algorithm is able to detect and estimate the position of a set of prespecified landmarks by employing a bank of linear filters. Each filter in the bank is trained to detect a single landmark that is located in a small region of the input face image. The filter bank is implemented in parallel on a graphics processing unit to perform facial landmark detection in real-time. Computer simulation results obtained with the proposed algorithm are presented and discussed in terms of detection rate, accuracy of landmark location estimation, and real-time efficiency.
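The core correlation step can be illustrated with the following FFT-based sketch; the paper implements the filter bank on a GPU, while the frequency-domain formulation here is a standard equivalent used only for illustration.

```python
import numpy as np

def detect_landmarks(face, filter_bank):
    """Sketch: correlate the input face image with a bank of linear filters,
    one per landmark, and take each correlation peak as the landmark position."""
    F = np.fft.fft2(face.astype(float))
    positions = []
    for h in filter_bank:                          # each filter matches one landmark
        H = np.fft.fft2(h, s=face.shape)
        c = np.real(np.fft.ifft2(F * np.conj(H)))  # cross-correlation plane
        positions.append(np.unravel_index(np.argmax(c), c.shape))
    return positions
```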
Effective indexing for face recognition
Author(s):
I. Sochenkov;
A. Sochenkova;
A. Vokhmintsev;
A. Makovetskii;
A. Melnikov
Show Abstract
Face recognition is one of the most important tasks in computer vision and pattern recognition. Face recognition is useful
for security systems to provide safety. In some situations it is necessary to identify a person among many others. For this
case, this work presents a new approach to data indexing, which provides fast retrieval in big image collections. Data
indexing in this research consists of five steps. First, we detect the area containing the face; second, we align the face; then
we detect the areas containing the eyes and eyebrows, the nose, and the mouth. After that we find key points of each area using different
descriptors and finally index these descriptors with the help of a quantization procedure. An experimental analysis of this
method is performed. This paper shows that the proposed method achieves results at the level of state-of-the-art face
recognition methods, but also returns results quickly, which is important for systems that provide safety.
A technique of experimental and numerical analysis of influence of defects in the intraocular lens on the retinal image quality
Author(s):
Malwina Geniusz;
Marek Zając
Show Abstract
An intraocular lens (IOL) is an artificial lens implanted into the eye in order to restore correct vision after the removal
of the natural lens clouded due to cataract. The prolonged stay of the IOL in the eyeball causes different changes
on the surface and inside the implant, mainly in the form of small-size local defects such as vacuoles and calcium deposits.
Their presence worsens the imaging properties of the eye, mainly due to the occurrence of scattered light, thus deteriorating
the vision quality of patients after cataract surgery. It is very difficult to study the influence of these changes
on image quality in real patients. To avoid these difficulties two other possibilities were chosen: the analysis of the image
obtained in an optomechanical eye model with artificially aged IOL as well as numerical calculation of the image
characteristics while the eye lens is burdened with adequately modeled defects.
In the experiments, an optomechanical model of an eye was used, consisting of a glass “cornea”, a chamber filled with liquid
in which the IOL under investigation was inserted, and a high-resolution CCD detector serving as a “retina”.
The Modulation Transfer Function (MTF) of such an “eye” was evaluated on the basis of the image of an edge. Experiments
show that there is a significant connection between ageing defects and a decrease in MTF parameters.
The numerical part was performed with a computer programme for optical imaging analysis (OpticStudio Professional,
Zemax Professional from Radiant Zemax, LLC). On the basis of the Atchison eye model with the lens burdened with defects,
the Modulation Transfer Function was calculated. The particular parameters of the defects used in the numerical model were based
on our own measurements. Numerical simulations also show a significant connection between ageing defects and a decrease
of MTF parameters. With this technique the influence of the types, density and distribution of local defects in the IOL
on the retinal image quality can be evaluated quickly without the need of performing very difficult and even dangerous
experiments on real human patients.
Sparsity based target detection for compressive spectral imagery
Author(s):
David Alberto Boada;
Henry Arguello Fuentes Sr.
Show Abstract
Hyperspectral imagery provides significant information about the spectral characteristics of objects and materials present
in a scene. It enables object and feature detection, classification, or identification based on the acquired spectral characteristics.
However, it relies on sophisticated acquisition and data processing systems able to acquire, process, store, and
transmit hundreds or thousands of image bands from a given area of interest which demands enormous computational
resources in terms of storage, computation, and I/O throughput. Specialized optical architectures have been developed
for the compressed acquisition of spectral images using a reduced set of coded measurements contrary to traditional architectures
that need a complete set of measurements of the data cube for image acquisition, dealing with the storage and
acquisition limitations. Despite this improvement, if any processing is desired, the image has to be reconstructed by an
inverse algorithm in order to be processed, which is also an expensive task. In this paper, a sparsity-based algorithm for
target detection in compressed spectral images is presented. Specifically, the target detection model adapts a sparsity-based
target detector to work in a compressive domain, modifying the sparse representation basis in the compressive sensing
problem by means of over-complete training dictionaries and a wavelet basis representation. Simulations show that the
presented method can achieve even better detection results than the state of the art methods.
Correlation-based tracking using tunable training and Kalman prediction
Author(s):
Sergio E. Ontiveros-Gallardo;
Vitaly Kober
Show Abstract
Tracking solves the problem of detecting and estimating the future target state in an input video sequence. In this work, an adaptive tracking algorithm based on multiple object detections in reduced frame areas with a tunable bank of correlation filters is proposed. Prediction of the target state is carried out with Kalman filtering. It helps us to estimate the target state, to reduce the search area in the next frame, and to solve the occlusion problem. The bank of composite filters is updated frame by frame with tolerance to recent viewpoint and scale changes of the target. The performance of the proposed algorithm is evaluated with the help of computer simulation in terms of detection and location errors.
A robust HOG-based descriptor for pattern recognition
Author(s):
Julia Diaz-Escobar;
Vitaly Kober
Show Abstract
The Histogram of Oriented Gradients (HOG) is a popular feature descriptor used in computer vision and image processing. The technique counts occurrences of gradient orientations in localized portions of an image. The descriptor is sensitive to the presence of noise, nonuniform illumination, and low contrast in images. In this work, we propose a robust HOG-based descriptor using the local energy model and the phase congruency approach. Computer simulation results are presented for recognition of objects in images affected by additive noise, nonuniform illumination, and geometric distortions using the proposed and conventional HOG descriptors.
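For reference, a minimal NumPy sketch of the conventional HOG cell histograms (the baseline that the proposed phase-congruency variant makes robust) is given below; the cell size, number of bins, and omission of block normalization are simplifying assumptions.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude (no block normalization)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180            # unsigned orientation
    H, W = img.shape
    hists = np.zeros((H // cell, W // cell, bins))
    for cy in range(H // cell):
        for cx in range(W // cell):
            sl = np.s_[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            idx = (ang[sl] / (180 / bins)).astype(int) % bins
            np.add.at(hists[cy, cx], idx.ravel(), mag[sl].ravel())
    return hists
```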
Polarization-correlation optical microscopy of anisotropic biological layers
Author(s):
A. G. Ushenko;
A. V. Dubolazov;
V. A. Ushenko;
Yu. A. Ushenko;
M. Yu. Sakhnovskiy;
V. N. Balazyuk;
O. Khukhlina;
K. Viligorska;
A. Bykov;
A. Doronin;
I. Meglinski
Show Abstract
The theoretical background of azimuthally stable method of Jones-matrix mapping of histological sections of biopsy of
myocardium tissue on the basis of spatial frequency selection of the mechanisms of linear and circular birefringence is
presented. The diagnostic application of a new correlation parameter – complex degree of mutual anisotropy – is
analytically substantiated. The method of measuring coordinate distributions of complex degree of mutual anisotropy
with further spatial filtration of their high- and low-frequency components is developed. The interconnections of such
distributions with parameters of linear and circular birefringence of myocardium tissue histological sections are found.
The comparative results of measuring the coordinate distributions of complex degree of mutual anisotropy formed by
fibrillar networks of myosin fibrils of myocardium tissue of different necrotic states – dead due to coronary heart disease
and acute coronary insufficiency are shown. The values and ranges of change of the statistical (moments of the 1st – 4th
order) parameters of complex degree of mutual anisotropy coordinate distributions are studied. The objective criteria of
differentiation of cause of death are determined.
A modified iterative closest point algorithm for shape registration
Author(s):
Dmitrii Tihonkih;
Artyom Makovetskii;
Vladislav Kuznetsov
Show Abstract
The iterative closest point (ICP) algorithm is one of the most popular approaches to shape registration. The algorithm
starts with two point clouds and an initial guess for a relative rigid-body transformation between them. Then it iteratively
refines the transformation by generating pairs of corresponding points in the clouds and by minimizing a chosen error
metric. In this work, we focus on accuracy of the ICP algorithm. An important stage of the ICP algorithm is the
searching of nearest neighbors. We propose to utilize for this purpose geometrically similar groups of points. Groups of
points of the first cloud, that have no similar groups in the second cloud, are not considered in further error minimization.
To minimize errors, the class of affine transformations is used. The transformations are not rigid in contrast to the
classical approach. This approach allows us to get a precise solution for transformations such as rotation, translation
vector and scaling. With the help of computer simulation, the proposed method is compared with common nearest
neighbor search algorithms for shape registration.
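One iteration of such an affine ICP step might be sketched as follows; the nearest-neighbour matching via a k-d tree and the unconstrained least-squares affine fit are illustrative choices, and the group-based correspondence filtering described above is omitted for brevity.

```python
import numpy as np
from scipy.spatial import cKDTree

def affine_icp_step(src, dst):
    """One iteration sketch: match points by nearest neighbour, then fit an
    affine transform (not a rigid one) by least squares, which also absorbs
    rotation, translation, and scaling."""
    idx = cKDTree(dst).query(src)[1]                      # correspondences src -> dst
    matched = dst[idx]
    src_h = np.hstack([src, np.ones((len(src), 1))])      # homogeneous coordinates
    A, *_ = np.linalg.lstsq(src_h, matched, rcond=None)   # (d+1) x d affine map
    return src_h @ A                                      # transformed source cloud
```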
An automatic registration system of multi-view 3D measurement data using two-axis turntables
Author(s):
Dong He;
Xiaoli Liu;
Zewei Cai;
Hailong Chen;
Xiang Peng
Show Abstract
Automatic registration is a key research issue in the field of 3D measurement. In this work, we developed an
automatic registration system composed of a stereo system with structured light and two-axis
turntables. To realize fully automatic 3D point registration, a novel method is proposed for
calibrating the stereo system and the two turntable axis direction vectors simultaneously. A planar calibration
rig with marked points was placed on the turntable and was captured by the left and right cameras of the
stereo system at different rotation angles of the two-axis turntable. From the captured images, the stereo system
was calibrated (intrinsically and extrinsically) with a classical camera model, and the 3D
coordinates of the marked points were reconstructed at different angles of the two turntables. Each marked point at different
angles traces a specific circle, and the normal line of that circle is aligned with the turntable axis direction vector.
For each turntable, different points trace different circles with different normal lines, and the turntable axis
direction vector is calculated by averaging these normal lines. The results show that the
proposed registration system can precisely register point clouds under different scanning angles. In
addition, no ICP iterative procedures are required, which makes the system applicable to registration of
point clouds without obvious features such as spheres, cylinders, cones and other solids of revolution.
Fast estimate of Hartley entropy in image sharpening
Author(s):
Zuzana Krbcová;
Jaromír Kukal;
Jan Svihlik;
Karel Fliegel
Show Abstract
Two classes of linear IIR filters, the Laplacian of Gaussian (LoG) and the Difference of Gaussians (DoG), are frequently used as high-pass filters for contextual vision and edge detection. They are also used for image sharpening when linearly combined with the original image. The resulting sharpening filters are radially symmetric in the spatial and frequency domains. Our approach is based on a radial approximation of the unknown optimal filter, which is designed as a weighted sum of Gaussian filters with various radii. The novel filter is designed for MRI image enhancement, where the image intensity represents anatomical structure plus additive noise. We prefer the gradient norm of the Hartley entropy of the whole image intensity as a measure which has to be maximized for the best sharpening. The entropy estimation procedure is as fast as the FFT included in the filter, but this estimate is a continuous function of the enhanced image intensities. A physically motivated heuristic is used for optimum sharpening filter design by tuning its parameters. Our approach is compared with the Wiener filter on MRI images.
Fast algorithm for calculation of linear variations
Author(s):
Fedor Alekseev;
Mikhail Alekseev;
Artyom Makovetskii
Show Abstract
Image restoration deals with functions of two variables. A function of two variables can be described by two
variations, namely total variation and linear variation. Linear variation is a topological characteristic of a
function of two variables. In this text we compare possible approaches to calculation of linear variation: the
straightforward one, based on conventional algorithms for connected component labeling, and also we present a
modification that exploits specificity of the problem to dramatically reduce complexity by reusing of intermediate
results. Possibilities for further optimizations are also discussed.
Mueller-matrix differentiation of fibrillar networks of biological tissues with different phase and amplitude anisotropy
Author(s):
A. G. Ushenko;
A. V. Dubolazov;
V. A. Ushenko;
Yu. A. Ushenko;
L. Ya. Kushnerick;
O. V. Olar;
N. V. Pashkovskaya;
Yu. F. Marchuk
Show Abstract
The work consists of investigation results of diagnostic efficiency of a new azimuthally stable Mueller-matrix method of
analysis of laser autofluorescence coordinate distributions of biological tissues histological sections. A new model of
generalized optical anisotropy of biological tissues protein networks is proposed in order to define the processes of laser
autofluorescence. The influence of complex mechanisms of both phase anisotropy (linear birefringence and optical
activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable
Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are
determined. The statistic analysis of coordinate distributions of such Mueller-matrix rotation invariants is proposed.
Thereupon the quantitative criteria (statistic moments of the 1st to the 4th order) of differentiation of histological
sections of uterus wall tumor – group 1 (dysplasia) and group 2 (adenocarcinoma) are estimated.
Meteor tracking via local pattern clustering in spatio-temporal domain
Author(s):
Jaromír Kukal;
Martin Klimt;
Jan Švihlík;
Karel Fliegel
Show Abstract
Reliable meteor detection is one of the crucial disciplines in astronomy. A variety of imaging systems is used for meteor path reconstruction. The traditional approach is based on the analysis of 2D image sequences obtained from a double-station video observation system. Precise localization of the meteor path is difficult due to atmospheric turbulence and other factors causing spatio-temporal fluctuations of the image background. The proposed technique performs non-linear preprocessing of image intensity using the Box-Cox transform, as recommended in our previous work. Both symmetric and asymmetric spatio-temporal differences are designed to be robust in the statistical sense. The resulting local patterns are processed by a data whitening technique and the obtained vectors are classified via cluster analysis and a Self-Organizing Map (SOM).
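The Box-Cox preprocessing step mentioned above follows the standard transform; a minimal sketch, assuming positive intensities and an externally chosen parameter lambda, is:

```python
import numpy as np

def box_cox(intensity, lam):
    """Standard Box-Cox transform of positive image intensities; the choice of
    lam (lambda) is made elsewhere and is not part of this sketch."""
    x = np.asarray(intensity, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam
```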
Thermography based diagnosis of ruptured anterior cruciate ligament (ACL) in canines
Author(s):
Norsang Lama;
Scott E. Umbaugh;
Deependra Mishra;
Rohini Dahal;
Dominic J. Marino;
Joseph Sackman
Show Abstract
Anterior cruciate ligament (ACL) rupture in canines is a common orthopedic injury in veterinary medicine. Veterinarians use both imaging and non-imaging methods to diagnose the disease. Common imaging methods such as radiography, computed tomography (CT scan) and magnetic resonance imaging (MRI) have some disadvantages: expensive setup, a high dose of radiation, and time-consuming procedures. In this paper, we present an alternative diagnostic method based on feature extraction and pattern classification (FEPC) to diagnose abnormal patterns in ACL thermograms. The proposed method was evaluated on a total of 30 thermograms for each camera view (anterior, lateral and posterior), including 14 disease and 16 non-disease cases provided by Long Island Veterinary Specialists. The normal and abnormal patterns in thermograms are analyzed in two steps: feature extraction and pattern classification. Texture features based on gray level co-occurrence matrices (GLCM), histogram features and spectral features are extracted from the color normalized thermograms, and the computed feature vectors are applied to a Nearest Neighbor (NN) classifier, a K-Nearest Neighbor (KNN) classifier and a Support Vector Machine (SVM) classifier with the leave-one-out validation method. The algorithm gives the best classification success rate of 86.67% with a sensitivity of 85.71% and a specificity of 87.5% in ACL rupture detection using the NN classifier for the lateral view and the Norm-RGB-Lum color normalization method. Our results show that the proposed method has the potential to detect ACL rupture in canines.
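As an illustration of the GLCM texture features used in the first step, the sketch below computes a horizontal-offset co-occurrence matrix and three standard properties; the quantization to 16 levels and the single offset are simplifying assumptions, and the histogram/spectral features and classifiers are not shown.

```python
import numpy as np

def glcm_features(gray, levels=16):
    """Sketch of GLCM texture features for the horizontal (0, 1) offset."""
    q = np.clip(gray, 0, 255).astype(int) * levels // 256      # quantize gray levels
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # count co-occurrences
    p = glcm / glcm.sum()                                      # normalize to probabilities
    i, j = np.indices(p.shape)
    return {
        'contrast':    np.sum(p * (i - j) ** 2),
        'energy':      np.sum(p ** 2),
        'homogeneity': np.sum(p / (1.0 + np.abs(i - j))),
    }
```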
Identification of superficial defects in reconstructed 3D objects using phase-shifting fringe projection
Author(s):
Carlos A. Madrigal;
Alejandro Restrepo;
John W. Branch
Show Abstract
3D reconstruction of small objects is used in applications of surface analysis, forensic analysis and tissue reconstruction
in medicine. In this paper, we propose a strategy for the 3D reconstruction of small objects and the identification of some
superficial defects. We applied a technique of projection of structured light patterns, specifically sinusoidal fringes and
an algorithm of phase unwrapping. A CMOS camera was used to capture images and a DLP digital light projector for
synchronous projection of the sinusoidal pattern onto the objects. We implemented a technique based on a 2D flat pattern
as the calibration process, so the intrinsic and extrinsic parameters of the camera and the DLP were defined. Experimental tests
were performed on samples of artificial teeth, coal particles, welding defects and surfaces tested with Vickers indentation.
Areas of less than 5 cm were studied. The objects were reconstructed in 3D with densities of about one million points per
sample. In addition, the steps of 3D description, identification of primitives, training and classification were implemented
to recognize defects such as holes, cracks, rough textures and bumps. We found that pattern recognition strategies
are useful when surface quality supervision has enough points to evaluate the defective region, because
the identification of defects in small objects is a demanding visual inspection task.
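The wrapped-phase computation behind the sinusoidal fringe projection can be sketched with the standard four-step formula, assuming four captures shifted by pi/2 each; the phase unwrapping and calibration stages described above are separate steps not shown here.

```python
import numpy as np

def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase from four fringe images I_n = A + B*cos(phi + n*pi/2)."""
    return np.arctan2(i3.astype(float) - i1, i0.astype(float) - i2)
```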
Time-space analysis in photoelasticity images using recurrent neural networks to detect zones with stress concentration
Author(s):
Juan C. Briñez de León;
Alejandro Restrepo M.;
John W. Branch
Show Abstract
Digital photoelasticity is based on image analysis techniques to describe the stress distribution in birefringent materials subjected to mechanical loads. However, optical assemblies for capturing the images, the steps to extract the information, and the ambiguities of the results limit the analysis in zones with stress concentrations. These zones contain stress values that could produce a failure, making important their identification. This paper identifies zones with stress concentration in a sequence of photoelasticity images, which was captured from a circular disc under diametral compression. The capturing process was developed assembling a plane polariscope around the disc, and a digital camera stored the temporal fringe colors generated during the load application. Stress concentration zones were identified modeling the temporal intensities captured by every pixel contained into the sequence. In this case, an Elman artificial recurrent neural network was trained to model the temporal intensities. Pixel positions near to the stress concentration zones trained different network parameters in comparison with pixel positions belonging to zones of lower stress concentration.
Determination of particle sizes in hydraulic liquids based on image- and subpixel processing
Author(s):
Dmitriy V. Kornilin;
Ilya A. Kudryavtsev;
Alison J. McMillan;
Ardeshir Osanlou;
Ian Ratcliffe
Show Abstract
This paper describes processing methods for the signal obtained from a CMOS matrix sensor in terms of their
implementation in in-line automatic particle counters. The methods involve an analysis of particle tracks in terms
of their shapes and the charge accumulated by each pixel. This combination gives an opportunity to determine the equivalent
diameter of the particles and their shapes. These methods can be implemented using digital signal processors, which is very
important in the area of developing and producing sensors built into hydraulic systems. The primary application of
the developed methods is diagnostics of the state of hydraulic systems in different areas.
A correlation-based algorithm for recognition and tracking of partially occluded objects
Author(s):
Alexey Ruchay;
Vitaly Kober
Show Abstract
In this work, a correlation-based algorithm consisting of a set of adaptive filters for recognition of occluded objects in still and dynamic scenes in the presence of additive noise is proposed. The designed algorithm is adaptive to the input scene, which may contain different fragments of the target, false objects, and background to be rejected. The algorithm outputs high correlation peaks corresponding to pieces of the target in scenes. The proposed algorithm uses a bank of composite optimum filters. The performance of the proposed algorithm for recognition of partially occluded objects is compared with that of common algorithms in terms of objective metrics.
Automatic optical inspection system design for golf ball
Author(s):
Hsien-Huang Wu;
Jyun-Wei Su;
Chih-Lin Chen
Show Abstract
With the growing popularity of golf all over the world, the quantities of relevant products are increasing year by year. To create innovation and improvement in quality while reducing production cost, automation of manufacturing becomes a necessary and important issue. This paper reflects this trend toward production automation. It uses AOI (Automated Optical Inspection) technology to develop a system which can automatically detect defects on golf balls. The current manual quality inspection is not only error-prone but also very manpower demanding. Taking into consideration the competition in this industry in the near future, the development of related AOI equipment must be conducted as soon as possible. Due to the strong reflective property of the ball surface, as well as its surface dimples and subtle flaws, it is very difficult to take good quality images for automatic inspection. Based on the surface properties and shape of the ball, the lighting of the image-taking environment and structure has been properly designed. Area-scan cameras have been used to acquire images with good contrast between defects and background to assure the achievement of the goal of automatic defect detection on the golf ball. The result obtained is that more than 97% of the NG balls have been detected, and the system maintains less than a 10% false alarm rate. The balls which are determined by the system to be NG are inspected again by the human eye. Therefore, the manpower spent on the inspection has been reduced by 90%.
Total variation regularization with bounded linear variations
Author(s):
Artyom Makovetskii;
Sergei Voronin;
Vitaly Kober
Show Abstract
One of the best known techniques for signal denoising is based on total variation regularization (TV regularization). A
better understanding of TV regularization is necessary to provide a stronger mathematical justification for using TV
minimization in signal processing. In this work, we deal with an intermediate case between one- and two-dimensional
cases; that is, a discrete function to be processed is two-dimensional radially symmetric piecewise constant. For this case,
the exact solution to the problem can be obtained as follows: first, calculate the average values over rings of the noisy
function; second, calculate the shift values and their directions using closed formulae depending on a regularization
parameter and the structure of the rings. Although TV regularization is effective for noise removal, it often destroys fine
details and thin structures of images. In order to overcome this drawback, we use TV regularization for signal
denoising subject to the constraint that the linear signal variations are bounded.
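The first step of the closed-form solution described above (ring averaging of a radially symmetric noisy function) might look as follows; the ring binning by equal radial increments is an assumption for illustration, and the subsequent shift computation is not shown.

```python
import numpy as np

def ring_averages(noisy, n_rings):
    """Average a (nearly) radially symmetric noisy image over concentric rings
    around the image centre; the shift values per ring are computed afterwards."""
    h, w = noisy.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    ring_id = np.digitize(r, edges) - 1
    means = np.array([noisy[ring_id == k].mean() if np.any(ring_id == k) else 0.0
                      for k in range(n_rings)])
    return means, ring_id
```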
Dependences between kinetics of the human eye pupil and blood pulsation
Author(s):
Marta A. Szmigiel;
Henryk Kasprzak;
Anna Klysik
Show Abstract
The study presents measurement and numerical analysis of time variability of the eye pupil geometry and its
position, as well as their correlations with blood pulsation. The image of the eye pupil was recorded by use of a fast
CCD camera at a rate of 200 fps. Blood pulsation was synchronously recorded by use of a pulse transducer with a
sampling frequency of 200 Hz. Each single image from a sequence was numerically processed. Contour of the eye pupil
was approximated, and its selected geometrical parameters as well as center positions were calculated. Spectral and
coherence analysis of time variability of calculated pupil parameters and blood pulsation were determined.
A local correlation based visual saliency model
Author(s):
Yang Li;
Xuanqin Mou
Show Abstract
We propose a novel local correlation based saliency model that is friendly to video coding applications. The
proposed model is developed in YCbCr color space. We extract feature maps with local mean and local contrast
of each channel image and its Gaussian blurred image, and produce rarity maps by calculating the correlation
between the feature maps of the original and blurred channels. The proposed saliency map is produced by a
combination of the local mean rarity maps and the local contrast rarity maps across all the channels. Experiments
validate that the proposed model works with excellent performance.
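A per-channel sketch of the local-correlation rarity idea is given below; the window size, Gaussian sigma, and the use of a normalized local correlation are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def rarity_map(channel, win=9, sigma=3.0):
    """Sketch for one channel: compare local statistics of the original and its
    Gaussian-blurred version via local correlation (low correlation -> salient)."""
    f = channel.astype(float)
    g = gaussian_filter(f, sigma)
    mf, mg = uniform_filter(f, win), uniform_filter(g, win)
    cov = uniform_filter(f * g, win) - mf * mg
    var_f = uniform_filter(f * f, win) - mf ** 2
    var_g = uniform_filter(g * g, win) - mg ** 2
    corr = cov / np.sqrt(np.maximum(var_f * var_g, 1e-12))
    return 1.0 - np.clip(corr, 0.0, 1.0)          # rarity: where correlation drops
```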
A fast preview restoration algorithm for space-variant degraded images
Author(s):
Victor Karnaukhov;
Vitaly Kober
Show Abstract
The paper deals with restoration of decimated images degraded by space-variant distortions. Such distortions occur in
real conditions when the camera in an actual shoot is shaken and rotated in three dimensions while its shutter is open.
The proposed method is locally adaptive image restoration in the domain of a sliding orthogonal transform. It is assumed
that the signal distortion operator is spatially homogeneous in a small sliding window. A fast preview restoration
algorithm for degraded images is proposed. To achieve the image restoration with low resolution at high rate, a fast
recursive algorithm for computing the sliding discrete cosine transform with arbitrary step is utilized. The proposed
algorithm is tested with spatially nonuniform distortion operators and obtained results are discussed.
Clustered impulse noise removal from color images with spatially connected rank filtering
Author(s):
Alexey Ruchay;
Vitaly Kober
Show Abstract
This paper deals with impulse noise removal from color images. The proposed noise removal algorithm employs two classical approaches for color image denoising; that is, detection of corrupted pixels and removal of the detected noise by means of local rank filtering. With the help of computer simulation we show that the proposed algorithm can effectively remove impulse noise and clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
An efficient algorithm for matching of SLAM video sequences
Author(s):
Jose A. González-Fraga;
Victor H. Diaz-Ramirez;
Vitaly Kober;
Juan J. Tapia-Higuera;
Omar Alvarez-Xochihua
Show Abstract
In this work, we propose a new algorithm for matching incoming video sequences to a simultaneous localization and
mapping system based on an RGB-D camera. Basically, this system serves for real-time estimation of the trajectory of
camera motion and generates a 3D map of indoor environment. The proposed algorithm is based on composite
correlation filters with adjustable training sets depending on appearance of indoor environment as well as relative
position and perspective from the camera to environment components. The algorithm is scale-invariant because it
utilizes the depth information from RGB-D camera. The performance of the proposed algorithm is evaluated in terms of
accuracy, robustness, and processing time and compared with that of common feature-based matching algorithms based
on the SURF descriptor.
Video quality assessment based on correlation between spatiotemporal motion energies
Author(s):
Peng Yan;
Xuanqin Mou
Show Abstract
Video quality assessment (VQA) has been a hot research topic because of the rapidly increasing demand for video
communications. From the earliest PSNR metric to advanced models that are perceptually aware, researchers have made
great progress in this field by introducing properties of human vision system (HVS) into VQA model design. Among
various algorithms that model the property of HVS perceiving motion, the spatiotemporal energy model has been validated
to be highly consistent with psychophysical experiments. In this paper, we take the spatiotemporal energy model into VQA
model design by the following steps. 1) According to the pristine spatiotemporal energy model proposed by Adelson et al,
we apply the linear filters, which are oriented in space-time and tuned in spatial frequency, to filter the reference and test
videos respectively. The outputs of quadrature pairs of above filters are then squared and summed to give two measures of
motion energy, which are named rightward and leftward energy responses, respectively. 2) Based on the pristine model,
we calculate summation of the rightward and leftward energy responses as spatiotemporal features to represent perceptual
quality information for videos, named total spatiotemporal motion energy maps. 3) The proposed FR-VQA model, named
STME, is calculated with statistics based on the pixel-wise correlation between the total spatiotemporal motion energy
maps of the reference and distorted videos. The STME model was validated on the LIVE VQA Database by comparing
with existing FR-VQA models. Experimental results show that STME performs with excellent prediction accuracy and
stays in state-of-the-art VQA models.
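The sketch below computes rightward and leftward motion-energy maps on an x-t slice of a video with quadrature pairs of space-time oriented Gabor filters, in the spirit of the Adelson-Bergen model referenced above; the filter size, frequency and orientations are illustrative assumptions rather than the STME settings.

```python
# Quadrature (even/odd) space-time Gabor pairs; squaring and summing their
# outputs gives an energy response tuned to one direction of motion.
import numpy as np
from scipy.signal import fftconvolve

def gabor_xt(size=15, sigma=3.0, freq=0.15, theta=np.pi / 4):
    """Even and odd space-time Gabor filters oriented at angle theta in x-t."""
    r = np.arange(size) - size // 2
    x, t = np.meshgrid(r, r)
    xr = x * np.cos(theta) + t * np.sin(theta)
    env = np.exp(-(x**2 + t**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr), env * np.sin(2 * np.pi * freq * xr)

def motion_energy(xt_slice):
    """Return (rightward, leftward) energy maps for a 2D x-t image slice."""
    energies = []
    for theta in (np.pi / 4, -np.pi / 4):          # right- and left-tilted in x-t
        even, odd = gabor_xt(theta=theta)
        e = fftconvolve(xt_slice, even, mode='same')
        o = fftconvolve(xt_slice, odd, mode='same')
        energies.append(e**2 + o**2)               # squared quadrature pair, summed
    return energies[0], energies[1]
```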
Face recognition based on matching of local features on 3D dynamic range sequences
Author(s):
B. Adriana Echeagaray-Patrón;
Vitaly Kober
Show Abstract
3D face recognition has attracted attention in the last decade due to improvements in 3D image acquisition technology
and its wide range of applications, such as access control, surveillance, human-computer interaction and biometric
identification systems. Most research on 3D face recognition has focused on the analysis of 3D still data. In this work, a
new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and
discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared
with that of conventional face recognition algorithms based on descriptors.
Moving object detection via low-rank total variation regularization
Author(s):
Pengcheng Wang;
Qian Chen;
Na Shao
Show Abstract
Moving object detection is a challenging task in video surveillance. The recently proposed Robust Principal Component
Analysis (RPCA) can recover outlier patterns from low-rank data under some mild conditions. However, the
ℓ1-penalty in RPCA does not work well in moving object detection because the irrepresentable condition is often not
satisfied. In this paper, a method based on a total variation (TV) regularization scheme is proposed. In our model, image
sequences captured with a static camera are highly related and can be described by a low-rank matrix. Meanwhile,
the low-rank matrix can absorb background motion, e.g. periodic and random perturbations. The foreground objects in the
sequence are usually sparsely distributed and drift continuously, and can be treated as group outliers from the
highly related background scenes. Instead of the ℓ1-penalty, we exploit the total variation of the foreground. By minimizing
the total variation energy, the outliers tend to collapse and finally converge to the exact moving objects. The
TV-penalty is superior to the ℓ1-penalty especially when the outliers are in the majority for some pixels, and our method
can estimate the outliers explicitly with less bias but higher variance. To solve the problem, a joint optimization function
is formulated and can be effectively solved by the inexact Augmented Lagrange Multiplier (ALM) method. We
evaluate our method along with several state-of-the-art approaches in MATLAB. Both qualitative and quantitative results
demonstrate that the proposed method works effectively on a wide range of complex scenarios.
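For reference, the sketch below solves the classical RPCA decomposition with the inexact ALM method mentioned above; it uses the ℓ1-penalty baseline that the paper replaces with a total-variation penalty, so it is a point of comparison rather than the proposed model, and the parameters are common defaults.

```python
# Inexact ALM for RPCA: D = L + S with L low-rank (background) and S sparse
# (moving objects); L-step is singular value thresholding, S-step is soft
# thresholding of the residual.
import numpy as np

def shrink(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_ialm(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(D, 2)
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt          # singular value thresholding
        S = shrink(D - L + Y / mu, lam / mu)          # l1 soft thresholding
        residual = D - L - S
        Y = Y + mu * residual
        mu *= rho
        if np.linalg.norm(residual) / np.linalg.norm(D) < tol:
            break
    return L, S
```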
Visual grouping under isoluminant condition: impact of mental fatigue
Author(s):
Tatjana Pladere;
Diana Bete;
Jurgis Skilters;
Gunta Krumina
Show Abstract
Rather than selecting arbitrary elements, our visual perception favors only certain groupings of information. There is ample
evidence that visual attention and perception are substantially impaired in the presence of mental fatigue. The question
is how visual grouping, which can be considered a bottom-up controlled neuronal gain mechanism, is influenced. The
main purpose of our study is to determine the influence of mental fatigue on the visual grouping of specific information,
namely the color and configuration of stimuli, in a psychophysical experiment. Individuals provided subjective data by
filling in a questionnaire about their health and general feeling. The objective evidence was obtained in a specially designed
visual search task where achromatic and chromatic isoluminant stimuli were used in order to avoid the so-called pop-out
effect due to differences in light intensity. Each individual was instructed to identify the symbols with apertures in the same
direction in four tasks. The color component differed across the visual search tasks according to the goals of the study. The
results reveal that visual grouping is completed faster when the visual stimuli have the same color and aperture direction.
The shortest reaction time is observed in the evening. Moreover, the reaction-time results suggest that two
grouping processes compete for selective attention in the visual system when similarity in color conflicts with similarity
in the configuration of stimuli. The described effect increases significantly in the presence of mental fatigue, but it does
not have a strong influence on the accuracy of task accomplishment.
Face detection based on multiple kernel learning algorithm
Author(s):
Bo Sun;
Siming Cao;
Jun He;
Lejun Yu
Show Abstract
Face detection is important for face localization in face or facial expression recognition, etc. The basic idea is to determine
whether there is a face in an image and, if so, its location and size. This can be seen as a binary classification problem,
which can be well solved by a support vector machine (SVM). Although the SVM has strong generalization ability, it has
some limitations, which are analyzed in depth in this paper. To address them, we study the principle and characteristics
of Multiple Kernel Learning (MKL) and propose an MKL-based face detection algorithm. We describe the
proposed algorithm from the interdisciplinary perspective of machine learning and image processing. After analyzing
the limitations of describing a face with a single feature, we apply several features. To fuse them well, we try different kernel
functions on different features; the MKL method then determines the weight of each kernel function. Thus, we obtain the face
detection model, which is the core of the proposed method. Experiments on a public data set and real-life face images
are performed. We compare the performance of the proposed algorithm with single kernel-single feature and
multiple kernels-single feature algorithms. The effectiveness of the proposed algorithm is illustrated.
Keywords: face detection, feature fusion, SVM, MKL
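The sketch below illustrates the underlying idea of combining kernels computed on different features into a single SVM; here the kernel weights are fixed by hand, whereas true MKL learns them jointly with the classifier, and the feature matrices and weights are illustrative assumptions.

```python
# Weighted sum of per-feature kernels fed to an SVM with a precomputed kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def combined_kernel(Xa_f1, Xb_f1, Xa_f2, Xb_f2, w=(0.6, 0.4)):
    """RBF kernel on feature set 1 plus linear kernel on feature set 2."""
    return w[0] * rbf_kernel(Xa_f1, Xb_f1) + w[1] * linear_kernel(Xa_f2, Xb_f2)

# Usage (F1_*, F2_* are two feature matrices for the same samples):
# K_train = combined_kernel(F1_tr, F1_tr, F2_tr, F2_tr)
# clf = SVC(kernel='precomputed').fit(K_train, y_tr)
# K_test = combined_kernel(F1_te, F1_tr, F2_te, F2_tr)   # rows: test, cols: train
# y_pred = clf.predict(K_test)
```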
Features of volume with transparent particles correlation analysis
Author(s):
Tatiana A. Vovk;
Nikolay V. Petrov
Show Abstract
In this paper we propose a technique for investigating the distribution of transparent particles suspended in a volume.
The method divides the volume into a plurality of layers containing the particles.
An inline hologram of this volume is reconstructed in two adjacent layers, which are compared using a correlation
function. We have derived the dependencies of the correlation on particle parameters and performed an experimental
validation of this study. This technique is useful for applications that require instant assessment of the particle
distribution.
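A minimal sketch of the layer comparison is given below; the hologram reconstruction step itself is not shown, and the two layers are assumed to be available as same-size 2D intensity arrays.

```python
# Normalized cross-correlation coefficient between two reconstructed layers.
import numpy as np

def layer_correlation(layer_a, layer_b):
    a = layer_a - layer_a.mean()
    b = layer_b - layer_b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```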
Comprehensive evaluation for fused images of multispectral and panchromatic images based on entropy weight method
Author(s):
Xiaojie Xia;
Yan Yuan;
Lijuan Su;
Liang Hu
Show Abstract
An evaluation model of image fusion based on the entropy weight method is put forward to resolve evaluation issues for fused multispectral and panchromatic images, such as the lack of overall importance in single-factor metric evaluation and the discrepancy among different categories of characteristic evaluation. Several single-factor metrics covering different aspects of the image are selected to form a metric set; the entropy weight of each single-factor index is then calculated by the entropy weight method, yielding a new comprehensive evaluation index that is used to evaluate each fused image, so that images with higher spectral and spatial resolution can be identified. Experimental analysis shows that the proposed method is versatile, objective and rational, and performs well on the evaluation of fused multispectral and panchromatic images.
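The sketch below shows the standard entropy weight computation and the resulting comprehensive score; it assumes all single-factor metrics are benefit-type (larger is better) and arranged with rows as fused images and columns as metrics, which is an illustrative simplification.

```python
# Entropy weight method: metrics whose values vary more across the fused
# images carry more information and therefore receive larger weights.
import numpy as np

def entropy_weights(X):
    """X: (n_images, n_metrics) matrix of single-factor metric values."""
    P = X / X.sum(axis=0, keepdims=True)                     # column-wise proportions
    P = np.clip(P, 1e-12, None)                              # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])    # entropy per metric
    d = 1.0 - e                                              # degree of diversification
    return d / d.sum()                                       # entropy weights

def comprehensive_index(X):
    w = entropy_weights(X)
    Xn = X / X.max(axis=0, keepdims=True)                    # simple benefit normalization
    return Xn @ w                                            # one score per fused image
```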
Defect inspection in hot slab surface: multi-source CCD imaging based fuzzy-rough sets method
Author(s):
Liming Zhao;
Yi Zhang;
Xiaodong Xu;
Hong Xiao;
Chao Huang
Show Abstract
To provide an accurate surface defect inspection method and to make the automation of a robust image region-of-interest (ROI) delineation strategy a reality in the production line, a multi-source CCD imaging based fuzzy-rough sets method is proposed for hot slab surface quality assessment. The presented method and the devised system are mainly applicable to surface quality inspection for strip, billet and slab surfaces, etc. In this work we exploit the complementary advantages of two common machine vision (MV) systems: line-array CCD traditional scanning imaging (LS-imaging) and area-array CCD laser three-dimensional (3D) scanning imaging (AL-imaging). By establishing a fuzzy-rough sets model in the detection system, the seeds for relative fuzzy connectedness (RFC) delineation of the ROI can be placed adaptively; the model introduces upper and lower approximation sets for ROI definition, by which the boundary region can be delineated through the RFC region-competition classification mechanism. For the first time, a multi-source CCD imaging based fuzzy-rough sets strategy is attempted for CC-slab surface defect inspection, allowing AI algorithms and powerful ROI delineation strategies to be applied automatically in the MV inspection field.
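As a highly simplified illustration of the upper/lower approximation idea, the sketch below splits a pixel-wise membership map into certain, possible and boundary regions; the actual fuzzy-rough model, the multi-source imaging and the RFC competition of the paper are far richer, and the membership map and thresholds here are illustrative assumptions.

```python
# Rough-set style partition of a fuzzy membership map into lower approximation
# (certain ROI, usable as RFC seeds), upper approximation (possible ROI) and
# the boundary region in between.
import numpy as np

def rough_roi(membership, low=0.3, high=0.7):
    """membership: HxW map in [0, 1] (e.g. defect likelihood per pixel)."""
    lower = membership >= high          # certainly in the ROI -> seed pixels
    upper = membership >= low           # possibly in the ROI
    boundary = upper & ~lower           # left to region-competition classification
    return lower, upper, boundary
```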
Advances to the development of a basic Mexican sign-to-speech and text language translator
Author(s):
G. Garcia-Bautista;
F. Trujillo-Romero;
G. Diaz-Gonzalez
Show Abstract
Sign Language (SL) is the basic alternative communication method for deaf people. However, most hearing
people have trouble understanding SL, making communication with deaf people almost impossible and
excluding them from daily activities. In this work we present a basic automatic real-time sign language translator
capable of recognizing a basic vocabulary of Mexican Sign Language (MSL), consisting of 10 meaningful words, the letters (A-Z) and
the numbers (1-10), and translating them into speech and text. The signs were collected from a group of 35 MSL signers
and performed in front of a Microsoft Kinect™ sensor. The hand gesture recognition system uses the RGB-D camera to
build and store point clouds, color data and skeleton tracking information. In this work we propose a method to
obtain representative hand trajectory pattern information. We use a Euclidean segmentation method to obtain the
hand shape and the Hierarchical Centroid as the feature extraction method for images of numbers and letters. A pattern
recognition method based on a back-propagation Artificial Neural Network (ANN) is used to interpret the hand
gestures. Finally, we use the K-fold cross-validation method for the training and testing stages. Our results achieve an
accuracy of 95.71% on words, 98.57% on numbers and 79.71% on letters. In addition, an interactive user interface
was designed to present the results in voice and text format.
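A minimal sketch of the back-propagation ANN classification with K-fold cross-validation described above is given below, using scikit-learn in place of a custom network; the feature vectors, network size and fold count are illustrative assumptions.

```python
# Multi-layer perceptron (back-propagation ANN) evaluated with K-fold
# cross-validation over pre-extracted sign features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def evaluate_signs(features, labels, k=5):
    """features: (n_samples, n_features) trajectory / Hierarchical Centroid vectors."""
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    scores = cross_val_score(clf, features, labels, cv=k)
    return scores.mean(), scores.std()
```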
A face recognition algorithm based on thermal and visible data
Author(s):
Ilya Sochenkov;
Dmitrii Tihonkih;
Aleksandr Vokhmintcev;
Andrey Melnikov;
Artyom Makovetskii
Show Abstract
In this work we present an algorithm that fuses thermal infrared and visible imagery to identify persons. The proposed
face recognition method contains several components; in particular, rigid-body image registration. The rigid
registration is achieved by a modified variant of the iterative closest point (ICP) algorithm. We consider an affine
transformation in three-dimensional space that preserves the angles between lines. The matching algorithm is
inspired by recent results in the neurophysiology of vision. We also consider the error-metric minimization stage of ICP for
the case of an arbitrary affine transformation. Our face recognition algorithm also uses localized-contouring
algorithms to segment the subject's face, and thermal matching based on partial least squares discriminant analysis. Thermal-imagery
face recognition methods are advantageous when there is no control over illumination or for detecting disguised
faces. The proposed algorithm yields good matching accuracy for different person recognition scenarios (near
infrared, far infrared, thermal infrared, viewed sketch). The performance of the proposed face recognition algorithm in
real indoor environments is presented and discussed.
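For reference, the sketch below shows a basic point-to-point ICP loop for the rigid case; the affine extension and the thermal/visible fusion of the paper are not shown, and the iteration count is an illustrative assumption.

```python
# Classic rigid ICP: find closest-point correspondences, estimate the rigid
# transform by SVD (Kabsch), apply it, repeat.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping points P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cQ - R @ cP

def icp(source, target, iters=30):
    """Align a source point cloud (Nx3) to a target point cloud (Mx3)."""
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)        # closest-point correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
    return src
```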
Image processing and pattern recognition with CVIPtools MATLAB toolbox: automatic creation of masks for veterinary thermographic images
Author(s):
Deependra K. Mishra;
Scott E. Umbaugh;
Norsang Lama;
Rohini Dahal;
Dominic J. Marino;
Joseph Sackman
Show Abstract
CVIPtools is a software package for the exploration of computer vision and image processing developed in the Computer Vision and Image Processing Laboratory at Southern Illinois University Edwardsville. CVIPtools is available in three variants – a) the CVIPtools Graphical User Interface, b) the CVIPtools C library and c) the CVIPtools MATLAB toolbox – which makes it accessible to a variety of users. It offers students, faculty, researchers and any other user a free and easy way to explore computer vision and image processing techniques. Many functions have been implemented and are updated on a regular basis, and the library has reached a level of sophistication that makes it suitable for both educational and research purposes. In this paper, a detailed list of the functions available in the CVIPtools MATLAB toolbox is presented, along with how these functions can be used in image analysis and computer vision applications. The CVIPtools MATLAB toolbox allows the user to gain practical experience to better understand underlying theoretical problems in image processing and pattern recognition. As an example application, an algorithm for the automatic creation of masks for veterinary thermographic images is presented.
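As a generic illustration of automatic mask creation for a thermographic image, the sketch below uses Otsu thresholding followed by morphological cleanup; it does not use CVIPtools functions, and the structuring element sizes are illustrative assumptions rather than the authors' algorithm.

```python
# Threshold the thermal image to separate the subject from the cooler
# background, then clean the mask with morphological opening and closing.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_opening, binary_closing, disk

def thermal_mask(image):
    """image: 2D array of thermal intensities. Returns a boolean subject mask."""
    mask = image > threshold_otsu(image)      # subject is assumed warmer than background
    mask = binary_opening(mask, disk(3))      # remove small hot speckles
    mask = binary_closing(mask, disk(7))      # fill small gaps inside the subject
    return mask
```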