Transfer learning for early detection and classification of amblyopia
Author(s):
Marc Bosch;
Christopher M. Gifford;
David P. Harvie;
Gerhard W. Cibis;
Arvin Agah
Amblyopia, also known as lazy eye, affects 2-3% of children. If amblyopia is not treated successfully during early childhood, it will persist into adulthood. One of the causes of amblyopia is strabismus, which is a misalignment of the eyes. In this paper, we have investigated several neural network architectures as universal feature extractors for two tasks: (1) classification of eye images to detect strabismus, and (2) detecting the need for referral to a specialist. We have examined several state-of-the-art backbone architectures for feature extraction, as well as several classifier frameworks. Through these experiments, we observed that VGG19 and a random forest classifier offer the overall best performance for both classification tasks. We also observed that when top-performing architectures are fused together, even with simple rules such as a median filter, overall performance improves.
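As a rough illustration of the transfer-learning pipeline described above, the sketch below extracts fixed VGG19 features and feeds them to a random forest; the image size, dataset, and labels are placeholders, not the authors' data or exact configuration.

```python
# Sketch: pre-trained CNN as a fixed feature extractor + random forest classifier.
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_features(images):
    """images: float array of shape (N, 224, 224, 3) with values in [0, 255]."""
    backbone = VGG19(weights="imagenet", include_top=False, pooling="avg")
    return backbone.predict(preprocess_input(images), verbose=0)

# Hypothetical data: replace with real eye images and strabismus labels.
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(40, 224, 224, 3)).astype("float32")
labels = rng.integers(0, 2, size=40)

features = extract_features(images)          # (N, 512) VGG19 feature vectors
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, features, labels, cv=5).mean())
```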
Non-invasive real-time monitoring of skin flap in mouse model using laser speckle imaging modality
Author(s):
Jinhyuck Im;
Hyunseon Yu;
Jihoon Kim;
Byungjo Jung
Skin necrosis may occur due to impaired blood circulation after skin grafting. At present, it is not easy to confirm that blood circulates normally in the transplanted skin flap. A real-time laser speckle imaging modality (LSIM) developed in our laboratory was utilized to evaluate early skin necrosis. Two experiments were performed: 1) an in-vitro optical tissue phantom (OTP) experiment to quantitatively verify the feasibility of detecting blood flow variation; 2) an in-vivo animal experiment in which skin flap surgery was performed to induce skin necrosis in a mouse. In a comparison of laser speckle images (spatial and temporal speckle contrast analysis) and color images after skin flap surgery, the laser speckle images were more effective in evaluating skin necrosis than the color images. Among the laser speckle images, the temporal contrast image was more effective than the spatial contrast image in evaluating skin necrosis. In conclusion, the real-time LSIM may be useful for noninvasively evaluating blood flow variation in skin necrosis.
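For reference, both spatial and temporal speckle contrast are computed as the ratio of standard deviation to mean intensity, either over a local window within one frame or over the frame stack per pixel. A minimal numpy sketch with a hypothetical speckle stack follows.

```python
# Minimal sketch of spatial vs. temporal laser speckle contrast (K = sigma / mu).
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_contrast(frame, win=7):
    mean = uniform_filter(frame.astype(np.float64), win)
    mean_sq = uniform_filter(frame.astype(np.float64) ** 2, win)
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0))
    return std / (mean + 1e-12)

def temporal_contrast(stack):
    stack = stack.astype(np.float64)
    return stack.std(axis=0) / (stack.mean(axis=0) + 1e-12)

stack = np.random.rand(25, 256, 256)      # placeholder (frames, rows, cols) speckle data
Ks = spatial_contrast(stack[0])           # per-pixel spatial contrast of one frame
Kt = temporal_contrast(stack)             # per-pixel temporal contrast over 25 frames
```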
Projection based subcutaneous vein detection imaging modality and its feasibility evaluation in optical phantom and human: a preliminary study
Author(s):
Hyunseon Yu;
Jinhyuck Im;
Jihoon Park;
Byungjo Jung
In clinical diagnosis, subcutaneous vein detection may be a useful method to investigate the morphological information of skin and to perform intravenous injection. Although an enhanced image may provide more effective information than the bare eye, medical doctors may feel discomfort due to the visual offset between the images displayed on the monitor and the actual target. This study aimed to develop a projection-based real-time subcutaneous vein detection imaging modality (PSVDIM) and to evaluate its feasibility in an optical tissue phantom (OTP) and in humans. Projection-based technology allows users to intuitively know the size and location of the region of interest. The PSVDIM consists of a near-infrared (NIR) camera, eight NIR LEDs, and a laboratory-built program based on MATLAB. The images acquired with and without the PSVDIM were compared to evaluate its performance. With the PSVDIM, it was possible to find blood vessels that were not clearly distinguished by the bare eye.
Motion robust imaging photoplethysmography in defocus blurring
Author(s):
Yuheng Wu;
Lingqin Kong;
Fei Chen;
Yuejin Zhao;
Liquan Dong;
Ming Liu;
Mei Hui;
Xiaohua Liu;
Cuiling Li;
Weijie Wang
Non-contact imaging photoplethysmography (IPPG) uses video sequences to measure variations in light absorption, caused by blood volume pulsations, to extract cardiopulmonary parameters including heart rate (HR), pulse rate variability, and respiration rate. Most previous research focused on extracting these vital signs from in-focus video, which requires a static and well-focused environment. However, little has been reported about the influence of defocus blur on IPPG signal extraction. In this research, we established an IPPG optical model for defocused motion conditions. By analyzing the light intensity distribution in the defocused images, we found that the IPPG signal is not sensitive to defocus blur. In this paper, a real-time IPPG-based measurement of heart rate under defocus and motion conditions is proposed. The region of interest (ROI) is automatically selected and tracked by constructing facial coordinates from detected facial key points, and the IPPG signal is obtained from it. The signal is denoised and its spectrum obtained using wavelet filtering, a color-distortion filter (CDF), and the fast Fourier transform (FFT); the peak of the spectrum corresponds to the heart rate. Experimental results on a data set of 30 subjects show that the physiological parameters, including heart rate and pulse wave, derived from the defocused images captured by the IPPG system exhibit characteristics comparable to those of a conventional blood volume pulse (BVP) sensor. A comparison experiment shows that the difference between the results measured by the two methods is within 3 beats per minute (BPM). This technology has significant potential for advancing personal health care and telemedicine in motion scenarios.
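A minimal sketch of the final spectral step, assuming a pre-filtered IPPG trace and a hypothetical frame rate, is shown below; the wavelet and color-distortion filtering stages are omitted.

```python
# Heart rate from an IPPG trace via FFT peak picking in the physiological band.
import numpy as np

def heart_rate_bpm(signal, fps, band=(0.7, 4.0)):
    signal = signal - np.mean(signal)                 # remove DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])    # roughly 42-240 BPM
    peak_freq = freqs[mask][np.argmax(spectrum[mask])]
    return 60.0 * peak_freq

fps = 30.0                                            # assumed camera frame rate
t = np.arange(0, 20, 1 / fps)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)  # ~72 BPM test trace
print(heart_rate_bpm(signal, fps))
```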
Real-time and robust heart rate measurement for multi-people in motion using IPPG
Author(s):
Weijie Wang;
Lingqin Kong;
Yuejin Zhao;
Baoling Han;
Ming Liu;
Liquan Dong;
Mei Hui;
Yuheng Wu;
Qifan Deng
In the field of biomedical monitoring, imaging photoplethysmography (IPPG) enables contactless monitoring of resting heart rate (HR). However, while people are in motion, such as rotating the head, walking back and forth, or jogging in place, the measurement accuracy of HR is susceptible to motion-induced signal distortion. In addition, in multi-person scenes, accurately distinguishing each person's signal in real time is a critical issue. In this paper, a robust and real-time HR measurement system for multiple people using the Open Source Computer Vision Library (OpenCV) is proposed, which mainly consists of five parts: face detection with feature point acquisition; a novel, fast yet simple face tracker; adaptive region of interest (ROI) extraction to increase motion tolerance; signal processing for pulse extraction; and HR calculation via the fast Fourier transform (FFT) under a double-thread framework. Using Bland-Altman plots and Pearson's correlation coefficient (CC), the HR estimated from videos recorded by a color CCD camera is compared to a finger blood volume pulse (BVP) sensor to analyze agreement. The experimental results on 28 subjects show that the maximum average absolute error of HR estimation is less than 5 beats per minute (BPM), and that the CC is 0.910. In our case, the frame rate is 25 frames per second (FPS) for simultaneous measurement of 7 subjects at a resolution of 1024×768 pixels. Overall, our HR measurement system for multiple people meets the requirements of accuracy, motion robustness, and real-time performance, and extends the application range of IPPG technology.
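The agreement analysis mentioned above (Pearson's CC plus Bland-Altman bias and limits of agreement) can be reproduced with a few lines of numpy/scipy; the heart-rate arrays below are hypothetical values, not the paper's data.

```python
# Pearson correlation and Bland-Altman statistics between IPPG and reference HR.
import numpy as np
from scipy.stats import pearsonr

hr_ippg = np.array([72.0, 88.5, 65.2, 91.0, 78.3])   # hypothetical IPPG estimates (BPM)
hr_bvp = np.array([73.1, 87.0, 66.0, 93.5, 77.8])    # hypothetical BVP sensor readings (BPM)

cc, _ = pearsonr(hr_ippg, hr_bvp)
diff = hr_ippg - hr_bvp
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                        # Bland-Altman limits of agreement
print(f"CC={cc:.3f}, bias={bias:.2f} BPM, limits of agreement=+/-{loa:.2f} BPM")
```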
Improved methodology for tumor detection in mammogram images
Author(s):
Luis Cadena
Breast cancer is a serious and increasingly common disease that affects thousands of women around the world each year. Early detection is essential and critical for effective treatment and patient recovery. This work describes how to extract features from the mammogram image to find the affected area, which is a crucial step in breast cancer detection and verification. We identify the affected area by extracting the region containing tumor cells directly from the grayscale mammogram image. To remove noise from the mammogram image, this work presents a simple and efficient technique using a fast average filter to determine the pixel values of the noise-free image. For contour detection, the shearlet transform and classic filters such as Sobel and Prewitt are used. The quality of the contours is evaluated using the SSIM measure. Our experimental results demonstrate that the approach reduces noise in less time and, using the shearlet transform, selects the affected area with high efficiency.
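As an illustration of two of the building blocks named above, the sketch below implements a fast (integral-image) average filter and evaluates a result with SSIM from scikit-image; the filter size and test data are arbitrary assumptions.

```python
# Fast box filter via integral image, plus SSIM comparison of two images.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def fast_average_filter(img, k=3):
    """Box filter of size k x k computed with cumulative sums (O(1) per pixel)."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))                    # zero-padded integral image
    h, w = img.shape
    s = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
         - ii[k:k + h, :w] + ii[:h, :w])
    return s / (k * k)

noisy = np.random.rand(128, 128)                         # placeholder image
smoothed = fast_average_filter(noisy, k=5)
print(ssim(noisy, smoothed, data_range=1.0))
```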
Evaluating resolution in live cell structured illumination microscopy
Author(s):
Jakub Pospíšil;
Karel Fliegel;
Jan Švihlík;
Miloš Klíma
In the last decade, several different structured illumination microscopy (SIM) approaches have been developed. Precise determination of the effective spatial resolution in a live cell SIM reconstructed image is essential for reliable interpretation of reconstruction results. The theoretical resolution improvement can be calculated for every SIM method. In practice, the final spatial resolution of the cell structures in the reconstructed image is limited by many different factors. Therefore, assessing the resolution directly from a single image is an inherent part of live cell imaging. There are several commonly used resolution measurement techniques based on image analysis, including the full-width at half maximum (FWHM) criterion and Fourier ring correlation (FRC). FWHM measurement requires fluorescence beads or a sharp edge/line in the observed image to determine the point spread function (PSF). The FRC method requires two stochastically independent images of the same observed sample. Based on our experimental findings, the FRC method does not seem to be well suited for measuring the resolution of SIM live cell video sequences. Here we present a method based on Fourier transform analysis using the power spectral density (PSD). In order to estimate the cut-off frequency from a noisy signal, we use PSD estimation based on Welch's method, which is widely used in non-parametric power spectrum analysis. Since the PSD-based metric can be computed from a single SIM image (one video frame), without any prior knowledge of the acquisition system, it can become a fundamental tool for imaging in live cell biology.
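A minimal sketch of the PSD-based idea, assuming a single 2D SIM frame, a hypothetical pixel size, and a simple noise-floor threshold rule (not the paper's exact criterion), is given below.

```python
# Welch PSD of an image (averaged over rows) and a crude cut-off frequency estimate.
import numpy as np
from scipy.signal import welch

def cutoff_frequency(image, pixel_size_nm=65.0):
    rows = image - image.mean()
    freqs, psd = welch(rows, fs=1.0 / pixel_size_nm, axis=1, nperseg=128)
    psd = psd.mean(axis=0)                               # average PSD over image rows
    noise_floor = np.median(psd[freqs > 0.8 * freqs.max()])
    above = np.where(psd > 2.0 * noise_floor)[0]         # illustrative threshold rule
    return freqs[above[-1]] if above.size else freqs[-1]

frame = np.random.rand(512, 512)                         # placeholder SIM frame
fc = cutoff_frequency(frame)
print(f"estimated cut-off frequency: {fc:.4f} cycles/nm")
```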
Comparative analysis of smoothing filters in confocal microscopy images
Author(s):
Manuel G. Forero;
Reynel Peña-Ambrosio;
Jaime Sánchez-Tarquino;
Camilo Restrepo-Taborda;
Diana Rojas-Rodríguez
Confocal microscopy is a widely used tool in the biomedical area, allowing 3D images to be obtained with high spatial resolution. Despite having advantages over conventional microscopy, the analysis of cellular images acquired with confocal microscopy is a complicated process due to the very low S/N, so the use of filters is necessary to reduce noise. However, this step normally affects the quality of the edges, making them more diffuse. In addition, images acquired in confocal microscopy are affected by distortions introduced by the lenses and the acquisition system. Therefore, it is possible to improve edge definition by eliminating these distortions, which is done by means of deconvolution methods such as the Wiener filter. Furthermore, in recent years a new generation of smoothing filters has been developed that seeks to reduce Gaussian-type noise without losing edges. However, no study has been carried out to determine whether these filters can be used to remove noise in confocal microscopy images, which are contaminated with Poisson noise. Therefore, in this work we present a comparative study of ten filters for noise removal in confocal microscopy: median, anisotropic diffusion, bilateral, propagated, improved propagated, Rudin-Osher-Fatemi (ROF), TVL1, non-local means, K-SVD, and wavelet 'à trous' and Haar filters, with and without preprocessing the images with the Wiener filter, taking noise reduction and edge preservation as criteria.
Classification of ground objects from remote sensing image with close spectral curves based on modified density mixture model
Author(s):
Xinyu Zhou;
Ye Zhang
With the rapid development of remote sensing technology, the spatial and spectral resolution of hyperspectral images has become much higher than before. Spectral differences are commonly used to distinguish objects that are difficult to classify, especially those sharing the same color or texture. However, spectral features are not as unique as one might think. In many cases, spectral curves of the same material may differ and, conversely, those of different materials may coincide. Under these conditions, the false alarm and missed alarm probabilities are high. To solve this problem, a modified density mixture model is proposed. Firstly, each band of the data is whitened to remove the correlation between pixels in order to reduce redundancy. Secondly, the whitened result is handled by a weighted multivariate normal distribution model. Then, several pixels of each kind of object are taken to build a spectral library. Finally, Spectral Angle Mapping (SAM) is applied for classification by matching against the spectral library. The results demonstrate that objects are classified precisely with low false alarm and missed alarm probabilities, since the spectral difference within the same kind of object decreases and that between different kinds of objects increases compared with the data before processing.
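For reference, the spectral angle between a pixel spectrum and a library spectrum is the arccosine of their normalized dot product; a small numpy sketch with hypothetical whitened data follows.

```python
# Spectral Angle Mapping: assign each pixel the class with the smallest angle.
import numpy as np

def spectral_angle(pixels, library):
    """pixels: (N, B), library: (C, B). Returns angles of shape (N, C) in radians."""
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    q = library / np.linalg.norm(library, axis=1, keepdims=True)
    cosines = np.clip(p @ q.T, -1.0, 1.0)
    return np.arccos(cosines)

cube = np.random.rand(100 * 100, 200)        # hypothetical 100x100 pixels, 200 bands
library = np.random.rand(5, 200)             # hypothetical library of 5 reference spectra
labels = np.argmin(spectral_angle(cube, library), axis=1).reshape(100, 100)
```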
A jigger ship's automatic detection method based on VIIRS DNB data
Author(s):
C. X. Gao;
Chengcheng Xue;
Shi Qiu;
Qi Wang;
Jian Hu;
Chuanrong Li;
Yaokai Liu;
Yonggang Qian
The Visible Infrared Imaging Radiometer Suite (VIIRS) day/night band (DNB) onboard the Suomi National Polar-orbiting Partnership (NPP) satellite offers a wide range of applications at night, ranging from fire detection and meteorological phenomena to observations of anthropogenic light sources. It is becoming a useful tool to monitor and quantify fishing vessels such as jigger ships by detecting the light emitted by their lamps. In this study, a threshold-based method is presented to automatically identify these ships. Before detection, several pre-processing steps including contrast enhancement and instrument noise removal are conducted; then the background value is subtracted from the original image to reduce the blurred area around the target, which further helps separate clustered ships. In addition, the effects of some interference sources, such as ionospheric energetic particles and thin clouds, are also taken into consideration to improve the detection rate. Finally, the proposed threshold-based method is applied to DNB images over study areas in the Yellow Sea and the Bohai Sea in China. The detection results show that the proposed method can detect more than 81% of ships when compared with records from the Automatic Identification System (AIS).
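A much-simplified sketch of the background-subtraction-plus-threshold idea is shown below; the median-filter background and the k-sigma threshold are illustrative assumptions rather than the paper's calibrated procedure.

```python
# Background subtraction and thresholding of a night-time radiance image.
import numpy as np
from scipy.ndimage import median_filter, label

def detect_ships(dnb, bg_size=15, k=5.0):
    dnb = dnb.astype(np.float64)
    background = median_filter(dnb, size=bg_size)
    residual = dnb - background
    threshold = residual.mean() + k * residual.std()
    mask = residual > threshold
    labeled, num = label(mask)               # connected bright spots = ship candidates
    return labeled, num

dnb = np.random.gamma(2.0, 1.0, size=(500, 500))   # placeholder DNB scene
_, num_candidates = detect_ships(dnb)
print(num_candidates, "candidate detections")
```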
Exploiting camera rolling shutter to detect high frequency signals
Author(s):
Lena Franklin;
David Huber
Rolling shutter-based image sensors generate images by sequentially exposing and reading out individual pixel rows. While this often results in unwanted image distortion for scenes with motion, we propose methods to exploit the temporal behavior of the rolling shutter to detect periodic changes. Temporal information can be extracted from single rolling shutter frames, but one is then limited by the number of pixel rows that the source extends over in the frame. However, with several rolling shutter frames we can extract very high frequencies, without aliasing, that are far above the nominal Nyquist limit established for global shutter cameras with the same frame rate. Applying the Lomb-Scargle periodogram permits a frequency analysis of sources that extend only a few pixel rows in the image.
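The sketch below illustrates the principle with synthetic data: rolling-shutter rows provide unevenly spaced samples of a flickering source, and scipy's Lomb-Scargle periodogram recovers a frequency far above the frame-rate Nyquist limit. All camera timing values are assumptions.

```python
# Lomb-Scargle analysis of samples taken at rolling-shutter row times.
import numpy as np
from scipy.signal import lombscargle

frame_period = 1 / 30.0          # 30 fps camera (assumed)
row_time = 10e-6                 # 10 us per row readout (assumed)
rows_per_frame = 500             # rows covered by the flickering source (assumed)
source_hz = 1000.0               # flicker far above the 15 Hz frame-rate Nyquist limit

frames = np.arange(60)
rows = np.arange(rows_per_frame)
t = (frames[:, None] * frame_period + rows[None, :] * row_time).ravel()
signal = 1.0 + 0.5 * np.sin(2 * np.pi * source_hz * t) + 0.05 * np.random.randn(t.size)

freqs_hz = np.arange(100.0, 2000.0, 0.2)
power = lombscargle(t, signal - signal.mean(), 2 * np.pi * freqs_hz)
print("strongest frequency:", freqs_hz[np.argmax(power)], "Hz")
```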
Identification of breakwater damage by processing video with the SURF algorithm
Author(s):
Roberto Herrera-Charles;
Miguel A. Vergara;
Carlos A. Hernandez;
Erick G. Morales
This article shows the application of the SURF algorithm for the detection of points of interest in video images, monitored in real time, of the concrete units that form a breakwater. This monitoring and image analysis procedure allows determining the displacement suffered by the elements or shells of the breakwater and, consequently, the damage to the submerged breakwater. This technique is applied in modeling studies in hydraulic coastal laboratories. Damage can be quantified as a percentage of the total number of armor units on the slope of the breakwater per unit of area covered by the video camera, with digital image processing based on the SURF algorithm determining the movements of the armor elements properly and efficiently.
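A minimal OpenCV sketch of SURF detection, ratio-test matching, and per-keypoint displacement between two monitoring frames is given below; SURF requires the opencv-contrib build, and the file names are placeholders.

```python
# SURF keypoint matching between two frames to estimate armor-unit displacement.
import cv2

img_before = cv2.imread("breakwater_t0.png", cv2.IMREAD_GRAYSCALE)
img_after = cv2.imread("breakwater_t1.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(img_before, None)
kp2, des2 = surf.detectAndCompute(img_after, None)

# Match descriptors and keep good matches via Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.7 * n.distance]

# Displacement of each matched point between the two frames (pixels).
displacements = [
    (kp2[m.trainIdx].pt[0] - kp1[m.queryIdx].pt[0],
     kp2[m.trainIdx].pt[1] - kp1[m.queryIdx].pt[1])
    for m in good
]
print(len(good), "matches; example displacement:", displacements[:1])
```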
Parallelization and multi-threaded latency constrained parallel coding of JPEG XS
Author(s):
Thomas Richter;
Joachim Keinert;
Siegfried Fößel
This paper discusses the background, challenges, and opportunities for parallelization of low-latency video transport of JPEG XS streams over the Real-time Transport Protocol (RTP). Transport of compressed video signals over a constant bitrate channel requires smoothing buffers at the encoder and decoder side. To limit their sizes in practical implementations, Part 2 of the JPEG XS standard defines a normative buffer model along with a sequential hypothetical reference decoder. Due to its sequential nature, it cannot be directly applied to high-speed applications that necessarily depend on multithreading. In this paper, the JPEG XS Part 2 buffer model is introduced, then the current state of JPEG XS RTP transport standardization is discussed, and multiple strategies for multithreaded encoding under latency constraints within this application are reported and analyzed.
JPEG XL next-generation image compression architecture and coding tools
Author(s):
Jyrki Alakuijala;
Ruud van Asseldonk;
Sami Boukortt;
Martin Bruse;
Iulia-Maria Comșa;
Moritz Firsching;
Thomas Fischbacher;
Evgenii Kliuchnikov;
Sebastian Gomez;
Robert Obryk;
Krzysztof Potempa;
Alexander Rhatushnyak;
Jon Sneyers;
Zoltan Szabadka;
Lode Vandevenne;
Luca Versari;
Jan Wassenberg
An update on the JPEG XL standardization effort: JPEG XL is a practical approach focused on scalable web distribution and efficient compression of high-quality images. It will provide various benefits compared to existing image formats: significantly smaller size at equivalent subjective quality; fast, parallelizable decoding and encoding configurations; features such as progressive, lossless, animation, and reversible transcoding of existing JPEG; support for high-quality applications including wide gamut, higher resolution/bit depth/dynamic range, and visually lossless coding. Additionally, a royalty-free baseline is an important goal. The JPEG XL architecture is traditional block-transform coding with upgrades to each component. We describe these components and analyze decoded image quality.
Rust AV1 encoder (rav1e) project
Author(s):
Luca Barbato;
David M. Barr;
Ivan Molodetskikh;
Christopher Montgomery;
Shreevari S. P.;
Raphaël A. Zumer;
Nathan E. Egge
Last year, the Alliance for Open Media released its next-generation video codec, AV1. It achieves better compression than proprietary competitors, while its patents can be licensed via a royalty-free, open-source friendly license. With a broad array of industry support, including all major browser vendors, many hardware partners, and internet streaming and video conferencing providers, we think this is our best chance yet to create a successful video codec that achieves wide deployment. However, publishing the AV1 standard is just the first step towards broad adoption. Now that the bitstream is finalized, research has shifted to encoder algorithms for use in real-world, production environments. The Rust AV1 Encoder (rav1e) project at Mozilla is a clean-room AV1 implementation targeting a variety of operating points. Where the standardization effort focused on tools that improved objective metrics, the rav1e project is focused on algorithms that improve perceived video quality.
A new end-to-end image compression system based on convolutional neural networks
Author(s):
Pinar Akyazi;
Touradj Ebrahimi
In this paper, two new end-to-end image compression architectures based on convolutional neural networks are presented. The proposed networks employ 2D wavelet decomposition as a preprocessing step before training and extract features for compression from wavelet coefficients. Training is performed end-to-end and multiple models operating at different rate points are generated by using a regularizer in the loss function. Results show that the proposed methods outperform JPEG compression, reduce blocking and blurring artifacts, and preserve more details in the images especially at low bitrates.
Assessment of quality of JPEG XL proposals based on subjective methodologies and objective metrics
Author(s):
Pinar Akyazi;
Touradj Ebrahimi
The Joint Photographic Experts Group (JPEG) is currently in the process of standardizing JPEG XL, the next-generation image coding standard that offers substantially better compression efficiency than existing image formats. In this paper, the quality assessment framework for proposals submitted to the JPEG XL Call for Proposals is presented in detail. The proponents were evaluated using objective metrics and subjective quality experiments in three different laboratories, on a dataset constructed for JPEG XL quality assessment. Subjective results were analyzed using statistical significance tests and presented with correlation measures between the results obtained from the different labs. Results indicate that a number of proponents superseded the JPEG standard and performed at least as well as the state-of-the-art anchors in terms of both subjective and objective quality on SDR and HDR contents, at various bitrates.
Perceptual quantization matrices for high dynamic range H.265/MPEG-HEVC video coding
Author(s):
Dan Grois;
Alex Giladi
In this work, perceptual quantization matrices for high-resolution High Dynamic Range (HDR) video coding have been developed for optimizing perceived visual quality and reducing video transmission bit-rate, with special emphasis on the Ultra High Definition (UltraHD) resolution and the H.265/MPEG-HEVC video coding standard. According to the proposed coding scheme, perceptual quantization matrices are first calculated according to Human Visual System (HVS) characteristics and predefined viewing conditions, and then utilized during the encoding loop to remove non-perceptible visual information. The above-mentioned predefined conditions include, for example, a target HDR display resolution and a variety of display characteristics, such as the distance between the user and the display, luminance levels, and many others. According to the detailed experimental results presented in this work, the visual quality of the UltraHD HDR video sequences is significantly improved, for substantially the same bit-rate, in terms of both the SSIMPlus and PSNR objective quality metrics. Alternatively, the video transmission bit-rate is reduced by up to 11.3% and 2.4%, respectively, while keeping the visual quality at substantially the same level.
Content-adaptive frame level rate control for video encoding using a perceptual video quality measure
Author(s):
Tamar Shoham;
Dror Gill;
Sharon Carmel;
Nikolay Terterov;
Pavel Tiktov
Reaching an optimal trade-off between maximal perceptual quality of the reconstructed video and minimal bitrate of the compressed video stream, under a maximum bitrate constraint and for a wide variety of content, is a significant challenge, and one that has major cost and user-experience implications for video content providers and consumers alike. This challenge is often addressed with content-adaptive encoding, which generally strives to reach the optimal bit-rate per content at the clip or scene level. Our solution, presented herein, goes a step further and performs encoder adaptation at the frame level. In this paper we describe our closed-loop, optimized video encoder, which encodes to the lowest bitrate that still preserves the perceptual quality of an encode at the target bitrate. The optimization is performed on a frame-by-frame basis, guaranteeing the visual quality of the video in a manner that minimizes additional encoder complexity, thus making the solution applicable for live or real-time encoding. We also describe our subjectively tuned, low-complexity, perceptual video quality metric, which is the engine driving this solution.
Overnight large-scale subjective video quality assessment using automatic test generation and crowdsourcing Internet marketplaces
Author(s):
Tamar Shoham;
Dror Gill;
Sharon Carmel;
Dan Julius
Subjective quality feedback from actual human viewers is crucial for reliable evaluation of various solutions and configurations for video processing or encoding. However, it is generally a time-consuming and expensive process. Therefore, in many cases evaluation of video quality is done using objective measures, which may be poorly correlated with actual subjective results. In order to address this issue, we have developed VISTA (VIsual Subjective Testing Application), an easy-to-use application for visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two video sequences. In addition, we developed a system for automating the evaluation process called Auto-VISTA. The system receives as input guidelines for the required testing session, prepares the content to be compared, launches the app in a crowdsourcing Internet marketplace (such as Amazon Mechanical Turk), and performs collection and analysis of the results. Thus, obtaining large-scale subjective feedback becomes cheap and accessible, which in turn allows for fast and reliable evaluation cycles of different video encoding and processing solutions, or tuning various configurations and settings for a given solution.
A NR-IQA based deep neural network for tone mapping HDR images
Author(s):
Minseok Choi;
Pilkyu Park;
Kwang Pyo Choi;
Tejas Nair
The most recent High Dynamic Range (HDR) standard, HDR10+, achieves good picture quality by incorporating dynamic metadata that carry frame-by-frame information for tone mapping, whereas most HDR standards use static tone mapping curves that apply across the entire video. Since it is laborious to acquire a hand-crafted, best-fitting tone mapping curve for each frame, there have been attempts to derive the curves from input images. This paper proposes a neural network framework that generates tone mapping on a frame-by-frame basis. Although a number of successful tone mapping operators (TMOs) have been proposed over the years, evaluation of tone mapped images still remains a challenging topic. We define an objective measure to evaluate tone mapping based on No-Reference Image Quality Assessment (NR-IQA). Experiments show that the framework produces good tone mapping curves and makes the video more vivid and colorful.
Deep learning and video quality analysis
Author(s):
P. Topiwala;
M. Krishnan;
W. Dai
For more than 30 years, the video coding industry has been using mean-squared-error-based PSNR as a measure of video quality, despite evidence of its inadequacy. Moreover, in the encoder, SAD is used instead of MSE to save multiplications. We quantify how these measures are inadequately correlated to subjective scores and obtain new measures that correlate much better. We focus on the problem of full-reference assessment of video degraded only by coding and scaling errors, such as experienced by streaming services, and put aside issues of transmission, such as timing jitters, rebufferings, etc. We begin with the Video Multimethod Assessment Fusion (VMAF) algorithm introduced by Netflix. Results with up to 97% correlation accuracy to subjective scores are reported on two Netflix datasets, using a neural network model.
Flicker reduction method for 120 fps shooting under 100 Hz light fluctuation by using a double rolling shutter
Author(s):
Kohei Tomioka;
Toshio Yasue;
Ryohei Funatsu;
Kodai Kikuchi;
Kazuya Kitamura;
Yuichi Kusakabe;
Tomoki Matsubara
This study proposes a flicker reduction method for 120 fps shooting under 100 Hz light fluctuation. In 120 fps videos, a 100 Hz light fluctuation causes a 20 Hz flicker, which is an aliasing artifact induced by the sampling frequency. In this method, the frame period of 1/120 s is divided into 1/150 s and 1/600 s exposures by using a double rolling shutter. Each pixel alternately outputs 1/150 s and 1/600 s exposure signals, which are read out by a readout circuit operated at double the rate of a normal 120 fps operation. A 120 fps signal with an exposure time of 1/100 s is obtained by summing three consecutive signals with exposures of 1/600, 1/150, and 1/600 s. This method is effective for flicker reduction even in the presence of several light sources with different amplitudes and phases. We implemented this method in an 8K camera and examined the flicker reduction effect for an 8K 120 fps video. As a result, the 20 Hz flicker was suppressed to less than one-tenth.
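The exposure arithmetic behind the method can be verified directly: the three sub-exposures sum to one full period of the 100 Hz fluctuation, which is why the flicker component integrates out.

```latex
\frac{1}{600} + \frac{1}{150} + \frac{1}{600}
  = \frac{1 + 4 + 1}{600}
  = \frac{6}{600}
  = \frac{1}{100}\,\mathrm{s}
```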
Blur and noisy image restoration for near real time applications
Author(s):
. Gyandendra;
Rahul Kumar;
Brajesh Kumar Kaushik;
R. Balasubramanian
Image restoration of blurred and noisy images can be performed in either of two ways, i.e., denoising after deblurring or deblurring after denoising. When deblurring after denoising, the residual noise is greatly amplified by the subsequent deblurring process. In the case of denoising after deblurring, the denoising stage severely blurs the image and leads to inadequate restoration. Denoising can be done mainly in two ways, namely linear filtering and non-linear filtering. The former is fast and easy to implement; however, it produces serious image blurring. Nonlinear filters can efficiently overcome this limitation and result in highly improved filtering performance, but at the cost of high computational complexity. Few filtering algorithms have been proposed for performing image denoising and deblurring simultaneously. This paper presents a novel algorithm for the restoration of blurred and noisy images for near real time applications. The proposed algorithm is based on PSF (Point Spread Function) estimation and Wiener filtering. The Wiener filter removes the additive noise and inverts the blurring simultaneously, and thus performs an optimal trade-off between inverse filtering and noise suppression. The Wiener filtering minimizes the overall mean square error in the process of noise suppression. The PSF used for Wiener filtering is estimated using blind deconvolution. This is a noniterative process and provides faster results.
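A brief sketch of Wiener restoration with scikit-image is shown below; a parametric Gaussian PSF stands in for the blind-deconvolution PSF estimate described above, and the blur and noise levels are illustrative.

```python
# Joint deblurring and denoising with Wiener deconvolution (scikit-image).
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import data, img_as_float
from skimage.restoration import wiener

def gaussian_psf(size=15, sigma=2.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return psf / psf.sum()

image = img_as_float(data.camera())
degraded = gaussian_filter(image, sigma=2.0) + 0.01 * np.random.randn(*image.shape)

psf = gaussian_psf(sigma=2.0)                    # stands in for the estimated PSF
restored = wiener(degraded, psf, balance=0.01)   # trade-off between inversion and denoising
```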
Investigation of moving objects through atmospheric turbulence from a non-stationary platform
Author(s):
Nicholas Ferrante;
Jérôme Gilles;
Shibin Parameswaran
In this work, we extract the optical flow field corresponding to moving objects from an image sequence of a scene impacted by atmospheric turbulence and captured from a moving camera. Our procedure first computes the optical flow field and creates a motion model to compensate for the flow field induced by camera motion. After subtracting the motion model from the optical flow, we proceed with our previous work, Gilles et al.,1 where a spatial-temporal cartoon+texture inspired decomposition is performed on the motion-compensated flow field in order to separate flows corresponding to atmospheric turbulence and object motion. Finally, the geometric component is processed with the detection and tracking method and compared against a ground truth. All of the sequences and code used in this work are open source and are available by contacting the authors.
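A simplified sketch of the first stage (dense optical flow plus a global median motion model for camera-motion compensation) is given below; the frame files are placeholders and the cartoon+texture decomposition is not shown.

```python
# Dense optical flow with a simple global camera-motion model subtracted.
import numpy as np
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Global camera-motion model: per-component median of the flow field.
camera_motion = np.median(flow.reshape(-1, 2), axis=0)
residual_flow = flow - camera_motion          # turbulence + object motion remain

magnitude = np.linalg.norm(residual_flow, axis=2)
print("max residual flow magnitude:", magnitude.max(), "pixels")
```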
Propagation of quantization error in performing intra-prediction with deep learning
Author(s):
Raz Birman;
Yoram Segal;
Avishay David-Malka;
Ofer Hadar;
Ron Shmueli
Standard video compression algorithms use multiple “Modes”, which are various linear combinations of pixels for prediction of their neighbors within image Macro-Blocks (MBs). In this research, we use Deep Neural Networks (DNN) with supervised learning to predict block pixels. Using DNNs and employing intra-block pixel value calculations that penetrate into the block, we obtain improved predictions that yield up to 200% reduction of residual block errors. However, using intra-block pixels for prediction brings up interesting tradeoffs between prediction errors and quantization errors. We explore and explain these tradeoffs for two different DNN types. We further discovered that it is possible to achieve a larger dynamic range of the quantization parameter (Qp) and thus reach lower bit-rates than standard modes, which already saturate at these Qp levels. We explore this phenomenon and explain its reasoning.
MPEG-5: essential video coding standard
Author(s):
K. Choi;
M. W. Park;
K. P. Choi;
J. Park;
J. Chen;
Y.-K. Wang;
R. Chernyak;
S. Ikonin;
D. Rusanovskyy;
W.-J. Chien;
V. Seregin;
M. Karczewicz
The MPEG-5 Essential Video Coding (EVC) standard is currently being prepared as a video coding standard of the ISO/IEC Moving Picture Experts Group. The main goal of the EVC standard development is to provide a significantly improved compression capability over existing video coding standards, together with the timely publication of licensing terms. This paper provides an overview of the features and characteristics of the MPEG-5 EVC standard.
Hadamard transform domain filter for video coding
Author(s):
S. Ikonin;
V. Stepin;
R. Chernyak;
J. Chen
This paper proposes a filter for video coding in the Hadamard transform domain. The filter is applied to decoded samples at the block level directly after reconstruction. The filter parameters are derived from the coded information depending on the quantization parameter (QP), avoiding additional signaling overhead. The filter is designed in a hardware-friendly manner, aiming to exclude multiplication and division operations by using a look-up table (LUT). The total LUT size is 70 bytes, of which only 16 bytes are required for filtering one block at a given QP. That allows the required look-up table to be kept in one 128-bit register, which is beneficial for SIMD software implementations. The method was tested on top of VTM-2.0 according to the JVET common test conditions. The experimental results demonstrated a 0.7% bitrate reduction with a 3% encoding time and 1% decoding time increase.
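The sketch below illustrates the general idea of filtering in the Hadamard transform domain (forward 4-point Hadamard on 2x2 sample groups, attenuation of small coefficients with a QP-dependent threshold, inverse transform); the floating-point threshold replaces the proposal's integer LUT design and its mapping is only an assumption.

```python
# Illustrative Hadamard-transform-domain filtering of a reconstructed block.
import numpy as np
from scipy.linalg import hadamard

H4 = hadamard(4) / 2.0                      # orthonormal 4-point Hadamard matrix

def filter_block(rec, qp):
    threshold = 0.5 * 2 ** ((qp - 20) / 6.0)      # assumed QP-to-threshold mapping
    out = rec.astype(np.float64)
    for i in range(0, rec.shape[0] - 1, 2):
        for j in range(0, rec.shape[1] - 1, 2):
            group = out[i:i + 2, j:j + 2].reshape(4)
            coeffs = H4 @ group
            # Keep DC, zero out small (noise-like) AC coefficients.
            coeffs[1:] = np.where(np.abs(coeffs[1:]) < threshold, 0.0, coeffs[1:])
            out[i:i + 2, j:j + 2] = (H4.T @ coeffs).reshape(2, 2)
    return out

rec_block = np.random.randint(0, 256, size=(8, 8))   # placeholder reconstructed samples
filtered = filter_block(rec_block, qp=32)
```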
Noise suppression filter for video coding
Author(s):
R. Chernyak;
V. Stepin;
S. Ikonin;
J. Chen
In this paper we propose a non-local in-loop filter for video coding called the Noise Suppression Filter (NSF). The filter is based on a block matching procedure and performs video signal filtering in a transform domain. Filter parameters are derived from the reconstructed signal and do not require additional signaling. NSF can be applied to both luma and chroma components, or to the luma component only to decrease complexity. Experimental results show that, for the random access configuration, applying NSF to the luma and chroma components provides 1.0%, 2.0% and 1.9% BD-rate savings for the Y, Cb and Cr components, respectively, with 100% encoding time and 134% decoding time compared to VTM 1.0. For luma only, NSF demonstrates a 1.0% Y BD-rate saving with 100% encoding time and 123% decoding time.
Data Adaptive HDR Compression in VVC
Author(s):
Pankaj Topiwala;
Madhu Krishnan;
Wei Dai
This paper presents an advanced approach to HDR/WCG video coding developed at FastVDO called FVHDR, and built on top of the Versatile Video Coding (VVC) VTM-5.0 test model of the Joint Video Exploration Team, a joint committee of ITU|ISO/IEC. A fully automatic adaptive video process that differs from a known HDR video processing chain (analogous to HDR10, and herein called “anchor”), is used. FVHDR works entirely within the framework of the VVC software model but adds additional tools. These tools can become an integral part of a future video coding standard or be extracted as additional pre- and post-processing chains. Reconstructed video sequences using FVHDR show an improved subjective visual quality to the output of the anchor. Moreover, the resultant SDR content generated by the data adaptive grading process is backward compatible.
Transform skip residual coding for the versatile video coding standard
Author(s):
Benjamin Bross;
Tung Nguyen;
Heiko Schwarz;
Detlev Marpe;
Thomas Wiegand
The development of the emerging Versatile Video Coding (VVC) standard was motivated by the need for significant bit-rate reductions for natural video content as well as content for different applications, such as computer-generated screen content. The signal characteristics of screen content video are different from those of natural content. These include sharp edges as well as flat areas of the same color. In block-based hybrid video coding designs, as employed in VVC and its predecessor standards, skipping the transform stage of the prediction residual for screen content signals can be beneficial due to the different residual signal characteristics. In this paper, a modified transform coefficient level coding tailored for transform skip residual signals is presented. This includes no signaling of the last significant position, a coded block flag for every subblock, modified context modeling and binarization, as well as a limit on the number of context coded bins per sample. Experimental results show bit-rate savings of up to 3.45% and 9.55% for two different classes of screen content test sequences coded in a random access configuration.
Performance comparison of VVC, AV1 and EVC
Author(s):
Pankaj Topiwala;
Madhu Krishnan;
Wei Dai
This paper presents a study comparing the coding efficiency performance of three video codecs: (a) Versatile Video Coding (VVC); (b) the AV1 codec of the Alliance for Open Media (AOM); and (c) MPEG-5 Essential Video Coding (EVC). Two approaches to coding were used: (i) constant quality (QP) for VVC, AV1, and EVC; and (ii) target bit rate (VBR) for AV1. Constant quality encoding is performed with all three codecs for an unbiased comparison of the core coding tools, whereas target bitrate coding is done with the AV1 codec to study the compression efficiency achieved with rate control, which can and does have a significant impact. Performance is tabulated on two fronts: (1) objective performance based on PSNR, and (2) informal subjective assessment. Our general conclusion derived from the assessment of objective metrics and subjective evaluation is that VVC appears to be superior to AV1 and EVC under both constant quality and target bitrate coding. However, relative to currently popular codecs such as AVC and HEVC, that difference is modest.
Intra prediction using multiple reference lines for the versatile video coding standard
Author(s):
Yao-Jen Chang;
Hong-Jheng Jhu;
Hui-Yu Jian;
Liang Zhao;
Xin Zhao;
Xiang Li;
Shan Liu;
Benjamin Bross;
Paul Keydel;
Heiko Schwarz;
Detlev Marpe;
Thomas Wiegand
This paper provides a technical overview of the most probable modes (MPM)-based multiple reference line (M-MRL) intra-picture prediction that was adopted into the Versatile Video Coding (VVC) standard draft at the 12th JVET meeting. M-MRL applies not only the nearest reference line but also farther reference lines to MPMs for intra-picture prediction. The highlighted aspects of the adopted M-MRL scheme include the signaling of the reference line index, discontinuous reference lines, the reference sample construction and prediction for farther reference lines, and the joint reference line and intra mode decisions at the encoder side. Experimental results are provided to evaluate the performance of M-MRL on top of the VVC test model VTM-2.0.1, together with an analysis of discontinuous reference lines. The presented M-MRL provides 0.5% bitrate savings for an all-intra configuration and 0.2% for a random access configuration on average.
Perceptually-inspired super-resolution of compressed videos
Author(s):
Di Ma;
Mariana Afonso;
Fan Zhang;
David R. Bull
Spatial resolution adaptation is a technique which has often been employed in video compression to enhance coding efficiency. This approach encodes a lower resolution version of the input video and reconstructs the original resolution during decoding. Instead of using conventional up-sampling filters, recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality. These approaches are usually trained to minimise pixel-based losses such as Mean-Squared Error (MSE), despite the fact that this type of loss metric does not correlate well with subjective opinions. In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial up-sampling of compressed video using a modified CNN model, which has been trained using a generative adversarial network (GAN) on compressed content with perceptual loss functions. The proposed method was integrated with HEVC HM 16.20, and has been evaluated on the JVET Common Test Conditions (UHD test sequences) using the Random Access configuration. The results show evident perceptual quality improvement over the original HM 16.20, with an average bitrate saving of 35.6% (Bjøntegaard Delta measurement) based on a perceptual quality metric, VMAF.
AV1 In-loop super-resolution framework
Author(s):
Urvang Joshi;
Debargha Mukherjee;
Yue Chen;
Sarah Parker;
Adrian Grange;
Hui Su
The AV1 codec added a new in-loop super-resolution mode that allows a frame to be encoded at a lower resolution and then super-resolved normatively to the full resolution before updating the reference buffers. While encoding at lower resolution and super-resolving to a higher resolution is not a new concept, this is the first time such a mode has been normatively incorporated in a standardized video codec. To this end, AV1 has not only added support for across-scale motion prediction to allow predicting a lower resolution version of a frame from higher resolution reference buffers, but also made various simplifications to the super-resolving process itself after reconstruction to make it both software- and hardware-friendly in implementation. Specifically, the super-resolving process in AV1 comprises normative linear upscaling, followed by a restoration operation to recover the high frequencies using another AV1 tool called loop-restoration, which includes a Wiener or self-guided filter selected in a block-switchable manner. Further, in order to enable a cost-effective hardware solution with limited line-buffers, this mode only allows the upscaling/downscaling operation to be horizontal.
In this paper, we provide the details of the super-resolution mode in AV1 and some results showcasing the benefits of the same.
Point cloud compression on the basis of 3D motion estimation and compensation
Author(s):
Junsik Kim;
Jiheon Im;
Sungryeul Rhyu;
Kyuheon Kim
The point cloud is a medium that visualizes various kinds of information by placing points with color and geometry values in a three-dimensional space. A point cloud uses hundreds of thousands to millions of points to visualize information, and the key to commercializing point cloud video is to efficiently compress this large amount of information and transmit it to users. Currently, MPEG V-PCC is conducting dynamic point cloud compression research using a 2D video codec, where motion estimation is conducted on 2D video sequences. Thus, there is a limitation in estimating the motion in 3D point cloud contents. In this paper, we propose a method that uses 3D motion for point cloud video compression. The proposed technology achieves an efficient compression rate and improves accuracy in lossy compression.
A method of level of details control table for 3D point density scalability in video based point cloud compression
Author(s):
Jiheon Im;
Junsik Kim;
Sungryeul Rhyu;
Kyuheon Kim
Recently, the emergence of 3D cameras, 3D scanners, and various devices including Lidar is expected to promote various 3D media applications such as AR, VR, and autonomous vehicles. 3D media can now be realized not only with 2D pixel and depth information but also with points carrying texture and geometry information. In particular, 3D point cloud data consists of hundreds of thousands to millions of 3D points, and thus its data size increases dramatically compared to 2D media data, which calls for the development of efficient encoding/decoding technology. It is also necessary to develop a scalability function such as Level of Detail (LoD) for effective services across different bandwidths, devices, and Regions of Interest (RoI). In this paper, we propose a new LoD quality parameter that considers the characteristics of 3D point cloud contents, instead of the bitrate change of the underlying video codec in MPEG Video-based Point Cloud Compression (V-PCC). The LoD table proposed in this paper is confirmed to generate 3D point cloud contents with different point densities.
3D map generation based on keyframes selection and keypoints tracking
Author(s):
Jose A. Gonzalez-Fraga;
Vitaly Kober;
Omar Alvarez-Xochihua
There are several factors that affect the performance of a 3D scene reconstruction system. Among them, the most important are the choice of feature detectors and descriptors, the number of visual features and correct matches between them, and reliable tracking of the correspondences along the selected keyframes.
In this work, we propose a fast method for generating a 3D map from a time sequence of RGB-D images by selecting the minimum number of keypoints and keyframes that still ensures correct feature correspondences and, as a result, a high-quality 3D map. The performance of the proposed 3D scene reconstruction algorithm is evaluated by computer simulation using real indoor environment data.
Per-pixel calibration using CALTag and dense 3D point cloud reconstruction
Author(s):
Karelia Pena-Pena;
Xiao Ma;
Daniel L. Lau;
Gonzalo R. Arce
This paper proposes a multimodal imaging system that allows reconstructing a dense 3D spectral point cloud. The system consists of an Intel RealSense D415 depth camera, which includes active infrared stereo, and a NuruGo Smart Ultraviolet (UV) camera. RGB and Near Infrared (NIR) images are obtained from the first camera and UV images from the second one. The novelty of this work is in the application of a per-pixel calibration method using CALTag (high-precision fiducial markers for camera calibration) that outperforms traditional camera calibration, which is based on a pinhole-camera model and a checker pattern. The new method eliminates both lens distortions and depth distortion with simple calculations on a Graphics Processing Unit (GPU), using a rail calibration system. To this end, the undistorted 3D world coordinates for every single pixel are generated using only six parameters and three linear equations. The traditional pinhole camera model is substituted by two polynomial mapping models: one handles lens distortions and the other handles the depth distortions. The use of CALTag instead of traditional checkerboards allows overcoming failures during calibration due to clipping or occlusion of the calibration pattern. Multiple point clouds from different points of view of an object are registered using the iterative closest point (ICP) algorithm. Finally, a deep neural network for point set upsampling is used as part of the post-processing to generate a dense 3D point cloud.
An exploratory study towards objective quality evaluation of digital hologram coding tools
Author(s):
Roberto Corda;
Antonin Gilles;
Kwan-Jung Oh;
Antonio Pinheiro;
Peter Schelkens;
Cristian Perra
Holography is an acquisition and reproduction technique for visual content which, theoretically, allows the reconstruction of the acquired scene without any difference from its real-world counterpart. The objective quality assessment of digital hologram coding tools is a very challenging problem because the signal properties of holograms are significantly different from those of regular images. Several approaches can be devised for holography compression and objective quality evaluation. The exploratory study presented in this paper aims at assessing a procedure for objective quality evaluation of data compression tools when applied to the hologram plane.
Towards practical hologram streaming using progressive coding
Author(s):
Anas El Rhammad;
Patrick Gioia;
Antonin Gilles;
Marco Cagnazzo
Digital holography is an emerging technology for 3D visualization which is expected to dethrone conventional stereoscopic devices in the future. Aside from their specific signal properties, high quality holograms with broad viewing angles contain a massive amount of data. For a reasonable transmission time, efficient scalable compression schemes are needed to bridge the gap between the overwhelming volume of data and the limited bandwidth of communication channels. Viewpoint scalability is a powerful property since it allows encoding and transmitting only the information corresponding to the observer’s view. However, this approach imposes online encoding at the server, which may increase the latency of the transmission chain. To overcome this hurdle, we propose a scalable compression framework based on Gabor-wavelet decomposition, where the whole hologram is encoded offline. First, the observer plane is divided into spatial blocks. Then, the Gabor atoms are assigned to these blocks by exploiting the duality between Gabor wavelets and light rays. The atoms of each block are then classified into different layers according to their importance for the reconstruction and encoded in packets. At the decoder side, the atom packets are progressively decoded based on the viewer’s position. Then, the corresponding sub-hologram is generated using a GPU implementation. Results show that our approach enables practical progressive streaming of digital holograms with low latency.
Overview of MV-HEVC prediction structures for light field video
Author(s):
Vasileios Avramelos;
Glenn Van Wallendael;
Peter Lambert
Light field video is a promising technology for delivering the six degrees of freedom required for natural content in virtual reality. Existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. On the other hand, MV-HEVC treats each view as a separate video sequence, which allows the use of motion compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate prediction structure, and it is currently still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion, and random access capabilities in MV-HEVC light field video coding. The results give an overview of the most optimal solutions developed in the context of this work and of prediction structure algorithms proposed in the state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions.
JPEG Pleno light field coding technologies
Author(s):
Peter Schelkens;
Pekka Astola;
Eduardo A. B. da Silva;
Carla Pagliari;
Cristian Perra;
Ioan Tabus;
Osamu Watanabe
JPEG Pleno provides a standard framework to facilitate the capture, representation, and exchange of light field, point cloud, and holographic imaging modalities. JPEG Pleno Part 2 addresses coding of light field data. Two coding modes are supported for this modality: the first mode exploits the redundancy in this 4D data by utilizing a 4D transform technique, while the second mode is based on 4D prediction. Both techniques are outlined in this paper, as well as the file format that encapsulates the resulting codestreams.
Performance analysis of JPEG Pleno light field coding
Author(s):
Cristian Perra;
Pekka Astola;
Eduardo A. B. da Silva;
Hesam Khanmohammad;
Carla Pagliari;
Peter Schelkens;
Ioan Tabus
Light fields can nowadays be acquired by several methods and devices in the form of light field images, which are at the core of new forms of media technologies. Many research challenges are still open in light field imaging, such as data representation formats, data compression tools, communication protocols, subjective and objective quality of experience measurement metrics and methods. This paper presents a brief overview of the current architecture of the JPEG Pleno light field coding standard under development within the JPEG committee (ISO/IEC JTC1/SC29/WG1). Thereafter, a comparative analysis between the performance of the JPEG Pleno Light Field codec under various modes and configurations and the performance of the considered anchor codecs is reported and discussed.
Rendering-dependent compression and quality evaluation for light field contents
Author(s):
Irene Viola;
Keita Takahashi;
Toshiaki Fujii;
Touradj Ebrahimi
Light field rendering promises to overcome the limitations of stereoscopic representation by allowing for a more seamless transition between multiple points of view, thus giving a more faithful representation of 3D scenes. However, it is indisputable that there is a need for light field displays on which the data can be natively visualised, fuelled by the recent innovations in the realm of acquisition and compression of light field contents. Assessing the visual quality of light field contents on a native light field display is of extreme importance for the future development of both new rendering methods and new compression solutions. However, the limited availability of light field displays restricts the possibility of using them to carry out subjective tests. Moreover, hardware limitations in prototype models may considerably lessen the perceptual quality of experience in consuming light field contents. In this paper, we compare three different compression approaches for multi-layer displays, through both objective quality metrics and subjective quality assessment. Furthermore, we analyze the results obtained through subjective tests conducted using a prototype multi-layer display, and a recently proposed framework to conduct quality assessment of light field contents rendered through a tensor display simulator on 2D screens. Using statistical tools, we assess the correlation between the two settings and draw useful conclusions for the future design of compression solutions and subjective tests for light field contents with multi-layer rendering.
Gaussian noise estimation methods in images
Author(s):
Manuel G. Forero;
Sergio L. Miranda
A common and widely investigated problem in image processing is noise, especially Gaussian noise, because it affects image quality and subsequent processing. Therefore, different filtering techniques are created and improved every day. Some of these techniques require knowing the approximate value of the noise level, so it is essential to develop noise estimation methods. Given that there are no comparative studies of these techniques, in this work an evaluation of six relevant methods was carried out. The techniques were implemented as Java plugins for the free-access software ImageJ. They were evaluated and compared using standard and synthetic images. Several of the methods assume certain parameter values; here, the validity of these assumptions was studied, observing that they did not hold for most of them. It was found that, in most cases, the evaluated techniques did not provide a good estimation of the noise level.
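As a concrete example of this class of estimators, the sketch below implements Immerkaer's fast noise-variance method; it is not necessarily one of the six methods evaluated in the paper.

```python
# Immerkaer's fast estimation of additive Gaussian noise standard deviation.
import numpy as np
from scipy.signal import convolve2d

def estimate_noise_sigma(image):
    """Estimate the std. dev. of additive Gaussian noise in a grayscale image."""
    image = image.astype(np.float64)
    h, w = image.shape
    kernel = np.array([[1, -2, 1],
                       [-2, 4, -2],
                       [1, -2, 1]], dtype=np.float64)   # difference-of-Laplacians operator
    residual = convolve2d(image, kernel, mode="valid")
    return np.sqrt(np.pi / 2) * np.abs(residual).sum() / (6.0 * (h - 2) * (w - 2))

clean = np.tile(np.linspace(0, 255, 256), (256, 1))     # synthetic ramp image
noisy = clean + np.random.normal(0, 10.0, clean.shape)  # known sigma = 10
print("estimated sigma:", estimate_noise_sigma(noisy))
```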
Accurate image dehazing with three simultaneously captured hazed images
Author(s):
José Luis López-Martínez;
Vitaly Kober;
Vladimir Saptsin;
Olga Kober
Show Abstract
In this work, we capture hazed images with three similar cameras possessing a sufficiently large depth of field (wide-angle lenses), so that the foreground, middle ground and background of the scene appear sharp and clear. The cameras are located at the vertices of an isosceles right triangle and capture the same part of the scene. We propose a restoration algorithm based on the three observed degraded images. It is assumed that each degraded image contains information about the original image, a hazing function, and noise. The dehazing algorithm explicitly solves a linear system of equations derived from a quadratic objective function. Experimental results obtained with the proposed method are presented and discussed.
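As an illustration of restoring a single image from several degraded observations by minimizing a quadratic objective, the following is a minimal sketch only; it assumes known hazing kernels and a simple Tikhonov-regularized least-squares objective solved in the Fourier domain, which is not necessarily the authors' exact formulation.

```python
import numpy as np

def restore_from_three(observations, hazing_kernels, lam=1e-2):
    """Least-squares restoration from three degraded images.

    Minimizes sum_i ||g_i - h_i * f||^2 + lam * ||f||^2, whose closed-form
    solution is computed below in the Fourier domain.  The kernels h_i
    (hazing functions) are assumed known; this is a generic sketch, not the
    paper's exact objective function.
    """
    shape = observations[0].shape
    num = np.zeros(shape, dtype=complex)
    den = np.full(shape, lam, dtype=complex)
    for g, h in zip(observations, hazing_kernels):
        G = np.fft.fft2(g)
        H = np.fft.fft2(np.fft.ifftshift(h), s=shape)
        num += np.conj(H) * G       # accumulate cross-correlation terms
        den += np.abs(H) ** 2       # accumulate normal-equation terms
    return np.real(np.fft.ifft2(num / den))
```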
Restoration of depth-based space-variant blurred images
Author(s):
Yitzhak Yitzhaky;
Lior Graham
Show Abstract
Over the last decades, extensive work has been done on image de-blurring using different approaches. Most studies assumed that the entire image is equally distorted (space-invariant blur). In such cases, by knowing or finding the single point spread function (PSF) of the distortion, the entire image can be restored using that PSF. Various attempts have also been made to reconstruct blurred images degraded by a space-variant defocus blur. Here we assume that different areas in the image may contain different levels of Gaussian-like blur, and may also include sharp regions. Gaussian-like blur can approximate distortions such as out-of-focus blur and long atmospheric paths. In the first step we construct a blur map by estimating edge widths at many locations in the image. We assume that the blur map resembles a depth map, where the blur severity depends on the distance of the objects from the camera, as occurs in limited depth-of-field imaging focused close to the camera. In the second step the image is divided into a number of non-overlapping layers (regions) according to the blur severity, so that in each region the blur size is within a relatively small range. Then, in each of these regions we approximate its local PSF according to a best-step-edge based method. Next, each region is de-blurred using a Total Variation reconstruction method. In the final step all the restored regions are combined into a single reconstructed image.
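The layer-wise restoration step could look roughly like the sketch below. It assumes a precomputed per-pixel blur map (the edge-width estimation is not reproduced) and substitutes a Gaussian PSF with Wiener deconvolution from scikit-image for the paper's best-step-edge PSF and Total Variation restoration, so it only illustrates the overall flow.

```python
import numpy as np
from skimage.restoration import wiener

def gaussian_psf(sigma, size=21):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

def layered_deblur(image, blur_map, n_layers=4, balance=0.05):
    """Divide the image into layers of similar blur and restore each one."""
    edges = np.linspace(blur_map.min(), blur_map.max(), n_layers + 1)
    result = image.copy()
    for i in range(n_layers):
        mask = (blur_map >= edges[i]) & (blur_map <= edges[i + 1])
        sigma = blur_map[mask].mean() if mask.any() else 0.0
        if sigma < 0.5:                 # essentially sharp region, keep as-is
            continue
        restored = wiener(image, gaussian_psf(sigma), balance)
        result[mask] = restored[mask]   # recombine the restored regions
    return result
```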
Visual cryptography based robust copyright protection scheme to secure online social networking content with multiple owners
Author(s):
Sonal Kukreja;
Geeta Kasana;
Singara Singh Kasana
Show Abstract
Most of the existing visual cryptography (VC) based watermarking schemes consider images with a single owner; however, some real-life applications demand images with multiple owners. Pixel expansion, meaningless shares, the use of a codebook and low robustness against certain attacks are other challenges in these schemes. To overcome such challenges, a robust and secure copyright protection scheme for color images is proposed. This scheme uses VC, transform-domain techniques and a chaos technique. VC ensures the security of the scheme, while the other techniques are applied to enhance its robustness. The transform-domain techniques are applied to the R, G and B components of the image to extract its features, which are used to create the master share. This master share, along with the respective watermarks, is used to construct the key shares for every owner. The key shares should be meaningful images rather than random-looking ones, so that they do not raise any suspicion that secret information is being shared. Hence, the constructed key shares are hidden in meaningful cover images using the scheme of Kukreja et al. To prove the copyright of the image, the key share stored with the owner is superimposed with the master share to retrieve the watermark. The novelty of the scheme is that it can protect an image having multiple owners by using multiple watermarks while the original image remains unmodified, as the watermark is not embedded inside it but is hidden in the key shares. The experimental results show that the proposed scheme has strong robustness against different image processing attacks and perfect imperceptibility, and that it satisfies the blindness and security properties. Comparisons with existing schemes show the effectiveness of the proposed scheme.
Banknotes classification system through image processing and pattern recognition for people with visual impairment
Author(s):
Gustavo Andres Moreno H.;
Manuel G. Forero;
Kelly Tamayo Z.
Show Abstract
In Colombia, the recognition of banknotes by blind people requires extensive training and is increasingly difficult due to the emergence of new bills, aging, and the circulation of counterfeit bills. To contribute to this recognition process, a classification system was developed for eleven denominations of Colombian banknotes applying image processing and pattern recognition techniques. A prototype was built, consisting of a frontal lighting system to eliminate shadows and bring out textures. Banknote recognition comprises two stages. Detection was carried out by means of image processing: the background was eliminated by binarization, then uniform interest points were taken over the entire bill and a set of descriptors was obtained in a different color space. For identification, a database of 1100 samples was created, 100 per bill, which was used to train different multilayer perceptron (MLP) neural networks, structuring a pipeline to vary the configuration parameters and obtain the most accurate model. 70% of the data was used for training and 30% for verification. An initial accuracy of 85% was obtained by cross-validation. A significant improvement was achieved by adding new features and increasing the number of samples to 7700 through data manipulation, modifying the brightness and rotating the bills. With the new data, the cross-validation accuracy increased to 95%. The system was mounted on a Raspberry Pi 3 for practical application. A final test was done with 240 images captured in real time, 20 images per banknote, making each prediction in 0.9 seconds and reaching an overall accuracy of 97%.
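A sketch of the identification stage, assuming descriptor vectors are already extracted per banknote: a 70/30 split, a small sweep over MLP configurations and cross-validated scoring with scikit-learn. The feature dimensionality and parameter grid below are placeholders, not the values used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: color-space descriptors sampled at uniform interest points of each bill,
# y: one of the eleven denominations.  Random placeholders are used here.
X, y = np.random.rand(1100, 64), np.random.randint(0, 11, 1100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Pipeline sweep over MLP configurations, scored by 5-fold cross-validation.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000)),
    {"mlpclassifier__hidden_layer_sizes": [(32,), (64,), (64, 32)]},
    cv=5)
grid.fit(X_train, y_train)
print("CV accuracy:", grid.best_score_)
print("Hold-out accuracy:", grid.best_estimator_.score(X_test, y_test))
```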
Hass avocado classification by color and volume using a Kinect sensor
Author(s):
Gustavo Andres Moreno H.;
Manuel G. Forero;
Felipe Gómez A.;
Mauricio Ramírez N.
Show Abstract
In this article an automatic Hass avocado grading system is introduced. The equipment consists of a conveyor belt and an artificial vision system. Color classification was performed by determining a color scale according to agricultural standards. From the depth image, the Z coordinate of the points was obtained and a function representing the shape of the avocado was constructed; it was used to calculate the avocado's volume. Fifty avocados were classified (one avocado per second). The estimated volume was validated by immersing the avocados in water and correlating the displaced volume with the estimated one.
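A minimal sketch of estimating volume from a depth frame, assuming a segmented avocado mask, a known belt distance and a calibrated pixel footprint; the paper's actual shape-function construction is not reproduced.

```python
import numpy as np

def avocado_volume(depth_mm, mask, belt_depth_mm, pixel_area_mm2):
    """Estimate object volume from a Kinect depth frame.

    The height of each avocado pixel above the conveyor belt is integrated
    over the segmented region; `pixel_area_mm2` is the metric area covered
    by one pixel at the belt distance (an assumed calibration value).
    """
    height = np.clip(belt_depth_mm - depth_mm, 0, None)    # mm above the belt
    volume_mm3 = np.sum(height[mask]) * pixel_area_mm2
    return volume_mm3 / 1000.0                              # cm^3
```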
The impact of message quality on entity location and identification performance in distributed situational awareness
Author(s):
Chad T. Bates;
Arie Croitoru;
Andrew Crooks;
Eric Harclerode
Show Abstract
Location and time are critical to the success of many organizations' missions. Sensors, software, processors, vehicles, and human analysts work together to detect and identify specific entities as quickly as possible for these missions. This work aims to contribute a team-based detection and identification performance model incorporating the theory of Distributed Situational Awareness (DSA) and its effect on completing a specific task: detecting and identifying a specific entity within a complex urban environment. The task is accomplished using two unmanned aerial vehicles mounted with electro-optical sensors and operated by two analysts working as a team. Our results provide an additional resource on how technology and training might be utilized to find the best performance under these conditions and missions. A highly trained team might improve its performance with this technology, or a team with low training could perform at a high level given the appropriate technology in limited-time scenarios. More importantly, the model presented in this paper provides an evaluation tool to compare new technologies and their impact on teams. Specifically, it enables answering questions such as: is an investment in new technology appropriate if investing in additional training produces the same performance results? Future performance can also be evaluated based on the team's level of training and use of technology for these specific tasks.
Energy based image steganography using dynamic programming
Author(s):
Ron Shmueli;
Tal Shmueli;
Ofer Hadar
Show Abstract
Image steganography is the art of hiding information in a cover image in such a way that a third party does not notice the hidden information. This paper presents a novel technique for image steganography in the spatial domain. The new method hides and recovers hidden information of substantial length within digital imagery, while maintaining the size and quality of the original image. The image gradient is used to generate a saliency image, which represents the energy of each pixel. Pixels with higher energy are more salient and are valuable for hiding data, since the visual impairment they introduce is low. From the saliency image, a cumulative maximum energy matrix is created; this matrix is used to generate horizontal seams that follow the maximum-energy path. By embedding the secret bits of information along the seams, a stego-image is created which contains the hidden message. In the stego-image, we ensure that the hidden data is invisible, with very small perceived image quality degradation. The same algorithms are used to reconstruct the hidden message from the stego-image. Experiments have been conducted using two types of images and two types of hidden data to evaluate the proposed technique. The experimental results show that the proposed algorithm has high capacity and good invisibility, with a Peak Signal-to-Noise Ratio (PSNR) of about 70 dB and a Structural SIMilarity index (SSIM) close to 1.
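The dynamic-programming step can be sketched as follows: build the cumulative maximum energy matrix column by column and backtrack one horizontal seam along the highest-energy path. The embedding of bits along the seam and the neighbourhood rules of the paper are not reproduced; this is only an assumed, straightforward formulation.

```python
import numpy as np

def max_energy_horizontal_seam(energy):
    """One horizontal seam (one pixel per column) over a saliency image."""
    rows, cols = energy.shape
    cum = energy.astype(float)
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(r - 1, 0), min(r + 2, rows)
            cum[r, c] += cum[lo:hi, c - 1].max()   # cumulative maximum energy
    # Backtrack from the highest cumulative energy in the last column.
    seam = np.empty(cols, dtype=int)
    seam[-1] = int(np.argmax(cum[:, -1]))
    for c in range(cols - 2, -1, -1):
        r = seam[c + 1]
        lo, hi = max(r - 1, 0), min(r + 2, rows)
        seam[c] = lo + int(np.argmax(cum[lo:hi, c]))
    return seam   # seam[c] is the row used in column c; bits are hidden there
```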
Real-time multi-criteria classification of facial images
Author(s):
Radoslav Marinov;
Zhifeng Chen;
Yuriy Reznik
Show Abstract
We propose a practical technique for the classification of facial images across multiple criteria, such as gender, age, ethnicity, expression, and others. The technique uses a novel form of Gabor-based features followed by the application of the PCA and LDA algorithms. The computation of class scores in the context of nearest centroid classification is also novel and relies, in part, on properties of the proposed features. We demonstrate that the proposed form of Gabor features is particularly suitable for achieving simultaneous classification. The reported results are obtained using a set of standard databases and include comparisons against known state-of-the-art algorithms. The utility of the proposed scheme is demonstrated by practical applications requiring multiple classification results to be obtained in real time while using typical consumer devices (cellphones, tablets, PCs) as computing platforms.
Spatial domain analysis based on human position and movement recognition for the top-view imaging systems
Author(s):
Seung Jun Lee;
Byeong Hak Kim;
Young Hyoung Kim;
Min Young Kim
Show Abstract
Smart home appliances are rapidly growing into a big part of the consumer electronics market and are required to offer many convenient functions for users. Therefore, many products use top-view imaging systems in advanced technologies that recognize object positions and movement for human-machine interaction. Although the top-view imaging system has already been adopted in many applications, not only home appliances but also closed-circuit television (CCTV) and unmanned aerial vehicles (UAV), it still has many drawbacks. In particular, the top-view image shows asymmetrical features and radially distorted scenes around the corners, like omnidirectional-view images. Therefore, conventional human detection methods struggle with the computational complexity and low accuracy involved in calibrating these artifacts. In this paper, we propose an efficient method to recognize the spatial domain of human positions and movements based on motion vector detection using multiple feature maps on top-view images. In the experimental results, we show efficient computation times and qualitative results of spatial domain detection.
Product detection based on CNN and transfer learning
Author(s):
Xingsheng Zhu;
Ming Liu;
Yuejin Zhao;
Liquan Dong;
Mei Hui;
Lingqin Kong
Show Abstract
With the development of artificial intelligence and the introduction of the "new retail" concept, unmanned settlement has gradually become a research hotspot in academia and industry. As an important part of retail, settlement matters for both supermarkets and user experience. Among traditional methods, barcode-based recognition requires a lot of manual assistance, and the salary cost is high; RFID requires special equipment, and the hardware cost is high. At present, convolutional neural networks (CNNs) exhibit many advantages over traditional methods in various machine vision tasks such as image classification, object detection, instance segmentation, and image generation. Based on deep learning, this paper provides a novel unmanned settlement solution that requires only a few cameras and achieves an experience that is faster, more accurate and lower in cost. A very high accuracy rate is achieved on our product dataset. The rest of the paper also demonstrates the effectiveness and robustness of the algorithm under different conditions through a series of experiments.
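A minimal transfer-learning sketch in PyTorch: a pretrained ImageNet backbone with a new classification head for product categories. The backbone (ResNet-18), class count and training schedule are illustrative assumptions; the paper does not specify its network or hyperparameters.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_PRODUCTS = 200                       # placeholder class count

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():             # freeze the pretrained features
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_PRODUCTS)   # new classifier head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch of product images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```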
Colorimetric index-based segmentation for RGB images of whales
Author(s):
Miguel Ángel Castillo-Martínez;
Blanca Esther Carvajal-Gámez;
Francisco Javier Gallegos-Funes;
Rosa Isela Ramos-Arredondo
Show Abstract
A binary automatic segmentation algorithm for digital images of whales is presented. The algorithm consists of two cascaded blocks that process an image and classify its pixels. The first block preprocesses the information with a 1% grid subsampling over all pixels in each image; those samples are used to generate new indexes based on the RGB channels, which represent the colors of the whale. Because the ratio between pixels detected as whale and as clutter does not tend to 1:1, an equal-size random sampling is performed in order to work with a homogeneous class distribution in the training process. The second block is an inference engine that classifies pixels as whale or not according to their colorimetric features and the samples obtained in the previous block. This block is realized by Principal Component Analysis (PCA), which reduces the number of features in the model and increases data separability, and Logistic Regression for pixel classification. Our approach avoids the requirement of extensive computing power and specialized equipment, which allows the algorithm to be ported to and implemented on consumer devices. The algorithm needs an annotated dataset because it is based on a supervised learning strategy. In addition, this algorithm minimizes the time required to develop an automatic segmentation system around any object of interest in images.
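A sketch of the inference-engine block under the assumptions stated in the abstract: equal-size sampling of the two classes, PCA for dimensionality reduction and logistic regression for pixel classification. The colorimetric indexes themselves are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def balanced_subsample(X, y, rng=np.random.default_rng(0)):
    """Equal-size random sampling of whale and clutter pixels (1:1)."""
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    n = min(len(idx_pos), len(idx_neg))
    idx = np.concatenate([rng.choice(idx_pos, n, replace=False),
                          rng.choice(idx_neg, n, replace=False)])
    return X[idx], y[idx]

# X: colorimetric indexes derived from RGB at the 1%-grid samples,
# y: annotated whale / clutter labels (random placeholders here).
X, y = np.random.rand(5000, 6), np.random.randint(0, 2, 5000)
Xb, yb = balanced_subsample(X, y)

pixel_classifier = make_pipeline(PCA(n_components=3), LogisticRegression())
pixel_classifier.fit(Xb, yb)
```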
3D face recognition using depth filtering and deep convolutional neural network
Author(s):
Konstantin Dorofeev;
Alexey Ruchay;
Anastasia Kober;
Vitaly Kober
Show Abstract
In this paper, we first estimate the accuracy of 3D facial surface reconstruction from real RGB-D depth maps using various depth filtering algorithms. Next, a new 3D face recognition algorithm using a deep convolutional neural network is proposed. With the help of 3D face augmentation techniques, different facial expressions are synthesized from a single 3D face scan and used for network learning. The performance of the proposed algorithm is compared, in terms of 3D face recognition metrics and processing time, with that of common 3D face recognition algorithms.
Neural eye-processing computer based on FPGA technologies
Author(s):
Andriy V. Kozhemyako;
Oleksandr S. Bezkrevnyi
Show Abstract
One of the promising areas in the field of image processing and analysis is the hardware implementation of neural networks for image processing and analysis based on FPGA technologies. The structural scheme of a multifunctional calculator was developed. It contains M×N cells arranged as a matrix, a feature-formation block with N control nodes, a clock-impulse input, a reset input, an output for the common zero feature, and outputs for the zero features of the matrix columns and rows. The described structure is a classifier and was written to the FPGA chip [1]. The main task is to provide processing and analysis of images in real time. The hardware implementation allows it to be used in many types of activities: biomedical engineering, industry, the aerospace sphere, etc. [2]
Classification of breast abnormalities in digital mammography using phase-based features
Author(s):
Julia Diaz-Escobar;
Vitaly Kober
Show Abstract
Breast cancer is one of the principal causes of death for women in the world. Invasive breast cancer develops in about one in eight women in the United States during her lifetime. Digital mammography is a common technique for early detection of breast cancer; however, only 84% of breast cancers are detected by interpreting radiologists. Computer Aided Detection (CAD) is a technology designed to help radiologists and to decrease observational errors. In practice, for every true-positive cancer detected by CAD there are many more false predictions, which have to be dismissed by radiologists. In this work, a CAD method for the detection and classification of breast abnormalities is proposed. The proposed method is based on the local energy and phase congruency approach and a supervised machine learning classifier. Experimental results are presented using a digital mammography dataset and evaluated under different performance metrics.
Reconstruction of 3D deformable objects using a single Kinect sensor
Author(s):
Marcelo Luis Ruiz-Rodriguez;
Vitaly Kober
Show Abstract
3D object reconstruction has multiple applications in areas such as medicine, robotics, virtual reality, video games, reverse engineering, and human-computer interaction, among others. This work addresses the following research problem: the design of an algorithm for accurate real-time reconstruction of 3D deformable objects using a single Kinect sensor without overly restricting user or camera motion. Prior knowledge of the object shape is not used, allowing general scanning of the object with free deformations. The reconstruction process consists of the following steps: capture of RGB-D information over time with a Kinect sensor, registration using a modified iterative closest point algorithm, and dynamic construction and refinement of a dense 3D object model. To improve the model quality, segmentation of the desired object from the background based on a depth-error analysis is employed. The performance of the algorithm is evaluated using experimentally validated data and compared with state-of-the-art techniques on challenging sequences. The algorithm is implemented on a computer with a graphics processing unit using parallel programming.
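For the registration step, a stand-in sketch using Open3D's point-to-plane ICP between two consecutive Kinect point clouds; the paper's modified ICP and the dynamic model refinement are not reproduced, and the voxel size and distance threshold below are arbitrary assumptions.

```python
import open3d as o3d

def register_frames(source_pcd, target_pcd, voxel=0.01, max_dist=0.02):
    """Pairwise rigid registration of consecutive RGB-D point clouds."""
    src = source_pcd.voxel_down_sample(voxel)
    tgt = target_pcd.voxel_down_sample(voxel)
    for pcd in (src, tgt):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist,
        estimation_method=o3d.pipelines.registration
                             .TransformationEstimationPointToPlane())
    return result.transformation   # 4x4 pose of the source in the target frame
```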
Adaptive algorithm for the SLAM design with a RGB-D camera
Author(s):
Antonio Ortiz-González;
Vitaly Kober
Show Abstract
Simultaneous localization and mapping (SLAM) is a well-known problem in the field of autonomous mobile robotics, where a robot needs to localize itself in unknown environments by processing onboard sensors without external referencing systems such as the Global Positioning System (GPS). In this work, we present a visual feature-based SLAM which is able to produce high-quality three-dimensional maps in real time with a low-cost RGB-D camera such as the Microsoft Kinect, suitable for future planning or common robot navigation tasks. First, a comprehensive performance evaluation of the combination of different state-of-the-art feature detectors and descriptors is presented. The main purpose of these evaluations is to determine the best detector-descriptor combination to use for robot navigation. Second, we use the Iterative Closest Point (ICP) algorithm to get the relative motion between consecutive frames and then refine the pose estimate following the composition rule. However, the spatial distribution and resolution of the depth data affect the performance of 3D scene reconstruction based on ICP. For this reason, we propose an adaptive architecture which computes the pose estimate from the most reliable measurements in a given environment. We evaluate our approach extensively on commonly available benchmark datasets. The experimental results demonstrate that our system can robustly deal with challenging scenarios while being fast enough for online applications.
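A sketch of one detector-descriptor evaluation, assuming ORB with brute-force Hamming matching in OpenCV as one of the candidate combinations and a simple match-ratio score; the paper's actual evaluation protocol is not specified here.

```python
import cv2

def match_ratio(img1, img2, detector=None):
    """Detect, describe and match features between two consecutive frames."""
    detector = detector or cv2.ORB_create(nfeatures=1000)
    kp1, des1 = detector.detectAndCompute(img1, None)
    kp2, des2 = detector.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # Fraction of keypoints with a reliable match: a simple evaluation score.
    return len(matches) / max(min(len(kp1), len(kp2)), 1)
```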
Optimization of installation of deformation monitoring of multiple points by optical methods
Author(s):
Bangan Liu;
Yongbin Wei
Show Abstract
Optical methods, such as robotic total stations (RTS) and image-based/non-contact measurement, have been widely utilized for deformation/displacement measurement in civil engineering. In measuring practice, however, the installation positions of the measuring points and measuring devices are usually decided from human experience. This paper describes an optimization method for deciding the installation positions in deformation monitoring of multiple points, using robotic total stations as an example. The resection and polar coordinate method is utilized for deformation monitoring, in which two datum points must be installed at immovable locations while the measuring system can be installed at a movable location. The basic errors of an optical measuring system are angle errors and distance errors, which combine into the systematic measuring error; this error can be reduced by optimizing the installation of the measuring system. This paper presents methods for evaluating the systematic accuracy as functions of the locations of the measuring system and the two datum points, following the error analysis used in the geodesy field. The systematic accuracy can then be optimized by optimization methods such as the simplex method. The method can be applied to deformation measurement by RTS or by image-based/non-contact measurement. In most measuring cases with RTS, the problem is a constrained optimization problem with 9 parameters.
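A sketch of the optimization interface only: the 9 installation parameters minimized with SciPy's Nelder-Mead simplex method. The accuracy function below is a crude distance-weighted surrogate (the datum-point terms and the constraints are omitted), not the geodetic error propagation described in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def systematic_accuracy(params, monitored_points, sigma_angle, sigma_dist):
    """Placeholder accuracy function of the 9 installation parameters
    (station position and two datum points, 3 coordinates each)."""
    station = params[:3]
    cost = 0.0
    for p in monitored_points:
        d = np.linalg.norm(p - station)
        cost += (d * sigma_angle) ** 2 + sigma_dist ** 2
    return cost

points = np.array([[10.0, 5.0, 2.0], [12.0, -3.0, 1.5]])
x0 = np.zeros(9)                        # initial station and datum positions
res = minimize(systematic_accuracy, x0,
               args=(points, 1e-5, 1e-3),
               method="Nelder-Mead")    # the simplex method named in the paper
print(res.x[:3])                        # optimized station position
```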
New automatic method for tracking rats in a pool for medication studies
Author(s):
Manuel G. Forero;
Natalia C. Hernandez;
Cristian M. Morera;
Laura E. Baquedano;
Luis A. Aguilar
Show Abstract
The study of drugs to combat neurodegenerative diseases, such as Alzheimer's and Parkinson's, is frequently done in animal models such as rats. To evaluate the effectiveness of drugs and administered medication, videos of rats in a swimming pool are recorded and their behavior is analyzed. Although there are several commercial and free-access computer programs that allow recording of the rat's movement, they do not do it automatically, given that the identification of some reference points, such as the position and radius of the pool, is done by hand. In addition, the frame in which the rat is released must be identified manually. This makes the study of these videos long, tedious and not reproducible. Therefore, in this paper, a new technique for the evaluation of the Morris test is introduced. It automatically detects and localizes the pool and the rat, notably reducing the time consumed by the evaluation. For pool identification, a segmentation method based on the projection of the video frames is applied, eliminating the rat while conserving the shape of the pool. Then, the Hough transform is used to recognize the position and radius of the pool. The frame in which the rat is released is found by using mathematical morphology techniques. The software was developed as a plugin of the free-access software ImageJ. The results obtained were validated, allowing verification of the quality of the proposed method.
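The pool-detection step can be illustrated with OpenCV's circular Hough transform applied to a temporal projection of the frames; the projection method and all Hough parameters below are illustrative assumptions, not the plugin's actual settings.

```python
import cv2
import numpy as np

def detect_pool(projection_image):
    """Locate the pool (centre and radius) in a temporal projection image."""
    gray = cv2.cvtColor(projection_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1,
                               minDist=gray.shape[0] // 2,
                               param1=100, param2=50,
                               minRadius=gray.shape[0] // 4,
                               maxRadius=gray.shape[0] // 2)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)
    return (x, y), r
```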
Mathematical improvement of the Lee’s 3D skeleton algorithm
Author(s):
Manuel Forero;
Camilo Murillo;
Ricardo Zuleta;
Fabián Molina
Show Abstract
3D skeletonization is one of the most used techniques for the recognition and tracking of objects. One of the best-known algorithms is the one developed by Lee et al. in 1994, perhaps the most used in plant phylogeny since it is implemented in the ITK, Matlab and ImageJ platforms. However, this algorithm has some deficiencies that in several cases prevent obtaining a topologically complete skeleton. This article presents a mathematical description of the method, the causes of its errors and a topologically justified correction. Some improvements in its implementation are also presented, making the algorithm more efficient.
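For reference, Lee et al.'s 1994 thinning algorithm is also available in scikit-image; the snippet below runs that reference implementation on a toy volume, not the corrected version proposed by the authors.

```python
import numpy as np
from skimage.morphology import skeletonize

volume = np.zeros((40, 40, 40), dtype=bool)
volume[10:30, 15:25, 15:25] = True            # toy binary object

skeleton = skeletonize(volume, method="lee")  # Lee et al. (1994) 3D thinning
print(skeleton.sum(), "skeleton voxels")
```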
Analysis of the convolutional neural network architectures in image classification problems
Author(s):
Sergey Leonov;
Alexander Vasilyev;
Artyom Makovetskii;
Vitaly Kober
Show Abstract
This work aims to construct effective methods for image classification. For this purpose, we analyze convolutional neural network architectures, understood as the number of network layers, the elements in the input and output layers, the type of activation functions, and the connections between neurons. We studied the application of various configurations of convolutional networks to image classification problems. Numerical experiments were conducted on the BOSPHORUS database, and the results are described in this work. Based on the analysis of convolutional neural networks, a network architecture has been developed that provides the most accurate classification for the data set under consideration. A new method that combines the advantages of using RGB images and depth maps as input data is proposed for processing the output of the convolutional network.
Extraction of phase profile at discrete spatial frequency bands using phase shifting interferometry
Author(s):
Payel Ghosh;
Sarad Subhra Bhakat;
Ipsita Chakraborty;
Sanjukta Sarkar;
Kallol Bhattacharya
Show Abstract
In this paper, we report an interferometric method to extract the phase information available in a desired band of spatial frequencies. The phase sample is placed in the path of a converging beam of light entering a Mach-Zehnder interferometer so that the two Fourier transform (FT) planes are located in the two interferometer arms. One of the FT planes is filtered by suitable masks so that only the frequencies that are blocked appear in the final interferogram. An imaging lens images the phase object onto a CCD. Polarization phase shifting is incorporated so that the final frequency-filtered image is reconstructed from the four phase-shifted interferograms. The interferometer is made to operate in the null-fringe condition so that residual phase is eliminated. Simulated and experimental results are presented.
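Assuming the standard four-step demodulation with phase steps of 0, 90, 180 and 270 degrees (the paper does not spell out its exact formula), the wrapped phase can be recovered from the four interferograms as follows.

```python
import numpy as np

def four_step_phase(i0, i90, i180, i270):
    """Wrapped phase from four phase-shifted interferograms.

    With I_k = a + b*cos(phi + k), the standard four-step formula gives
    phi = arctan2(I_270 - I_90, I_0 - I_180).
    """
    return np.arctan2(i270 - i90, i0 - i180)
```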
Stationarity testing in 2D image analysis
Author(s):
Jaromír Kukal;
Iva Nachtigalová;
Zuzana Krbcová;
Jan Švihlík;
Karel Fliegel
Show Abstract
Signal and image stationarity is the basic assumption of many analysis methods. However, this assumption does not hold in many real cases. The paper focuses on local stationarity testing using a small symmetric neighbourhood. The neighbourhood is split into two parts which should have the same statistical properties when the hypothesis of image stationarity is valid. We apply various testing approaches (two-sample F-test, t-test, WMW, K-S) to obtain p-values for a given pixel, mask position, and test type. Finally, using a battery of masks and tests, we obtain a series of p-values for every pixel. Applying the False Discovery Rate (FDR) methodology, we localize all the pixels where any hypothesis fails. The resulting binary image is an alternative to traditional edge detection but with a strong statistical background.
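A reduced sketch of the idea: only one mask (a left/right split of the neighbourhood) and one test (Welch's t-test) are shown, followed by a Benjamini-Hochberg FDR threshold over the per-pixel p-values; the full battery of masks and tests is not reproduced.

```python
import numpy as np
from scipy import stats

def local_pvalues(image, r=3):
    """p-values of a two-sample t-test between the two halves of a small
    symmetric neighbourhood around every interior pixel."""
    rows, cols = image.shape
    p = np.ones((rows, cols))
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            patch = image[y - r:y + r + 1, x - r:x + r + 1]
            left, right = patch[:, :r].ravel(), patch[:, r + 1:].ravel()
            p[y, x] = stats.ttest_ind(left, right, equal_var=False).pvalue
    return p

def nonstationary_mask(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR: flag pixels where stationarity is rejected."""
    flat = np.sort(pvals.ravel())
    m = flat.size
    passing = flat[flat <= alpha * (np.arange(1, m + 1) / m)]
    cutoff = passing.max() if passing.size else 0.0
    return pvals <= cutoff
```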
Variational approach to semi-automated 2D image segmentation
Author(s):
Jaromír Kukal;
Zuzana Krbcová;
Iva Nachtigalová;
Jan Švihlík;
Karel Fliegel
Show Abstract
The segmentation of 2D biomedical images is a very complex problem which has to be solved interactively. The original MRI, CT, PET, or SPECT image can be enhanced using a variational smoother. However, there are Regions of Interest (ROI) which can be exactly localized. The question is how to design the human-computer interaction for a user-friendly biomedical service. Our approach is based on user-selected points which determine the ROI border line. The relationship between point positions and image intensity is the subject of variational interpolation using a thin plate spline model. The general principle of the segmentation is demonstrated on biomedical images of the human brain.
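The variational interpolation step could be sketched with SciPy's thin plate spline radial basis interpolator, fitting the intensities at the user-selected border points and evaluating the resulting smooth surface over the image grid. This is only an assumed realization of the interpolation step, not the complete interactive procedure.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def roi_border_surface(points_xy, intensities, shape):
    """Thin plate spline surface through user-selected border points."""
    tps = RBFInterpolator(points_xy, intensities, kernel="thin_plate_spline")
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    return tps(grid).reshape(shape)   # compare against the image to delineate the ROI
```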
Semantic segmentation approach for tunnel roads’ analysis
Author(s):
Arcadi Llanza;
Assan Sanogo;
Marouan Khata;
Alami Khalil;
Nadiya Shvai;
Hasnat Abul;
Antoine Meicler;
Yassine El Khattabi;
Justine Noslier;
Paul Maarek;
Amir Nakib
Show Abstract
Image segmentation is the most important step of any visual scene understanding system. In this paper, we use a semantic approach where each pixel is labeled with a semantic object category. Locating objects on a tunnel road is a crucial task for an automatic tunnel incident detection system. In particular, it needs to accurately detect and localize different types of zones, such as the road lane, emergency lane, and sidewalk. Unfortunately, existing methods often fail to provide acceptable image regions due to dynamic environment conditions: changes in lighting conditions, shadow appearance, object variability, etc. To overcome these difficulties, we propose a semantic tunnel image segmentation approach based on a Convolutional Neural Network (CNN). To evaluate the performance of the proposed approach, we compared it to state-of-the-art and recent methods on two different datasets collected from two tunnels in France, called "T1" and "T2". Our extensive study leads to the best tunnel scene segmentation approach. The proposed method has been deployed by the VINCI Autoroutes company in a real-world environment for an automatic incident detection system.
Non-rigid ICP and 3D models for face recognition
Author(s):
Sergei Voronin;
Vitaly Kober;
Artyom Makovetskii;
Aleksei Voronin
Show Abstract
3D fitting algorithms are very important for locating particular facial parts and key points, which can be used for the recognition of facial expressions under various face deformations. Common non-rigid ICP variants are usually based on the affine point-to-point approach. In this paper, in order to apply non-rigid ICP to facial surfaces, we build a part-based model of the 3D facial surface and combine it with a new non-rigid ICP algorithm. Computer simulation results are provided to illustrate the performance of the proposed algorithm.
An efficient 3D mapping framework
Author(s):
Dmitrii Tihonkih;
Vitaly Kober;
Artyom Makovetskii;
Aleksei Voronin
Show Abstract
The reconstruction of a 3D map of the observed scene in real space is based on information about the coordinates of 3D points. Triangulation is a well-known method for surface representation in three-dimensional space. Two consecutively obtained frames with point clouds correspond to two partially overlapping triangulated surfaces, and an algorithm is known for correctly constructing the triangulation of the overlapping area. Another approach to building a 3D map is the use of surfels. A surfel is a round patch on the surface, characterized by a triple of elements: the position of the surfel, the normal to the surfel, and the radius of the surfel. One more recently developed method for constructing a three-dimensional map of the scene is the so-called octree. Octrees are a hierarchical data structure for the spatial representation of an object. Each octree node represents the space contained in a cubic volume called a voxel. This volume is recursively subdivided into eight sub-volumes until the specified minimum voxel size is reached. The minimum voxel size determines the resolution of the octree. Since the octree is a hierarchical data structure, the tree can be cut at any level to get a coarser spatial representation of the object. The decision on whether a given voxel is occupied by an object or not is made on the basis of a probabilistic approach. In this paper we describe a new efficient algorithm for surface reconstruction in three-dimensional space and, with the help of computer simulation, compare the proposed method with known algorithms for 3D map reconstruction.
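To make the octree description concrete, the following is a minimal sketch of a node with recursive eight-way subdivision down to a minimum voxel size and a clamped log-odds occupancy update (a common probabilistic choice, assumed here rather than taken from the paper).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OctreeNode:
    """Cubic voxel with a probabilistic occupancy state (log-odds)."""
    center: tuple
    size: float
    log_odds: float = 0.0
    children: Optional[List["OctreeNode"]] = None

    def insert(self, point, min_size, hit=True, step=0.85):
        # Update this voxel's occupancy evidence (clamped log-odds).
        self.log_odds = max(-3.5, min(3.5, self.log_odds + (step if hit else -step)))
        if self.size <= min_size:        # reached the octree resolution
            return
        if self.children is None:        # subdivide lazily into 8 sub-volumes
            h = self.size / 4.0
            self.children = [OctreeNode(
                (self.center[0] + sx * h, self.center[1] + sy * h, self.center[2] + sz * h),
                self.size / 2.0)
                for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
        idx = (4 * (point[0] > self.center[0]) +
               2 * (point[1] > self.center[1]) +
               (point[2] > self.center[2]))
        self.children[idx].insert(point, min_size, hit, step)
```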
Image dehazing using spatially displaced sensors
Author(s):
Sergei Voronin;
Vitaly Kober;
Artyom Makovetskii;
Aleksei Voronin
Show Abstract
Images of outdoor scenes are often degraded by particles and water droplets in the atmosphere. Haze, fog, and smoke are such phenomena, caused by atmospheric absorption and scattering. Most image dehazing (haze removal) methods proposed over the last two decades employ an image enhancement or restoration approach. Different variants of locally adaptive algorithms for single-image dehazing have been suggested. These methods are successful when the haze-free image has higher contrast than the input hazy image. Other haze removal approaches estimate a dehazed image from the observed scene by solving an objective function whose parameters are adapted to local statistics of the hazy image inside a moving window. In this presentation we propose a new dehazing algorithm that utilizes several scene images, captured in such a way that they are spatially shifted relative to each other. Computer simulation results are provided to illustrate the performance of the proposed algorithm for the restoration of hazed images.
Index-based methods for water body extraction in satellite data
Author(s):
M. Arreola-Esquivel;
M. Delgadillo-Herrera;
C. Toxqui-Quitl;
A. Padilla-Vivanco
Show Abstract
Several water index-based methods have been proposed in the literature which combine satellite multispectral bands in an algebraic expression. The objective of these water index-based methods is to increase the intensity contrast between water pixels (surface water body) and non-water pixels (built-up areas, soil, vegetation, etc.). The present investigation evaluates the Modified Normalized Difference Water Index (MNDWI) and the Automated Water Extraction Index (AWEI) using satellite data from Landsat 5 TM, Landsat 8 and Sentinel-2A at different time scenes. Based on visual inspection of the Lake Metztitlan water-body mapping results, a high performance of AWEI applied to the OLI and MSI sensor data is observed. In the selected study area of 9210 m x 9380 m, a water pixel percentage of 30.703616% is observed for a flooding season and 9.884537% for a dry season of the year.
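As an illustration of an index-based water map, the MNDWI (green and SWIR reflectance bands, e.g. OLI bands 3 and 6) can be computed and thresholded as below; the zero threshold is a common default, not necessarily the one used in the study, and the AWEI variant is omitted here.

```python
import numpy as np

def mndwi(green, swir):
    """Modified Normalized Difference Water Index: (Green - SWIR) / (Green + SWIR)."""
    return (green - swir) / (green + swir + 1e-12)

def water_percentage(index, threshold=0.0):
    """Percentage of pixels classified as surface water."""
    water = index > threshold
    return 100.0 * water.sum() / water.size
```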
Thyroid nodule diagnosis system based on the densely connected convolutional network
Author(s):
Li-Jen Liao;
Min-Hsiung Lin;
Chi Chen;
Yung-Sheng Chen
Show Abstract
Thyroid cancer, one of the ten most prevalent cancers in Taiwan, can be diagnosed by traditional fine-needle aspiration biopsy or by ultrasonic imaging combined with physicians' clinical experience. Recently, computer-aided diagnosis (CAD) systems based on ultrasonic technology have been widely adopted by hospitals. However, based on ultrasonic images and human experience, cancer can only be diagnosed in approximately 80% of cases; the rest must return to invasive biopsy. What is worse, about 20% of cases remain uncertain even after biopsy. In order to increase the detection rate of the CAD system, an approach based on shear-wave ultrasonic images with computer vision and deep learning technology is developed. Three methods, namely texture analysis, a traditional convolutional neural network (CNN), and a densely connected convolutional network (DenseNet), are studied and compared. With manual ROI selection, the DenseNet-based method achieves 88% accuracy, 90.9% sensitivity and 96.5% specificity on our testing data, and is thus selected as the kernel of our thyroid nodule diagnosis system for benign/malignant classification. Furthermore, a semi-automatic user interface has been built, which can diagnose thyroid nodules in real time clinically and thus improve physicians' diagnostic accuracy as well as reduce the probability of invasive biopsy.
A compact representation of character skeleton using skeletal line based shape descriptor
Author(s):
Ming-Te Chao;
Yung-Sheng Chen
Show Abstract
Skeletonization is a very significant technology for shape representation in the field of image processing and pattern recognition. In order to explore its application to Chinese calligraphy character representation and reconstruction, a skeletal line based shape descriptor was recently presented by the authors. Its performance, evaluated by the measurement of skeleton deviation (MSD), the number of distorted forks (NDF), the number of spurious strokes (NSS) and the measurement of reconstructability (MR), showed that the skeleton-biased phenomenon can be greatly reduced and a pattern reconstructability near 100% can be achieved. However, due to the use of a dense skeletal line (SL) placement scheme, a lot of memory space is needed to store the extended and dense SL information, and the computation cost is also rather expensive. Therefore, a compact strategy is presented in this paper to overcome these issues. Instead of storing all the SL information, only the SL sampled at a certain interval is stored in the skeleton table. By applying a curve-fitting strategy derived from the Vandermonde matrix to the sampled SL information in the skeleton table, both the required skeleton and the pattern contour can be readily restored, and the original pattern can thus be reconstructed. Sampling intervals (SI) from 1 to 6 are used in our experiments (with 15 Chinese calligraphy characters), and the original method is regarded as the ground truth. Our experimental results show that the memory space can be reduced by approximately 54% (SI = 1) to 92% (SI = 6), while the pattern reconstructability is maintained between 95% (SI = 1) and 92% (SI = 6). Moreover, the mean execution time of pattern reconstruction is greatly reduced from 7.814 s (the original method) to 0.078 s (the improved method). The results confirm the feasibility of the proposed approach.
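The Vandermonde-based curve fitting can be sketched as a least-squares polynomial fit of the sampled SL points, from which a dense SL is restored; the degree and parametrization below are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def fit_skeletal_line(t_samples, points, degree=3):
    """Fit a polynomial to sampled skeletal-line (SL) points via a
    Vandermonde matrix solved in the least-squares sense."""
    V = np.vander(t_samples, degree + 1)            # Vandermonde matrix
    coeffs, *_ = np.linalg.lstsq(V, points, rcond=None)
    return coeffs                                   # one column per coordinate

def restore_skeletal_line(coeffs, t_dense):
    """Evaluate the fitted curve densely to restore the SL."""
    return np.vander(t_dense, coeffs.shape[0]) @ coeffs

# Example: reconstruct a dense SL from a few stored samples (e.g. SI = 4).
t = np.linspace(0.0, 1.0, 8)                        # sampled parameters
pts = np.column_stack([np.cos(t * np.pi), np.sin(t * np.pi)])
dense = restore_skeletal_line(fit_skeletal_line(t, pts), np.linspace(0, 1, 32))
```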