Tomographic imaging with wireless sensor networks
Author(s):
Tenger Batjargal;
Ramakrishnan Sundaram
This paper discusses (a) the design and implementation of the integrated radio tomographic imaging (RTI) interface for radio signal strength (RSS) data obtained from a wireless sensor network (WSN), (b) the use of model-driven methods to determine the extent of regularization to be applied to reconstruct images from the RSS data, and (c) a preliminary study of the performance of the network.
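A minimal sketch of the kind of regularized reconstruction step referred to in (b), assuming a plain Tikhonov (ridge) solution of the linear RTI model; the weight matrix, lambda value, and array sizes below are illustrative, not taken from the paper.

```python
import numpy as np

def rti_reconstruct(delta_rss, weights, lam):
    """Regularized least-squares image reconstruction for RTI.

    delta_rss : (n_links,) RSS change on each sensor-pair link
    weights   : (n_links, n_voxels) link/voxel weighting model
    lam       : Tikhonov regularization strength (the quantity the
                model-driven methods aim to select)
    """
    A = weights
    # x = (A^T A + lam*I)^-1 A^T y  -- standard Tikhonov/ridge solution
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ delta_rss)

# toy usage: 20 links, 8x8 voxel grid
rng = np.random.default_rng(0)
A = rng.random((20, 64))
y = rng.normal(size=20)
image = rti_reconstruct(y, A, lam=10.0).reshape(8, 8)
```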
Monitoring refractory wear in a coke oven under high temperature
Author(s):
Naoshi Yamahira;
Toshifumi Kodama;
Shinjiro Baba;
Hiroaki Komatsubara;
Masaki Kajitani
We have developed a method to measure both the heat radiation and the shape of the entire surface of the refractory blocks in the carbonization chamber of a coke oven under high-temperature conditions. The apparatus consists of an area camera, two line lasers, and a specially designed optical filter that divides the camera's field of view into two regions: one for imaging radiation and the other for capturing only the laser light for shape measurement. The apparatus moves through the chamber of the coke oven to scan the refractory blocks. A conventional problem is that the measured shape data is disturbed by the meandering of the apparatus as it moves through the 16 m-long chamber. We have developed a signal processing method that removes this disturbance.
Comparison of non-uniformity correction methods in midwave infrared focal plane arrays of high speed platforms
Author(s):
Buğra Sofu;
Doğan Uğur Sakarya;
Onur Akın
Non-uniformity of mid-wave infrared focal plane arrays (FPAs) is a critical factor affecting the detection range of infrared imaging systems, especially when detecting small and dim objects. Moreover, the false-alarm probability of the system increases as the non-uniformity of the FPA increases. Therefore, to improve the performance of infrared imaging systems, the non-uniformity of the array should be corrected. Conventionally, either scene-based or calibration-based non-uniformity correction (NUC) methodologies have been used for this purpose. To achieve reasonable performance, scene-based NUC techniques may require an impractical amount of time considering the operational duration of high-speed platforms. On the other hand, calibration-based NUC performance degrades as the scene temperature observed by the infrared imaging systems of high-speed platforms varies. The method presented in this work relies on multiple NUC tables to compensate for the temperature variations in the scene. To compare this method with conventional methods, the standard deviation of the NUC images and the target detection probability were used as metrics. NUC images were obtained by capturing images with an infrared imaging module. To obtain images with targets, either synthetic or pinhole target images were added to the non-uniformity-corrected background images. According to these metrics, we found that the multiple-point NUC method is superior to the conventional calibration-based method using a single NUC table.
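For illustration, a sketch of a classical two-point (gain/offset) NUC with nearest-temperature table selection, which is the general idea behind using multiple NUC tables; the table keying, temperature estimate, and toy values are assumptions, not the authors' exact procedure.

```python
import numpy as np

def build_nuc_table(frame_cold, frame_hot):
    """Two-point NUC: per-pixel gain/offset from two uniform blackbody frames."""
    gain = (frame_hot.mean() - frame_cold.mean()) / (frame_hot - frame_cold)
    offset = frame_cold.mean() - gain * frame_cold
    return gain, offset

def apply_nuc(raw_frame, tables, scene_temp_estimate):
    """Select the table calibrated closest to the estimated scene temperature."""
    calib_temp = min(tables, key=lambda t: abs(t - scene_temp_estimate))
    gain, offset = tables[calib_temp]
    return gain * raw_frame + offset

# toy usage with one table keyed by its calibration temperature (illustrative)
rng = np.random.default_rng(1)
cold = 1000 + 20 * rng.random((256, 256))
hot = 3000 + 20 * rng.random((256, 256))
tables = {20.0: build_nuc_table(cold, hot)}
corrected = apply_nuc(1500 + 20 * rng.random((256, 256)), tables, scene_temp_estimate=25.0)
```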
Generating simulated SAR images using Generative Adversarial Network
Author(s):
Wenlong Liu;
Yuejin Zhao;
Ming Liu;
Liquan Dong;
Xiaohua Liu;
Mei Hui
Synthetic aperture radar (SAR) is a microwave imaging technique based on the synthetic aperture principle, offering characteristics such as all-time, all-weather operation, high resolution, and wide swath. It has high research value and broad application prospects in both military and civilian domains. In particular, a great deal of research on SAR target classification and identification based on Deep Learning is ongoing worldwide, and the obtained results are highly effective. However, it is well known that Deep Learning requires a large amount of data, and acquiring SAR samples through field experiments is costly and often impractical, so image simulation research for expanding SAR datasets is essential. In this paper, we concentrate on generating highly realistic simulated SAR images for several equipment models using a Generative Adversarial Network (GAN), without constructing a terrain scene model or RCS material mapping. We then tested the simulated SAR images on a specialized SAR classification model pretrained on the MSTAR dataset. The results showed that the simulated targets could be identified and classified accurately, demonstrating the high similarity of the simulated SAR images to real samples. Our work can provide a greater variety of available SAR images for target classification and identification studies.
Robust night target tracking via infrared and visible video fusion
Author(s):
Keyan Ren;
Xiao Zhang;
Yu Han;
Yibin Hou
Night target tracking usually fails due to various reasons such as insufficient light, appearance change, motion blur, illumination variation, and deformation. Because infrared (IR) and visible video data provide complementary information that can be utilized suitably and efficiently, we explore a novel framework that combines correlation filter-based visible tracking and Markov chain Monte Carlo (MCMC)-based IR tracking to overcome these challenges. In this framework, the two types of videos are asynchronous, and the frame rate of the visible video is several times higher than that of the IR video. The visible video is first used for location and scale estimation by solving a ridge regression problem efficiently in the correlation filter domain. For the IR data, we use a specially designed shape context feature descriptor for the best location and scale estimation of the IR video target using an MCMC particle filter. Then, we use candidate-region location-scale fusion rules for the final target tracking update. Meanwhile, we build an accurately labeled IR and visible target tracking dataset for experiments. The results show that our proposed approach outperforms state-of-the-art trackers for night target tracking and can significantly improve re-tracking performance when drift occurs.
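A minimal single-channel sketch of the closed-form ridge-regression (MOSSE-style) correlation filter underlying the visible branch; feature extraction, scale search, model updating, and the MCMC-based IR branch are omitted, and the Gaussian label width and regularization value are illustrative.

```python
import numpy as np

def train_filter(patch, sigma=2.0, lam=1e-2):
    """Closed-form ridge-regression correlation filter (MOSSE-style):
    H* = (G . conj(F)) / (F . conj(F) + lambda), element-wise in the Fourier domain."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))  # desired response
    F, G = np.fft.fft2(patch), np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H_conj, patch):
    """Correlation response on a new patch; the peak offset from the centre
    gives the translation (location) estimate."""
    resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    return np.unravel_index(np.argmax(resp), resp.shape)
```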
Natural scene text detection and recognition with a three-stage local phase-based algorithm
Author(s):
Julia Diaz-Escobar;
Vitaly Kober
The Robust Reading research area deals with the detection and recognition of textual information in scene images. In particular, natural scene text detection and recognition techniques have gained much attention from the computer vision community due to their contribution to multiple applications. Common text detection and recognition methods are often affected by environmental factors, image acquisition problems, and the text content itself. In this work, a method for text detection and recognition in natural scenes is proposed. The method consists of three stages: 1) phase-based text segmentation, obtained by applying the MSER algorithm to the local phase image; 2) text localization, where segmented regions are classified and grouped into text and non-text components; and 3) word recognition, where characters are recognized utilizing Histograms of Phase Congruency. Experimental results are presented using a known dataset and evaluated in terms of precision and recall.
Automatic identification of diatoms using descriptors obtained in the plane of frequencies
Author(s):
Eduardo Gessel Pacheco-Venegas;
Isabel Israde-Alcántara;
Josué Álvarez-Borrego;
Esperanza Guerra-Rosas;
Esbanyely Garza-Flores
Diatoms are unicellular algae characterized by being composed mainly of silica. Their study has become relevant due to multiple applications that include forensic medicine, palaeoenvironmental reconstruction, and their use as bioindicators of water quality. It is estimated that there are around 100,000 different diatom species, some of which show a high similarity to one another. For these reasons, their identification is slow and often unreliable. Additionally, the number of specialists capable of carrying out an identification is not sufficient compared to the number of samples that usually have to be analyzed. Hence, there is a need for automated systems that perform this task. In the present work, an automatic identification system was created for 46 diatom species with different morphologies, using images obtained with optical microscopy. The system was designed by calculating descriptors in the frequency plane using three different methodologies: the Fourier-Mellin transform, concentric ring binary masks, and the fractional Fourier transform. The methods used in the identification system are robust to changes in scale, rotation, translation, and illumination. Additionally, the number of reference images required is lower than in other techniques found in the literature, which makes it more likely that the system can be extended to other species.
Three dimensional reconstruction using a lenslet light field camera
Author(s):
Wentao Zhang;
Shengqian Chang;
Xiao Tao;
Chang Wang;
Chenning Tao;
Peizheng Huang;
Zhenrong Zheng
A novel method is proposed in this paper to accurately reconstruct three-dimensional scenes from a passive single-shot exposure with a lenslet light field camera. The method achieves better 3D scene reconstruction by exploiting both the defocus and disparity depth cues captured by the light field camera. First, the light field data is used to refocus and shift viewpoints to obtain a focal stack and multi-view images. In the refocusing procedure, the phase shift theorem in the Fourier domain is introduced to replace the shift in the spatial domain, so that sharper focal stacks with less blurriness can be obtained and 3D scenes can be reconstructed more accurately. Next, disparity depth cues are obtained from the multi-view images by performing a correspondence measure. Then, the focal stack is used to compute defocus depth cues by a focus measure based on gray-level variance. Finally, a focus cost is built to integrate both defocus and disparity depth cues, and an accurate depth map is estimated using Graph Cuts based on this focus cost. Using the accurate depth map and the all-in-focus image, the 3D structure of the real-world scene is accurately reconstructed. Our method is verified on a number of synthetic and real-world examples captured with a dense camera array and a Lytro light field camera.
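A sketch of shift-and-add refocusing with the spatial shift replaced by a Fourier-domain phase shift, as described in the abstract; the view indexing and the sign convention for the refocus parameter alpha are assumptions.

```python
import numpy as np

def fourier_shift(img, dy, dx):
    """Sub-pixel image shift via the Fourier phase-shift theorem."""
    H, W = img.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * phase))

def refocus(views, alpha):
    """views[(u, v)] = sub-aperture image; alpha selects the refocus depth.
    Each view is shifted proportionally to its angular position, then averaged."""
    acc = None
    for (u, v), img in views.items():
        shifted = fourier_shift(img, alpha * v, alpha * u)
        acc = shifted if acc is None else acc + shifted
    return acc / len(views)
```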
Canonical 3D object orientation for interactive light-field visualization
Author(s):
Roopak R. Tamboli;
Peter A. Kara;
Aron Cserkaszky;
Attila Barsi;
Maria G. Martini;
Soumya Jana
Light-field visualization allows users to freely choose a preferred location for observation within the display’s valid field of view. As this 3D visualization technology offers continuous motion parallax, the user’s location determines the perceived orientation of the visualized content when we consider static objects and scenes. In the case of interactive light-field visualization, the arbitrary rotation of content enables efficient orientation changes without the need for actual user movement. However, the preference for content orientation is a subjective matter, yet it can also be objectively managed and assessed. In this paper, we present a series of subjective tests carried out on a real light-field display, addressing static content orientation preference. State-of-the-art objective methodologies were used to evaluate the experimental setup and the content. We used the subjective results to develop our own objective metric for canonical orientation selection.
Steered mixture-of-experts for light field video coding
Author(s):
Vasileios Avramelos;
Ignace Saenen;
Ruben Verhack;
Glenn Van Wallendael;
Peter Lambert;
Thomas Sikora
Steered Mixture-of-Experts (SMoE) is a novel framework for representing multidimensional image modalities. In this paper, we propose a coding methodology for SMoE models that is readily extendable to SMoE models of any dimension, and thus to any image modality. We evaluate the coding performance of SMoE models of light field video, a 5D image modality with one temporal, two angular, and two spatial dimensions. The coding consists of exploiting the redundancy between the parameters of SMoE models, i.e. a set of multivariate Gaussian distributions. We compare the coding performance with three multi-view HEVC (MV-HEVC) configurations that differ in terms of random access, where each subaperture view of the light field video is interpreted as a single view in MV-HEVC. Experiments validate excellent coding performance compared to MV-HEVC for low to mid-range bitrates in terms of PSNR and SSIM, with bitrate savings of up to 75%.
Analysis of motion vectors and parallel computing in pseudo-sequence based light field image compression methods
Author(s):
Hadi Amirpour;
Antonio Pinheiro;
Manuela Pereira;
Mohammad Ghanbari
Pseudo-sequence-based light field image compression methods use state-of-the-art video codecs such as HEVC. Although video codecs have been designed to compress video sequences, they perform well for light field image compression. Considering the multi-view representation of a light field image, a sequence of the different image views is aligned following an appropriate strategy. However, there are some differences between video sequences and light field pseudo-videos that can be exploited to better adapt the codec to the compression of the different view images. The pseudo-sequence images are separated by spatial distances and exhibit predictable behavior, whereas video sequences are separated by temporal distances and exhibit unpredictable behavior. Considering these differences, unnecessary operations can be avoided while performance is improved. Video codecs compute motion vectors using block-matching motion estimation algorithms, which is computationally the most complex operation of any video codec; to reduce the motion estimation complexity, many codecs use prediction models. In this paper, the HEVC motion vector search models are applied to light field image views aligned as pseudo-sequences in order to analyze and find their repetitive and predictable patterns. These patterns are then used to modify the HEVC motion estimation algorithm and reduce codec complexity within a state-of-the-art pseudo-sequence compression method. Moreover, the use of parallel computing for the pseudo-sequence compression method is addressed.
Light field image coding: objective performance assessment of Lenslet and 4D LF data representations
Author(s):
Ricardo J. S. Monteiro;
Nuno M. M. Rodrigues;
Sérgio M. M. Faria;
Paulo J. L. Nunes
State-of-the-art light field (LF) image coding solutions usually rely on one of two LF data representation formats: Lenslet or 4D LF. While the Lenslet data representation is a more compact version of the LF, it requires additional camera metadata and processing steps prior to image rendering. On the contrary, 4D LF data, consisting of a stack of sub-aperture images, provides a more redundant representation requiring, however, minimal side information, thus facilitating image rendering. Recently, the JPEG Pleno guidelines on objective evaluation of LF image coding defined a processing chain that allows different 4D LF data codecs to be compared, aiming to facilitate codec assessment and benchmarking. Thus, any codec that does not rely on the 4D LF representation needs to undergo additional processing steps to generate an output comparable to a reference 4D LF image. These additional processing steps may have an impact on the quality of the reconstructed LF image, especially if color subsampling format and bit depth conversions have been performed. Consequently, the influence of these conversions needs to be carefully assessed, as it may have a significant impact on a comparison between different LF codecs. Very few in-depth comparisons of the effects of using the existing LF representations have been reported. Therefore, using the guidelines from JPEG Pleno, this paper presents an exhaustive comparative analysis of these two LF data representation formats in terms of LF image coding efficiency, considering different color subsampling formats and bit depths. These comparisons are performed by testing different processing chains to encode and decode the LF images. Experimental results have shown that, in terms of coding efficiency for different color subsampling formats, the Lenslet LF data representation is more efficient when using YUV 4:4:4 with 10 bit/sample, while the 4D LF data representation is more efficient when using YUV 4:2:0 with 8 bit/sample. The “best” LF data representation, in terms of coding efficiency, depends on several factors which are extensively analyzed in this paper, such as the objective metric used for comparison (e.g., average PSNR-Y or average PSNR-YUV), the type of LF content, and the color format. The maximum objective quality is also determined, by evaluating the influence of each block of each processing chain on the objective quality of the reconstructed LF image. Experimental results show that, when the 4D LF data representation is not used, the maximum achievable objective quality is lower than 50 dB in terms of average PSNR-YUV.
A graph learning approach for light field image compression
Author(s):
Irene Viola;
Hermina Petric Maretic;
Pascal Frossard;
Touradj Ebrahimi
In recent years, light field imaging has attracted the attention of the academic and industrial communities thanks to its enhanced rendering capabilities, which allow contents to be visualised in a more immersive and interactive way. However, those enhanced capabilities come at the cost of a considerable increase in content size when compared to traditional image and video applications. Thus, advanced compression schemes are needed to efficiently reduce the volume of data for storage and delivery of light field content. In this paper, we introduce a novel method for the compression of light field images. The proposed solution is based on a graph learning approach to estimate the disparity among the views composing the light field. The graph is then used to reconstruct the entire light field from an arbitrary subset of encoded views. Experimental results show that our method is a promising alternative to current compression algorithms for light field images, with notable gains across all bitrates with respect to the state of the art.
The perceived quality of light-field video services
Author(s):
Peter A. Kara;
Roopak R. Tamboli;
Aron Cserkaszky;
Maria G. Martini;
Attila Barsi;
Laszlo Bokor
Real-time video transmission services unquestionably dominate the flow of data over the Internet, and their percentage of the global IP packet traffic is still continuously increasing. As novel visualization technologies emerge, they tend to demand higher bandwidth: they offer more visually, but in order to do so, they need more data to be transmitted. The research and development of the past decades in optical engineering have enabled light-field displays to surface in the industry and on the market, and light-field video services are already on the horizon. However, the data volumes of high-quality light-field contents can be immense, creating storage, coding, and transmission challenges. If we consider the representation of light-field content as a series of 2D views, then for a single video frame, the angular resolution determines the number of views within the field of view, and the spatial resolution defines the 2D size of those views. In this paper, we present the results of an experiment carried out to investigate the perceptual differences between different angular and spatial resolution parametrizations of a light-field video service. The study highlights how the two resolution values affect each other with regard to perceived quality, and how their combined effects are detected, perceived, and experienced by human observers. By achieving an understanding of the related visual phenomena, especially degradations that are unique to light-field visualization, the design and development of resource-efficient light-field video services and applications become more straightforward.
Predicting 3D visual discomfort using natural scene statistics and a binocular model
Author(s):
Zeina Sinno;
Alan C. Bovik
When humans observe stereoscopic images, visual discomfort may be experienced in the form of physiological symptoms such as eyestrain, a feeling of pressure in the eyes, headaches, neck pain, and more. These sensations can arise in cortical mechanisms related to early visual processing. For example, vergence eye movements and lens accommodation can provide conflicting information to the brain if the stereo images are distorted or presented on a flat display. Over the past decade, significant effort has been applied to understanding and characterizing how discomfort arises, towards being able to design safer and more comfortable 3D displays and to provide better guidelines on how to design, align, and capture 3D images and videos. Part of solving this problem consists of objectively predicting the visual discomfort that may arise from viewing a given pair of stereo images that are distorted. Researchers have built several models based primarily on cortical mechanisms that yield good predictions of visual discomfort. Here we study the role of natural scene statistics (NSS) of the disparity maps of stereoscopic images and their relationship to 3D visual discomfort. In particular, we focus on bivariate NSS models. We also build a new prediction model that combines information from binocular vision and the NSS models of disparity maps to accurately predict 3D visual discomfort, and we demonstrate that an algorithm that realizes the prediction outperforms other existing predictors.
Point cloud subjective evaluation methodology based on reconstructed surfaces
Author(s):
Evangelos Alexiou;
Antonio M. G. Pinheiro;
Carlos Duarte;
Dragan Matković;
Emil Dumić;
Luis A. da Silva Cruz;
Lovorka Gotal Dmitrović;
Marco V. Bernardo;
Manuela Pereira;
Touradj Ebrahimi
Point clouds have been gaining importance as a solution to the problem of efficient representation of 3D geometric and visual information. They are commonly represented by large amounts of data, and compression schemes are important for their manipulation, transmission, and storage. However, the selection of appropriate compression schemes requires effective quality evaluation. In this work, a subjective quality evaluation of point clouds using a surface representation is analyzed. Using a set of point cloud objects encoded with the popular octree pruning method at different qualities, a subjective evaluation was designed. The point cloud geometry was presented to observers in the form of a movie showing the 3D Poisson-reconstructed surface without textural information, with the point of view changing over time. The subjective evaluations were performed in three different laboratories. The scores obtained from the tests were correlated and no statistical differences were observed. The scores were also correlated with previous subjective tests, and a good correlation was obtained with mesh rendering on 2D monitors. Moreover, the results were correlated with state-of-the-art point cloud objective metrics, revealing poor correlation. Likewise, the correlation with a subjective test using a different representation of the point cloud data was also poor. These results suggest the need for more reliable objective quality metrics and further studies on adequate point cloud data representations.
A novel methodology for quality assessment of voxelized point clouds
Author(s):
Eric M. Torlig;
Evangelos Alexiou;
Tiago A. Fonseca;
Ricardo L. de Queiroz;
Touradj Ebrahimi
Recent trends in multimedia technologies indicate a significant growth of interest for new imaging modalities that aim to provide immersive experiences by increasing the engagement of the user with the content. Among other solutions, point clouds denote an alternative 3D content representation that allows visualization of static or dynamic scenes in a more immersive way. As in many imaging applications, the visual quality of a point cloud content is of crucial importance, as it directly affects the user experience. Despite the recent efforts from the scientific community, subjective and objective quality assessment for this type of visual data representation remains an open problem. In this paper, we propose a new, alternative framework for quality assessment of point clouds. In particular, we develop a rendering software, which performs real-time voxelization and projection of the 3D point clouds onto 2D planes, while allowing interaction between the user and the projected views. These projected images are then employed by two-dimensional objective quality metrics, in order to predict the perceptual quality of the displayed stimuli. Benchmarking results, using subjective ratings that were obtained through experiments in two test laboratories, show that our framework provides high predictive power and outperforms the state of the art in objective quality assessment of point cloud imaging.
A digital hologram compression scheme for representation on the object plane
Author(s):
Marco V. Bernardo;
Elsa Fonseca;
Paulo Fiadeiro;
António M. G. Pinheiro;
Manuela Pereira
Digital holography is a growing field that owes its success to the three-dimensional imaging representation it provides. This is achieved by encoding the wave field transmitted or scattered by an object in the form of an interference pattern with a reference beam. While in conventional imaging systems it is usually impossible to recover the correctly focused image from a defocused one, with digital holography the image can be numerically retrieved at any distance from the hologram. Digital holography also allows the reconstruction of multiple objects at different depths. In a previous study, the main available image coding standard solutions, JPEG, JPEG-XT, JPEG 2000, and the HEVC intra mode, were benchmarked for digital holographic data represented on the object plane. The HEVC intra main coding profile outperforms the other standards, while JPEG 2000 achieves very similar compression performance. In the current work, a scheme based on the HEVC intra mode codec for compressing holographic information on the object plane is proposed. In the base layer, a 2D version of the object (the amplitude information on the object plane) is coded with the HEVC intra main coding profile. It was previously observed that the phase information requires much higher bit rates than the amplitude information, as standardized codecs are not adapted to the compression of this type of information. In this paper we propose a model where the amplitude information is encoded with the HEVC intra mode codec, while the phase is represented by encoding the real part and the sign of the imaginary part. The real part is also encoded using the HEVC intra mode, as it has already proved appropriate for the compression of this type of information. The sign of the imaginary part is encoded with JBIG. The advantage of this scheme is that the amplitude information provides a direct 2D representation of the hologram, while the phase information can be considered a 3D enhancement layer. The results show that the proposed scheme outperforms the state of the art in holography compression, while allowing compatibility with current standards and direct 2D visualization.
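A sketch of the lossless split and merge of the complex object-plane field that such a layered scheme relies on (amplitude, real part, and the sign of the imaginary part); the actual HEVC intra and JBIG coding of each layer, and their quantization effects, are not modelled here.

```python
import numpy as np

def decompose(field):
    """Split a complex object-plane field into the three coded components."""
    amp = np.abs(field)                 # base layer (coded with HEVC intra)
    real = np.real(field)               # enhancement layer (coded with HEVC intra)
    imag_sign = (np.imag(field) >= 0)   # binary sign plane (coded with JBIG)
    return amp, real, imag_sign

def reconstruct(amp, real, imag_sign):
    """Recover the complex field: the imaginary magnitude follows from the
    amplitude and real part, and the binary plane restores its sign."""
    imag_mag = np.sqrt(np.maximum(amp ** 2 - real ** 2, 0.0))
    imag = np.where(imag_sign, imag_mag, -imag_mag)
    return real + 1j * imag
```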
Predicting the quality of images compressed after distortion in two steps
Author(s):
Xiangxu Yu;
Christos G. Bampis;
Praful Gupta;
Alan C. Bovik
Full-reference and reduced-reference image quality assessment (IQA) models assume a high-quality reference against which to measure perceptual quality. However, this assumption may be violated when the source image is upscaled, poorly exposed, or otherwise distorted before being compressed. Applying reference IQA models to a compressed but previously distorted “reference” may produce unpredictable results. Hence we propose 2stepQA, which integrates no-reference (NR) and reference (R) measurements into the quality prediction process. The NR module accounts for the imperfect quality of the reference image, while the R component measures the further quality loss from compression. A simple, efficient multiplication step fuses these into a single score. We deploy MS-SSIM as the R component and NIQE as the NR component and combine them using multiplication. We chose MS-SSIM since it is efficient and correlates well with subjective scores. Likewise, NIQE is simple, efficient, and generic, and does not require training on subjective data. The 2stepQA approach can be generalized by combining other R and NR models. We also built a new data resource, the LIVE Wild Compressed Picture Database, where authentically distorted reference images were JPEG compressed at four levels. 2stepQA is shown to achieve standout performance compared to other IQA models. The proposed approach is made publicly available at https://github.com/xiangxuyu/2stepQA.
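A sketch of the two-step fusion, assuming generic MS-SSIM and NIQE implementations passed in as msssim_fn and niqe_fn; the normalization that turns NIQE (lower is better) into a multiplicative factor is an assumption, not necessarily the authors' exact scaling.

```python
def two_step_qa(reference, compressed, msssim_fn, niqe_fn, niqe_scale=100.0):
    """Fuse a reference metric (compressed vs. imperfect reference) with a
    no-reference metric computed on the imperfect reference itself."""
    r_score = msssim_fn(reference, compressed)        # MS-SSIM in [0, 1], higher is better
    nr_score = 1.0 - niqe_fn(reference) / niqe_scale  # map NIQE (lower is better) to [0, 1]
    return r_score * nr_score                         # single fused 2-step quality score

# usage with whatever metric implementations are at hand, e.g.:
# score = two_step_qa(ref_img, comp_img, msssim_fn=my_msssim, niqe_fn=my_niqe)
```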
Spatial resolution adaptation framework for video compression
Author(s):
Mariana Afonso;
Fan Zhang;
David R. Bull
This paper presents a resolution adaptation framework for video compression. It dynamically applies spatial resampling, trading off the relationship between spatial resolution and quantization. A learning-based Quantization-Resolution Optimization (QRO) module, trained on a large database of video content, determines the optimal spatial resolution among multiple options, based on spatial and temporal video features of the uncompressed video frames. In order to improve the quality of upscaled videos, a modified CNN-based single image super-resolution method is employed at the decoder. This super-resolution model has been trained using compressed content from the same training database. The proposed resolution adaptation framework was integrated with the High Efficiency Video Coding (HEVC) reference software, HM 16.18, and tested on UHD content from several databases including videos from the JVET (Joint Video Exploration Team) test set. Experimental results show that the proposed method offers significant overall bit rate savings for a wide range of bitrates compared with the original HEVC HM 16.18, with average BD-rate savings of 12% (based on PSNR) and 15% (based on VMAF) and lower encoding complexity.
A user model for JND-based video quality assessment: theory and applications
Author(s):
Haiqiang Wang;
Ioannis Katsavounidis;
Xinfeng Zhang;
Chao Yang;
C.-C. Jay Kuo
Video quality assessment (VQA) technology has attracted a lot of attention in recent years due to the increasing demand for video streaming services. Existing VQA methods are designed to predict video quality in terms of the mean opinion score (MOS) calibrated by humans in subjective experiments. However, they cannot predict the satisfied user ratio (SUR) of an aggregated viewer group. Furthermore, they provide little guidance for video coding parameter selection, e.g. the Quantization Parameter (QP) of a set of consecutive frames, in practical video streaming services. To overcome these shortcomings, the just-noticeable-difference (JND) based VQA methodology has been proposed as an alternative. It is observed experimentally that the JND location is a normally distributed random variable. In this work, we explain this distribution by proposing a user model that takes both subject variability and content variability into account. This model is built upon the user’s capability to discern the quality difference between video clips encoded with different QPs. Moreover, it analyzes video content characteristics to account for inter-content variability. The proposed user model is validated on the data collected in the VideoSet. It is demonstrated that the model is flexible enough to predict the SUR distribution of a specific user group.
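A toy illustration of how a normally distributed JND location yields an SUR curve over QP; the mean and standard deviation below are made-up values, not results from the VideoSet.

```python
from scipy.stats import norm

def satisfied_user_ratio(qp, jnd_mean, jnd_std):
    """Fraction of viewers who cannot yet notice a quality difference at the
    given QP, under a Gaussian model of the (first) JND location."""
    return norm.sf(qp, loc=jnd_mean, scale=jnd_std)

# toy numbers: if the first JND is located at QP 38 +/- 4 across the group,
# encoding at QP 36 keeps roughly 69% of the users satisfied.
print(satisfied_user_ratio(36, jnd_mean=38.0, jnd_std=4.0))
```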
Combining tile parallelism with slice partitioning in video coding
Author(s):
Maria Koziri;
Panos K. Papadopoulos;
Thanasis Loukopoulos
Tiles and slices provide different frame partitioning options. While they can both be used for video coding parallelization, tiles offer better scalability to the number of available processors, especially as far as video quality is concerned, e.g., in the HEVC case. On the other hand, slices can be useful in video transmission. Since slices can be defined as a series of consecutive (raster order) tiles, properly balancing them can lead to viable trade-offs between parallelization and transmission requirements. In this paper we study the combined problem of tile and slice partitioning with the goals of maximizing the achievable parallelism speedup, while minimizing size difference among slices. These goals might conflict with each other, while producing multiple Pareto frontier solutions can introduce additional time overhead. For these reasons, we map the two-function optimization problem to a single one, using constant weighting and develop algorithms that perform tile resizing and slice definition so as to optimize the composite target function. Experiments with common class A and class B test sequences and the reference HEVC encoder (HM), reveal that compared to static uniform tile partitioning and to literature alternatives that resize tiles in order to increase parallelization speedup, the proposed algorithm achieves considerable gains in slice balancing, while also improving speedup over the static approach. Furthermore, these performance merits come with negligible overhead to the encoding process.
Performance comparison of objective metrics on free-viewpoint videos with different depth coding algorithms
Author(s):
Shishun Tian;
Lu Zhang;
Luce Morin;
Olivier Déforges
The popularity of 3D applications has brought new challenges in the creation, compression, and transmission of 3D content due to the large size of 3D data and transmission limitations. Several compression standards, such as Multiview-HEVC and 3D-HEVC, have been proposed to compress 3D content with the aid of view synthesis technologies, among which the most commonly used algorithm is Depth-Image-Based Rendering (DIBR). However, the quality assessment of DIBR-synthesized views is very challenging owing to new types of distortions induced by inaccurate depth maps, which conventional 2D quality metrics may fail to assess. In this paper, we test the performance of existing objective metrics on free-viewpoint videos produced with different depth coding algorithms. The results show that none of the existing objective metrics, whether full-reference or no-reference, performs well on this database. There is certainly room for further improvement of the algorithms.
Subjective and objective quality assessment of omnidirectional video
Author(s):
Francisco Lopes;
João Ascenso;
António Rodrigues;
Maria Paula Queluz
Omnidirectional video, also known as 360° video, is becoming quite popular since it provides a more immersive and natural representation of the real world. However, to fulfill the expectation of a high quality of experience (QoE), the video content delivered to the end users must also have high quality. To automatically evaluate the video quality, objective quality assessment metrics are required. This paper starts by presenting the results of a subjective assessment campaign conducted to evaluate the impact on quality of HEVC compression and/or spatial/temporal subsampling when the videos are displayed in a head-mounted device (HMD). The subjective assessment results are then used as ground truth to evaluate conventional quality assessment metrics developed for 2D video, as well as some of the recently proposed metrics for omnidirectional video, namely the spherical peak signal-to-noise ratio (S-PSNR), the weighted-to-spherically-uniform PSNR (WS-PSNR), and the viewport PSNR (VP-PSNR); in the context of this study, adaptations of two SSIM-based metrics to omnidirectional contents are also proposed and evaluated.
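For reference, a minimal luma-only sketch of the equirectangular weighting behind WS-PSNR, one of the metrics evaluated in this study.

```python
import numpy as np

def ws_psnr_erp(ref, dist, max_val=255.0):
    """WS-PSNR for equirectangular frames: per-pixel squared error is weighted
    by cos(latitude), so over-sampled rows near the poles count less."""
    H, W = ref.shape
    j = np.arange(H)
    w = np.cos((j + 0.5 - H / 2.0) * np.pi / H)[:, None] * np.ones((1, W))
    wmse = np.sum(w * (ref.astype(np.float64) - dist) ** 2) / np.sum(w)
    return 10.0 * np.log10(max_val ** 2 / wmse)
```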
Video codec comparison using the dynamic optimizer framework
Author(s):
Ioannis Katsavounidis;
Liwei Guo
We present a new methodology that allows for a more objective comparison of video codecs, using the recently published Dynamic Optimizer framework. We show how this methodology is relevant primarily to non-real-time encoding for adaptive streaming applications and can be applied to any existing and future video codecs. By using VMAF, Netflix’s open-source perceptual video quality metric, in the dynamic optimizer, we offer the possibility of performing visual perceptual optimization of any video codec and thus producing optimal results in terms of PSNR and VMAF. We focus our testing on full-length titles from the Netflix catalog. We include results from practical encoder implementations of AVC, HEVC and VP9. Our results show the advantages and disadvantages of different encoders for different bitrate/quality ranges and for a variety of content.
Geo-popularity assisted optimized transcoding for large scale adaptive streaming
Author(s):
Yao-Chung Lin;
Chao Chen;
Balu Adsumilli;
Anil Kokaram;
Steve Benting
HTTP-based video streaming techniques have now been widely deployed to deliver video streams over communication networks. With these techniques, a video player can dynamically select a video stream from a set of pre-encoded representations of the video source based on its available bandwidth and viewport size. The bitrates of the encoded representations thus determine the video quality presented to viewers and also the average streaming bitrate, which is strongly related to streaming cost for massive video streaming platforms. Our work minimizes the average streaming bitrate on a per-chunk basis by modeling the probability that a player observes a particular representation. Since the popularity of videos is regional, this paper exploits a further optimization that uses regional statistics of client bandwidth and viewport instead of global statistics. Simulation results demonstrate that using regional statistics reduces streaming cost for low-bandwidth regions while improving the delivered quality for high-bandwidth regions, compared to a baseline configuration that uses global statistics.
Using modern motion estimation algorithms in existing video codecs
Author(s):
Daniel J. Ringis;
Davinder Singh;
Francois Pitie;
Anil Kokaram
Motion estimation is a key component of any modern video codec. Our understanding of motion and of its estimation from video has come a very long way since 2000: more than 135 different algorithms have recently been reviewed by Scharstein et al. (http://vision.middlebury.edu/flow/). These new algorithms differ markedly from block matching, which has been the mainstay of video compression for some time. This paper presents comparisons of H.264 and MP4 compression using different motion estimation methods. In so doing, we also present methods for adapting pre-computed motion fields for use within a codec. We do not observe significant gains with the chosen methods with respect to rate-distortion trade-offs, but the results reflect a significantly more complex interrelationship between motion and compression than would be expected. There remains much more to be done to extend the coverage of this comparison to the emerging standards, but these initial results show that there is value in these explorations.
Using video quality metrics for something other than compression
Author(s):
Anil Kokaram;
Damien Kelly;
Sasi Inguva;
Jessie Lin;
Yilin Wang;
Chao Chen;
Neil Birkbeck;
Michele Covell;
Balu Adsumilli;
Steve Benting
The development of video quality metrics and perceptual video quality metrics has been a well-established pursuit for more than 25 years. The body of work has been seen as most relevant for improving the performance of visual compression algorithms. However, modeling the human perception of video with an algorithm of some sort is notoriously complicated. As a result, the perceptual coding of video remains challenging, and no standards have incorporated perceptual video quality metrics within their specification. In this paper we present the use of video metrics at the system level of a video processing pipeline. We show that it is possible to combine the artefact detection and correction process by posing the problem as a classification exercise. We also present the use of video metrics as part of a classical testing pipeline for software infrastructure, one that is sensitive to perceived picture quality degradation.
Performance comparison of VVC, AV1, and HEVC on 8-bit and 10-bit content
Author(s):
Pankaj Topiwala;
Madhu Krishnan;
Wei Dai
This paper presents a study comparing the coding efficiency of three video codecs: (a) the Versatile Video Coding (VVC) Bench Mark Set 1 (BMS1); (b) the AV1 codec of the Alliance for Open Media (AOM); and (c) the HEVC Main Profile reference software. Two approaches to coding were used: (i) constant quality (QP); and (ii) target bit rate (VBR). Constant quality encoding is performed with all three codecs for an unbiased comparison of the core coding tools, whereas target bitrate coding is done with the AV1 codec to study the compression efficiency achieved with rate control, which can and does have a significant impact. Performance is tabulated on two fronts: (1) objective performance based on PSNRs, and (2) informal subjective assessment. Our general conclusion, derived from the assessment of objective metrics and subjective evaluation, is that VVC (BMS1) appears to be superior to AV1 and HEVC under both constant quality and target bitrate coding constraints. AV1 shows superior coding gains with respect to HEVC under target bitrate coding, but in general has increased computational complexity and hence an encoding time 20-30 times that of HEVC.
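BD-rate figures such as those discussed in codec comparisons of this kind are conventionally obtained with the Bjøntegaard calculation; a compact sketch over four rate-distortion points per codec (the numbers in the usage lines are illustrative, not results from this paper).

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate difference (%) between two
    rate-distortion curves over their overlapping quality range."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)   # cubic fit, log-rate vs PSNR
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1.0) * 100.0                 # negative => test codec saves rate

# illustrative RD points (kbps, dB):
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 37.0, 40.0, 42.5],
              [800, 1600, 3200, 6400], [34.2, 37.1, 40.2, 42.6]))
```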
Adaptive reshaping for next generation video codec
Author(s):
Taoran Lu;
Fangjun Pu;
Peng Yin;
Tao Chen;
Walt Husak
Earlier work in MPEG/JCTVC has shown that out-of-loop reshaping, which modifies the video signal in pre-processing before encoding and in post-processing after decoding in an end-to-end video compression workflow, can improve the subjective quality of coded High Dynamic Range (HDR) and Wide Color Gamut (WCG) content compressed using HEVC. However, the requirement of not making normative changes to the HEVC specification has significantly constrained the design and optimization of the reshaper. In April 2018, the Joint Video Experts Team (JVET) launched a project to develop a new video coding standard known as Versatile Video Coding (VVC). This opens the door to exploring reshaper designs inside the core video codec. In this paper, an in-loop reshaping architecture is presented. Preliminary results suggest that the in-loop reshaping architecture retains the functionality of the out-of-loop reshaper. In addition, the in-loop design resolves many limitations of the out-of-loop design and can be used as a general coding tool for general video content, not limited to HDR.
An adaptive quantization method for 360-degree video coding
Author(s):
Xiaoyu Xiu;
Yuwen He;
Yan Ye
In the existing workflow for 360-degree video coding, the original 360-degree video content needs to be converted onto a 2D plane using a projection format before being encoded by a video codec. Given the selected projection format, the samples on the projected 2D plane may correspond to distinctive sampling densities on the sphere. If the projected video is coded using a fixed quantization parameter (QP), this is equivalent to applying different levels of quantization on the sphere, because the sampling densities differ within the projected video. This could result in non-uniform reconstructed qualities for different spherical regions. In this paper, an adaptive quantization method is proposed to improve 360-degree video coding efficiency. The proposed method adaptively adjusts the QP of each region on the 2D projected plane to modulate its reconstruction quality based on the spherical sampling density of the region. Additionally, to further improve performance, an encoder-side method is proposed to derive the optimal Lagrangian multiplier based on the adjusted QP value for a better rate-distortion (RD) trade-off during rate-distortion optimization (RDO). The proposed method is implemented based on the JVET 360-degree video coding software JEM-6.0-360Lib. Experimental results demonstrate that significant coding gains can be achieved: based on the end-to-end weighted-to-spherically-uniform PSNR (WS-PSNR) metric, the proposed method provides on average 5.0% BD-rate saving for the equirectangular projection and 2.6% BD-rate saving for the cubemap projection, compared to the fixed QP coding scheme.
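One plausible way to map the equirectangular sampling-density weight to a per-row QP offset, derived only from the rule of thumb that a QP increase of 6 doubles the quantizer step; the paper's actual adaptation rule is not reproduced here.

```python
import numpy as np

def erp_row_qp_offsets(base_qp, height):
    """Illustrative per-row QP for ERP frames: rows near the poles are
    over-sampled on the sphere (weight w = cos(latitude) < 1), so their QP
    can be raised by roughly -3*log2(w) to equalize quantization on the sphere."""
    j = np.arange(height)
    w = np.cos((j + 0.5 - height / 2.0) * np.pi / height)     # spherical sampling weight
    dqp = np.round(-3.0 * np.log2(np.maximum(w, 1e-3))).astype(int)
    return np.clip(base_qp + dqp, 0, 51)                      # HEVC/JEM QP range

print(erp_row_qp_offsets(32, height=8))   # QP grows from the equator rows toward the poles
```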
On the adaptive selection of partitioning tree types for coding video color channels
Author(s):
Kiran Misra;
Andrew Segall;
Jie Zhao;
Weijia Zhu;
Michael Horowitz
We explore the use of separate partitioning structures for the luma and chroma channels in the design of next generation video codecs. The proposed methods are evaluated relative to the Quad-Tree, Ternary-Tree and Binary-Tree (QTTTBT) partitioning framework currently implemented in the BenchMark Set (BMS-1.0) software used in the development of the Versatile Video Coding (VVC) project. VVC is the next generation video compression standard under development by the Joint Video Experts Team (JVET), a joint collaboration between MPEG and the ITU-T. In the paper, the performance of using shared or separate partitioning tree structures for the luma and chroma channels is measured for sequences including those used in the Joint Call for Proposals on video compression with capability beyond HEVC issued by MPEG/ITU-T, and trends are analyzed. The use of separate partitioning tree structures is restricted to intra coded regions. Objective performance is reported using the Bjøntegaard Delta (BD) bitrate, and visual observations are also provided. To demonstrate the efficacy of using different partition structures, bitrate savings are computed using simulations and show an average improvement of 0.46%(Y)/7.83%(Cb)/7.96%(Cr) relative to the state of the art. It is asserted that the coding efficiency improvement is especially pronounced in sequences with occlusions/emergence of objects or dynamically changing content (e.g. fire, water, smoke). In the tests conducted, the Campfire sequence, which has a large portion of the picture exhibiting a burning fire, shows the largest BD bitrate saving of 1.79%(Y)/5.45%(Cb)/1.82%(Cr).
An overview of end-to-end HDR
Author(s):
Min (Maggie) Dai;
Dmytro Rusanovskyy
High Dynamic Range (HDR) has been a hot topic for decades, and recently there have been technology breakthroughs in multiple areas, including HDR video, HDR photography, and HDR displays. However, if users want to show an HDR image or play an HDR video on a mobile device, system-level HDR support is required to make it happen. This paper gives a brief introduction to the end-to-end HDR ecosystem, covering topics from HDR capture to compression to display, which can benefit not only HDR-related module design in chipsets but also standards development toward more efficient algorithm and metadata proposals.
HDR compression in the JVET codec
Author(s):
Pankaj Topiwala;
Madhu Krishnan;
Wei Dai
This paper presents an advanced approach to HDR/WCG video coding developed at FastVDO, called FVHDR, and built on top of the Versatile Video Coding (VVC) VTM1.0 test model of the Joint Video Exploration Team, a joint committee of ITU|ISO/IEC. A fully automatic adaptive video process is used that differs from a known HDR video processing chain (analogous to HDR10, and herein called the “anchor”) developed recently in the standards committee JCTVC. FVHDR works entirely within the framework of the VTM software model, but adds additional tools. These tools can become an integral part of a future video coding standard, or be extracted as additional pre- and post-processing chains. Reconstructed video sequences using FVHDR show improved subjective visual quality compared to the output of the anchor. Moreover, the resultant SDR content generated by the data-adaptive grading process is backward compatible.
Deep learning techniques in video coding and quality analysis
Author(s):
Pankaj Topiwala;
Madhu Krishnan;
Wei Dai
Video coding is a powerful enabling technology for networked multimedia transmission and communication that has been in constant improvement for decades. The upcoming VVC video codec, due in 2020 from the ITU|ISO/IEC standards committees, aims to achieve on the order of 1000:1 compression on high resolution and high dynamic range video, a stunning landmark. But the basic structure of codecs has remained largely unchanged over time, with the gains obtained mainly through complexity increases. Moreover, video encoders have for decades used the same mean squared error, or sum of absolute differences, measures to optimize coding decisions. At the same time, the rapid rise of deep learning (DL) techniques poses the question: can DL fundamentally reshape how video is coded? While that question is highly complex, we first see a path for DL methods to make inroads into how video quality is measured. This in turn can also change how it is coded. In particular, we study a recently introduced video quality metric called VMAF and find ways to improve it further, which can lead to more powerful encoder designs that employ these measures in their coding decisions.
Machine Learning approach for global no-reference video quality model generation
Author(s):
Ines Saidi;
Lu Zhang;
Vincent Barriac;
Olivier Deforges
Offering the best Quality of Experience (QoE) is a challenge for all video conference service providers. In this context, it is essential to identify representative metrics for monitoring video quality. In this paper, we present Machine Learning techniques for modeling the dependence of global video quality perception on different video impairments, using subjective quality feedback. We investigate the possibility of combining no-reference single-artifact metrics into a global video quality assessment model. The obtained model achieves an accuracy of 63% correct predictions.
Neural network based intra prediction for video coding
Author(s):
J. Pfaff;
P. Helle;
D. Maniry;
S. Kaltenstadler;
W. Samek;
H. Schwarz;
D. Marpe;
T. Wiegand
Today’s hybrid video coding systems typically perform an intra-picture prediction whereby blocks of samples are predicted from previously decoded samples of the same picture. For example, HEVC uses a set of angular prediction patterns to exploit directional sample correlations. In this paper, we propose new intra-picture prediction modes whose construction consists of two steps: first, a set of features is extracted from the decoded samples; second, these features are used to select a predefined image pattern as the prediction signal. Since several intra prediction modes are proposed for each block shape, a specific signalization scheme is also proposed. Our intra prediction modes lead to significant coding gains over state-of-the-art video coding technologies.
Intra prediction with deep learning
Author(s):
Raz Birman;
Yoram Segal;
Avishay David-Malka;
Ofer Hadar
One fundamental component of video compression standards is intra-prediction. Intra-prediction takes advantage of the redundancy among neighboring pixel values within video frames to predict blocks of pixels from their surrounding pixels, which allows the prediction errors to be transmitted instead of the pixel values themselves. The prediction errors have smaller values than the pixels themselves, thus enabling compression of the video stream. Prevalent standards take advantage of intra-frame pixel value dependencies to perform prediction at the encoder end and transfer only residual errors to the decoder. The standards use multiple “Modes”, which are various linear combinations of pixels for the prediction of their neighbors within image Macro-Blocks (MBs). In this research, we have used Deep Neural Networks (DNNs) to perform the predictions. Using twelve fully connected networks, we reduced the Mean Square Error (MSE) of the prediction by up to a factor of 3 compared to the standard prediction modes. This substantial improvement comes at the expense of more extensive computations; however, these extra computations can be significantly mitigated by the use of dedicated Graphical Processing Units (GPUs).
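A minimal PyTorch sketch of a fully connected intra predictor of the kind described; the layer sizes, 4x4 block size, and 17-sample reference layout are illustrative assumptions rather than the paper's exact twelve-network design.

```python
import torch
import torch.nn as nn

class IntraMLP(nn.Module):
    """Fully connected predictor: neighbouring reconstructed samples in,
    predicted 4x4 block out (one such network could be trained per mode)."""
    def __init__(self, n_refs=17, block=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_refs, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, block * block),
        )

    def forward(self, refs):          # refs: (batch, n_refs), e.g. normalized to [0, 1]
        return self.net(refs)         # (batch, 16) predicted block samples

model = IntraMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
refs, blocks = torch.rand(32, 17), torch.rand(32, 16)   # dummy training pairs
opt.zero_grad()
loss = nn.functional.mse_loss(model(refs), blocks)      # minimize prediction MSE
loss.backward()
opt.step()
```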
Video quality analysis framework for spatial and temporal artifacts
Author(s):
Yilin Wang;
Balu Adsumilli
Video quality metrics are essential for improved video processing algorithms. Common video quality metrics are simple averages of independently computed per-frame spatial metrics, but human quality perception is not uniform across frames. In particular, the order of frames matters, as do content complexity and scene changes. In this work, we develop a video quality framework that comprehensively integrates both spatial and temporal metrics at three levels: frame, scene, and full video. We experimentally demonstrate improved correlation of spatial metrics with human evaluation, as well as a new, well-correlated temporal metric (jerkiness) based on this framework.
Efficient implementation of enhanced multiple transforms for video coding
Author(s):
Karam Naser;
Fabrice Leleannec;
Edouard Francois
Recently, advances in transform coding have contributed significant bitrate savings for the next generation of video coding. In particular, the combination of different discrete trigonometric transforms (DTTs) was adopted in the Joint Video Exploration Team (JVET) solution, as well as in the Bench-Mark Set (BMS) of the future video coding standard (Versatile Video Coding), to efficiently model the residual signal statistics and improve the overall coding efficiency. However, this combination of transforms increases the memory requirement as well as the coding complexity, which could potentially limit its practical use. In this paper, we address both the memory and complexity issues by reducing the number of transforms, where some of the transforms are generated from others by simple mathematical operations such as sign changes and order reversal. The simulation results show that this approach achieves competitive results with a substantial simplification of the transform design.
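An example of the kind of relation such a design can exploit: the DCT-VIII kernel can be derived from a stored DST-VII kernel by reversing the column order and flipping the sign of alternating rows (a standard identity; the paper's exact set of derived transforms may differ).

```python
import numpy as np

def dst7(N):
    """DST-VII basis (rows are basis vectors)."""
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4.0 / (2 * N + 1)) * np.sin(np.pi * (2 * k + 1) * (n + 1) / (2 * N + 1))

def dct8(N):
    """DCT-VIII basis, computed directly for comparison."""
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4.0 / (2 * N + 1)) * np.cos(np.pi * (2 * k + 1) * (2 * n + 1) / (4 * N + 2))

def dct8_from_dst7(S7):
    """DCT-VIII without storing a second matrix: reverse the column order of
    DST-VII and flip the sign of every other row."""
    N = S7.shape[0]
    signs = (-1.0) ** np.arange(N)
    return signs[:, None] * S7[:, ::-1]

assert np.allclose(dct8_from_dst7(dst7(8)), dct8(8))   # the two derivations agree
```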
Challenges of eye tracking systems for mobile XR glasses
Author(s):
Injoon Hong;
Kyeongryeol Bong;
Hoi-Jun Yoo
In this paper, we summarize the power and latency challenges of eye tracking (ET) systems for mobile XR glasses. Compared to conventional ET applications, such as psychology or neuroscience experiments or user interfaces for people with disabilities, XR glasses require much lower power consumption considering the overall XR system power budget, which is less than about 1 W on battery power. At the same time, latency is also important: if the ET latency is too large to accommodate eye tracking and graphics rendering within the target motion-to-photon latency of the XR system, users experience motion sickness. Considering these challenging power and latency requirements, we introduce several factors that can impact power and latency at both the algorithm and system levels. We also share rough power and latency exploration results to show how challenging it is to meet both requirements at the same time. In addition, we review the power and speed performance of commercial eye trackers to see whether they can provide sufficient performance for XR applications. Lastly, we introduce some promising academic research on ET-embedded image sensors that aims to satisfy both the power and latency challenges in XR.
Neural net architectures for image demosaicing
Author(s):
Rhys Buggy;
Marco Forte;
François Pitié
Demosaicing remains a critical component of modern digital image processing, with a direct impact on image quality. Conventional demosaicing methods yield relatively poor results, especially the light-weight methods used for fast processing. Alternatively, recent works utilizing deep convolutional neural networks have significantly improved upon previous methods, increasing both quantitative and perceptual performance. This approach has seen a significant reduction of artifacts, but there still remains scope for meaningful improvement. To further this research, we investigate the use of alternative architectures and training parameters to reduce incurred errors, especially visually disturbing demosaicing artifacts such as moiré, and we provide an overview of current methods to better understand their expected performance. Our results show that a U-Net-style network outperforms previous methods in quantitative and perceptual error while remaining computationally efficient for use in GPU-accelerated applications as an end-to-end demosaicing solution.
Identification of 3D objects using correlation of holograms
Author(s):
Ujitha Abeywickrema;
Rudra Gnawali;
Partha P. Banerjee
Show Abstract
A hologram is a 2D recording of the amplitude and phase of an object; the phase often contains information about the depth of a 3D object. In our previous work, 3D mapping of different surfaces was performed using digital holographic topography. For instance, multi-wavelength digital holography has been used to resolve deformations/depths on the order of several microns to centimeters. 2D correlation has been extensively used as a pattern recognition tool to distinguish between different 2D objects. For 3D object correlation, a novel technique involving 2D correlation of holograms is proposed, enabling identification of the 3D object. As a proof of principle, holograms of objects with identical intensity features but different depth profiles are computer generated as well as optically recorded, and 2D correlation is then applied to distinguish the objects. An additional advantage of this method is that phase/depth information can be compared without performing additional numerical steps such as phase unwrapping.
Real-time video stitching based on ORB features and SVM
Author(s):
Ruifeng Yuan;
Ming Liu;
Mei Hui;
Yuejin Zhao;
Liquan Dong;
Lingqin Kong;
Zhi Cai
Show Abstract
Real-time video stitching, which produces a large-field video from several small-field cameras, has great practical significance. The existing video mosaic method based on SIFT features and the RANSAC algorithm takes too much time processing the first frame, and the transformation matrix has large errors when the number of correctly matched feature points is small. In this paper, a real-time video stitching method based on ORB features and a support vector machine (SVM), using binocular cameras, is proposed. First, the distortion of the cameras is corrected. Second, ORB features are extracted in the overlapping regions of the first two frames, and each pair of matched feature points is filtered through a pre-trained SVM model. The matching stage terminates once 4 pairs of feature points are obtained, and the transformation matrix is then calculated. Finally, the stitched video is obtained by image registration. Experiments show that a real-time seamless wide-field video can be obtained, the first-frame processing time of this method is much shorter than that of other available methods, and the frame rate is 30 fps.
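As a rough illustration of the first-frame alignment step, the sketch below (Python with OpenCV) matches ORB features in an assumed overlap strip, keeps four pairs via a placeholder filter standing in for the pre-trained SVM described above, and computes the 3x3 transform. The overlap width, the filtering rule and all parameter values are assumptions, not the authors' settings.

```python
# Hypothetical sketch: ORB matching in the overlap strips of two grayscale
# frames, a placeholder match filter, and an exact 3x3 transform from 4 pairs.
import cv2
import numpy as np

def svm_filter(matches):
    # Placeholder for the pre-trained SVM match filter described in the paper;
    # here we simply keep the four matches with the smallest descriptor distance.
    return sorted(matches, key=lambda m: m.distance)[:4]

def first_frame_homography(img1, img2, overlap=200):
    # img1, img2: single-channel (grayscale) frames; restrict features to overlap.
    roi1 = img1[:, -overlap:]
    roi2 = img2[:, :overlap]
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(roi1, None)
    kp2, des2 = orb.detectAndCompute(roi2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = svm_filter(matcher.match(des1, des2))
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    src[:, 0] += img1.shape[1] - overlap       # back to full-image coordinates
    # Four correspondences determine the perspective transform exactly.
    return cv2.getPerspectiveTransform(src, dst)
```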
A real-time perception system for autonomous cars targeting highways
Author(s):
S. Al Dhahri;
S. Al Sieairi;
H. AlMarashda;
M. Meribout
Show Abstract
In this paper, a real-time perception system for autonomous cars is presented. It is based on a highly parallel architecture using a state-of-the-art Field Programmable Gate Array (FPGA) to perform both low- and intermediate-level image processing tasks at video frame rate (i.e., 30 frames/s). The hardware algorithm performs noise removal and edge detection, followed by a Hough transform to extract the segments corresponding to the lane boundaries. The rich hardware resources available in today's FPGAs (e.g., large built-in distributed RAM memories, DSP blocks, and reconfigurable PLLs) yield a compact, low-power real-time vision system. A series of tests on different roads within Abu Dhabi city was successfully conducted for different scenarios, such as continuous lines, discontinuous lines, and slightly curved lines, with car speeds of up to 122 km/h.
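The FPGA pipeline itself is a hardware design, but the sequence of steps named above (noise removal, edge detection, Hough transform for lane segments) can be sketched in software; the OpenCV-based snippet below is only an illustrative analogue with assumed parameter values.

```python
# Software analogue of the lane-detection steps: smoothing, Canny edges,
# probabilistic Hough transform returning line segments (x1, y1, x2, y2).
import cv2
import numpy as np

def detect_lane_segments(frame_gray):
    smoothed = cv2.GaussianBlur(frame_gray, (5, 5), 1.5)   # noise removal
    edges = cv2.Canny(smoothed, 50, 150)                   # edge detection
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                               threshold=50, minLineLength=40, maxLineGap=10)
    return [] if segments is None else segments[:, 0, :]
```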
A video smoke detection method based on structural similarity index determined by complexity of image
Author(s):
Ming Zhu;
Jiaying Wu;
Shu Wang;
Tianying Ma
Show Abstract
Video-based smoke detection has a very wide range of applications. In this paper, we propose a smoke detection method based on the structural similarity and the complexity of the image. A background blurred by smoke undergoes a degradation of image quality similar to adding noise, which is very different from a background obscured by solid objects. Thus the structural similarity index, usually employed for objective image quality assessment, can be used to detect smoke qualitatively. The value of the structural similarity index is affected by the complexity of the image. We extract texture features based on the gray-level co-occurrence matrix and use the weighted sum of the second moment, contrast, inverse moment, entropy, and correlation of the image to determine its complexity. On this basis, we propose a method based on the structural similarity index, determined by the image complexity, for qualitative and quantitative smoke detection, and we develop a DSP-based video smoke detection system built on the DM6437 EVM from Texas Instruments. Experimental results show that the values of the structural similarity index determined by image complexity are in good agreement with the obscuration coefficient measured by an optical smoke density meter.
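A loose sketch of the two ingredients named above, GLCM-based complexity and the structural similarity index between a background model and the current frame, is given below (Python, scikit-image assumed); the feature weights and the way the two quantities are combined are placeholders, not the authors' formulation.

```python
# GLCM texture features weighted into a complexity score, plus SSIM between
# the background model and the current frame (both uint8 grayscale).
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.metrics import structural_similarity

def complexity(gray, w=(0.2, 0.2, 0.2, 0.2, 0.2)):
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    feats = [graycoprops(glcm, 'ASM')[0, 0],          # second (angular) moment
             graycoprops(glcm, 'contrast')[0, 0],
             graycoprops(glcm, 'homogeneity')[0, 0],  # inverse difference moment
             entropy,
             graycoprops(glcm, 'correlation')[0, 0]]
    return float(np.dot(w, feats))

def smoke_score(background, frame):
    # A low SSIM relative to a complexity-dependent threshold suggests smoke.
    return structural_similarity(background, frame), complexity(frame)
```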
IRSUM: inter-frame registration based non-uniformity correction using spatio-temporal update mask
Author(s):
Huseyin Seckin Demir;
Omer Faruk Adil
Show Abstract
We propose a novel scene-based non-uniformity correction method to achieve better fixed-pattern noise reduction and eliminate ghosting artifacts. Our approach is based on robust parameter updates via inter-frame registration and spatio-temporally consistent correction coefficients. We utilized a GMM-based spatio-temporal update mask to selectively refine the estimations of correction coefficients. The results of our experiments on an extensive dataset consisting of both synthetically corrupted data and real infrared videos show that the proposed algorithm achieves superior performance in PSNR and roughness metrics with notably lower ghosting artifacts when compared to other state-of-the-art methods.
Optical surface inspection: A novelty detection approach based on CNN-encoded texture features
Author(s):
Michael Grunwald;
Matthias Hermann;
Fabian Freiberg;
Pascal Laube;
Matthias O. Franz
Show Abstract
In inspection systems for textured surfaces, a reference texture is typically known before novel examples are inspected. Mostly, the reference is only available in a digital format. As a consequence, there is no dataset of defective examples available that could be used to train a classifier. We propose a texture model approach to novelty detection. The texture model uses features encoded by a convolutional neural network (CNN) trained on natural image data. The CNN activations represent the specific characteristics of the digital reference texture which are learned by a one-class classifier. We evaluate our novelty detector in a digital print inspection scenario. The inspection unit is based on a camera array and a flashing light illumination which allows for inline capturing of multichannel images at a high rate. In order to compare our results to manual inspection, we integrated our inspection unit into an industrial single-pass printing system.
Aesthetic color templates for enhancing casual videos
Author(s):
Jun-Ho Choi;
Jong-Seok Lee
Show Abstract
In this study, we analyze the color characteristics of professional videos in comparison to casual videos from the viewpoint of aesthetic quality. It is shown that the saturation and brightness components exhibit larger differences between amateur and professional videos than the hue component. We then propose aesthetic color templates obtained from professional videos. We provide showcases in which casual videos are brought closer to professional videos in terms of aesthetic appearance based on the proposed templates. The results demonstrate that employing our color templates is beneficial for enhancing video aesthetics.
Noise analysis of two pattern recognition methodologies using binary masks based on the fractional Fourier transform
Author(s):
Esbanyely Garza-Flores Sr.;
Josué Álvarez-Borrego
Show Abstract
Noise often corrupts images; therefore, it is essential to know how a pattern recognition algorithm performs on images affected by it. In this work, a complete analysis of two methodologies is performed for images affected by Gaussian and salt-and-pepper noise. Both methods use the nonlinear correlation of signatures. A signature is a one-dimensional vector that represents each image, obtained using a binary mask created from the fractional Fourier transform (FRFT). In the first methodology, a spectral image is used as the input to the system. The spectral image is the modulus of the Fourier transform (FT) of the processed image. The binary mask is generated from the real part of the FRFT of the spectral image, and the signature is constructed by sampling the modulus of the FRFT of the spectral image with the mask. In the second methodology, the image itself is the input to the system, and the binary mask is obtained from the real part of the FRFT of the image. The signature, in this case, is obtained by sampling the modulus of the FT of the image with the binary mask. Each method was tested using the discrimination coefficient metric.
Stereo vision and Fourier transform profilometry for 3D measurement
Author(s):
Peizheng Huang;
Xiao Lu;
Yang Liu;
Zhenrong Zheng
Show Abstract
A three-dimensional measurement method combining stereo vision and Fourier transform profilometry (FTP) is proposed in this paper. Stereo vision is simple and fast but is prone to mismatches in smooth areas; FTP has high accuracy over smooth areas but can only measure a limited height gradient. Depth information of high quality is obtained by combining these two techniques. First, the system was constructed with four CMOS cameras and a projector. The positions of the four cameras were adjusted to capture pictures of the scene, and four depth maps were obtained from disparity. A rough depth map for the central camera field was obtained using image transformation and mosaicking. Second, the rough depth map was segmented into regions of similar depth using a flood-fill algorithm. A mask was constructed from each region to choose the appropriate areas of the fringe image for FTP. Modulation analysis was used to obtain the unwrapped phase from the fringe pattern distorted by the object height. By merging the initial depth map with the FTP depth map, the final high-quality depth map was obtained.
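The FTP step can be illustrated on a single fringe row: isolate the carrier lobe in the Fourier spectrum, invert it, and take the unwrapped phase, which encodes the surface height. The carrier frequency and bandwidth below are assumed known; this is a simplified sketch, not the authors' modulation analysis.

```python
# Bare-bones Fourier transform profilometry on one fringe row.
import numpy as np

def ftp_phase(row, f0, bw):
    # f0, bw in cycles/sample (the units returned by np.fft.fftfreq).
    spec = np.fft.fft(row - row.mean())
    freqs = np.fft.fftfreq(row.size)
    mask = np.abs(freqs - f0) < bw            # keep only the +f0 carrier lobe
    analytic = np.fft.ifft(spec * mask)
    phase = np.unwrap(np.angle(analytic))     # still contains the carrier ramp
    return phase - 2 * np.pi * f0 * np.arange(row.size)   # remove linear carrier
```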
High-resolution DMD-FPM system based on ring pattern phase retrieval algorithm
Author(s):
Xiao Tao;
Jinlei Zhang;
Youquan Liu;
Chang Wang;
Wentao Zhang;
Chenning Tao;
Shengqian Chang;
Zhenrong Zheng
Show Abstract
We report an approach to enhance the resolution of microscopy imaging by using the Fourier ptychographic microscopy (FPM) method with a laser source and a spatial light modulator (SLM) to generate modulated sample illumination. The performance of existing FPM systems is limited by the low illumination efficiency of the LED array. In our prototype setup, a digital micromirror device (DMD) replaces the LED array as a reflective spatial light modulator and is placed at the front focal plane of the 4F system. A ring-pattern sample illumination is generated by coding the micromirrors on the DMD and converted into multi-angular illumination through the relay illumination system. A series of intensity images of the sample is obtained by changing the size of the ring pattern and then used to reconstruct a high-resolution image through the ring-pattern phase retrieval algorithm. Finally, our method is verified experimentally using a resolution chart. The results also show that our method achieves higher reconstruction resolution and faster imaging speed.
Why JPEG is not JPEG: testing a 25-year-old standard
Author(s):
Thomas Richter;
Richard Clark
Show Abstract
While ISO WG1 recently celebrated the 25th anniversary of its most successful standard, it is more than surprising that, to date, no reference implementation of this standard exists. During an ongoing activity aimed at filling this gap, several observations have been made on how far the "living standard" deviates from the ISO documents. In particular, applying the official reference testing procedure of JPEG, available as ITU Recommendation T.83 or ISO/IEC 10918-2, turned out to be more of a challenge than expected. This document sheds some light on the JPEG ISO standard and our findings during reference testing of a legacy, 25-year-old standard.
Overview of the JPEG XS core coding system subjective evaluations
Author(s):
Alexandre Willème;
Saeed Mahmoudpour ;
Irene Viola;
Karel Fliegel;
Jakub Pospíšil;
Touradj Ebrahimi;
Peter Schelkens;
Antonin Descampe;
Benoit Macq
Show Abstract
The JPEG committee (Joint Photographic Experts Group, formally known as ISO/IEC SC29 WG1) is currently in the process of standardizing JPEG XS, a new interoperable solution for low-latency, lightweight, and visually lossless compression of images and video. This codec is intended for applications where content would usually be transmitted or stored in uncompressed form, such as live production, display links, virtual and augmented reality, self-driving vehicles, or frame buffers. It achieves bandwidth and power reduction with transparent, low-latency coding at compression ratios ranging from 2:1 to 6:1. The subjective assessment of the impact of visually lossless compression poses particular challenges. This paper describes the subjective quality evaluation conducted on the JPEG XS core coding system. In particular, it details the test procedures and compares the results obtained by the different evaluation laboratories involved in the standardization effort.
Entropy coding, profiles, and levels of JPEG XS
Author(s):
Thomas Richter;
Joachim Keinert;
Antonin Descampe;
Gael Rouvroy
Show Abstract
JPEG XS is a new standard for low-latency and low-complexity coding designed by the JPEG committee. Unlike former developments, optimal rate distortion performance is only a secondary goal; the focus of JPEG XS is to enable cost-efficient, easy to parallelize implementations suitable for FPGAs or GPUs. In this article, we shed some light on the entropy coding back-end of JPEG XS and introduce profiles and levels that are currently under discussion in the JPEG committee.
Emerging image metadata standards activities in JPEG
Author(s):
Andy Kuzma;
Frederik Temmermans ;
Thomas Richter
Show Abstract
This paper presents a review of JPEG metadata activities that enable enriched interactions with JPEG images. This is achieved through a multi-application metadata framework which builds on an extensible box-structure used in JPEG files. The JPEG 360 standard is the first application to use this metadata structure to support omnidirectional images. The upcoming JPEG Privacy and Security standard will follow the same approach.
JPEG Pleno: a standard framework for representing and signaling plenoptic modalities
Author(s):
Peter Schelkens;
Zahir Y. Alpaslan;
Touradj Ebrahimi;
Kwan-Jung Oh;
Fernando M. B. Pereira;
Antonio M. G. Pinheiro;
Ioan Tabus;
Zhibo Chen
Show Abstract
In recent years, we have observed the advent of plenoptic modalities such as light fields, point clouds, and holography in many devices and applications. Besides the many technical challenges brought by these new modalities, a particular challenge is arising on the horizon, namely providing interoperability between these devices and applications and, in addition, at a cross-modality level. Based on these observations, the JPEG committee (ISO/IEC JTC1/SC29/WG1 and ITU-T SG16) has initiated a new standardization initiative, JPEG Pleno, intended to define an efficient framework addressing the above interoperability issues. In this paper, an overview is provided of its current status and future plans.
A new objective metric to predict image quality using deep neural networks
Author(s):
Pinar Akyazi;
Touradj Ebrahimi
Show Abstract
Quality assessment of images is of key importance for multimedia applications. In this paper we present a new full-reference objective metric that predicts image quality using deep neural networks. The network makes use of both color and frequency information extracted from reference and distorted images. Our method consists of extracting a number of equal-sized random patches from the reference image and the corresponding patches from the distorted image, then feeding the patches themselves as well as their 3-scale wavelet transform coefficients as input to a neural network. The architecture of the network consists of four branches, with the first three generating frequency features and the fourth extracting color features. Feature extraction is carried out using 12 to 15 convolutional layers and one pooling layer, while two fully connected layers are used for regression. The overall image quality is computed as a weighted sum of patch scores, where the local weights are also learned by the network using two additional fully connected layers. The network was trained on TID2013 and tested on the TID2013, CSIQ, and LIVE image databases. Our results show high correlations with subjective test scores, generalize to certain types of distortions, and are competitive with state-of-the-art methods.
Noise removal of the x-ray medical image using fast spatial filters and GPU
Author(s):
Luis Cadena;
Alexander Zotin;
Franklin Cadena;
Nikolai Espinosa
Show Abstract
Medical images are corrupted by different types of noise introduced by the equipment itself. It is very important to obtain precise images to facilitate accurate observations for the given application, and noise removal is a very challenging issue in the field of medical image processing. This work studies noise removal techniques for medical images using fast implementations of several digital filters, such as the average, median, and Gaussian filters. Processing X-ray medical images takes a significant amount of time; however, modern hardware allows parallel image processing on the CPU and GPU. Using GPU processing technology, parallel implementations of the noise reduction algorithms were developed, taking data parallelism into account. The experimental study was conducted on medical X-ray images in order to choose the best filters with respect to the medical task and the processing time. The comparison of the fast filter implementation with the GPU implementation shows a large increase in performance. Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing; in the field of medical imaging, GPUs are in some cases crucial for enabling the practical use of computationally demanding algorithms.
Optimization of coded aperture in compressive x-ray tomography
Author(s):
Tianyi Mao;
Angela P. Cuadros;
Xu Ma;
Weiji He;
Qian Chen;
Gonzalo R. Arce
Show Abstract
The CT system structure matrix in coded aperture compressive X-ray tomography (CACXT) is highly structured, and thus random coded apertures are not optimal. A fast design approach based on minimal information loss is proposed. The peak signal-to-noise ratios (PSNR) of images reconstructed with the optimized coded apertures exhibit significant gains, and the design execution time is reduced by orders of magnitude. Simulation results for optimized coded apertures are shown, and their performance is compared to that of random coded apertures.
Convolutional neural network based computational imaging spectroscopy
Author(s):
Chenning Tao;
Xiao Shu;
Wentao Zhang;
Xiao Tao;
Chang Wang;
Zhenrong Zheng
Show Abstract
Computational imaging spectrometry provides spatial-spectral information about objects. This technology has been applied in biomedical imaging, ocean monitoring, military and geographical object identification, etc. Via compressive sensing with coded apertures, the 3D spatial-spectral data cube of a hyperspectral image is compressed into a 2D data array to alleviate the problems caused by huge amounts of data. In this paper, a 3D convolutional neural network (3D CNN) is proposed for the reconstruction of compressively sensed (CS) multispectral images. The network takes the 2D compressed data as input and produces an intermediate output of the same size as the original 3D data; a general image denoiser is then applied to obtain the final reconstruction. The network, with one fully connected layer and six 3D convolutional layers, is trained on a standard hyperspectral image dataset. Although the compression rate is extremely high (16:1), the network performs well both in spectral reconstruction, demonstrated with single-point spectra, and in quantitative comparison with the original data in terms of peak signal-to-noise ratio (PSNR). Compared with state-of-the-art iterative reconstruction methods, e.g., two-step iterative shrinkage/thresholding (TwIST), the network offers high-speed reconstruction and low spectral dispersion, which potentially enables more accurate identification of objects.
Magnetic resonance brain images algorithm to identify demyelinating and ischemic diseases
Author(s):
D. Castillo;
René Samaniego;
Y. Jiménez;
L. Cuenca;
O. Vivanco;
M. J. Rodríguez-Álvarez
Show Abstract
Brain demyelination lesions occur due to damage to the myelin layer of nerve fibers; this deterioration causes pathologies such as multiple sclerosis, leukodystrophy, and encephalomyelitis. Brain ischemia is the interruption of the blood supply to the brain and of the flow of oxygen and nutrients needed to maintain the correct functioning of brain cells. This project presents the results of an image processing algorithm whose main objective is to identify and differentiate demyelinating and ischemic brain diseases through the automatic detection, classification, and identification of their features in magnetic resonance images. The image sequences used were T1, T2, and FLAIR, with a dataset of 300 patients with and without these or other pathologies. At this stage, the algorithm uses the Discrete Wavelet Transform (DWT), principal component analysis (PCA), and a kernel support vector machine (SVM). The algorithm achieves 75% accuracy; with effective validation, it could therefore be applied for fast diagnosis and contribute to the effective treatment of these brain diseases, especially in rural areas.
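A minimal sketch of the DWT + PCA + SVM pipeline mentioned above is given below, assuming 2D slices as input; the wavelet, decomposition level, number of components, and the use of only the approximation band are illustrative choices, not the authors' configuration.

```python
# Illustrative pipeline: 2D DWT coefficients as features, PCA for
# dimensionality reduction, RBF-kernel SVM as the classifier.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def dwt_features(slice_2d, wavelet='haar', level=3):
    coeffs = pywt.wavedec2(slice_2d, wavelet, level=level)
    return coeffs[0].ravel()      # coarse approximation band as feature vector

def train_classifier(slices, labels):
    # slices: equally sized 2D arrays; labels: class per slice.
    X = np.stack([dwt_features(s) for s in slices])
    model = make_pipeline(PCA(n_components=20), SVC(kernel='rbf'))
    model.fit(X, labels)
    return model
```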
Face recognition by using wavelet-subband booster
Author(s):
J. W. Wang;
T. H. Chen
Show Abstract
Lighting variation is a challenge for face recognition. This paper proposes a new enhancement method called the wavelet-subband booster, which restores face image quality to overcome this problem. An efficient brightness detector is used to classify a color face image into one of three classes: bright, normal, or dark. The RGB color channels of the face image are each transformed into the discrete wavelet domain. The subband coefficients of the RGB color channels are then adjusted by multiplying the singular value matrices of these frequency subband coefficient matrices by the boosting coefficients. An image denoising model is further applied, and the 2D inverse discrete wavelet transform is performed to obtain the boosted color face image without the lighting effect. The experimental results demonstrate the efficiency of the proposed methodology. The proposed method not only yields boosted images as good as if they had been taken under normal lighting, but also significantly improves the accuracy and computation speed of face recognition.
Crop row detection: a bioinspired and data analysis approach
Author(s):
Anabel Martínez-Vargas;
M. A. Cosío-León;
Gerardo Romo;
Gener Áviles-Rodríguez;
Julio C. Ramos-Fernández
Show Abstract
The increasing use of robots equipped with machine vision sensors in Precision Agriculture demands solutions to several problems. The robot navigates and acts over a rough surface under specific restrictions. The information needed to navigate between the crops is supplied by physical sensors and mainly by an imaging detection system on the robot. The vision system for this kind of robot faces many challenges, such as changes in luminosity, discontinuous crop rows, processing capacity and time, as well as terrain conditions, among others. The aim of this research is to propose a method to develop a vision system for a tractor robot based on the PCA dimensionality reduction algorithm, the second derivative method, and a genetic algorithm for crop row detection.
A correlation-based algorithm for detecting linearly degraded objects using noisy training images
Author(s):
Victor Karnaukhov;
Vitaly Kober
Show Abstract
The paper deals with the design of a composite correlation filter from noisy training images for reliable recognition and localization of distorted targets embedded into cluttered, linearly degraded, and noisy scenes. We consider a non-overlapping signal model for the input scene and an additive noise model for the reference. The impulse response of the obtained filter is a linear combination of generalized filters optimized with respect to the peak-to-output energy. The performance of the proposed composite correlation filter is analyzed in terms of discrimination capability and accuracy of target location when the reference objects and input scenes are degraded.
Accuracy analysis of 3D object shape recovery using depth filtering algorithms
Author(s):
Alexey Ruchay;
Konstantin Dorofeev;
Anastasia Kober;
Vladimir Kolpakov;
Vsevolod Kalschikov
Show Abstract
In this paper, we estimate the accuracy of 3D object reconstruction using depth filtering and data from an RGB-D sensor. Depth filtering algorithms carry out inpainting and upsampling of defective depth maps from an RGB-D sensor. In order to improve the accuracy of 3D object reconstruction, an efficient and fast depth filtering method is designed. Various depth filtering methods are tested and compared with respect to reconstruction accuracy using real data. The presented results show an improvement in the accuracy of 3D object reconstruction when depth filtering of the RGB-D sensor data is applied.
3D object reconstruction using multiple Kinect sensors and initial estimation of sensor parameters
Author(s):
Alexey Ruchay;
Konstantin Dorofeev;
Anastasia Kober
Show Abstract
In this paper, we reconstruct 3D object shape using multiple Kinect sensors. First, we capture RGB-D data from the Kinect sensors and estimate the intrinsic parameters of each sensor. Second, a calibration procedure is used to provide an initial rough estimate of the sensor poses. Next, the extrinsic parameters are estimated using an initial rigid transformation matrix in the Iterative Closest Point (ICP) algorithm. Finally, a fusion of the calibrated data from the Kinect sensors is performed. Experimental reconstruction results using Kinect V2 sensors are presented and analyzed in terms of reconstruction accuracy.
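A rough sketch of the refinement-and-fusion step follows, assuming the Open3D library is available and that an initial rigid transformation between two calibrated Kinect point clouds is already known; the distance threshold is illustrative only.

```python
# Refine an initial rigid transform between two point clouds with ICP,
# then fuse them into a single cloud (Open3D assumed).
import open3d as o3d

def fuse_two_views(pcd_a, pcd_b, init_transform, max_dist=0.02):
    result = o3d.pipelines.registration.registration_icp(
        pcd_b, pcd_a, max_dist, init_transform,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    pcd_b.transform(result.transformation)   # bring B into A's frame
    return pcd_a + pcd_b                     # concatenate the calibrated clouds
```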
An efficient detection of local features in depth maps
Author(s):
Alexey Ruchay;
Konstantin Dorofeev;
Anastasia Kober
Show Abstract
In this paper, we propose an algorithm for the detection of local features in depth maps. The local features can be utilized to determine special points for Iterative Closest Point (ICP) algorithms. The proposed algorithm employs a novel approach based on a cascade mechanism, which can be applied to several 3D keypoint detection algorithms. Computer simulation and experimental results obtained with the proposed algorithm in real-life scenes are presented and compared with those obtained with state-of-the-art algorithms in terms of detection efficiency, accuracy, and speed of processing. The results show an improvement in the accuracy of 3D object reconstruction using the proposed algorithm followed by ICP algorithms.
Removal of impulsive noise from color images with cascade switching algorithm
Author(s):
Alexey Ruchay;
Anastasia Kober;
Vladimir Kolpakov;
Tatyana Makovetskaya
Show Abstract
This presentation deals with the restoration of images corrupted by impulsive noise using a novel cascade switching algorithm. The algorithm iteratively processes the observed degraded image, adaptively changing the parameters of the switching filter. With the help of computer simulation, we show that the proposed algorithm is able to effectively remove impulse noise from a highly contaminated image. The performance of the proposed algorithm is compared with that of common successful algorithms in terms of image restoration metrics.
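The cascade switching idea can be illustrated by a simplified iterative switching median filter: impulses are detected by comparing each pixel with its local median, only the flagged pixels are replaced, and the detection threshold is adapted between passes. This is a generic sketch with assumed parameters, not the authors' algorithm.

```python
# Iterative switching median filter for a grayscale image (sketch only).
import numpy as np
from scipy.ndimage import median_filter

def switching_median(img, iters=3, thresh=60, step=15):
    out = img.astype(np.float32)
    for _ in range(iters):
        med = median_filter(out, size=3)
        impulses = np.abs(out - med) > thresh   # detection (switching) mask
        out[impulses] = med[impulses]           # correct detected pixels only
        thresh = max(thresh - step, 10)         # tighten detection each pass
    return out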
Extrinsic calibration and usage of a single-point laser rangefinder and single camera
Author(s):
Zewei Liu;
Dongming Lu;
Weixian Qian;
Jun Zhang II;
Jinqing Yang III
Show Abstract
Laser and visual imagery have been broadly utilized in computer vision and mobile robotics applications because these sensors provide complementary information, so we focus on the fusion of a 1-D laser rangefinder and a camera. Finding the transformation between the camera and the 1-D laser rangefinder is the first necessary step for fusing their information. Many algorithms have been proposed to calibrate a camera with a 2-D or 3-D laser rangefinder, but few methods exist for a 1-D laser rangefinder. In this paper, we propose a robust extrinsic calibration algorithm that is easy to implement and has a small calibration error. Because the 1-D laser rangefinder returns only a single range measurement along one direction, it is difficult to build geometric constraint equations as for a 2-D laser rangefinder, so the calibration is not formulated through such constraint equations. Because the spot of the single-point laser rangefinder we commonly use is mostly invisible, we can determine the full calibration even without observing the laser rangefinder observation point in the camera image. We evaluate the proposed method, demonstrating its efficiency and good behavior under noise. Finally, we calibrate the installation error of the camera using the calibration result.
Application of SRIO in real-time image processing
Author(s):
Zhanchao Wang;
Min Huang;
Baowei Zhao;
Yan Sun
Show Abstract
In the real-time image processing system, SRIO was used to meet the demand for massive data exchange between the FPGA and the DSP. This paper realizes massive image data transmission between the FPGA and the DSP with SRIO. The image sensor outputs image data in 4 channels, and the clock in each channel is 175 MHz, five times the input clock. Since each data channel operates at double data rate, the data rate of one channel is 350 Mbps and the data rate of the whole image sensor is 1400 Mbps. The FPGA receives the sampled data from the image sensor and reorganizes the image data; it then transmits the organized data to a Camera Link interface for display testing on the one hand, and transmits the sorted data to the DSP via SRIO for further processing on the other. The SRIO link between the FPGA and the DSP uses x1 mode with 8b/10b coding, and the transmission rate is 2.5 Gbps per lane. The results show that the image on the Camera Link interface is correct and the SRIO transmission is successful.
All-in-focus image reconstruction robust to ghosting effect
Author(s):
Sergio G. Angulo;
Julia R. Alonso;
M. Strojnik;
Ariel Fernández;
G. García-Torales;
J. L. Flores;
Jose A. Ferrari
Show Abstract
Depth of field (DOF) is still a limitation in the acquisition of sharp images. Since DOF depends on the optical system, a shallow DOF is common in macro-photography and microscopy. In this work, we propose a post-processing method to improve an all-in-focus algorithm based on a linear system. Using a multi-focus stack, we extract a focused object of interest. Finally, we synthesize a single image with extended DOF, including occlusions, using a priori information about the objects in the acquired scene. We present theory and experimental results to show the performance of the proposed method using real images captured with a fixed camera.
Polarization spectroscopy of blood and punctate douglas deepening in patients with ovarian tumors
Author(s):
Olexander Peresunko;
Sergey Yermolenko;
Ksenia Rudan;
Bin Guo;
Zhebo Chen
Show Abstract
The shortcomings of the existing methods for diagnosing ovarian tumors are related to the long asymptomatic course of the disease, the difficulty of differential diagnosis with the existing reliable methods, and their relative bias. The aim of the study was to use polarization spectrophotometry to develop a diagnostic algorithm for studies of the blood and the contents of the pouch of Douglas in women with ovarian tumors. A comparative analysis of the pouch of Douglas contents of healthy women and of patients with ovarian cancer revealed significantly greater optical anisotropy in the latter. Investigation of polarization images of blood revealed a well-developed microcrystalline structure, and the results of the punctate study were similar. Based on the study of the blood, punctate, and pouch of Douglas contents of healthy women and of patients with benign and malignant ovarian tumors, using the method of polarization spectrophotometry, photometric and polarization criteria indicating tumor malignancy were experimentally developed and clinically tested.
Spectroscopic image criteria for the selection of patients with ovarian cancer for further molecular genetic studies
Author(s):
O. P. Peresunko;
M. S. Gavrylyak;
S. B. Yermolenko
Show Abstract
The article is devoted to substantiating the selection of patients with ovarian cancer (OC) for expensive molecular genetic genotyping studies, namely the identification of the genes of hereditary predisposition to OC (BRCA1 and BRCA2). The proposed method makes it possible, at the first stage, to find among all patients with ovarian cancer those individuals for whom further molecular genetic studies will objectively confirm the diagnosis of hereditary cancer (by genotyping of BRCA I and II), which significantly reduces the cost of these studies in the population. Preliminary data show that the optical method of infrared plasma spectroscopy of patients with ovarian cancer needs further research as a screening test for the diagnosis of hereditary ovarian cancer and as a preliminary selection of patients for molecular genetic studies (BRCA I and II).
Polarization image processing of chordae tendineae of atrio-ventricular heart valves of the foetus
Author(s):
N. P. Penteleichuk ;
O. V. Tsyhykalo;
Yu. Yu. Malyk;
T. O. Semeniuk ;
S. B. Yermolenko
Show Abstract
The aim of this research is to clarify the anisotropic structure and polarization properties of the chordae tendineae of the atrio-ventricular heart valves of the foetus. Images of sections of the chordae tendineae of the human heart were studied using the method of laser polarimetry in order to systematically analyze the topography of the anisotropic properties of collagen fibers, elastic fibers, and contractile cardiomyocytes through the intensity distributions obtained at different polarization orientations, the Stokes vector parameters, polarization maps, as well as the elements of the Mueller matrices and their respective statistical moments from the 1st to the 4th order. It was shown that the anisotropic properties manifested themselves in different topographical architectures of these types of fibers with different levels of birefringence, which were visualized in different polarization states of the probing and analyzing beams, as has also been described by other researchers.
Laser polarimetry imaging in diagnostics of morphological structure of the heart valve tendinous cords of newborns
Author(s):
S. B. Yermolenko;
N. P. Penteleichuk ;
O. V. Tsyhykalo;
Yu. Yu. Malyk;
T. O. Semeniuk
Show Abstract
The morphological peculiarities of the tendinous cords of the mitral valve of the human heart, for both normally and abnormally spaced strings of the left ventricle, were studied together with their structural features depending on location. Statistical dependencies and the correlation structure of the two-dimensional Mueller matrix elements in different spectral ranges of laser radiation were revealed as functions of changes in the distributions of the orientations of the optical axes and the birefringence of the protein crystals of biological tissues. The diagnostic sensitivity of statistical (statistical moments of the 1st to 4th order) and correlation (half-width of the autocorrelation function and variance of the power spectra) parameters to changes in the optically anisotropic component of biological objects was established.
Polarization structural property of the images of chordae tendineae of the mitral and tricuspid heart valves of the infants
Author(s):
N. P. Penteleichuk ;
O. V. Tsyhykalo;
Yu. Yu. Malyk;
T. O. Semeniuk;
K. M. Chala;
S. B. Yermolenko
Show Abstract
The aim of this study is to investigate the microscopic and submicroscopic structure and the polarization properties of the chordae tendineae of the atrio-ventricular valves of the hearts of infants. Laser polarimetry provides additional information about the morphological and optically anisotropic structure of biological tissues in normal and pathological states; visualization and statistical analysis of the informative polarization-inhomogeneous object fields make it possible to establish differential signs of the state of biological tissues in norm and pathology. Disordered architectonics of cardiomyocytes of various localization was manifested in false tendons of the fibro-muscular and muscular types. The topologically localized and phase-inhomogeneous component of the anisotropic formations of cardiomyocytes manifested itself in the values of the higher-order statistical moments of the Stokes vector parameter S3. The correlation characteristics of the filtered collagen network indicate an ordered and large-scale character of the hierarchical anisotropy of collagen fibers, with a distinct difference in the values of the fourth-order statistical moment (kurtosis) of the ellipticity of polarization of the image.
Polarization-interference images of optically anisotropic biological layers
Author(s):
M. Yu. Sakhnovskiy;
O. Yu. Wanchuliak;
B. Bodnar;
I. V. Martseniak;
O. Tsyhykalo;
A. V. Dubolazov;
V. A. Ushenko;
O. I. Olar;
P. M. Grygoryshyn
Show Abstract
A theoretical basis for the method of polarization-interference mapping of optically thin polycrystalline films of human biological fluids is given. The coordinate distributions of the local contrast of the interference distributions of polarization-inhomogeneous microscopic images of polycrystalline films of the synovial fluid of the human joint are investigated. Within the statistical approach (statistical moments of the 1st to 4th order), objective criteria for the distributions of the local contrast values are established. The possibility of differentiating weak changes in the optical anisotropy of blood films of healthy subjects and of patients with breast cancer is demonstrated.
System of biological crystals fibrillar networks polarization-correlation mapping
Author(s):
M. Yu. Sakhnovskiy;
A.-V. Syvokorovskaya;
V. Martseniak;
B. M. Bodnar;
O. Tsyhykalo;
A. V. Dubolazov;
O. I. Olar;
V. A. Ushenko;
P. M. Grygoryshyn
Show Abstract
A new model for the correlation description of the optically anisotropic component of biological tissues is proposed. To characterize the transformation of laser radiation by the fibrillar network, a new parameter is used: the complex "two-point" Stokes vector. The interconnections between the parameters of the "two-point" Stokes vector and the distributions of the optical axes and birefringence of the fibrillar network of biological tissue are found. A scheme of generalized Stokes polarimetry of microscopic images of histological sections of fibrillar biological tissues is developed, and a technique for the direct polarization measurement of the distributions of the real and imaginary parts of the "two-point" Stokes vector is proposed. Maps of the phase distributions of the "two-point" Stokes vector are obtained for histological sections of the myocardium and endometrium. The sensitivity, specificity, and balanced accuracy of the method of polarization-correlation mapping of fibrillar networks of biological crystals are determined. Within the framework of the statistical analysis of the phase maps of the "two-point" Stokes vector of histological sections of the myocardium and endometrium, objective criteria for diagnosing endometrial cancer are found, and excellent balanced accuracy is achieved.
System of differential Mueller-matrix mapping of phase and amplitude anisotropy of depolarizing biological tissues
Author(s):
Yu. A. Ushenko;
O. V. Olar;
A. V. Dubolazov;
O. B. Bodnar;
B. M. Bodnar;
L. Pidkamin;
O. Prydiy;
M. I. Sidor;
D. Kvasnyuk;
O. Tsyhykalo
Show Abstract
The possibility of solving the inverse problem, i.e., extracting information about the linear and circular birefringence and dichroism of light-scattering biological layers, is considered. An analytical model of the optical anisotropy of depolarizing biological tissues is proposed. The Mueller matrix is represented as a superposition of differential matrices of the first and second orders. Interrelations between the parameters of phase and amplitude anisotropy and the elements of the first-order differential matrix are obtained. Algorithms for the experimental measurement of the coordinate distributions of the elements of the polarization component of the Mueller matrix of depolarizing biological tissue are found. The symmetry and features of the first-order differential matrices of fibrillar (muscle) and parenchymal (liver) depolarizing biological tissues are investigated. The interrelations between the statistical moments of the first to fourth orders and the features of the morphological polycrystalline structure of the biological tissues of various human organs are found. Ways of applying the differential Mueller-matrix mapping method in clinical diagnostics of the phase and amplitude anisotropy distributions are proposed.
Accurate alignment of RGB-D frames for 3D map generation
Author(s):
Jose A. Gonzalez-Fraga;
Vitaly Kober;
Everardo Gutiérrez
Show Abstract
It is well known that the accuracy and resolution of depth data decrease as the distance from an RGB-D sensor to a 3D object of interest increases, affecting the performance of 3D scene reconstruction systems based on an ICP algorithm. In this paper, to improve 3D map accuracy when aligning multiple point clouds, we propose: first, to split the depth plane into sub-clouds of similar resolution; then, to select in each sub-cloud a minimum number of keypoints and align the sub-clouds separately with an ICP algorithm; and finally, to merge all clouds into a dense 3D map. Computer simulation results show the performance of the proposed 3D scene reconstruction algorithm using real indoor environment data.
Object tracking with composite optimum filters using non-overlapping signal models
Author(s):
Jose A. Gonzalez-Fraga;
Vitaly Kober;
Omar Alvarez-Xochihua
Show Abstract
In order to design a tracking algorithm that is invariant to pose, occlusion, clutter, and illumination changes of a scene, non-overlapping signal models for input scenes as well as for objects of interest and the Synthetic Discriminant Function approach are exploited. A set of correlation filters, optimum with respect to the peak-to-output energy, is derived for different target versions in each frame. A prediction method is utilized to locate the target patch in the coming frame. The algorithm's performance is tested in terms of recognition and localization errors in real scenarios and compared with that of state-of-the-art tracking algorithms.
High efficient energy compaction network for image transform
Author(s):
JongSeok Lee;
Sunyoung Jeon;
Kwang Pyo Choi;
Youngo Park;
JaeHwan Kim;
JeongHoon Park
Show Abstract
For decades, the Discrete Cosine Transform (DCT) has played a crucial role in video and image compression, ever since Chen and Pratt proposed an image compression application based on the DCT. The energy compaction property of the DCT is highly efficient for compression when combined with an entropy coder and a specific scan order. By exploiting this property, the DCT has been widely used for video and image compression for over 20 years, from JPEG, the best-known image compression format, to High Efficiency Video Coding (HEVC), the latest video compression standard. Since the DCT came into use for image compression, several transforms have been proposed to achieve better compression performance. The most famous among them is the Karhunen–Loève transform (KLT), which has the best energy compaction. However, the KLT must transmit extra information about the transform basis, which the DCT does not require; therefore its compression performance is worse and its complexity higher than the DCT. To achieve the energy compaction performance of the KLT without extra information, we propose a machine learning network, TransNet, for image/video transforms. TransNet is trained to achieve better energy compaction than the DCT while simultaneously maintaining image quality. To find the optimal trade-off between reconstructed image quality and energy compaction, we propose a new loss function based on the orthogonal transform property and a regularization term. To evaluate the compression performance of the proposed network, we compared the DCT and TransNet using a JPEG encoder. In terms of BD-rate on Peak Signal-to-Noise Ratio (PSNR), the proposed network shows about an 11% gain compared with the DCT.
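The energy compaction property referred to above can be checked numerically: applying a 2D DCT to a smooth block and keeping only its few largest-magnitude coefficients retains almost all of the signal energy. The toy block and the value of k below are arbitrary illustrations.

```python
# Measure the fraction of energy kept by the k largest DCT coefficients.
import numpy as np
from scipy.fft import dctn

def retained_energy(block, k):
    c = dctn(block, norm='ortho')                 # orthonormal 2D DCT
    thresh = np.sort(np.abs(c).ravel())[-k]       # k-th largest magnitude
    mask = np.abs(c) >= thresh
    return np.sum(c[mask] ** 2) / np.sum(c ** 2)

block = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))  # smooth toy block
print(retained_energy(block, k=4))   # close to 1.0 for smooth content
```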
An algorithm of face recognition based on generative adversarial networks
Author(s):
Sergey Leonov;
Alexander Vasilyev;
Artyom Makovetskii;
J. Diaz-Escobar
Show Abstract
The problem of face recognition is an important task in the security field, closed-circuit television (CCTV), artificial intelligence, etc. One of the most effective approaches for pattern recognition is the use of artificial neural networks. In this presentation, an algorithm using generative adversarial networks is developed for face recognition. The proposed method consists of the interaction of two neural networks: the first (generative) network generates face patterns, and the second (discriminative) network rejects false face patterns. A feed-forward neural network (a single-layer or multilayer perceptron) is used as the generative network, and a convolutional neural network is used as the discriminative network for the purpose of pattern selection. A large database of artificial images, normalized for brightness changes and standardized in scale, is created for training the neural networks; new facial images are synthesized from existing ones. Results obtained with the proposed algorithm using generative adversarial networks are presented and compared with common algorithms in terms of recognition and classification efficiency and speed of processing.
An algorithm for selecting face features using deep learning techniques based on autoencoders
Author(s):
Sergey Leonov;
Alexander Vasilyev;
Artyom Makovetskii;
Vladislav Kuznetsov;
J. Diaz-Escobar
Show Abstract
In recent years, deep learning, as a part of artificial intelligence theory, has formed the basis for many advanced developments, such as drones and voice and image recognition technologies. The concept of deep learning is closely related to artificial neural networks. Moreover, deep learning techniques work with unlabeled data; for this reason, deep learning algorithms are effective in face recognition. However, there are difficulties related to the implementation of deep learning algorithms: deep learning requires a large amount of unlabeled data and long training. In this presentation, a new algorithm for the automatic selection of face features is proposed, using deep learning techniques based on autoencoders in combination with customized loss functions to provide high informativeness with low within-class and high between-class variance. Multilayer feed-forward networks are used, and the extracted features are used for face classification. The performance of the proposed system for processing, analyzing, and classifying persons from face images is compared with that of state-of-the-art algorithms.
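A minimal PyTorch sketch of the idea follows: a feed-forward autoencoder whose bottleneck code serves as the face-feature vector, trained with a reconstruction loss plus an illustrative within-class variance penalty. The layer sizes and the penalty form are assumptions, not the authors' customized loss functions.

```python
# Feed-forward autoencoder; the bottleneck "code" is used as the face feature.
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    def __init__(self, in_dim=64 * 64, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def loss_fn(x, recon, code, labels, alpha=0.1):
    # Reconstruction loss plus an illustrative within-class variance penalty.
    rec = nn.functional.mse_loss(recon, x)
    within = torch.tensor(0.0)
    for c in labels.unique():
        within = within + code[labels == c].var(dim=0, unbiased=False).mean()
    return rec + alpha * within
```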
Stochastic and analytic modeling of atmospheric turbulence in image processing
Author(s):
Zuzana Krbcová;
Jaromír Kukal;
Quang Van Tran;
Jan Švihlík;
Karel Fliegel
Show Abstract
Modeling atmospheric turbulence through the Kolmogorov theory is a traditional application of the 2D Fourier Transform (2D FT). It is based on the Point Spread Function (PSF) in the spatial domain and its frequency-domain counterpart, the Optical Transfer Function (OTF), which is available in explicit form. This makes it possible to create an artificial fog effect in traditional image processing using the 2D Discrete Fourier Transform (2D DFT). Exact knowledge of the OTF allows image deblurring to be performed as deconvolution through the Wiener method. The difference between the reference image and the deconvolution result can be quantified using SNR in its traditional and rank-based modifications. However, a real star image is the result of a stochastic process driven by a 2D alpha-stable distribution. There is an efficient method for generating pseudorandom samples from the alpha-stable distribution, which makes it possible to simulate the photon distribution following the theoretical PSF, i.e., convergence in distribution is guaranteed. The comparison of both models and the optimal parameter setting of the Wiener deconvolution are studied for various exposure times and CCD camera noise levels. The obtained results can be generalized and applied to turbulent noise suppression.
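The deblurring step described above reduces to a standard frequency-domain Wiener filter once the OTF is known explicitly; a minimal sketch follows, with the noise-to-signal ratio treated as a single assumed scalar.

```python
# Frequency-domain Wiener deconvolution given an explicit OTF H
# (sampled on the FFT grid, DC at index [0, 0]).
import numpy as np

def wiener_deconvolve(blurred, H, nsr=1e-2):
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)   # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```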
Stabilization of median smoother via variational approach
Author(s):
Zuzana Krbcová;
Abduljalil Sireis;
Jaromír Kukal;
Jan Švihlík;
Karel Fliegel
Show Abstract
The traditional median smoother for 2D images is insensitive to impulse noise but generates flat areas as unwanted artifacts. The proposed approach to overcoming this issue is based on the minimization of a regularized form of the total variation functional. First, the continuous functional is defined for an n-dimensional signal in integral form with a regularization term. The continuous functional is then converted to discrete form using equidistant spatial sampling on a point grid of pixels, voxels, or other elements; this approach is suitable for traditional signal and image processing. The total variation is converted to a sum of absolute intensity differences as the minimization criterion. The convexity of the functional guarantees the existence of a global minimum and the absence of local extrema. The resulting nonlinear filter iteratively calculates local medians using the red-black method of the Successive Over/Under Relaxation (SOR) scheme; the optimal value of the relaxation parameter is also a subject of our study. Sensitivity to the regularization parameter makes it possible to design high-pass and nonlinear band-pass filters as the difference between the image and a low-pass smoother, or as the difference between two different low-pass smoothers, respectively. Various median-based approaches are compared in the paper.
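A loose reading of the iteration described above is sketched below: pixels are updated toward their local medians in red-black order with an over/under relaxation parameter omega. The exact update rule and the handling of the regularization term in the paper may differ; all values here are illustrative.

```python
# Schematic red-black relaxation toward local medians (grayscale image).
import numpy as np
from scipy.ndimage import median_filter

def relaxed_median(img, iters=10, omega=1.5):
    u = img.astype(np.float32)
    ii, jj = np.mgrid[0:u.shape[0], 0:u.shape[1]]
    for _ in range(iters):
        for parity in (0, 1):                          # red-black ordering
            med = median_filter(u, size=3)
            mask = (ii + jj) % 2 == parity
            u[mask] += omega * (med[mask] - u[mask])   # over/under relaxation
    return u
```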
Reducing number of points for ICP algorithm based on geometrical properties
Author(s):
Dmitrii Tihonkih;
Aleksei Voronin;
Artyom Makovetskii;
J. Diaz-Escobar
Show Abstract
ICP is the most commonly used algorithm for mapping point clouds, finding the transformation between clouds, and building a three-dimensional map. One of the key steps of the algorithm is the removal of a portion of the points and the search for correspondences between clouds. In this article, we propose a method for removing some points from the clouds. Reducing the number of points decreases the execution time of the subsequent steps and, as a result, increases performance. The paper describes an approach based on the analysis of the geometric shapes of the scene objects. In the developed algorithm, points lying on the boundaries of intersecting planes, the so-called edges of objects, are selected from the clouds. The intersection points of the found edges are then checked for membership among the main vertices of the objects. After that, extra vertices are excluded from the edges and, if necessary, new ones are added. This procedure is performed for both point clouds, and all further steps of the ICP algorithm are performed with the new clouds. In the next step, after finding the correspondence, the vertices found in the previous step are taken from the first cloud together with all the edges connected to them; for each such group, the corresponding group from the second cloud must be found. The method thus looks for correspondences between geometrically similar parts of the point clouds. After finding the intermediate transformation, the current error is calculated using the original point clouds. This approach significantly reduces the number of points participating in solving the ICP variational subproblem.
Comparison of resolution estimation methods in optical microscopy
Author(s):
Jakub Pospíšil;
Karel Fliegel;
Jan Švihlík;
Miloš Klíma
Show Abstract
Super-resolution (SR) microscopy is a powerful technique that enhances the resolution of optical microscopes beyond the diffraction limit; recent SR methods achieve a resolution of 100 nm. The theoretical resolution enhancement can be defined mathematically, but the final resolution in a real image can be influenced by technical limitations. Evaluating the resolution in a real sample is essential to assess the performance of an SR technique. Several image-based resolution limit evaluation methods exist, but the determination of the cutoff frequency is still a challenging task. In order to compare the efficiency of resolution assessment methods, a reference estimation technique is necessary, and several conventional methods exist in digital image processing. In this paper, the most common resolution measurement techniques used in optical microscopy imaging are presented and their performance is compared.
A point-to-plane registration algorithm for orthogonal transformations
Author(s):
Artyom Makovetskii;
Sergei Voronin;
Vitaly Kober;
Aleksei Voronin
Show Abstract
The key point of the ICP algorithm is the search for either an orthogonal or an affine transformation that is the best, in the sense of a quadratic metric, for combining two point clouds with a given correspondence between points. The point-to-plane metric performs better than the point-to-point metric in terms of accuracy and convergence rate. A closed-form solution to the point-to-plane case for orthogonal transformations is an open problem. In this presentation, we propose an approximation of the closed-form solution to the point-to-plane problem for orthogonal transformations.
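For reference, the most common way to approximate the point-to-plane problem is to linearize the rotation for small angles and solve a 6-parameter least-squares system; the sketch below shows this standard approximation, which is not necessarily the closed-form approximation proposed in the paper.

```python
# One point-to-plane ICP step via small-angle linearization of the rotation.
import numpy as np

def point_to_plane_step(src, dst, normals):
    # src, dst: (N, 3) corresponding points; normals: (N, 3) normals at dst.
    b = np.einsum('ij,ij->i', normals, dst - src)
    A = np.hstack([np.cross(src, normals), normals])  # unknowns: [rx, ry, rz, tx, ty, tz]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rx, ry, rz, t = x[0], x[1], x[2], x[3:]
    # Linearized rotation; re-orthonormalize (e.g., via SVD) for an exact rotation.
    R = np.array([[1, -rz, ry], [rz, 1, -rx], [-ry, rx, 1]])
    return R, t
```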
A regularization algorithm for registration of deformable surfaces
Author(s):
Sergei Voronin;
Artyom Makovetskii;
Aleksei Voronin;
J. Diaz-Escobar
Show Abstract
The registration of two surfaces consists of finding a geometrical transformation of a template surface to a target surface. The transformation combines the positions of the semantically corresponding points and can be considered as warping the template onto the target. To choose the most suitable transformation from all possible warps, a registration algorithm must satisfy some constraints on the deformation; this is called regularization of the deformation field. Regularization based on minimizing the difference between the transformations of different vertices of a surface is often used. The variational functional consists of several terms. One of them is the functional of the ICP (Iterative Closest Point) variational subproblem for the point-to-point metric for affine transformations; the other terms are stiffness and landmark terms. In the proposed presentation, we use a variational functional based on the point-to-plane metric for affine transformations, and the use of orthogonal transformations is also considered. The proposed algorithm is robust to bad initialization and incomplete surfaces; for noiseless and complete data, the registration is one-to-one. With the help of computer simulation, the proposed method is compared with known algorithms for finding the optimal geometrical transformation.
Image dehazing using total variation regularization
Author(s):
Sergei Voronin;
Vitaly Kober;
Artyom Makovetskii
Show Abstract
Images of outdoor scenes are often degraded by particles and water droplets in the atmosphere. Haze, fog, and smoke are such phenomena, caused by atmospheric absorption and scattering. Numerous image dehazing (haze removal) methods have been proposed in the last two decades, and the majority of them employ an image enhancement or restoration approach. Different variants of locally adaptive algorithms for single-image dehazing are also known. A haze-free image must have higher contrast than the input hazy image, so it is possible to remove haze by maximizing the local contrast of the restored image. Some haze removal approaches estimate the dehazed image from an observed hazy scene by solving an objective function whose parameters are adapted to local statistics of the hazy image inside a moving window. In signal and image processing, a common way to solve the denoising problem utilizes total variation regularization. In this presentation, we propose a new algorithm that combines local depth-map estimates into a global map by total variation regularization for piecewise-constant functions. Computer simulation results are provided to illustrate the performance of the proposed algorithm for restoration of hazed images.
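For illustration only, the sketch below follows a common dark-channel-style pipeline rather than the authors' algorithm: a rough transmission map is estimated locally, smoothed with an off-the-shelf TV regularizer, and the atmospheric scattering model is inverted; all parameter values are assumptions.

    # Minimal sketch: local transmission estimate + TV smoothing + model inversion.
    import numpy as np
    from scipy.ndimage import minimum_filter
    from skimage.restoration import denoise_tv_chambolle

    def dehaze(I, window=15, omega=0.95, t_min=0.1, tv_weight=0.1):
        """I: HxWx3 float image in [0, 1]."""
        dark = minimum_filter(I.min(axis=2), size=window)                 # dark channel
        A = I.reshape(-1, 3)[dark.ravel().argsort()[-10:]].mean(axis=0)   # airlight estimate
        t_raw = 1.0 - omega * minimum_filter((I / A).min(axis=2), size=window)
        t = np.clip(denoise_tv_chambolle(t_raw, weight=tv_weight), t_min, 1.0)
        return np.clip((I - A) / t[..., None] + A, 0.0, 1.0)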
Performance comparison of perceived image color difference measures
Author(s):
Karel Fliegel;
Jana Kolmašová;
Jan Švihlík
Show Abstract
This paper deals with techniques for the analysis of perceived color differences in images. The impact of color artifacts in the image processing chain is a critical factor in the assessment of overall Quality of Experience (QoE). First, an overview of color difference measures is presented. The performance of the methods is compared based on the results of subjective studies. The possible utilization of publicly available datasets with associated subjective scores is discussed. The majority of these datasets contain images distorted by various types of distortions, not necessarily with controlled color impairments. A dedicated database of images with common color distortions and associated subjective scores is introduced. Performance evaluation and comparison of objective image color difference measures is presented using conventional correlation performance measures and robust receiver operating characteristic (ROC) based analyses.
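As a small, hedged example of the evaluation workflow described above (the dataset layout and score arrays are assumptions), the sketch below computes a mean CIEDE2000 difference per image pair and correlates an objective measure with subjective scores.

    # Minimal sketch: CIEDE2000-based image color difference and its correlation
    # with subjective scores (Pearson and Spearman).
    import numpy as np
    from scipy.stats import pearsonr, spearmanr
    from skimage.color import rgb2lab, deltaE_ciede2000

    def mean_ciede2000(ref_rgb, dist_rgb):
        """Mean CIEDE2000 difference; inputs are HxWx3 float RGB images in [0, 1]."""
        return float(np.mean(deltaE_ciede2000(rgb2lab(ref_rgb), rgb2lab(dist_rgb))))

    def correlation_performance(objective_scores, subjective_scores):
        plcc, _ = pearsonr(objective_scores, subjective_scores)
        srocc, _ = spearmanr(objective_scores, subjective_scores)
        return plcc, srocc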
An efficient algorithm of 3D total variation regularization
Author(s):
Artyom Makovetskii;
Sergei Voronin;
Vitaly Kober
Show Abstract
One of the best-known techniques for signal and image denoising is based on total variation regularization (TV regularization). There are two known types of discrete TV norms: isotropic and anisotropic. One of the key difficulties in TV-based image denoising is the nonsmoothness of the TV norms. Many properties of TV regularization in the 1D and 2D cases are well known. In contrast, multidimensional TV regularization remains, to a large extent, an open problem. In this work, we deal with TV regularization in the 3D case for the anisotropic norm. The key feature of the proposed method is to decompose the large problem into a set of smaller, independent problems, which can be solved efficiently and exactly. These small problems can be solved in parallel. Computer simulation results are provided to illustrate the performance of the proposed algorithm for restoration of degraded data.
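The sketch below is only a baseline illustration of splitting a 3D TV problem into independent pieces that can be processed in parallel; it uses an off-the-shelf TV denoiser on slabs of the volume (an approximation at slab boundaries) and is not the exact anisotropic decomposition proposed in the paper.

    # Minimal baseline sketch: parallel TV denoising of a 3D volume by slabs.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from skimage.restoration import denoise_tv_chambolle

    def tv_denoise_3d(volume, weight=0.1, n_slabs=4):
        """Split the volume along axis 0 and denoise each slab independently."""
        slabs = np.array_split(volume, n_slabs, axis=0)
        with ProcessPoolExecutor() as pool:
            out = list(pool.map(denoise_tv_chambolle, slabs, [weight] * n_slabs))
        return np.concatenate(out, axis=0)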
Complex moments for the analysis of metal-mechanical parts
Author(s):
H. Vargas-Vargas;
C. J. Camacho-Bello
Show Abstract
Complex moments are widely used to characterize, analyze, and extract information from an image. The extracted characteristics can be used in pattern recognition. In this work, the classification of metal-mechanical pieces based on invariant complex moments computed on an array of polar pixels is presented. The conventional calculation of complex moments relies on the zero-order approximation, in which the integrals are replaced by summations. Here we propose the calculation of complex moments using an array of polar pixels, which makes it possible to integrate the kernel of the complex moments analytically and eliminates the geometric error inherent in their computation. The proposed method is used to classify metal-mechanical parts that have intrinsically small differences between them, such as millimeter screws. Finally, experimental results for four families of standard screws obtained with the polar-pixel scheme and with the zero-order approximation are presented.
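For reference, the sketch below implements the conventional zero-order approximation the abstract contrasts against (summing over square pixel centers); the proposed polar-pixel analytical integration is not reproduced here.

    # Minimal sketch: complex moments c_pq of a 2D image via the zero-order
    # approximation, with centered coordinates; |c_pq| is rotation invariant.
    import numpy as np

    def complex_moment(img, p, q):
        """c_pq = sum_x sum_y (x + iy)^p (x - iy)^q f(x, y)."""
        h, w = img.shape
        y, x = np.mgrid[0:h, 0:w].astype(float)
        x -= (w - 1) / 2.0
        y -= (h - 1) / 2.0
        z = x + 1j * y
        return np.sum((z ** p) * (np.conj(z) ** q) * img)

    def rotation_invariant_feature(img, p, q):
        return np.abs(complex_moment(img, p, q))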
Breast thermography: a non-invasive technique for the detection of lesions
Author(s):
L. B. Alvarado-Cruz;
C. Toxqui-Quitl;
J. A. Hernández-Tapia;
A. Padilla-Vivanco
Show Abstract
Breast thermography uses ultrasensitive infrared cameras to produce high-resolution images of temperatures and vascular changes. In the present work, we propose the development of computational methods for the analysis and digital processing of breast thermographic images to detect regions related to a probable lesion. A semi-automated segmentation algorithm based on polynomial curve fitting is presented for the detection of the breast region of interest (ROI). A public database that contains information on volunteers with diagnoses made by mammography and biopsy is used.
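The sketch below illustrates one possible (hypothetical) realization of the polynomial curve-fitting idea for ROI delineation: edge points are collected column by column from a smoothed thermogram and a low-order polynomial is fitted through them; the preprocessing choices are ours, not the authors'.

    # Minimal sketch: fit a polynomial boundary y(x) through the lowest edge point
    # of each column of an edge map.
    import numpy as np
    from skimage import feature, filters

    def fit_boundary(thermogram, degree=4):
        """thermogram: 2D float array; returns polynomial coefficients for np.polyval."""
        edges = feature.canny(filters.gaussian(thermogram, sigma=2))
        xs, ys = [], []
        for col in range(edges.shape[1]):
            rows = np.nonzero(edges[:, col])[0]
            if rows.size:
                xs.append(col)
                ys.append(rows.max())            # lowest edge point in this column
        return np.polyfit(xs, ys, degree)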
Smoke detection in compressed video
Author(s):
Behçet Uğur Töreyin
Show Abstract
Early detection of fires is an important aspect of public safety. In the past decades, devices and systems have been developed for volumetric sensing of fires using non-conventional techniques, such as computer-vision-based methods and pyro-electric infrared sensors. These systems offer an alternative to more commonly used point detectors, which suffer from transport delay in large and open areas. The ubiquity of computing and recent developments in novel hardware alternatives, such as memristor crossbar arrays, promise an increase in the number of deployments of such systems. Existing video-based methods have been developed for the analysis of uncompressed spatio-temporal sequences. In order to respond to the growing demand for such systems, techniques specifically aimed at analyzing compressed-domain video streams should be developed for early fire detection purposes. In this paper, a Markov model and wavelet transform based technique is proposed to further improve the current state-of-the-art methods for video smoke detection by detecting signs of smoke in MJPEG2000-compressed video.
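As a purely illustrative sketch (not the paper's compressed-domain pipeline), the code below uses the common observation that smoke softens edges: a block whose high-frequency wavelet energy drops markedly over time is flagged as a smoke candidate; the threshold is an assumption.

    # Minimal sketch: high-band wavelet energy of an image block and a simple
    # temporal drop test as a smoke cue.
    import numpy as np
    import pywt

    def highband_energy(block):
        _, (cH, cV, cD) = pywt.dwt2(block.astype(float), 'db2')
        return float(np.sum(cH ** 2) + np.sum(cV ** 2) + np.sum(cD ** 2))

    def smoke_candidate(block_prev, block_curr, drop_ratio=0.5):
        e_prev, e_curr = highband_energy(block_prev), highband_energy(block_curr)
        return e_prev > 0 and (e_curr / e_prev) < drop_ratio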
A sub-picture-based omnidirectional video live streaming platform
Author(s):
Srinivas Gudumasu;
Ahmed Hamza;
Yong He;
Yan Ye
Show Abstract
MPEG Omnidirectional Media Format (OMAF) specifies both a viewport-dependent video profile and a viewport-dependent presentation profile to enable immersive media applications. A sub-picture-based approach to viewport-dependent streaming is one of the main approaches being explored in MPEG OMAF standardization. This paper presents a sub-picture-based omnidirectional video live streaming platform with state-of-the-art technologies integrated on both the server and client sides to illustrate the benefits of such a viewport-dependent omnidirectional video streaming approach. The technologies include omnidirectional video acquisition, sub-picture partitioning, real-time GPU-accelerated HEVC encoding, and DASH-based live streaming. The presented platform supports virtual reality (VR) clients including both VR head-mounted displays (HMDs) and conventional 2D displays. Viewing orientation tracking and real-time viewport extraction are also supported in our platform. As with all live streaming platforms, one of the main goals of our platform is to minimize end-to-end system latency. A new metric called Comparable-Quality Viewport Switching (CQVS) latency is proposed to evaluate the performance of viewport-dependent video streaming and presentation. The CQVS latency is defined as the amount of time it takes for the viewport video quality to improve to a level comparable to that prior to viewport switching. The platform was demonstrated at the Joint 3GPP and VRIF workshop on VR and at the 2018 Mobile World Congress as one of the first OMAF-compliant viewport-dependent live streaming solutions.
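The sketch below shows one way the CQVS latency, as defined above, could be computed from a timestamped viewport-quality trace; the trace format and tolerance parameter are assumptions, not part of OMAF or the platform itself.

    # Minimal sketch: CQVS latency = time from the viewport switch until quality
    # first returns to (approximately) its pre-switch level.
    from typing import List, Optional, Tuple

    def cqvs_latency(trace: List[Tuple[float, float]], switch_time: float,
                     tolerance: float = 0.0) -> Optional[float]:
        """trace: (timestamp, viewport_quality) samples sorted by time."""
        before = [q for t, q in trace if t < switch_time]
        if not before:
            return None
        reference = before[-1]                    # quality just before the switch
        for t, q in trace:
            if t >= switch_time and q >= reference - tolerance:
                return t - switch_time
        return None                               # quality never recovered in the trace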
Capabilities and limitations of visual search in volumetric images: the effect of target discriminability
Author(s):
Tatjana Pladere;
Kristaps Klava;
Vita Konosonoka;
Karola Panke;
Marina Seleznova;
Gunta Krumina
Show Abstract
In diagnostics, radiologists search for anatomical abnormalities in generated three-dimensional data shown on flat displays. Professionals are required to scroll back and forth repeatedly through image stacks and to retain a large amount of visual information. This process leads to working memory overload and a decreased search outcome. In contrast, a volumetric multi-planar display consists of many planes, which makes it possible to visualize data at true physical depth. Thus, theoretically, it can facilitate visual search performance in diagnostics and lessen the need for repeated scrolling. Our work therefore aims to explore in practice the extent to which visual search is effective, as well as to provide evidence on the scrolling strategy through image stacks when data are shown on many display planes. The visual search set consisted of constant-angular-size stimuli presented on ten out of twenty display planes in two depth segments. Participants searched for a target with varying target-distractor similarity within trials. All ten images were presented simultaneously at the beginning of each trial, and participants scrolled freely through them. As a result, target discriminability significantly affected the correct response rate and response time, and search behavior was consistent with the physical design of the stimulus set. In more detail, the number of moves through the image stack almost doubled when target-distractor similarity increased, and, overall, participants avoided repeatedly searching images they had already seen. The developed visual search task is suggested for implementation in studies of visual perception and search behavior in three-dimensional displays.
An adaptive soft threshold image denoising method based on quantum bit gate theory
Author(s):
Lu Han;
Kun Gao;
Yingjie Zhou
Show Abstract
Because images are often contaminated by different kinds of noise during acquisition, transmission, and storage, image denoising is a very important step in image restoration. The key requirement of a denoising algorithm is that the recovered image preserve as much edge detail as possible while the noise is eliminated. Because noise and image details are both part of the high-frequency components of the image, these two goals are, to some extent, contradictory. If the criterion used to distinguish noise from edges and the corresponding treatment are chosen inappropriately, denoising will blur image details (especially edges), which reduces image quality and greatly increases the complexity of subsequent image processing. Since quantum processes and the imaging process have similar probabilistic and statistical characteristics, a soft-threshold denoising algorithm is proposed based on concepts of quantum computation such as the quantum bit, superposition, and collapse. The filter generates an adaptive template according to the local edge characteristics of the image. Because the algorithm is sensitive to edge shape, a balance is achieved between noise suppression and edge preservation.
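As a rough, non-quantum stand-in for the idea described above (the quantum-bit template construction itself is not reproduced), the sketch below applies soft thresholding with a threshold that shrinks where local variance indicates an edge, so edges are preserved while flat regions are smoothed.

    # Minimal sketch: locally adaptive soft thresholding of a 2D image.
    import numpy as np
    from scipy.ndimage import uniform_filter

    def adaptive_soft_threshold(img, base_threshold=20.0, win=5):
        mean = uniform_filter(img, size=win)
        var = uniform_filter(img ** 2, size=win) - mean ** 2
        edge_weight = var / (var + var.mean() + 1e-12)   # ~1 on edges, ~0 on flat areas
        t = base_threshold * (1.0 - edge_weight)         # smaller threshold near edges
        detail = img - mean
        detail = np.sign(detail) * np.maximum(np.abs(detail) - t, 0.0)
        return mean + detail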