Proceedings Volume 11510

Applications of Digital Image Processing XLIII


Volume Details

Date Published: 29 September 2020
Contents: 18 Sessions, 95 Papers, 65 Presentations
Conference: SPIE Optical Engineering + Applications 2020
Volume Number: 11510

Table of Contents

  • Front Matter: Volume 11510
  • Opening Remarks
  • Immersive Imaging
  • Energy Efficient Video Compression and Quality Measurement I
  • Energy Efficient Video Compression and Quality Measurement II
  • Artificial Intelligence in Imaging I
  • Artificial Intelligence in Imaging II
  • Compression I
  • Human Visual System and Perceptual Imaging I
  • Human Visual System and Perceptual Imaging II
  • New Standards in Image and Video Applications
  • Image and Video Processing and Analysis I
  • Medical Imaging I
  • Medical Imaging II
  • Image and Video Processing and Analysis II
  • Compression II
  • Image and Video Processing and Analysis III
  • Poster Session
Front Matter: Volume 11510
Front Matter: Volume 11510
This PDF file contains the front matter associated with SPIE Proceedings Volume 11510, including the Title Page, Copyright information, and Table of Contents.
Opening Remarks
Opening Remarks by Conference Chairs
Welcome to the Applications of Digital Image Processing XLIII conference
Immersive Imaging
Compression and reconstruction of extremely-high resolution holograms based on hologram-lightfield transforms
Holography is often considered the most promising 3D visualization technique, creating virtual images indistinguishable from real ones. However, one of the main barriers to the adoption of holographic displays in wide 3D viewing systems is the very large amount of information contained in a hologram. Indeed, a hologram with a large size and wide viewing angle contains terabytes of data, urging the need for holographic data coding algorithms. In this paper, we propose a data coding algorithm suitable for the compression of holograms containing several billions of pixels. In our proposed approach, each holographic frame is subdivided into pixel blocks which are 2D Fourier transformed. The pixels thus obtained are rearranged to form new complex-valued segments whose amplitudes have characteristics close to orthographic projection images. These segments are ordered in sequence and their real and imaginary parts are encoded using the High-Efficiency Video Coding (HEVC) Main 4:4:4 coding profile with 4:0:0 chroma sampling.
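As an illustrative sketch of the block-wise Fourier transform and segment rearrangement described above (block size and hologram dimensions are assumptions, not the authors' settings):

```python
import numpy as np

def block_fourier_segments(holo, block=128):
    """Split a complex hologram into blocks, 2D-FFT each block, and regroup
    same-frequency coefficients into new complex-valued segments."""
    h, w = holo.shape
    bh, bw = h // block, w // block
    blocks = holo[:bh * block, :bw * block].reshape(bh, block, bw, block)
    blocks = blocks.transpose(0, 2, 1, 3)          # (bh, bw, block, block)
    spectra = np.fft.fft2(blocks, axes=(-2, -1))   # per-block 2D Fourier transform
    # Gather coefficient (u, v) from every block into one segment; the real and
    # imaginary parts of these segments would then be fed to an HEVC encoder.
    return spectra.transpose(2, 3, 0, 1)           # (block, block, bh, bw)

holo = (np.random.randn(1024, 1024) + 1j * np.random.randn(1024, 1024)).astype(np.complex64)
print(block_fourier_segments(holo).shape)  # (128, 128, 8, 8)
```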
Study of 2D foveated video quality in virtual reality
Yize Jin, Meixu Chen, Todd Goodall Bell, et al.
In Virtual Reality (VR), the necessity of immersive videos leads to greater challenges in compression and communication, owing to the much higher spatial resolutions and the rapid, often real-time changes in viewing direction. Foveation in displays exploits the space-variant density of the retinal photoreceptors, which decreases exponentially with increasing eccentricity, to reduce the amount of data from the visual periphery. Foveated compression is gaining relevance and popularity for Virtual Reality. Likewise, being able to predict the quality of displayed foveated and compressed content has become more important. Towards advancing the development of objective quality assessment algorithms for foveated and compressed VR video contents, we built a new VR database of foveated/compressed videos and conducted a human study of perceptual quality on it. A foveated video player having low motion-to-photon latency (~50ms) was designed to meet the requirements of smooth playback, while an eye tracker was deployed to provide gaze direction in real time. We generated 180 distorted videos from 10 pristine 8K videos (30fps) having varying levels and combinations of foveation and compression distortions. These contents were viewed and quality-rated by 36 subjects in a controlled VR setting. Both the subject ratings and the eye tracking data are being made available along with the rest of the database.
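The eccentricity-dependent sampling that foveation exploits can be illustrated with a simple exponential weighting map; the gaze point, pixels-per-degree value and decay rate below are illustrative, not the parameters used in the study:

```python
import numpy as np

def foveation_weight_map(height, width, gaze_xy, ppd, alpha=0.1):
    """Weight that decays exponentially with eccentricity (in visual degrees)
    from the current gaze point; ppd = pixels per degree of the display."""
    ys, xs = np.mgrid[0:height, 0:width]
    ecc_deg = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) / ppd
    return np.exp(-alpha * ecc_deg)

w = foveation_weight_map(2160, 3840, gaze_xy=(1920, 1080), ppd=40.0)
print(w.min(), w.max())
```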
Towards neural network approaches for point cloud compression
Point cloud imaging has emerged as an efficient and popular solution to represent immersive visual information. However, the large volume of data generated in the acquisition process reveals the need of efficient compression solutions in order to store and transmit such contents. Several standardization committees are in the process of finalizing efficient compression schemes to cope with the large volume of information that point clouds require. At the same time, recent efforts on learning-based compression approaches have been shown to exhibit good performance in the coding of conventional image and video contents. It is currently an open question how learning-based coding performs when applied to point cloud data. In this study, we extend recent efforts on the matter by exploring neural network implementations for separate, or joint compression of geometric and textural information from point cloud contents. Two alternative architectures are presented and compared; that is, a unified model that learns to encode point clouds in a holistic way, allowing fine-tuning for quality preservation per attribute, and a second paradigm consisting of two cascading networks that are trained separately to encode geometry and color, individually. A baseline configuration from the best-performing option is compared to the MPEG anchor, showing better performance for geometry and competitive performance for color encoding at low bit-rates. Moreover, the impact of a series of parameters is examined on the network performance, such as the selection of input block resolution for training and testing, the color space, and the loss functions. Results provide guidelines for future efforts in learning-based point cloud compression.
Towards real-time augmented reality with edge servers and 5G communications
P. Topiwala, W. Dai
Augmented and virtual reality are among the hottest new applications in multimedia services, with projected growth for the next few years resembling exponential levels. Indeed, the vision of a world where the virtual and real are nearly indistinguishable, and fully available on demand, to take you anywhere and let you do anything, is tantalizing. But that exponential growth depends critically on achieving technical goals not met by any existing technologies: ultra-low latency, ultra-high resolution and full coverage, and rock-solid reliability. In this paper, we consider the first and perhaps most critical aspect of this challenge: ultra-low latency. The real world has zero latency; for a virtual world to seem real, it must have imperceptible latency. Commercial systems are currently aiming for latencies on the order of 20-30ms. But some planned enterprise deployments (including one defense application: realistic battle simulation) would like latencies below 5ms. We point to a possible approach to this challenge using edge servers and 5G communications and make some observations about what may be achievable with the methods outlined herein in the near future.
Energy Efficient Video Compression and Quality Measurement I
Escaping the complexity-bitrate-quality barriers of video encoders via deep perceptual optimization
A. Chadha, R. Anam, I. Fadeev, et al.
We extend the concept of learnable video precoding (rate-aware neural-network processing prior to encoding) to deep perceptual optimization (DPO). Our framework comprises a pixel-to-pixel convolutional neural network that is trained based on the virtualization of core encoding blocks (block transform, quantization, block-based prediction) and multiple loss functions representing rate, distortion and visual quality of the virtual encoder. We evaluate our proposal with AVC/H.264 and AV1 under per-clip rate-quality optimization. The results show that DPO offers, on average, 14.2% bitrate reduction over AVC/H.264 and 12.5% bitrate reduction over AV1. Our framework is shown to improve both distortion- and perception-oriented metrics in a consistent manner, exhibiting only 3% outliers, which correspond to content with peculiar characteristics. Thus, DPO is shown to offer complexity-bitrate-quality tradeoffs that go beyond what conventional video encoders can offer.
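A hedged sketch of the kind of multi-term training objective described above, with a simple gradient-difference term standing in as a placeholder for the (unspecified) visual-quality loss and an externally supplied rate estimate from the virtual encoder:

```python
import torch
import torch.nn.functional as F

def dpo_style_loss(precoded, reference, est_bits, lambda_rate=0.01, mu_perceptual=0.5):
    """Illustrative combination of rate, distortion and perceptual terms; the
    actual DPO losses, weights and virtual-encoder model are not reproduced."""
    distortion = F.mse_loss(precoded, reference)
    # Stand-in perceptual term: match horizontal gradients of the two images.
    dx_p = precoded[..., 1:] - precoded[..., :-1]
    dx_r = reference[..., 1:] - reference[..., :-1]
    perceptual = F.l1_loss(dx_p, dx_r)
    rate = est_bits.mean()  # rate estimate produced by the virtualized encoder blocks
    return distortion + mu_perceptual * perceptual + lambda_rate * rate

loss = dpo_style_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64), torch.rand(2))
print(float(loss))
```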
Video compression with low complexity CNN-based spatial resolution adaptation
Di Ma, Fan Zhang, David R. Bull
It has recently been demonstrated that spatial resolution adaptation can be integrated within video compression to improve overall coding performance by spatially down-sampling before encoding and super-resolving at the decoder. Significant improvements have been reported when convolutional neural networks (CNNs) were used to perform the resolution up-sampling. However, this approach suffers from high complexity at the decoder due to the employment of CNN-based super-resolution. In this paper, a novel framework is proposed which supports the flexible allocation of complexity between the encoder and decoder. This approach employs a CNN model for video down-sampling at the encoder and uses a Lanczos3 filter to reconstruct full resolution at the decoder. The proposed method was integrated into the HEVC HM 16.20 software and evaluated on JVET UHD test sequences using the All Intra configuration. The experimental results demonstrate the potential of the proposed approach, with significant bitrate savings (more than 10%) over the original HEVC HM, coupled with reduced computational complexity at both encoder (29%) and decoder (10%).
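At the decoder side, the Lanczos3 reconstruction could be approximated as in the sketch below; PIL's LANCZOS resampling approximates a Lanczos3 kernel, and the file names and target size are placeholders:

```python
from PIL import Image

def decoder_side_upsample(decoded_path, out_size):
    """Reconstruct full resolution from the decoded low-resolution frame with a
    Lanczos filter instead of a CNN, keeping decoder complexity low."""
    low_res = Image.open(decoded_path)
    return low_res.resize(out_size, resample=Image.LANCZOS)  # out_size = (width, height)

full = decoder_side_upsample("decoded_960x540.png", (3840, 2160))
full.save("reconstructed_3840x2160.png")
```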
Per-clip adaptive Lagrangian multiplier optimisation with low-resolution proxies
Optimising the parameters of a video codec for a specific video clip has been shown to yield significant bitrate savings. In particular, per-clip optimisation of the Lagrangian multiplier in rate-controlled compression has led to BD-Rate improvements of up to 20% using HEVC. Unfortunately, this was computationally expensive as it required multiple measurements of rate-distortion curves, which meant in excess of fifty video encodes were used to generate that level of savings. This work focuses on reducing the computational cost of repeated video encodes by using a lower resolution clip as a proxy. Features extracted from the low resolution clip are then used to learn the mapping to an optimal Lagrangian multiplier for the original resolution clip. In addition to reducing the computational cost and encode time by using lower resolution clips, we also investigate the use of older, but faster, codecs such as H.264 to create proxies. This work shows that the computational load is reduced by up to 22 times using 144p proxies, and more than 60% of the possible gain at the original resolution is achieved. Our tests are based on the YouTube UGC dataset, using the same computational platform; hence our results are based on a practical instance of the adaptive bitrate encoding problem. Further improvements are possible by optimising the placement and sparsity of operating points required for the rate-distortion curves. Our contribution is to improve the computational cost of per-clip optimisation of the Lagrangian multiplier, while maintaining the BD-Rate improvement.
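A hedged sketch of the proxy-to-λ mapping: features from fast low-resolution proxy encodes regress an optimal Lagrangian scaling factor. The features, regressor and data below are synthetic placeholders, not the paper's pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical per-clip features from 144p proxy encodes (e.g., proxy bitrate,
# mean QP, motion statistics); shapes and values are purely illustrative.
X_train = np.random.rand(200, 6)              # features for 200 training clips
y_train = np.random.uniform(0.5, 2.0, 200)    # optimal Lagrangian scaling factors

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

x_new_clip = np.random.rand(1, 6)             # features from a new clip's proxy
print(model.predict(x_new_clip)[0])           # predicted scaling for the full-resolution encode
```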
Energy Efficient Video Compression and Quality Measurement II
Rate-distortion video coding and uncertainties: to be blindly chasing marginal improvement or to be greener
The last decade has witnessed the rapid development of video encoding and video quality assessment. However, each new generation of codecs has come with a significant increase in computational complexity, especially to exploit their full potential. To ease the exponential growth in computational load, a greener video encoding scheme that consumes less power should be considered. The improvement of encoding efficiency is driven by Rate-Distortion Optimization (RDO), where the goal is to minimize the coding distortion under a target coding rate. As distortions are quantified by quality metrics, whether the applied quality metric is capable of accurately judging the perceived quality is vital for selecting encoding recipes. In most cases, more complex codecs are developed seeking any enhancement of the quality scores predicted by an ad hoc quality metric, e.g., 1 dB by PSNR or 1/100 of the SSIM scale. Despite some rules of thumb, whether such improvement is worth a significant increase in power consumption is questionable, as the resolution of most quality metrics with respect to human judgement is often above the measured improvement. In this work, we propose a simple model to quantify the uncertainty/resolution of a quality metric, where confidence intervals (CI) of the quality scores predicted by the metric are computed with respect to existing observed ground truth (e.g., human observer opinion). As a possible use case, if the CIs of two encoding recipes overlap, the greener one could be selected. Extensive experiments have been conducted on several datasets tailored to different purposes, and the uncertainties of mainstream quality metrics are reported. Perspectives on the trade-off between complexity and efficiency of encoding techniques are provided.
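A hedged sketch of one way to obtain such a confidence interval, bootstrapping the metric-to-MOS prediction error; the paper's exact uncertainty model is not reproduced:

```python
import numpy as np

def metric_resolution_ci(metric_scores, mos, n_boot=2000, level=0.95, seed=0):
    """Bootstrap a confidence interval on the metric-vs-MOS prediction error,
    as a rough proxy for the 'resolution' of a quality metric."""
    rng = np.random.default_rng(seed)
    metric_scores, mos = np.asarray(metric_scores), np.asarray(mos)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(mos), len(mos))
        # Linear mapping of metric scores to MOS on the bootstrap sample.
        a, b = np.polyfit(metric_scores[idx], mos[idx], 1)
        errors.append(np.mean(np.abs(a * metric_scores[idx] + b - mos[idx])))
    lo, hi = np.percentile(errors, [(1 - level) / 2 * 100, (1 + level) / 2 * 100])
    return lo, hi

# Placeholder data: metric predictions and the corresponding subjective MOS.
print(metric_resolution_ci([30, 35, 40, 42, 45, 48], [2.1, 2.8, 3.5, 3.9, 4.2, 4.6]))
```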
Energy efficient perceptual video quality measurement (VMAF) at scale
Ajayshyam ., Jay N. Shingala, Naveen Kumar Thangudu, et al.
The Over The Top (OTT) video industry has witnessed exponential growth in recent years with the introduction of a number of streaming services. Better video quality at optimum encoding parameters, such as bitrates and resolutions, is a must to meet the demand for cost-effective solutions. Hence, an objective and efficient method to measure perceptual video quality is extremely vital for customer satisfaction and retention. VMAF is one such full-reference quality metric which provides consistent, accurate assessment of video quality. It uses an internally trained ML-based SVR model to fuse spatial and temporal quality features such as VIF (Visual Information Fidelity), ADM (or Detail Loss Metric, DLM) and motion scores. We analyzed the complexity of such metrics in terms of core floating-point operations and the number of loads and stores. In this paper, we propose replacing the existing floating-point operations with an optimal fixed-point (integer) implementation without a drop in accuracy. This is achieved by optimally selecting precision requirements at every stage of processing, which helps in efficient memory usage and improved performance. The proposed change will significantly help VMAF's usability for deployment at scale, application in low-power devices and live streaming without compromising accuracy.
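The flavor of the float-to-fixed-point conversion can be shown with a toy Q15 multiply; the actual per-stage precisions chosen inside VMAF differ and are not reproduced here:

```python
# Toy illustration of replacing a floating-point multiply with a Q15 fixed-point
# equivalent: values are scaled by 2**15, multiplied as integers, and rescaled.
SHIFT = 15

def to_q15(x):
    return int(round(x * (1 << SHIFT)))

def q15_mul(a, b):
    return (a * b) >> SHIFT

w_float, x_float = 0.370033, 0.81
w_q, x_q = to_q15(w_float), to_q15(x_float)

print(w_float * x_float)                  # floating-point result
print(q15_mul(w_q, x_q) / (1 << SHIFT))   # fixed-point result, small rounding error
```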
Cross-codec encoding optimizations for video transcoding
Gaurang Chaudhari, Hariharan Lalgudi, Harikrishna Reddy
A video on demand (VOD) server generates multiple video output qualities (bit rates), resolutions and codecs to deliver the best video quality for every viewer's internet connection. While each codec optimizes its tools and makes computation-quality tradeoffs, there isn't much work on exploiting computation reduction across codecs. In this work, we propose some methods to achieve this. Specifically, we use VP9 mode decisions to reduce the computational requirements of AV1 encoding.
Hardware acceleration of video quality metrics
Deepa Palamadai Sundar, Visala Vaduganathan, Xing C. Chen
Quality Metrics (QM) provide an objective way to measure perceived video quality. These metrics are very compute intensive and are currently computed in software. In this paper, we propose an accelerator that can compute metrics like the single-scale and multi-scale Structural Similarity Index (SSIM, MS-SSIM) and Visual Information Fidelity (VIF). The proposed accelerator offers an energy-efficient solution compared to traditional CPUs. It improves memory bandwidth utilization by computing multiple quality metrics simultaneously.
Efficient measurement of quality at scale in Facebook video ecosystem
This paper describes the FB-MOS metric, which measures video quality at scale in the Facebook ecosystem. As the quality of the uploaded UGC source itself varies widely, FB-MOS consists of both a no-reference component to assess input (upload) quality and a full-reference component, based on SSIM, to assess the quality preserved in the transcoding and delivery pipeline. Note that the same video may be watched on a variety of devices (mobile/laptop/TV) in varying network conditions that cause quality fluctuations; moreover, the viewer can switch between in-line view and full-screen view during the same viewing session. We show how the FB-MOS metric accounts for all this variation in viewing conditions while minimizing the computation overhead. Validation of this metric on Facebook content has shown an SROCC of 0.9147 using internally selected videos. The paper also discusses some of the optimizations to reduce metric computation complexity and scale the complexity in proportion to video popularity.
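SROCC against subjective scores can be computed as in this sketch with placeholder data:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical arrays: predicted quality scores and subjective MOS for the same videos.
predicted = np.array([3.1, 4.2, 2.5, 4.8, 3.9, 2.2])
subjective = np.array([3.0, 4.5, 2.4, 4.9, 3.6, 2.1])

srocc, p_value = spearmanr(predicted, subjective)
print(f"SROCC = {srocc:.4f}")
```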
Artificial Intelligence in Imaging I
Automatic classification of citrus aurantifolia based on digital image processing and pattern recognition
Victor Tuesta-Monteza, Freddy Alcarazo, Heber I. Mejiá-Cabrera, et al.
Citrus Aurantifolia swingle is grown on the northern coast of Peru for domestic consumption and export. Due to its high level of acidity, it is an indispensable ingredient in the preparation of fish ceviche, the traditional dish of Peruvian gastronomy. Lemons are classified according to their color into yellow, green and pinton (green lemons already showing a hint of yellow), since the yellow ones are for national consumption, while the other two types are for export. This selection is done manually. The process is time consuming, and additionally lemons are frequently misclassified due to lack of concentration, exhaustion or inexperience of the worker, affecting the quality of the product sold in domestic and foreign markets. Therefore, this paper introduces a new method for the automatic classification of Citrus Aurantifolia, which comprises four stages: image acquisition, image processing, feature extraction, and classification. A mechanical prototype for image acquisition in a controlled environment and software for the classification of lemons were developed. A new segmentation method was implemented, which makes use only of the information obtained from the blue channel. From the segmented images we obtained the color characteristics, selecting the best descriptors in the RGB and CIELAB spaces, finding that the red channel allows the best accuracy. Two classification models were used, SVM and KNN, obtaining an accuracy of 99.04% with KNN.
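A toy sketch of the overall flow (blue-channel thresholding, color descriptors, KNN classification); the threshold, feature choice and training data below are placeholders, not the paper's values:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def segment_blue_channel(rgb, threshold=90):
    """Foreground mask obtained from the blue channel only (threshold is illustrative)."""
    return rgb[..., 2] < threshold

def color_features(rgb, mask):
    """Mean RGB descriptors over the segmented lemon region."""
    return rgb[mask].mean(axis=0)

# Hypothetical training data: one feature vector per lemon, labels
# 0 / 1 / 2 = yellow / green / pinton.
X_train = np.random.rand(60, 3) * 255
y_train = np.random.randint(0, 3, 60)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

img = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
feat = color_features(img, segment_blue_channel(img))
print(knn.predict([feat]))
```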
New method for subject identification based on palm print
Nowadays, new security and protection systems for citizens are being developed, since criminals have found techniques to defeat those already in use, such as those based on fingerprints, facial recognition, iris and voice. Thus, using biometric data, new systems are being developed that are more secure, infallible and fast at identifying each person, making it impossible to impersonate them, as has happened with other methods. Recently, new identification methods have been proposed based on hand geometry and palm prints, using texture techniques to identify hand characteristics such as ridges, edges, points, and textures. Following this trend, this paper presents a method based on the detection of the palm print, acquired by contact through the use of a scanner. For this purpose, the image is segmented to detect the silhouette of the hand and delimit the working area, achieving greater speed in identification. The images are then used as input to a VGG-16 convolutional neural network for learning and identification of subjects, achieving 100% accuracy.
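A minimal transfer-learning sketch along these lines, assuming 224x224 segmented palm-print crops and a hypothetical number of subjects; the paper's actual training setup is not reproduced:

```python
import tensorflow as tf

# VGG16 backbone as a feature extractor, with a small classification head on top.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(50, activation="softmax")(x)  # 50 hypothetical subjects
model = tf.keras.Model(base.input, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(palm_images, subject_labels, epochs=10)  # palm-print dataset not shown
```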
Soil salinity estimation of sparse vegetation based on multispectral image processing and machine learning
Heber I. Mejia-Cabrera, Daniel Vilchez, Victor Tuesta-Monteza, et al.
The main effects of soil degradation include loss of nutrients, desertification, salinization, deterioration of soil structure, wind and water erosion, and pollution. Soil salinity is an environmental hazard present worldwide, especially in arid and semi-arid areas, which occurs mainly due to irrigation and other intensified agricultural activities. Therefore, the measurement of soil degradation in areas of sparse vegetation is of great importance in Peru. Two commonly used methods for estimating soil salinity are based on measurements of electrical conductivity. On one hand, one of these methods is quite accurate but requires many field samples and laboratory tests, which makes it quite expensive and impractical for measuring large areas of the Peruvian coast. On the other hand, the second method is based on relative conductivity measurements in situ, being less accurate but equally very expensive when measuring very large areas. For this reason, the use of multispectral imaging has been proposed for this purpose, using linear regression techniques. Following this trend, in this work the different descriptors used for the estimation were studied, comparing the correlations between the salinity indices and the soil samples, and two estimators based on SVM and PLSR were used to verify whether the estimation improved. The PSWIR band, followed by the red one, was found to have the highest correlation, and the indices based on the combination of these bands provide the best estimate with the classifiers evaluated.
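For illustration, the two estimators can be set up in scikit-learn as below; the band/index features and conductivity targets are synthetic stand-ins, not the field data used in the study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression

# Hypothetical design matrix: per-sample band reflectances and derived salinity
# indices, with laboratory electrical-conductivity values as regression targets.
X = np.random.rand(80, 5)      # e.g., SWIR, red, NIR, and two combined indices
y = np.random.rand(80) * 10    # electrical conductivity (dS/m)

svr = SVR(kernel="rbf", C=10.0).fit(X, y)
pls = PLSRegression(n_components=3).fit(X, y)

x_new = np.random.rand(1, 5)
print("SVR estimate:", svr.predict(x_new)[0])
print("PLSR estimate:", pls.predict(x_new)[0, 0])
```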
Video analysis methods to track motion across frames
Traditional methods to analyze video containing human action are based on the extraction of the spatial content from image frames and the temporal variation across the image frames of the video. Typically, images contain redundant spatial content and temporal invariance. For instance, background data and stationary objects represent redundant spatial content as well as temporal invariance. This redundancy leads to an increase in storage requirements and computation time. This paper focuses on the analysis of the key point data obtained from the capture of body movement, hand gestures, and facial expression in video-based sign language recognition. The key point data are obtained from OpenPose. OpenPose provides two-dimensional estimates of the human pose for multiple persons in real time. In this paper, the K-means clustering method is applied to the key point data. The K-means clustering method selects the key frames based on the number of centroids formed from the key point data. The method described in this paper generates the data required for deep learning applications.
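A small sketch of the key-frame selection step, assuming OpenPose keypoints flattened to one vector per frame; the number of clusters is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_key_frames(keypoints, n_key_frames=8):
    """keypoints: (n_frames, n_points*2) array of OpenPose 2D coordinates.
    Cluster the per-frame pose vectors and keep the frame closest to each centroid."""
    km = KMeans(n_clusters=n_key_frames, n_init=10, random_state=0).fit(keypoints)
    key_idx = [int(np.argmin(np.linalg.norm(keypoints - c, axis=1)))
               for c in km.cluster_centers_]
    return sorted(set(key_idx))

frames = np.random.rand(300, 2 * 25)   # 300 frames, 25 body keypoints (x, y)
print(select_key_frames(frames))
```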
Demystify squeeze networks and go beyond
Ruiyuan Lin, Yuhang Xu, Hamza Ghani, et al.
Small neural networks (NNs) that have a small model size find applications in mobile and wearable computing. One famous example is the SqueezeNet, which achieves the same accuracy as the AlexNet yet has 50x fewer parameters. It has inspired a few follow-ups and architectural variants, which were built upon ad hoc arguments and experimentally justified. It remains a mystery why the SqueezeNet works so efficiently. In this work, we attempt to provide a scientific explanation for the superior performance of the SqueezeNet. The function of the fire module, which is a key component of the SqueezeNet, is analyzed in detail. We study the evolution of cross-entropy values across layers and use visualization tools to shed light on its behavior with several illustrative examples.
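For reference, the fire module under analysis can be written compactly as below; the channel counts follow the commonly published SqueezeNet configuration, and this is an illustrative re-implementation rather than the authors' code:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet fire module: a 1x1 'squeeze' convolution followed by parallel
    1x1 and 3x3 'expand' convolutions whose outputs are concatenated."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```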
Adaptive image denoising using a deep neural network with a noise correction map
The smartphone camera is becoming the primary choice for photography among general users due to its convenience and rapidly improving image quality. However, it is more prone to noise than a professional DSLR camera due to its smaller sensor. Image noise, especially in low-light situations, is a critical problem that must be addressed to obtain high quality photos. Image denoising has thus remained an important low-level vision topic over the years, with both traditional and learning-based techniques used to mitigate this problem. We propose an adaptive Deep Neural Network based Noise Reduction (DNN-NR) algorithm to address the denoising problem in smartphone images. Image noise was modeled from photos captured under different light settings using a Poisson-Gaussian noise model, which better approximates the signal-dependent (photon sensing) and stationary disturbances in the sensor data. Using this noise model, synthetic noisy datasets were prepared to mimic photos captured under varying light conditions and to train the network. A noise correction map based on camera and image information like ISO, vignetting map and image gray level was provided as an input to the network. This correction map provides an indication of the local noise level to help the network adaptively denoise photos. Experimental results show that our adaptive neural network based denoising approach produced significantly better denoised images with higher PSNR and MOS quality scores in comparison to a standard denoising method like CBM3D across varying light conditions. In addition, using a locally varying noise map helped preserve more detail in the denoised images.
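The Poisson-Gaussian synthesis step can be sketched as follows; the parameters a and b below are illustrative, not calibrated values from the paper:

```python
import numpy as np

def add_poisson_gaussian_noise(clean, a=0.01, b=0.0004, rng=None):
    """Synthesize noise following a Poisson-Gaussian model
    y = a * Poisson(x / a) + N(0, b), where 'a' scales the signal-dependent
    (photon) component and 'b' is the signal-independent noise variance."""
    rng = rng or np.random.default_rng(0)
    clean = np.clip(clean, 0.0, 1.0)
    shot = a * rng.poisson(clean / a)
    read = rng.normal(0.0, np.sqrt(b), clean.shape)
    return np.clip(shot + read, 0.0, 1.0)

clean = np.random.rand(256, 256)
noisy = add_poisson_gaussian_noise(clean)
```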
Artificial Intelligence in Imaging II
Long-distance spatial position measurement based on multi-camera system
Camera calibration is the first and most important step of spatial position measurement. To improve the measurement precision over a large field of view and long distances, the internal and external parameters of the cameras need to be calibrated accurately. Therefore, a simple and accurate calibration method is proposed to solve the problem that a small calibration board cannot appear in the field of view of multiple cameras at the same time. We calibrate the internal and external parameters of each camera separately. For the internal parameters, we use an analytical solution estimation method after collecting calibration plate images for feature point detection. For the external parameters, we accurately obtain the relative positions of the cameras with the help of a total station, and then unify the pose of each camera into the same coordinate system through a rigid body transformation to obtain the parameters. Then, the maximum likelihood method is used to optimize the estimates of both the internal and external parameters to improve the calibration accuracy. Finally, the three-dimensional coordinates of the target can be obtained by triangulation. The experimental results show that this method meets the calibration accuracy requirement, and the error of the 3D spatial position of the target is at the centimeter level.
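The final triangulation step might look like the following OpenCV sketch, with placeholder projection matrices standing in for the calibrated cameras expressed in the unified coordinate system:

```python
import cv2
import numpy as np

# P1, P2: 3x4 projection matrices of two calibrated cameras in the common
# (total-station-unified) coordinate system; values here are placeholders.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))]).astype(np.float64)
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])]).astype(np.float64)

pts1 = np.array([[320.0], [240.0]])   # target pixel coordinates in camera 1
pts2 = np.array([[300.0], [240.0]])   # corresponding pixel in camera 2

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x1 result
X = (X_h[:3] / X_h[3]).ravel()
print("3D position:", X)
```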
StressNet: A deep convolutional neural network for recovering the stress field from isochromatic images
Juan C. Briñez de León, Mateo Rico-Garcia, John W. Branch, et al.
Extending photoelasticity studies to industrial applications is a complex process generally limited by the image acquisition assembly and the computational methods for demodulating the stress field wrapped into the color fringe patterns. In response to such drawbacks, this paper proposes an auto-encoder based on deep convolutional neural networks, called StressNet, to recover the stress map from a single isochromatic image. In this case, the public dataset of synthetic photoelasticity images 'Isochromatic-art' was used for training and testing, achieving an averaged performance of 0.95 +/- 0.04 according to the structural similarity index. With these results, the proposed network is capable of obtaining a continuous stress surface, which represents a great opportunity toward developing real-time stress evaluations.
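Evaluation against ground-truth stress maps can be reproduced in spirit with scikit-image's SSIM, as in this sketch with synthetic arrays:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Hypothetical stress maps: network prediction vs. ground truth, both normalized to [0, 1].
predicted = np.random.rand(128, 128).astype(np.float32)
reference = np.random.rand(128, 128).astype(np.float32)

score = ssim(reference, predicted, data_range=1.0)
print(f"SSIM = {score:.3f}")
```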
Generalized adversarial networks for stress field recovering processes from photoelasticity images
To overcome conventional photoelasticity limitations when evaluating the stress field in loaded bodies, this paper proposes a Generative Adversarial Network (GAN) that maintains performance while gaining experimental stability and shortening response time. Due to the absence of public photoelasticity data, a synthetic dataset was generated by using analytic stress maps and crops from them. In this case, more than 100000 pairs of images relating fringe colors to their respective stress surfaces were used to learn to unwrap the stress information contained in the fringes. The main results of the model indicate its capability of recovering the stress field, achieving an averaged performance of 0.93±0.18 according to the structural similarity index (SSIM). These results represent a great opportunity for exploring GAN models in real-time stress evaluations.
Recognition of objects based on deep learning in an RPAS
Combining research lines from robotics and artificial intelligence with computer vision involves robotic systems that can recognize and understand images and scenes, generally integrating the areas of object detection, image recognition and image generation. Object detection is sophisticated, and even more so in robotics, due to the countless applications that can be developed through image processing. This article shows the implementation of an NVIDIA® Jetson development board in a remotely piloted aircraft system (RPAS) for object recognition based on focal loss, and discusses the challenges faced in obtaining results with the expected final solution.
Machine learning based detection of digital documents maliciously recaptured from displays
Saleh Gholam-Zadeh, Evgeniy Upenik, Guy Hatarsi, et al.
We used to say "seeing is believing": this is no longer true. Digitization is changing all aspects of life and business. One of the more noticeable impacts is in how business documents are being authored, exchanged and processed. Many documents such as passports and IDs are at first created in paper form but are immediately scanned, digitized, and further processed in electronic form. Widely available photo editing software makes image manipulation quite literally child's play, increasing the number of forged contents tremendously. With growing concerns over the authenticity and integrity of scanned and image-based documents such as passports and IDs, it is more than urgent to be able to quickly validate scanned and photographic documents. The same machine learning that is behind some of the most successful content manipulation solutions can also be used as a countermeasure to detect them. In this paper, we describe an efficient recaptured digital document detector based on machine learning. The core of the system is a binary classification approach based on a support vector machine (SVM), properly trained with authentic and recaptured digital passports. The detector reports when it encounters a digital document that is the result of photographic capture of another digital document displayed on an LCD monitor. To assess the proposed detector, a specific dataset of authentic and recaptured passports, captured with a number of different cameras, was created. Several experiments were set up to assess the overall performance of the detector as well as its efficacy in special situations, such as when the machine learning engine is trained on a specific type of camera or when it encounters a new type of camera for which it was not trained. Results show that the performance of the detector remains above 90 percent accuracy for the large majority of cases.
Deep learning and video quality analysis: towards a unified VQA
P. Topiwala, W. Dai, J. Pian
Video makes up 80% of internet traffic today and is still rising. Most of it is meant for human consumption. But for 40 years, the video coding industry has been using mean-squared-error-based PSNR, effectively the most basic full reference (FR) video quality measure, as the main tool for assessing video quality, despite its long-known poor correlation with subjective video ratings. Moreover, in the encoder, the sum of absolute differences (SAD) is used instead of MSE to save multiplications. Meanwhile, many current video applications such as YouTube do not have access to a pristine reference and have had to develop ad hoc methods to attempt to monitor the volumes of video in their servers in a challenging no reference (NR) setting. For this, they have in part leaned on the Gaussianity of natural scene statistics (NSS), evaluating how video distortions affect or alter those statistics to create a measure of quality. An entire cottage industry has sprung up to create both full-reference and no-reference video quality assessment (FR-, NR-VQA) measures that can adequately meet the needs for monitoring and stream selection in the massive worldwide video services industry. These two fields have so far gone their separate ways, as there seemed no sensible way to bring them under one roof. In this paper, we attempt a first synthesis of FR and NR VQA, which we simply call FastVDO Quality (FVQ). It incorporates the lessons learned from the Video Multimethod Assessment Fusion (VMAF) algorithm introduced by Netflix in 2016 and the NSS-based assessment concepts developed by Univ. of Texas and Google to treat the NR case, culminating in the algorithms VIIDEO and SLEEQ, as well as our own research over the past several years in using learning-based methods in VQA. We provide some early indications that this approach can bear fruit for both NR- and FR-VQA and may even offer state-of-the-art results in each field.
Compression I
Scalable trellis quantization for JPEG XS
Thomas Richter
Trellis quantization, as a structured vector quantizer, is able to improve the rate-distortion performance of traditional scalar quantizers. As such, it has found its way into the JPEG 2000 standard, and also recently as an option in HEVC. In this paper, a trellis quantization option for JPEG XS is considered and analyzed; JPEG XS is a low-complexity, low-latency, high-speed "mezzanine" codec for Video over IP transmission in professional production environments and industrial applications where high compression rates are of lesser importance than visually lossless compression at high speed. A particular challenge of trellis quantization is to make it compatible with applications where sharp rate thresholds have to be satisfied, such as in JPEG 2000 and JPEG XS. While the JPEG 2000 standard originally proposes a Lagrangian "a priori" rate allocation that is based on statistical models, such methods are less suitable for JPEG XS, which only has a very small prefetch window of less than 30 lines available to drive its rate allocation. In this paper, a simple trellis quantization option for JPEG XS is proposed that is compatible with the hard-bound rate-allocation requirements of this coding standard.
Design of the intra subpartition mode in VVC and its optimized encoder search in VTM
Santiago De-Luxán-Hernández, Valeri George, Gayathri Venugopal, et al.
The Intra Subpartition (ISP) mode is one of the intra prediction tools incorporated into the new Versatile Video Coding (VVC) standard. ISP divides a luma intra-predicted block along one dimension into 2 or 4 smaller blocks, called subpartitions, that are predicted using the same intra mode. This paper describes the design of this tool and its encoder search implementation in the VVC Test Model 7.3 (VTM-7.3) software. The main challenge of the ISP encoder search is the fact that the mode pre-selection based on the sum of absolute transformed differences typically utilized for intra prediction tools is not feasible in the ISP case, given that it would require knowing beforehand the values of the reconstructed samples of the subpartitions. For this reason, VTM employs a different strategy aimed at overcoming this issue. The experimental tool-off tests carried out for the All Intra configuration show a gain of 0.52% for the 22-37 Quantization Parameter (QP) range with an associated encoder runtime of 85%. The results improve to a 1.06% gain and an 87% encoder runtime in the case of the 32-47 QP range. Analogously, for the tool-on case the results for the 22-37 QP range are a 1.17% gain and a 134% encoder runtime, and this improves in the 32-47 QP range to a 1.56% gain and a 126% encoder runtime.
Fast encoding parameter selection for convex hull video encoding
Ping-Hao Wu, Volodymyr Kondratenko, Ioannis Katsavounidis
With the recent development of video codecs, compression efficiency is expected to improve by at least 30% over their predecessors. Such impressive improvements come with a significant, typically orders-of-magnitude, increase in computational complexity. At the same time, objective video quality metrics that correlate increasingly better with human perception, such as SSIM and VMAF, have been gaining popularity. In this work, we build on the per-shot encoding optimization framework that can produce the optimal encoding parameters for each shot in a video, albeit itself carrying another significant computational overhead. We demonstrate that, with this framework, a faster encoder can be used to predict encoding parameters that can be directly applied to a slower encoder. Experimental results show that we can approximate the optimal convex hull to within 1%, with a significant reduction in complexity. This can significantly reduce the energy spent on optimal video transcoding.
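The convex hull of per-shot (bitrate, quality) operating points can be computed with a standard upper-hull sweep, as in this illustrative sketch with made-up trial encodes:

```python
def rate_quality_convex_hull(points):
    """points: list of (bitrate, quality) tuples from trial encodes of one shot.
    Returns the points on the upper convex hull (best quality per bitrate)."""
    pts = sorted(points)                      # ascending bitrate
    hull = []
    for p in pts:
        # Pop points that fall below the hull (cross-product turn test).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

trial_encodes = [(500, 78.0), (900, 84.0), (1200, 85.0), (1500, 88.5), (2500, 91.0)]
print(rate_quality_convex_hull(trial_encodes))
```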
Human Visual System and Perceptual Imaging I
A comparative performance evaluation of VP9, x265, SVT-AV1, VVC codecs leveraging the VMAF perceptual quality metric
The prevalence of video-driven applications, leveraging over-the-top video on demand services as well as live video streaming applications, dominates network traffic over today's internet landscape. As such, they necessitate efficient video compression methods to accommodate the desired quality of service and hence user experience. In this study, we compare the performance of the emerging versatile video coding (VVC) standard, the recently released AV1 encoder (using the SVT-AV1 instance), the established high efficiency video coding (HEVC) standard via its x265 implementation, and the earlier VP9 codec. We used selected videos from three different datasets, namely UT LIVE (432p) and HEVC test sequences (480p, 720p, 1080p), that provide diversity in video content, video resolutions, and frame rates. The experimental setup involved fixed quality encoding using four different rate points, and more specifically, QP values of 27, 35, 46, 55 for AV1 and VP9 and QP values of 22, 27, 32, 37 for the VVC and x265 codecs. For estimating bitrate gains, we used the BD-RATE algorithm using both PSNR and VMAF for objective video quality assessment (VQA). We found that VVC achieved the best video coding performance, significantly outperforming all other codecs. AV1 consistently outperformed x265, but with narrow margins in some video sequences, suggesting that a cautious selection between the two codecs needs to be based on application-specific criteria. Within the group of considered codecs, VP9 required the highest bitrates. Ongoing work involves extending the examined video dataset pool to different resolutions (e.g., 240p, 1600p) while investigating the correlation between subjective and objective VQA scores.
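The BD-RATE computation used for such comparisons follows the classical Bjontegaard procedure: fit a cubic polynomial of log-rate versus quality for each codec and integrate over the overlapping quality interval. A hedged sketch with illustrative data points:

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Bjontegaard-delta rate: average percent bitrate change at equal quality."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(quality_ref, lr_ref, 3)
    p_test = np.polyfit(quality_test, lr_test, 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100.0

# Illustrative rate (kbps) / VMAF points for an anchor and a test codec: -20% expected.
print(bd_rate([1000, 2000, 4000, 8000], [70, 80, 88, 94],
              [800, 1600, 3200, 6400], [70, 80, 88, 94]))
```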
A comparative study of HEVC, VVC, VP9, AV1 and AVS3 video codecs
Xin Zhao, Shan Liu, Liang Zhao, et al.
The video codec is the core engine that has been driving various video-related applications for the past several decades, such as online video streaming and video conferencing. Driven by the drastic growth of global video traffic and the improved computation power of various devices, the evolution of video compression technology has delivered several generations of video coding standards. In this paper, the technologies used in recent video codecs are described and their coding performance is compared, including HEVC and VVC developed by ISO/IEC MPEG and ITU-T VCEG, VP9 developed by Google, AV1 developed by AOM, and AVS3 (IEEE 1857) developed by the Audio Video coding Standard workgroup of China. The datasets cover a wide range of typically used video resolutions up to 4K. The experiments are performed by collecting four data points for each test sequence, and the coding efficiency is measured by the popular BD-Rate metric.
Human Visual System and Perceptual Imaging II
Assessing objective video quality in systems with multi-generation transcoding
In a multi-generation transcoding system, the source may be an encoded mezzanine video whose objective video quality metrics (e.g., PSNR, SSIM) are unknown. The transcoding process yields objective quality metrics that are relative to the encoded source video, which do not indicate the actual quality of the transcoded video relative to the original uncompressed reference video. In this paper, we present an approach for estimating the objective quality metrics of the encoded mezzanine and demonstrate that it has higher accuracy compared to a well-known scheme. Finally, we derive bounds for the end-to-end objective quality metrics of the transcoded video, and use them for controlling the transcoding process to ensure that the final transcoded video satisfies a quality criterion.
Video transcoding optimization based on input perceptual quality
Yilin Wang, Hossein Talebi, Feng Yang, et al.
Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that human subjects were more tolerant to changes in low quality videos than in high quality ones, which suggests that current transcoding frameworks can be further optimized by considering perceptual quality of the input. In this paper, an efficient machine learning metric is proposed to detect low quality inputs, whose bitrate can be further reduced without sacrificing perceptual quality. To evaluate the impact of our method on perceptual quality, we conducted a crowd-sourcing subjective experiment, and provided a methodology to evaluate statistical significance among different treatments. The results show that the proposed quality guided transcoding framework is able to reduce the average bitrate up to 5% with insignificant perceptual quality degradation.
Video quality vs. video usability in the era of surveillance
P. Topiwala, W. Dai
More than 100 years since the development of movies, film, and television, we have entered a new age of media, the surveillance era. Previously, media was used to entertain, or inform (news). Today, we are also increasingly collecting media to inform ourselves about who, what, and where. Public places from malls to airports to city streets are laced with cameras that watch us with hidden eyes. Home security systems now routinely embed cameras both in and around residences. Autonomous vehicles represent a massive new use of sensors, including RGB, infrared, and lidar. And there are eyes in the skies, not only over military battlespaces, but even over urban areas, as personal drones compete with package delivery robots. All of this surveillance video has a purpose (e.g., object detection, face recognition, object tracking, activity recognition), which is increasingly pursued by machines, not human observers. In the era of surveillance, what does the quality of video mean? One simple answer is: video quality is a measure of how well it supports the purpose. This is in an abstract sense the same as it is for entertainment video. But quantifying it requires new tools. In this paper, we address the case of surveillance, especially aerial surveillance. Our ideas apply equally to military and commercial applications like autonomous vehicles and package delivery.
New Standards in Image and Video Applications
Interpolation filtering for intra prediction in versatile video coding
Gagan Rath, Fabien Racape, Fabrice Urban, et al.
In this paper, we propose a novel interpolation of reference samples for intra prediction in Versatile Video Coding (VVC). To interpolate a predictor value between two reference samples, the method uses the four nearest reference samples, as does the existing cubic filter in VVC, but with a simpler design that does not require pre-computing the filter coefficients. We model the signal as a sum of two components, where the first component is the classical linear interpolation, and the second component is a corrective term that accounts for the change due to the two farther samples. To arrive at this model, we model the signal as a sum of two quadratic functions, where each quadratic function models the signal with three adjacent samples. The corrective term is attributed to the quadratic term in the resulting model, which can be calculated on the fly. Since models based on four samples can lead to large errors at the edges of objects, we propose to use a thresholding method to decide between the proposed model and the usual linear interpolation. Besides the BD-rate performance gain, the advantages of the proposed method are lower complexity, no memory storage of filter coefficients, and a uniform method for both luma and chroma components. The proposed method applied to VTM 7.0 intra prediction results in BD-rate gains of 0.13% for luma and 0.30% for chroma with lower decoding complexity.
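The idea of a linear term plus an on-the-fly quadratic correction can be illustrated generically as below; this is neither the standardized VVC filter nor the paper's exact derivation, just a sketch of the principle (it reproduces quadratic signals exactly):

```python
def interpolate_reference(p_m1, p0, p1, p2, t):
    """Illustrative four-tap interpolation at fractional position t in [0, 1)
    between p0 and p1: the linear term plus a quadratic corrective term derived
    from the two farther samples (p_m1, p2)."""
    linear = (1.0 - t) * p0 + t * p1
    # Average curvature of the two quadratics through (p_m1, p0, p1) and (p0, p1, p2).
    curvature = 0.5 * ((p_m1 - 2.0 * p0 + p1) + (p0 - 2.0 * p1 + p2))
    correction = -0.5 * t * (1.0 - t) * curvature
    return linear + correction

# For the quadratic samples 1, 0, 1, 4 (f(x) = x^2 at x = -1, 0, 1, 2),
# the value at t = 0.5 is recovered exactly:
print(interpolate_reference(1, 0, 1, 4, 0.5))  # 0.25
```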
Revisiting transform and partitioning tools for post VVC codec
Karam Naser, Fabrice Le Léannec, Tangi Poirier, et al.
Transform and partitioning represent core components of video coding architectures. Compared with HEVC, VVC is characterized by a higher number of transform types, an additional transform level (LFNST) and more flexible partitioning via binary and ternary trees. This flexibility in transform and partitioning provides about 2% and 10% coding gain, respectively. Nevertheless, the current design is not ultimately optimized for the highest coding gain, but rather for a compromise with design complexity. That is, the potential of combining higher transform and partitioning diversity is greater than the current state in VVC. This can be demonstrated by utilizing some early transform and partitioning proposals from the VVC development, which were not adopted due to complexity concerns. In this paper, we revisit these designs targeting the maximum bitrate saving. This is to establish a new state-of-the-art anchor for post-VVC development.
Multiple constraints rate distortion optimization for a video encoder control
Fabrice Le Léannec, Tangi Poirier, Franck Galpin, et al.
Current video coding standards like HEVC, VP9, VVC, AV1, etc., involve partitioning a picture into coding tree units (CTU), typically corresponding to 64x64 or 128x128 picture areas. Each CTU is partitioned into coding blocks following a recursive coding tree. In recently published perceptual video encoding methods, the CTU is used as the spatial unit to assign a QP value in a given picture area. Such an approach fits well with the usual rate-distortion optimization used to decide the coding tree representation of a CTU, since a constant QP is used inside the CTU. Thus, Lagrangian rate-distortion optimization works in such a situation. However, for some applications, finer spatial granularity may be desired with an adaptive QP. A perceptual video coding scheme may use a codec-agnostic QP allocation process that proceeds on a 16x16 block basis. The issue raised in such a case is that the rate-distortion trade-off among split modes no longer works with the Lagrangian method. This paper proposes several methods to perform the rate-distortion optimization of a coding tree in the situation where multiple QPs may be assigned inside the same CTU. First, a theoretical method to solve the problem is described. It consists of coding tree RD optimization using multiple Lagrange parameters. Then some simpler empirical methods which emulate the theoretical approach are proposed. Experimental results show the benefit of the proposed methods on top of VP9 and HEVC video encoders.
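For context, the classical single-λ Lagrangian decision that breaks down under per-block QPs looks like this sketch; the λ-QP constant and the cost-aggregation heuristic are illustrative, not the paper's proposed methods:

```python
def rd_cost(distortion, rate_bits, qp, alpha=0.0324):
    """Classical Lagrangian cost J = D + lambda * R, with lambda tied to QP as
    in HEVC-style encoders (the constant alpha is illustrative)."""
    lam = alpha * 2.0 ** ((qp - 12) / 3.0)
    return distortion + lam * rate_bits

# With per-16x16-block QPs inside one CTU, a single lambda no longer applies;
# a simple heuristic is to evaluate each sub-block with the lambda of its own QP
# and compare aggregated costs against the unsplit alternative.
blocks = [{"d": 1200.0, "r": 85, "qp": 30}, {"d": 900.0, "r": 140, "qp": 26}]
split_cost = sum(rd_cost(b["d"], b["r"], b["qp"]) for b in blocks)
no_split_cost = rd_cost(2000.0, 200, qp=28)
print("choose split" if split_cost < no_split_cost else "keep whole block")
```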
Lossless coding support in VVC
Benjamin Bross, Tung Nguyen, Heiko Schwarz, et al.
This Conference Presentation, "Lossless coding support in VVC" was recorded for the Optics + Photonics Digital Forum.
MPEG-5 part 2: Low Complexity Enhancement Video Coding (LCEVC): Overview and performance evaluation
Guido Meardi, Simone Ferrara, Lorenzo Ciccarelli, et al.
Low Complexity Enhancement Video Coding (LCEVC) is a new MPEG video codec, currently undergoing standardization as MPEG-5 Part 2. Rather than being another video codec, LCEVC enhances any other codec (e.g. AVC, VP9, HEVC, AV1, EVC or VVC) to produce a reduced computational load and a compression efficiency higher than what is achievable by the enhanced codec used alone for a given resolution, especially at video delivery relevant bitrates. The core idea is to use a conventional video codec as a base codec at a lower resolution and reconstruct a full resolution video by combining the decoded low-resolution video with up to two enhancement sub-layers of residuals encoded with specialized low-complexity coding tools.
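A rough sketch of the layered reconstruction idea: encode a downscaled base layer with any conventional codec (represented below by a placeholder round-trip function), upscale it, and add a residual enhancement layer. In a real LCEVC pipeline the residuals are themselves coded with the low-complexity enhancement tools, which this sketch does not model:

```python
import numpy as np
from PIL import Image

def layered_reconstruction(source, base_codec_roundtrip, scale=2):
    """Illustrative base-plus-enhancement flow for a single grayscale frame."""
    h, w = source.shape
    src_img = Image.fromarray(source)
    base_in = np.asarray(src_img.resize((w // scale, h // scale), Image.LANCZOS))
    base_out = base_codec_roundtrip(base_in)            # e.g., AVC/HEVC/VVC encode+decode
    upscaled = np.asarray(Image.fromarray(base_out).resize((w, h), Image.LANCZOS),
                          dtype=np.int16)
    residual = source.astype(np.int16) - upscaled       # enhancement sub-layer (uncoded here)
    return np.clip(upscaled + residual, 0, 255).astype(np.uint8)

frame = (np.random.rand(1080, 1920) * 255).astype(np.uint8)
identity_codec = lambda x: x                            # stand-in for the base codec
rec = layered_reconstruction(frame, identity_codec)
```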
Review and comparative analysis of parallel video encoding techniques for VVC
Panagiotis Belememis, Natalia Panagou, Thanasis Loukopoulos, et al.
In this paper we review and summarize research results concerning video encoding parallelization, with a primary focus on medium- and fine-grained methods that operate at block or inner-block levels. Taxonomies are illustrated wherever applicable, with emphasis on scalability issues. Given the reported results, we turn our attention to the problem of allocating resources (processing cores) to parallel tasks performed by the encoder so as to achieve high speedup. We advocate that a parallelization scheme taking advantage of independently coded areas (e.g., tiles), wavefront parallelism within each area and inner-block parallelism at the CTU compression level can achieve a significantly higher parallelization degree compared to standalone methods. An algorithm is then proposed that takes resource allocation decisions at all the aforementioned levels. Both the proposed algorithm and standalone representative approaches from the relevant literature are evaluated in terms of scalability using CTU coding times recorded by CU split parallelism in VTM 6.2. Results show that the potential scalability of the proposed scheme surpasses the alternatives.
Wireless video communications over lossy channels
P. Topiwala, W. Dai, K. Bush
Wireless video communications are a growing segment of communications in both commercial and defense domains. Due to its lossy nature, the wireless channel requires a powerful error correction mechanism. In the commercial domain, ever more people watch videos on portable wireless devices, while in defense domains, surveillance assets grow rapidly to provide real-time motion imagery intelligence. In both domains, the transmission of high-quality video is of vital importance, and a variety of both source and channel codecs have been developed, separately for each domain and application. In this paper, we outline an effort to explore the space of video codecs, channel codecs, channel models, and error concealment methods, in an attempt to find the best practices within this large search space. Among source codecs, we focus attention on the two most common video codecs in use: H.264/AVC and H.265/HEVC. We perform simulations of an additive white Gaussian noise channel model with stressfully low signal-to-noise ratios and use powerful Low-Density Parity-Check (LDPC) and Polar codes to correct errors. While some earlier studies had suggested that H.264 might in fact be more suitable for video communications in a lossy environment, our preliminary results appear to suggest that H.265 may actually be better, provided we use powerful error correction.
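A minimal sketch of the uncoded baseline for such simulations (BPSK over an AWGN channel at a given SNR); the LDPC/Polar coding and the video layer would sit on top of this and are not shown:

```python
import numpy as np

def bpsk_awgn_ber(snr_db, n_bits=1_000_000, seed=0):
    """Simulate the uncoded bit error rate of BPSK over AWGN at a given Es/N0 (dB)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    symbols = 2.0 * bits - 1.0                       # map {0, 1} -> {-1, +1}
    noise_std = np.sqrt(0.5 * 10.0 ** (-snr_db / 10.0))
    received = symbols + rng.normal(0.0, noise_std, n_bits)
    return np.mean((received > 0).astype(int) != bits)

for snr in (0, 2, 4, 6):
    print(snr, "dB ->", bpsk_awgn_ber(snr))
```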
Image and Video Processing and Analysis I
Low-cost educational resource using optical fibers to send color images
We present a low-cost resource that aims to facilitate the teaching of subjects related to optical fibers, given that it presents a practical application of this technology. The main purpose of this resource is to send a color image through optical fibers with a laser diode from one computer and receive it with a photoresistor on another computer, with a total estimated setup cost of twenty dollars. The setup includes a visual user interface for both computers, programmed with the software LabVIEW, which lets the user select and prepare an image for proper sending and receiving. In this proposal we have used Arduino boards and low-cost plastic optical fibers used in ornaments.
Comparison of principal component analysis and multi-dimensional ensemble empirical mode decomposition for impact damage segmentation in square pulse shearography phase images
Herberth Birck Fröhlich, Bernardo Cassimiro Fonseca de Oliveira, Estiven Sanchez Barrera, et al.
Composite materials have mechanical behavior comparable to metallic alloys, with the benefit of being lighter. However, due to their anisotropy, integrity characterization of impact damages remains a challenge. Non-Destructive Testing (NDT) methods are useful in this context, as they achieve success in evaluation while avoiding modifications to the characteristics of the piece. Shearography is an NDT method that reveals changes on a surface in response to a load. Yet, shearography outputs carry several unwanted characteristics along with the defect, like background patterns, light changes, and noise. Image segmentation techniques can enhance the capability of automatic measurement of defective areas and also aid supervised methods and feature extractors, which rely on images as inputs. However, most of the time image pre-processing is required for better and more useful segmentation results. Principal Component Analysis (PCA) and Multi-dimensional Ensemble Empirical Mode Decomposition (MEEMD) allow this to be done while dealing with background and light changes, as they decompose the image. Using the Matthews coefficient as a metric for mask comparison, it is shown that MEEMD gives better results than PCA, with lower expanded uncertainty.
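Mask agreement via the Matthews coefficient can be computed as in this sketch with synthetic defect masks:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def mask_matthews(reference_mask, predicted_mask):
    """Matthews correlation coefficient between a ground-truth defect mask and a
    segmentation result (both boolean arrays of the same shape)."""
    return matthews_corrcoef(reference_mask.ravel(), predicted_mask.ravel())

gt = np.zeros((64, 64), dtype=bool); gt[20:40, 20:40] = True     # reference defect
seg = np.zeros((64, 64), dtype=bool); seg[22:41, 21:42] = True   # e.g., MEEMD-based result
print(f"MCC = {mask_matthews(gt, seg):.3f}")
```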
Comparative study of point cloud registration techniques between ICP and others
Registration is a technique employed for the alignment of point clouds in a single coordinate system. This process is very useful for the reconstruction of 3D plant models, the extraction of their morphological features and the subsequent analysis of the phenotype. One of the most widely studied registration algorithms is ICP (Iterative Closest Point), which is based on rigid transformations. Although the literature contains several comparative studies between different variants of ICP, there is no comparative study with other, more recent methods based on other principles. Therefore, in this paper we present a study comparing the results obtained with different registration algorithms on previously filtered 3D point clouds of plants, obtained with an MS Kinect V1 sensor integrated with a rotating base. The study includes two of the most used variants of ICP, point-to-point ICP and point-to-plane ICP; the latter uses the normals of the estimated surfaces to guide the point matching, presenting better results in smooth regions. In addition, other iterative point cloud alignment algorithms based on probability density estimation, hierarchical Gaussian mixture models and distance minimization between probability distributions are included. The results showed the effectiveness and simplicity of the ICP variants, and the high precision achieved by the probabilistic methods. The error and computation time of the algorithms, implemented in Python, were evaluated.
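The core least-squares step of point-to-point ICP, given a set of matched correspondences, is the Kabsch/SVD rigid transform, sketched here:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning src to dst, assuming the
    point correspondences have already been matched (one ICP update step)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

src = np.random.rand(500, 3)
theta = np.deg2rad(10)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([0.1, -0.05, 0.2])
R, t = best_rigid_transform(src, dst)
print(np.allclose(R, R_true, atol=1e-6))
```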
Non-contact measurement of mental stress via heart rate variability
Non-contact detection of mental stress based on physiological parameters has many potential application areas, such as measuring stress in athletic contests. Non-contact detection can measure mental stress without drawing the attention of subjects. Moreover, compared with questionnaire surveys, mental stress measurement based on physiological parameters is more objective. In this paper, we introduce a non-contact method to measure mental stress via heart rate variability (HRV). We conducted an experiment with 29 participants at rest and under stress, in which a mental arithmetic test was employed to induce stress. To extract HRV, we recorded videos of subjects' faces with a color CCD camera. HRV was extracted from these videos by imaging photoplethysmography (IPPG). The results showed that HRV was significantly different between the normal and stressed conditions. We then performed significance and independence tests to select the features which could be used in mental stress measurement. Finally, nine features were used to measure mental stress. In order to establish a stress measurement model, a support vector machine (SVM) was used to build a binary classifier for stress detection, and the accuracy of the model was 78.2%. Compared with other methods, our method takes the non-linear features of HRV into consideration. The method we propose supports the application of non-contact mental stress detection.
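An illustrative stress-classification setup with nine HRV features per recording; the data are synthetic and the paper's exact features, kernel and validation protocol are not reproduced:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: 9 HRV features (time-, frequency- and non-linear-domain)
# per recording, with labels 0 = rest and 1 = stress.
X = np.random.rand(58, 9)
y = np.repeat([0, 1], 29)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```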
Medical Imaging I
A new method for detecting brain fibrosis in microscopy images using the neurocysticercosis pig model
Neurocysticercosis (NCC) is considered a major cause of acquired epilepsy in most developing countries. Humans and pigs acquire cysticercosis by ingesting T. solium eggs via the fecal-oral route. After ingestion, oncospheres disperse throughout the body, producing cysts mainly in the central nervous system and striated muscles. Treatment is based on antiparasitic, anti-inflammatory, and antiepileptic drugs; however, new drugs have recently been studied in animal models. The aim of this study was to perform histological image analysis of pig brains with NCC after antiparasitic treatment, in order to develop future tools to study brain inflammation, since the evaluation of fibrosis is usually carried out manually on microscopy images in a long, inaccurate, poorly reproducible, and tedious process. For this purpose, slides of pig brains with NCC were stained with Masson's trichrome, and high-quality photographic images were taken. Image processing and machine learning were then applied to detect the presence and extent of collagen fibers around the cyst as markers of fibrosis. The process includes color normalization and probabilistic classification, implemented in Java as a plugin for the free-access program ImageJ. This paper presents a new method to detect cerebral fibrosis, assessing the amount of fibrosis in the images with an accuracy above 75% in 12 seconds. A manual editing tool allows the results to be raised above 90% quickly and efficiently.
Three-dimensional super line-localization in low signal-to-noise microscope images via prior-apprised unsupervised learning (PAUL)
Biological processes such as processive enzyme turnover and intracellular cargo trafficking involve the dynamic motion of a small "particle" along a curvilinear biopolymer track. To understand these processes, which occur across multiple length and time scales, one must acquire both the trajectory of the particle and the position of the track along which it moves, possibly by combining high-resolution single-particle tracking with conventional microscopy. Yet, there is usually a significant resolution mismatch between these modalities: while the tracked particle is localized with a precision of 10 nm, the image of the surroundings is limited by optical diffraction, with 200 nm lateral and 500 nm axial resolutions. Compared to the particle's trajectory, the surrounding curvilinear structure appears as a blurred and noisy image. This disparity in the spatial resolutions of the particle trajectory and the surrounding curvilinear structure image makes data reconstruction, as well as interpretation, particularly challenging. Analysis is further complicated when the curvilinear structures are oriented arbitrarily in 3D space. Here, we present a prior-apprised unsupervised learning (PAUL) approach to extract information from 3D images where the underlying features resemble a curved line such as a filament or microtubule. This three-stage framework starts with a Hessian-based feature enhancement, which is followed by feature registration, where local line segments are detected on repetitively sampled subimage tiles. In the final stage, statistical learning, segments are clustered based on their geometric relationships. Principal curves are then approximated from each segment group via statistical tools including principal component analysis, bootstrap and kernel transformation. This procedure is characterized on simulated images, where sub-voxel median deviations from the true curves have been achieved. The 3D PAUL approach has also been implemented for successful line localization in experimental 3D images of gold nanowires obtained using a multifocal microscope. This work not only bridges the resolution gap between two microscopy modalities, but also allows us to conduct 3D super line-localization imaging experiments without using super-resolution techniques.
Medical Imaging II
Feature relevance in dermoscopy images by the use of ABCD standard
The use of complex classification algorithms such as deep learning techniques does not allow researchers to identify the most discriminant features for tumor classification, as such models lack interpretability. This study aims to develop an algorithm capable of differentiating a set of dermoscopic images depending on whether the tumor is benign or malignant. The priority of this research is to obtain the importance of each extracted feature. The work focuses on ABCD-rule feature analysis and aims to find the relevance of each feature and its performance in a classification model. A relevant aspect of this study is the use of a heterogeneous database, in which the images were uploaded from different sources worldwide. A combination of novel and previously used features is analyzed, and their importance is computed with a Gaussian mixture model. After selecting the most discriminant features, a set of classification models was applied to find the best model with the smallest number of features. We found that a total of 65.89% of the features could be omitted with a loss in accuracy, sensitivity and specificity equal to or lower than 2%. While similar performance measures have been employed in other studies, most results are not comparable, as the databases used were more homogeneous. In the remaining studies, sensitivity values are comparable, with the main difference that the proposed model is interpretable.
Comparison of cell contour closing methods in microscopy images
Cell counting and tracking approaches are widely used in microscopy image processing. Cells may have different shapes and may be very crowded or relatively close together. In both cases, the correct identification of each cell requires the detection and tracking of its contour. However, this is not always possible due to noise, image blurring caused by signal degradation during acquisition, and staining problems. Generally, cell segmentation approaches use filtering techniques and the Hough transform, combined with morphological operators, to address this problem. However, not all contours can usually be closed. Therefore, heuristic contour-closing techniques have been employed to achieve better results. Despite their necessity, no comparative studies of this type of method were found in the literature. For that reason, this paper compares three approaches to contour tracking and closing. Two of them use one endpoint of a contour as a starting point and trace a path along the edge of the cell, seeking to find another endpoint of the cell. This is done using the first or second ring of neighboring pixels around the starting point. The heuristics used are based on region growing, taking the information from the first or second ring of neighboring pixels and keeping the direction along the plotted path. The third method employs a modification of Dijkstra's algorithm and uses two seed points located at each possible end of the contour. This paper presents a description of these techniques and evaluates their results on microscopy images.
Vein monitor
Properly visualizing a patient's vascular system to locate veins for placing a catheter or taking a blood sample is a delicate and laborious task, especially when the patient has a condition that makes this work difficult with classic techniques, i.e. when the veins are not easy to visualize because they do not stand out when the subject's arm is constricted with a tourniquet. This can be traumatic and painful for the patient, since several venipunctures may be necessary before a vein is located. To solve this problem and improve the patient's experience during venipuncture, in this paper we propose the development of a low-cost prototype for visualization of the vascular system. For this purpose, some commercial visualizers were studied, and different types of illumination and filters related to imaging of the vascular system were explored. The described design uses a simple web camera modified for this purpose, together with an interface that connects it to a computer, where the video is processed; digital image processing techniques then allow the subject's veins to be visualized in real time on the computer screen. As a result, a prototype similar to commercial devices was obtained, at a cost of less than 10% of commercial equipment and with very similar image quality.
Redesign of the wireless sensor network for tomographic imaging
Radio tomographic imaging (RTI) uses the radio signal strength (RSS) data obtained from a wireless sensor network (WSN) to form images of the obstructions in the network. In the previous design and implementation, the WSN utilized two different wireless transmitter modules along with a microcontroller. The proposed new architecture consists of microcontrollers with built-in Wi-Fi modules and XBee transmitters. The RSS values are generated by the XBee modules as in the previous design. However, in the proposed design, the microcontroller can be connected to a Local Area Network (LAN) via the Wi-Fi router, which enables the main computer to receive the RSS values over the User Datagram Protocol (UDP) for data processing. The new architecture will have a custom-designed printed circuit board (PCB) to improve the powering and physical packaging of the regular nodes. This new design will therefore enable the research team to scale the WSN more easily and thus increase the mobility of the sensor nodes for outdoor tests.
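As a minimal sketch of the receiving end of the data path described above, the main computer could listen for UDP packets carrying RSS values from the Wi-Fi-enabled nodes as follows; the port number and packet format are assumptions, not taken from the paper.

```python
# Sketch of the UDP receiver on the main computer (assumed port and payload format).
import socket

UDP_PORT = 5005                      # hypothetical port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", UDP_PORT))            # listen on all interfaces

while True:
    data, addr = sock.recvfrom(1024)           # one datagram per reading
    node_id, rss = data.decode().split(",")    # assumed "node_id,rss" payload
    print(f"node {node_id} at {addr[0]}: RSS = {rss} dBm")
```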
Evaluation of the Hough and RANSAC methods for the detection of circles in biological tests
Manuel G. Forero, Laura A. Medina, Natalia C. Hernández, et al.
In the analysis of in vitro biological samples it is very common to use Petri dishes, in which the samples are cultured. Likewise, in the study of mouse behavior and learning, the Morris test is employed, which consists of placing the animal in a circular pool to observe its behavior within it. In both cases, the detection of each circle in images and videos is usually done manually, which makes the process long, tedious and imprecise. Currently, there are several image processing methods that allow the detection of lines, circles and other geometric shapes, the best known being the technique based on the Hough transform, which detects geometric shapes that can be expressed by a mathematical equation, and Random Sample Consensus (RANSAC), a robust estimation algorithm that finds a mathematical model from data contaminated with numerous values that do not fit the model. The precise location of the circle is very important, as it can seriously affect the detection and counting of samples on the Petri dishes and the measurement of mouse paths in the Morris test. Therefore, in this paper we evaluate and present the results obtained with these two techniques on synthetic images, for the detection of Petri dishes in biological images and of the circular pool in Morris test videos, measuring their computational efficiency and the error in the location of the circles.
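To illustrate the Hough-transform side of the comparison, the sketch below uses OpenCV's circle detector; the parameter values and file name are illustrative only and would need tuning for real images, and the RANSAC-based circle fitting is not shown.

```python
# Sketch of Hough-based circle detection for a Petri dish or circular pool image.
import cv2
import numpy as np

img = cv2.imread("petri_dish.png")            # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)                # reduce noise before the voting step

circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=200,
                           param1=100, param2=60, minRadius=80, maxRadius=0)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)   # detected circle
        cv2.circle(img, (x, y), 2, (0, 0, 255), 3)   # its centre
cv2.imwrite("detected.png", img)
```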
Automatic analysis of breast thermograms by convolutional neural networks
Temperature patterns of the breast measured using infrared thermography have been used to detect changes in blood perfusion that can occur due to inflammation, angiogenesis, or other pathological causes. In this work, 94 thermograms of patients with suspected breast cancer were analyzed using an automatic classification method based on a convolutional neural network. In particular, our approach uses a deep convolutional neural network (CNN) with transfer learning to automatically classify thermograms in two different tasks: normal versus abnormal thermograms, and malignant versus benign lesions. Class Activation Mapping is used to show how the network can focus on the affected areas without having received this information. Several measurements were carried out to validate the performance of the network in each task, and the results suggest that deep convolutional neural networks with transfer learning are able to detect thermal anomalies in thermograms with a sensitivity similar to that of a human expert, even in cohorts with a low prevalence of breast cancer.
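A hedged sketch of a transfer-learning setup of this kind is shown below; the backbone, input size and hyperparameters are assumptions rather than the authors' exact settings, and the Class Activation Mapping step is not shown.

```python
# Sketch of transfer learning for thermogram classification (assumed backbone and settings).
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False                                   # freeze pretrained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),            # also enables Class Activation Mapping
    tf.keras.layers.Dense(1, activation="sigmoid"),      # normal vs. abnormal
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])

# train_ds / val_ds would be tf.data.Dataset objects built from the thermogram set:
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```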
Image and Video Processing and Analysis II
Using wavelet transform to evaluate single-shot phase measuring deflectometry data
Phase measuring deflectometry (PMD) is a well-established method for determining the topography of specular freeform surfaces. A disadvantage of the classical PMD method, however, is the sequential measurement process - it requires at least six camera images of phase-shifted sinusoidal fringe patterns for one measurement. Therefore, for moving objects in industrial production, as well as for non-fixable objects such as the human cornea, the classical PMD evaluation is not suitable anymore. To overcome this problem, single-shot methods using single-side-band demodulation have been presented, which allow for a deflectometric measurement based on just one single image capture. However, this kind of evaluation does not work for complex surface geometries that result in broadband fringe patterns, since the phase is only considered globally in the Fourier space. A new single-shot evaluation method for the phase determination using the Continuous Wavelet Transform (CWT) is presented. The advantage of the wavelet transform is that the signal can be evaluated locally in both spatial and frequency space, making it possible to measure even complex reflective surfaces in motion. First measurement results are shown and compared to the classic phase-shifting evaluation for a non-moving object. Furthermore, limits and possible enhancements of this new method are discussed.
Comparative analysis of inpainting techniques based on sparse models and isophote comparison
Manuel G. Forero, Sergio L. Miranda, María J. Pinilla
One of the most common image processing tasks in photo editing is to improve the quality of a picture when it contains scratches, smudges or any unwanted object or drawing. The restoration method used to remove them is inpainting, which consists of filling in the areas where the unwanted information is found in an imperceptible way. Inpainting has been practiced since long before digital imaging, when objects were removed or replaced by painting over photographs. Recently, new inpainting techniques based on sparse models have emerged that offer a solution to this problem. A sparse model is a system of linear equations involving a dictionary together with a vector α, used to reconstruct or enhance the image; Orthogonal Matching Pursuit (OMP) and K-SVD are the techniques used to obtain the dictionary and the vector α. These inpainting techniques provide fairly realistic results but have not been evaluated against other techniques. Therefore, in this work we compare the results obtained with sparse modelling against those obtained with two other techniques: the first based on bilinear interpolation, and the second, called isophote continuation, which first identifies the area to be reconstructed, then creates new layers within that region from the adjacent neighbours, and repeats the process until the area to be restored is completely filled. Initially, the results of the techniques were visually contrasted with the original images. Then the difference between the original and the resulting image was calculated, taking into account only the areas of interest, to find the number of non-zero pixels and the root mean square error (RMSE). The techniques based on sparse models generated good results, as did inpainting by isophote continuation.
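A short sketch of the evaluation step described above (not of the OMP/K-SVD reconstruction itself) is given below: counting differing pixels and computing the RMSE restricted to the restored region.

```python
# Sketch of the evaluation step: masked pixel-difference count and RMSE.
import numpy as np

def masked_rmse(original, restored, mask):
    """original, restored: image arrays; mask: boolean array of inpainted pixels."""
    diff = original[mask].astype(float) - restored[mask].astype(float)
    nonzero = np.count_nonzero(diff)                       # pixels not recovered exactly
    rmse = np.sqrt(np.mean(diff ** 2)) if diff.size else 0.0
    return nonzero, rmse
```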
Evaluation of panchromatic and multispectral image fusion methods
Victor Tuesta-Monteza, Yelsin Pérez Vásquez, Heber I. Mejiá-Cabrera, et al.
Earth observation satellites provide multispectral images that are characterized by good spectral quality but low spatial quality. They also provide panchromatic images that, on the contrary, are characterized by good spatial quality but low spectral quality. It is therefore important to merge both images to obtain a single one that contains the complementary information and can be used in studies of land resources, surface geology, water management, forests, urban development, agriculture, and others. For this reason, it is important to evaluate the techniques used for the fusion of multispectral and panchromatic images: EIHS, Brovey and Averaging. In this work these three techniques are evaluated using two quantitative indices, spectral ERGAS and spatial ERGAS, so that the quality of the resulting fused images can be measured. Natural images were used for the evaluation. The results show, on the one hand, that the best spectral quality is obtained with the Averaging algorithm, followed by Brovey and, thirdly, by EIHS. On the other hand, the best spatial quality was obtained with the EIHS algorithm, followed by Brovey and then by the Averaging algorithm. It was also found, by averaging the values obtained in both evaluations, that the best overall fusion quality is obtained with the Averaging algorithm, followed by Brovey and finally by EIHS.
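For reference, a sketch of the ERGAS index in its standard form is given below; the array layout and variable names are assumptions, not taken from the paper.

```python
# Sketch of ERGAS: fused and reference multispectral images as (bands, H, W) arrays,
# with `ratio` the panchromatic-to-multispectral pixel size ratio (h/l).
import numpy as np

def ergas(fused, reference, ratio):
    """ERGAS = 100 * ratio * sqrt( mean over bands of (RMSE_k / mean_k)^2 )."""
    terms = []
    for band_f, band_r in zip(fused, reference):
        rmse = np.sqrt(np.mean((band_f.astype(float) - band_r.astype(float)) ** 2))
        terms.append((rmse / band_r.mean()) ** 2)
    return 100.0 * ratio * np.sqrt(np.mean(terms))
```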
Digital photoelasticity and DIC applied to stress and strain hybrid evaluation of bioinspired structures from rice root cross-section
Some engineering areas face the challenge of discovering geometries with high mechanical performance for complex applications, which is a demanding design task. With this objective, recent works have demonstrated the powerful contributions that bioinspired experiments can offer to a wide diversity of applications. In this sense, this paper analyzes differences in the mechanical response of a stress concentrator when comparing conventional rings with their bioinspired counterpart. Here, the bioinspired geometry comes from a transverse cross-section of a rice root, chosen for its resistance to internal pressures. The mechanical analysis is carried out by a hybrid integration of photoelasticity studies, digital image correlation, and finite element methods. The results indicate that, for the same quantity of material, the bioinspired geometries are more sensitive to stress and strain than conventional rings.
Synthesis of video processing with open-source hardware descriptor languages
Advances in technology now make it possible to build devices that contain all the functionality of a digital system on a single chip (SoC), with very-large-scale integration (VLSI) of hundreds of millions of gates at very low cost, together with the design, verification and synthesis tools offered by the vendors that produce these SoC and FPGA components. These companies offer integrated development environments with software tools that cover everything from design specification to synthesis into an integrated circuit and verification in industry-standard languages such as Verilog and VHDL. This paper shows the advantages in design, verification, synthesis and testing that can be obtained by using open-source hardware description languages such as Chisel and MyHDL for real-time video processing, and demonstrates their main advantages in both learning time and cost.
Automatic method for brightness adjustment in regions of interest in photography
One of the most commonly used techniques in photo retouching is to highlight or modify the brightness of specific areas or objects in the image that appear dark, to make them more noticeable. This task is usually done by hand, but it is subjective and sometimes long, depending on the quality of the adjustment required by the user. Therefore, based on the idea used for noise removal in images by means of the joint bilateral and guided filters, in this work the use of a second image of the same scene is proposed. This second image, acquired with different lighting and in which the area to be highlighted appears with the desired intensity, is used to adjust the brightness of the same area in the image to be processed. The method is based on a technique previously employed for the adjustment of monochromatic microscopy images, in which two two-dimensional histograms are obtained from the pixels of the region to be modified and another two from the region that serves as a guide. From each pair of histograms, five profiles are obtained, which are used to automatically adjust the brightness of the region of interest to be processed, choosing the profile that produces the closest result between the regions of interest according to the Bhattacharyya distance. In this way, photographic effects for image correction can be achieved. The results obtained show the quality of the proposed technique.
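The selection criterion mentioned above can be illustrated with a short sketch of the Bhattacharyya distance between two normalized histograms; the histogram construction and profile generation themselves are not shown.

```python
# Sketch of the Bhattacharyya distance used to pick the closest adjustment profile.
import numpy as np

def bhattacharyya_distance(h1, h2):
    p = h1.astype(float) / h1.sum()
    q = h2.astype(float) / h2.sum()
    bc = np.sum(np.sqrt(p * q))            # Bhattacharyya coefficient, 1 = identical
    return np.sqrt(max(0.0, 1.0 - bc))     # same form as OpenCV's HISTCMP_BHATTACHARYYA

# Typical use: compute the distance between the histogram of each candidate-adjusted
# region and that of the guide region, and keep the profile giving the minimum distance.
```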
Compression II
Open source framework for reduced-complexity multi-rate HEVC encoding
Aruna Mathesawaran, Praveen Kumar Karadugattu, Pradeep Ramachandran, et al.
Adaptive bitrate streaming (ABR) is the key enabler for large-scale video distribution over the internet. While this approach allows for very reliable and robust video distribution, it comes at a cost. Whereas only a single SD and a single HD representation are needed for traditional cable/IPTV distribution, ABR requires far more representations, which in turn takes a heavy toll on computational resources. x265, a popular open-source HEVC encoder, introduced a framework for reusing information across different representations. While the compute-time benefits of this framework are significant, it relied heavily on user input to configure the encoder correctly to achieve the required gains. In this paper, we present an x265-based implementation that leverages the above framework to automatically improve computational efficiency by 44%, as well as to reduce the turnaround time by orders of magnitude.
Fast transform type selection using conditional Laplace distribution based rate estimation
Bohan Li, Jingning Han, Yaowu Xu
Selecting among multiple transform kernels to code prediction residuals is widely used for better compression efficiency. Conventionally, the encoder performs trials of each transform to estimate the rate-distortion (R-D) cost. However, such an exhaustive approach suffers from a significant increase in complexity. In this paper, a novel rate estimation approach is proposed to bypass the entropy coding process for each transform type, using a conditional Laplace distribution model. The proposed method estimates the Laplace distribution parameter from the context inferred from the quantization levels and computes the expected rate of the coefficients for transform type selection. Furthermore, a greedy search algorithm for separable transforms is also presented to further accelerate the process. Experimental results show that the transform type selection scheme using the proposed rate estimation method achieves high accuracy and provides a satisfactory speed-performance trade-off.
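As a toy illustration of the core idea only (not the paper's conditional context model), for a Laplace-distributed residual with scale b and a uniform quantizer of step Q, the probability of each quantization level, and hence an expected rate, can be obtained in closed form without running the entropy coder.

```python
# Toy sketch: expected rate of uniformly quantized Laplace-distributed coefficients.
import numpy as np

def expected_rate_bits(b, Q, max_level=255):
    """Expected bits per coefficient magnitude under an ideal entropy coder
    (sign bits are ignored here for simplicity)."""
    levels = np.arange(1, max_level + 1)
    # P(level 0): integral of the Laplace density over [-Q/2, Q/2]
    p0 = 1.0 - np.exp(-Q / (2.0 * b))
    # P(level k), k >= 1: integral over [(k-1/2)Q, (k+1/2)Q], doubled for +/- signs
    pk = np.exp(-(levels - 0.5) * Q / b) - np.exp(-(levels + 0.5) * Q / b)
    probs = np.concatenate(([p0], pk))
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

def estimate_scale(residuals):
    """Maximum-likelihood estimate of the Laplace scale from prediction residuals."""
    return np.mean(np.abs(residuals - np.median(residuals)))
```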
Comprehensive assessment of image compression algorithms
The JPEG image coding standard has been a dominant format in a wide range of applications for what will soon be three decades since it was released as an international standard. The landscape of applications and services based on pictures has evolved since the original JPEG image coding algorithm was designed. As a result of progress in state-of-the-art image coding, several alternative image coding approaches have been proposed to replace the legacy JPEG, with mixed results. The majority among them have not been able to displace the unique position that the legacy JPEG currently occupies as the ubiquitous image coding format of choice. Those that have been successful have been so in specific niche applications. In this paper, we analyze why all attempts to replace JPEG have had limited success so far, and discuss additional features, other than compression efficiency, that need to be present in any modern image coding algorithm to increase its chances of success. In doing so, we compare the rate-distortion performance of state-of-the-art conventional and learning-based image coding, in addition to other features desired in advanced multimedia applications such as transcoding. We conclude the paper by highlighting the remaining challenges that need to be overcome in order to design a successful future generation image coding algorithm.
The SVT-AV1 encoder: overview, features and speed-quality tradeoffs
Faouzi Kossentini, Hassen Guermazi, Nader Mahdi, et al.
The Scalable Video Technology AV1 (SVT-AV1) encoder is an open-source software AV1 encoder that is architected to yield excellent quality-speed-latency tradeoffs on CPU platforms for a wide range of video coding applications. The SVT-AV1 encoder is based on the SVT architecture, which supports multi-dimensional parallelism, multi-pass partitioning decision, multi-stage/multi-class mode decision, and multi-level spatiotemporal prediction and residual coding algorithms. Given a latency constraint, the SVT-AV1 encoder maximizes the CPU utilization on multicore CPUs, through picture-based parallelism for high-latency video applications, and through segment-based parallelism for low-latency video applications. The picture-level/segment-level parallelism allows the SVT-AV1 encoder to produce identical bit streams, irrespective of whether single- or multi-threaded encoding is performed. In mode decision, the SVT-AV1 encoder yields many speed-quality tradeoffs with high granularity, mainly through multi-pass processing of each superblock and multi-stage/multi-class processing of each block, and through the different levels of the prediction/coding features. The resulting speed-quality tradeoffs of SVT-AV1 are compared, in Video-On-Demand (VOD) use-cases, to those of libaom (another open-source AV1 encoder), and to those of the latest x264 (AVC), x265 (HEVC) and libvpx (VP9) open source encoders.
Image and Video Processing and Analysis III
High dynamic range image sharing with privacy protection
Sharing pictures has become a very popular practice among consumers. Most recent cameras, displays, and smartphones can capture and display images in high dynamic range and wide colour gamut, contributing to an increase of this type of content. It is a well-known fact that pictures contain information that could cause various security and privacy issues. This is particularly true because high dynamic range pictures exhibit more information when compared to conventional images, resulting in an increased invasion of privacy. For example, dark areas of a content that are otherwise difficult to view in a conventional picture become more visible. A lack of proper solutions to overcome these privacy issues can hinder a widespread adoption of high dynamic range pictures shared in social networks. It is therefore natural that mechanisms be offered to protect privacy while sharing pictures also in high dynamic range. In this paper, we propose an architecture to share high dynamic range images captured by high-end smartphones while offering a mechanism to protect privacy. In our approach, the high dynamic range picture is tone-mapped into a lower dynamic range version, which reduces its degree of invasiveness as far as privacy is concerned. Any remaining privacy-sensitive areas in the picture can then be further protected (blurring of faces, masking of car licence plates, etc.). The core of the proposed architecture conforms both to JPEG XT, a recently standardised format that offers backward compatibility with legacy JPEG, and to JPEG Systems, which is under development. The proposed solution allows a protected version of the high dynamic range image, tone-mapped to a standard dynamic range picture, to be shared publicly; the latter can go through a transmorphing operation to further protect it. Authorised users can access the original high dynamic range picture. The architecture described above has been implemented as an app for smartphones running Android OS. We demonstrate the feasibility and usefulness of this approach and discuss its advantages when compared to conventional image sharing such as that used in social networks.
Determination of quality characteristics in modern agricultural systems through feature extraction
In this work, an approach combining feature extraction and support vector machines applied to images of plants from an aquaponic system is presented. Image processing using Python on a low-cost computer connected to a compatible camera is proposed as an aid for maintaining modern agricultural systems, based on color, texture and shape feature extraction. The intensity information from images of tomatoes growing in an aquaponic system is exploited in order to obtain an early detection of diseases and nutrient deficiency, as well as a prediction of freshness and time to harvest. Working towards a portable tool for non-destructive measurement in the field, image segmentation, clustering for feature extraction, and classification and comparison of features are implemented using open-source software routines that allow an easier discrimination between plants within a reasonable execution time.
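A simplified sketch of one part of such a pipeline is shown below: colour-histogram features fed to an SVM classifier. The texture and shape features and the segmentation step are omitted, and the file names and labels are hypothetical.

```python
# Simplified sketch: HSV colour-histogram features + SVM classification.
import cv2
import numpy as np
from sklearn.svm import SVC

def color_features(path, bins=16):
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    return cv2.normalize(hist, None).flatten()   # hue-saturation histogram as feature vector

# Hypothetical paths and labels (e.g. 0 = healthy, 1 = nutrient-deficient).
X = np.array([color_features(p) for p in ["tomato_01.jpg", "tomato_02.jpg"]])
y = np.array([0, 1])
clf = SVC(kernel="linear").fit(X, y)
```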
Privacy-preserving photo sharing based on blockchain
Sharing photos online has become an extremely popular activity, raising wide concern about privacy issues related to the shared content. ProShare, a photo-sharing solution developed by the Multimedia Signal Processing Group of EPFL, addresses some of these privacy issues by relying on a trusted third party. Recently, distributed ledger technologies in general, and blockchain in particular, have been used to provide trust and censorship resistance in a decentralized manner. This paper proposes the use of blockchain technology to remove the need for a trusted third party in the sharing process introduced by ProShare.
Poster Session
The Foveal Avascular Zone Image Database (FAZID)
Arpit Agarwal, Jothi Balaji J., Rajiv Raman, et al.
The Foveal Avascular Zone (FAZ) is of clinical importance since the vascular arrangement around the fovea changes with disease and with the refractive state of the eye. Therefore, it is important to segment and quantify the FAZ accurately. Studies done to date have achieved reasonable segmentation, but there is a need for considerable improvement. In order to test and validate newly developed automated segmentation algorithms, we have created a public dataset of these retinal fundus images. The 304 images in the dataset are classified into diabetic (107), myopic (109) and normal (88) eyes. The images were classified by a clinical expert and include clinical grading of diabetic retinopathy and myopia. The images have dimensions of 420 x 420 pixels (6 mm x 6 mm of retina). Both the original images and versions manually segmented by a clinical expert (ground truth) are available (608 images in total); in the manually segmented images, the FAZ is marked as a green region. The images can be used to test newly developed techniques, and the manual segmentations can serve as ground truth for performance comparisons and validation. It should also be noted that there are only a few studies using supervised learning to segment the FAZ, and this dataset will potentially be useful for machine learning training and validation. The Foveal Avascular Zone Image Database (FAZID) can be accessed from the ICPSR website at the University of Michigan (https://doi.org/10.3886/E117543V2).
Optimization of the regularization parameters for photoacoustic imaging based on augmented Lagrangian
Haoxiang Gao, Weixin Kang
In photoacoustic imaging, regularization is often used to improve the fidelity of reconstructed images. The proposed method is based on the augmented Lagrangian alternating TV method, a well-known compressed sensing approach. The results show that this method can quickly calculate the regularization parameters and efficiently reconstruct the initial sound pressure. In this paper, binary images and CT scan data are used to evaluate the algorithm.
Feature point matching of infrared and visible image
Feature point matching has been widely applied in image registration, image fusion, remote sensing and other fields. The relation between the pixels of infrared images and the pixels of visible images is complex because the images are taken by different sensors. Nevertheless, images from different sensors contain some common information on which point matching can rely. The Scale Invariant Feature Transform (SIFT) algorithm is an effective and popular feature extraction algorithm. SIFT can be used for point matching, since it produces feature descriptor vectors for the feature points extracted from the images, but in some scenes it cannot achieve accurate feature point matching. Unlike SIFT, the Edge-Oriented-Histogram (EOH) algorithm characterizes the orientation information of edges, and the EOH descriptor can integrate the boundary information in different directions around the feature points, so that edge direction and magnitude can be described. To achieve accurate feature point matching between infrared and visible images, we propose a feature point matching method based on the SIFT and EOH algorithms. Firstly, we use image enhancement to increase the contrast of the image and then use SIFT to extract feature points. Next, we utilize the EOH algorithm to obtain 80-bin descriptors for the feature points detected by SIFT. We compute the Euclidean distance between descriptors to measure their similarity and achieve point matching. In order to improve the matching accuracy, we adopt the Random Sample Consensus (RANSAC) algorithm to eliminate mismatched points. Since RANSAC cannot recover all correct points, we use another Euclidean-distance-based measure to select correct pairs and then combine the two parts to match the feature points again. Experimental results demonstrate that our method can effectively improve the accuracy of matching and find more correct matching point pairs.
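The SIFT detection, matching and RANSAC outlier-rejection stages can be sketched with OpenCV as below; the EOH descriptor used in the paper is not part of OpenCV and is omitted, and the input file names are hypothetical.

```python
# Sketch of SIFT detection, brute-force matching and RANSAC-based mismatch removal.
import cv2
import numpy as np

ir = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE)     # hypothetical inputs
vis = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ir, None)
kp2, des2 = sift.detectAndCompute(vis, None)

matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)[:200]   # keep best candidates

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # drop mismatches
print("inlier pairs:", int(inlier_mask.sum()))
```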
Analysis of the cloud background and its simulation based on the remote sensing image
He Zhang, Bin Chen, Weixian Qian, et al.
Realistic simulation of the cloud background is difficult because of its varied appearance and irregular, arbitrary distribution. In order to simulate the cloud background more realistically, a multi-layer "cloud particle" superposition model based on the characteristics of real cloud backgrounds is proposed in this paper. First, remote sensing characteristics of clouds at different distribution frequencies are collected and the laws within them are analyzed. Second, the generation space of the "cloud particles" is delimited based on fractal theory. Third, cloud images at different frequencies are generated based on the characteristic laws of the real cloud images. Finally, the multi-layer images are merged by linear superposition. The method also provides controllable coverage, so remote sensing cloud images with different coverage levels can be simulated.
Infrared small target detection based on fusion of multiple saliency information
Chao Ma, Guohu Gu, Minjie Wan, et al.
To improve the detection rate of small targets in infrared images, this paper proposes an infrared small target detection algorithm based on the fusion of multiple saliency cues, which combines the local contrast measure (LCM), curvature filtering and motion saliency. Firstly, three saliency maps of the infrared image are calculated separately in preparation for the subsequent fusion. Then, to improve the contrast of the target, the LCM saliency map and the curvature saliency map are filtered according to the motion saliency value. Finally, the fusion weight is determined by the background suppression factor of each saliency map, and the fused saliency map is obtained. Experimental results show that the proposed infrared small target detection algorithm outperforms the other methods compared in terms of detection capability.
Design of autonomous mobile systems for face recognition based on a DCNN with compression and pruning
3D face recognition based on deep convolutional neural networks in autonomous mobile systems is hampered by the large size of the neural models and the extremely high computational complexity of the classification procedures, owing to the large network depth. To solve this problem, we use compression and pruning algorithms. Since these algorithms decrease the recognition accuracy, we propose an efficient retraining of the neural models so that their recognition accuracy approaches that of very large modern neural network models. The performance of the proposed neural models with compression and pruning is compared in terms of face recognition accuracy and compression rate.
Classification of breast abnormalities in digital mammography with a deep convolutional neural network
A novel algorithm for the analysis and classification of breast abnormalities in digital mammography based on a deep convolutional neural network is proposed. Simplified neural network architectures such as MobileNetV2, InceptionResNetV2, Xception, and ResNetV2 are intensively studied for this task. In order to improve the accuracy of detection and classification of breast abnormalities on real data, an efficient training algorithm based on an augmentation technique is suggested. The performance of the proposed algorithm for the analysis and classification of breast abnormalities on real data is discussed and compared to that of state-of-the-art algorithms.
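A hedged sketch of the augmentation idea is shown below: geometric and intensity perturbations applied on the fly during training of one of the studied backbones. The specific transforms, ranges and hyperparameters are assumptions, not the authors' exact augmentation policy.

```python
# Sketch of on-the-fly augmentation feeding a MobileNetV2-based classifier (TF 2.x).
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.1),
])

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
inputs = tf.keras.Input(shape=(224, 224, 3))
x = augment(inputs)                      # augmentation is active only in training mode
x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
x = base(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # benign vs. malignant
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```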
Real-time dense 3D object reconstruction using RGB-D sensor
In this paper, we propose a new algorithm for dense 3D object reconstruction using an RGB-D sensor at a high frame rate. In order to obtain a dense shape recovery of a 3D object, an efficient merging of the current and incoming point clouds obtained with the Iterative Closest Point algorithm is suggested. As a result, incoming frames are aligned to the dense 3D model. The accuracy of the proposed 3D object reconstruction algorithm on real data is compared to that of state-of-the-art reconstruction algorithms.
Identification of Lasiodiplodia Theobromae in avocado trees through image processing and machine learning
The avocado is a fruit that grows in tropical and subtropical areas and is very popular in the markets due to its great nutritional qualities and medicinal properties. The avocado is a plant of great commercial interest for Peru and Colombia, countries that export this fruit. The tree is affected by a wide variety of diseases that reduce its production and can even cause the death of the plant. The most frequent disease of the avocado tree in the production zone of Peru is caused by the fungus Lasiodiplodia theobromae, which is characterized in its initial stage by producing a canker on the stems and branches of the tree. Detection is commonly made by manual inspection of the plants by an expert, which makes it difficult to detect the fungus in extensive plantations. Therefore, in this work we present a semi-automatic method for the detection of this disease based on image processing and machine learning techniques. For this purpose, an acquisition protocol was defined. The identification of the disease was performed by taking pre-processed images of the tree branches as input. A learning technique based on a shallow CNN was evaluated, obtaining 93% accuracy.
Fast VVC intra prediction mode decision based on block shapes
After MPEG and VCEG jointly standardized the H.265/HEVC (High-Efficiency Video Coding) international standard in 2013, they completed H.266/VVC (Versatile Video Coding) [1] as a Final Draft International Standard (FDIS) in July 2020 through the Joint Video Experts Team (JVET) of ISO/IEC JTC1/SC29/WG11 MPEG and ITU-T SG16/Q6 VCEG. VVC supports up to 87 intra coding modes, including 65 regular directional modes and 20 wide angular modes, more than twice as many as HEVC. VVC can accommodate not only more finely spaced intra prediction directions, but also so-called wide angular modes, which are particularly meaningful for non-square coding blocks. The more detailed intra prediction naturally demands more computation in order to determine the most effective intra prediction mode. In this sense, we investigate how to pre-prune the prediction candidate lists for a fast ISP intra mode decision and propose searching only a small number of prediction modes, based on the shape of a block, in the rate-distortion optimization (RDO)-based intra mode decision. The proposed method is verified to reduce the intra prediction processing time with only a small increase in bit rate and a negligible reduction in PSNR values.
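A pseudocode-style sketch of the general idea, restricting the directional modes passed to full RDO according to the block's aspect ratio, is given below. The specific mode subsets are illustrative placeholders, not the ones selected in the paper.

```python
# Sketch: reduce the intra mode candidate list before RDO based on block shape.
def candidate_modes(width, height, mpm_list):
    PLANAR, DC, HOR, VER = 0, 1, 18, 50           # VVC mode indices
    candidates = {PLANAR, DC, *mpm_list}          # always keep non-angular modes + MPMs
    if width == height:
        candidates.update(range(2, 67, 4))        # coarse sweep of the 65 angular modes
    elif width > height:
        candidates.update(range(VER - 8, 67, 2))  # wide block: sample densely near vertical (placeholder choice)
    else:
        candidates.update(range(2, HOR + 9, 2))   # tall block: sample densely near horizontal (placeholder choice)
    return sorted(candidates)

# Full RDO would then test only candidate_modes(w, h, mpm_list) instead of all 67 modes.
```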
Spectropolarimetry diagnostics of cervical cytological smears for availability of papillomavirus
Olexander Peresunko, Sergey Yermolenko, Nina Horodynska
The purpose of this work was to demonstrate the possibility of optical diagnosis of cervical cytological smears for the presence of HPV using spectral and polarization methods. Comparison of cervical specimens with and without koilocytosis, irradiated with range-shifted polarized radiation, showed significant differences in the values of linear dichroism and its spectral dependences. The difference in the characteristic koilocyte response in the range 395-415 nm was calculated using computer programs. As a result, the use of spectropolarization studies and the fluorescence method will improve the accuracy of patient selection for a costly procedure: DNA diagnostics of highly oncogenic HPV by the standard polymerase chain reaction method.
Differential diagnosis of adenocarcinoma and squamous cell carcinoma of the cervix by spectropolarimetry
Olexander Peresunko, Christina Felde, Sergey Yermolenko
The aim of this work is to improve the diagnosis of cervical cancer by introducing laser polarimetry and spectropolarization methods of investigation. We propose a novel approach for the differentiation of squamous cell carcinoma and cervical adenocarcinoma using laser optics. The Stokes parameter S4 maps obtained by laser polarimetry from native smears with cervical adenocarcinoma and squamous cell carcinoma, as well as from cervical canal scrapings with endometrial adenocarcinoma and squamous cell carcinoma, make it possible to reliably differentiate normal tissue from cancer in the native smear and adenocarcinoma from squamous cell carcinoma in the imprint smear. The spectropolarimetry method reliably distinguishes the normal epithelium of the cervix from cervical cancer, and the linear dichroism parameters obtained in the spectropolarization study reliably (p = 0.001) differentiate between normal tissue, adenocarcinoma and squamous cell carcinoma of the cervix.
Vector-parametric structure of polarization images of networks of biological crystals for differential diagnosis of inflammatory processes
For the differential diagnosis of the severity of the septic process, the method of mapping vector-parametric microscopic images of the polycrystalline component of histological sections of the internal organs of rats was used. A statistical analysis of the distribution of the number of characteristic values of the Stokes vector of polarization microscopic images of histological sections of the internal organs of laboratory rats was carried out. The relationships between the values of the 1st- and 2nd-order statistical moments characterizing this distribution and the severity of the septic process are determined. The quantitative parameters most sensitive to septic conditions have been established, which ensure statistically reliable differentiation of histological sections of the internal organs of laboratory rats. From the standpoint of evidence-based medicine, the operational characteristics of the diagnostic power of the vector-parametric microscopy method are established, with an estimate of the number of characteristic values of the fourth Stokes vector parameter of polarization images of histological sections of the internal organs of laboratory rats.
IR spectrum comparison of the blood of breast cancer patients as a preliminary stage of further molecular genetic screening
Olexander Peresunko, Tatiana Kruk, Sergey Yermolenko
The aim of this paper is to study the BRCA1 185delAG and 5382insC gene mutations in breast cancer patients and their relatives in order to identify new directions for the prevention, diagnosis and treatment of breast cancer. The diagnostic methods used were ultraviolet spectrometry of blood plasma samples in the liquid state, mid-range infrared spectroscopy (2.5-25 microns) of the dry plasma residue, and polarization and laser diagnostic techniques applied to thin histological sections of biological tissues. The results showed that the use of spectrophotometry in the range 1000-3000 cm-1 allowed quantitative parameters of the blood plasma absorption of patients in the third group to be established in different ranges, which would in the future allow an express analysis of the patient's condition (a screening procedure) prior to further molecular genetic typing for BRCA1 and BRCA2.
Multiparametric polarization histology in the detection of traumatic changes in the optical anisotropy of biological tissues
A. Litvinenko, M. Garazdyuk, V. Bachinsky, et al.
For a high-precision, objective histological determination of how long ago damage to internal organs occurred, over a long period of time, a systematic approach was used based on digital azimuthally invariant polarization, Mueller-matrix and tomographic methods for studying temporal changes in the molecular and polycrystalline structure of brain, liver and kidney samples in the post-mortem period. It was revealed that a linear change in the magnitude of the 1st- to 4th-order statistical moments characterizing the distributions of the data of these digital azimuthally invariant polarization, Mueller-matrix and tomographic methods is related to the age of the damage to the internal organs. On this basis, a new algorithm for the digital histological determination of the age of the damage is proposed. To determine the extent of damage, azimuthally invariant polarization microscopy at different image magnifications of histological sections of internal organ tissues was used, which provided diagnostic relationships between changes in the magnitude of the 1st- to 4th-order statistical moments, which characterize the azimuth and ellipticity polarization maps of the digital microscopic images, and the time intervals of damage duration.
Digital processing of fluorimetry imaging of deep layers in the macula of the retina in diabetic macular edema
Vascular segmentation is considered one of the main approaches to the creation of automated retinal analysis tools. Improved retinal image analysis can use the segmented vascular tree to calculate vessel diameter and tortuosity and to differentiate veins from arteries, together with measurement of the arteriovenous ratio. An algorithm for segmentation of the retinal vessels based on fuzzy c-means clustering and the level-set method is proposed. Morphological operations, CLAHE, and appropriate image filtering techniques were used to enhance the image before fuzzy clustering of the vascular pixels. The segmentation method is evaluated on publicly available datasets using validation metrics common in retinal vessel segmentation.
Diffuse tomography of brain nerve tissue in the temporary monitoring of pathological changes in optical anisotropy
1. Development of a structural-logical scheme and experimental testing of diffuse tomography methods and tools, new to forensic practice, for reproducing fluctuations in the magnitude of the linear and circular birefringence and dichroism of the polycrystalline structure of histological sections of the brain. 2. Experimental determination of a set of maps and histograms of the fluctuation distributions of linear (FFLB) and circular (FFCB) birefringence and of dichroism (FALD and FACD) for the differential diagnosis of traumatic hemorrhage and cerebral infarction of ischemic and hemorrhagic genesis, using diffuse tomography of the polycrystalline structure of histological sections of the brain. 3. Information analysis to determine the operational characteristics of the diagnostic power (sensitivity, specificity and balanced accuracy) of the diffuse tomography method.
Multichannel polarization sensing of polycrystalline blood films in the diagnosis of the causes of poisoning
Ya. Ivashkevich, O. Vanchulyak, V. Bachinsky, et al.
The results of a study of the effectiveness of azimuthally invariant polarization Mueller-matrix microscopy in the differential diagnosis of cases of alcohol and carbon monoxide poisoning are presented:
  •  multichannel sounding with differently polarized laser beams of histological sections of the brain, myocardium, adrenal glands, liver and polycrystalline blood films of the dead and multichannel polarization filtering of a series of microscopic images with algorithmic determination of coordinate distributions (maps):
1. Mueller-matrix invariants of the linear birefringence of fibrillar networks (MMI LB); 2. Mueller-matrix invariants of the circular birefringence of optically active molecular complexes (MMI CB);
•  statistical differentiation of the MMI LB and MMI CB maps of the optically anisotropic component of histological sections of the brain, myocardium, adrenal glands, liver and polycrystalline blood films of individuals who died due to IHD (control group), alcohol poisoning (study group 1) and carbon monoxide poisoning (study group 2);
•  determination of the operational characteristics (sensitivity, specificity and balanced accuracy) of the diagnostic power of the multidimensional Mueller-matrix microscopy method for histological sections of the brain, myocardium, adrenal glands, liver and polycrystalline blood films of the deceased in all groups.
Azimuthally invariant Mueller-matrix tomography of the distribution of phase and amplitude anisotropy of biological tissues
The use of physically sound and analytically determined algorithms for reconstructing parameters characterizing the linear birefringence and dichroism of networks of biological crystals in differentiating changes in optical anisotropy associated with different degrees of severity of pathology: precancerous conditions (endometrial atrophy and polyps) of the cervix. Development and testing of the "two-wave" method of Mueller-matrix reconstruction of the parameter values characterizing the phase and amplitude anisotropy of polycrystalline films of bile and blood plasma for the diagnosis of systemic (type II diabetes) and oncological (breast cancer) diseases.
    Polarization-phase diagnostics of volume of blood loss
    N. Sivokorovskaya, V. Bachinsky, O. Vanchulyak, et al.
The results of a statistical analysis of the distributions of the value of the fourth Stokes vector parameter (hereinafter the "phase parameter", FP) of microscopic images of histological sections of kidney tissue, a parenchymal organ, from deceased individuals with varying degrees of blood loss are presented.
    LFDD: Light field image dataset for performance evaluation of objective quality metrics
    Adam Zizien, Karel Fliegel
    An increase in research activity around plenoptic content can be seen in recent years. As the communities around the different modalities grow, so does the demand for publicly available annotated datasets of suitable content. The datasets can be used for a multitude of purposes, such as to design novel compression algorithms, subjective evaluation methodologies or objective quality metrics. In this work, a new publicly available annotated light field image dataset is presented. The dataset consists of scenes corrupted by state-of-the-art image and video compression algorithms (JPEG, JPEG 2000, BPG, VP9, AV1, AVC, HEVC), noise, and geometric distortion. For the subjective evaluation of the included scenes, a modified version of the Double Stimulus Impairment Scale (DSIS) methodology was adopted. The views of each scene were organized into a pseudo-sequence and played to the observers as animations. The resulting subjective scores, together with additional data, are included in the dataset. The data can be used to evaluate the performance of currently used visual quality metrics as well as for the design of new ones.
    Noise phase singularities in noise contaminated images
As is well known, vortices or phase singularities are optimal encoders for position marking. The pseudo-phase singularity information can be obtained from the complex signal resulting from Laguerre-Gauss filtering of the intensity image. Each singularity has its own unique anisotropic core structure, with ellipticity and azimuth angles that differ from those of the others. Indeed, the Poincaré sphere representation can be used to characterize these singularities. In optical vortex metrology, the correct identification and tracking of the complicated movement of pseudo-phase singularities is essential; in particular, the singularities at the poles of the sphere are the most relevant. When the singularities are located at or near the equator of the sphere, the zero crossings of the real and imaginary parts intersect almost in a straight line, which makes their location difficult. These phase singularities near the equator tend to disappear when a transformation of the object, such as a deformation, takes place. In this work, we analyze the number of vortices in the analytic representation of a noise-contaminated image. Different types of noise are analyzed (Gaussian, random, Poisson, etc.). We verified that noise substantially increases the number of vortices at and near the equator of the Poincaré sphere.
    Automatic motion tracking system for analysis of insect behavior
    Darrin Gladman, Jehu Osegbe, Wookjin Choi, et al.
    We present a multi-object tracking system to track small insects such as ants and bees. Motion-based object tracking recognizes the movements of objects in videos using information extracted from the given video frames. We applied several computer vision techniques, such as blob detection and appearance matching, to track ants. Moreover, we discussed different object detection methodologies and investigated the various challenges of object detection, such as illumination variations and blob merge/split. The proposed system effectively tracked multiple objects in various environments.
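A hedged sketch of the motion-based detection step is given below: background subtraction followed by contour extraction to obtain per-frame blob centroids. The appearance-matching and track-management logic is omitted, and the video file name and thresholds are illustrative.

```python
# Sketch: foreground blob detection for small moving insects with OpenCV.
import cv2
import numpy as np

cap = cv2.VideoCapture("ants.mp4")                 # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25, detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                         # foreground (moving insects)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if 5 < cv2.contourArea(c) < 500:           # keep ant-sized blobs only
            x, y, w, h = cv2.boundingRect(c)
            centroids.append((x + w / 2, y + h / 2))
    # centroids would be handed to the data-association / tracking stage here
cap.release()
```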
    Image dehazing based on microscanning approach
    Sergei Voronin, Artyom Makovetskii, Vitaly Kober, et al.
Over the past two decades, many image dehazing methods have been proposed, and most of them rely on image enhancement or restoration. An image without haze should have higher contrast than the original hazy image, and haze can be removed by increasing the local contrast of the restored image. Some haze removal approaches estimate the dehazed image from the observed hazy scene by solving an objective function whose parameters are adapted to the local statistics of the hazy image inside a moving window. Common image dehazing techniques use only one observed image for processing, and various local adaptive algorithms for single-image dehazing are known. A dehazing method based on spatially displaced sensors has also been described. In this presentation, we propose a new dehazing algorithm that uses several images of the scene. Using a set of observed images, dehazing is carried out by solving a system of equations derived from the optimization of an objective function. The images are acquired in such a way that they are spatially offset relative to each other and taken at different times. Computer simulation results are presented to illustrate the performance of the proposed algorithm for the restoration of hazy images.
    An efficient algorithm of total variation regularization in the two-dimensional case
    Artyom Makovetskii, Sergei Voronin, Vitaly Kober, et al.
    Denoising has numerous applications in communications, control, machine learning, and many other fields of engineering and science. Total variation (TV) regularization is a widely used technique for signal and image restoration. There are two types of TV regularization: anisotropic TV and isotropic TV. One of the key difficulties in TV-based image denoising is the nonsmoothness of the TV norms. Exact solution methods are known for the 1D TV regularization problem. Strong and Chan derived exact solutions to the TV regularization problem in the one-dimensional case, obtained when the original noise-free function, the noise, and the regularization parameter are subject to special constraints. Davies and Kovac treated the problem as non-parametric regression with emphasis on controlling the number of local extrema, considering in particular the run and taut string methods. Condat proposed a fast direct algorithm for computing the exact solution of the one-dimensional TV regularization problem for discrete functions. In the 2D case, exact solutions to the TV regularization problem are typically approximated. In this presentation, we propose a new approximation method for the 2D TV regularization problem based on the fast exact 1D TV approach. Computer simulation results are presented to illustrate the performance of the proposed algorithm for image restoration.
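    To make the 1D-to-2D splitting idea concrete, the sketch below solves the 1D TV problem with a simple projected-gradient method on the dual (not Condat's direct algorithm) and then alternates it along rows and columns as a rough 2D approximation; the parameters are illustrative.

```python
# Minimal sketch: 1D TV denoising via projected gradient on the dual problem,
# and a row/column splitting that reuses the 1D solver to approximate 2D TV.
import numpy as np

def tv1d(y, lam, iters=200):
    """argmin_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|."""
    z = np.zeros(len(y) - 1)        # dual variables, one per first difference
    tau = 0.25                      # step size <= 1 / ||D D^T||
    for _ in range(iters):
        x = y - (np.concatenate(([0.0], z)) - np.concatenate((z, [0.0])))  # x = y - D^T z
        z = np.clip(z + tau * np.diff(x), -lam, lam)                       # projected step
    return y - (np.concatenate(([0.0], z)) - np.concatenate((z, [0.0])))

def tv2d_approx(image, lam, sweeps=5):
    """Approximate 2D TV denoising by alternating 1D TV along rows and columns."""
    out = image.astype(float).copy()
    for _ in range(sweeps):
        out = np.apply_along_axis(tv1d, 1, out, lam)   # rows
        out = np.apply_along_axis(tv1d, 0, out, lam)   # columns
    return out

# Piecewise-constant test image with additive Gaussian noise.
noisy = np.clip(np.kron(np.eye(4), np.ones((16, 16))) + 0.2 * np.random.randn(64, 64), 0, 1)
clean = tv2d_approx(noisy, lam=0.3)
```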
    Neural network and non-rigid ICP in facial recognition problem
    Sergei Voronin, Artyom Makovetskii, Aleksei Voronin, et al.
    Human facial expressions describe a set of signals that can be associated with mental states such as emotions, depending on physiological conditions. There are many potential applications of expression recognition systems, which may take into account about two hundred emotional states. Expression recognition is a challenging problem, not only because of the variety of expressions, but also because of the difficulty of extracting effective features from facial images. Depending on the 3D reconstruction technique, 3D data can be immune to a wide range of illumination and texture variations, and they are not as sensitive as 2D images to out-of-plane rotations. Moreover, 2D images may fail to capture subtle but discriminative changes on the face, such as bulges on the cheeks and protrusion of the lips, if there is no sufficient change in brightness. In fact, 3D data yield better recognition than conventional 2D data for many types of facial actions. The most effective tool for solving the human face recognition problem is neural networks, but the recognition result can be degraded by facial expressions and other deviations from the canonical face representation. In this presentation, we describe a resampling method for human faces represented by 3D point clouds. The method is based on the non-rigid ICP (Iterative Closest Point) algorithm. We consider the combined use of this method and a convolutional neural network (CNN) for the face recognition task. Computer simulation results are provided to illustrate the performance of the proposed algorithm.
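    For orientation, the sketch below shows only a basic rigid ICP iteration (closest-point correspondences followed by an SVD-based alignment); the non-rigid variant used in the paper additionally allows local deformation and is not reproduced here.

```python
# Illustrative sketch only: a basic *rigid* ICP loop (nearest neighbours + Kabsch/SVD
# alignment). A non-rigid ICP additionally estimates per-vertex deformations.
import numpy as np
from scipy.spatial import cKDTree

def icp_rigid(source, target, iters=20):
    """Align source (N,3) point cloud to target (M,3); returns the transformed source."""
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Closest-point correspondences.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Best rigid transform (Kabsch / SVD).
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        # 3. Apply and repeat.
        src = src @ R.T + t
    return src
```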
    Multicriteria method of denoising and simplifying thermal images
    Near-infrared image enhancement through multi-scale alpha-rooting processing for remote sensing application
    V. Voronin, E. Semenishchev, Yigang Cen, et al.
    We present a new near-infrared (NIR) image enhancement algorithm based on multi-scale alpha-rooting processing. The basic idea is to apply the alpha-rooting image enhancement approach to different image blocks, with the parameter alpha for every block driven by optimizing a measure of enhancement (EME). We also propose a fusion scheme that combines the alpha-rooting enhancement with edge detection results. Experimental results are presented to illustrate the performance of the proposed algorithm on the RGB-NIR Scene Dataset.
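    A minimal sketch of block-wise alpha-rooting is given below, with a fixed alpha for illustration; the per-block EME-driven selection of alpha and the fusion with edge detection described in the paper are not reproduced.

```python
# Minimal sketch of block-wise alpha-rooting: raise the Fourier magnitude of each
# block to the power alpha while keeping the phase, which relatively amplifies
# weaker (higher-frequency) components.
import numpy as np

def alpha_root_block(block, alpha=0.92, eps=1e-8):
    F = np.fft.fft2(block)
    mag = np.abs(F)
    F_enh = F * (mag + eps) ** (alpha - 1.0)   # |F|^alpha with original phase
    return np.real(np.fft.ifft2(F_enh))

def enhance_image(image, block=32, alpha=0.92):
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = alpha_root_block(
                image[i:i + block, j:j + block], alpha)
    out -= out.min()                           # rescale to [0, 1] for display
    return out / max(out.max(), 1e-12)

nir = np.random.rand(256, 256)                 # stand-in for a NIR channel
enhanced = enhance_image(nir)
```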
    3D image augmentation using neural style transfer and generative adversarial networks
    In this paper, we take the very first step in using Neural Style Transfer and Generative Adversarial Networks for the task of 3D image augmentation. With this approach, more data may be generated for object recognition and visualization purposes without having to fully reconstruct 3D objects. To the best of our knowledge, this is the first report that describes image augmentation in the 3D domain using Integral Imaging and Deep Learning. It is assumed that readers have some knowledge of Deep Learning.
    An approach for recognizing COVID-19 cases using convolutional neural networks applied to CT scan images
    Cuong Do, Lan Vu
    This study investigates an automated approach using a Convolutional Neural Network (CNN) to efficiently classify COVID-19 cases versus healthy cases from chest CT images. A CNN is a class of deep neural network, usually applied to image data, that can learn features from images more effectively than the traditional pipeline of image segmentation, feature extraction/selection, and classification. Several models using pre-trained weights, including VGG16, VGG19, InceptionV3, InceptionResNetV2, Xception, DenseNet121, DenseNet169, and DenseNet201, were investigated. Overfitting was handled by randomly dropping nodes during training, augmenting the training data, and using a validation set. We conclude that a CNN approach can detect COVID-19 from CT features, and that DenseNet201 is the highest performing model.
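    As a hedged illustration of such a transfer-learning setup (not the authors' exact configuration), the sketch below freezes an ImageNet-pre-trained DenseNet201 backbone and adds dropout and a binary head; the directory layout and hyper-parameters are assumptions.

```python
# Minimal transfer-learning sketch: DenseNet201 as a frozen feature extractor with
# dropout and a sigmoid head for COVID-19 vs. healthy classification of CT slices.
import tensorflow as tf
from tensorflow.keras import layers

# Assumed (hypothetical) directory layout: ct_scans/{covid,healthy}/*.png
train_ds = tf.keras.utils.image_dataset_from_directory(
    "ct_scans", image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="training", seed=1)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "ct_scans", image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="validation", seed=1)

augment = tf.keras.Sequential([layers.RandomFlip("horizontal"),
                               layers.RandomRotation(0.05)])
preprocess = tf.keras.applications.densenet.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(augment(x, training=True)), y))
val_ds = val_ds.map(lambda x, y: (preprocess(x), y))

base = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # frozen ImageNet backbone

model = tf.keras.Sequential([
    base,
    layers.Dropout(0.5),                    # randomly drop nodes to limit overfitting
    layers.Dense(1, activation="sigmoid"),  # COVID-19 vs. healthy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```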
    Magnetic particle imaging system for solid particles quantification in pipelines
    Faisal Shehaz
    In this paper, submillimeter three-dimensional tomographic imaging of the flow rate of paramagnetic contaminants in multiphase flow pipelines is presented. The device, based on Magnetic Particle Imaging (MPI), consists of an array of twelve coils and a pair of permanent magnets, and is not influenced by the other phases that constitute the crude oil (e.g., oil, water, sand, and gas), which are mainly diamagnetic materials. The concentration of the paramagnetic particles can be measured in a three-dimensional volumetric space with high spatial and temporal sensitivities that are proportional to the strength of the applied magnetic field; these sensitivities are also influenced by the size and distribution of the particles and the anisotropy of the permanent magnet. To increase the sensitivity and improve the spatial encoding field, a two-dimensional Linear Field Scanning (LFS) technique coupled with a two-dimensional excitation field is proposed. The results demonstrate that the technique would constitute a breakthrough in the area of solid flow measurement and imaging.
    An electrical capacitance tomography system for real-time process imaging
    Faisal Shehaz
    Solid contaminants are a major obstacle to assuring the quality of flow in pipelines. Their presence leads to clogging of pipes, which causes serious issues downstream when transporting oil or gas. This paper proposes an Electrical Capacitance Tomography (ECT) system for real-time measurement of solid contaminants in gas pipelines. It consists of a ring of eight electrodes evenly distributed around the circular cross section of the probe. A speed-up is achieved by using a Field Programmable Gate Array (FPGA) for the post-processing part of the system to accelerate the intensive matrix multiplications required by the image reconstruction algorithm. Experimental results on field-collected solid contaminants demonstrate the capability of the system to build, in real time, two-dimensional cross-section images of the contaminants while giving an estimated measurement of their concentration. This helps identify the flow regime of the contaminants in the pipeline, which is needed to understand their flow characteristics and to mitigate their formation. Experimental results indicate that, using Altera’s Stratix V FPGA, 305 KLEs are required to achieve an image reconstruction throughput of up to 3,233 frames/s for an image size of 64 x 64 pixels. Simulations were also conducted using a finite element method solver to assess the ECT probe with various image reconstruction algorithms (i.e., linear back projection, Landweber, and modified Landweber). The results show good agreement with the experimental results.
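    For context, the Landweber iteration mentioned above can be sketched in a few lines; the sensitivity matrix and capacitance vector below are random placeholders rather than measured data, and the FPGA implementation is of course not reproduced.

```python
# Minimal sketch of ECT image reconstruction: linear back projection (LBP) as the
# initial guess, refined by projected Landweber iterations
#   g_{k+1} = clip( g_k + alpha * S^T (c - S g_k), 0, 1 )
# with S the normalised sensitivity matrix and c the normalised capacitances.
import numpy as np

def landweber(S, c, iters=200):
    alpha = 1.0 / np.linalg.norm(S, 2) ** 2     # step size keeps the iteration stable
    g = S.T @ c
    g /= max(g.max(), 1e-12)                    # normalised LBP image as initial guess
    for _ in range(iters):
        g = np.clip(g + alpha * S.T @ (c - S @ g), 0.0, 1.0)
    return g

pixels = 64 * 64                                # 64 x 64 reconstruction grid
measurements = 8 * 7 // 2                       # 28 independent pairs for 8 electrodes
S = np.random.rand(measurements, pixels)        # placeholder sensitivity maps
S /= np.linalg.norm(S, axis=1, keepdims=True)   # row-normalise
c = np.random.rand(measurements)                # placeholder normalised capacitances
image = landweber(S, c).reshape(64, 64)
```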