Joint denoising, demosaicing, and chromatic aberration correction for UHD video
Author(s):
Ljubomir Jovanov;
Wilfried Philips;
Klaas Jan Damstra;
Frank Ellenbroek
High-resolution video capture is crucial for numerous applications such as surveillance, security, industrial inspection, medical imaging and digital entertainment. Over the last two decades, we have witnessed a dramatic increase in the spatial resolution and the maximum frame rate of video capturing devices.
Further increases in resolution bring numerous challenges. Due to the reduced pixel size, the amount of captured light also decreases, leading to higher noise levels. Moreover, the reduced pixel size makes lens imprecisions more pronounced, which especially applies to chromatic aberrations. Even when high-quality lenses are used, some chromatic aberration artefacts remain. The noise level increases further at higher frame rates.
To reduce the complexity and the price of the camera, a single sensor captures all three colors by relying on a Color Filter Array. In order to obtain a full-resolution color image, the missing color components have to be interpolated, i.e. demosaicked, which is more challenging than at lower resolutions due to the increased noise and aberrations.
In this paper, we propose a new method which jointly performs chromatic aberration correction, denoising and demosaicking. By reducing all artefacts jointly, we reduce the overall complexity of the system and avoid introducing new artefacts. In order to reduce possible flicker, we also perform temporal video enhancement. We evaluate the proposed method on a number of publicly available UHD sequences and on sequences recorded in our studio.
A hardware architecture for real-time shadow removal in high-contrast video
Author(s):
Pablo Verdugo;
Jorge E. Pezoa;
Miguel Figueroa
Broadcasting an outdoor sports event at daytime is a challenging task due to the high contrast that exists between areas in the shadow and light conditions within the same scene. Commercial cameras typically do not handle the high dynamic range of such scenes in a proper manner, resulting in broadcast streams with very little shadow detail. We propose a hardware architecture for real-time shadow removal in high-resolution video, which reduces the shadow effect and simultaneously improves shadow details. The algorithm operates only on the shadow portions of each video frame, thus improving the results and producing more realistic images than algorithms that operate on the entire frame, such as simplified Retinex and histogram shifting. The architecture receives an input in the RGB color space, transforms it into the YIQ space, and uses color information from both spaces to produce a mask of the shadow areas present in the image. The mask is then filtered using a connected components algorithm to eliminate false positives and negatives. The hardware uses pixel information at the edges of the mask to estimate the illumination ratio between light and shadow in the image, which is then used to correct the shadow area. Our prototype implementation simultaneously processes up to 7 video streams of 1920×1080 pixels at 60 frames per second on a Xilinx Kintex-7 XC7K325T FPGA.
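The paper does not include source code; as a rough, hypothetical illustration of the mask-generation step summarized above (convert RGB to YIQ luma and flag dark, weakly saturated pixels as shadow candidates), a NumPy sketch could look as follows. The thresholds and the saturation test are assumptions, not the authors' actual rules.

    import numpy as np

    def shadow_mask(rgb, y_thresh=0.35, sat_thresh=0.25):
        """Sketch of a shadow mask: dark pixels (low Y in YIQ) that are not
        strongly saturated in RGB are flagged as shadow candidates.
        Thresholds are illustrative, not the values used in the paper."""
        rgb = rgb.astype(np.float32) / 255.0
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        # Luma component of the YIQ transform (NTSC coefficients).
        y = 0.299 * r + 0.587 * g + 0.114 * b
        # Simple saturation estimate taken from the RGB space.
        sat = rgb.max(axis=-1) - rgb.min(axis=-1)
        return (y < y_thresh) & (sat < sat_thresh)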
Image quality assessment for determining efficacy and limitations of Super-Resolution Convolutional Neural Network (SRCNN)
Author(s):
Chris M. Ward;
Joshua Harguess;
Brendan Crabb;
Shibin Parameswaran
Traditional metrics for evaluating the efficacy of image processing techniques do not lend themselves to understanding the capabilities and limitations of modern image processing methods, particularly those enabled by deep learning. When applying image processing in engineering solutions, a scientist or engineer needs to justify their design decisions with clear metrics. By applying the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE), Structural SIMilarity (SSIM) index scores, and peak signal-to-noise ratio (PSNR) to images before and after image processing, we can quantify quality improvements in a meaningful way and determine the lowest recoverable image quality for a given method.
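For readers who want to reproduce this kind of before/after comparison, a minimal scikit-image sketch for PSNR and SSIM is shown below; BRISQUE is a no-reference metric and is not part of scikit-image, so it would require a separate implementation. The function name and the assumption of same-size 8-bit grayscale inputs are illustrative.

    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def quality_report(reference, degraded, restored):
        """Full-reference scores before and after an enhancement step
        (e.g. SRCNN output).  All inputs are assumed to be same-size
        8-bit grayscale arrays."""
        report = {}
        for name, img in [("before", degraded), ("after", restored)]:
            report[name] = {
                "psnr": peak_signal_noise_ratio(reference, img, data_range=255),
                "ssim": structural_similarity(reference, img, data_range=255),
            }
        return report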
The role of optical flow in automated quality assessment of full-motion video
Author(s):
Josh Harguess;
Scott Shafer;
Diego Marez
In real-world video data, such as full-motion video (FMV) taken from unmanned vehicles, surveillance systems, and other sources, various corruptions to the raw data are inevitable. These can be due to the image acquisition process, noise, distortion, and compression artifacts, among other sources of error. However, we desire methods to analyze the quality of the video to determine whether the underlying content of the corrupted video can be analyzed by humans or machines, and to what extent. Previous approaches have shown that motion estimation, or optical flow, can be an important cue in automating this video quality assessment. However, there are many different optical flow algorithms in the literature, each with their own advantages and disadvantages. We examine the effect of the choice of optical flow algorithm (including baseline and state-of-the-art) on motion-based automated video quality assessment algorithms.
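As a concrete example of the kind of motion cue discussed above, the following sketch computes dense Farneback optical flow with OpenCV (one of many possible baseline algorithms, not necessarily one evaluated in the paper) and reduces it to simple magnitude statistics that a quality model could consume.

    import cv2
    import numpy as np

    def flow_statistics(prev_bgr, next_bgr):
        """Dense Farneback optical flow between two frames, summarised as
        simple magnitude statistics for a motion-based quality model."""
        prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
        next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        return {"mean_mag": float(np.mean(mag)),
                "std_mag": float(np.std(mag)),
                "p95_mag": float(np.percentile(mag, 95))}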
Prediction of HDR quality by combining perceptually transformed display measurements with machine learning
Author(s):
Anustup Choudhury;
Suzanne Farrell;
Robin Atkins;
Scott Daly
We present an approach to predict overall HDR display quality as a function of key HDR display parameters. We first performed subjective experiments on a high quality HDR display that explored five key HDR display parameters: maximum luminance, minimum luminance, color gamut, bit-depth and local contrast. Subjects rated overall quality for different combinations of these display parameters.
We explored two models: a physical model solely based on physically measured display characteristics, and a perceptual model that transforms physical parameters using human visual system models. For the perceptual model, we use a family of metrics based on a recently published color volume model (ICtCp), which consists of the PQ luminance non-linearity (ST 2084) and LMS-based opponent color, as well as an estimate of the display point spread function. To predict overall visual quality, we apply linear regression and machine learning techniques such as Multilayer Perceptron, RBF and SVM networks. We use RMSE and Pearson/Spearman correlation coefficients to quantify performance. We found that the perceptual model is better at predicting subjective quality than the physical model and that SVM is better at prediction than linear regression. The significance and contribution of each display parameter was investigated. In addition, we found that combined parameters such as contrast do not improve prediction. Traditional perceptual models were also evaluated and we found that models based on the PQ non-linearity performed better.
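A schematic of the regression comparison described above, using scikit-learn, is sketched below; the feature matrix, score vector and cross-validation setup are placeholders and do not reproduce the authors' experiment.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    # X: one row per display configuration (e.g. max/min luminance, gamut,
    # bit depth, local contrast, possibly perceptually transformed);
    # y: mean subjective quality scores.  Both are random placeholders here.
    X = np.random.rand(60, 5)
    y = np.random.rand(60)

    for name, model in [("linear", LinearRegression()),
                        ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf")))]:
        rmse = -cross_val_score(model, X, y, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(name, rmse)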
Performance comparison of AV1, HEVC, and JVET video codecs on 360 (spherical) video
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan;
Adeel Abbas;
Sandeep Doshi;
David Newman
This paper compares the coding efficiency performance on 360 videos, of three software codecs: (a) AV1 video codec
from the Alliance for Open Media (AOM); (b) the HEVC Reference Software HM; and (c) the JVET JEM Reference
SW. Note that 360 video is especially challenging content, in that one codes at full resolution globally but typically views locally (in a viewport), which magnifies errors. The codecs are tested in two different projection formats, ERP and RSP, to check
consistency. Performance is tabulated for 1-pass encoding on two fronts: (1) objective performance based on end-to-end
(E2E) metrics such as SPSNR-NN, and WS-PSNR, currently developed in the JVET committee; and (2) informal
subjective assessment of static viewports. Constant quality encoding is performed with all the three codecs for an
unbiased comparison of the core coding tools. Our general conclusion is that under constant quality coding, AV1
underperforms HEVC, which underperforms JVET. We also test with rate control, where AV1 currently underperforms
the open source X265 HEVC codec. Objective and visual evidence is provided.
Verification testing of the compression performance of the HEVC screen content coding extensions
Author(s):
Gary J. Sullivan;
Vittorio A. Baroncini;
Haoping Yu;
Rajan L. Joshi;
Shan Liu;
Xiaoyu Xiu;
Jizheng Xu
This paper reports on verification testing of the coding performance of the screen content coding (SCC) extensions of the High Efficiency Video Coding (HEVC) standard (Rec. ITU-T H.265 | ISO/IEC 23008-2 MPEG-H Part 2). The coding performance of the HEVC screen content model (SCM) reference software is compared with that of the HEVC test model (HM) without the SCC extensions, as well as with the Advanced Video Coding (AVC) joint model (JM) reference software, for both lossy and mathematically lossless compression using All-Intra (AI), Random Access (RA), and Low-delay B (LB) encoding structures and using similar encoding techniques. Video test sequences in 1920×1080 RGB 4:4:4, YCbCr 4:4:4, and YCbCr 4:2:0 colour sampling formats with 8 bits per sample are tested in two categories: “text and graphics with motion” (TGM) and “mixed” content. For lossless coding, the encodings are evaluated in terms of relative bit-rate savings. For lossy compression, subjective testing was conducted at 4 quality levels for each coding case, and the test results are presented through mean opinion score (MOS) curves. The relative coding performance is also evaluated in terms of Bjøntegaard-delta (BD) bit-rate savings for equal PSNR quality. The perceptual tests and objective metric measurements showed a very substantial benefit in coding efficiency for the SCC extensions, and provided consistent results with a high degree of confidence. For TGM video, the estimated bit-rate savings ranged from 60–90% relative to the JM and 40–80% relative to the HM, depending on the AI/RA/LB configuration category and colour sampling format.
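The Bjøntegaard-delta bit-rate figures cited above follow a standard calculation; a compact NumPy version of that calculation (cubic fit of log-rate versus PSNR, integrated over the overlapping quality range) is sketched below for reference. It assumes the usual four rate points per curve.

    import numpy as np

    def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
        """Bjontegaard-delta bit-rate in percent (average rate difference at
        equal PSNR), using the usual cubic fit of log10-rate versus quality."""
        p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
        p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
        lo = max(min(psnr_anchor), min(psnr_test))
        hi = min(max(psnr_anchor), max(psnr_test))
        int_a, int_t = np.polyint(p_a), np.polyint(p_t)
        avg_diff = ((np.polyval(int_t, hi) - np.polyval(int_t, lo)) -
                    (np.polyval(int_a, hi) - np.polyval(int_a, lo))) / (hi - lo)
        return (10 ** avg_diff - 1) * 100.0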
JPEG XS-based frame buffer compression inside HEVC for power-aware video compression
Author(s):
Alexandre Willème;
Antonin Descampe;
Gaël Rouvroy;
Pascal Pellegrin;
Benoit Macq
With the emergence of Ultra-High Definition video, reference frame buffers (FBs) inside HEVC-like encoders and decoders have to sustain a huge bandwidth. The power consumed by these external memory accesses accounts for a significant share of the codec’s total consumption. This paper describes a solution to significantly decrease the FB’s bandwidth, making the HEVC encoder more suitable for use in power-aware applications. The proposed prototype consists of integrating an embedded lightweight, low-latency and visually lossless codec at the FB interface inside HEVC in order to store each reference frame as several compressed bitstreams. As opposed to previous works, our solution compresses large picture areas (ranging from a CTU to a frame stripe) independently in order to better exploit the spatial redundancy found in the reference frame. This work investigates two data reuse schemes, namely Level-C and Level-D. Our approach is made possible thanks to simplified motion estimation mechanisms further reducing the FB’s bandwidth and inducing very low quality degradation. In this work, we integrated JPEG XS, the upcoming standard for lightweight low-latency video compression, inside HEVC. In practice, the proposed implementation is based on HM 16.8 and on XSM 1.1.2 (JPEG XS Test Model). In this paper, the architecture of our HEVC with JPEG XS-based frame buffer compression is described, and its performance is compared to the HM encoder. Compared to previous works, our prototype provides a significant external memory bandwidth reduction. Depending on the reuse scheme, one can expect bandwidth and FB size reductions ranging from 50% to 83.3% without significant quality degradation.
FastVDO enhancements of the AV1 codec and comparison to HEVC and JVET codecs
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan
This paper describes a study to investigate possible ways to improve the AV1 codec in several directions, most particularly in the context of 10-bit HDR video content and 8/10-bit image content. Applications to SDR video and 360 content are discussed elsewhere. For HDR content, a data adaptive grading technique in conjunction with the AV1 codec is studied. For image content, lapped biorthogonal transforms for (near) lossless compression are studied. For scalability-type applications, we introduce advanced resampling filters which outperform current ones. It is asserted that useful improvements are possible in each of these categories. In particular, substantial value is offered in the coding of HDR content, very competitive with HEVC HDR10, in a coding framework offering backwards compatibility with SDR. We also provide a rudimentary comparison of AV1 to the standard HEVC as well as the developing JVET codecs.
Advanced single-stream HDR coding using JVET JEM with backward compatibility options
Author(s):
Pankaj Topiwala;
Wei Dai;
Madhu Krishnan
This paper presents a state-of-the-art approach in HDR/WCG video coding developed at FastVDO called FVHDR, based on the JEM 6 Test Model of the Joint Exploration Team, a joint committee of ITU-T and ISO/IEC. A fully automatic adaptive video process is used that differs from a known HDR video processing chain (analogous to HDR10, herein called the “anchor”) developed recently in the JVET standards committee. FVHDR works entirely within the framework of the JEM software model, but adds additional tools. These tools can become an integral part of a future video coding standard, or be extracted as additional pre- and post-processing chains. Reconstructed video sequences using FVHDR show a subjective visual quality superior to the output of the anchor. Moreover, the resultant SDR content generated by the data adaptive grading process is backward compatible. Representative objective results for the system include -13.4% for DE100 and -3.8% for PSNRL100.
Novel inter and intra prediction tools under consideration for the emerging AV1 video codec
Author(s):
Urvang Joshi;
Debargha Mukherjee;
Jingning Han;
Yue Chen;
Sarah Parker;
Hui Su;
Angie Chiang;
Yaowu Xu;
Zoe Liu;
Yunqing Wang;
Jim Bankoski;
Chen Wang;
Emil Keyder
Google started the WebM Project in 2010 to develop open source, royalty-free video codecs designed specifically for media on the Web. The second generation codec released by the WebM project, VP9, is currently served by YouTube, and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next-edition codec, AV1, in a consortium of major tech companies called the Alliance for Open Media, that achieves at least a generational improvement in coding efficiency over VP9. In this paper, we focus primarily on new tools in AV1 that improve the prediction of pixel blocks before transforms, quantization and entropy coding are invoked. Specifically, we describe tools and coding modes that improve intra, inter and combined inter-intra prediction. Results are presented on standard test sets.
Novel modes and adaptive block scanning order for intra prediction in AV1
Author(s):
Ofer Hadar;
Ariel Shleifer;
Debargha Mukherjee;
Urvang Joshi;
Itai Mazar;
Michael Yuzvinsky;
Nitzan Tavor;
Nati Itzhak;
Raz Birman
The demand for streaming video content is on the rise and growing exponentially. Network bandwidth is very costly and therefore there is a constant effort to improve video compression rates and enable the sending of reduced data volumes while retaining quality of experience (QoE). One basic feature that utilizes the spatial correlation of pixels for video compression is Intra-Prediction, which determines the codec’s compression efficiency. Intra prediction enables significant reduction of the Intra-Frame (I frame) size and, therefore, contributes to efficient exploitation of bandwidth. In this presentation, we propose new Intra-Prediction algorithms that improve the AV1 prediction model and provide better compression ratios. Two types of methods are considered: (1) a new scanning order method that maximizes spatial correlation in order to reduce prediction error; and (2) new Intra-Prediction modes implemented in AV1. Modern video coding standards, including the AV1 codec, utilize fixed scan orders in processing blocks during intra coding. The fixed scan orders typically result in residual blocks with high prediction error, mainly in blocks with edges. This means that the fixed scan orders cannot fully exploit the content-adaptive spatial correlations between adjacent blocks, thus the bitrate after compression tends to be large. To reduce the bitrate induced by inaccurate intra prediction, the proposed approach adaptively chooses the scanning order of blocks according to the criterion of first predicting blocks with the maximum number of surrounding, already inter-predicted blocks. Using the modified scanning order method and the new modes has reduced the MSE by up to five (5) times when compared to conventional TM mode / Raster scan and up to two (2) times when compared to conventional CALIC mode / Raster scan, depending on the image characteristics (which determine the percentage of blocks predicted with Inter-Prediction, which in turn impacts the efficiency of the new scanning method). For the same cases, the PSNR was shown to improve by up to 7.4 dB and up to 4 dB, respectively. The new modes have yielded 5% improvement in BD-Rate over traditionally used modes, when run on K-Frame, which is expected to yield ~1% overall improvement.
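The adaptive scanning idea, choosing the next block to predict as the one with the most already-reconstructed neighbours, can be sketched as follows; this greedy loop is only an illustration of the criterion described above, not the authors' implementation.

    import numpy as np

    def adaptive_scan_order(coded_mask):
        """Greedy scan order: repeatedly pick the not-yet-predicted block with
        the largest number of already available 8-connected neighbours
        (coded_mask marks blocks already reconstructed, e.g. via inter
        prediction).  Ties are broken in raster order.  Illustrative only."""
        h, w = coded_mask.shape
        done = coded_mask.astype(bool).copy()
        order = []
        while not done.all():
            best, best_score = None, -1
            for i in range(h):
                for j in range(w):
                    if done[i, j]:
                        continue
                    ys = slice(max(i - 1, 0), min(i + 2, h))
                    xs = slice(max(j - 1, 0), min(j + 2, w))
                    score = int(done[ys, xs].sum())
                    if score > best_score:
                        best, best_score = (i, j), score
            order.append(best)
            done[best] = True
        return order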
Display of high dynamic range images under varying viewing conditions
Author(s):
Tim Borer
Recent demonstrations of high dynamic range (HDR) television have shown that superb images are possible. With the emergence of an HDR television production standard (ITU-R Recommendation BT.2100) last year, HDR television production is poised to take off. However, research to date has focused principally on HDR image display under “dark” viewing conditions only. HDR television will need to be displayed at varying brightness and under varying illumination (for example to view sport in daytime or on mobile devices). We know, from common practice with conventional TV, that the rendering intent (gamma) should change under brighter conditions, although this is poorly quantified. For HDR, the need to render images under varying conditions is all the more acute. This paper seeks to explore the issues surrounding image display under varying conditions. It also describes how visual adaptation is affected by display brightness, surround illumination, screen size and viewing distance. Existing experimental results are presented and extended to try to quantify these effects. Using the experimental results, it is described how HDR images may be displayed so that they are perceptually equivalent under different viewing conditions. A new interpretation of the experimental results is reported, yielding a new, luminance-invariant model for the appropriate display “gamma”. In this way the consistency of HDR image reproduction should be improved, thereby better maintaining “creative intent” in television.
Spherical rotation orientation indication for HEVC and JEM coding of 360 degree video
Author(s):
Jill Boyce;
Qian Xu
Omnidirectional (or "360 degree") video, representing a panoramic view of a spherical 360° ×180° scene, can be encoded using conventional video compression standards, once it has been projection mapped to a 2D rectangular format. Equirectangular projection format is currently used for mapping 360 degree video to a rectangular representation for coding using HEVC/JEM. However, video in the top and bottom regions of the image, corresponding to the "north pole" and "south pole" of the spherical representation, is significantly warped. We propose to perform spherical rotation of the input video prior to HEVC/JEM encoding in order to improve the coding efficiency, and to signal parameters in a supplemental enhancement information (SEI) message that describe the inverse rotation process recommended to be applied following HEVC/JEM decoding, prior to display. Experiment results show that up to 17.8% bitrate gain (using the WS-PSNR end-to-end metric) can be achieved for the Chairlift sequence using HM16.15 and 11.9% gain using JEM6.0, and an average gain of 2.9% for HM16.15 and 2.2% for JEM6.0.
Complexity and performance tradeoff for next generation video coding standard development (Conference Presentation)
Author(s):
Elena A. Alshina;
Kiho Choi;
Jeonghoon Park
The Joint Exploration Model (JEM) studies the potential of the next-generation video coding standard. It demonstrates over 30% performance gain beyond HEVC. This paper provides tool-on and tool-off performance test results for 24 methods included in the JEM. Overlap in the functionalities of those tools is discussed, and potential problems for mobile platform implementation are listed. Suggestions on standard development principles and tool selection for the next-generation video coding standard are made. The paper is intended to assist in the preparation of high-quality Call for Proposals responses.
Optimal design of encoding profiles for ABR streaming (Conference Presentation)
Author(s):
Yuriy A. Reznik;
Karl O. Lillevold;
Abhijith Jagannath;
Justin Greer;
Manish Rao
We discuss the problem of optimal design of encoding profiles for adaptive bitrate (ABR) streaming applications.
We show that, under certain conditions and optimization targets, this problem becomes equivalent to the problem of quantization of a random variable, which in this case is the bandwidth of the communication channel between the streaming server and the client. By using such a reduction to a known information-theoretic problem, we immediately arrive at a class of algorithms for solving this problem optimally (a minimal quantization sketch is given after the list below). We illustrate the effectiveness of our approach with examples of optimal encoding ladders designed for different networks and reproduction devices.
Specific techniques and models utilized in this paper include:
- modeling of SSIM-rate functions for modern video codecs (H.264, HEVC) and different content
- adaptation of SSIM (by using scaling & CSF-filtering) to account for different resolutions and reproduction settings
- SSIM - MOS scale mapping
- CDF models of typical communication networks (wireless, cable, WiFi, etc)
- algorithms for solving quantization problem (Lloyd-Max algorithms, analytic solutions, etc)
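A minimal sketch of the quantization step referred to above (Lloyd-Max applied to samples of a channel-bandwidth distribution, with the resulting codebook read as ladder bitrates) is given below; the optimization target is deliberately simplified compared with the SSIM/MOS-based targets listed above.

    import numpy as np

    def lloyd_max_ladder(bandwidth_samples, n_renditions, iters=100):
        """Lloyd-Max quantization of a sampled bandwidth distribution; the
        codebook levels can be read as target bitrates of an encoding
        ladder.  A generic sketch, not the authors' exact formulation."""
        x = np.sort(np.asarray(bandwidth_samples, dtype=float))
        # Initialise the codebook with evenly spaced quantiles.
        levels = np.quantile(x, np.linspace(0.1, 0.9, n_renditions))
        for _ in range(iters):
            # Nearest-level assignment, then centroid (mean) update.
            edges = (levels[:-1] + levels[1:]) / 2.0
            idx = np.searchsorted(edges, x)
            for k in range(n_renditions):
                if np.any(idx == k):
                    levels[k] = x[idx == k].mean()
        return levels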
Performance comparison of AV1, JEM, VP9, and HEVC encoders
Author(s):
Dan Grois;
Tung Nguyen;
Detlev Marpe
This work presents a performance evaluation of the current status of two distinct lines of development in future video coding technology: the so-called AV1 video codec of the industry-driven Alliance for Open Media (AOM) and the Joint Exploration Test Model (JEM), as developed and studied by the Joint Video Exploration Team (JVET) on Future Video Coding of ITU-T VCEG and ISO/IEC MPEG. As a reference, this study also includes reference encoders of the respective starting points of development, as given by the first encoder release of AV1/VP9 for the AOM-driven technology, and the HM reference encoder of the HEVC standard for the JVET activities. For a large variety of video sources ranging from UHD over HD to 360° content, the compression capability of the different video coding technologies has been evaluated by using a Random Access setting along with the JVET common test conditions. As an outcome of this study, it was observed that the latest AV1 release achieved average bit-rate savings of ~17% relative to VP9 at the expense of a factor of ~117 in encoder run time. On the other hand, the latest JEM release provides an average bit-rate saving of ~30% relative to HM with a factor of ~10.5 in encoder run time. When directly comparing AV1 and JEM, both with static quantization parameter settings, AV1 produces an average bit-rate overhead of more than 100% relative to JEM at the same objective reconstruction quality and, in addition, with a factor of ~2.7 in encoder run time. Even when operated in a two-pass rate-control mode, AV1 lags behind both the JEM and HM reference encoders with average bit-rate overheads of ~55% and ~9.5%, respectively, although the latter is configured with one-pass static quantization parameter settings.
JPEG XS, a new standard for visually lossless low-latency lightweight image compression
Author(s):
Antonin Descampe;
Joachim Keinert;
Thomas Richter;
Siegfried Fößel;
Gaël Rouvroy
JPEG XS is an upcoming standard from the JPEG Committee (formally known as ISO/IEC SC29 WG1). It aims to provide an interoperable visually lossless low-latency lightweight codec for a wide range of applications including mezzanine compression in broadcast and Pro-AV markets. This requires optimal support of a wide range of implementation technologies such as FPGAs, CPUs and GPUs. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. In addition to the evaluation of the visual transparency of the selected technologies, a detailed analysis of the hardware and software complexity as well as the latency has been done to make sure that the new codec meets the requirements of the above-mentioned use cases. In particular, the end-to-end latency has been constrained to a maximum of 32 lines. Concerning the hardware complexity, neither encoder nor decoder should require more than 50% of an FPGA similar to Xilinx Artix 7 or 25% of an FPGA similar to Altera Cyclone 5. This process resulted in a coding scheme made of an optional color transform, a wavelet transform, the entropy coding of the highest magnitude level of groups of coefficients, and the raw inclusion of the truncated wavelet coefficients. This paper presents the details and status of the standardization process, a technical description of the future standard, and the latest performance evaluation results.
Overview of the JPEG XS objective evaluation procedures
Author(s):
Alexandre Willème;
Thomas Richter;
Chris Rosewarne;
Benoit Macq
JPEG XS is a standardization activity conducted by the Joint Photographic Experts Group (JPEG), formally known as the ISO/IEC SC29 WG1 group, which aims at standardizing a low-latency, lightweight and visually lossless video compression scheme. This codec is intended to be used in applications where image sequences would otherwise be transmitted or stored in uncompressed form, such as in live production (through SDI or IP transport), display links, or frame buffers. Support for compression ratios ranging from 2:1 to 6:1 allows significant bandwidth and power reduction for signal propagation. This paper describes the objective quality assessment procedures conducted as part of the JPEG XS standardization activity. Firstly, this paper discusses the objective part of the experiments that led to the technology selection during the 73rd WG1 meeting in late 2016. This assessment consists of PSNR measurements after single and multiple compression-decompression cycles at various compression ratios. After this assessment phase, two proposals among the six responses to the CfP were selected and merged to form the first JPEG XS test model (XSM). Later, this paper describes the core experiments (CEs) conducted so far on the XSM. These experiments are intended to evaluate its performance in more challenging scenarios, such as insertion of picture overlays and robustness to frame editing, to assess the impact of the different algorithmic choices, and to measure the XSM performance using the HDR-VDP metric.
New procedures to evaluate visually lossless compression for display systems
Author(s):
Dale F. Stolitzka;
Peter Schelkens;
Tim Bruylants
Visually lossless image coding in isochronous display streaming or plesiochronous networks reduces link complexity and power consumption and increases available link bandwidth. A new set of codecs developed within the last four years promises a new level of coding quality, but requires new techniques that are sufficiently sensitive to the small artifacts or color variations induced by this new breed of codecs. This paper begins with a summary of the new ISO/IEC 29170-2, a procedure for evaluation of lossless coding, and reports the new work by JPEG to extend the procedure in two important ways: for HDR content and for evaluating the differences between still images, panning images and image sequences. ISO/IEC 29170-2 relies on processing test images through a well-defined process chain for subjective, forced-choice psychophysical experiments. The procedure sets an acceptable quality level equal to one just noticeable difference. Traditional image and video coding evaluation techniques, such as those used for television evaluation, have not proven sufficiently sensitive to the small artifacts that may be induced by this breed of codecs. In 2015, JPEG received new requirements to expand evaluation of visually lossless coding for high dynamic range images, slowly moving images, i.e., panning, and image sequences. These requirements are the basis for new amendments of the ISO/IEC 29170-2 procedures described in this paper. These amendments promise to be highly useful for the new content in television and cinema mezzanine networks. The amendments passed the final ballot in April 2017 and are on track to be published in 2018.
JPEG XS call for proposals subjective evaluations
Author(s):
David McNally;
Tim Bruylants;
Alexandre Willème;
Touradj Ebrahimi;
Peter Schelkens;
Benoit Macq
In March 2016 the Joint Photographic Experts Group (JPEG), formally known as ISO/IEC SC29 WG1, issued
a call for proposals soliciting compression technologies for a low-latency, lightweight and visually transparent
video compression scheme. Within the JPEG family of standards, this scheme was denominated JPEG XS.
The subjective evaluation of visually lossless compressed video sequences at high resolutions and bit depths
poses particular challenges. This paper describes the adopted procedures, the subjective evaluation setup, the
evaluation process and summarizes the obtained results which were achieved in the context of the JPEG XS
standardization process.
High-speed low-complexity video coding with EDiCTius: a DCT coding proposal for JPEG XS
Author(s):
Thomas Richter;
Siegfried Fößel;
Joachim Keinert;
Christian Scherl
In its 71st meeting, the JPEG committee issued a call for low-complexity, high-speed image coding, designed to address the needs of low-cost video-over-IP applications. As an answer to this call, Fraunhofer IIS and the Computing Center of the University of Stuttgart jointly developed an embedded DCT image codec requiring only minimal resources while maximizing throughput on FPGA and GPU implementations. Objective and subjective tests performed for the 73rd meeting confirmed its excellent performance and suitability for its purpose, and it was selected as one of the two key contributions for the development of a joint test model. In this paper, its authors describe the design principles of the codec, give a high-level overview of the encoder and decoder chain, and provide evaluation results on the test corpus selected by the JPEG committee.
Parallel efficient rate control methods for JPEG 2000
Author(s):
Miguel Á. Martínez-del-Amor;
Volker Bruns;
Heiko Sparenberg
Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image split into code blocks, and subsequently to optimally truncate the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies on how to estimate ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them has been defined with a parallel implementation in mind. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codec implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed on GPUs. In order to do that, the design of our GPU-based codec is extended, allowing the process to be stopped at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% speedup in those situations where it was actually employed.
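For context, the PCRD-Opt principle that these rate control methods build on can be summarized in a few lines: pick, for each code block, the truncation point minimizing a Lagrangian cost, and search the Lagrange multiplier so that the total rate meets the budget. The sketch below is a serial, simplified illustration under assumed data structures; it ignores the convex-hull pre-pass and the GPU parallelization that are the subject of the paper.

    def pcrd_opt(blocks, rate_budget, lo=1e-6, hi=1e6, iters=60):
        """'blocks' is a list of per-code-block lists of (rate, distortion)
        pairs for increasing truncation points.  A global slope threshold
        (Lagrange multiplier) is found by bisection so that the total
        selected rate fits the budget."""
        def select(lmbda):
            total_rate, cuts = 0.0, []
            for pts in blocks:
                best_j, best_cost = 0, pts[0][1] + lmbda * pts[0][0]
                for j, (r, d) in enumerate(pts):
                    cost = d + lmbda * r
                    if cost < best_cost:
                        best_j, best_cost = j, cost
                cuts.append(best_j)
                total_rate += pts[best_j][0]
            return total_rate, cuts
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            rate, _ = select(mid)
            if rate > rate_budget:
                lo = mid      # too much rate: penalise rate more
            else:
                hi = mid
        return select(hi)[1]  # truncation point index per code block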
Lossless medical image compression through lightweight binary arithmetic coding
Author(s):
Joan Bartrina-Rapesta;
Victor Sanchez;
Joan Serra-Sagristà;
Michael W. Marcellin;
Francesc Aulí-Llinàs;
Ian Blanes
A contextual lightweight arithmetic coder is proposed for lossless compression of medical imagery. Context definition uses causal data from previous symbols coded, an inexpensive yet efficient approach. To further reduce the computational cost, a binary arithmetic coder with fixed-length codewords is adopted, thus avoiding the normalization procedure common in most implementations, and the probability of each context is estimated through bitwise operations. Experimental results are provided for several medical images and compared against state-of-the-art coding techniques, yielding on average improvements between nearly 0.1 and 0.2 bps.
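The paper's exact probability estimator is not reproduced here, but the flavour of a shift-only, fixed-point context model over causal neighbours can be illustrated with the hypothetical sketch below; the precision, adaptation rate and context definition are assumptions.

    class ContextModel:
        """Hypothetical shift-based probability estimator: probabilities are
        kept as 12-bit fixed-point integers and updated with shifts only
        (no multiplications or divisions), in the spirit of lightweight
        binary arithmetic coders.  Not the exact scheme of the paper."""
        PREC = 12                  # fixed-point precision (4096 = 1.0)
        SHIFT = 5                  # adaptation rate

        def __init__(self, n_contexts):
            self.p1 = [1 << (self.PREC - 1)] * n_contexts  # P(bit=1) = 0.5

        def probability(self, ctx):
            return self.p1[ctx]

        def update(self, ctx, bit):
            if bit:
                self.p1[ctx] += ((1 << self.PREC) - self.p1[ctx]) >> self.SHIFT
            else:
                self.p1[ctx] -= self.p1[ctx] >> self.SHIFT

    def causal_context(left, above, above_left):
        """Context index built from previously coded neighbouring bits."""
        return (left << 2) | (above << 1) | above_left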
FBCOT: a fast block coding option for JPEG 2000
Author(s):
David Taubman;
Aous Naman;
Reji Mathew
Based on the EBCOT algorithm, JPEG 2000 finds application in many fields, including high performance scientific, geospatial and video coding applications. Beyond digital cinema, JPEG 2000 is also attractive for low-latency video communications. The main obstacle for some of these applications is the relatively high computational complexity of the block coder, especially at high bit-rates. This paper proposes a drop-in replacement for the JPEG 2000 block coding algorithm, achieving much higher encoding and decoding throughputs, with only modest loss in coding efficiency (typically < 0.5dB). The algorithm provides only limited quality/SNR scalability, but offers truly reversible transcoding to/from any standard JPEG 2000 block bit-stream. The proposed FAST block coder can be used with EBCOT's post-compression RD-optimization methodology, allowing a target compressed bit-rate to be achieved even at low latencies, leading to the name FBCOT (Fast Block Coding with Optimized Truncation).
A new display stream compression standard under development in VESA
Author(s):
Natan Jacobson;
Vijayaraghavan Thirumalai;
Rajan Joshi;
James Goel
The Advanced Display Stream Compression (ADSC) codec project is in development in response to a call for technologies from the Video Electronics Standards Association (VESA). This codec targets visually lossless compression of display streams at a high compression rate (typically 6 bits/pixel) for mobile/VR/HDR applications. Functionality of the ADSC codec is described in this paper, and subjective trial results are provided using the ISO 29170-2 testing protocol.
A novel projection for omni-directional video
Author(s):
Adeel Abbas;
David Newman
Omnidirectional video coding typically involves mapping spherical image data onto a two-dimensional plane by means of a projection format. In this paper, we introduce and share results on a relatively new projection format called Rotated Sphere Projection (RSP). RSP uses two symmetric and perfectly continuous segments to represent the sphere. It has a simple 3:2 aspect ratio, close proximity to the sphere, and a very simple mathematical representation that is the same as the Equirectangular Projection. Test results using JVET’s common testing conditions are also presented.
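Since RSP reuses the equirectangular mapping within each segment, the basic sphere-to-plane relation is easy to state; the sketch below gives the standard ERP forward and inverse mappings (the per-segment handling specific to RSP is omitted).

    import numpy as np

    def sphere_to_erp(lon, lat, width, height):
        """Longitude/latitude (radians) to equirectangular pixel coordinates."""
        u = (lon / (2 * np.pi) + 0.5) * width
        v = (0.5 - lat / np.pi) * height
        return u, v

    def erp_to_sphere(u, v, width, height):
        """Equirectangular pixel coordinates back to longitude/latitude."""
        lon = (u / width - 0.5) * 2 * np.pi
        lat = (0.5 - v / height) * np.pi
        return lon, lat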
Intra prediction using face continuity in 360-degree video coding
Author(s):
Philippe Hanhart;
Yuwen He;
Yan Ye
This paper presents a new reference sample derivation method for intra prediction in 360-degree video coding. Unlike the
conventional reference sample derivation method for 2D video coding, which uses the samples located directly above and
on the left of the current block, the proposed method considers the spherical nature of 360-degree video when deriving
reference samples located outside the current face to which the block belongs, and derives reference samples that are
geometric neighbors on the sphere. The proposed reference sample derivation method was implemented in the Joint
Exploration Model 3.0 (JEM-3.0) for the cubemap projection format. Simulation results for the all intra configuration
show that, when compared with the conventional reference sample derivation method, the proposed method gives, on
average, luma BD-rate reduction of 0.3% in terms of the weighted spherical PSNR (WS-PSNR) and spherical PSNR (SPSNR)
metrics.
Segment scheduling method for reducing 360° video streaming latency
Author(s):
Srinivas Gudumasu;
Eduardo Asbun;
Yong He;
Yan Ye
360° video is an emerging new format in the media industry enabled by the growing availability of virtual reality devices. It provides the viewer a new sense of presence and immersion. Compared to conventional rectilinear video (2D or 3D), 360° video poses a new and difficult set of engineering challenges on video processing and delivery. Enabling a comfortable and immersive user experience requires very high video quality and very low latency, while the large video file size poses a challenge to delivering 360° video in a quality manner at scale. Conventionally, 360° video represented in equirectangular or other projection formats can be encoded as a single standards-compliant bitstream using existing video codecs such as H.264/AVC or H.265/HEVC. Such a method usually needs very high bandwidth to provide an immersive user experience. At the client side, however, much of this high bandwidth and the computational power used to decode the video is wasted because the user only watches a small portion (i.e., the viewport) of the entire picture. Viewport-dependent 360° video processing and delivery approaches spend more bandwidth on the viewport than on non-viewports and are therefore able to reduce the overall transmission bandwidth. This paper proposes a dual-buffer segment scheduling algorithm for viewport-adaptive streaming methods to reduce latency when switching between high quality viewports in 360° video streaming. The approach decouples the scheduling of viewport segments and non-viewport segments to ensure that the viewport segment requested matches the latest user head orientation. A base layer buffer stores all lower quality segments, and a viewport buffer stores high quality viewport segments corresponding to the most recent viewer’s head orientation. The scheduling scheme determines the viewport request time based on the buffer status and the head orientation. This paper also discusses how to deploy the proposed scheduling design for various viewport-adaptive video streaming methods. The proposed dual-buffer segment scheduling method is implemented in an end-to-end tile-based 360° viewport-adaptive video streaming platform, where the entire 360° video is divided into a number of tiles, and each tile is independently encoded into multiple quality level representations. The client requests different quality level representations of each tile based on the viewer’s head orientation and the available bandwidth, and then composes all tiles together for rendering. The simulation results verify that the proposed dual-buffer segment scheduling algorithm reduces the viewport switch latency and utilizes the available bandwidth more efficiently. As a result, a more consistent and immersive 360° video viewing experience can be presented to the user.
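A toy version of the dual-buffer decision logic described above is sketched below; the buffer targets, quality labels and tile-selection helper are assumptions made purely for illustration.

    def schedule_next_request(base_buffer_s, viewport_buffer_s, head_yaw_deg,
                              base_target_s=10.0, viewport_target_s=2.0):
        """Keep a long low-quality base buffer for all tiles and a short
        high-quality buffer for the tiles in the current viewport, so
        viewport requests always track the latest head orientation."""
        if base_buffer_s < base_target_s:
            return ("base", "all_tiles", "low_quality")
        if viewport_buffer_s < viewport_target_s:
            return ("viewport", tiles_for_yaw(head_yaw_deg), "high_quality")
        return ("idle", None, None)

    def tiles_for_yaw(yaw_deg, n_tiles=8):
        """Hypothetical mapping from yaw angle to the indices of the
        horizontally adjacent tiles covering the viewport."""
        centre = int((yaw_deg % 360) / 360.0 * n_tiles)
        return [(centre - 1) % n_tiles, centre, (centre + 1) % n_tiles]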
An ROI multi-resolution compression method for 3D-HEVC
Author(s):
Chunli Ti;
Yudong Guan;
Guodong Xu;
Yidan Teng;
Xinyuan Miao
3D High Efficiency Video Coding (3D-HEVC) provides significant potential for increasing the compression ratio of multi-view RGB-D videos. However, the bit rate still rises dramatically with the improvement of the video resolution, which brings challenges to the transmission network, especially the mobile network. This paper proposes an ROI multi-resolution compression method for 3D-HEVC to better preserve the information in the ROI under limited bandwidth. This is realized primarily through ROI extraction and compression of multi-resolution preprocessed video as alternative data according to the network conditions. First, the semantic contours are detected by the modified structured forests to restrain the color textures inside objects. The ROI is then determined utilizing the contour neighborhood along with the face region and foreground area of the scene. Second, the RGB-D videos are divided into slices and compressed via 3D-HEVC under different resolutions for selection by the audiences and applications. Afterwards, the reconstructed low-resolution videos from the 3D-HEVC encoder are directly up-sampled via Laplace transformation and used to replace the non-ROI areas of the high-resolution videos. Finally, the ROI multi-resolution compressed slices are obtained by compressing the ROI preprocessed videos with 3D-HEVC. The temporal and spatial details of the non-ROI areas are reduced in the low-resolution videos, so the ROI will be better preserved by the encoder automatically. Experiments indicate that the proposed method can keep the key high-frequency information with subjective significance while the bit rate is reduced.
Key factors for a high-quality VR experience
Author(s):
Mary-Luc Champel;
Renaud Doré;
Nicolas Mollet
For many years, Virtual Reality has been presented as a promising technology that could deliver a truly new experience to users. The media and entertainment industry is now investigating the possibility to offer a video-based VR 360 experience. Nevertheless, there is a substantial risk that VR 360 could have the same fate as 3DTV if it cannot offer more than just being the next fad. The present paper aims at presenting the various quality factors required for a high-quality VR experience. More specifically, this paper will focus on the main three VR quality pillars: visual, audio and immersion.
Discontinuity minimization for omnidirectional video projections
Author(s):
Elena Alshina;
Vladyslav Zakharchenko
Advances in display technologies, both for head-mounted devices and television panels, demand a resolution increase beyond 4K for the source signal in virtual reality video streaming applications. This poses a problem of content delivery through bandwidth-limited distribution networks. Considering the fact that the source signal covers the entire surrounding space, our investigation revealed that compression efficiency may fluctuate by 40% on average depending on the origin selection at the conversion stage from 3D space to a 2D projection. Based on this knowledge, an origin selection algorithm for video compression applications has been proposed. Using a discontinuity entropy minimization function, the projection origin rotation may be defined so as to provide optimal compression results. The outcome of this research may be applied across various video compression solutions for omnidirectional content.
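As a crude stand-in for the entropy-based criterion described above, the sketch below scores candidate yaw rotations by the discontinuity across the ERP wrap-around boundary and keeps the best one; the actual cost function used by the authors is not reproduced.

    import numpy as np

    def best_yaw(erp_frame, candidate_yaws_deg):
        """Pick the yaw rotation whose ERP left/right wrap-around boundary
        has the smallest discontinuity, measured here as the mean absolute
        difference between the first and last columns."""
        gray = erp_frame.mean(axis=-1) if erp_frame.ndim == 3 else erp_frame
        w = gray.shape[1]
        costs = []
        for yaw in candidate_yaws_deg:
            shifted = np.roll(gray, int(round(yaw / 360.0 * w)), axis=1)
            costs.append(np.abs(shifted[:, 0] - shifted[:, -1]).mean())
        return candidate_yaws_deg[int(np.argmin(costs))]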
Measuring quality of omnidirectional high dynamic range content
Author(s):
Anne-Flore Perrin;
Cambodge Bist;
Rémi Cozot;
Touradj Ebrahimi
Although HDR content processing, coding and quality assessment have been largely addressed in the last few years, little to no work has concentrated on how to assess quality in HDR for 360° or omnidirectional content. This paper is an attempt to answer various questions in this direction. As a minimum, a new data set for 360° HDR content is proposed and a new methodology is designed to assess the subjective quality of HDR 360° content when it is displayed on an SDR HMD after applying various tone mapping operators. The results are then analyzed and conclusions are drawn.
True 3D digital holographic tomography for virtual reality applications
Author(s):
A. Downham;
U. Abeywickrema;
P. P. Banerjee
Previously, a single CCD camera has been used to record holograms of an object while the object is rotated about a single axis to reconstruct a pseudo-3D image, which does not show detailed depth information from all perspectives. To generate a true 3D image, the object has to be rotated through multiple angles and along multiple axes. In this work, to reconstruct a true 3D image including depth information, a die is rotated along two orthogonal axes, and holograms are recorded using a Mach-Zehnder setup, which are subsequently numerically reconstructed. This allows for the generation of multiple images containing phase (i.e., depth) information. These images, when combined, create a true 3D image with depth information which can be exported to a Microsoft® HoloLens for true 3D virtual reality.
Forming intermediate spatial resolution of microscopy images for continuous zooming on multi-resolution processing system
Author(s):
Evan H. E. Putranto;
Tomohiro Suzuki;
Shin Usuki;
Kenjiro T. Miura
Digital zooming of microscopy images aims to improve the quality of measurement and enable better assessment. However, the field of view of a high-resolution image is narrow even though it carries more detail, while a low-resolution image gives a big picture of the whole structure; we therefore need to observe the sample at any scale. This problem has previously been addressed by developing a dual view of high- and low-resolution images1, but only as a single interpolated image. The goal of this research is to utilize multi-resolution images to develop smooth zooming magnification of microscopy images. In order to achieve smooth zooming under different imaging conditions, the following processing scheme is used. First, we acquire several spatial images of the same sample with different objective lenses; we used four objective lenses with 10×, 20×, 50× and 150× magnification. In this synthesis phase, we interpolate the lower-resolution image so that it can be synthesized with the next higher-resolution image of the sample. Second, we find the feature points of both images with the SIFT method and synthesize the two images. Third, we process the synthesized image with a discrete Fourier transform (DFT) using a low-pass filter whose size matches the numerical aperture (NA) given as input in the first phase. The fourth phase loops over these processes until enough intermediate images are generated to be blended with the pyramid blending method. In this article we also attempt to build a system that can arbitrarily generate intermediate images in a hierarchical manner.
Hough transform for clustered microcalcifications detection in full-field digital mammograms
Author(s):
A. Fanizzi;
T. M. A. Basile;
L. Losurdo;
N. Amoroso;
R. Bellotti;
U. Bottigli;
R. Dentamaro;
V. Didonna;
A. Fausto;
R. Massafra;
M. Moschetta;
P. Tamborra;
S. Tangaro;
D. La Forgia
Many screening programs use mammography as the principal diagnostic tool for detecting breast cancer at a very early stage. Despite the efficacy of mammograms in highlighting breast diseases, the detection of some lesions is still uncertain for radiologists. In particular, the extremely minute and elongated salt-like particles of microcalcifications are sometimes no larger than 0.1 mm and represent approximately half of all cancers detected by means of mammograms. Hence the need for automatic tools able to support radiologists in their work. Here, we propose a computer-assisted diagnostic tool to support radiologists in identifying microcalcifications in full (native) digital mammographic images. The proposed CAD system consists of a pre-processing step, which improves contrast and reduces noise by applying the Sobel edge detection algorithm and a Gaussian filter, followed by a microcalcification detection step performed by exploiting the circular Hough transform. The procedure's performance was tested on 200 images coming from the Breast Cancer Digital Repository (BCDR), a publicly available database. The automatically detected clusters of microcalcifications were evaluated by skilled radiologists, who assessed the validity of the correctly identified regions of interest as well as the system error in case of missed clustered microcalcifications. The system performance was evaluated in terms of sensitivity and false positives per image (FPi) rate, and is comparable to state-of-the-art approaches. The proposed model was able to accurately predict the microcalcification clusters, obtaining performance (sensitivity = 91.78% and FPi rate = 3.99) which compares favorably to other state-of-the-art approaches.
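A compact scikit-image sketch of the detection chain summarized above (Gaussian smoothing, Sobel edges, circular Hough transform) is given below; the radii, edge threshold and peak count are illustrative, not the tuned values of the paper.

    import numpy as np
    from skimage.filters import gaussian, sobel
    from skimage.transform import hough_circle, hough_circle_peaks

    def detect_microcalcifications(mammogram, radii=np.arange(2, 8)):
        """Smooth, extract edges, and run a circular Hough transform to
        locate small, roughly circular bright spots (candidate
        microcalcifications).  Parameters are illustrative."""
        smoothed = gaussian(mammogram.astype(float), sigma=1.0)
        edges = sobel(smoothed) > 0.05        # hypothetical edge threshold
        hspace = hough_circle(edges, radii)
        _, cx, cy, r = hough_circle_peaks(hspace, radii, total_num_peaks=50)
        return list(zip(cy, cx, r))           # (row, col, radius) candidates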
A multi-layer MRI description of Parkinson's disease
Author(s):
M. La Rocca;
N. Amoroso;
E. Lella;
R. Bellotti;
S. Tangaro
Magnetic resonance imaging (MRI), along with complex network analysis, is currently one of the most widely adopted techniques for the detection of structural changes in neurological diseases, such as Parkinson's disease (PD). In this paper, we present a digital image processing study, within the multi-layer network framework, combining several classifiers to evaluate the informative power of MRI features for the discrimination of normal controls (NC) and PD subjects. We define a network for each MRI scan; the nodes are the sub-volumes (patches) the images are divided into, and the links are defined using the Pearson pairwise correlation between patches. We obtain a multi-layer network whose most important network features, obtained with different feature selection methods, are used to feed a supervised multi-level random forest classifier which exploits this base of knowledge for accurate classification. Method evaluation has been carried out using T1 MRI scans of 354 individuals, including 177 PD subjects and 177 NC from the Parkinson's Progression Markers Initiative (PPMI) database. The experimental results demonstrate that the features obtained from multiplex networks are able to accurately describe PD patterns. Moreover, even if a privileged scale for studying PD exists, exploring the informative content of more scales leads to a significant improvement of the performance in discriminating between diseased and healthy subjects. In particular, this method gives a comprehensive overview of the brain regions statistically affected by the disease, an additional value of the presented study.
Machine learning for the assessment of Alzheimer's disease through DTI
Author(s):
Eufemia Lella;
Nicola Amoroso;
Roberto Bellotti;
Domenico Diacono;
Marianna La Rocca;
Tommaso Maggipinto;
Alfonso Monaco;
Sabina Tangaro
Digital imaging techniques have found several medical applications in the development of computer-aided detection systems, especially in neuroimaging. Recent advances in Diffusion Tensor Imaging (DTI) aim to discover biological markers for the early diagnosis of Alzheimer’s disease (AD), one of the most widespread neurodegenerative disorders. We explore here how different supervised classification models provide robust support to the diagnosis of AD patients. We use DTI measures, assessing the structural integrity of white matter (WM) fiber tracts, to reveal patterns of disrupted brain connectivity. In particular, we provide a voxel-wise measure of fractional anisotropy (FA) and mean diffusivity (MD), thus identifying the regions of the brain mostly affected by neurodegeneration, and then compute intensity features to feed supervised classification algorithms. Specifically, we evaluate the accuracy of discrimination of AD patients from healthy controls (HC) with a dataset of 80 subjects (40 HC, 40 AD) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). In this study, we compare three state-of-the-art classification models: Random Forests, Naive Bayes and Support Vector Machines (SVMs). We use a repeated five-fold cross validation framework with nested feature selection to perform a fair comparison between these algorithms and evaluate the information content they provide. Results show that AD patterns are well localized within the brain, and thus DTI features can support AD diagnosis.
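The repeated five-fold cross-validation with nested feature selection described above can be expressed concisely with scikit-learn, as sketched below; the feature matrix and labels are placeholders, and the selector and classifier settings are assumptions rather than the authors' exact configuration.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    # X: per-subject FA/MD intensity features; y: 0 = HC, 1 = AD (placeholders).
    X = np.random.rand(80, 200)
    y = np.array([0] * 40 + [1] * 40)

    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
    for name, clf in [("rf", RandomForestClassifier(n_estimators=200)),
                      ("nb", GaussianNB()),
                      ("svm", SVC(kernel="linear"))]:
        # Feature selection lives inside the pipeline, so it is re-fit on
        # each training fold ("nested"), avoiding selection bias.
        model = Pipeline([("scale", StandardScaler()),
                          ("select", SelectKBest(f_classif, k=50)),
                          ("clf", clf)])
        acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        print(name, acc.mean())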
Association between MRI structural features and cognitive measures in pediatric multiple sclerosis
Author(s):
N. Amoroso;
R. Bellotti;
A. Fanizzi;
A. Lombardi;
A. Monaco;
M. Liguori;
L. Margari;
M. Simone;
R. G. Viterbo;
S. Tangaro
Multiple sclerosis (MS) is an inflammatory and demyelinating disease associated with neurodegenerative processes that lead to brain structural changes. The disease affects mostly young adults, but 3–5% of cases have a pediatric onset (POMS). Magnetic Resonance Imaging (MRI) is generally used for diagnosis and follow-up in MS patients; however, the most common MRI measures (e.g. new or enlarging T2-weighted lesions, T1-weighted gadolinium-enhancing lesions) have often failed as surrogate markers of MS disability and progression. MS is clinically heterogeneous, with symptoms that can include both physical changes (such as visual loss or walking difficulties) and cognitive impairment; 30–50% of POMS experience prominent cognitive dysfunction. In order to investigate the association between cognitive measures and brain morphometry, in this work we present a fully automated pipeline for processing and analyzing MRI brain scans. Relevant anatomical structures are segmented with FreeSurfer and statistical features are computed. Thus, we describe the data referred to 12 patients with early POMS (mean age at MRI: 15.5 ± 2.7 years) with a set of 181 structural features. The major cognitive abilities measured are verbal and visuo-spatial learning, expressive language and complex attention. Data was collected at the Department of Basic Sciences, Neurosciences and Sense Organs, University of Bari. Different regression models and parameter configurations are explored to assess the robustness of the results; in particular, Generalized Linear Models, Bayes Regression, Random Forests, Support Vector Regression and Artificial Neural Networks are discussed.
Brain's tumor image processing using shearlet transform
Author(s):
Luis Cadena;
Nikolai Espinosa;
Franklin Cadena;
Anna Korneeva;
Alexey Kruglyakov;
Alexander Legalov;
Alexey Romanenko;
Alexander Zotin
Brain tumor detection is a well-known research area for medical and computer scientists. In the last decades there has been much research on tumor detection, segmentation, and classification. Medical imaging plays a central role in the diagnosis of brain tumors and nowadays relies on non-invasive, high-resolution techniques, especially magnetic resonance imaging and computed tomography scans. Edge detection is a fundamental tool in image processing, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image has discontinuities. Shearlets are among the most successful frameworks for the efficient representation of multidimensional data, capturing edges and other anisotropic features which frequently dominate multidimensional phenomena. The paper proposes an improved brain tumor detection method that automatically detects tumor locations in MR images and extracts their features with the new shearlet transform.
Demyelinating and ischemic brain diseases: detection algorithm through regular magnetic resonance images
Author(s):
D. Castillo;
René Samaniego;
Y. Jiménez;
L. Cuenca;
O. Vivanco;
M. J. Rodríguez-Álvarez
This work presents progress toward the development of an algorithm for the automatic detection of demyelinating lesions and cerebral ischemia in magnetic resonance images, which are of paramount importance in the diagnosis of brain diseases. The image sequences used are T1, T2, and FLAIR.
Brain demyelinating lesions occur due to damage of the myelin layer of nerve fibers, and this deterioration is the cause of serious pathologies such as multiple sclerosis (MS), leukodystrophy, and acute disseminated encephalomyelitis. Cerebral or cerebrovascular ischemia is the interruption of the blood supply to the brain, which cuts off the flow of oxygen and nutrients needed to maintain the functioning of brain cells. The algorithm allows the differentiation between these two types of lesions.
Analysis of breast thermograms for ROI extraction and description using mathematical morphology
Author(s):
O. A. Zermeño-Loreto;
C. Toxqui-Quitl;
E. E. Orozco Guillén;
A. Padilla-Vivanco
Show Abstract
The detection of a temperature increase or hot spots in breast thermograms can be related to the high metabolic activity of diseased cells. Image processing algorithms are proposed that mainly seek temperature increases above 3°C, which have a high probability of indicating malignancy. A derivative operator is also used to highlight breast regions of interest (ROI). In order to determine a medical alert, a feature descriptor of the ROI is constructed using its maximum temperature, maximum increase of temperature, sector/quadrant position in the breast, and area. The proposed algorithms are tested on an in-house database and a public database for mastology research.
Weighted bi-prediction for light field image coding
Author(s):
Caroline Conti;
Paulo Nunes;
Luís Ducla Soares
Show Abstract
Light field imaging based on a single-tier camera equipped with a microlens array – also known as integral, holoscopic, and plenoptic imaging – has recently emerged as a practical and promising approach for future visual applications and services. However, successfully deploying actual light field imaging applications and services will require developing adequate coding solutions to efficiently handle the massive amount of data involved in these systems. In this context, self-similarity compensated prediction is a non-local spatial prediction scheme based on block matching that has been shown to achieve high efficiency for light field image coding based on the High Efficiency Video Coding (HEVC) standard. As previously shown by the authors, this is possible by simply averaging two predictor blocks that are jointly estimated from a causal search window in the current frame itself, referred to as self-similarity bi-prediction. However, theoretical analyses of motion-compensated bi-prediction have suggested that it is still possible to achieve further rate-distortion performance improvements by adaptively estimating the weighting coefficients of the two predictor blocks.
Therefore, this paper presents a comprehensive study of the rate-distortion performance of HEVC-based light field image coding when using different sets of weighting coefficients for self-similarity bi-prediction. Experimental results demonstrate that it is possible to extend the previous theoretical conclusions to light field image coding and show that the proposed adaptive weighting coefficient selection leads to up to 5% bit savings compared to the previous self-similarity bi-prediction scheme.
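As an illustration only (not the authors' codec integration), the core idea of weighted bi-prediction can be sketched as forming the predictor from two candidate blocks with weights chosen to minimize the residual energy; the block contents and the small weight set below are assumptions.

```python
# Illustrative weighted bi-prediction: combine two predictor blocks P0, P1 with weights
# (w0, w1), w0 + w1 = 1, chosen to minimize the residual energy against the block B.
import numpy as np

def weighted_biprediction(B, P0, P1, weight_set=((0.5, 0.5), (0.25, 0.75), (0.75, 0.25))):
    """Return the best weighted predictor and its weights for block B."""
    best = None
    for w0, w1 in weight_set:
        P = w0 * P0 + w1 * P1                 # weighted combination of the two predictors
        sse = float(np.sum((B - P) ** 2))     # residual energy (SSE) for this weight pair
        if best is None or sse < best[0]:
            best = (sse, (w0, w1), P)
    return best[2], best[1]

rng = np.random.default_rng(1)
B = rng.integers(0, 256, (8, 8)).astype(float)   # original block (placeholder)
P0 = B + rng.normal(0, 4, (8, 8))                # two noisy candidate predictors
P1 = B + rng.normal(0, 8, (8, 8))
P, weights = weighted_biprediction(B, P0, P1)
print("selected weights:", weights)
```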
A new framework for interactive quality assessment with application to light field coding
Author(s):
Irene Viola;
Touradj Ebrahimi
Show Abstract
In recent years, light field imaging has experienced a surge of popularity, mainly due to recent advances in acquisition and rendering technologies that have made it more accessible to the public. Thanks to image-based rendering techniques, light field contents can be rendered in real time on common 2D screens, allowing virtual navigation through the captured scenes in an interactive fashion. However, this richer representation of the scene poses the problem of reliable quality assessment for light field contents. In particular, while subjective methodologies that enable interaction have already been proposed, no work has been done on assessing how users interact with light field contents. In this paper, we propose a new framework to subjectively assess the quality of light field contents in an interactive manner and simultaneously track users' behaviour. The framework is successfully used to perform subjective assessment of two coding solutions. Moreover, statistical analysis performed on the results shows an interesting correlation between subjective scores and average interaction time.
Liborg: a lidar-based robot for efficient 3D mapping
Author(s):
Michiel Vlaminck;
Hiep Luong;
Wilfried Philips
Show Abstract
In this work we present Liborg, a spatial mapping and localization system that is able to acquire 3D models on the fly using data originating from lidar sensors. The novelty of this work lies in the highly efficient way we deal with the tremendous amount of data to guarantee fast execution times while preserving sufficiently high accuracy. The proposed solution is based on a multi-resolution technique built on octrees. The paper discusses and evaluates the main benefits of our approach, including its efficiency regarding building and updating the map and its compactness regarding compressing the map. In addition, the paper presents a working prototype consisting of a robot equipped with a Velodyne Lidar Puck (VLP-16) and controlled by a Raspberry Pi serving as an independent acquisition platform.
On the performance of metrics to predict quality in point cloud representations
Author(s):
Evangelos Alexiou;
Touradj Ebrahimi
Show Abstract
Point clouds are a promising alternative for immersive representation of visual contents. Recently, an increased interest has been observed in the acquisition, processing and rendering of this modality. Although subjective and objective evaluations are critical in order to assess the visual quality of media content, they still remain open problems for point cloud representation. In this paper we focus our efforts on subjective quality assessment of point cloud geometry, subject to typical types of impairments such as noise corruption and compression-like distortions. In particular, we propose a subjective methodology that is closer to real-life scenarios of point cloud visualization. The performance of the state-of-the-art objective metrics is assessed by considering the subjective scores as the ground truth. Moreover, we investigate the impact of adopting different test methodologies by comparing them. Advantages and drawbacks of every approach are reported, based on statistical analysis. The results and conclusions of this work provide useful insights that could be considered in future experimentation.
A new similarity measure for complex amplitude holographic data
Author(s):
Ayyoub Ahar;
Tobias Birnbaum;
Christian Jaeh;
Peter Schelkens
Show Abstract
In this research, we have adapted our recently proposed Versatile Similarity Measure (VSM) for holographic data analysis. This new measure benefits from convenient mathematical properties such as boundedness to [0,1], relative error weighting based on the magnitudes of the signals, steerable similarity between original and negative phase, symmetry with respect to the ordering of the arguments, and the regularity of at least a continuous function. Utilizing its versatile design, we present here a set of VSM constructions specifically tailored to best fit the characteristics of the complex wavefield of holograms. Performance analysis results are also provided by comparing the proposed constructions, as fast, stand-alone perceptual quality predictors, to the few available competitors in the field, namely MSE and the average SSIM of the real and imaginary parts of holograms. Comparing their visual quality prediction scores with the mean opinion scores (MOS) of the hologram reconstructions shows a significant gain for all of the VSM constructions proposed in this paper, paving the way towards designing highly efficient perceptual quality predictors for holographic data in the future and demonstrating the potential of utilizing VSM for other applications working with complex-valued data as well.
Computer-generated holographic near-eye display system based on LCoS phase only modulator
Author(s):
Peng Sun;
Shengqian Chang;
Siman Zhang;
Ting Xie;
Huaye Li;
Siqi Liu;
Chang Wang;
Xiao Tao;
Zhenrong Zheng
Show Abstract
Augmented reality (AR) technology has been applied in various areas, such as large-scale manufacturing, national defense, healthcare, movies and mass media. An important way to realize AR display is using computer-generated holograms (CGH), which suffer from low image quality and heavy computation. Meanwhile, the diffraction of Liquid Crystal on Silicon (LCoS) devices has a negative effect on image quality. In this paper, a modified algorithm based on the traditional Gerchberg-Saxton (GS) algorithm is proposed to improve the image quality, and a new method of building the experimental system is used to broaden the field of view (FOV). In the experiment, undesired zero-order diffracted light was eliminated and a high-definition 2D image was acquired with the FOV broadened to 36.1 degrees. We have also done some pilot research on 3D reconstruction with a tomography algorithm based on Fresnel diffraction. With the same experimental system, experimental results demonstrate the feasibility of 3D reconstruction. These modifications are effective and efficient, and may provide a better solution for AR realization.
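The paper's modification of Gerchberg-Saxton is not detailed in the abstract; the following is a minimal sketch of the classical GS phase-retrieval loop for a Fourier-plane phase-only hologram. The target pattern, iteration count and FFT-based propagation model are illustrative assumptions, not the authors' setup.

```python
# Classical Gerchberg-Saxton iteration: find a phase-only hologram whose Fourier
# transform magnitude approximates a target image (placeholder target pattern).
import numpy as np

def gerchberg_saxton(target_amplitude, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_amplitude.shape)   # random initial phase
    for _ in range(n_iter):
        # Image-plane constraint: keep the target amplitude, keep the current phase.
        image_field = target_amplitude * np.exp(1j * phase)
        holo_field = np.fft.ifft2(image_field)                  # back-propagate to hologram plane
        # Hologram-plane constraint: phase-only modulation (unit amplitude).
        holo_phase = np.angle(holo_field)
        recon = np.fft.fft2(np.exp(1j * holo_phase))            # forward-propagate to image plane
        phase = np.angle(recon)
    return holo_phase

target = np.zeros((128, 128)); target[48:80, 48:80] = 1.0       # simple square target
hologram_phase = gerchberg_saxton(target)
reconstruction = np.abs(np.fft.fft2(np.exp(1j * hologram_phase)))
```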
3D+T motion analysis with nanosensors
Author(s):
Jean-Pierre Leduc
Show Abstract
This paper addresses the problem of motion analysis performed on a signal sampled on an irregular grid spread over 3-dimensional space and time (3D+T). Nanosensors can be randomly scattered in the field to form a “sensor network”. Once released, each nanosensor transmits, at its own fixed pace, information corresponding to some physical variable measured in the field. Each nanosensor is assumed to have a limited lifetime given by a Poisson-exponential distribution after release. The motion analysis is supported by a model based on a Lie group called the Galilei group, which refers to the actual mechanics that takes place on some given geometry. The Galilei group has representations in the Hilbert space of the captured signals. Those representations have the properties of being unitary, irreducible and square-integrable, and they enable the existence of admissible continuous wavelets fit for motion analysis. The motion analysis can be considered a so-called “inverse problem” where the physical model is inferred to estimate the kinematical parameters of interest. The estimation of the kinematical parameters is performed with a gradient algorithm, which extends to trajectory determination. Trajectory computation is related to a Lagrangian-Hamiltonian formulation and fits into a neuro-dynamic programming approach that can be implemented in the form of a Q-learning algorithm. Applications relevant for this problem can be found in medical imaging, Earth science, military, and neurophysiology.
Low-complexity object detection with deep convolutional neural network for embedded systems
Author(s):
Subarna Tripathi;
Byeongkeun Kang;
Gokce Dane;
Truong Nguyen
Show Abstract
We investigate low-complexity convolutional neural networks (CNNs) for object detection for embedded vision applications. It is well known that building an embedded system for CNN-based object detection is more challenging than for problems such as image classification, due to computation and memory requirements. To meet these requirements, we design and develop an end-to-end TensorFlow (TF)-based fully-convolutional deep neural network for the generic object detection task, inspired by one of the fastest frameworks, YOLO.1 The proposed network predicts the localization of every object by regressing the coordinates of the corresponding bounding box, as in YOLO. Hence, the network is able to detect objects without any limitation on their size. However, unlike YOLO, all the layers in the proposed network are fully convolutional; thus, it can take input images of any size. We pick face detection as a use case. We evaluate the proposed model for face detection on the FDDB and Widerface datasets. As another use case of generic object detection, we evaluate its performance on the PASCAL VOC dataset. The experimental results demonstrate that the proposed network can predict object instances of different sizes and poses in a single frame. Moreover, the results show that the proposed method achieves accuracy comparable to state-of-the-art CNN-based object detection methods while reducing the model size by 3× and memory bandwidth by 3–4× compared with one of the best real-time CNN-based object detectors, YOLO. Our 8-bit fixed-point TF model provides an additional 4× memory reduction while keeping the accuracy nearly as good as the floating-point model. Moreover, the fixed-point model is capable of achieving 20× faster inference speed compared with the floating-point model. Thus, the proposed method is promising for embedded implementations.
An embedded system for face classification in infrared video using sparse representation
Author(s):
Antonio Saavedra M.;
Jorge E. Pezoa;
Payman Zarkesh-Ha;
Miguel Figueroa
Show Abstract
We propose a platform for robust face recognition in Infrared (IR) images using Compressive Sensing (CS). In line with CS theory, the classification problem is solved using a sparse representation framework, where test images are modeled by means of a linear combination of the training set. Because the training set constitutes an over-complete dictionary, we identify new images by finding their sparsest representation over the training set, using standard l1-minimization algorithms. Unlike conventional face-recognition algorithms, feature extraction is performed using random projections with a precomputed binary matrix, as proposed in the CS literature. This random sampling reduces the effects of noise and occlusions such as facial hair, eyeglasses, and disguises, which are notoriously challenging in IR images. Thus, the performance of our framework is robust to these noise and occlusion factors, achieving an average accuracy of approximately 90% when the UCHThermalFace database is used for training and testing purposes. We implemented our framework on a high-performance embedded digital system, where the computation of the sparse representation of IR images was performed by dedicated hardware using a deeply pipelined architecture on a Field-Programmable Gate Array (FPGA).
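To illustrate the sparse-representation classification idea described above (not the authors' FPGA implementation), a minimal software sketch is shown below; the random sign projection, the Lasso-based l1 solver and the class-residual decision rule are standard choices and are assumptions here.

```python
# Sparse Representation Classification (SRC) sketch: a test sample is coded as a sparse
# combination of training samples; the class with the smallest residual wins.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_classes, per_class, dim, proj_dim = 5, 8, 1024, 120

# Placeholder training data: columns are vectorized face images, grouped by class.
A = rng.normal(size=(dim, n_classes * per_class))
labels = np.repeat(np.arange(n_classes), per_class)
test = A[:, 3] + 0.1 * rng.normal(size=dim)       # noisy copy of a class-0 sample

# Random sign projection as a compressive feature extractor.
R = rng.choice([-1.0, 1.0], size=(proj_dim, dim))
A_p = R @ A
A_p /= np.linalg.norm(A_p, axis=0)                # normalize dictionary atoms
y = R @ test

# l1-regularized coding of the test sample over the projected training dictionary.
coder = Lasso(alpha=0.01, max_iter=10000, fit_intercept=False)
coder.fit(A_p, y)
x = coder.coef_

# Classify by the per-class reconstruction residual.
residuals = [np.linalg.norm(y - A_p[:, labels == c] @ x[labels == c]) for c in range(n_classes)]
print("predicted class:", int(np.argmin(residuals)))
```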
BDVC (Bimodal Database of Violent Content): A database of violent audio and video
Author(s):
Jose Luis Rivera Martínez;
Mario Humberto Mijes Cruz;
Manuel Antonio Rodríguez Vázqu;
Luis Rodríguez Espejo;
Abraham Montoya Obeso;
Mireya Saraí García Vázquez;
Alejandro Álvaro Ramírez Acosta
Show Abstract
Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications of a single type of content such as text, voice or images; bimodal databases, in contrast, allow two different types of content, such as audio-video or image-text, to be associated semantically. The generation of a bimodal audio-video database implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the characteristics and methodology used for the creation of the bimodal database of violent content; the semantic relationship is established by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to audiovisual content processing allows an increase in semantic performance if and only if these applications process both types of content. This bimodal database contains 580 annotated audiovisual segments, with a duration of 28 minutes, divided into 41 classes. Bimodal databases are a tool for the generation of applications for the semantic web.
Dynamic frame resizing with convolutional neural network for efficient video compression
Author(s):
Jaehwan Kim;
Youngo Park;
Kwang Pyo Choi;
JongSeok Lee;
Sunyoung Jeon;
JeongHoon Park
Show Abstract
In the past, video codecs such as VC-1 and H.263 used a technique of encoding reduced-resolution video and restoring the original resolution in the decoder to improve coding efficiency. These techniques of VC-1 and H.263 Annex Q are called dynamic frame resizing and reduced-resolution update mode, respectively. However, they have not been widely used due to limited performance improvements that materialize only under specific conditions. In this paper, a video frame resizing (reduction/restoration) technique based on machine learning is proposed to improve coding efficiency. In the proposed method, low-resolution video is produced by a convolutional neural network (CNN) in the encoder, and the original resolution is reconstructed using a CNN in the decoder. The proposed method shows improved subjective performance on all of the high-resolution videos, which are dominantly consumed nowadays. In order to assess the subjective quality of the proposed method, Video Multi-method Assessment Fusion (VMAF), which has shown high reliability among many subjective measurement tools, was used as the subjective metric. Moreover, to assess general performance, diverse bitrates were tested. Experimental results showed that the BD-rate based on VMAF was improved by about 51% compared to conventional HEVC. In particular, VMAF values were significantly improved at low bitrates. Also, in subjective tests the method yielded better visual quality at similar bitrates.
Application of multi-scale segmentation algorithms for high resolution remote sensing image
Author(s):
Tingting Zhou;
Lingjia Gu;
Ruizhi Ren
Show Abstract
In recent decades, with the rapid development of remote sensing technology, high resolution remote sensing images have been widely used in various fields due to their characteristics, such as rich spectral information and complex texture information. As a key step in feature extraction, multi-scale image segmentation has recently become a research hotspot. Traditional image segmentation is based on pixels, which only takes the spectral information of each pixel into account and ignores the texture, spatial information and contextual relations of the objects in the image. The experimental high resolution remote sensing images are from GF-2; the features of the experimental data are obvious and the edges are clear. Using the statistical region merging (SRM) algorithm, the fractal net evolution approach (FNEA) algorithm and the unsupervised multi-scale segmentation of color images (UMSC) algorithm, this paper analyzes the segmentation effects of the three multi-scale segmentation algorithms at the optimal scale and at the same segmentation scale, respectively. The experimental results at the optimal scale and at the same segmentation scale show that the SRM algorithm outperforms the UMSC algorithm, and the UMSC algorithm outperforms the FNEA algorithm in multi-scale segmentation.
Blind image quality evaluation using the conditional histogram patterns of divisive normalization transform coefficients
Author(s):
Ying Chu;
Xuanqin Mou;
Hengyong Yu
Show Abstract
A novel codebook-based framework for blind image quality assessment is developed. The code words are designed according to the image patterns of joint conditional histograms among neighboring divisive normalization transform coefficients in degraded images. By extracting high-dimensional perceptual features from different subjective score levels in the sample database, and by clustering the features to their centroids, the conditional-histogram-based codebook is constructed. The objective image quality score is calculated by comparing the distances between extracted features and the code words. Experiments are performed on most current databases, and the results confirm the effectiveness and feasibility of the proposed approach.
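The exact features and clustering settings are not given in the abstract; a minimal, generic sketch of a codebook-style quality predictor (a k-means codebook per quality level and nearest-codeword scoring, with random placeholder features) could look like this.

```python
# Generic codebook-based quality scoring sketch: cluster training features per quality
# level into code words, then score a test feature by distance-weighted codeword levels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
levels = [1, 2, 3, 4, 5]                      # subjective score levels (e.g. MOS bins)
codebook, code_levels = [], []

for level in levels:
    # Placeholder: features extracted from training images belonging to this level.
    feats = rng.normal(loc=level, scale=0.5, size=(200, 32))
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(feats)
    codebook.append(km.cluster_centers_)      # 4 code words per quality level
    code_levels.extend([level] * 4)

codebook = np.vstack(codebook)
code_levels = np.array(code_levels, dtype=float)

def predict_quality(feature):
    """Weight each code word's quality level by inverse distance to the feature."""
    d = np.linalg.norm(codebook - feature, axis=1)
    w = 1.0 / (d + 1e-9)
    return float(np.sum(w * code_levels) / np.sum(w))

test_feature = rng.normal(loc=3.0, scale=0.5, size=32)   # placeholder test feature
print("predicted quality score:", round(predict_quality(test_feature), 2))
```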
A research of road centerline extraction algorithm from high resolution remote sensing images
Author(s):
Yushan Zhang;
Tingfa Xu
Show Abstract
Satellite remote sensing technology has become one of the most effective methods for land surface monitoring in recent years, due to its advantages such as short revisit period, large scale and rich information. Meanwhile, road extraction is an important field in the applications of high resolution remote sensing images. An intelligent and automatic road extraction algorithm with high precision has great significance for transportation, road network updating and urban planning. Fuzzy c-means (FCM) clustering segmentation algorithms have been used in road extraction, but the traditional algorithms do not consider spatial information. An improved fuzzy c-means clustering algorithm combined with spatial information (SFCM) is proposed in this paper, which is shown to be effective for noisy image segmentation. Firstly, the image is segmented using the SFCM. Secondly, the segmentation result is processed by mathematical morphology to remove the joint regions. Thirdly, the road centerlines are extracted by morphological thinning and burr trimming. The average integrity of the centerline extraction algorithm is 97.98%, the average accuracy is 95.36% and the average quality is 93.59%. Experimental results show that the proposed method is effective for road centerline extraction.
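The SFCM segmentation itself is not reproduced here; the sketch below only illustrates the post-processing chain mentioned in the abstract (morphological cleanup followed by skeletonization and small-spur pruning) on a placeholder binary road mask, using scikit-image. The window sizes and pruning depth are assumptions.

```python
# Post-processing sketch for road centerline extraction: morphological cleanup of a
# binary road mask, skeletonization, and removal of short spurs ("burr trimming").
import numpy as np
from skimage.morphology import binary_closing, remove_small_objects, skeletonize, disk

# Placeholder binary road mask (e.g. the output of an SFCM-like segmentation).
mask = np.zeros((200, 200), dtype=bool)
mask[95:105, :] = True          # horizontal road
mask[:, 95:105] = True          # vertical road

cleaned = binary_closing(mask, disk(3))                 # close small gaps in the road region
cleaned = remove_small_objects(cleaned, min_size=50)    # drop isolated noise blobs
skeleton = skeletonize(cleaned)                         # one-pixel-wide centerline

def trim_spurs(skel, iterations=5):
    """Naive burr trimming: repeatedly delete endpoint pixels to remove short spurs."""
    skel = skel.copy()
    for _ in range(iterations):
        padded = np.pad(skel, 1)
        neighbors = sum(np.roll(np.roll(padded, dy, 0), dx, 1)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)) - padded
        endpoints = skel & (neighbors[1:-1, 1:-1] == 1)
        skel &= ~endpoints
    return skel

centerline = trim_spurs(skeleton)
print("centerline pixels:", int(centerline.sum()))
```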
Siamese convolutional networks for tracking the spine motion
Author(s):
Yuan Liu;
Xiubao Sui;
Yicheng Sun;
Chengwei Liu;
Yong Hu
Show Abstract
Deep learning models have demonstrated great success in various computer vision tasks such as image classification and object tracking. However, tracking the lumbar spine in digitalized video fluoroscopic imaging (DVFI), which can quantitatively analyze the motion of the spine to diagnose lumbar instability, has not yet been well developed due to the lack of a steady and robust tracking method. In this paper, we propose a novel visual tracking algorithm for lumbar vertebra motion based on a Siamese convolutional neural network (CNN) model. We train a fully-convolutional neural network offline to learn generic image features. The network is trained to learn a similarity function that compares the labeled target in the first frame with the candidate patches in the current frame. The similarity function returns a high score if the two images depict the same object. Once learned, the similarity function is used to track a previously unseen object without any online adaptation. In the current frame, the tracker evaluates candidate rotated patches sampled around the previous frame's target position and produces a rotated bounding box to locate the predicted target precisely. Results indicate that the proposed tracking method can detect the lumbar vertebra steadily and robustly. Especially for images with low contrast and cluttered backgrounds, the presented tracker still achieves good tracking performance. Furthermore, the proposed algorithm operates at high speed for real-time tracking.
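The network architecture is not specified in the abstract; the sketch below only shows a generic Siamese matching step (a shared embedding network, with similarity computed as cross-correlation of the target embedding over the search-region embedding). The tiny embedding network, tensor sizes and random weights are illustrative assumptions.

```python
# Siamese matching sketch: embed the target patch and a larger search region with the
# same (shared-weight) CNN, then score locations by cross-correlating the embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedder(nn.Module):
    """Tiny stand-in for the shared feature extractor (weights would be learned offline)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

embed = Embedder().eval()
target = torch.randn(1, 1, 32, 32)     # labeled vertebra patch from the first frame
search = torch.randn(1, 1, 96, 96)     # search region around the previous position

with torch.no_grad():
    t_feat = embed(target)                     # (1, 32, 32, 32)
    s_feat = embed(search)                     # (1, 32, 96, 96)
    score_map = F.conv2d(s_feat, t_feat)       # cross-correlation -> (1, 1, 65, 65)
    best = torch.argmax(score_map.view(-1))
    row, col = divmod(best.item(), score_map.shape[-1])
print("best match offset (row, col):", row, col)
```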
Real-time heart rate measurement for multi-people using compressive tracking
Author(s):
Lingling Liu;
Yuejin Zhao;
Ming Liu;
Lingqin Kong;
Liquan Dong;
Feilong Ma;
Zongguang Pang;
Zhi Cai;
Yachu Zhang;
Peng Hua;
Ruifeng Yuan
Show Abstract
The rise of the aging population has created a demand for inexpensive, unobtrusive, automated health care solutions. Image PhotoPlethysmoGraphy (IPPG) aids in the development of these solutions by allowing the extraction of physiological signals from video data. However, the main deficiencies of recent IPPG methods are that they are non-automated, non-real-time and susceptible to motion artifacts (MA). In this paper, a real-time heart rate (HR) detection method for multiple subjects simultaneously is proposed and realized using the open computer vision (OpenCV) library. It consists of automatically acquiring facial video of multiple subjects through a webcam, detecting the region of interest (ROI) in the video, reducing the false detection rate with our improved Adaboost algorithm, reducing MA with our improved compressive tracking (CT) algorithm, applying a wavelet noise-suppression algorithm for denoising, and using multiple threads for higher detection speed. For comparison, HR was measured simultaneously using a medical pulse oximetry device for every subject during all sessions. Experimental results on a data set of 30 subjects show that the maximum average absolute error of heart rate estimation is less than 8 beats per minute (BPM), and the processing of every frame is close to real time: experiments with video recordings of ten subjects at a resolution of 600×800 pixels show that the average HR detection speed for the 10 subjects was about 17 frames per second (fps).
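As an illustration of the core IPPG step only (the face detection, tracking and wavelet denoising described in the paper are omitted), the sketch below estimates heart rate from the mean green-channel signal of a face ROI by band-pass filtering and locating the spectral peak; the synthetic signal, sampling rate and band limits are assumptions.

```python
# IPPG heart-rate sketch: average the green channel over a face ROI per frame,
# band-pass the resulting signal to a plausible HR band, and read off the FFT peak.
import numpy as np
from scipy.signal import butter, filtfilt

fps = 30.0
t = np.arange(0, 20, 1.0 / fps)                      # 20 s of synthetic ROI means
true_hr_hz = 1.2                                     # 72 BPM ground truth (synthetic)
roi_green_mean = 0.02 * np.sin(2 * np.pi * true_hr_hz * t) + np.random.normal(0, 0.05, t.size)

# Band-pass 0.7-3.0 Hz (42-180 BPM), a common plausible range for resting heart rate.
b, a = butter(3, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
filtered = filtfilt(b, a, roi_green_mean - roi_green_mean.mean())

# The spectral peak within the pass band gives the heart-rate estimate.
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(filtered.size, d=1.0 / fps)
band = (freqs >= 0.7) & (freqs <= 3.0)
hr_bpm = 60.0 * freqs[band][np.argmax(spectrum[band])]
print(f"estimated heart rate: {hr_bpm:.1f} BPM")
```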
Vision-based mobile robot navigation through deep convolutional neural networks and end-to-end learning
Author(s):
Yachu Zhang;
Yuejin Zhao;
Ming Liu;
Liquan Dong;
Lingqin Kong;
Lingling Liu
Show Abstract
In contrast to humans, who use only visual information for navigation, many mobile robots use laser scanners and ultrasonic sensors along with vision cameras to navigate. This work proposes a vision-based robot control algorithm based on deep convolutional neural networks. We create a large 15-layer convolutional neural network learning system and achieve advanced recognition performance. Our system is trained end to end to map raw input images to a direction in supervised mode. The images of the data sets are collected in a wide variety of weather and lighting conditions. In addition, the data sets are augmented by adding Gaussian noise and salt-and-pepper noise to avoid overfitting. The algorithm is verified by two experiments, line tracking and obstacle avoidance. The line tracking experiment is conducted in order to track a desired path composed of straight and curved lines. The goal of the obstacle avoidance experiment is to avoid obstacles indoors. Finally, we obtain a 3.29% error rate on the training set and a 5.1% error rate on the test set in the line tracking experiment, and a 1.8% error rate on the training set and less than 5% error rate on the test set in the obstacle avoidance experiment. During the actual test, the robot can follow the runway centerline outdoors and avoid obstacles in the room accurately. The results confirm the effectiveness of the algorithm and of our improvements to the network structure and training parameters.
A locally adaptive algorithm for shadow correction in color images
Author(s):
Victor Karnaukhov;
Vitaly Kober
Show Abstract
The paper deals with the correction of color images distorted by spatially nonuniform illumination. A serious distortion occurs in real conditions when a part of the scene containing 3D objects close to a directed light source is illuminated much more brightly than the rest of the scene. A locally-adaptive algorithm for the correction of shadow regions in color images is proposed. The algorithm consists of segmentation of shadow areas with rank-order statistics followed by correction of the nonuniform illumination with a human visual perception approach. The performance of the proposed algorithm is compared to that of common algorithms for the correction of color images containing shadow regions.
Tracking of multiple objects with time-adjustable composite correlation filters
Author(s):
Alexey Ruchay;
Vitaly Kober;
Ilya Chernoskulov
Show Abstract
An algorithm for tracking multiple objects in video based on time-adjustable adaptive composite correlation filtering is proposed. For each frame, a bank of composite correlation filters is designed in such a manner as to provide invariance to pose, occlusion, clutter, and illumination changes. The filters are synthesized with the help of an iterative algorithm, which optimizes the discrimination capability for each object. The filters are adapted online to changes in the objects using information from the current and past scene frames. Results obtained with the proposed algorithm on real-life scenes are presented and compared with those obtained with state-of-the-art tracking methods in terms of detection efficiency, tracking accuracy, and speed of processing.
Fast perceptual image hash based on cascade algorithm
Author(s):
Alexey Ruchay;
Vitaly Kober;
Evgeniya Yavtushenko
Show Abstract
In this paper, we propose a perceptual image hash algorithm based on a cascade algorithm, which can be applied to image authentication, retrieval, and indexing. A perceptual image hash is used for image retrieval in the sense of human perception, robust against distortions caused by compression, noise, common signal processing and geometrical modifications. The main disadvantage of perceptual hashing is its high computational cost. The proposed cascade algorithm initializes image retrieval with short hashes, and then a full hash is applied to the processed results. Computer simulation results show that the proposed hash algorithm yields good performance in terms of robustness, discriminability, and computational cost.
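The specific short and full hashes used by the authors are not stated; the sketch below illustrates only the cascade idea with two standard perceptual hashes as stand-ins (a fast average hash to pre-filter candidates, then a DCT-based hash for the survivors). The database, thresholds and hash choices are assumptions.

```python
# Cascade perceptual hashing sketch: a cheap average hash prunes the candidate set,
# then a more expensive DCT-based hash is compared only on the survivors.
import numpy as np
from scipy.fftpack import dct

def average_hash(img, size=8):
    """Short hash: threshold a block-averaged downsampled image at its mean."""
    small = img.reshape(size, img.shape[0] // size, size, img.shape[1] // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def dct_hash(img, size=32, keep=8):
    """Full hash: sign pattern of the low-frequency DCT block."""
    small = img.reshape(size, img.shape[0] // size, size, img.shape[1] // size).mean(axis=(1, 3))
    coeffs = dct(dct(small, axis=0, norm="ortho"), axis=1, norm="ortho")[:keep, :keep]
    return (coeffs > np.median(coeffs)).flatten()

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
database = [rng.random((256, 256)) for _ in range(100)]     # placeholder image database
query = database[42] + rng.normal(0, 0.02, (256, 256))      # distorted copy of image 42

# Stage 1: keep only candidates whose short hash is close to the query's.
q_short = average_hash(query)
candidates = [i for i, im in enumerate(database) if hamming(q_short, average_hash(im)) <= 10]

# Stage 2: rank the survivors with the full hash.
q_full = dct_hash(query)
best = min(candidates, key=lambda i: hamming(q_full, dct_hash(database[i])))
print("retrieved image index:", best)
```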
Removal of impulse noise clusters from color images with local order statistics
Author(s):
Alexey Ruchay;
Vitaly Kober
Show Abstract
This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
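The detection step with local order statistics is specific to the paper and is not reproduced here; the sketch below only shows a generic vector median filtering step used to replace pixels flagged as corrupted. The window size and the noise mask are assumptions.

```python
# Vector median filtering sketch: each flagged pixel is replaced by the color vector in
# its window that minimizes the sum of distances to all other vectors in the window.
import numpy as np

def vector_median(window_pixels):
    """window_pixels: (n, 3) array of RGB vectors; return the vector median."""
    d = np.linalg.norm(window_pixels[:, None, :] - window_pixels[None, :, :], axis=2)
    return window_pixels[np.argmin(d.sum(axis=1))]

def filter_flagged_pixels(img, mask, radius=2):
    """Replace only pixels where mask is True (detected impulse clusters)."""
    out = img.copy()
    h, w, _ = img.shape
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        out[y, x] = vector_median(img[y0:y1, x0:x1].reshape(-1, 3))
    return out

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 128, dtype=float)
mask = np.zeros((64, 64), dtype=bool)
mask[30:33, 30:33] = True                                     # small cluster of corrupted pixels
img[mask] = rng.choice([0.0, 255.0], size=(int(mask.sum()), 3))   # impulse noise cluster
restored = filter_flagged_pixels(img, mask)
```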
Impulsive noise removal from color video with morphological filtering
Author(s):
Alexey Ruchay;
Vitaly Kober
Show Abstract
This paper deals with impulse noise removal from color video. The proposed noise removal algorithm employs switching filtering for the denoising of color video; that is, detection of corrupted pixels by means of a novel morphological filtering, followed by replacement of the detected pixels based on the estimation of uncorrupted pixels in the previous scenes. With the help of computer simulation we show that the proposed algorithm removes impulse noise in color video well. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
Application of white-light phase-shifting in white-light scanning interferometry
Author(s):
Yujing Wu;
Chunkan Tao;
Weiyi Wang;
Yijun Zhang;
Yunsheng Qian
Show Abstract
A method that combines scanning white-light interferometry with phase-shifting interferometry is proposed. The best-focus scanning position of the correlograms is located by calculating the maximum modulation contrast, and the twice-averaging four-frame algorithm is utilized to determine the phase difference between the best-focus position and the zero optical path difference point. The surface height is obtained from the best-focus frame position and the unwrapped phase, which is achieved by a process of removing the phase ambiguity. Both simulated and experimental results demonstrate that the advanced method achieves high precision, a large dynamic range, and insensitivity to phase-shifting deviation.
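For reference, a standard four-frame phase-shifting relation (frames taken at 90° shifts) is reproduced below; whether the paper's twice-averaging variant uses exactly this form is not stated in the abstract, so this is only the textbook starting point.

```latex
% Four intensities I_1..I_4 recorded with 90-degree phase shifts:
% I_k = I_0 \left[ 1 + \gamma \cos\!\big(\varphi + (k-1)\tfrac{\pi}{2}\big) \right]
\varphi = \arctan\!\left( \frac{I_4 - I_2}{I_1 - I_3} \right),
\qquad
h = \frac{\lambda}{4\pi}\,\varphi_{\text{unwrapped}}
```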
Edge detection for optical synthetic aperture based on deep neural network
Author(s):
Wenjie Tan;
Mei Hui;
Ming Liu;
Lingqin Kong;
Liquan Dong;
Yuejin Zhao
Show Abstract
Synthetic aperture optics systems can meet the demand for next-generation space telescopes to be lighter, larger and foldable. However, the boundaries of segmented aperture systems are much more complex than those of a monolithic aperture. More edge regions mean more imaging edge pixels, which are often mixed and discretized. In order to achieve high-resolution imaging, it is necessary to identify the gaps between the sub-apertures and the edges of the projected fringes. In this work, we introduce deep neural networks into the edge detection of optical synthetic aperture imaging. According to the detection needs, we constructed image sets from experiments and simulations. Based on MatConvNet, a MATLAB toolbox, we ran the neural network, trained it on the training image set and tested its performance on the validation set. Training was stopped when the test error on the validation set stopped declining. Given an input image, the neighborhood around each pixel is fed into the network, and the image is scanned pixel by pixel through the trained hidden layers. The network output is a judgment on whether the center of the input block lies on an edge of the fringes. We experimented with various pre-processing and post-processing techniques to reveal their influence on edge detection performance. Compared with traditional algorithms and their improvements, our method makes decisions over a much larger neighborhood and is more global and comprehensive. Experiments on more than 2,000 images are also given to prove that our method outperforms classical algorithms in edge detection on optical images.
Accurate generation of the 3D map of environment with a RGB-D camera
Author(s):
Jose A. González-Fraga;
Vitaly Kober;
Victor H. Diaz-Ramirez;
Everardo Gutierrez;
Omar Alvarez-Xochihua
Show Abstract
With the development of RGB-D sensors, a new alternative for the generation of 3D maps has appeared. First, features extracted from color and depth images are used to localize them in a 3D scene. Next, the Iterative Closest Point (ICP) algorithm is used to align RGB-D frames. As a result, a new frame is added to the dense 3D model. However, the spatial distribution and resolution of depth data affect the performance of 3D scene reconstruction systems based on ICP. In this paper we propose to divide the depth data into sub-clouds of similar resolution, to align them separately, and to unify them into the entire point cloud. The presented computer simulation results show an improvement in the accuracy of 3D scene reconstruction using real indoor environment data.
Veterinary software application for comparison of thermograms for pathology evaluation
Author(s):
Gita Pant;
Scott E. Umbaugh;
Rohini Dahal;
Norsang Lama;
Dominic J. Marino;
Joseph Sackman
Show Abstract
The bilateral symmetry property in mammals allows for the detection of pathology by comparison of opposing sides. For any pathological disorder, thermal patterns differ compared to the normal body part. A software application for veterinary clinics has been under development to input two thermograms of body parts on both sides, one normal and the other unknown, to compare them based on extracted features and appropriate similarity and difference measures, and to output the likelihood of pathology. Here thermographic image data from 19°C to 40°C was linearly remapped to create images with 256 gray level values. Features were extracted from these images, including histogram, texture and spectral features. The comparison metrics used are the vector inner product, Tanimoto, Euclidean, city block, Minkowski and maximum value metrics. Previous research with the anterior cruciate ligament (ACL) pathology in dogs suggested that any thermogram variation below a threshold of 40% of the Euclidean distance is normal and above 40% is abnormal. Here the 40% threshold was applied to a new ACL image set and achieved a sensitivity of 75%, an improvement over the 55% sensitivity of the previous work. With the new data set it was determined that using a threshold of 20% provided a much improved sensitivity of 92%. However, further research is required to determine the corresponding specificity. Additionally, it was found that the anterior view provided better results than the lateral view. It was also determined that better results were obtained with all three feature sets than with just the histogram and texture sets. Further experiments are ongoing with larger image datasets and pathologies, and with new features and comparison metrics, to determine more accurate threshold values for separating normal and abnormal images.
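The application's full feature set is not listed in the abstract; the sketch below only illustrates the comparison step, computing a relative Euclidean difference between feature vectors of the normal and unknown sides and applying a percentage threshold. The feature vectors, the normalization and the interpretation of the 20% threshold are assumptions.

```python
# Bilateral thermogram comparison sketch: compare feature vectors from the normal and
# unknown sides and flag the pair as abnormal if their relative difference is too large.
import numpy as np

def relative_euclidean_difference(f_normal, f_unknown):
    """Euclidean distance normalized by the magnitude of the normal-side features."""
    return np.linalg.norm(f_unknown - f_normal) / (np.linalg.norm(f_normal) + 1e-12)

# Placeholder feature vectors (e.g. histogram, texture and spectral features).
f_normal = np.array([36.2, 1.4, 0.62, 0.18, 5.1])
f_unknown = np.array([38.9, 2.1, 0.55, 0.31, 6.0])

threshold = 0.20          # the 20% threshold discussed in the abstract
score = relative_euclidean_difference(f_normal, f_unknown)
print("relative difference: %.1f%% -> %s" % (100 * score, "abnormal" if score > threshold else "normal"))
```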
Texture analysis integrated to infrared light sources for identifying high fringe concentrations in digital photoelasticity
Author(s):
Hermes Fandiño Toro;
Juan Carlos Briñez de León;
Alejandro Restrepo Martínez;
John W. Branch Bedoya
Show Abstract
In digital photoelasticity images, regions with high fringe densities limit phase unwrapping in specific zones of the stress map. In this work, we recognize such regions by varying the light source wavelength from the visible to the far infrared, in a simulated experiment based on a circular polariscope observing a birefringent disk under diametral compression. The recognition process involves evaluating the relevance of texture descriptors applied to data sets extracted from regions of interest of the synthetic images, in the visible electromagnetic spectrum and in different sub-bands of the infrared. Our results show that by extending photoelasticity assemblies to the far infrared, the stress fields could be resolved in regions with high fringe concentrations. Moreover, we show that texture descriptors can overcome limitations associated with the identification of high stress values in regions where the fringes are concentrated in the visible spectrum, but not in the infrared.
Application of speckle-field images processing for concrete hardening diagnostics
Author(s):
Mykhaylo P. Gorsky;
Peter P. Maksimyak
Show Abstract
This paper is devoted to the processing of speckle field image dynamics during coherent light scattering by the cement surface in the process of hydration (hardening). The experimentally obtained set of images was processed by different methods including the Fourier transform, wavelet analysis, statistical moments and deviation calculation. The results of each analysis were evaluated in order to select the best image processing approach. The deviation method was selected as the most accurate and least resource-consuming one. It allows fast and accurate optical determination of the concrete hardening stages.
Transform extension for block-based hybrid video codec with decoupling transform sizes from prediction sizes and coding sizes
Author(s):
Jing Chen;
Ge Li;
Kui Fan;
Xiaoqiang Guo
Show Abstract
In the block-based hybrid video coding framework, a transform is applied to the residual signal resulting from intra/inter prediction. Thus, in most video codecs, the transform block (TB) size is equal to the prediction block (PB) size. To further improve coding efficiency, recent video coding techniques have supported decoupling transform and prediction sizes. By splitting one prediction block into small transform blocks, the Residual Quad-tree (RQT) structure attempts to search for the best transform size. However, in the current RQT, the transform size cannot be larger than the size of the prediction block. In this paper, we introduce a transform extension method that decouples transform sizes from prediction sizes and coding sizes. In addition to obtaining the transform block within the current PB partition, we combine multiple adjacent PBs to form a larger TB and select the best block size accordingly. According to our experiments on top of the newest reference software (ITM17.0) of the MPEG Internet Video Coding (IVC) standard, consistent coding performance gains are obtained.
Robot mapping algorithm based on Kalman filtering and symbolic tags
Author(s):
A. Vokhmintcev;
T. Botova;
I. Sochenkov;
A. Sochenkova;
A. Makovetskii
Show Abstract
In the present work, a new method is developed for detecting a robot's position in a relative coordinate system based on a history of camera positions and the robot's movement, symbolic tags, and the combination of the obtained three-dimensional depth maps, accounting for the accuracy of their superimposition and the geometric relationships between various images of the same scene. It is expected that this approach will enable the development of a fast and accurate algorithm for localization in an unknown dynamic environment.
An efficient point-to-plane registration algorithm for affine transformations
Author(s):
Artyom Makovetskii;
Sergei Voronin;
Vitaly Kober;
Dmitrii Tihonkih
Show Abstract
The problem of aligning 3D point data is known as the registration task. The most popular registration algorithm is the Iterative Closest Point (ICP) algorithm. The traditional ICP algorithm is a fast and accurate approach for rigid registration between two point clouds, but it is unable to handle the affine case. Recently, an extension of the ICP algorithm for compositions of scaling, rotation, and translation was proposed. A generalized ICP version for an arbitrary affine transformation has also been suggested. In this paper, a new iterative algorithm for the registration of point clouds based on the point-to-plane ICP algorithm with affine transformations is proposed. At each iteration, a closed-form solution for the affine transformation is derived. This approach allows us to obtain a precise solution for transformations such as rotation, translation, and scaling. With the help of computer simulation, the proposed algorithm is compared with common registration algorithms.
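The paper's closed-form affine solution is not reproduced here; as background, the sketch below shows one linearized point-to-plane step for a general affine transform, solved as an ordinary least-squares problem over the 12 affine parameters. The correspondences and normals are placeholders, and no correspondence search or iteration is included.

```python
# One point-to-plane least-squares step for an affine transform x -> A x + t:
# minimize sum_i ( n_i . (A p_i + t - q_i) )^2 over the 12 parameters of (A, t).
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(200, 3))                   # source points
A_true = np.eye(3) + 0.05 * rng.normal(size=(3, 3))
t_true = np.array([0.2, -0.1, 0.3])
Q = P @ A_true.T + t_true                       # corresponding target points
N = rng.normal(size=(200, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)   # placeholder unit normals at Q

# Build the linear system: each correspondence contributes one scalar equation.
rows, rhs = [], []
for p, q, n in zip(P, Q, N):
    # coefficient of A_ij is n_i * p_j, coefficient of t_i is n_i, right-hand side is n.q
    rows.append(np.concatenate([np.outer(n, p).ravel(), n]))
    rhs.append(n @ q)
M = np.array(rows)                              # (200, 12)
b = np.array(rhs)

params, *_ = np.linalg.lstsq(M, b, rcond=None)
A_est = params[:9].reshape(3, 3)
t_est = params[9:]
print("max |A_est - A_true|:", np.abs(A_est - A_true).max())
```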
A generalized Condat's algorithm of 1D total variation regularization
Author(s):
Artyom Makovetskii;
Sergei Voronin;
Vitaly Kober
Show Abstract
A common way of solving the denoising problem is to utilize total variation (TV) regularization. Many efficient numerical algorithms have been developed for solving the TV regularization problem. Condat described a fast direct algorithm to compute the processed 1D signal. There also exists a direct algorithm with linear time complexity for 1D TV denoising, referred to as the taut string algorithm. Condat's algorithm is based on a problem dual to the 1D TV regularization. In this paper, we propose a variant of Condat's algorithm based on the direct 1D TV regularization problem. Using Condat's algorithm together with the taut string approach leads to a clear geometric description of the extremal function. Computer simulation results are provided to illustrate the performance of the proposed algorithm for the restoration of degraded signals.
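For context, the 1D TV denoising problem that both Condat's algorithm and the taut string algorithm solve exactly can be written as follows; this is the standard formulation, not the paper's new variant.

```latex
\hat{x} = \arg\min_{x \in \mathbb{R}^N}
\; \frac{1}{2} \sum_{n=1}^{N} \left( y_n - x_n \right)^2
\; + \; \lambda \sum_{n=1}^{N-1} \left| x_{n+1} - x_n \right| ,
```

where y is the noisy 1D signal and the parameter λ > 0 controls the smoothing strength.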
Convolutional neural networks and face recognition task
Author(s):
A. Sochenkova;
I. Sochenkov;
A. Makovetskii;
A. Vokhmintsev;
A. Melnikov
Show Abstract
Computer vision tasks have remained very important over the last few years. One of the most complicated problems in computer vision is face recognition, which can be used in security systems to provide safety and to identify a person among others. There is a variety of different approaches to solving this task, but there is still no universal solution that gives adequate results in all cases. The current paper presents the following approach. Firstly, we extract the area containing the face; then we apply the Canny edge detector. At the next stage we use convolutional neural networks (CNN) to finally solve the face recognition and person identification task.
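The CNN architecture is not described in the abstract; the sketch below only shows the preprocessing stages it mentions (face cropping with a standard OpenCV cascade and Canny edge extraction). The cascade file, image path, input size and thresholds are assumptions.

```python
# Preprocessing sketch: crop the face region with a Haar cascade, then compute a Canny
# edge map that would be fed to the recognition CNN (file paths are placeholders).
import cv2

img = cv2.imread("face_sample.jpg")                       # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = gray[y:y + h, x:x + w]
    face = cv2.resize(face, (128, 128))                   # fixed CNN input size (assumed)
    edges = cv2.Canny(face, threshold1=50, threshold2=150)
    cv2.imwrite("face_edges.png", edges)                  # edge map passed on to the CNN
```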
System of multifunctional Jones matrix tomography of phase anisotropy in diagnostics of endometriosis
Author(s):
V. O. Ushenko;
G. D. Koval;
Yu. O. Ushenko;
L. Y. Pidkamin;
M. I. Sidor;
O. Vanchuliak;
A. V. Motrich;
M. P. Gorsky;
I. Meglinskiy
Show Abstract
The paper presents the results of Jones-matrix mapping of uterine wall histological sections with second-degree and third-degree endometriosis. A technique for the experimental measurement of the coordinate distributions of the modulus and phase values of Jones matrix elements is suggested. Within the statistical and cross-correlation approaches, the modulus and phase maps of Jones matrix images of optically thin biological layers of polycrystalline films of plasma and cerebrospinal fluid are analyzed. A set of objective parameters (statistical and generalized correlation moments), which are the most sensitive to changes in phase anisotropy associated with the features of the polycrystalline structure of uterine wall histological sections with second-degree and third-degree endometriosis, is determined.
Azimuthally invariant Mueller-matrix mapping of biological optically anisotropic network
Author(s):
Yu. O. Ushenko;
O. Vanchuliak;
G. B. Bodnar;
V. O. Ushenko;
M. Grytsyuk;
N. Pavlyukovich;
O. V. Pavlyukovich;
O. Antonyuk
Show Abstract
A new technique of Mueller-matrix mapping of the polycrystalline structure of histological sections of biological tissues is suggested. Algorithms for reconstructing the distributions of the parameters of linear and circular dichroism of histological sections of liver tissue of mice with different degrees of severity of diabetes are developed. The interconnections between such distributions and the parameters of linear and circular dichroism of the liver tissue histological sections are defined. Comparative investigations of the coordinate distributions of the amplitude anisotropy parameters formed by liver tissue with varying severity of diabetes (10 days and 24 days) are performed. The values and ranges of change of the statistical parameters (moments of the 1st to 4th order) of the coordinate distributions of the values of linear and circular dichroism are defined. Objective criteria for differentiating the degree of severity of diabetes are determined.
Polarization-interference mapping of biological fluids polycrystalline films in differentiation of weak changes of optical anisotropy
Author(s):
V. O. Ushenko;
O. Vanchuliak;
M. Yu. Sakhnovskiy;
O. V. Dubolazov;
P. Grygoryshyn;
I. V. Soltys;
O. V. Olar;
A. Antoniv
Show Abstract
The theoretical background of the azimuthally stable method of polarization-interference mapping of the histological sections of the biopsy of the prostate tissue on the basis of the spatial frequency selection of the mechanisms of linear and circular birefringence is presented. The diagnostic application of a new correlation parameter – complex degree of mutual anisotropy – is analytically substantiated. The method of measuring coordinate distributions of complex degree of mutual anisotropy with further spatial filtration of their high- and low-frequency components is developed. The interconnections of such distributions with parameters of linear and circular birefringence of prostate tissue histological sections are found. The objective criteria of differentiation of benign and malignant conditions of prostate tissue are determined.
Methods and means of 3D diffuse Mueller-matrix tomography of depolarizing optically anisotropic biological layers
Author(s):
O. V. Dubolazov;
V. O. Ushenko;
L. Trifoniuk;
Yu. O. Ushenko;
V. G. Zhytaryuk;
O. G. Prydiy;
M. Grytsyuk;
L. Kushnerik;
I. Meglinskiy
Show Abstract
A new technique of Mueller-matrix mapping of the polycrystalline structure of histological sections of biological tissues is suggested. Algorithms for reconstructing the distributions of the parameters of linear and circular birefringence of prostate histological sections are developed. The interconnections between such distributions and the parameters of linear and circular birefringence of prostate tissue histological sections are defined. Comparative investigations of the coordinate distributions of the phase anisotropy parameters formed by fibrillar networks of prostate tissues in different pathological states (adenoma and carcinoma) are performed. The values and ranges of change of the statistical parameters (moments of the 1st to 4th order) of the coordinate distributions of the values of linear and circular birefringence are defined. Objective criteria for differentiating benign and malignant conditions are determined.
Feature recognition of metal salt spray corrosion based on color spaces statistics analysis
Author(s):
Zhi Zou;
Liqun Ma;
Qiuqin Fan;
Xiaochuan Gan;
Lei Qiao
Show Abstract
This article proposes a method to quantify the corrosion characteristics of high-strength alloy steel samples using digital image processing techniques in multiple color spaces. The distribution histograms of the different channels of different color spaces of the corrosion images are plotted and analyzed, and the proper color channel for extracting the corrosion characteristics is selected among the RGB, HSV and YCbCr spaces. Combining the theory of corrosion formation, the data of the selected color channels are processed and the features of metal salt spray corrosion are recognized. Processing several sample color images of alloy steel shows that the features extracted by this procedure are more accurate, that the corrosion degree is quantifiable, and that the precision of discriminating the corrosion is improved.
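The channel-selection criterion is not given in the abstract; the sketch below only shows the first step it describes, converting a corrosion image into the three color spaces and computing per-channel histograms with OpenCV. The file name and the simple spread statistic used for comparison are assumptions.

```python
# Compute per-channel histograms of a corrosion image in RGB, HSV and YCbCr spaces,
# as a starting point for choosing the channel that best separates corroded regions.
import cv2
import numpy as np

img_bgr = cv2.imread("corrosion_sample.png")              # placeholder image path
spaces = {
    "RGB": cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB),
    "HSV": cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV),
    "YCbCr": cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb),  # OpenCV orders it as Y, Cr, Cb
}

histograms = {}
for name, converted in spaces.items():
    for c in range(3):
        hist = cv2.calcHist([converted], [c], None, [256], [0, 256]).ravel()
        histograms[(name, c)] = hist / hist.sum()         # normalized channel histogram

# Example of a simple comparison: spread (standard deviation) of each channel histogram.
values = np.arange(256)
for (name, c), hist in histograms.items():
    mean = (hist * values).sum()
    std = np.sqrt((hist * (values - mean) ** 2).sum())
    print(f"{name} channel {c}: std = {std:.1f}")
```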
Resolution analysis of archive films for the purpose of their optimal digitization and distribution
Author(s):
Karel Fliegel;
Stanislav Vítek;
Petr Páta;
Jiří Myslík;
Josef Pecák;
Marek Jícha
Show Abstract
With recent high demand for ultra-high-definition (UHD) content to be screened in high-end digital movie theaters but also in the home environment, film archives full of movies in high-definition and above are in the scope of UHD content providers. Movies captured with the traditional film technology represent a virtually unlimited source of UHD content. The goal to maintain complete image information is also related to the choice of scanning resolution and spatial resolution for further distribution. It might seem that scanning the film material in the highest possible resolution using state-of-the-art film scanners and also its distribution in this resolution is the right choice. The information content of the digitized images is however limited, and various degradations moreover lead to its further reduction. Digital distribution of the content in the highest image resolution might be therefore unnecessary or uneconomical. In other cases, the highest possible resolution is inevitable if we want to preserve fine scene details or film grain structure for archiving purposes. This paper deals with the image detail content analysis of archive film records. The resolution limit in captured scene image and factors which lower the final resolution are discussed. Methods are proposed to determine the spatial details of the film picture based on the analysis of its digitized image data. These procedures allow determining recommendations for optimal distribution of digitized video content intended for various display devices with lower resolutions. Obtained results are illustrated on spatial downsampling use case scenario, and performance evaluation of the proposed techniques is presented.
An efficient direct method for image registration of flat objects
Author(s):
Dmitry Nikolaev;
Dmitrii Tihonkih;
Artyom Makovetskii;
Sergei Voronin
Show Abstract
Image alignment of rigid surfaces is a rapidly developing area of research and has many practical applications. Alignment methods can be roughly divided into two types: feature-based methods and direct methods. The well-known SURF and SIFT algorithms are examples of feature-based methods. Direct methods are those that exploit the pixel intensities without resorting to image features, and image-based deformations are a general direct method to align images of deformable objects in 3D space. Nevertheless, this is not well suited to the registration of images of 3D rigid objects, since the underlying structure cannot be directly evaluated. In this article, we propose a model that is suitable for image alignment of rigid flat objects under various illumination models. The brightness consistency assumption is used for the reconstruction of the optimal geometrical transformation. Computer simulation results are provided to illustrate the performance of the proposed algorithm for computing the correspondence between pixels of two images.
Smoothing of astronomical images with Poisson distribution
Author(s):
Zuzana Krbcová;
Jaromír Kukal;
Jan Švihlík;
Karel Fliegel
Show Abstract
Images obtained from an astronomical digital camera are of integer nature, as event counters in every pixel of the image sensor. The quality of the captured images is influenced mostly by the camera characteristics and by photon noise caused by the natural random fluctuation of the observed light. We model the image pixel intensity as the mean value of a signal with Poisson distribution. The application of the maximum likelihood method with image gradient regularization leads to a variational task in discrete formulation. This variational task has a unique solution, on which the novel numerical method of image smoothing is based. The performance of the proposed smoothing procedure is tested using real images obtained from the digital camera in the Meteor Automatic Imager and Analyzer (MAIA).
A modified iterative closest point algorithm for noisy data
Author(s):
Dmitrii Tihonkih;
Artyom Makovetskii;
Aleksei Voronin
Show Abstract
The problem of aligning 3D point data is known as the registration task. The most popular registration algorithm is the Iterative Closest Point (ICP) algorithm. The traditional ICP algorithm is a fast and accurate approach for rigid registration between two point clouds, but it is unable to handle the affine case. Recently, an extension of the ICP algorithm for compositions of scaling, rotation, and translation was proposed. A generalized ICP version for an arbitrary affine transformation has also been suggested. In this paper, a new iterative algorithm for the registration of point clouds based on the point-to-plane ICP algorithm with affine transformations is proposed. At each iteration, a closed-form solution for the affine transformation is derived. This approach allows us to obtain a precise solution for transformations such as rotation, translation, and scaling. With the help of computer simulation, the proposed algorithm is compared with common registration algorithms.
Estimation of Poisson noise in spatial domain
Author(s):
Jan Švihlík;
Karel Fliegel;
Stanislav Vítek;
Jaromír Kukal;
Zuzana Krbcová
Show Abstract
This paper deals with the modeling of astronomical images in the spatial domain. We consider astronomical light images contaminated by the dark current, which is modeled by a Poisson random process. The dark frame image maps the thermally generated charge of the CCD sensor. In this paper, we solve the problem of the addition of two Poisson random variables. First, a noise analysis of images obtained from the astronomical camera is performed. It allows estimating the parameters of the Poisson probability mass functions in every pixel of the acquired dark frame. Then the resulting distributions of the light image can be found. Once the distributions of the light image pixels are identified, the denoising algorithm can be applied. The performance of the Bayesian approach in the spatial domain is compared with the direct approach based on the method of moments and dark frame subtraction.
The relationship between the retinal image quality and the refractive index of defects arising in IOL: numerical analysis
Author(s):
Malwina Geniusz
Show Abstract
The best treatment for cataract patients, which allows clear vision to be restored, is implanting an artificial intraocular lens (IOL). The image quality of the lens has a significant impact on the quality of the patient's vision. After long exposure of the implant to the aqueous environment, some defects appear in the artificial lens. The defects generated in the IOL have different refractive indices. For example, the glistening phenomenon is based on light scattering on oval microvacuoles filled with aqueous humor, whose refractive index is about 1.34. Calcium deposits are another example of lens defects and can be characterized by a refractive index of 1.63. In the presented studies it was calculated how the difference between the refractive index of the defect and the refractive index of the lens material affects the image quality. The OpticStudio Professional program (from Radiant Zemax, LLC) was used for the construction of a numerical model of the eye with an IOL and to calculate the characteristics of the retinal image. Retinal image quality was described by characteristics such as the Point Spread Function (PSF) and the Optical Transfer Function with amplitude and phase. The results show a strong correlation between the refractive index difference and retinal image quality.
Analysis of image reconstruction artifacts in structured illumination microscopy
Author(s):
Jakub Pospíšil;
Karel Fliegel;
Miloš Klíma
Show Abstract
Structured Illumination Microscopy (SIM) is a super-resolution technique which makes it possible to enhance the resolution of optical microscopes beyond the diffraction limit. The final super-resolution image quality strongly depends on the performance of the SIM image reconstruction. Standard SIM methods require precise knowledge of the illumination pattern and assume the sample to be stationary during the acquisition of the illumination-patterned images. In the case of imaging live cells, the movements of the cell result in the occurrence of image reconstruction artifacts. To reduce this kind of artifact, a short acquisition time is needed. However, short exposure times cause a low signal-to-noise ratio (SNR). Moreover, a drift of the specimen may distort the illumination pattern properties in each image. This issue, together with the low SNR, makes the estimation of reconstruction parameters a challenging task. Inaccurate assessment of the spatial frequency, phase shift or orientation of the illumination pattern leads to incorrect separation and shifting of spectral components in Fourier space. This results in unwanted image reconstruction artifacts and hampers the resolution enhancement in practice. In this paper, we analyze possible artifacts in super-resolution images reconstructed using the super-resolution SIM technique (SR-SIM). An overview of typical image reconstruction artifact types is presented. Distinguishing image artifacts from newly resolved sample features is essential for future SIM applications in cell biology.
FPGA implementation of image dehazing algorithm for real time applications
Author(s):
Rahul Kumar;
Brajesh Kumar Kaushik;
R. Balasubramanian
Show Abstract
Weather degradation such as haze, fog and mist severely reduces the effective range of visual surveillance. This degradation is a spatially varying phenomenon, which makes the problem non-trivial. Dehazing is an essential preprocessing stage in applications such as long-range imaging, border security and intelligent transportation systems. However, these applications require low latency of the preprocessing block. In this work, the single-image dark channel prior algorithm is modified and implemented for fast processing with comparable visual quality of the restored image/video. Although the conventional single-image dark channel prior algorithm is computationally expensive, it yields impressive results. Moreover, a two-stage image dehazing architecture is introduced, wherein the dark channel and the airlight are estimated in the first stage, while the transmission map and the intensity restoration are computed in the second stage. The algorithm is implemented using the Xilinx Vivado software and validated on a Xilinx ZC702 development board, which contains an Artix-7 equivalent Field Programmable Gate Array (FPGA) and an ARM Cortex-A9 dual-core processor. Additionally, a high-definition multimedia interface (HDMI) has been incorporated for video feed and display purposes. The results show that the dehazing algorithm attains 29 frames per second at an image resolution of 1920x1080, which is suitable for real-time applications. The design utilizes 9 18K BRAMs, 97 DSP48 slices, 6508 flip-flops and 8159 LUTs.
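For reference, a minimal software sketch of the conventional dark channel prior that the hardware pipeline builds on (not the authors' modified FPGA implementation); the patch size, omega and t0 values are common defaults assumed here, and the two stages mirror the architecture described above.

    # Hedged sketch of the conventional single-image dark channel prior (He et al. style),
    # not the modified FPGA pipeline; parameter values are assumed defaults.
    import numpy as np
    from scipy.ndimage import minimum_filter

    def dehaze_dark_channel(img: np.ndarray, patch: int = 15,
                            omega: float = 0.95, t0: float = 0.1) -> np.ndarray:
        """img: HxWx3 float array in [0, 1]. Returns the dehazed image."""
        # Stage 1: dark channel and airlight estimation.
        dark = minimum_filter(img.min(axis=2), size=patch)
        flat = dark.ravel()
        idx = np.argsort(flat)[-max(1, flat.size // 1000):]       # brightest 0.1% of dark channel
        airlight = np.maximum(img.reshape(-1, 3)[idx].max(axis=0), 1e-6)
        # Stage 2: transmission map and intensity restoration.
        norm_dark = minimum_filter((img / airlight).min(axis=2), size=patch)
        t = np.clip(1.0 - omega * norm_dark, t0, 1.0)
        return np.clip((img - airlight) / t[..., None] + airlight, 0.0, 1.0)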
Radiometric calibration of wide-field camera system with an application in astronomy
Author(s):
Stanislav Vítek;
Maria Nasyrova;
Veronika Stehlíková
Show Abstract
The camera response function (CRF) is widely used to describe the relationship between scene radiance and image brightness. The most common application of the CRF is High Dynamic Range (HDR) reconstruction of the radiance maps of imaged scenes from sets of frames with different exposures. The main goal of this work is to provide an overview of CRF estimation algorithms and to compare their outputs with results obtained under laboratory conditions. These algorithms, typically designed for multimedia content, are unfortunately of little use with astronomical image data, mostly due to the nature of such data (blur, noise, and long exposures). Therefore, we propose an optimization of selected methods for use in an astronomical imaging application. The results are experimentally verified on a wide-field camera system using a Digital Single Lens Reflex (DSLR) camera.
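For context only, a minimal sketch of the Debevec-style radiance-map merge that an estimated CRF is typically used for, assuming the inverse response (as a log-domain lookup table g) is already known; it does not reflect the laboratory calibration or the astronomy-specific optimizations described above.

    # Hedged sketch: merging differently exposed 8-bit frames into a log-radiance map,
    # assuming the log inverse camera response g is already estimated (length-256 table).
    import numpy as np

    def merge_radiance(frames: list[np.ndarray], exposures: list[float],
                       g: np.ndarray) -> np.ndarray:
        """frames: 8-bit grayscale images; exposures: times in seconds;
        g: lookup table mapping pixel value -> log exposure."""
        num = np.zeros(frames[0].shape, dtype=np.float64)
        den = np.zeros_like(num)
        for z, dt in zip(frames, exposures):
            w = np.minimum(z, 255 - z).astype(np.float64)  # hat weighting, de-emphasize extremes
            num += w * (g[z] - np.log(dt))
            den += w
        return num / np.maximum(den, 1e-6)                  # per-pixel log radiance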
Robust parameterization of time-frequency characteristics for recognition of musical genres of Mexican culture
Author(s):
Osvaldo G. Pérez Rosas;
José L. Rivera Martínez;
Luis A. Maldonado Cano;
Mario López Rodríguez;
Laura M. Amaya Reyes;
Elizabeth Cano Martínez;
Mireya S. García Vázquez;
Alejandro A. Ramírez Acosta
Show Abstract
The automatic identification and classification of musical genres based on sound similarities that form musical textures is a very active research area. In this context, recognition systems for musical genres have been created, composed of time-frequency feature extraction methods and classification methods. The selection of these methods is important for a well-performing recognition system. In this article, Mel-Frequency Cepstral Coefficients (MFCC) are proposed as the feature extractor and Support Vector Machines (SVM) as the classifier of our system. The MFCC parameters established in the system through our time-frequency analysis represent the range of musical genres of Mexican culture considered in this article. For a musical genre classification system to be precise, the descriptors must represent the correct spectrum of each genre; to achieve this, a correct parameterization of the MFCC, such as the one presented in this article, must be carried out. With the developed system we obtain satisfactory detection results, where the lowest identification rate among the musical genres was 66.67% and the highest was 100%.
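As a generic illustration of the MFCC-plus-SVM pipeline named above (library choices, coefficient count, kernel, and the placeholder file names and labels are assumptions, not the parameterization derived in the paper):

    # Hedged sketch: MFCC features + SVM classifier for genre labels.
    # librosa/scikit-learn and all parameter values are illustrative choices, not the paper's.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
        y, sr = librosa.load(path, sr=22050)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # Summarize the time axis with per-coefficient mean and standard deviation.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    train_paths = ["example_genre_a_01.wav", "example_genre_b_01.wav"]  # placeholder recordings
    train_labels = ["genre_a", "genre_b"]                               # placeholder labels
    X = np.stack([mfcc_features(p) for p in train_paths])
    clf = SVC(kernel="rbf", C=10.0).fit(X, train_labels)
    prediction = clf.predict(mfcc_features("unknown_song.wav").reshape(1, -1))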
Text detection in natural scenes with phase congruency approach
Author(s):
Julia Diaz-Escobar;
Vitaly Kober
Show Abstract
In recent years, the importance of text detection in imagery has been increasing due to the large number of applications developed for mobile devices. Text detection becomes complicated when backgrounds are complex or capture conditions are not controlled. In this work, a method for text detection in natural scenes is proposed. The method is based on the phase congruency approach, obtained via the scale-space monogenic signal framework. The proposed method is robust to geometrical distortions, resolution changes, illumination, and noise degradation. Finally, experimental results are presented on a natural scene dataset.
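For illustration, a simplified multi-scale phase congruency measure built from the monogenic signal is sketched below; it uses a plain difference-of-Gaussians band-pass per scale and the ratio of coherent energy to total amplitude, which is a simplification of, not a substitute for, the scale-space formulation used in the paper. The scale values are assumptions.

    # Hedged sketch: simplified phase congruency from the monogenic (Riesz) signal.
    import numpy as np

    def phase_congruency(img: np.ndarray, scales=(2.0, 4.0, 8.0), eps: float = 1e-6):
        rows, cols = img.shape
        u = np.fft.fftfreq(cols)[None, :]
        v = np.fft.fftfreq(rows)[:, None]
        radius = np.sqrt(u ** 2 + v ** 2)
        radius[0, 0] = 1.0                                   # avoid division by zero at DC
        riesz1, riesz2 = -1j * u / radius, -1j * v / radius  # Riesz transform kernels
        F = np.fft.fft2(img)
        sum_f = np.zeros((rows, cols))
        sum_r1 = np.zeros_like(sum_f)
        sum_r2 = np.zeros_like(sum_f)
        sum_amp = np.zeros_like(sum_f)
        for s in scales:
            # Band-pass filter per scale (difference of Gaussians in the frequency domain).
            bp = np.exp(-(radius * s) ** 2) - np.exp(-(radius * 2.0 * s) ** 2)
            f = np.real(np.fft.ifft2(F * bp))
            r1 = np.real(np.fft.ifft2(F * bp * riesz1))
            r2 = np.real(np.fft.ifft2(F * bp * riesz2))
            sum_f += f
            sum_r1 += r1
            sum_r2 += r2
            sum_amp += np.sqrt(f ** 2 + r1 ** 2 + r2 ** 2)   # local amplitude per scale
        # Coherent energy over total amplitude: near 1 where the local phase agrees
        # across scales (edges, line features, text strokes).
        energy = np.sqrt(sum_f ** 2 + sum_r1 ** 2 + sum_r2 ** 2)
        return energy / (sum_amp + eps)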
Enhancing user experience by using multi-sensor data fusion to predict phone’s luminance
Author(s):
Asmaa H. Marhoubi
Show Abstract
The movement of a phone through environments with different brightness makes luminance prediction challenging. The ambient light sensor (ALS) takes time to modify the brightness of the screen based on the environment it is placed in. This causes an unsatisfactory user experience and delays in the adjustment of the screen brightness. In this research, a method is proposed for enhancing the prediction of luminance using an accelerometer, a gyroscope and a speed measurement technique. The speed of the phone is identified using Sum-of-Sine parameters. The lux values are then fused with the accelerometer and gyroscope data to provide more accurate luminance values for the ALS based on the movement of the phone. An investigation is carried out during the movement of the user in a standard lighting environment. This enhances the user experience and improves the screen brightness precision. The prediction accuracy reaches an R-squared value of up to 0.97.
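As a loose illustration of the data-fusion and evaluation step only (a generic least-squares fit on synthetic placeholder data, not the Sum-of-Sine speed model or the fusion scheme developed in the paper):

    # Hedged sketch: predicting a target luminance from fused motion/light features with
    # ordinary least squares; all features and data are synthetic placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    accel = rng.standard_normal((n, 3))        # accelerometer x, y, z (placeholder)
    gyro = rng.standard_normal((n, 3))         # gyroscope x, y, z (placeholder)
    lux = rng.uniform(0, 1000, size=(n, 1))    # raw ambient light sensor readings (placeholder)
    X = np.hstack([accel, gyro, lux, np.ones((n, 1))])               # fused features + bias
    y = 0.8 * lux[:, 0] + 5.0 * accel[:, 0] + rng.normal(0, 10, n)   # synthetic target

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    print("R-squared:", 1.0 - ss_res / ss_tot)  # goodness-of-fit metric, as reported above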
Global stereo matching algorithm based on disparity range estimation
Author(s):
Jing Li;
Hong Zhao;
Feifei Gu
Show Abstract
Global stereo matching algorithms achieve high accuracy in the estimation of the disparity map, but the time consumed by the optimization process remains a severe bottleneck, especially for image pairs with high resolution and a large baseline. To improve the computational efficiency of global algorithms, a disparity range estimation scheme for global stereo matching is proposed in this paper to estimate the disparity map of rectified stereo images. The projective geometry of a parallel binocular stereo setup is investigated to reveal a relationship between the disparities at each pixel of rectified stereo images captured with different baselines, which can be used to quickly predict the disparity map under a long baseline from the one estimated under a short baseline. The drastically reduced disparity range at each pixel under the long baseline setting can then be determined from the predicted disparity map. Furthermore, the disparity range estimation scheme is introduced into graph cuts with expansion moves to estimate the precise disparity map, which greatly reduces the computational cost without loss of accuracy in the stereo matching, especially for dense global stereo matching, compared to the traditional algorithm. Experimental results on the Middlebury stereo datasets are presented to demonstrate the validity and efficiency of the proposed algorithm.
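The baseline relationship exploited here follows from d = f*B/Z for rectified parallel cameras, so the long-baseline disparity at a pixel is the short-baseline disparity scaled by the baseline ratio. The sketch below shows how such a prediction could bound a per-pixel search range; the margin, array sizes and baseline values are assumptions, not the authors' settings.

    # Hedged sketch: predict long-baseline disparities from a short-baseline disparity map
    # and derive narrowed per-pixel search ranges; the +/- margin is an assumed parameter.
    import numpy as np

    def predicted_ranges(d_short: np.ndarray, b_short: float, b_long: float,
                         margin: int = 4, d_max: int = 256):
        """d_short: disparity map (pixels) estimated with the short baseline."""
        d_pred = d_short * (b_long / b_short)       # d = f*B/Z  =>  d_long = d_short * B_long/B_short
        lo = np.clip(np.floor(d_pred) - margin, 0, d_max).astype(int)
        hi = np.clip(np.ceil(d_pred) + margin, 0, d_max).astype(int)
        return lo, hi                               # per-pixel bounds fed to the global optimizer

    d_short = np.full((480, 640), 12.0)             # placeholder short-baseline disparity map
    lo, hi = predicted_ranges(d_short, b_short=60.0, b_long=180.0)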
The error analysis of the handheld target in Target-based Vision Measurement System (T-VMS)
Author(s):
Yueyang Ma;
Hong Zhao;
Feifei Gu;
Meiqi Fang;
Hehui Geng;
Kejia Li
Show Abstract
The handheld target greatly expands the application fields of vision measurement systems. However, it introduces extraction errors and position errors, which degrade the positioning precision of these systems. In order to evaluate the influence of handheld targets on the accuracy of the T-VMS, we first analyzed the positioning principle of the vision measurement system and established a precision model for two typical structures of the T-VMS. We then studied the extraction errors and position errors introduced by the handheld targets and quantified them. Finally, we discussed the influence of these errors on positioning in 3D space using the system precision model. We applied the precision model to an actual T-VMS to confirm its feasibility and effectiveness, and found that it indeed estimates the errors introduced by the handheld targets effectively.