Proceedings Volume 6811

Real-Time Image Processing 2008


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 27 February 2008
Contents: 8 Sessions, 32 Papers, 0 Presentations
Conference: Electronic Imaging 2008
Volume Number: 6811

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 6811
  • Algorithms I
  • Video Processing and Surveillance
  • Video Compression
  • FPGA and Hardware I
  • FPGA and Hardware II
  • Algorithms II
  • Interactive Paper and Symposium Demonstration
Front Matter: Volume 6811
Front Matter: Volume 6811
This PDF file contains the front matter associated with SPIE Proceedings Volume 6811, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Algorithms I
Improving the SNR during color image processing while preserving the appearance of clipped pixels
An image processing path typically involves color correction or white balance, resulting in color gains higher than unity. A gain higher than unity increases the noise in the respective channel and therefore degrades the SNR performance of the input signal. If the input signal does not have enough SNR to accommodate the extra gain, the resulting color image has increased color noise. This is the usual case for color processing in cell phone cameras, which have sensors with limited SNR and high color crosstalk. This phenomenon degrades images more as illuminants differ from D65. In addition, the incomplete information for clipped pixels often results in unsightly artifacts during color processing. To correct this dual problem, we investigate the use of under-unity color gains, which, by increasing the exposure of the sensor, improve the resulting SNR of the color-corrected image. The proposed method preserves the appearance of clipped pixels and the overall luminance of the image while applying the appropriate color gains.
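A minimal sketch of the gain-rebalancing idea: scale all white-balance gains so the largest becomes unity and fold the removed factor into extra sensor exposure instead. The helper name and interface are hypothetical, not the authors' exact formulation.

```python
def under_unity_gains(wb_gains):
    """Rescale white-balance gains so none exceeds unity; the factor
    removed from the gains is returned as an exposure boost to be
    applied at the sensor instead (illustrative sketch)."""
    g_max = max(wb_gains)
    exposure_boost = g_max          # multiply exposure time by this
    gains = [g / g_max for g in wb_gains]
    return gains, exposure_boost

# Example: R/G/B gains of 1.8/1.0/1.4 become 1.0/0.56/0.78 with an
# exposure boost of 1.8, so no channel's noise is amplified digitally.
gains, boost = under_unity_gains([1.8, 1.0, 1.4])
```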
Optimization model for memory bandwidth usage in x-ray image enhancement
In cardiovascular minimally invasive interventions, physicians require low-latency X-ray imaging applications, as their actions must be directly visible on the screen. The image-processing system should enable the simultaneous execution of a plurality of functions. Because dedicated hardware lacks flexibility, there is a growing interest in using off-the-shelf computer technology. Because memory bandwidth is a scarce resource, we focus on optimization methods for bandwidth reduction within multiprocessor systems at the chip level. We create a practical, realistic model of the required compute and memory bandwidth for a given set of image-processing functions. Similar modeling is applied to the available system resources. We concentrate in particular on X-ray image processing based on multi-resolution decomposition, noise reduction, and image-enhancement techniques. We derive formulas with which we can optimize the mapping of the application onto processors, cache, and memory for different configurations. The data-block granularity is matched to the memory hierarchy, so that caching is optimized for low latency. More specifically, we exploit the locality of the signal-processing functions to streamline the memory communication. A substantial performance improvement is realized by a new memory-communication model that incorporates the data dependencies of the image-processing functions. Results show a memory-bandwidth reduction on the order of 60% and a latency reduction on the order of 30-60% compared to straightforward implementations.
Chaos-based image encryption scheme using Galois field for fast and secure transmission
Chaos-based image encryption techniques are very useful for protecting the contents of digital images and videos. They use traditional block cipher principles known as chaotic confusion, pixel diffusion, and number of rounds. The complex structure of traditional block ciphers makes them unsuitable for real-time encryption of digital images and videos. Real-time applications require fast algorithms with acceptable security strength. This paper presents a simple chaos-based image encryption scheme using cryptographic operations over a Galois field of order 2^n with combinations of a master key, a session key, and a key image. A discretized 2D chaotic map and pseudo-random noise are also used to thwart statistical and differential attacks. The proposed approach, in contrast to traditional chaos-based schemes, generates a chaotic map from the key image and then uses this chaotic image to destroy the statistical and perceptual properties of the image to be encrypted using the Galois field operations. The simulation tests use real and synthetic images (a gradient image as the key image) to demonstrate the performance of the proposed approach. The results show that the proposed approach is fast enough for real-time applications and provides acceptable security strength.
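The GF(2^n) arithmetic such schemes rely on can be sketched with a standard shift-and-xor carry-less multiply. The paper does not state its field order or reduction polynomial; the AES polynomial 0x11B over GF(2^8) is used here purely as a well-known example.

```python
def gf_mul(a, b, poly=0x11B, n=8):
    """Multiply two elements of GF(2^n) with reduction polynomial
    `poly` (0x11B is the AES polynomial; the paper's actual field
    parameters are not specified, so these are assumptions)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) current shifted multiplicand
        b >>= 1
        a <<= 1
        if a & (1 << n):    # reduce when degree reaches n
            a ^= poly
    return r
```

Each pixel operation in such a scheme is then a constant number of XORs and shifts, which is what makes GF-based mixing attractive for real-time encryption.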
Video Processing and Surveillance
Real-time turbulent video super-resolution using MPEG-4
It has been shown that one can make use of local instabilities in turbulent video frames to enhance image resolution beyond the limit defined by the image sampling rate. This paper outlines a real-time solution for the implementation of a super-resolution algorithm on MPEG-4 platforms. The MPEG-4 video compression standard offers, in real time, several features, such as motion extraction with quarter-pixel accuracy, scene segmentation into video object planes, global motion compensation, and de-blocking and de-ringing filters, which can be incorporated into the super-resolution process to produce enhanced visual output. Experimental verification on real-life videos is also provided.
Fast multi-class distance transforms for video surveillance
A distance transformation (DT) takes a binary image as input and generates a distance map image in which the value of each pixel is its distance to a given set of object pixels in the binary image. In this research, DTs for multi-class data (MCDTs) are developed which generate both a distance map and a class map containing, for each pixel, the class of the closest object. Results indicate that the MCDT based on the Fast Exact Euclidean Distance (FEED) method is a factor 2 to 4 faster than MCDTs based on exact or semi-exact Euclidean distance (ED) transformations, and is only a factor 2 to 4 slower than the MCDT based on the crude city-block approximation of the ED. In the second part of this research, the MCDTs were adapted so that they could be used for the fast generation of distance and class maps for video sequences. The frames of the sequences contain a number of fixed objects and a moving object, where each object has a separate label. Results show that the FEED-based version is a factor 2 to 3.5 faster than the fastest of all the other video MCDTs, which is based on the chamfer 3,4 distance measure. FEED is even a factor 3.5 to 10 faster than another fast exact ED transformation. With video multi-class FEED, it will be possible to measure distances from a moving object to various identified stationary objects at nearly the frame rate of a webcam. This will be very useful when the risk exists that objects move outside surveillance limits.
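A multi-class DT can be illustrated with the crude city-block baseline the paper compares against: a classic two-pass sweep that propagates both the distance and the label of the nearest object. This is a sketch of the baseline only, not of FEED itself.

```python
def multiclass_city_block(labels):
    """Two-pass city-block (L1) distance transform over a 2-D grid.
    labels[y][x] > 0 marks an object pixel of that class, 0 is
    background. Returns (dist, cls): distance map and class map."""
    INF = 10**9
    h, w = len(labels), len(labels[0])
    dist = [[0 if labels[y][x] else INF for x in range(w)] for y in range(h)]
    cls = [[labels[y][x] for x in range(w)] for y in range(h)]
    # forward pass: propagate from top and left neighbours
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny and 0 <= nx and dist[ny][nx] + 1 < dist[y][x]:
                    dist[y][x] = dist[ny][nx] + 1
                    cls[y][x] = cls[ny][nx]
    # backward pass: propagate from bottom and right neighbours
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx in ((1, 0), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and dist[ny][nx] + 1 < dist[y][x]:
                    dist[y][x] = dist[ny][nx] + 1
                    cls[y][x] = cls[ny][nx]
    return dist, cls
```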
Real-time road traffic classification using mobile video cameras
A. Lapeyronnie, C. Parisot, J. Meessen, et al.
On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. thanks to an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into different categories such as 'fluid', 'congestion', etc. The solution offers both good performance and low computational complexity, and is compatible with cheap video cameras, which allows its adoption by city traffic management authorities.
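The relative-to-absolute speed conversion can be sketched as below. The function name and the metres-per-pixel calibration are hypothetical; the paper only states that the camera speed comes from an odometer and/or GPS.

```python
def vehicle_speed_kmh(disp_px_per_frame, fps, m_per_px, bus_speed_kmh=0.0):
    """Absolute vehicle speed: pixel displacement per frame along the
    1-D segment, converted to m/s via an assumed calibration, plus the
    camera (bus) speed. All constants are illustrative."""
    rel_ms = disp_px_per_frame * fps * m_per_px   # relative speed, m/s
    return bus_speed_kmh + rel_ms * 3.6           # 1 m/s = 3.6 km/h

# e.g. 2 px/frame at 25 fps with 0.05 m/px, bus at 30 km/h -> 39 km/h
speed = vehicle_speed_kmh(2, 25, 0.05, bus_speed_kmh=30.0)
```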
Real-time people counting system using a single video camera
There is growing interest in video-based solutions for people monitoring and counting in business and security applications. Compared to classic sensor-based solutions, video-based ones allow more versatile functionality and improved performance at lower cost. In this paper, we propose a real-time system for people counting based on a single low-end, non-calibrated video camera. The two main challenges addressed in this paper are robust estimation of the scene background and of the number of real persons in merge-split scenarios. The latter is likely to occur whenever multiple persons move closely together, e.g. in shopping centers. Several persons may be considered a single person by automatic segmentation algorithms, due to occlusions or shadows, leading to under-counting. Therefore, to account for noise, illumination changes, and static-object changes, background subtraction is performed using an adaptive background model (updated over time based on motion information) and automatic thresholding. Furthermore, post-processing of the segmentation results is performed in the HSV color space to remove shadows. Moving objects are tracked using an adaptive Kalman filter, allowing a robust estimation of the objects' future positions even under heavy occlusion. The system is implemented in Matlab and gives encouraging results even at high frame rates. Experimental results obtained on the PETS2006 datasets are presented at the end of the paper.
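The adaptive background subtraction step can be sketched with a per-pixel running average and a fixed threshold. The learning rate and threshold below are illustrative values; the paper's model additionally gates the update on motion information and uses automatic thresholding.

```python
def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background model for one frame
    (pixels flattened to a list; alpha is an assumed learning rate)."""
    return [b + alpha * (f - b) for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    """Pixels deviating from the background by more than `thresh`
    are flagged as foreground (illustrative fixed threshold)."""
    return [abs(f - b) > thresh for b, f in zip(bg, frame)]
```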
Video Compression
Fast adaptive early termination for mode selection in H.264/AVC standard based on x264 implementation
In the H.264/AVC video coding standard, the mode decision component involves a large amount of computation. This paper presents a fast, computationally efficient mode prediction and selection approach with the following attributes: (a) both spatial and temporal information are used to achieve early termination using adaptive thresholds, (b) a modulator capable of trading off computational efficiency and accuracy, and (c) a homogeneous-region detection procedure for 8×8 blocks based on adaptive thresholds. The developed approach consists of three main steps: (1) mode prediction, (2) early termination based on adaptive thresholds, and (3) refinement by checking all the modes. In addition, in order to avoid sub-partitioning 8×8 blocks into smaller block sizes, texture information is utilized. It is shown that the developed approach leads to a computationally efficient video coding implementation compared to previous fast approaches. Results obtained on QCIF, CIF, and HD format video sequences based on x264 are presented to demonstrate the computational efficiency of the developed approach at the expense of acceptably low losses in video quality.
A resource-constrained MPEG-7 driven rate control scheme for H.264/AVC
Jesús Cánovas Serrano, Mingyuan Yang, Christos Grecos
Currently, no information about different shots is used in the H.264/AVC video coding standard. This kind of information can help us choose the size of GOPs and ultimately reduce the bit rate and PSNR fluctuation when video sequences have multiple shots. We developed an MPEG-7 based rate control scheme for the H.264/AVC standard. Our proposed scheme significantly outperforms the rate control of H.264/AVC, reducing the average bit rate fluctuation (variance) by 7-60% and the average PSNR fluctuation (variance) by 24-90% between shots. It is also applicable in computationally and memory-restricted devices, since it needs at most two frames of buffer space for MPEG-7 descriptor calculation, while the average amount of extra processing is only 5.8% of the total CPU cycles.
A high performance parallel architecture of H.264 intra prediction for motion estimation
This paper presents an efficient VLSI architecture for the intra prediction of the H.264 video compression standard. To address the computational complexity issue, we propose a dedicated processor that can compute multiple intra prediction modes in parallel. The proposed architecture accelerates the intra coding process and can support large video formats at high frame rates in real time.
A real-time wavelet-based video decoder using SIMD technology
This paper presents a fast implementation of a wavelet-based video codec. The codec consists of motion-compensated temporal filtering (MCTF), 2-D spatial wavelet transform, and SPIHT for wavelet coefficient coding. It offers compression efficiency that is competitive to H.264. The codec is implemented in software running on a general purpose PC, using C programming language and streaming SIMD extensions intrinsics, without assembly language. This high-level software implementation allows the codec to be portable to other general-purpose computing platforms. Testing with a Pentium 4 HT at 3.6GHz (running under Linux and using the GCC compiler, version 4), shows that the software decoder is able to decode 4CIF video in real-time, over 2 times faster than software written only in C language. This paper describes the structure of the codec, the fast algorithms chosen for the most computationally intensive elements in the codec, and the use of SIMD to implement these algorithms.
FPGA and Hardware I
A real-time bit-serial rank filter implementation using Xilinx FPGA
Chang Choo, Punam Verma
The rank filter is a non-linear filter used in image processing for impulse noise removal, morphological operations, and image enhancement. Real-time applications, such as video and high-speed acquisition cameras, often require the rank filter or the much simpler median filter. Implementing the rank filter in hardware can achieve the required speeds for these applications. A bit-serial algorithm can increase the speed of the rank filter by eliminating the time-consuming sorting network. In this paper, an 8-stage pipelined architecture for the rank filter is described using the bit-serial algorithm. It also includes an efficient window extraction and boundary-processing scheme. This rank filter design was simulated and synthesized on the Xilinx family of FPGAs. For a 3×3 window size, the maximum operating frequency achieved was 75 MHz on a low-end device (XC3S200 of the Spartan-3 family) and 180 MHz on a high-end device (XC4VSX25 of the Virtex-4 family). For a 5×5 window size, the maximum operating frequency achieved was 67 MHz on the XC3S200 and 138 MHz on the XC4VSX25. With a pixel filtered every clock cycle, the achieved speeds are sufficient for most video applications. The 3×3 window size design used 31% of the slices on the XC3S200 and 5% on the XC4VSX25. The 5×5 window size design used 60% of the slices on the XC3S200 and 11% on the XC4VSX25. This IP design may be used as a hardware accelerator in a fast image processing SoC.
An implementation of a multiplierless Hough transform on an FPGA platform using hybrid-log arithmetic
This paper describes an implementation of the Hough Transform (HT) that uses a hybrid-log structure for the main arithmetic components instead of fixed or floating point architectures. A major advantage of this approach is a reduction in the overall computational complexity of the HT without adversely affecting its overall performance when compared to fixed point solutions. The proposed architecture is compatible with the latest FPGA architectures allowing multiple units to operate in parallel without exhausting the dedicated (but limited) on-chip signal processing resources that can instead be allocated to other image processing and classification tasks. The solution proposed is capable of performing a real-time HT on megapixel images at frame rates of up to 25 frames per second using a Xilinx VirtexTM architecture.
Streaming warper with cubic spline interpolation for rectification of distorted images on FPGAs
Johannes Fuertler, Konrad J. Mayer, Michael Rubik, et al.
For industrial print flaw detection, images are acquired and then compared to a specimen (master image). Due to the production process, the images are not exactly aligned with each other. Therefore, preceding a pixel-by-pixel comparison, the acquired image has to be rectified in order to match the master image's properties: it has to be warped into the master image's coordinate system. To achieve the required detection speed, several megapixels per second have to be processed. It proved very advantageous to continuously process the stream of image data in an image processing pipeline, whose first stage is the warping process. In this paper we introduce a streaming warper unit which implements affine backward mapping and cubic spline interpolation. Since a complete pixel transformation is computed per clock cycle, the throughput, when implemented on contemporary FPGA devices, can be up to 200 megapixels per second. The implementation of several streaming warper units within a single FPGA is possible. This enables image processing systems which allow high data rates even under real-time constraints.
Architecture-template for massively parallel statistical image processing models
Stephan C. Stilkerich
The system-on-chip design of specific image analysis architectures based on massively parallel Markov Random Field (MRF) processing principles has so far been an unstructured, fault-prone, and complex task. Up to now, neither a systematically derived architecture template nor an industrially approved tool chain has been available to support the VLSI design task for these kinds of digital architectures. In this contribution, we report on a theoretically sound and systematically derived architecture template for massively parallel MRF processing devices. The paper concludes with prototypical implementations of selected architecture parts using FPGA technologies. These results demonstrate the capability of the proposed architecture template and manifest its industrial relevance.
FPGA and Hardware II
A memory and MHz efficient EDMA transfer scheme for video encoding algorithms on the TI TMS320DM642
Video encoding algorithms require processing of data arranged in blocks of pixels. For efficient computation, pixel blocks are expected to be stored contiguously in memory, and within each block, pixels are to be arranged in raster scan order (left to right, top to bottom). Since data captured from the video port is linearly arranged in memory (one line after the other), it is necessary to arrange the data in two-dimensional form before encoding. A common approach to achieve the two-dimensional arrangement is through optimized functions (in C or assembly) that arrange the captured data, stored in an intermediate buffer, into an input buffer from which it is ready for encoding. However, this approach has two main drawbacks. First, a portion of the CPU MHz budget is consumed solely on data arrangement. Second, an intermediate data buffer is required to hold the data before the arrangement into the input buffer takes place, increasing the memory requirements. In this paper, a memory and MHz efficient EDMA transfer scheme is introduced for simultaneous data transfer and two-dimensional arrangement from the video port to the DSP memory. The proposed scheme is described in detail for the TI TMS320DM642.
Fast approximate curve evolution
James Malcolm, Yogesh Rathi, Anthony Yezzi, et al.
The level set method for curve evolution is a popular technique used in image processing applications. However, the numerics involved make its use in high performance systems computationally prohibitive. This paper proposes an approximate level set scheme that removes much of the computational burden while maintaining accuracy. Abandoning a floating point representation for the signed distance function, we use integer values to represent the interior, zero level set, and exterior. We detail the rules governing the evolution and maintenance of these three regions. Arbitrary energies can be implemented with the definition of three operations: initialize iteration, move points in, move points out. This scheme has several nice properties. First, computations are performed only along the zero level set. Second, this approximate distance function representation requires only a few simple integer comparisons for maintenance. Third, smoothness regularization involves only a few integer calculations and may be handled apart from the energy itself. Fourth, the zero level set is represented exactly, removing the need for interpolation off the interface. Lastly, evolution proceeds on the order of milliseconds per iteration on conventional uniprocessor workstations. To highlight its accuracy, flexibility, and speed, we demonstrate the technique on standard intensity tracking and stand-alone segmentation.
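The three-region integer representation can be sketched as follows: -1 marks the interior, 0 the zero level set, and +1 the exterior, and a "move points in" step promotes exterior neighbours onto the zero set. The bookkeeping details are simplified relative to the paper's full maintenance rules.

```python
def move_in(phi, zero_set, p):
    """Move point p from the zero level set into the interior and
    promote its exterior 4-neighbours onto the zero set (a simplified
    sketch of the paper's 'move points in' rule).
    phi: 2-D integer map with -1 interior, 0 zero set, +1 exterior."""
    y, x = p
    phi[y][x] = -1
    zero_set.discard(p)
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < len(phi) and 0 <= nx < len(phi[0]) and phi[ny][nx] == 1:
            phi[ny][nx] = 0
            zero_set.add((ny, nx))
```

Because every update is an integer assignment on a narrow band of points, one evolution step touches only the zero set rather than the whole grid.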
Algorithms II
Reshuffling: a fast algorithm for filtering with arbitrary kernels
A novel method to accelerate the application of linear filters that have multiple identical coefficients on arbitrary kernels is presented. Such filters, including Gabor filters, gray-level morphological operators, volume smoothing functions, etc., are widely used in many computer vision tasks. By taking advantage of the overlapping area between the kernels of neighboring points, the reshuffling technique avoids redundant multiplications when the filter response is computed. It finds the set of unique coefficients, constructs a set of relative links for each coefficient, and then sweeps through the input data, accumulating the responses at each point while applying the coefficients via their relative links. Dual solutions, single input access and single output access, that achieve a 40% performance improvement are provided. In addition to its computational advantage, this method keeps a minimal memory footprint, which makes it an ideal method for embedded platforms. The effects of quantization, kernel size, and symmetry on the computational savings are discussed. Results show that reshuffling is superior to the conventional approach.
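The unique-coefficient idea can be sketched in 1-D: group kernel taps by value, sum the samples each group touches, and multiply once per unique coefficient. This is a minimal single-input-access sketch, not the authors' full 2-D implementation.

```python
from collections import defaultdict

def reshuffle_filter_1d(signal, kernel):
    """Correlate `signal` with `kernel`, but multiply only once per
    unique coefficient value: samples sharing a coefficient are summed
    first via their relative offsets (1-D sketch of reshuffling)."""
    links = defaultdict(list)
    for offset, c in enumerate(kernel):
        links[c].append(offset)          # relative links per unique value
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n - k + 1):
        acc = 0.0
        for c, offs in links.items():
            acc += c * sum(signal[i + o] for o in offs)  # one multiply per unique c
        out.append(acc)
    return out
```

For the symmetric kernel [1, 2, 1], three multiplications per output shrink to two, and the saving grows with the number of repeated coefficients.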
Motion estimation through efficient matching of a reduced number of reliable singular points
Carlos R. del-Blanco, Fernando Jaureguizar, Luis Salgado, et al.
Motion estimation in video sequences is a classical computationally intensive task that is required for a wide range of applications. Many different methods have been proposed to reduce the computational complexity, but the achieved reduction is not enough to allow real-time operation on non-specialized hardware. In this paper, an efficient selection of singular points for fast matching between consecutive images is presented, which allows real-time operation. The selection of singular points consists in finding the image points that are robust to noise and the aperture problem. This is accomplished by imposing restrictions related to the gradient magnitude and the cornerness. The neighborhood of each singular point is characterized by a complex descriptor vector, which presents high robustness to illumination changes and small variations in the 3D camera viewpoint. The matching between singular points of consecutive images is performed by maximizing a similarity measure based on the aforementioned descriptor vector. The set of correspondences yields a sparse motion vector field that accurately outlines the image motion. In order to demonstrate the efficiency of this approach, a video stabilization application has been developed, which uses the sparse motion vector field as input. Excellent results have been obtained, demonstrating the efficiency of the proposed motion estimation technique.
Interactive Paper and Symposium Demonstration
An innovative real-time system for infrared focal plane array image enhancement based on FPGA
Ehsan Koohestani, Ali Homaei
The conceptual configuration and special features of a new high-precision real-time signal processing system for a cooled Infrared Focal Plane Array (IRFPA) with 320×240 detectors are presented in this note. The most critical issue in array-based image detectors is the non-uniformity between the sensitive elements, caused by differing material characteristics in the fabrication phase; it is especially severe for IRFPAs with many elements. A Non-Uniformity Correction (NUC) mechanism therefore needs a structure that compensates for gradual drift in the detectors' output by updating the correction factors regularly. A feasible method, Least-Mean-Square (LMS) on compact hardware, is introduced. The correction formula deduced from this approach is, by theoretical analysis, the best polynomial approximation of the analytic formula. Since the intended detector's native timing is not suitable for standard display equipment, hardware for frame-by-frame grabbing is essential, which also supports further processing applications. Exploiting the capabilities of sophisticated FPGAs, contrast enhancement based on Bi-Histogram Equalization, which precisely preserves the brightness of the infrared image, is also described.
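The LMS drift-tracking idea can be sketched per pixel: nudge a gain/offset pair so the corrected output follows a target estimate (e.g. a local spatial mean). The step size and the target choice are illustrative assumptions, not the paper's exact formulation.

```python
def lms_nuc_update(gain, offset, raw, target, mu=1e-4):
    """One LMS step of per-pixel non-uniformity correction: adjust
    gain and offset so that gain*raw + offset tracks `target`
    (mu is an assumed step size). Returns updated state and the
    correction computed with the pre-update parameters."""
    corrected = gain * raw + offset
    err = target - corrected
    gain += mu * err * raw
    offset += mu * err
    return gain, offset, corrected

# Drift demo: a pixel reading 100 counts whose true level is 120
gain, offset = 1.0, 0.0
for _ in range(20):
    gain, offset, corrected = lms_nuc_update(gain, offset, 100.0, 120.0)
```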
Noise suppression in video sequences applying fuzzy vectorial directional algorithms
Using spatial and temporal information jointly is more efficient than using them separately. We have designed a new adaptive fuzzy-logic scheme applying directional and fuzzy processing techniques with motion detection and spatial-temporal filtering of video sequences. The proposed method can distinguish uniform regions, edges, and fine details in the images, decreasing the processing time by taking into account only the samples that demonstrate a high level of corruption or motion. The algorithm adapts spatial-temporal information to smooth additive noise. The non-stationary noise that remains after the temporal algorithm is removed by employing a magnitude algorithm that is adapted using parameters obtained during the filtering. The designed algorithm is compared with other filters found in the literature, showing the effectiveness of the proposed fuzzy-logic filtering approach.
Optimization of tone-mapping functions in video cameras for high dynamic range images
For real-time imaging with digital video cameras, good tonal rendition of video is important to ensure high visual comfort for the user. Besides local contrast improvements, High Dynamic Range (HDR) scenes require adaptive gradation correction (a tone-mapping function) which should enable good visualization of details at lower brightness. We discuss how to construct and control optimal tone-mapping functions, which enhance the visibility of image details in dark regions while not excessively compressing the image in the bright parts. The result of this method is a 21-dB expansion of the dynamic range. The new algorithm was successfully evaluated in hardware and, although suited for any video system, it is particularly beneficial for those processing HDR video.
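The shape of such a tone-mapping function can be illustrated with a simple luminance-dependent power law: dark pixels are lifted strongly while bright pixels are left almost untouched. This curve and its `strength` parameter are purely illustrative, not the authors' control law.

```python
def tone_map(y, strength=0.6):
    """Illustrative tone curve on normalized luminance in [0, 1].
    The exponent shrinks toward (1 - strength) for dark pixels,
    boosting shadow detail, and approaches 1 for bright pixels,
    preserving highlights (hypothetical example curve)."""
    return y ** (1.0 - strength * (1.0 - y))
```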
Rapid object candidate detection using increment sign correlation
Masato Kazui, Masaya Itoh, Shoji Muramatsu
We develop a rapid object-candidate detector using Increment Sign Correlation (ISC). Our method aims to detect candidate objects such as people or vehicles in real time using ISC and a simple shape model. Our method is similar to the Generalized Hough Transform (GHT); however, we modify its voting process, using ISC for detecting object candidates instead of the shape voting done by GHT. ISC is robust against shading and low image contrast due to lighting changes, because the Increment Sign (IS) is insensitive to perturbations of the direction of the intensity gradient. The computational cost of IS is also lower than that of the gradient. In our experiment, the detector runs on a 320×240 pixel image within 32 milliseconds on a Pentium 4 processor at 2.8 GHz. Given an initial template size of 10×20 pixels, the number of candidates decreases from 170,196 sub-windows in a 320×240 pixel image to at most 400, with a miss rate of 0.2%. The detection rate is sufficient for more precise detectors which need to use richer image features. Experimental results using real image sequences are reported.
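The IS/ISC computation can be sketched in 1-D: binarize each intensity profile into increment signs, then score two profiles by the fraction of agreeing signs. Because the signs depend only on whether intensity rises or falls, the score is invariant to brightness offset and scaling, which is the robustness property the abstract describes.

```python
def increment_signs(row):
    """Increment sign of a 1-D intensity profile: 1 where intensity is
    non-decreasing to the right, else 0."""
    return [1 if b >= a else 0 for a, b in zip(row, row[1:])]

def isc(row_a, row_b):
    """Increment Sign Correlation: fraction of positions where the two
    increment-sign sequences agree (1-D sketch of the 2-D measure)."""
    sa, sb = increment_signs(row_a), increment_signs(row_b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```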
Normal map compression based on BTC and wavelet coding
Normal mapping is a powerful technique for simulating surface roughness by means of normal maps. A high-polygon-count model is represented by a coarse polygon mesh with fine details stored in the normal map. Thus, the technique greatly reduces the geometric complexity of models and shifts the demands onto effective normal map compression algorithms. In this paper we present a normal-map compression algorithm which extends the commonly used 3Dc algorithm, introduced by ATI, with wavelet compression based on the Haar basis. Each block component is coded by one of two modes, and the one that introduces the smallest error is chosen for the block component representation. This allows better adaptation to normal map data and improves the peak signal-to-noise ratio compared to standalone 3Dc.
VHDL implementation of wavelet packet transforms using SIMULINK tools
Mukul Shirvaikar, Tariq Bushnaq
The wavelet transform is currently used in many engineering fields. The real-time implementation of the Discrete Wavelet Transform (DWT) is a current area of research, as it is one of the most time-consuming steps in the JPEG2000 standard. The standard implements two different wavelet transforms: irreversible and reversible Daubechies. The former is a lossy transform, whereas the latter is a lossless transform. Many current JPEG2000 implementations are software-based and not efficient enough to meet real-time deadlines. Field Programmable Gate Arrays (FPGAs) are revolutionizing image and signal processing. Many major FPGA vendors, such as Altera and Xilinx, have recently developed SIMULINK tools to support their FPGAs. These tools are intended to provide a seamless path from system-level algorithm design to FPGA implementation. In this paper, we investigate FPGA implementation of 2-D lifting-based Daubechies 9/7 and Daubechies 5/3 transforms using a Matlab/Simulink tool that generates synthesizable VHSIC Hardware Description Language (VHDL) code. The goal is to study the feasibility of this approach for real-time image processing by comparing the performance of the high-level toolbox with a handwritten VHDL implementation. The hardware platform used is an Altera DE2 board with a 50 MHz Cyclone II FPGA chip, and the Simulink tool chosen is DSP Builder by Altera.
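The reversible 5/3 lifting scheme mentioned above maps directly to hardware because it needs only integer adds and shifts. A 1-D software sketch (one decomposition level, JPEG2000-style symmetric edge handling) together with its exact inverse:

```python
def cdf53_forward(x):
    """One level of the reversible 5/3 lifting transform on an
    even-length integer signal. Returns (lowpass, highpass)."""
    n = len(x)
    d = []                                   # predict step: highpass
    for i in range(n // 2):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric edge
        d.append(x[2 * i + 1] - (left + right) // 2)
    s = []                                   # update step: lowpass
    for i in range(n // 2):
        dl = d[i - 1] if i > 0 else d[i]                     # symmetric edge
        s.append(x[2 * i] + (dl + d[i] + 2) // 4)
    return s, d

def cdf53_inverse(s, d):
    """Exact inverse of cdf53_forward: undo update, then predict."""
    n = 2 * len(s)
    x = [0] * n
    for i in range(len(s)):
        dl = d[i - 1] if i > 0 else d[i]
        x[2 * i] = s[i] - (dl + d[i] + 2) // 4
    for i in range(len(d)):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x
```

Each lifting step is individually invertible, which is what makes the transform lossless despite the integer divisions.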
Generic algorithms for motion compensation and transformation
Henryk Richter, Benno Stabernack, Erika Müller
In this paper, we propose algorithms that map the low-level motion compensation and transformation functions of the MPEG-1/2, H.263/MPEG-4 ASP, and H.264/MPEG-4 AVC video codecs onto common workflows. This way, a single discrete implementation of the luma prediction, chroma prediction, and residual transform stages is sufficient for all covered video coding standards. The proposed luma prediction is based on 4×4 blocks to cover the H.264 specification as well as the older standards. The design consists of a single four-stage pipeline with two block interpolation and two block averaging stages. Targeted at hardware implementation, a strictly linear execution is provided, avoiding branch operations. The algorithmic behavior is entirely dictated by the contents of the parameter ROM. Since chrominance prediction must cover blocks as small as 2×2 pixels, a distinct operation is proposed for chroma. The bilinear operation scheme in H.264 is able to carry out the operations for the older standards with only minor changes. In H.264, the classic 8×8 DCT was replaced by a simplified 4×4 integer transform based on a heavily quantized DCT scheme. By modifying a well-known multiplier-adder-based scheme, a generalized transformation stage can be derived.
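The 4×4 integer transform referred to above is the standard H.264 core transform Y = Cf · X · Cf^T, which needs only additions, subtractions, and shifts (the per-coefficient quantization scaling that normally follows is omitted here):

```python
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul4(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def h264_core_transform(x):
    """H.264 4x4 forward core transform Y = Cf * X * Cf^T on a 4x4
    residual block (quantizer scaling stage not included)."""
    cft = [[CF[j][i] for j in range(4)] for i in range(4)]
    return matmul4(matmul4(CF, x), cft)
```

For a constant block every coefficient except the DC term vanishes, matching the DCT-like behavior the transform approximates.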
An architecture for on-the-fly correction of radial distortion using FPGA
In this paper, we introduce an FPGA implementation for correcting radial distortion, which is non-linear, non-uniform, and inherently observed in images taken with wide-angle lenses. In the implementation, the correction is performed in an on-the-fly manner by employing a parallel architecture that focuses on efficient manipulation of the look-up table (LUT) for coordinate translation: LUT decomposition and a single-LUT-multiple-access method. The 2-D LUT is decomposed into three 1-D LUTs to reduce resource usage. The single-LUT-multiple-access strategy is inspired by the fact that spatial and temporal proximity exists among the LUT accesses, even though the mapping is non-linear and non-uniform. In addition, a way to eliminate the redundancy that occurs where backward mappings and interpolations overlap is incorporated into the implementation. This series of efforts aims to alleviate two problems observed in conventional FPGA implementations of image-handling algorithms: parallelization of function blocks for higher throughput, and minimization of the number of accesses to off-chip memory. As a result, the corrected image for a distorted input frame can be stored within a vertical blanking interval, with lower usage of hardware resources and without unnecessary access to off-chip memory.
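A software sketch of the backward mapping with bilinear interpolation may clarify what the LUTs store. Here the source coordinate is computed per output pixel with an illustrative one-coefficient polynomial radial model (`k1`, `cx`, `cy` are hypothetical parameters, not taken from the paper); the proposed architecture would instead precompute these coordinates into decomposed 1-D LUTs:

```python
import math

def undistort(img, k1, cx, cy):
    """Backward-mapping radial correction with bilinear interpolation.

    `img` is a grayscale image as nested lists; `k1` is an illustrative
    radial coefficient and (cx, cy) the assumed distortion centre. The
    per-pixel model evaluation here stands in for the paper's LUT lookup.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            f = 1.0 + k1 * (dx * dx + dy * dy)   # polynomial radial model
            xs, ys = cx + dx * f, cy + dy * f     # source coordinate (backward map)
            x0, y0 = math.floor(xs), math.floor(ys)
            if 0 <= x0 < w - 1 and 0 <= y0 < h - 1:
                ax, ay = xs - x0, ys - y0         # bilinear weights
                out[y][x] = ((1 - ax) * (1 - ay) * img[y0][x0]
                             + ax * (1 - ay) * img[y0][x0 + 1]
                             + (1 - ax) * ay * img[y0 + 1][x0]
                             + ax * ay * img[y0 + 1][x0 + 1])
    return out
```

Note that neighbouring output pixels map to nearby source coordinates, which is exactly the spatial proximity the single-LUT-multiple-access strategy exploits.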
Robust object detection based on radial reach correlation and adaptive background estimation for real-time video surveillance systems
M. Itoh, M. Kazui, H. Fujii
A method of real-time object detection for video surveillance systems has been developed. The method aims to realize robust object detection by using Radial Reach Correlation (RRC). We also apply statistical background estimation to cope with dynamic and complex environments. The computational cost of RRC is higher than that of the simple subtraction method, and the statistical background estimation method requires a large amount of memory. It is therefore necessary to reduce the calculation cost in order to apply the method to an embedded image processing device. Our method is composed of two techniques: a fast RRC algorithm and statistical background estimation with a cumulative averaging process. As a result, without any deterioration in detection accuracy, the processing time of object detection is decreased to about one quarter of that of normal RRC.
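To make the RRC idea concrete, the following sketch finds, for each of eight radial directions, the nearest pixel whose brightness differs from the centre by more than a threshold (the "reach"), then counts how many of those brightness orderings are preserved in the current frame. This is our reading of basic RRC, not the paper's fast algorithm; the threshold value and boundary handling are assumptions:

```python
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def reach_points(img, y, x, thresh):
    """In each of 8 radial directions, find the nearest pixel whose
    brightness differs from the centre by more than `thresh`."""
    h, w = len(img), len(img[0])
    pts = []
    for dy, dx in DIRS:
        yy, xx = y + dy, x + dx
        while 0 <= yy < h and 0 <= xx < w:
            if abs(img[yy][xx] - img[y][x]) > thresh:
                pts.append((yy, xx))
                break
            yy, xx = yy + dy, xx + dx
    return pts

def rrc(bg, cur, y, x, thresh=10):
    """Radial Reach Correlation at (y, x): the fraction of reach points
    (found on the background image) whose brightness ordering relative to
    the centre is preserved in the current frame. A low value suggests the
    pixel has changed, i.e. belongs to a foreground object."""
    pts = reach_points(bg, y, x, thresh)
    if not pts:
        return 1.0
    match = sum((bg[yy][xx] > bg[y][x]) == (cur[yy][xx] > cur[y][x])
                for yy, xx in pts)
    return match / len(pts)
```

Because only the signs of brightness differences are compared, the measure is largely insensitive to global illumination changes, which is the usual motivation for RRC over simple background subtraction.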
Feature-assisted threshold selection for all-zero block detection and its application to video coding optimization in H.264
All-zero blocks (AZBs) are blocks in which all DCT coefficients become zero after quantization. Early determination of AZBs can avoid unnecessary DCT/Q/IQ/IDCT computation. Existing techniques in the literature primarily address more efficient thresholds for early determination of AZBs. This paper deals with the selection of such thresholds based on low-level features including motion activity and texture information. This is then used to avoid (1) unnecessary quarter-pixel-accuracy motion estimation, (2) unnecessary multiple-reference-frame motion estimation, and (3) unnecessary DCT/Q/IQ/IDCT computation. The developed approach has been applied to video sequences in two formats, CIF and QCIF. The results show that the computational complexity is significantly reduced while the video quality is maintained at a tolerable loss level.
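The early-termination logic can be sketched as follows, with a uniform dead-zone quantizer of step `qstep` standing in for the H.264 QP machinery. The SAD-based shortcut uses the fact that every entry of the 4×4 core-transform matrix has magnitude at most 2, so each coefficient satisfies |Y(i,j)| ≤ 4·SAD; the paper's actual threshold derivation may differ:

```python
def transform4x4(x):
    """4x4 H.264 core transform (norm scaling deferred to the quantizer)."""
    C = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
    mul = lambda a, b: [[sum(a[i][k] * b[k][j] for k in range(4))
                         for j in range(4)] for i in range(4)]
    return mul(mul(C, x), [[C[j][i] for j in range(4)] for i in range(4)])

def sad(block):
    """Sum of absolute differences of a 4x4 residual block."""
    return sum(abs(v) for row in block for v in row)

def is_azb(block, qstep):
    """Exact all-zero test: transform the residual and check that every
    coefficient falls inside the quantizer dead zone."""
    return all(abs(c) < qstep for row in transform4x4(block) for c in row)

def early_skip(block, qstep):
    """Sufficient SAD condition: since |Y(i,j)| <= 4 * SAD for this
    transform, 4 * SAD < qstep guarantees an AZB without transforming."""
    return 4 * sad(block) < qstep
```

The condition is sufficient but not necessary: whenever `early_skip` fires, `is_azb` would also hold, so the transform, quantization, and their inverses can be skipped safely, while blocks failing the test still require the full check.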
A new strategy based on adaptive mixture of Gaussians for real-time moving objects segmentation
Carlos Cuevas, Narciso García, Luis Salgado
Here, a new and efficient strategy is introduced that allows moving object detection and segmentation in video sequences. Other strategies use a mixture of Gaussians to detect static and dynamic areas within the images so that moving objects are segmented [1], [2], [3], [4]. For this purpose, all these strategies use a fixed number of Gaussians per pixel. Typically, more than two or three Gaussians are used to obtain good results when images contain noise and movement not related to objects of interest. Nevertheless, the use of more than one Gaussian per pixel involves a high computational cost and, in many cases, adds no advantage over single-Gaussian segmentation. This paper proposes a novel automatic moving object segmentation that uses an adaptive, variable number of Gaussians to reduce the overall computational cost. An automatic strategy is applied to each pixel to determine the minimum number of Gaussians required for its classification. Taking into account the temporal context that identifies the reference image pixels as background (static) or moving (dynamic), either the full set of Gaussians or just one Gaussian is used. Pixels classified with the full set are called MGPs (Multiple Gaussian Pixels), while those classified with just one Gaussian are called SGPs (Single Gaussian Pixels), so a computation reduction is achieved that depends on the size of this last set. Pixels with a dynamic reference are always MGPs. They are Dynamic-MGPs (DMGPs) when they belong to the dynamic areas of the image; however, if the classification result shows that the pixel matches one of the Gaussians in the set, the pixel is labeled static and is therefore called a Static-MGP (SMGP). Usually, these last ones are noise pixels, although they could belong to areas with movement not related to objects of interest. Finally, pixels with a static reference that still match the same Gaussian are SGPs and belong to the static background of the image.
However, if they do not match the associated Gaussian, they are changed to either SMGP or DMGP. In addition, any pixel can maintain its status, and an SMGP can change to DMGP or SGP. A state diagram shows the transition schemes and their characterizations, allowing the reduction in the computational cost of the segmentation process to be forecast. Tests have shown that the use of the proposed strategy implies only a limited loss of accuracy in the segmentations obtained, compared with strategies that use a fixed number of Gaussians per pixel, while achieving very high reductions in the overall computational cost of the process.
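A minimal per-pixel state machine, reconstructed from the abstract's description, might look like the sketch below. The matching test, the learning rate, and the omission of the multi-Gaussian evaluation and of the SMGP-to-SGP demotion are simplifications of ours:

```python
class PixelModel:
    """Single-Gaussian model for one pixel, plus its SGP/SMGP/DMGP state."""
    def __init__(self, mean, var):
        self.mean, self.var, self.state = mean, var, "SGP"

def matches(m, value, k=2.5):
    # A value matches the Gaussian if it lies within k standard deviations.
    return abs(value - m.mean) <= k * m.var ** 0.5

def classify(m, value, alpha=0.05):
    """Update the pixel's state for a new observation: an SGP that stops
    matching its single Gaussian is promoted to SMGP (it now needs the
    full Gaussian set), and an MGP that keeps failing the match becomes
    DMGP, i.e. part of a moving object."""
    if m.state == "SGP":
        m.state = "SGP" if matches(m, value) else "SMGP"
    else:
        m.state = "SMGP" if matches(m, value) else "DMGP"
    if m.state != "DMGP":                 # background: adapt the Gaussian
        m.mean += alpha * (value - m.mean)
    return m.state
```

The computational saving comes from the SGP branch: those pixels are classified with a single comparison, and only MGP pixels would pay for the full mixture evaluation.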
Improved tracking by decoupling camera and target motion
Shawn Lankton, Allen Tannenbaum
Video tracking is widely used for surveillance, security, and defense purposes. In cases where the camera is not fixed, due to pans and tilts or to being mounted on a moving platform, tracking becomes more difficult. Camera motion must be taken into account, and objects that come and go from the field of view should be continuously and uniquely tracked. We propose a tracking system that meets these needs by using a frame-registration technique to estimate camera motion. This estimate is then used as the input control signal to a Kalman filter, which estimates the target's motion model based on measurements from a mean-shift localization scheme. Thus we decouple the camera and object motion and recast the problem in terms of a principled control-theory solution. Our experiments show that, using a controller built on these principles, we are able to track multiple objects in sequences with moving cameras. Furthermore, the techniques are computationally efficient and allow us to accomplish these results in real time. Of specific importance is that when objects are lost off-frame, they can still be uniquely identified and reacquired when they return to the field of view.
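The decoupling can be sketched as a Kalman filter whose prediction step is shifted by the estimated camera motion. This 1-D constant-position model with `u` as the control input is an illustrative reduction, since the abstract does not specify the actual state or measurement models:

```python
def kalman_step(x, P, z, u, q=1e-3, r=1e-1):
    """One 1-D Kalman update with the estimated camera motion `u` fed in
    as a control input, so the prediction follows the apparent shift of
    the target caused by the camera rather than attributing it to target
    motion. `z` is the mean-shift position measurement; `q` and `r` are
    illustrative process and measurement noise variances.
    """
    # Predict: shift the state by the registration-based camera motion.
    x_pred = x + u
    P_pred = P + q
    # Update with the mean-shift measurement.
    K = P_pred / (P_pred + r)        # Kalman gain
    x_new = x_pred + K * (z - x_pred)
    P_new = (1 - K) * P_pred
    return x_new, P_new
```

Because the camera-induced displacement enters as a control input rather than as measurement error, the filter's innovation reflects only genuine target motion, and an off-frame object's state can keep being propagated until it reappears.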