Proceedings Volume 8305

Visual Information Processing and Communication III


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 2 March 2012
Contents: 10 Sessions, 35 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2012
Volume Number: 8305

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions
  • Front Matter: Volume 8305
  • Session 1
  • Session 2
  • Session 3
  • Session 4
  • Session 5
  • Session 6
  • Session 7
  • Session 8
  • Session 9
Front Matter: Volume 8305
This PDF file contains the front matter associated with SPIE Proceedings Volume 8305, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Session 1
A novel distortion model for quadtree coding in high efficiency video coding
Bumshik Lee, Sangsoo Ahn, Munchurl Kim
In this paper, a novel distortion model based on a mixture of Laplacian distributions is presented for the transform coefficients of predicted residues in quadtree coding. The mixture of Laplacian distributions is constructed over the coding structure with different quadtree coding unit (CU) depths. Moreover, for intra-coded CUs, the distortion model is asymptotically simplified based on the signal characteristics of the transform coefficients. The proposed mixture model of multiple Laplacian distributions is tested on the High Efficiency Video Coding (HEVC) Test Model (HM) with quadtree-structured Coding Units (CUs) and Transform Units (TUs). The experimental results show that the proposed model achieves more accurate distortion estimation than single-distribution probability models.
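To make the modeling idea concrete, the density of a mixture of Laplacian distributions (one component per CU-depth class, say) can be evaluated as follows. This is a generic sketch, not the authors' estimation procedure; the weights and scale parameters below are hypothetical:

```python
import math

def laplacian_pdf(x, mu, b):
    """Laplacian density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def mixture_pdf(x, weights, mus, scales):
    """Mixture of Laplacian densities; weights must sum to 1."""
    return sum(w * laplacian_pdf(x, m, b)
               for w, m, b in zip(weights, mus, scales))

# Hypothetical parameters: a narrow component (e.g., for one CU-depth
# class) and a wide component (for another).
weights, mus, scales = [0.7, 0.3], [0.0, 0.0], [1.0, 4.0]
p0 = mixture_pdf(0.0, weights, mus, scales)
```

A single Laplacian cannot capture the different coefficient statistics of different CU depths; the mixture assigns each depth class its own scale.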
Weighted prediction for HEVC
Philippe Bordes, Dominique Thoreau, Philippe Salmon, et al.
HEVC is the new video coding standard developed in a joint effort (JCT-VC) by ISO MPEG and ITU-T VCEG. Like other state-of-the-art block-based inter-prediction codecs, it is very sensitive to illumination variations between frames. To cope with this limitation, the weighted prediction (WP) tool has been proposed. A comparison of the performance of WP in HEVC and MPEG-4 AVC/H.264 is carried out. The efficiency of WP depends strongly on the quality of the estimated WP parameters. The different stages of state-of-the-art WP parameter estimators are discussed and a new algorithm is proposed, based on histogram matching with global motion compensation. Several options are evaluated and compared with other existing methods.
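A minimal sketch of the weighted prediction idea: a multiplicative weight and additive offset are estimated and applied to the reference samples. The paper's estimator is based on histogram matching; the simple moment-matching stand-in below only illustrates the parameter roles, and all values are hypothetical:

```python
def estimate_wp_params(cur, ref):
    """Estimate weight w and offset o so that w*ref + o matches the
    first two moments of the current frame (a simple stand-in for the
    histogram-based WP estimators discussed in the paper)."""
    n = len(cur)
    mean_c = sum(cur) / n
    mean_r = sum(ref) / n
    var_c = sum((x - mean_c) ** 2 for x in cur) / n
    var_r = sum((x - mean_r) ** 2 for x in ref) / n
    w = (var_c / var_r) ** 0.5 if var_r > 0 else 1.0
    o = mean_c - w * mean_r
    return w, o

def apply_wp(ref, w, o):
    """Weighted prediction: scale and offset the reference samples."""
    return [w * x + o for x in ref]

# Toy frames: the "current" frame is a globally brightened reference.
ref = [10, 20, 30, 40]
cur = [25, 45, 65, 85]          # cur = 2*ref + 5
w, o = estimate_wp_params(cur, ref)
pred = apply_wp(ref, w, o)
```

In a fade or global illumination change, such a compensated reference predicts the current frame far better than the unweighted one.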
Impact of video parameters on the DCT coefficient distribution for H.264-like video coders
Nejat Kamaci, Ghassan Al-Regib
We examine the impact of various encoding parameters on the distribution of the DCT coefficients for H.264-like video coders. We model the distribution of the frame DCT coefficients using the most common Laplacian and Cauchy distributions. We show that the resolution, the quantization levels, and the coding type have a significant impact on the accuracy of the Laplacian- and Cauchy-based models, while the transform kernel (4×4 vs. 8×8) has little impact. Moreover, we show that for video sources with little temporal or spatial detail, such as flat regions, the distribution of the frame DCT coefficients resembles a Laplacian distribution; when the video source exhibits more detail, such as texture and edges, it resembles a Cauchy distribution. This correlation between the detail of the video source and the two probability distributions can be used, via a classification-based approach, to further improve the estimation of the distribution of the frame DCT coefficients.
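The model-selection idea can be sketched by fitting zero-mean Laplacian and Cauchy densities to a set of coefficients and comparing log-likelihoods. The scale estimators below are simple closed-form proxies (mean and median of absolute values), not the paper's fitting procedure:

```python
import math

def laplacian_loglik(xs, b):
    """Log-likelihood of a zero-mean Laplacian with scale b."""
    return sum(-math.log(2.0 * b) - abs(x) / b for x in xs)

def cauchy_loglik(xs, g):
    """Log-likelihood of a zero-median Cauchy with scale g."""
    return sum(math.log(g / (math.pi * (g * g + x * x))) for x in xs)

def best_model(xs):
    """Fit both models with crude scale proxies and pick the one
    with the higher log-likelihood."""
    b = max(sum(abs(x) for x in xs) / len(xs), 1e-9)        # Laplacian MLE
    g = max(sorted(abs(x) for x in xs)[len(xs) // 2], 1e-9)  # median |x| proxy
    ll_lap = laplacian_loglik(xs, b)
    ll_cau = cauchy_loglik(xs, g)
    return ("laplacian" if ll_lap >= ll_cau else "cauchy"), ll_lap, ll_cau

model, ll_lap, ll_cau = best_model([-1.0, -0.5, 0.5, 1.0])
```

A light-tailed (flat-region-like) coefficient set favors the Laplacian fit, while heavy-tailed coefficients from textured content favor the Cauchy fit.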
Adaptive loop filter with directional features and similarity mapping for video coding
PoLin Lai, Felix C. A. Fernandes
To improve both the coding efficiency and the visual quality of video coding, we present an adaptive loop filtering design that exploits the local directional characteristics exhibited in the video content. The design combines linear spatial filtering and directional filtering with a similarity mapping function. We compute and compare multiple simple directional features to classify the blocks in a video frame into classes with different dominant orientations. Each class of blocks adapts to a directional filter, with symmetry constraints imposed on the filter coefficients according to the dominant orientation determined by the classification. To emphasize pixel similarity for explicit adaptation to edges, we use a simple hard-threshold mapping function to avoid artifacts arising from across-edge filtering. Our design uses only four filters per frame with a fixed 7×7 diamond-shaped filter support, while achieving better coding efficiency and improved visual quality, especially along edges, compared to other approaches using up to 16 filters with up to 7 (vertical) × 9 (horizontal) diamond-shaped filter support.
Session 2
Distributed video coding with progressive significance map
A distributed video coding (DVC) system based on the wavelet transform and set partition coding (SPC) is presented in this paper. Conventionally, the significance map (sig-map) of SPC is not conducive to Slepian-Wolf (SW) coding, because of the difficulty of generating a side information sig-map and the sensitivity to decoding errors. The proposed DVC system utilizes a more highly structured significance map, named the progressive significance map (prog-sig-map), which organizes the significance information into two parts: a high-level summation significance map (sum-sig-map) and a low-level complementary significance map (comp-sig-map). This prog-sig-map alleviates the above difficulties and thus makes part of the prog-sig-map (specifically, the fixed-length-coded comp-sig-map) suitable for SW coding. Simulation results show the improved rate-distortion performance of the DVC system, even with a simple system configuration.
Improving side information generation using dynamic motion estimation for distributed video coding
Insu Park, David Capson
A new side information generation algorithm using dynamic motion estimation and post-processing is proposed for improved distributed video coding. Multiple reference frames are employed for motion estimation in the side information frame generation block of the decoder. After motion estimation and compensation, post-processing is applied to improve the hole and overlapped areas in the reconstructed side information frame. The proposed side information method improves the quality of the reconstructed frames at the distributed video decoder. The average encoding time of the distributed video coding is around 15% of that of H.264 inter coding and 40% of that of H.264 intra coding. The proposed side-information-based distributed video coding demonstrates improved performance compared with H.264 intra coding.
Directional frame interpolation for MPEG compressed video
Chang Zhao, Xinwei Gao, Xiaopeng Fan, et al.
Image interpolation is one of the most elementary imaging research topics. A number of image interpolation methods have been developed for uncompressed images in the literature. However, many videos have already been stored in MPEG-2 format or must be transmitted in MPEG-2 format due to bandwidth limitations. The image interpolation methods developed for uncompressed images may not be effective when directly applied to compressed videos: on the one hand, they do not utilize the information present in the coded bitstreams; on the other hand, they do not consider quantization error, which may be dominant in some cases. Inspired by the success of the intra prediction in H.264/AVC and of edge-directed image interpolation methods (such as LAZA and NEDI), we propose a directional frame interpolation for MPEG compressed video. In the proposed method, 8×8 intra blocks in I frames are first classified into the nine block directions in the transform domain. Then the interpolation of each block is performed along its block direction. For each block direction, an optimal Wiener filter is trained on representative video sequences and then used for the interpolation. In a similar way, for each pixel in an inter block in P or B frames, the interpolation is performed along the direction of its corresponding reference block. The experimental results demonstrate that the proposed method achieves better performance than traditional linear methods such as bicubic and bilinear interpolation and edge-directed methods such as LAZA and NEDI, while keeping the computational complexity low enough for practical applications.
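To make the filter-training step concrete: a least-squares (Wiener-style) interpolation filter can be learned from training data by solving the normal equations. The sketch below is a 1D toy with a 2-tap filter, not the paper's 2D per-direction training:

```python
def train_interp_filter(signal):
    """Learn a 2-tap least-squares interpolation filter that predicts
    each odd sample from its two even neighbours (a 2x2 normal-equation
    system, solved in closed form)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for i in range(0, len(signal) - 2, 2):
        left, target, right = signal[i], signal[i + 1], signal[i + 2]
        a11 += left * left
        a12 += left * right
        a22 += right * right
        b1 += left * target
        b2 += right * target
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det,
            (a11 * b2 - a12 * b1) / det)

# On a linear ramp, plain averaging of the two neighbours is optimal,
# so training should recover the filter (0.5, 0.5).
w1, w2 = train_interp_filter(list(range(9)))
```

In the paper's setting one such filter would be trained per block direction, with the taps laid out along that direction.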
A fast intra prediction method using Hadamard transform in high efficiency video coding
Younhee Kim, DongSan Jun, Soonheung Jung, et al.
To achieve higher coding performance than previous video coding standards, high efficiency video coding (HEVC) adopts an angular intra prediction method, which entails heavy computational complexity due to the increased number of intra prediction modes. In this paper, we propose a fast intra prediction mode decision based on an estimation of the rate-distortion cost using the Hadamard transform, which reduces the number of candidate intra prediction modes, together with an early termination of the decision on whether the current coding unit is split. The experimental results show that the proposed method reduces the computational complexity of intra prediction in HEVC while achieving coding performance similar to that of HEVC test model 2.1.
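As background, the Hadamard-based cost commonly used for fast mode screening is the sum of absolute transformed differences (SATD). A small sketch for 4×4 blocks follows; the actual HM cost function also includes a rate term, which is omitted here:

```python
def hadamard4(block):
    """Unnormalized 4x4 Hadamard transform (rows, then columns)."""
    h = [[1,  1,  1,  1],
         [1,  1, -1, -1],
         [1, -1, -1,  1],
         [1, -1,  1, -1]]
    tmp = [[sum(h[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * h[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd(orig, pred):
    """Sum of absolute transformed differences between an original
    4x4 block and a candidate intra prediction: a cheap proxy for
    the rate-distortion cost when screening prediction modes."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    t = hadamard4(diff)
    return sum(abs(t[i][j]) for i in range(4) for j in range(4))
```

Ranking candidate modes by SATD lets an encoder keep only a few for the expensive full rate-distortion check.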
Session 3
Lossless description of 3D range models
Neslihan Bayramoğlu, A. Aydin Alatan
Improvements in scanning technologies have led to the acquisition and management of range image databases; hence, the need to describe and index this type of data arises. Since a range model has different properties compared to complete 3D models, we propose a method that relies on the Spherical Harmonics Transform (SHT) for retrieving similar models, where both the query and the database consist of only range models. Although the SHT is not a new concept in shape retrieval research, we propose to apply it to range images by representing the models in a world seen from the camera. The difference and advantage of our algorithm is that it is information lossless: the available shape information is completely included in the descriptor, whereas other mesh retrieval applications utilizing the SHT approximate the shape, which leads to information loss. The descriptor is invariant to scale and to rotations about the z-axis. The proposed method is tested on a large database with high diversity; its performance is superior to that of the D2 distribution.
Reference frame selection for loss-resilient depth map coding in multiview video conferencing
Bruno Macchiavello, Camilo Dorea, Edson M. Hung, et al.
Multiview video in "texture-plus-depth" format enables the decoder to synthesize freely chosen intermediate views for an enhanced visual experience. Nevertheless, transmission of multiple texture and depth maps over bandwidth-constrained and loss-prone networks is challenging, especially for conferencing applications with stringent deadlines. In this paper, we examine the problem of loss-resilient coding of depth maps by exploiting two observations. First, different depth macroblocks have significantly different error sensitivities with respect to the reconstructed images. Second, unlike for texture, the relative overhead of using reference pictures with a large prediction distance is low for depth maps. This motivates our approach of assigning a weight to represent the varying error sensitivity of each macroblock and using such weights to guide the selection of reference frames. Results show (1) that errors in depth maps in sequences with high motion yield a significant drop in quality of the reconstructed images, and (2) that the proposed scheme can efficiently maintain the quality of the reconstructed images even at relatively high packet loss rates of 3-5%.
Low-complexity automated depth-order estimation for 2D-to-3D video conversion
The increasing popularity of 3D TV creates the desire for more 3D video content. Unfortunately, it will take much time for there to be an abundance of 3D video content derived from stereoscopic cameras. However, there currently exists a vast quantity of 2D video material that can potentially be converted to 3D. Converting 2D into 3D is a complex process, and so can be costly; thus, an automated solution that can be achieved with low complexity would be desirable. Our past research work has already resulted in a real-time 2D-to-3D conversion technique, but this generates a surrogate depth map that results in pseudo-3D and not necessarily accurate 3D. Our current research focuses on improving the accuracy of the 3D effect by implementing a technique composed of a multi-step process to determine the depth order of objects, with respect to the camera, in each frame of a video sequence, and incorporating it into our existing technique. The multi-step process can be summarized as follows: detect pixels that belong to an edge; use block-based motion estimation to determine if an edge pixel is moving and thus belongs to a moving edge (i.e., an occlusion boundary); determine which of the left or right side blocks moves with the moving edge pixel, and by deduction determine the occluding object; select seed points from the moving edge pixels; implement color-only region growing from each seed; cluster regions into objects based on their proximity; globally assign a depth order to the objects based on the perceived viewing perspective of a frame; and modify the original surrogate depth map to create a more accurate depth map. Test results show that this is a very effective and fast technique for deriving the depth order of objects and generating more accurate depth map values.
Block-layer optimal bit allocation based on constant perceptual quality
Chao Wang, Xuanqin Mou, Lei Zhang
Bit allocation is a key issue in image/video coding. Optimal bit allocation can improve encoding performance, which means maximizing the image/video quality under a bit-rate constraint or, vice versa, minimizing the bit rate under a quality constraint. We suggest that the bit allocation for macroblocks (MBs) can be optimized by aiming at constant perceptual quality (CPQ) inside an image/frame. Based on the MINMAX criterion, we propose a multi-pass block-layer bit allocation scheme for intra frame encoding, in which all local areas in a frame attain approximately the same perceptual quality by choosing the quantization parameter (QP) for each MB. The experimental results show that the proposed method noticeably improves the encoding performance.
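The constant-quality idea can be sketched as a search for the highest common quality level whose per-block QP choices fit the bit budget. The quality and rate models below are hypothetical linear toys, not the paper's perceptual model or its multi-pass scheme:

```python
def allocate_qp(n_mb, quality, rate, budget, levels, qp_range=range(52)):
    """Pick one QP per macroblock so that all blocks reach the same
    (highest affordable) quality level. `quality(mb, qp)` and
    `rate(mb, qp)` are caller-supplied per-block models."""
    best = None
    for q in levels:                       # candidate targets, ascending
        qps, total = [], 0
        for mb in range(n_mb):
            feas = [qp for qp in qp_range if quality(mb, qp) >= q]
            if not feas:
                break                      # target q unreachable for this MB
            qps.append(max(feas))          # cheapest QP that still reaches q
            total += rate(mb, qps[-1])
        else:
            if total <= budget:
                best = (q, qps)            # keep the highest feasible target
    return best

# Hypothetical linear models: block 1 is harder to encode (complexity 10),
# so it gets a lower QP (more bits) to reach the same quality as block 0.
complexity = [0, 10]
quality = lambda mb, qp: 100 - qp - complexity[mb]
rate = lambda mb, qp: 52 - qp
q, qps = allocate_qp(2, quality, rate, budget=30, levels=range(40, 80))
```

Both blocks end at the same quality level, which is exactly the MINMAX/CPQ objective: no local area is noticeably worse than the rest of the frame.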
Session 4
Patch-wise ideal stopping time for anisotropic diffusion
Hossein Talebi, Peyman Milanfar
Data-dependent filtering methods are powerful techniques for image denoising. Beginning with any base procedure (nonlinear filter), repeated applications of the same process can be interpreted as a discrete version of anisotropic diffusion. As such, a natural question is "What is the best stopping time in iterative data-dependent filtering?" This is the general question we address in this paper. To develop our new method, we estimate the mean-squared-error (MSE) in each image patch. This estimate is used to characterize the effectiveness of the iterative filtering process, and its minimization yields the ideal stopping time for the diffusion process.
Video attention deviation estimation using inter-frame visual saliency map analysis
Yunlong Feng, Gene Cheung, Patrick Le Callet, et al.
A viewer's visual attention during video playback is the matching of his eye gaze movement to the changing video content over time. If the gaze movement matches the video content (e.g., following a rolling soccer ball), then the viewer keeps his visual attention. If the gaze location moves from one video object to another, then the viewer shifts his visual attention. A video that causes a viewer to shift his attention often is a "busy" video. Determining which video content is busy is an important practical problem: in a busy video, it is difficult for an encoder to deploy region-of-interest (ROI)-based bit allocation, and hard for a content provider to insert additional overlays like advertisements, which would make the video even busier. One way to determine the busyness of video content is to conduct eye gaze experiments with a sizable group of test subjects, but this is time-consuming and cost-ineffective. In this paper, we propose an alternative method to determine the busyness of video, formally called video attention deviation (VAD): analyzing the spatial visual saliency maps of the video frames across time. We first derive the transition probabilities of a Markov model for eye gaze using the saliency maps of a number of consecutive frames. We then compute the steady-state probability of the saccade state in the model, which is our estimate of VAD. We demonstrate that the steady-state saccade probability computed using saliency map analysis matches that computed using actual gaze traces for a range of videos with different degrees of busyness. Further, our analysis can also be used to segment video into shorter clips of different degrees of busyness by computing the Kullback-Leibler divergence between consecutive motion-compensated saliency maps.
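The steady-state computation behind the VAD estimate can be illustrated with a two-state (fixation/saccade) chain. The transition probabilities below are made up; in the paper they come from the saliency-map analysis:

```python
def steady_state(P, iters=200):
    """Steady-state distribution of a row-stochastic transition
    matrix, computed by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical two-state gaze model: state 0 = fixation/pursuit,
# state 1 = saccade (attention shift).
P = [[0.9, 0.1],
     [0.6, 0.4]]
pi = steady_state(P)
vad = pi[1]   # steady-state saccade probability = busyness estimate
```

A busier video raises the fixation-to-saccade probability, which directly raises the steady-state saccade mass, i.e., the VAD.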
Robust grid registration for non-blind PSF estimation
Given a blurred image of a known test grid and an accurate estimate of the unblurred image, it has been demonstrated that the underlying blur kernel (or point-spread function, PSF) can be reliably estimated. Unfortunately, the estimate of the sharp image can be sensitive to common imperfections in the setup used to obtain the blurred image, and errors in the image estimate result in an unreliable PSF estimate. We propose a robust ad hoc method to estimate a sharp prior image, given a blurry, noisy image of the test grid of Joshi [1] taken in imperfect lab and lighting conditions. The proposed algorithm is able to reliably reject superfluous image content, can deal with spatially varying lighting, and is insensitive to errors in the alignment of the grid with the image plane. We demonstrate the algorithm's performance through simulation and with a set of test images. We also show that our grid registration algorithm leads to improved PSF estimation and deblurring, compared to an affine registration using spatially invariant lighting correction.
Fast pseudo-semantic segmentation for joint region-based hierarchical and multiresolution representation
Rafiq Sekkal, Clement Strauss, François Pasteau, et al.
In this paper, we present a new scalable segmentation algorithm called JHMS (Joint Hierarchical and Multiresolution Segmentation), characterized by region-based hierarchy and resolution scalability. Most existing algorithms apply either a multiresolution segmentation or a hierarchical segmentation; the proposed approach combines both. Indeed, the image is considered as a set of images at different levels of resolution, and at each level a hierarchical segmentation is performed. Multiresolution implies that the segmentation of a given level is reused in the segmentation processes at subsequent levels, ensuring contour consistency between different resolutions. Each level of resolution provides a Region Adjacency Graph (RAG) that describes the neighborhood relationships between regions within that level of the multiresolution representation. Region label consistency is preserved thanks to a dedicated projection algorithm based on inter-level relationships. Moreover, a preprocessing step based on quadtree partitioning reduces the amount of input data, leading to a lower overall complexity of the segmentation framework. Experiments show that we obtain effective results compared to the state of the art, together with a lower complexity.
Session 5
Optimal local dimming for LED-backlit LCD displays via linear programming
Xiao Shu, Xiaolin Wu, Søren Forchhammer
LED-backlit LCD displays hold the promise of improving image quality while reducing energy consumption through signal-dependent local dimming. To fully realize this potential, we propose a novel local dimming technique that jointly optimizes the intensities of the LED backlights and the attenuations of the LCD pixels. The objective is to minimize the distortion in luminance reproduction due to LCD leakage and the coarse granularity of the LED lights. The optimization problem is formulated as one of linear programming, and both exact and approximate algorithms are proposed. Simulation results demonstrate the superior performance of the proposed algorithms over existing local dimming algorithms.
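To illustrate the trade-off being optimized (without a linear-programming solver), a brute-force scan over a single backlight zone shows how LCD leakage and the limited backlight level are balanced; the leakage factor and target luminances are hypothetical:

```python
def display_error(b, targets, leak=0.02):
    """Worst-case luminance error at backlight level b: each LCD pixel
    sets its transmittance in [leak, 1], where `leak` models light
    leaking through a fully closed pixel."""
    worst = 0.0
    for y in targets:
        t = min(1.0, max(leak, y / b)) if b > 0 else leak
        worst = max(worst, abs(b * t - y))
    return worst

def best_backlight(targets, leak=0.02, steps=1000):
    """Scan candidate backlight levels and keep the minimax one:
    a brute-force stand-in for the paper's linear-programming solve."""
    ymax = max(targets)
    best_b, best_e = 0.0, float("inf")
    for k in range(1, steps + 1):
        b = ymax * k / steps
        e = display_error(b, targets, leak)
        if e < best_e:
            best_b, best_e = b, e
    return best_b, best_e

# A bright highlight next to a dark region: leakage forces a trade-off
# between crushing the highlight and lifting the blacks.
targets = [0.01, 0.2, 1.0]
b, err = best_backlight(targets)
```

The joint optimization in the paper does this simultaneously for many LED zones and all pixels, which is what makes the LP formulation attractive.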
Gestures for natural interaction with video
Nesrine Fourati, Emmanuel Marilly
In the context of immersive communications, we propose a method enabling natural video interaction through hand gesture recognition between users and a video meeting system. The interaction can be performed either by means of hand posture recognition or by dynamic hand gesture recognition, according to the user's preference. The statistical approach adopted in our work to recognize hand postures has shown accurate results in both performance evaluation and user tests. Besides, the combination of data-mining techniques and signal processing for dynamic gesture recognition allows us to define the appropriate rules and to reduce the confusion between gestures. Furthermore, the hand region extraction is based on both skin color and background subtraction, to avoid the detection of static objects with a similar skin color. Finally, the collected user feedback allows us to evaluate our approach from the user's point of view and to identify the limitations that will be discussed in our perspectives in order to improve the results.
Improving underwater visibility using vignetting correction
K. Sooknanan, A. Kokaram, D. Corrigan, et al.
Underwater survey videos of the seafloor are usually plagued by heavy vignetting (radial falloff) outside the light source's beam footprint on the seabed. In this paper we propose a novel multi-frame approach for removing this vignetting phenomenon, which involves estimating the light source footprint on the seafloor and the parameters of our proposed vignetting model. This estimation is accomplished in a Bayesian framework with an iterative SVD-based optimization. Within the footprint, we leave the image contents as they are, whereas outside this region we perform vignetting correction. Our approach does not require images with different exposure values or recovery of the camera response function, and is entirely based on the attenuation experienced by point correspondences across multiple frames. We verify our algorithm with both synthetic and real data, and then compare it with an existing technique. The results obtained show a significant improvement in the fidelity of the restored images.
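A minimal sketch of the inside/outside correction idea, using a generic polynomial radial-falloff model rather than the paper's estimated model; all parameters here are hypothetical:

```python
def vignette_gain(x, y, cx, cy, k1, k2):
    """Inverse of a polynomial radial-falloff model
    V(r) = 1 + k1*r^2 + k2*r^4 (negative k1 means the image darkens
    away from the centre, so the gain 1/V brightens the periphery)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    v = 1.0 + k1 * r2 + k2 * r2 * r2
    return 1.0 / max(v, 1e-6)

def correct(image, cx, cy, k1, k2, footprint_r):
    """Leave pixels inside the light-footprint radius untouched and
    apply the inverse-vignetting gain outside, mirroring the paper's
    inside/outside split."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if (x - cx) ** 2 + (y - cy) ** 2 > footprint_r ** 2:
                out[y][x] = image[y][x] * vignette_gain(x, y, cx, cy, k1, k2)
    return out

# Toy 3x3 image: the centre pixel is inside the footprint, the rest
# are corrected by the radial gain.
corrected = correct([[0.9] * 3 for _ in range(3)],
                    cx=1, cy=1, k1=-0.05, k2=0.0, footprint_r=0.5)
```

In the paper, the model parameters and the footprint are estimated jointly across frames from point correspondences, not assumed as above.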
Defect pixel interpolation for lossy compression of camera raw data
Michael Schöberl, Joachim Keinert, Jürgen Seiler, et al.
The image processing pipeline of a traditional digital camera is often limited by processing power; better image quality could be achieved only if more complexity were allowed. In a raw data workflow, most algorithms are executed off-camera. This allows the use of more sophisticated algorithms for increasing image quality while reducing camera complexity. However, it requires a major change in the processing pipeline: a lossy compression of raw camera images might be used early in the pipeline, and subsequent off-camera algorithms then need to work on modified data. We analyzed this problem for the interpolation of defect pixels. We found that lossy raw compression spreads the error from uncompensated defects over many pixels. This is a problem because the spread error cannot be compensated after compression, and the use of high-quality, high-complexity algorithms in the camera is also not an option. We propose a solution: inside the camera, only a simple, low-complexity defect pixel interpolation is used, which significantly reduces the compression error for the neighbors of defects. We then perform lossy raw compression and compensate for the defects afterwards, so that a high-complexity defect pixel interpolation can be used off-camera. This leads to high image quality while keeping the camera complexity low.
Session 6
Cubic-panorama image dataset compression
Saeed Salehi, Eric Dubois
In this paper we address the problem of cubic panorama image dataset compression. Two state-of-the-art approaches, the H.264/MPEG-4 AVC and Dirac video codecs, are used and compared for the application of virtual navigation in image-based representations of real-world environments. Different prediction structures and Group Of Pictures (GOP) sizes are investigated and compared on this new type of visual data. Based on the obtained results, as well as the requirements of the system, an efficient prediction structure and bitstream syntax are proposed. The concept of epipolar geometry is introduced and a method to facilitate efficient disparity estimation is suggested.
Lossless halftone image compression using adaptive context template update
Sungbum Park, Jaehyun Kim, Yongje Kim
A novel context template design method is presented for lossless compression of halftone images. At each pixel traversal, the proposed method modifies the context template according to inter-pixel correlation; each pixel is then arithmetic-coded using the updated context template. Thanks to its adaptation to local pixel correlation, the proposed design scheme outperforms standard JBIG arithmetic coding by 29% in bit savings.
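The benefit of context modeling can be illustrated by comparing ideal adaptive code lengths with and without a one-pixel context. The Krichevsky-Trofimov estimator below is a generic stand-in, not the JBIG coder or the paper's adaptive template:

```python
import math

def code_length(bits, ctx_of):
    """Ideal adaptive arithmetic-code length (in bits) when each binary
    symbol is coded with a per-context Krichevsky-Trofimov estimator.
    `ctx_of(bits, i)` returns the context of the i-th symbol."""
    counts = {}
    total = 0.0
    for i, b in enumerate(bits):
        c = ctx_of(bits, i)
        n0, n1 = counts.get(c, (0, 0))
        p1 = (n1 + 0.5) / (n0 + n1 + 1.0)   # KT estimate of P(bit = 1)
        total += -math.log2(p1 if b else 1.0 - p1)
        counts[c] = (n0 + (b == 0), n1 + (b == 1))
    return total

# A periodic (halftone-like) pattern: a single context learns nothing,
# while conditioning on the previous pixel makes it nearly deterministic.
bits = [0, 1] * 50
no_ctx = code_length(bits, lambda bs, i: 0)
with_ctx = code_length(bits, lambda bs, i: bs[i - 1] if i > 0 else -1)
```

This is the effect the adaptive template exploits: moving the context pixels to where the local correlation actually is makes the conditional distributions sharper and the code shorter.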
Session 7
Recognition of sport players' numbers using fast-color segmentation
Cédric Verleysen, Christophe De Vleeschouwer
This paper builds on prior work on player detection and proposes an efficient and effective method to distinguish among players based on the numbers printed on their jerseys. To extract the numbers, the dominant colors of the jersey are learnt during an initial phase and used to speed up the segmentation of candidate digit regions. An additional set of criteria, considering the relative position and size of the digit (compared to the player bounding box) and its density (compared to the digit's rectangular support), is used to filter out regions that obviously do not correspond to a digit. Once the plausible digit regions have been extracted, they are recognized by feature-based classification. A number of original features are proposed to increase the robustness against changes in digit appearance resulting from the variability of font thickness and from deformations of the jersey during the game. Finally, the efficiency and effectiveness of the proposed method are demonstrated on a real-life basketball dataset. The results show that the proposed segmentation runs about ten times faster than the mean-shift algorithm, and that the proposed additional features significantly increase the digit recognition accuracy. Despite significant deformations, 40% of the samples that can be visually recognized as digits are correctly classified as numbers, and of these, more than 80% are correctly recognized. Moreover, more than 95% of the samples that are not numbers are correctly identified as non-numbers.
On the use of clustering for resource allocation in wireless visual sensor networks
Angeliki V. Katsenou, Lisimachos P. Kondi, Konstantinos E. Parsopoulos
The present study focuses on the problem of quality-driven cross-layer optimization of Direct Sequence Code Division Multiple Access (DS-CDMA) Wireless Visual Sensor Networks (WVSNs). We consider a centralized topology where each sensor transmits directly to a Centralized Control Unit (CCU), which manages the network resources. In real environments, the visual sensors view and transmit scenes with varying amounts of motion; thus, each recorded video has its own motion characteristics. Our aim is to enable the CCU to jointly allocate the transmission power and the source-channel coding rates for each WVSN node under certain quality-driven criteria and a constant chip rate. We consider two approaches for the cross-layer optimization scheme. In the first, the optimal set of network resources is assigned to each node according to its individual motion characteristics. In the second, the nodes are partitioned into clusters according to the amount of motion in the recorded scenes, and all nodes within a cluster are assigned identical network resources. Both approaches result in mixed-integer optimization problems, which are solved with the Particle Swarm Optimization algorithm. Experimental results demonstrate the quality/complexity trade-off between the two approaches.
Kalai-Smorodinsky bargaining solution for optimal resource allocation over wireless DS-CDMA visual sensor networks
Katerina Pandremmenou, Lisimachos P. Kondi, Konstantinos E. Parsopoulos
Surveillance applications usually require high levels of video quality, resulting in high power consumption. A well-behaved scheme to balance video quality and power consumption is crucial for the system's performance. In the present work, we adopt the game-theoretic approach of the Kalai-Smorodinsky Bargaining Solution (KSBS) to deal with the problem of optimal resource allocation in a multi-node wireless visual sensor network (VSN). In our setting, the Direct Sequence Code Division Multiple Access (DS-CDMA) method is used for channel access, while a cross-layer optimization design, which employs a central processing server, accounts for the overall system efficacy across all network layers. The task assigned to the central server is communication with the nodes and the joint determination of their transmission parameters. The KSBS is applied to non-convex utility spaces, efficiently distributing the source coding rates, channel coding rates, and transmission powers among the nodes. In the underlying model, the transmission powers assume continuous values, whereas the source and channel coding rates can take only discrete values. Experimental results are reported and discussed to demonstrate the merits of the KSBS over competing policies.
State-of-the-art lossy compression of Martian images via the CMA-ES evolution strategy
Brendan Babb, Frank Moore, Shawn Aldridge, et al.
The research described in this paper uses the CMA-ES evolution strategy to optimize matched forward and inverse transform pairs for the compression and reconstruction of images transmitted from Mars rovers under conditions subject to quantization error. Our best transforms outperform the 2/6 wavelet (whose integer variant was used onboard the rovers), substantially reducing error in reconstructed images without allowing increases in compressed file size. This result establishes a new state of the art for the lossy compression of images transmitted over the deep-space channel.
Session 8
Survey of computer vision in roadway transportation systems
Natesh Manikoth, Robert Loce, Edgar Bernal, et al.
There is a worldwide effort to apply 21st-century intelligence to the evolution of our transportation networks. The goals of smart transportation networks are noble and manifold, including safety, efficiency, law enforcement, energy conservation, and emission reduction. Computer vision is playing a key role in this transportation evolution. Video imaging scientists are providing intelligent sensing and processing technologies for a wide variety of applications and services. There are many interesting technical challenges, including imaging under a variety of environmental and illumination conditions, data overload, recognition and tracking of objects at high speed, distributed network sensing and processing, energy sources, and legal concerns. This conference presentation and publication is a brief introduction to the field, and will be followed by an in-depth journal paper that provides more details on the imaging systems and algorithms.
Compression of 2D navigation sequences with rotational and translational motion
D. Springer, F. Simmet, D. Niederkorn, et al.
In-car navigation systems have grown in complexity over recent years, most notably in terms of route calculation, usability and graphical rendering. In order to guarantee correct system behavior, navigation systems need to be tested under real operating conditions, i.e. with field-tests on the road. In this paper, we will focus on a fast compression solution for 2D navigation renderings, so that field-tests can be archived and handed over to software engineers for subsequent evaluation. No parameters from the rendering procedure are available since access to the system is limited to the raw display signal. Rotation is a dominant factor throughout all navigation sequences, so we show how to reconstruct rotational motion parameters with high accuracy and develop a Global Motion Estimation (GME) method as support for a subsequent H.264/AVC video encoder. By integrating rate-distortion optimization concepts into our scheme, we can efficiently omit the segmentation of static and non-static areas. The runtime of the compression solution, which achieves bitrate savings of up to 19.5%, is evaluated both on a laptop CPU and an embedded OMAP4430 system on chip.
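The abstract does not state how the rotation parameter is recovered; one standard approach (shown here as an illustrative sketch, not the authors' method) is the closed-form least-squares rotation between two sets of matched feature points, after removing the centroids:

```python
import math

def estimate_rotation(src, dst):
    """Least-squares rotation angle (radians) between matched 2D point
    sets, via the closed-form atan2 of summed cross and dot products
    of centroid-removed coordinates."""
    n = len(src)
    cx_s = sum(x for x, _ in src) / n
    cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n
    cy_d = sum(y for _, y in dst) / n
    num = den = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs -= cx_s; ys -= cy_s
        xd -= cx_d; yd -= cy_d
        num += xs * yd - ys * xd   # cross products -> sin component
        den += xs * xd + ys * yd   # dot products   -> cos component
    return math.atan2(num, den)
```

Given reliable matches, this recovers the frame-to-frame rotation exactly; in a GME pipeline the estimate would then seed the motion compensation of the video encoder.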
A semi-automatic traffic sign detection, classification, and positioning system
I. M. Creusen, L. Hazelhoff, P. H. N. de With
The availability of large-scale databases containing street-level panoramic images offers the possibility to perform semi-automatic surveying of real-world objects such as traffic signs. These inventories can be performed significantly more efficiently than using conventional methods. Governmental agencies are interested in these inventories for maintenance and safety reasons. This paper introduces a complete semi-automatic traffic sign inventory system. The system consists of several components. First, a detection algorithm locates the 2D position of the traffic signs in the panoramic images. Second, a classification algorithm is used to identify the traffic sign. Third, the 3D position of the traffic sign is calculated using the GPS position of the photographs. Finally, the results are listed in a table for quick inspection and are also visualized in a web browser.
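The third step, recovering a sign's 3D position from the GPS positions of the photographs, reduces in the planar case to intersecting bearing rays from two camera locations. The following sketch is illustrative only (the paper does not give its exact formulation):

```python
import math

def triangulate(p1, b1, p2, b2):
    """Planar triangulation: intersect two bearing rays from camera
    positions p1, p2 (bearings in radians from the x-axis).
    Solves p1 + t*d1 = p2 + s*d2 by Cramer's rule."""
    d1 = (math.cos(b1), math.sin(b1))
    d2 = (math.cos(b2), math.sin(b2))
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        raise ValueError("bearing rays are parallel")
    t = (rx * (-d2[1]) - (-d2[0]) * ry) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

For instance, cameras at `(0, 0)` and `(2, 0)` sighting the same sign at 45° and 135° place it at `(1, 1)`; with more than two views, a least-squares intersection would reduce GPS noise.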
Session 9
icon_mobile_dropdown
Image simulation for automatic license plate recognition
Raja Bala, Yonghui Zhao, Aaron Burry, et al.
Automatic license plate recognition (ALPR) is an important capability for traffic surveillance applications, including toll monitoring and detection of different types of traffic violations. ALPR is a multi-stage process comprising plate localization, character segmentation, optical character recognition (OCR), and identification of originating jurisdiction (i.e. state or province). Training of an ALPR system for a new jurisdiction typically involves gathering vast amounts of license plate images and associated ground truth data, followed by iterative tuning and optimization of the ALPR algorithms. The substantial time and effort required to train and optimize the ALPR system can result in excessive operational cost and overhead. In this paper we propose a framework to create an artificial set of license plate images for accelerated training and optimization of ALPR algorithms. The framework comprises two steps: the synthesis of license plate images according to the design and layout for a jurisdiction of interest; and the modeling of imaging transformations and distortions typically encountered in the image capture process. Distortion parameters are estimated by measurements of real plate images. The simulation methodology is successfully demonstrated for training of OCR.
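The second step, modeling capture distortions, can be illustrated with a deliberately simple sensor model (gain, offset, and additive Gaussian noise, clipped to 8 bits). This is a hypothetical sketch, not the distortion model of the paper, whose parameters are estimated from real plate images:

```python
import random

def distort(image, gain=0.9, offset=10, noise_sigma=5.0, seed=0):
    """Toy capture model applied to an 8-bit grayscale image
    (list of rows): linear gain/offset plus Gaussian sensor noise,
    rounded and clipped to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(gain * p + offset
                                   + rng.gauss(0, noise_sigma))))
             for p in row] for row in image]
```

A realistic simulator would add geometric effects (perspective, blur, compression) on top of this photometric layer, with each parameter fitted to measurements of real captures.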
Traffic camera markup language (TCML)
Yang Cai, Andrew Bunn, Kerry Snyder
In this paper, we present a novel video markup language for articulating semantic traffic data from surveillance cameras and other sensors. The markup language includes three layers: sensor descriptions, traffic measurement, and application interface descriptions. The multi-resolution based video codec algorithm enables quality-of-service-aware video streaming according to the data traffic. A set of object detection APIs is developed using Convex Hull and Adaptive Proportion models and 3D modeling. It is found that our approach outperforms 3D modeling and Scale-Invariant Feature Transform (SIFT) algorithms in terms of robustness. Furthermore, our empirical data shows that it is feasible to use TCML to facilitate the real-time communication between an infrastructure and a vehicle for safer and more efficient traffic control.
Passive detection of vehicle loading
Troy R. McKay, Carl Salvaggio, Jason W. Faulring, et al.
The Digital Imaging and Remote Sensing Laboratory (DIRS) at the Rochester Institute of Technology, along with the Savannah River National Laboratory, is investigating passive methods to quantify vehicle loading. The research described in this paper investigates multiple vehicle indicators including brake temperature, tire temperature, engine temperature, acceleration and deceleration rates, engine acoustics, suspension response, tire deformation and vibrational response. Our investigation into these variables includes building and implementing a sensing system for data collection as well as multiple full-scale vehicle tests. The sensing system includes infrared video cameras, triaxial accelerometers, microphones, video cameras and thermocouples. The full-scale testing includes both a medium-size dump truck and a tractor-trailer truck on closed courses with loads spanning the full range of the vehicle's capacity. Statistical analysis of the collected data is used to determine the effectiveness of each of the indicators for characterizing the weight of a vehicle. The final sensing system will monitor multiple load indicators and combine the results to achieve a more accurate measurement than any of the indicators could provide alone.
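One standard way to combine several noisy indicators into a single estimate, shown here purely as an illustrative sketch (the paper does not specify its fusion rule), is inverse-variance weighting: each indicator's weight estimate contributes in proportion to its reliability.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent estimates,
    each given as a (value, variance) pair. The fused estimate has
    lower variance than any single input."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    return sum(value * w for (value, _), w in zip(estimates, weights)) / total
```

For example, fusing a brake-temperature-based estimate of 10 tons (variance 1) with a suspension-based estimate of 12 tons (variance 1) yields 11 tons; a noisier third indicator would shift the result only slightly.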
Application of the SNoW machine learning paradigm to a set of transportation imaging problems
Peter Paul, Aaron M. Burry, Yuheng Wang, et al.
Machine learning methods have been successfully applied to image object classification problems where there is clear distinction between classes and where a comprehensive set of training samples and ground truth are readily available. The transportation domain is an area where machine learning methods are particularly applicable, since the classification problems typically have well defined class boundaries and, due to high traffic volumes in most applications, massive roadway data is available. Though these classes tend to be well defined, the particular image noises and variations can be challenging. Another challenge is the extremely high accuracy typically required in most traffic applications. Incorrect assignment of fines or tolls due to imaging mistakes is not acceptable in most applications. For the front seat vehicle occupancy detection problem, classification amounts to determining whether one face (driver only) or two faces (driver + passenger) are detected in the front seat of a vehicle on a roadway. For automatic license plate recognition, the classification problem is a type of optical character recognition problem encompassing multi-class classification. The SNoW machine learning classifier using local SMQT features is shown to be successful in these two transportation imaging applications.
An on-board pedestrian detection and warning system with features of side pedestrian
Ruzhong Cheng, Yong Zhao, ChupChung Wong, et al.
Automotive Active Safety (AAS) is a main branch of intelligent-vehicle research, and pedestrian detection is a key problem within AAS, since pedestrians account for the casualties in many vehicle accidents. For on-board pedestrian detection algorithms, the main challenge is to balance efficiency and accuracy so that the system remains usable in real scenes; we therefore propose an on-board pedestrian detection and warning system whose algorithm takes the features of side pedestrians into account. The system includes two modules: pedestrian detection and warning. Haar features and a cascade of stage classifiers trained by Adaboost are applied first, and then HOG features and an SVM classifier are used to reject false positives. To make these time-consuming algorithms available for real-time use, a divide-window method is applied together with an operator context scanning (OCS) method to increase efficiency. To incorporate the vehicle's velocity information, the distance to each detected pedestrian is also obtained, so the system can judge whether a pedestrian in front is in potential danger. On a new dataset captured in an urban environment with side pedestrians on zebra crossings, the embedded system and its algorithm achieve usable on-board performance for side pedestrian detection.
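The two-stage structure described above (a fast cascade proposes candidates, a slower HOG+SVM stage rejects false positives) follows a generic pattern that can be sketched abstractly. The function and threshold names below are illustrative, not from the paper; in practice `fast_score` would be a Haar/Adaboost cascade and `slow_score` an HOG+SVM classifier:

```python
def two_stage_detect(windows, fast_score, slow_score,
                     t_fast=0.5, t_slow=0.8):
    """Generic detection cascade: a cheap first-stage classifier
    filters candidate windows, and only survivors are passed to the
    expensive second-stage classifier. This keeps average per-window
    cost low, since most windows are rejected early."""
    candidates = [w for w in windows if fast_score(w) >= t_fast]
    return [w for w in candidates if slow_score(w) >= t_slow]
```

The efficiency gain comes from the first stage discarding the vast majority of windows, so the costly classifier runs on only a small fraction of the image.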