
Journal of Electronic Imaging

Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards
Author(s): Francesc Massanes; Marie Cadennes; Jovan G. Brankov

Paper Abstract

We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block-matching algorithm uses the sum of absolute differences (SAD) error criterion and full grid search (FS) to find the optimal block displacement. In this evaluation, we compared the execution times of GPU and CPU implementations for images of various sizes, using integer and noninteger search grids. The results show that a GPU card can shorten computation time by a factor of 200 for an integer search grid and by a factor of 1000 for a noninteger search grid. The additional speedup for a noninteger search grid comes from the fact that GPUs have built-in hardware for image interpolation. Further, when multiple GPU cards are used, the evaluation shows that the method of splitting data across the cards is important, but an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-FS CPU-based motion estimation methods, namely the OpenCV implementation of the pyramidal Lucas-Kanade optical flow algorithm and the simplified unsymmetrical multi-hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement, even though its computational complexity is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720 × 480 pixels, a resolution commonly used in video surveillance, the proposed GPU implementation is fast enough for real-time motion estimation at 30 frames per second using two NVIDIA Tesla C1060 GPU cards.
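The abstract names two ingredients: an SAD cost and an exhaustive (full) search over a grid of candidate displacements, where a noninteger grid requires interpolating the reference frame. The following is a minimal, single-threaded Python sketch of that idea under assumptions of our own, not the authors' CUDA code: frames are lists of lists of grey levels, block size, search radius and grid step are free parameters, and plain floating-point bilinear interpolation with border clamping stands in for the GPU's interpolation hardware (CUDA texture linear filtering uses low-precision fixed-point weights, so results would not be bit-identical). The function names `bilinear`, `sad` and `full_search` are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample img at real-valued (y, x) by bilinear interpolation.

    Coordinates are clamped to the image border (an assumption standing in
    for a clamp-addressed texture fetch), and weights are full floats.
    """
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy


def sad(ref, cur, by, bx, bsize, dy, dx):
    """Sum of absolute differences between the bsize x bsize block of cur
    at (by, bx) and the reference frame displaced by (dy, dx)."""
    total = 0.0
    for i in range(bsize):
        for j in range(bsize):
            total += abs(cur[by + i][bx + j]
                         - bilinear(ref, by + i + dy, bx + j + dx))
    return total


def full_search(ref, cur, by, bx, bsize, radius, step=1.0):
    """Exhaustive search over displacements k * step, |k * step| <= radius.

    step=1.0 gives an integer grid; e.g. step=0.5 gives a half-pixel grid.
    Returns (dy, dx, cost) of the first minimum-SAD candidate visited.
    """
    n = int(round(radius / step))
    best = (float("inf"), 0.0, 0.0)
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            dy, dx = a * step, b * step
            cost = sad(ref, cur, by, bx, bsize, dy, dx)
            if cost < best[0]:  # strict <: ties keep the earlier candidate
                best = (cost, dy, dx)
    return best[1], best[2], best[0]
```

Every candidate is evaluated independently, which is what makes full search a natural fit for massively parallel hardware. The cost also grows quickly with grid density: a radius of 3 gives 7 × 7 = 49 candidates per block with `step=1.0` but 13 × 13 = 169 with `step=0.5`, each costing `bsize**2` absolute differences plus, on the noninteger grid, an interpolated sample per pixel.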

Paper Details

Date Published: 1 July 2011
PDF: 11 pages
J. Electron. Imag. 20(3) 033004 doi: 10.1117/1.3606588
Published in: Journal of Electronic Imaging Volume 20, Issue 3
Author Affiliations
Francesc Massanes, Illinois Institute of Technology (United States)
Marie Cadennes, Illinois Institute of Technology (United States)
Jovan G. Brankov, Illinois Institute of Technology (United States)

© SPIE