Proceedings Paper

Detection and characterization of motion in video compression

Paper Abstract

The movement of objects in video sequences constitutes a form of spatiotemporal redundancy that can be reduced mathematically to facilitate video compression. This observation holds particularly for periodic motion, for example, bipedal or quadrupedal locomotion or repetitive gestures. Previously published motion detection techniques were based on optical flow, interframe differences represented in terms of transform coefficient perturbations, or changes in eigenvalues between frames in a video sequence. However, such methods have deficits that include sensitivity to noise, burdensome computational requirements (e.g., floating point operations), and prohibitive instability in the presence of spatial or temporal interframe discontinuities. In this paper, we discuss several techniques of motion detection in two-dimensional images of three-dimensional scenes: pointwise tracking of constant-intensity pixels, region-based vector field characterization of apparent motion, and correlation-based detection. The last category includes a technique based on interframe similarity matrices (ISMs). ISMs were developed and successfully applied by Yacoob, Black, and Davis to address the challenging problem of detecting human and animal motion in surveillance video sequences. In particular, given an N-frame video sequence, an N×N-element interframe correlation matrix can be constructed and Fourier-transformed to obtain an N/2-element power spectrum of interframe periodicities. Different actions (e.g., walking vs. running) and different actors (e.g., quadruped vs. human) tend to be characterized by distinct spatiotemporal spectra, and can often be distinguished from one another. Since each spectrum can be computed from a sequence of small image regions, it is possible to represent interframe motion by a pixel tagging technique, thus implementing detection, segmentation, and representation.
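The ISM construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the interframe correlation is a zero-mean normalized correlation between whole frames, and that the N/2-element spectrum is obtained by averaging the row-wise FFT power of the matrix and dropping the DC bin. The function names `interframe_similarity_matrix` and `periodicity_spectrum` are hypothetical.

```python
import numpy as np


def interframe_similarity_matrix(frames):
    """Return the N x N matrix of normalized correlations between frames.

    frames: array of shape (N, H, W), one grayscale frame (or region) per index.
    Assumption: similarity is zero-mean normalized correlation.
    """
    n = frames.shape[0]
    x = frames.reshape(n, -1).astype(np.float64)
    x -= x.mean(axis=1, keepdims=True)        # remove each frame's mean intensity
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0                 # constant frames correlate as 0
    x /= norms
    return x @ x.T                            # entry (i, j) compares frame i with frame j


def periodicity_spectrum(ism):
    """Return (freqs, power): an N/2-element spectrum of interframe periodicities.

    Assumption: each row of the ISM is treated as a time series, its mean is
    removed, and the FFT power is averaged over rows; the DC bin is dropped.
    freqs are in cycles per frame.
    """
    n = ism.shape[0]
    rows = ism - ism.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(rows, axis=1)) ** 2
    mean_power = power.mean(axis=0)[1:n // 2 + 1]
    freqs = np.fft.rfftfreq(n)[1:n // 2 + 1]
    return freqs, mean_power
```

Under these assumptions, the reciprocal of the frequency at the largest spectral peak estimates the dominant motion period in frames. Applying the same two functions to a small image region instead of a full frame gives the per-region spectrum on which a tagging scheme could be based.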
If there are K objects, each occupying M pixels per frame at B bits per pixel (bpp), in N frames of a compressed video sequence, and each object is segmented into a region represented by a P-bit tag, then increased compression results if NKP < NKMB, i.e., P < MB. Implementation discussion concerns efficient algorithms for tagging motion-containing regions, to decrease the representational overhead to several bits per region. We also discuss motion encoding in a compressed format for efficient extraction of motion parameters from a compressed image, which can support efficient object recognition in high-resolution compressed image sequences.
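The bit-count comparison can be checked numerically. The sketch below computes both totals so they can be compared. It assumes that a tag fully replaces the pixel data of its region in every frame, and it ignores any one-time cost of storing the region's appearance or the tag dictionary. The function names and the example values are hypothetical.

```python
def raw_region_bits(n_frames, n_objects, pixels_per_object, bits_per_pixel):
    """Bits needed to store every object region pixel by pixel: N * K * M * B."""
    return n_frames * n_objects * pixels_per_object * bits_per_pixel


def tagged_region_bits(n_frames, n_objects, tag_bits):
    """Bits needed when each object region is replaced by a P-bit tag: N * K * P."""
    return n_frames * n_objects * tag_bits


def tagging_saves_bits(pixels_per_object, bits_per_pixel, tag_bits):
    """Tags save bits exactly when the tag is smaller than the region it replaces.

    N and K are common factors of both totals, so they cancel out of the test.
    """
    return tag_bits < pixels_per_object * bits_per_pixel


# Illustrative values only: 30 frames, 2 objects, 100 pixels each at 8 bpp, 16-bit tags.
raw = raw_region_bits(30, 2, 100, 8)
tagged = tagged_region_bits(30, 2, 16)
ratio = raw / tagged
```

Because N and K appear in both totals, the ratio of raw to tagged bits depends only on M, B, and P, which is why the overhead per region is the quantity worth minimizing.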

Paper Details

Date Published: 30 January 2003
PDF: 13 pages
Proc. SPIE 4793, Mathematics of Data/Image Coding, Compression, and Encryption V, with Applications, (30 January 2003); doi: 10.1117/12.452384
Mark S. Schmalz, Univ. of Florida (United States)
Gerhard X. Ritter, Univ. of Florida (United States)

Published in SPIE Proceedings Vol. 4793:
Mathematics of Data/Image Coding, Compression, and Encryption V, with Applications
Mark S. Schmalz, Editor(s)

© SPIE.