
Optical Engineering

Discovering objects and their location in videos using spatial-temporal context words
Author(s): Hao Sun; Cheng Wang; Boliang Wang; Naser El-Sheimy

Paper Abstract

We present a novel unsupervised learning algorithm for discovering objects and their location in videos from moving cameras. The videos can switch between different shots, and contain cluttered background, occlusion, camera motion, and multiple independently moving objects. We exploit both appearance consistency and spatial configuration consistency of local patches across frames for object recognition and localization. The contributions of this paper are twofold. First, we propose a combined approach for simultaneous spatial context and temporal context generation. Local video patches are extracted and described using the generated spatial-temporal context words. Second, a dynamic topic model, based on the representation of a bag of spatial-temporal context words, is introduced to learn object category models in video sequences. The proposed model can categorize and localize multiple objects in a single video. Objects leaving or entering the scene at multiple times can also be handled efficiently in the dynamic framework. Experimental results on the CamVid data set and the VISAT™ data set demonstrate the effectiveness and robustness of the proposed method.
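The abstract describes representing local video patches as a "bag of spatial-temporal context words" before topic modeling. The paper's actual feature pipeline is not reproduced here; the following is only a minimal sketch of the generic bag-of-words step (quantize patch descriptors against a learned codebook, then histogram the word occurrences per frame), with hypothetical function names and a toy codebook.

```python
import numpy as np

def quantize_patches(descriptors, codebook):
    """Assign each patch descriptor to its nearest codeword (Euclidean).

    descriptors: (n_patches, dim) array of local patch features
    codebook:    (vocab_size, dim) array of learned codewords
    Returns an (n_patches,) array of word indices.
    """
    # pairwise distances between every descriptor and every codeword
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def bag_of_words(word_ids, vocab_size):
    """Histogram of context-word occurrences for one frame."""
    return np.bincount(word_ids, minlength=vocab_size)
```

In the paper's framework, these per-frame histograms would then serve as the observed word counts fed to the dynamic topic model; how the spatial and temporal context is folded into the words themselves is specific to the method and not shown here.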

Paper Details

Date Published: 1 September 2010
PDF: 9 pages
Opt. Eng. 49(9), 097003. doi: 10.1117/1.3488041
Published in: Optical Engineering Volume 49, Issue 9
Author Affiliations
Hao Sun, National Univ. of Defense Technology (China)
Cheng Wang, Xiamen Univ. (China)
Boliang Wang, Xiamen Univ. (China)
Naser El-Sheimy, Univ. of Calgary (Canada)

© SPIE.