Proceedings Paper

Video content parsing based on combined audio and visual information

Paper Abstract

While previous research on audiovisual data segmentation and indexing has focused primarily on the pictorial part, significant clues contained in the accompanying audio flow are often ignored. A fully functional system for video content parsing can be achieved more successfully through a proper combination of audio and visual information. By investigating the data structure of different video types, we present tools for both audio and visual content analysis, together with a scheme for video segmentation and annotation. In the proposed system, video data are segmented into audio scenes and visual shots by detecting abrupt changes in audio and visual features, respectively. Each audio scene is then categorized and indexed as one of the basic audio types, while each visual shot is represented by keyframes and associated image features. An index table is then generated automatically for each video clip by integrating the outputs of the audio and visual analyses. It is shown that the proposed system provides satisfactory video indexing results.
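
The segmentation step described in the abstract, cutting a stream wherever a feature changes abruptly, can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame feature vectors, the L1 distance, the fixed threshold, and the function names `l1_distance`, `detect_abrupt_changes`, and `split_segments` are all assumptions chosen for illustration. The same routine could in principle be applied to audio features (to find audio scene boundaries) or to image features such as color histograms (to find shot boundaries).

```python
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_abrupt_changes(
    features: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return each index i where features[i] differs from features[i - 1]
    by more than `threshold` (an assumed, hand-picked value)."""
    boundaries = []
    for i in range(1, len(features)):
        if l1_distance(features[i - 1], features[i]) > threshold:
            boundaries.append(i)
    return boundaries


def split_segments(n_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert boundary indices into half-open (start, end) segments."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Hypothetical two-bin histograms for five frames (toy data, not from the paper).
frames = [[0.9, 0.1], [0.88, 0.12], [0.2, 0.8], [0.21, 0.79], [0.9, 0.1]]
cuts = detect_abrupt_changes(frames, threshold=0.5)
segments = split_segments(len(frames), cuts)
```

A single fixed threshold is the simplest possible choice; a practical system would likely need adaptive thresholds or additional features, which the abstract does not specify.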

Paper Details

Date Published: 24 August 1999
PDF: 12 pages
Proc. SPIE 3846, Multimedia Storage and Archiving Systems IV, (24 August 1999); doi: 10.1117/12.360413
Tong Zhang, Univ. of Southern California (United States)
C.-C. Jay Kuo, Univ. of Southern California (United States)

Published in SPIE Proceedings Vol. 3846:
Multimedia Storage and Archiving Systems IV
Sethuraman Panchanathan; Shih-Fu Chang; C.-C. Jay Kuo, Editor(s)
