
Proceedings Paper

Discovery and fusion of salient multimodal features toward news story segmentation
Author(s): Winston Hsu; Shih-Fu Chang; Chih-Wei Huang; Lyndon Kennedy; Ching-Yung Lin; Giridharan Iyengar

Paper Abstract

In this paper, we present new results in news video story segmentation and classification, obtained in the context of the TRECVID 2003 video retrieval benchmark. We applied and extended the Maximum Entropy statistical model to fuse diverse features from multiple levels and modalities, including visual, audio, and text. These features include motion, faces, music/speech type, prosody, and high-level text segmentation information. The statistical fusion model automatically discovers which features contribute to the detection of story boundaries. One novel aspect of our method is a feature wrapper that handles different types of features: asynchronous, discrete, continuous, and delta features. We also developed several novel prosody-related features. On the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measures of up to 0.76 on ABC news and 0.73 on CNN news), show how these multi-level, multimodal features are combined in the probabilistic framework, and, more importantly, identify an interesting opportunity for further improvement.
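The abstract does not spell out the model or the wrapper, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a wrapper that turns every raw cue into a binary predicate evaluated at a candidate boundary time `t`. Continuous cues are thresholded. "Delta" cues are read as a change in a signal over a lag. Asynchronous cues fire when an event falls within a time window of `t`. A binary-label conditional maximum-entropy model is then fitted over those predicates. All function names, thresholds, and the toy data are hypothetical. Training uses plain batch gradient ascent, which is not necessarily the authors' procedure, and the sketch omits their feature selection.

```python
import math
from typing import Callable, List, Sequence

# A wrapped feature maps a candidate boundary time t to a binary value (0 or 1).
BinaryFeature = Callable[[float], int]


def continuous_feature(signal: Callable[[float], float], threshold: float) -> BinaryFeature:
    """Fires when a continuous signal (e.g. pause length) exceeds a threshold at t."""
    return lambda t: int(signal(t) > threshold)


def delta_feature(signal: Callable[[float], float], lag: float, threshold: float) -> BinaryFeature:
    """Fires when the signal changes by more than `threshold` between t - lag and t."""
    return lambda t: int(abs(signal(t) - signal(t - lag)) > threshold)


def async_feature(event_times: Sequence[float], window: float) -> BinaryFeature:
    """Fires when an asynchronous event (e.g. a detected cue) falls within +/- window of t."""
    return lambda t: int(any(abs(e - t) <= window for e in event_times))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(features: List[BinaryFeature], times: Sequence[float],
                 labels: Sequence[int], lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit p(b=1 | t) = sigmoid(w0 + sum_i w_i * f_i(t)) by batch gradient ascent.

    With a binary label b and features active only for b = 1, the conditional
    maximum-entropy model takes exactly this log-linear (logistic) form.
    """
    rows = [[1] + [f(t) for f in features] for t in times]  # leading 1 is the bias term
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj  # gradient of the conditional log-likelihood
        w = [wj + lr * g / len(rows) for wj, g in zip(w, grad)]
    return w


def boundary_prob(w: List[float], features: List[BinaryFeature], t: float) -> float:
    """Posterior probability that candidate time t is a story boundary."""
    return sigmoid(w[0] + sum(wj * f(t) for wj, f in zip(w[1:], features)))


# Toy data (made up for illustration): candidate points 0..9, boundaries at 2 and 6.
pause = {2: 1.0, 6: 1.0}


def pause_len(t: float) -> float:
    return pause.get(t, 0.1)


feats = [
    continuous_feature(pause_len, threshold=0.5),
    delta_feature(pause_len, lag=1, threshold=0.5),
    async_feature([2.1, 6.2], window=0.5),
]
times = list(range(10))
labels = [1 if t in (2, 6) else 0 for t in times]
weights = train_maxent(feats, times, labels)
for t in times:
    print(t, round(boundary_prob(weights, feats, t), 3))
```

On this toy data the fitted model assigns a boundary probability above 0.5 at `t = 2` and `t = 6` and below 0.5 elsewhere. Reducing every cue type to a binary predicate is what lets a single log-linear model fuse asynchronous, discrete, continuous, and delta cues. In the same spirit, features whose learned weights stay near zero contribute little to the boundary decision.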

Paper Details

Date Published: 18 December 2003
PDF: 15 pages
Proc. SPIE 5307, Storage and Retrieval Methods and Applications for Multimedia 2004, (18 December 2003); doi: 10.1117/12.533037
Author Affiliations
Winston Hsu, Columbia Univ. (United States)
Shih-Fu Chang, Columbia Univ. (United States)
Chih-Wei Huang, Columbia Univ. (United States)
Lyndon Kennedy, Columbia Univ. (United States)
Ching-Yung Lin, IBM Thomas J. Watson Research Center (United States)
Giridharan Iyengar, IBM Thomas J. Watson Research Center (United States)

Published in SPIE Proceedings Vol. 5307:
Storage and Retrieval Methods and Applications for Multimedia 2004
Minerva M. Yeung; Rainer W. Lienhart; Chung-Sheng Li, Editor(s)

© SPIE.