
Proceedings Paper

Multi-modal analysis for person type classification in news video

Paper Abstract

Classifying the people who appear in broadcast news video as anchors, reporters, or news subjects is an important problem in high-level video analysis, yet it remains largely unaddressed by existing research. Because different types of people often look alike, this work explores multi-modal features derived from several sources of evidence, including speech identity, transcript clues, temporal video structure, named entities, and face information. A Support Vector Machine (SVM) trained on manually labeled examples combines these features to predict the type of each person giving a monologue-style speech in the news video. Experiments on ABC World News Tonight video show that this approach classifies person types with over 93% accuracy. A comparison of the contributions of different feature categories shows that relatively understudied features, such as speech identity and temporal video structure, are very effective for this task.
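The abstract does not specify how the features are encoded, which kernel is used, or how the SVM is trained. As an illustration only, the sketch below assumes one hypothetical binary cue per modality, early fusion by simple concatenation, and a one-vs-rest linear SVM trained with the Pegasos subgradient method. Pegasos and every feature name here are stand-ins chosen for the example, not the authors' implementation.

```python
import random

# The three person types named in the abstract.
CLASSES = ["anchor", "reporter", "subject"]


def fuse_features(speech, transcript, structure, entities, face):
    """Early fusion (assumed): concatenate per-modality feature lists into one vector."""
    return list(speech) + list(transcript) + list(structure) + list(entities) + list(face)


def train_pegasos(X, y, lam=0.01, epochs=500, seed=0):
    """Train a binary linear SVM (hinge loss, L2 penalty) with Pegasos SGD.

    y holds labels in {-1, +1}. A constant 1.0 is appended to each vector so
    the last weight acts as a bias term (regularized too, for simplicity).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink toward zero (gradient of the L2 penalty) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... then take a hinge-loss subgradient step if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class (+1) versus all others (-1)."""
    return {
        c: train_pegasos(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in CLASSES
    }


def predict(models, x):
    """Return the class whose SVM gives the highest decision score."""
    xi = list(x) + [1.0]
    return max(CLASSES, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xi)))


# Toy training set with one hypothetical binary cue per modality:
#   speech:     the speaker's voice recurs throughout the broadcast
#   transcript: the segment ends with a reporter-style sign-off
#   structure:  the segment opens a new story
#   entities:   the person's name is mentioned near the segment
#   face:       a face is detected against the studio background
X = [
    fuse_features([1], [0], [1], [0], [1]),  # anchor
    fuse_features([0], [1], [0], [1], [0]),  # reporter
    fuse_features([0], [0], [0], [1], [0]),  # news subject
]
labels = ["anchor", "reporter", "subject"]
models = train_one_vs_rest(X, labels)
print([predict(models, x) for x in X])
```

Concatenation keeps every modality visible to a single classifier, so the SVM can learn how much weight each kind of evidence deserves. That trade-off is what a per-category comparison, like the one reported in the abstract, probes. A real system would use many more features per modality, far more labeled segments, and possibly a nonlinear kernel.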

Paper Details

Date Published: 17 January 2005
PDF: 8 pages
Proc. SPIE 5682, Storage and Retrieval Methods and Applications for Multimedia 2005, (17 January 2005); doi: 10.1117/12.587251
Jun Yang, Carnegie Mellon Univ. (United States)
Alexander G. Hauptmann, Carnegie Mellon Univ. (United States)

Published in SPIE Proceedings Vol. 5682:
Storage and Retrieval Methods and Applications for Multimedia 2005
Rainer W. Lienhart; Noboru Babaguchi; Edward Y. Chang, Editor(s)

© SPIE.