Proceedings Paper

Automatic movie index generation based on multimodal information
Author(s): Ying Li; Shrikanth S. Narayanan; Wei H. Ming; C.-C. Jay Kuo

Paper Abstract

A fundamental task in video analysis is to organize and index multimedia data in a meaningful manner so as to facilitate user access for tasks such as browsing and retrieval. This paper addresses the problem of automatic index generation for movie databases based on audiovisual information. In particular, given a movie, we first extract key movie events, including two-speaker dialog scenes, multiple-speaker dialog scenes, and hybrid scenes, using the proposed window-based sweep algorithm and the K-means clustering algorithm. Following event detection, the identity of each individual speaker in a dialog scene is recognized using a statistical maximum likelihood approach. The identification relies on computing the likelihood ratio between the incoming speech data and Gaussian mixture models of the speakers and the background. The event and speaker identity information together form a crucial part of the movie index table. Preliminary experimental results show that, by integrating information from multiple media, we can obtain robust and meaningful event detection and speaker identification results.
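The likelihood ratio test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scalar features per speech frame (real systems typically use multidimensional spectral features), hand-set mixture parameters instead of trained models, and a hypothetical decision threshold of zero. The names `identify_speaker`, `SPEAKER_GMMS`, and `BACKGROUND_GMM` are invented for illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of scalar frames under a GMM.

    gmm is a list of (weight, mean, variance) tuples whose weights sum to 1.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify_speaker(frames, speaker_gmms, background_gmm, threshold=0.0):
    """Pick the speaker with the largest log-likelihood ratio vs. background.

    Returns (name, scores); name is None when no ratio exceeds the threshold
    (an assumed rejection rule, not taken from the paper).
    """
    background = gmm_log_likelihood(frames, background_gmm)
    scores = {
        name: gmm_log_likelihood(frames, gmm) - background
        for name, gmm in speaker_gmms.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


# Hypothetical, hand-set models; real models would be trained on speech data.
SPEAKER_GMMS = {
    "alice": [(0.5, -2.0, 1.0), (0.5, -1.0, 1.0)],
    "bob": [(0.5, 2.0, 1.0), (0.5, 3.0, 1.0)],
}
BACKGROUND_GMM = [(1.0, 0.0, 9.0)]

name, scores = identify_speaker([2.1, 2.7, 3.2, 1.9], SPEAKER_GMMS, BACKGROUND_GMM)
print(name, scores)
```

Normalizing each speaker score by the background model makes the decision depend on how much better a speaker model explains the frames than a generic model does, so frames that fit no enrolled speaker can be rejected rather than forced onto the nearest one.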

Paper Details

Date Published: 20 July 2001
PDF: 12 pages
Proc. SPIE 4519, Internet Multimedia Management Systems II, (20 July 2001); doi: 10.1117/12.434282
Author Affiliations
Ying Li, Univ. of Southern California (United States)
Shrikanth S. Narayanan, Univ. of Southern California (United States)
Wei H. Ming, Univ. of Southern California (United States)
C.-C. Jay Kuo, Univ. of Southern California (United States)

Published in SPIE Proceedings Vol. 4519:
Internet Multimedia Management Systems II
John R. Smith; Sethuraman Panchanathan; C.-C. Jay Kuo; Chinh Le, Editor(s)

© SPIE