
Proceedings Paper

Unsupervised real-time speaker identification for daily movies
Author(s): Ying Li; C.-C. Jay Kuo

Paper Abstract

This paper addresses the problem of identifying speakers for movie content analysis. While most previous work on speaker identification was carried out in a supervised mode using audio data alone, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum-likelihood (ML) approach, while visual information distinguishes speakers by detecting and recognizing their talking faces using face detection/recognition and mouth-tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, their models are updated on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved in extensive experiments, which show the promise of the proposed audiovisual unsupervised speaker identification system.
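The abstract's audio-visual fusion idea can be sketched as follows. This is a minimal illustration, not the paper's actual method: the weighted log-domain fusion, the 0.7/0.3 weights, and the running-mean model adaptation are all assumptions introduced here. In practice the audio scores would come from per-speaker statistical models (e.g. Gaussian mixtures over speech features) and the visual scores from a talking-face detector.

```python
import math

def fuse_scores(audio_loglik, visual_prob, w_audio=0.7, w_visual=0.3):
    """Combine per-speaker audio log-likelihoods and visual talking-face
    confidences into one score per speaker (log-domain weighted sum).
    The weights are illustrative, not from the paper."""
    scores = {}
    for spk, loglik in audio_loglik.items():
        # Small epsilon guards log(0) for speakers with no visible face.
        scores[spk] = (w_audio * loglik
                       + w_visual * math.log(visual_prob.get(spk, 1e-6)))
    return scores

def identify_speaker(audio_loglik, visual_prob):
    """Pick the speaker with the highest fused audio-visual score."""
    scores = fuse_scores(audio_loglik, visual_prob)
    return max(scores, key=scores.get)

def adapt_model(model_mean, new_feature, alpha=0.1):
    """Toy stand-in for on-the-fly model adaptation: move the speaker's
    feature mean toward newly contributed speech data."""
    return (1.0 - alpha) * model_mean + alpha * new_feature

# Toy example: two candidate speakers for one movie segment.
audio = {"A": -120.0, "B": -95.0}   # audio ML scores (log-likelihoods)
visual = {"A": 0.2, "B": 0.9}       # talking-face detection confidence
best = identify_speaker(audio, visual)
```

Fusing in the log domain keeps the combination numerically stable when audio likelihoods are very small, and lets a confident visual cue (a clearly talking face) break ties between acoustically similar speakers.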

Paper Details

Date Published: 1 July 2002
PDF: 12 pages
Proc. SPIE 4862, Internet Multimedia Management Systems III, (1 July 2002); doi: 10.1117/12.473031
Author Affiliations:
Ying Li, Univ. of Southern California (United States)
C.-C. Jay Kuo, Univ. of Southern California (United States)

Published in SPIE Proceedings Vol. 4862:
Internet Multimedia Management Systems III
John R. Smith; Sethuraman Panchanathan; Tong Zhang, Editor(s)

© SPIE.