
Proceedings Paper

Fusion of visual and audio features for person identification in real video
Author(s): Dongge Li; Gang Wei; Ishwar K. Sethi; Nevenka Dimitrova

Paper Abstract

In this research, we studied the joint use of visual and audio information for the problem of identifying persons in real video. A person identification system, which identifies characters in TV shows by fusing audio and visual information, is constructed based on two different fusion strategies. In the first strategy, speaker identification is used to verify the face recognition result. The second strategy uses face recognition and tracking to supplement speaker identification results. To evaluate our system's performance, an information database was generated by manually labeling the speaker and the main person's face in every I-frame of a video segment of the TV show 'Seinfeld'. By comparing the output from our system with our information database, we evaluated the performance of each of the analysis channels and of their fusion. The results show that the first fusion strategy is suitable for applications where precision is much more critical than recall. The second fusion strategy, on the other hand, generates the best overall identification performance. It greatly outperforms either of the analysis channels in both precision and recall and is applicable to more general applications, such as, in our case, identifying persons in TV programs.
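
The abstract does not say how the two fusion strategies or the evaluation are implemented. As a minimal sketch, assuming each analysis channel emits one label (or none) per I-frame, the two strategies and a per-frame precision/recall measurement might look like the following. The function names, the agreement rule, the fallback rule, and the toy labels are illustrative assumptions, not the authors' method or data.

```python
from typing import Optional, Sequence, Tuple

# None means "this channel produced no identification for the frame".
Label = Optional[str]


def fuse_verify(face: Label, speaker: Label) -> Label:
    """Strategy 1 (assumed form): keep the face label only when the
    speaker label agrees with it; otherwise emit nothing."""
    return face if face is not None and face == speaker else None


def fuse_supplement(face: Label, speaker: Label) -> Label:
    """Strategy 2 (assumed form): use the speaker label when present,
    and fall back to the face label when it is missing."""
    return speaker if speaker is not None else face


def precision_recall(predicted: Sequence[Label],
                     truth: Sequence[Label]) -> Tuple[float, float]:
    """Per-frame precision and recall against manually labeled ground truth."""
    emitted = sum(1 for p in predicted if p is not None)
    correct = sum(1 for p, t in zip(predicted, truth)
                  if p is not None and p == t)
    relevant = sum(1 for t in truth if t is not None)
    precision = correct / emitted if emitted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy labels for four frames; not data from the paper.
    truth = ["A", "B", "A", "B"]
    face = ["A", "A", None, "B"]
    speaker = ["A", "B", "A", None]

    verified = [fuse_verify(f, s) for f, s in zip(face, speaker)]
    supplemented = [fuse_supplement(f, s) for f, s in zip(face, speaker)]

    print("face only:   ", precision_recall(face, truth))
    print("speaker only:", precision_recall(speaker, truth))
    print("strategy 1:  ", precision_recall(verified, truth))
    print("strategy 2:  ", precision_recall(supplemented, truth))
```

On this toy input the agreement rule emits a label for only one frame, so it trades recall for precision, while the fallback rule labels every frame. That is only meant to illustrate the trade-off the abstract describes; it says nothing about the magnitudes reported in the paper.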

Paper Details

Date Published: 1 January 2001
PDF: 8 pages
Proc. SPIE 4315, Storage and Retrieval for Media Databases 2001, (1 January 2001); doi: 10.1117/12.410926
Author Affiliations
Dongge Li, Wayne State Univ. (United States)
Gang Wei, Wayne State Univ. (United States)
Ishwar K. Sethi, Oakland Univ. (United States)
Nevenka Dimitrova, Philips Research (United States)

Published in SPIE Proceedings Vol. 4315:
Storage and Retrieval for Media Databases 2001
Minerva M. Yeung; Chung-Sheng Li; Rainer W. Lienhart, Editor(s)
