Proceedings Paper

Audio-video feature correlation: faces and speech
Author(s): Gwenael Durand; Claude Montacie; Marie-Jose Caraty; Pascal Faudemay

Paper Abstract

This paper presents a study of the correlation between features automatically extracted from the audio stream and the video stream of audiovisual documents. In particular, we were interested in finding out whether speech analysis tools could be combined with face detection methods, and to what extent they should be combined. A generic audio signal partitioning algorithm was first used to detect Silence/Noise/Music/Speech segments in a full-length movie. A generic object detection method was then applied to the keyframes extracted from the movie in order to detect the presence or absence of faces. The correlation between the presence of a face in the keyframes and of the corresponding voice in the audio stream was studied. A third stream, the script of the movie, was warped onto the speech channel in order to automatically label faces appearing in the keyframes with the name of the corresponding character. As expected, we found that the extracted audio and video features were related in many cases, and that significant benefits can be obtained from the joint use of audio and video analysis methods.
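
As an illustration of the kind of measurement the abstract describes, the sketch below computes the phi coefficient (the Pearson correlation of two binary variables) between a per-keyframe face indicator and a per-keyframe speech indicator. The abstract does not state which correlation measure the authors used, so the choice of phi, the function name `phi_coefficient`, and the sample data are all assumptions made for illustration only.

```python
from math import sqrt


def phi_coefficient(face_present, speech_present):
    """Phi correlation between two equal-length binary sequences.

    Hypothetical helper: each element is truthy when a face (or speech)
    was detected for the corresponding keyframe.
    """
    if len(face_present) != len(speech_present):
        raise ValueError("sequences must have the same length")

    # Build the 2x2 contingency table of face/speech co-occurrence.
    n11 = n10 = n01 = n00 = 0
    for face, speech in zip(face_present, speech_present):
        if face and speech:
            n11 += 1
        elif face:
            n10 += 1
        elif speech:
            n01 += 1
        else:
            n00 += 1

    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the indicators is constant; correlation is undefined,
        # so this sketch reports 0.0 by convention.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Made-up indicators for eight keyframes (not data from the paper).
faces = [1, 1, 0, 0, 1, 0, 1, 0]
speech = [1, 1, 0, 0, 1, 0, 0, 1]
score = phi_coefficient(faces, speech)
```

A value near 1 would indicate that faces and speech tend to occur in the same keyframes, a value near 0 that they occur independently, and a negative value that they tend to exclude each other.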

Paper Details

Date Published: 24 August 1999
PDF: 11 pages
Proc. SPIE 3846, Multimedia Storage and Archiving Systems IV, (24 August 1999); doi: 10.1117/12.360415
Gwenael Durand, Univ. Pierre et Marie Curie (France)
Claude Montacie, Univ. Pierre et Marie Curie (France)
Marie-Jose Caraty, Univ. Pierre et Marie Curie (France)
Pascal Faudemay, Univ. Pierre et Marie Curie (France)

Published in SPIE Proceedings Vol. 3846:
Multimedia Storage and Archiving Systems IV
Sethuraman Panchanathan; Shih-Fu Chang; C.-C. Jay Kuo, Editor(s)
