
Proceedings Paper

Noninvasive extraction of audiovisual cues for multimodal applications
Author(s): Harouna Kabre

Paper Abstract

We describe HOPS, a system that extracts audiovisual cues for modeling a computer end-user's environment. The objective of the study is to provide reliable audiovisual cues that 'augment' the set of computer input devices available to multimodal applications. The system accepts an audiovisual scene as input and produces different kinds of events that could help increase the awareness and robustness of interactive systems. The described framework for the extraction of cues is ecological and homogeneous. On the audio path, a cross-power spectrum method is applied to extract different kinds of acoustic patterns, defined as acoustic segments. The acoustic signal from a microphone and the acoustic segments are first FFT-transformed and averaged, and then correlated in the spectral domain. The maximum of the inverse Fourier transform of this cross-power spectrum is the criterion for detecting acoustic events. On the video path, we define initial color models of desired cues, such as the mouth and eyes, and then track them in the audiovisual scene recorded by a camera.
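
The audio-path idea in the abstract (correlate a microphone signal with a stored acoustic segment in the spectral domain, then use the maximum of the inverse transform as the detection criterion) can be illustrated with a minimal sketch. This is not the HOPS implementation: the function name `cross_power_peak`, the energy normalization, and the threshold of 0.5 are assumptions made for illustration, and the spectral averaging step mentioned in the abstract is omitted for brevity.

```python
import numpy as np


def cross_power_peak(signal, template):
    """Correlate `signal` with `template` via their cross-power spectrum.

    Returns (score, lag): the maximum of the inverse FFT of the
    cross-power spectrum, scaled by the energies of both inputs
    (an assumed normalization), and the sample lag where it occurs.
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)

    # Zero-pad to a power of two so circular correlation acts as linear.
    n = len(signal) + len(template) - 1
    nfft = 1 << (n - 1).bit_length()

    spec_signal = np.fft.rfft(signal, nfft)
    spec_template = np.fft.rfft(template, nfft)

    # Cross-power spectrum: one spectrum times the conjugate of the other.
    cross_power = spec_signal * np.conj(spec_template)

    # Back in the time domain this is the cross-correlation sequence.
    correlation = np.fft.irfft(cross_power, nfft)

    norm = np.linalg.norm(signal) * np.linalg.norm(template)
    if norm == 0.0:
        return 0.0, 0

    lag = int(np.argmax(correlation))
    return float(correlation[lag] / norm), lag


# Hypothetical usage: a noise-burst "acoustic segment" hidden at sample 300.
rng = np.random.default_rng(0)
segment = rng.standard_normal(128)
recording = 0.01 * rng.standard_normal(1024)
recording[300:428] += segment

score, lag = cross_power_peak(recording, segment)
THRESHOLD = 0.5  # assumed value; a real system would tune this
print(f"event detected: {score > THRESHOLD}, lag = {lag}")
```

In this sketch, a high score means the stored segment accounts for most of the energy in the analyzed window, and the lag gives its position within that window.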

Paper Details

Date Published: 8 July 1998
PDF: 6 pages
Proc. SPIE 3389, Hybrid Image and Signal Processing VI, (8 July 1998); doi: 10.1117/12.316534
Harouna Kabre, Joseph Fourier Univ. (France)

Published in SPIE Proceedings Vol. 3389:
Hybrid Image and Signal Processing VI
David P. Casasent; Andrew G. Tescher, Editor(s)
