
Proceedings Paper

Performance and competence models for audiovisual data fusion
Author(s): Harouna Kabre

Paper Abstract

We describe two Artificial Neural Network (ANN) models for audio-visual data fusion. For the first model, we start ANN training with an a priori chosen static architecture, together with a set of weighting parameters for the visual and the auditory paths. These weighting parameters, called attentional parameters, are tuned to achieve the best performance even when the acoustic environment changes. This model is called the Performance Model (PM). For the second model, we start without any units in the hidden layer of the ANN. We then incrementally add new units, each partially connected to either the visual or the auditory path, and reiterate this procedure until the global error can no longer be reduced. This model is called the Competence Model (CM). CM and PM are trained and tested with acoustic data and the corresponding visual parameters (defined as the vertical and horizontal lip widths and the lip-opening area) for audio-visual speech recognition of the 10 French vowels in adverse conditions. In both cases, we report the recognition rate and analyze the complementarity between the visual and the auditory information, both in terms of the number of hidden units connected to the visual versus the auditory inputs as a function of the Signal-to-Noise Ratio (SNR), and in terms of the tuning of the attentional parameters versus SNR.
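The paper itself gives no code; the following is a minimal sketch of the Competence Model's growth loop under assumed simplifications: random tanh hidden units, a least-squares linear readout in place of full backpropagation, and a toy regression target standing in for the 10-vowel classification task. The data shapes, the `add_unit` helper, and the stopping tolerance are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 40 samples, 3 "auditory" features, 2 "visual" features
# (illustrative stand-ins; the paper uses acoustic data plus lip-width
# and lip-opening-area parameters), and a scalar target.
X_a = rng.normal(size=(40, 3))   # auditory path inputs
X_v = rng.normal(size=(40, 2))   # visual path inputs
y = np.sin(X_a[:, 0]) + 0.5 * X_v[:, 1]

def add_unit(path):
    """One hidden unit partially connected to a single modality."""
    X = X_a if path == "audio" else X_v
    w = rng.normal(size=X.shape[1])
    return np.tanh(X @ w)        # the unit's activations on the data

H = np.empty((40, 0))            # CM: start with no hidden units
best_err = np.mean(y ** 2)       # error of the empty model
paths = []
while True:
    # Try adding one unit on each path; keep the better candidate.
    candidates = []
    for path in ("audio", "visual"):
        h = add_unit(path)
        H_try = np.column_stack([H, h])
        beta, *_ = np.linalg.lstsq(H_try, y, rcond=None)  # linear readout
        err = np.mean((H_try @ beta - y) ** 2)
        candidates.append((err, path, H_try))
    err, path, H_try = min(candidates, key=lambda c: c[0])
    if best_err - err < 1e-6:    # global error no longer reducible: stop
        break
    H, best_err = H_try, err
    paths.append(path)

print(f"{len(paths)} units: {paths.count('audio')} audio, "
      f"{paths.count('visual')} visual")
```

The final audio/visual split of `paths` is the quantity the paper analyzes: how many hidden units each modality earns as the SNR of the acoustic input varies.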

Paper Details

Date Published: 15 September 1995
PDF: 8 pages
Proc. SPIE 2589, Sensor Fusion and Networked Robotics VIII, (15 September 1995); doi: 10.1117/12.220949
Harouna Kabre, Univ. Stendhal (France)


Published in SPIE Proceedings Vol. 2589:
Sensor Fusion and Networked Robotics VIII
Paul S. Schenker; Gerard T. McKee, Editors

© SPIE. Terms of Use