Journal of Electronic Imaging

Visual speech recognition by recurrent neural networks
Author(s): Gihad Rabi; Si Wei Lu

Paper Abstract

One of the major drawbacks of current acoustically based speech recognizers is that their performance deteriorates drastically in the presence of noise. Our focus is to develop a computer system that performs speech recognition based on visual information about the speaker. The system automatically extracts visual speech features through image-processing techniques that operate on facial images taken in a normally illuminated environment. To cope with both the temporal dynamics of speech patterns and the spatial variations within individual patterns, the proposed recognition scheme uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feedforward complexity. The network’s desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated so as to minimize a segmentation error. The system is implemented and tested on a few words, with satisfactory results. In particular, the system is able to distinguish between words that share common segments. Moreover, it largely tolerates variations in duration among words of the same class.
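
The alignment step mentioned in the abstract builds on dynamic time warping (DTW). As an illustration only, the following is a minimal sketch of classical DTW between two sequences, not the generalized, weighted variant the authors describe. The function name `dtw_align` and the pluggable `dist` parameter are assumptions made for this example; a weighted distance of the kind the abstract mentions could presumably be passed in as `dist`.

```python
from typing import Callable, List, Sequence, Tuple


def dtw_align(
    a: Sequence[float],
    b: Sequence[float],
    dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
) -> Tuple[float, List[Tuple[int, int]]]:
    """Classical DTW: return (total cost, warping path) between a and b.

    The path is a list of index pairs (i, j) meaning a[i] is matched to b[j].
    `dist` is a hypothetical hook for a custom (e.g. weighted) local distance.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover one optimal warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Two renditions of the same toy pattern with different durations.
total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
```

In this toy case the longer sequence simply repeats its middle value, so the warping path matches that one element of the shorter sequence to both repeats and the total cost is zero. This is the sense in which time warping can absorb differences in duration between sequences of the same class.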

Paper Details

Date Published: 1 January 1998
PDF: 9 pages
J. Electron. Imag. 7(1) doi: 10.1117/1.482627
Published in: Journal of Electronic Imaging Volume 7, Issue 1
Author Affiliations
Gihad Rabi, Memorial Univ. of Newfoundland (Canada)
Si Wei Lu, Memorial Univ. of Newfoundland (Canada)
