
Proceedings Paper

Speech recognition by humans and machines under conditions with severe channel variability and noise
Author(s): Richard P. Lippmann; Beth A. Carlson

Paper Abstract

Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz. Machines trained with wide-band speech, however, degrade dramatically under these conditions. An approach is presented that compensates for variable, unknown sharp filtering and noise: it uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and applies missing feature theory to dynamically modify the probability computations performed by Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model recognizers. The approach was successfully demonstrated on a talker-independent digit recognition task, where recognition accuracy across many conditions rose from below 50% to above 95%. These promising results suggest future work to dynamically estimate SNRs and to explore the dynamics of human adaptation to channel and noise variability.
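As a rough illustration of the missing-feature idea described in the abstract, the sketch below marginalizes unreliable mel-filter-bank channels out of a diagonal-covariance Gaussian mixture likelihood: channels whose estimated SNR falls below a threshold are treated as missing and dropped from each component's density. The function name, the fixed SNR threshold, and the use of plain marginalization (rather than the paper's exact probability modification or its SNR estimator) are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def gmm_frame_log_likelihood(x, snr_db, means, variances, log_weights,
                                 snr_threshold_db=0.0):
        """Log-likelihood of one mel-filter-bank frame under a diagonal-covariance
        GMM, treating low-SNR channels as missing and marginalizing them out.

        x           : (D,)   filter-bank magnitude features for one frame
        snr_db      : (D,)   estimated per-filter SNR in dB (assumed given)
        means       : (M, D) component means
        variances   : (M, D) diagonal component variances
        log_weights : (M,)   log mixture weights
        """
        reliable = snr_db >= snr_threshold_db      # channels judged usable
        if not np.any(reliable):
            return 0.0                             # all channels missing: frame carries no evidence
        xr = x[reliable]
        mu = means[:, reliable]
        var = variances[:, reliable]
        # For a diagonal Gaussian, marginalizing a dimension simply removes its
        # factor, so the per-component log density is summed over reliable
        # dimensions only.
        log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (xr - mu) ** 2 / var,
                                 axis=1)
        # Log-sum-exp over mixture components.
        a = log_weights + log_comp
        m = np.max(a)
        return float(m + np.log(np.sum(np.exp(a - m))))

In an HMM recognizer, a per-frame score of this kind would stand in for the usual state observation likelihood; the per-filter SNR estimates would come from a separate noise-tracking stage, which this sketch does not cover.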

Paper Details

Date Published: 4 April 1997
PDF: 12 pages
Proc. SPIE 3077, Applications and Science of Artificial Neural Networks III, (4 April 1997); doi: 10.1117/12.271525
Author Affiliations:
Richard P. Lippmann, MIT Lincoln Lab. (United States)
Beth A. Carlson, MIT Lincoln Lab. (United States)


Published in SPIE Proceedings Vol. 3077:
Applications and Science of Artificial Neural Networks III
Steven K. Rogers, Editor

© SPIE.