Proceedings Paper

Variable frame rate analysis for automatic speech recognition
Author(s): Zheng-Hua Tan

Paper Abstract

In this paper we investigate the use of variable frame rate (VFR) analysis in automatic speech recognition (ASR). First, we review the VFR technique and analyze its behavior. Experiments show that VFR improves ASR performance for signals with low signal-to-noise ratios: it yields improved acoustic models and substantially reduces insertion and substitution errors, although it may increase deletion errors. We also underline that the match between the average frame rate and the number of hidden Markov model states is critical when implementing VFR. Second, we analyze an effective VFR method that uses a cumulative, weighted cepstral-distance criterion for frame selection, and we present a revision of it. Lastly, the revised VFR method is combined with spectral- and cepstral-domain enhancement methods, including minimum statistics noise estimation (MSNE) based spectral subtraction and the cepstral mean subtraction, variance normalization, and ARMA filtering (MVA) process. Experiments on the Aurora 2 database show that VFR is highly complementary to the enhancement methods: enhancing the speech both facilitates frame selection in VFR and provides de-noised speech for recognition.
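As a rough illustration of what a cumulative, weighted cepstral-distance criterion for frame selection might look like, the following minimal sketch accumulates the weighted distance between consecutive cepstral vectors and keeps a frame whenever the running total reaches a threshold. It is not the method analyzed or revised in the paper: the function name `select_frames`, the use of a plain Euclidean distance, the per-frame `weights` (which could, for example, be derived from frame energy), and the reset-to-zero rule are all assumptions made for illustration.

```python
import math


def select_frames(cepstra, weights, threshold):
    """Return the indices of frames kept by a cumulative weighted-distance rule.

    cepstra   -- list of cepstral vectors, one per densely sampled frame
    weights   -- hypothetical per-frame weights (assumption, e.g. energy based)
    threshold -- accumulated distance required before a frame is kept
    """
    if not cepstra:
        return []
    selected = [0]          # assumption: the first frame is always kept
    accumulated = 0.0
    for t in range(1, len(cepstra)):
        # Euclidean distance between consecutive cepstral vectors
        dist = math.dist(cepstra[t], cepstra[t - 1])
        accumulated += weights[t] * dist
        if accumulated >= threshold:
            selected.append(t)
            accumulated = 0.0   # assumption: restart accumulation after a pick
    return selected


# Toy one-dimensional "cepstra": flat regions contribute nothing,
# so frames are kept only where the signal changes.
frames = [[0.0], [0.0], [1.0], [1.0], [3.0]]
unit_weights = [1.0] * len(frames)
kept = select_frames(frames, unit_weights, threshold=1.0)
```

In a scheme of this kind, a lower threshold keeps more frames and so raises the average frame rate, which is the quantity the abstract says must be matched to the number of hidden Markov model states.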

Paper Details

Date Published: 10 September 2007
PDF: 8 pages
Proc. SPIE 6777, Multimedia Systems and Applications X, 67770G (10 September 2007); doi: 10.1117/12.734890
Zheng-Hua Tan, Aalborg Univ. (Denmark)

Published in SPIE Proceedings Vol. 6777:
Multimedia Systems and Applications X
Susanto Rahardja; JongWon Kim; Jiebo Luo, Editor(s)
