
Proceedings Paper

Robust speech recognition using missing feature theory and target speech enhancement based on degenerate unmixing and estimation technique
Author(s): Minook Kim; Ji-Seon Kim; Hyung-Min Park

Paper Abstract

A method for target speech enhancement based on the degenerate unmixing and estimation technique (DUET) is described. Conventional DUET requires the number of sources to be known in advance and estimates attenuation and delay parameters for every source. To avoid these requirements, the method assumes that only one target signal needs to be extracted, which is often plausible in real-world applications such as speech enhancement. By estimating the parameters for the target source only, the method recovers the target speech efficiently, converges quickly, and does not need to know the number of sources in advance. To achieve robust speech recognition, we propose an algorithm that applies cluster-based missing feature reconstruction to the log-spectral features of the enhanced speech during the extraction of mel-frequency cepstral coefficients (MFCCs). The algorithm identifies missing time-frequency regions by computing signal-to-noise ratios (SNRs) from the log-spectral features of the enhanced speech and of the observed noisy speech, and by marking the time-frequency segments whose SNRs fall below a threshold. The missing regions are filled by bounded estimation, using the log-spectral features that are considered reliable together with knowledge of the log-spectral feature cluster to which the incoming target speech is assumed to belong. The log-spectral features are then transformed into cepstral features in the usual manner of MFCC extraction. Experimental results show that the proposed algorithm significantly improves recognition performance in noisy environments.
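
The mask estimation and filling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the SNR formula, the threshold value, or the cluster model, so everything specific here is an assumption: the function names `snr_db`, `reliability_mask`, and `bounded_fill` are hypothetical, the noise power is taken as the noisy power minus the enhanced power, the threshold defaults to 0 dB, and the cluster is reduced to a single mean vector capped by an upper bound in place of a full bounded estimate.

```python
import math

def snr_db(enh_log, noisy_log, floor=1e-10):
    """Local SNR in dB from natural-log power features (assumed definition)."""
    s = math.exp(enh_log)
    # Assumption: whatever the enhancement removed is treated as noise power.
    n = max(math.exp(noisy_log) - s, floor)
    return 10.0 * math.log10(s / n)

def reliability_mask(enh, noisy, threshold_db=0.0):
    """Return True for each time-frequency cell whose SNR meets the threshold."""
    return [[snr_db(e, y) >= threshold_db for e, y in zip(enh_row, noisy_row)]
            for enh_row, noisy_row in zip(enh, noisy)]

def bounded_fill(features, upper, mask, cluster_mean):
    """Keep reliable cells; replace the others with the cluster mean,
    capped by the upper bound for that cell (simplified bounded estimate)."""
    return [[f if ok else min(m, u)
             for f, u, ok, m in zip(f_row, u_row, m_row, cluster_mean)]
            for f_row, u_row, m_row in zip(features, upper, mask)]

# One frame with two frequency bins, as rows of log-power values.
enhanced = [[math.log(4.0), math.log(1.0)]]
noisy = [[math.log(5.0), math.log(9.0)]]
mask = reliability_mask(enhanced, noisy)
filled = bounded_fill(enhanced, noisy, mask, cluster_mean=[0.0, 1.0])
```

In this sketch the observed noisy log-spectrum serves as the upper bound, on the assumption that the clean speech energy in a cell cannot exceed the energy of the noisy observation. The reconstructed log-spectral frames would then pass through the remaining MFCC steps.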

Paper Details

Date Published: 3 June 2011
PDF: 6 pages
Proc. SPIE 8058, Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering IX, 80580D (3 June 2011); doi: 10.1117/12.883340
Author Affiliations:
Minook Kim, Sogang Univ. (Korea, Republic of)
Ji-Seon Kim, Sogang Univ. (Korea, Republic of)
Hyung-Min Park, Sogang Univ. (Korea, Republic of)


Published in SPIE Proceedings Vol. 8058:
Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering IX
Harold Szu, Editor

© SPIE