Proceedings Paper

Continuous speech segmentation determined by blind source separation
Author(s): Harold H. Szu; Charles C. Hsu; Da-Hong Xie

Paper Abstract

The 5 percent error rate encountered in continuous speech recognition is partly due to the difficulty of identifying a mixture of up to two phonemes in close concatenation. Everyday spoken conversation is usually abbreviated for the speaker's convenience; for instance, one says 'Let's go' instead of 'Let us go'. There are two kinds of speech segmentation: linguistic and acoustic. Linguistic segmentation, which has already been studied, relies on a combination of acoustic, lexical, semantic, and statistical knowledge sources. Acoustic segmentation separates mixed sounds such as /ts/ into /t/ and /s/ so that linguistic units can be found automatically. The adaptive wavelet transform (AWT) developed by Szu is a linear superposition of banks of constant-Q, zero-mean mother wavelets, implemented by an artificial neural network (ANN) called a 'wavenet'. Each neuron is represented by a daughter wavelet, which for a continuous AWT can be an affine scale change of the same or a different mother wavelet. AWT was designed for the cocktail party effect and to solve the acoustic segmentation of phonemes using a supervised-learning ANN architecture. In this paper, we review AWT from the viewpoint of Independent Component Analysis, and then apply blind source separation to acoustic de-mixing and segmentation.
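As a rough illustration of what blind source separation means here, the sketch below de-mixes two synthetic signals with a generic FastICA-style fixed-point iteration written in NumPy. It is not the method of the paper: the sources, the 2x2 mixing matrix, the tanh nonlinearity, and the function names `make_mixture`, `whiten`, and `fast_ica` are all assumptions chosen for the example, and real speech mixtures are convolutive rather than instantaneous.

```python
import numpy as np

def make_mixture(n=5000, seed=0):
    """Build two synthetic sources and an instantaneous 2x2 mixture (hypothetical data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / n
    sources = np.vstack([
        np.sin(2 * np.pi * 7 * t),   # tonal, sub-Gaussian source
        rng.laplace(size=n),         # noise-like, super-Gaussian source
    ])
    mixing = np.array([[1.0, 0.6],
                       [0.4, 1.0]])  # assumed mixing matrix, unknown to the separator
    return sources, mixing @ sources

def whiten(x):
    """Center the channels and transform them to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    # E D^{-1/2} E^T applied to the centered data
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ x

def fast_ica(x, n_iter=200, seed=0):
    """Symmetric FastICA-style iteration with a tanh nonlinearity."""
    z = whiten(x)
    n_channels, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_channels, n_channels))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[g(y) z^T] - diag(E[g'(y)]) W
        w_new = (g @ z.T) / n_samples - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation keeps the rows of W orthonormal
        u, _, vt = np.linalg.svd(w_new)
        w = u @ vt
    return w @ z

sources, mixed = make_mixture()
recovered = fast_ica(mixed)

# Absolute correlation between each true source (rows) and each estimate (columns)
match = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
print(match.round(2))
```

Independent component analysis recovers sources only up to ordering, sign, and scale, which is why the check compares absolute correlations rather than the waveforms themselves.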

Paper Details

Date Published: 26 March 1998
PDF: 13 pages
Proc. SPIE 3391, Wavelet Applications V, (26 March 1998); doi: 10.1117/12.304890
Author Affiliations
Harold H. Szu, Univ. of Southwestern Louisiana (United States)
Charles C. Hsu, George Washington Univ. (United States)
Da-Hong Xie, Univ. of Southwestern Louisiana (United States)

Published in SPIE Proceedings Vol. 3391:
Wavelet Applications V
Harold H. Szu, Editor(s)
