
Proceedings Paper

Speech endpoint detection with non-language speech sounds for generic speech processing applications
Author(s): Matthew McClain; Brian Romanowski

Paper Abstract

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, which identify speech regions in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
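The abstract does not give the HMM topology, the acoustic features, or any parameter values, so the following is only a minimal sketch of the general idea: labelling a sequence of audio frames as LSS or NLSS by Viterbi decoding over a two-state HMM. The state set, the transition probabilities, the prior, and the per-frame likelihoods are all invented for illustration; in a real system the likelihoods would come from an acoustic model applied to the extracted features.

```python
import math

# Hypothetical two-state model; the paper's actual topology is not stated.
STATES = ("LSS", "NLSS")


def viterbi(log_emissions, log_trans, log_prior):
    """Return the most likely state label for each frame.

    log_emissions[t][s] is log p(frame t | state s),
    log_trans[p][s] is log p(state s at t+1 | state p at t),
    log_prior[s] is log p(state s at the first frame).
    """
    n_states = len(log_prior)
    # scores[s]: best log-probability of any path ending in state s
    scores = [log_prior[s] + log_emissions[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_emissions[1:]:
        new_scores, ptrs = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this frame
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + log_trans[p][s])
            ptrs.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans[best_prev][s] + frame[s])
        scores = new_scores
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


def to_log(rows):
    """Convert nested lists of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Illustrative numbers only (assumptions, not values from the paper).
# Each row is (p(frame | LSS), p(frame | NLSS)) for one frame.
frame_likelihoods = [
    (0.8, 0.2), (0.8, 0.2), (0.4, 0.6), (0.8, 0.2),
    (0.2, 0.8), (0.2, 0.8), (0.2, 0.8),
]
# "Sticky" transitions: a state tends to persist across adjacent frames.
transitions = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]

labels = viterbi(to_log(frame_likelihoods), to_log(transitions),
                 [math.log(p) for p in prior])
print(labels)
```

With these made-up numbers, the third frame on its own slightly favours NLSS, but the sticky transitions keep it labelled LSS, so only the sustained run at the end is marked as NLSS. This smoothing across frames is one reason a sequence model can be preferable to classifying each frame independently.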

Paper Details

Date Published: 5 May 2009
PDF: 9 pages
Proc. SPIE 7305, Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense VIII, 73051B (5 May 2009); doi: 10.1117/12.818868
Matthew McClain, 21st Century Technologies (United States)
Brian Romanowski, 21st Century Technologies (United States)


Published in SPIE Proceedings Vol. 7305:
Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense VIII
Edward M. Carapezza, Editor(s)
