
Proceedings Paper

CommonSense: a preprocessing system to identify errors in large transcribed corpora
Author(s): Ryan Propper; Keyvan Mohajer; Vaughan Pratt

Paper Abstract

A system was designed to locate and correct errors in large transcribed corpora. The program, called CommonSense, relies on a set of rules that identify mistakes related to homonyms (words with distinct definitions but identical pronunciations). The system was run on the 1996 and 1997 Broadcast News Speech Corpora and correctly identified more than 400 errors in these data. Future work may extend CommonSense to automatically correct errors in hypothesis files created as the output of speech recognition systems.
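
The abstract does not list the rules CommonSense uses, so the following is only a minimal sketch of what a rule-based check for same-sounding word confusions in transcripts might look like. The rule set, the `RULES` table, and the function name `find_homophone_errors` are all hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical rules (NOT taken from the paper): each entry pairs a regex for a
# suspicious context with the word that is likely wrong and a suggested fix.
RULES = [
    (re.compile(r"\btheir (is|are|was|were)\b", re.IGNORECASE), "their", "there"),
    (re.compile(r"\b(more|less|better|worse|rather) then\b", re.IGNORECASE), "then", "than"),
    (re.compile(r"\bto (much|many)\b", re.IGNORECASE), "to", "too"),
]


def find_homophone_errors(lines):
    """Return (line_number, matched_text, suspect_word, suggestion) tuples.

    Line numbers are 1-based. Every rule is applied to every line, and all
    matches are reported in line order, then rule order.
    """
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for pattern, suspect, suggestion in RULES:
            for match in pattern.finditer(line):
                findings.append((line_no, match.group(0), suspect, suggestion))
    return findings


if __name__ == "__main__":
    transcript = [
        "their is no doubt that prices rose",
        "it was more then enough",
        "they went to their house",
    ]
    for finding in find_homophone_errors(transcript):
        print(finding)
```

Context-based patterns like these can produce false positives, so a sketch of this kind would flag candidates for review rather than rewrite the corpus directly.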

Paper Details

Date Published: 18 April 2006
PDF: 6 pages
Proc. SPIE 6242, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006, 62420B (18 April 2006); doi: 10.1117/12.663836
Author Affiliations
Ryan Propper, Stanford Univ. (United States)
Keyvan Mohajer, Stanford Univ. (United States)
Vaughan Pratt, Stanford Univ. (United States)


Published in SPIE Proceedings Vol. 6242:
Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2006
Belur V. Dasarathy, Editor
