
Proceedings Paper

Cross-modal retrieval of scripted speech audio
Author(s): Charles B. Owen; Fillia Makedon

Paper Abstract

This paper describes an approach to searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open-vocabulary speech recognition is advancing rapidly, but it still cannot deliver high accuracy for either search or transcription. Text, by contrast, can be searched quickly, efficiently and accurately. Script-light digital audio is audio for which a transcription is available. This is a surprisingly large class of content, including legal testimony, broadcasting, dramatic productions, and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows segments of that audio to be retrieved very accurately. The mechanism described in this paper builds a transcription graph from the text and computes biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
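The alignment idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm. It assumes a linear transcription graph (one node per phone, with no alternate pronunciations or optional pauses), substitutes toy per-frame monophone log-probabilities for the paper's biphone probabilities, and uses a plain beam-pruned Viterbi search rather than the paper's modified beam search. The lexicon, the scores and the function names (`build_transcription_graph`, `beam_align`, `word_boundaries`) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, str, str]  # (word index, word, phone) -- hypothetical node layout


def build_transcription_graph(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[Node]:
    """Simplified linear graph: one node per phone, in transcript order."""
    return [(i, w, p) for i, w in enumerate(words) for p in lexicon[w]]


def beam_align(nodes: List[Node], frame_scores: List[Dict[str, float]],
               beam_width: int = 8) -> List[int]:
    """Return the graph node assigned to each audio frame.

    frame_scores[t][phone] is an (assumed) log-probability of that phone at frame t.
    At each frame a hypothesis either stays in its node or advances to the next one.
    """
    beam = {0: (frame_scores[0][nodes[0][2]], [0])}  # alignment must start at the first phone
    for t in range(1, len(frame_scores)):
        candidates: Dict[int, Tuple[float, List[int]]] = {}
        for i, (score, path) in beam.items():
            for j in (i, i + 1):  # self-loop or advance
                if j >= len(nodes):
                    continue
                new_score = score + frame_scores[t][nodes[j][2]]
                # Recombine: keep only the best-scoring hypothesis per node.
                if j not in candidates or new_score > candidates[j][0]:
                    candidates[j] = (new_score, path + [j])
        # Prune: keep the beam_width highest-scoring nodes.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
        beam = dict(ranked[:beam_width])
    last = len(nodes) - 1
    if last not in beam:
        raise ValueError("end of transcript not reached; widen the beam or check the audio length")
    return beam[last][1]


def word_boundaries(nodes: List[Node], path: List[int]) -> List[Tuple[str, int, int]]:
    """Collapse a frame-level path into (word, start_frame, end_frame_exclusive)."""
    segments: List[list] = []
    for t, n in enumerate(path):
        w_idx, word, _ = nodes[n]
        if segments and segments[-1][0] == w_idx:
            segments[-1][3] = t + 1
        else:
            segments.append([w_idx, word, t, t + 1])
    return [(word, start, end) for _, word, start, end in segments]


if __name__ == "__main__":
    lexicon = {"hi": ["h", "ay"], "there": ["dh", "eh", "r"]}  # hypothetical toy lexicon
    nodes = build_transcription_graph(["hi", "there"], lexicon)
    phones = ["h", "ay", "dh", "eh", "r"]
    # Toy "acoustic" scores: each frame strongly favours one phone.
    labels = ["h", "h", "ay", "ay", "ay", "dh", "eh", "eh", "r"]
    frame_scores = [{p: math.log(0.9 if p == lab else 0.025) for p in phones} for lab in labels]
    path = beam_align(nodes, frame_scores)
    print(word_boundaries(nodes, path))  # [('hi', 0, 5), ('there', 5, 9)]
```

Once every word has a start frame, a hit found by ordinary text search over the transcript maps directly to an audio offset (start frame times the frame period), which is what lets text retrieval stand in for audio retrieval. Keeping only the best hypothesis per node makes this an exact Viterbi search whenever the beam is wide enough; narrowing the beam trades that guarantee for speed on long recordings.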

Paper Details

Date Published: 29 December 1997
PDF: 10 pages
Proc. SPIE 3310, Multimedia Computing and Networking 1998, (29 December 1997); doi: 10.1117/12.298423
Author Affiliations
Charles B. Owen, Dartmouth College (United States)
Fillia Makedon, Dartmouth College (United States)

Published in SPIE Proceedings Vol. 3310:
Multimedia Computing and Networking 1998
Kevin Jeffay; Dilip D. Kandlur; Timothy Roscoe, Editor(s)

© SPIE.