Proceedings Paper

Keyword spotting for multimedia document indexing
Author(s): Philippe Gelin; Christian J. Wellekens

Paper Abstract

We tackle the problem of multimedia indexing using keyword spotting on the spoken part of the data. Word spotting systems for indexing have to meet very demanding specifications: short response times to queries, speaker-independent operation, and an open vocabulary so that any keyword can be tracked. To meet these constraints, keyword models should be built according to their phonetic spelling, and the process should be divided into two parts: preprocessing of the speech signal, and querying over a lattice of hypotheses. Different classification criteria have been studied for hypothesis generation: frame labeling, maximum likelihood, and maximum a posteriori (MAP). The hypothesis probability is computed either through a standard Gaussian model or through a hybrid Hidden Markov Model-Neural Network. The training of the phonemic models is based either on Viterbi alignment or on recursive estimation and maximization of a posteriori probabilities. In the latter case, discriminant properties between phonemes are enforced. Tests have been conducted on the TIMIT database as well as on TV news soundtracks. Interesting results have been obtained in terms of time saved for the documentalist. The ultimate goal is to couple the soundtrack indexing with tools for video indexing in order to enhance the robustness of the system.
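
The two-part design described in the abstract (offline preprocessing into a lattice of phoneme hypotheses, then a fast query over that lattice) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` record, the `spot_keyword` function, the `max_gap` parameter, the phoneme symbols, and the toy scores are all assumptions made for illustration. The lattice is simplified to a flat list of scored phoneme segments, and a keyword is matched by chaining segments that follow one another in time.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    """One phoneme hypothesis produced by preprocessing (hypothetical layout)."""
    phoneme: str
    start: int        # first frame, inclusive
    end: int          # last frame, exclusive
    log_score: float  # log-probability-like score; higher is better


def spot_keyword(lattice, phonemes, max_gap=0):
    """Find chains of hypotheses that spell `phonemes` in order.

    Returns (start, end, total_log_score) tuples, best score first.
    `max_gap` is the number of frames allowed between consecutive segments.
    """
    if not phonemes:
        return []

    # Index hypotheses by phoneme label so each query step is a lookup.
    by_phoneme = {}
    for hyp in lattice:
        by_phoneme.setdefault(hyp.phoneme, []).append(hyp)

    # Partial matches: (keyword start frame, end frame so far, summed score).
    partial = [(h.start, h.end, h.log_score)
               for h in by_phoneme.get(phonemes[0], [])]
    for label in phonemes[1:]:
        extended = []
        for start, end, score in partial:
            for h in by_phoneme.get(label, []):
                # The next segment must begin at, or shortly after, the
                # end of the previous one.
                if 0 <= h.start - end <= max_gap:
                    extended.append((start, h.end, score + h.log_score))
        partial = extended

    return sorted(partial, key=lambda match: match[2], reverse=True)


# Toy lattice with competing hypotheses over the same time spans.
lattice = [
    Hypothesis("n", 0, 5, -1.0),
    Hypothesis("m", 0, 5, -2.0),
    Hypothesis("uw", 5, 12, -0.5),
    Hypothesis("z", 12, 20, -0.25),
    Hypothesis("s", 12, 20, -1.5),
]

# Query by phonetic spelling, so no word-specific model has to be trained.
matches = spot_keyword(lattice, ["n", "uw", "z"])
print(matches)
```

Because the query only walks the precomputed lattice, the acoustic processing does not need to be repeated for each new keyword, which is what makes short response times and an open vocabulary compatible in this kind of design.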

Paper Details

Date Published: 6 October 1997
PDF: 12 pages
Proc. SPIE 3229, Multimedia Storage and Archiving Systems II, (6 October 1997); doi: 10.1117/12.290357
Philippe Gelin, Institut Eurecom (France)
Christian J. Wellekens, Institut Eurecom (France)

Published in SPIE Proceedings Vol. 3229:
Multimedia Storage and Archiving Systems II
C.-C. Jay Kuo; Shih-Fu Chang; Venkat N. Gudivada, Editor(s)
