
Proceedings Paper

Audio indexing using speaker identification
Author(s): Lynn D. Wilcox; Don Kimber; Francine R. Chen

Paper Abstract

In this paper, a technique for audio indexing based on speaker identification is proposed. When the speakers are known a priori, a speaker index can be created in real time by using the Viterbi algorithm to segment the audio into intervals, each containing a single talker. Segmentation is performed using a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker training data is used to initialize a sub-network for each speaker. Sub-networks can also be used to model silence or non-speech sounds such as a musical theme. When no prior knowledge of the speakers is available, unsupervised segmentation is performed using a non-real-time iterative algorithm. The speaker sub-networks are first initialized; the algorithm then alternates between generating a segmentation with the Viterbi algorithm and retraining the sub-networks on the results of that segmentation. Since the accuracy of the speaker segmentation depends on how well the speaker sub-networks are initialized, agglomerative clustering is used to approximately segment the audio by speaker, and this approximate segmentation initializes the speaker sub-networks. The distance measure for the agglomerative clustering is a likelihood ratio in which speech segments are characterized by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomerative clustering initialization matches the accuracy obtained when initializing with speaker-labeled data.
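
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal-covariance Gaussians, uses a generalized likelihood ratio as the distance, omits the duration-model bias, and runs on synthetic two-dimensional "features". All function names (diag_var, glr_distance, agglomerate) are hypothetical and chosen only for illustration.

```python
import math
import random


def diag_var(frames):
    """Maximum-likelihood per-dimension variance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    # A small floor keeps the logarithm finite for near-constant dimensions.
    return [max(sum((f[k] - means[k]) ** 2 for f in frames) / n, 1e-8)
            for k in range(dim)]


def half_n_logdet(frames):
    """(n / 2) * log|Sigma| for a diagonal ML Gaussian fitted to the frames."""
    return 0.5 * len(frames) * sum(math.log(v) for v in diag_var(frames))


def glr_distance(x, y):
    """Negative log likelihood ratio: one pooled Gaussian vs. two separate ones.

    Small values suggest the two segments are well modeled by one Gaussian.
    """
    return half_n_logdet(x + y) - half_n_logdet(x) - half_n_logdet(y)


def agglomerate(segments, n_clusters):
    """Greedily merge the closest pair of clusters until n_clusters remain.

    Distances are recomputed from the pooled frames after every merge.
    (Assumption: the stopping rule is a fixed cluster count.)
    """
    clusters = [(list(seg), [i]) for i, seg in enumerate(segments)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = glr_distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(ids) for _, ids in clusters]


# Synthetic data: even-numbered segments come from "speaker A" (mean 0),
# odd-numbered segments from "speaker B" (mean 5); 50 frames per segment.
random.seed(0)
segments = [
    [[random.gauss(5.0 * (s % 2), 1.0) for _ in range(2)] for _ in range(50)]
    for s in range(6)
]
clusters = agglomerate(segments, n_clusters=2)
print(sorted(clusters))
```

In a fuller system, the resulting clusters would only provide initial training data for the speaker sub-networks; the iterative Viterbi segmentation and retraining described above would then refine the boundaries.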

Paper Details

Date Published: 25 October 1994
PDF: 9 pages
Proc. SPIE 2277, Automatic Systems for the Identification and Inspection of Humans, (25 October 1994); doi: 10.1117/12.191878
Lynn D. Wilcox, Xerox Palo Alto Research Ctr. (United States)
Don Kimber, Xerox Palo Alto Research Ctr. (United States)
Francine R. Chen, Xerox Palo Alto Research Ctr. (United States)

Published in SPIE Proceedings Vol. 2277:
Automatic Systems for the Identification and Inspection of Humans
Richard J. Mammone; J. David Murley Jr., Editor(s)
