Proceedings Paper

Automatic segmentation of speakers in broadcast audio material

Paper Abstract

In this paper, dimension-reduced, decorrelated spectral features for general sound recognition are applied to segment conversational speech in both broadcast news audio and panel discussion television programs. Without a priori information about the number of speakers, the audio stream is segmented by a hybrid metric-based and model-based segmentation algorithm. To measure performance, we compare the segmentation results of the hybrid method with those of metric-based segmentation, using both the MPEG-7 standardized features and Mel-scale Frequency Cepstrum Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features. The hybrid approach significantly outperforms direct metric-based segmentation.
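
As a rough illustration of what metric-based segmentation means in general, the sketch below slides two adjacent windows over a sequence of feature vectors, models each window as a diagonal Gaussian, and marks a speaker change where the distance between the two models peaks. The abstract does not name the distance measure, window length, or threshold used in the paper, so the symmetric Kullback-Leibler divergence and every parameter and function name here (`diag_gaussian`, `symmetric_kl`, `distance_curve`, `find_changes`) are assumptions made for illustration only. The input frames are synthetic stand-ins for MFCC or MPEG-7 feature vectors.

```python
import random
from statistics import fmean, pvariance


def diag_gaussian(frames):
    """Per-dimension mean and variance of a window of feature vectors."""
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # A small floor keeps every variance strictly positive.
    variances = [pvariance(d) + 1e-6 for d in dims]
    return means, variances


def symmetric_kl(a, b):
    """Symmetric KL divergence between two diagonal Gaussians (assumed metric)."""
    (mean_a, var_a), (mean_b, var_b) = a, b
    total = 0.0
    for m1, v1, m2, v2 in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (m1 - m2) ** 2
        total += 0.5 * (v1 / v2 + v2 / v1 - 2.0)
        total += 0.5 * diff2 * (1.0 / v1 + 1.0 / v2)
    return total


def distance_curve(frames, win):
    """Distance between the windows just before and just after each frame t."""
    curve = []
    for t in range(win, len(frames) - win + 1):
        left = diag_gaussian(frames[t - win:t])
        right = diag_gaussian(frames[t:t + win])
        curve.append((t, symmetric_kl(left, right)))
    return curve


def find_changes(curve, threshold, min_gap):
    """Keep the strongest peaks above threshold, at least min_gap frames apart."""
    candidates = sorted((p for p in curve if p[1] > threshold),
                        key=lambda p: -p[1])
    accepted = []
    for t, _ in candidates:
        if all(abs(t - a) >= min_gap for a in accepted):
            accepted.append(t)
    return sorted(accepted)


# Synthetic example: two "speakers" with different feature means,
# switching at frame 200 (hypothetical data, not from the paper).
random.seed(0)
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
frames += [[random.gauss(3.0, 1.0) for _ in range(4)] for _ in range(200)]
changes = find_changes(distance_curve(frames, win=50), threshold=10.0, min_gap=50)
print(changes)
```

A model-based stage, as in the hybrid approach the abstract describes, would typically refine such candidate boundaries using trained speaker models; that stage is not sketched here because the abstract gives no details about it.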

Paper Details

Date Published: 18 December 2003
PDF: 10 pages
Proc. SPIE 5307, Storage and Retrieval Methods and Applications for Multimedia 2004, (18 December 2003); doi: 10.1117/12.526080
Hyoung-Gook Kim, Technical Univ. of Berlin (Germany)
Thomas Sikora, Technical Univ. of Berlin (Germany)

Published in SPIE Proceedings Vol. 5307:
Storage and Retrieval Methods and Applications for Multimedia 2004
Minerva M. Yeung; Rainer W. Lienhart; Chung-Sheng Li, Editor(s)
