Proceedings Paper

Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video
Author(s): Regunathan Radhakrishnan; Ziyou Xiong; Ajay Divakaran; Bhiksha Raj

Paper Abstract

In our past work, we attempted to use a mid-level feature, namely the state population histogram obtained from the Hidden Markov Model (HMM) of a general sound class, for speaker change detection in order to extract semantic boundaries in broadcast news. In this paper, we compare the performance of that approach with another approach based on video shot detection and speaker change detection using the Bayesian Information Criterion (BIC). Our experiments show that the latter approach performs significantly better than the former, which motivated us to examine the mid-level feature more closely. We found that the component population histogram enabled the discovery of broad phonetic categories such as vowels, nasals, and fricatives, regardless of the number of distinct speakers in the test utterance. For the histogram to be useful for speaker change detection, the individual components would instead have to model the phonetic sounds of each speaker separately. From our experiments, we conclude that state/component population histograms can only be useful for further clustering or semantic class discovery if the features are chosen carefully so that the individual states represent the semantic categories of interest.
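For readers unfamiliar with BIC-based speaker change detection, the following is a minimal sketch of the commonly used delta-BIC test, not the authors' implementation. It assumes each segment is modeled by a single full-covariance Gaussian; the function name `delta_bic`, the penalty weight `lam`, and the synthetic two-dimensional "features" are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, t, lam=1.0):
    """Delta-BIC for a hypothesized change point at frame index t.

    x   : (n_frames, n_dims) array of feature vectors
    t   : candidate change point; frames [0, t) and [t, n) form the two segments
    lam : penalty weight (hypothetical tuning parameter, often set near 1.0)

    A positive value favors two separate Gaussians, i.e. a change at t.
    """
    n, d = x.shape
    n1, n2 = t, n - t
    # Likelihood gain from modeling the two segments separately.
    gain = 0.5 * (n * _logdet_cov(x)
                  - n1 * _logdet_cov(x[:t])
                  - n2 * _logdet_cov(x[t:]))
    # Penalty for the extra mean and covariance parameters of the second model.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


# Synthetic stand-ins for acoustic features (assumption, for illustration only).
rng = np.random.default_rng(0)
two_speakers = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),   # "speaker A"
    rng.normal(5.0, 1.0, size=(200, 2)),   # "speaker B"
])
one_speaker = rng.normal(0.0, 1.0, size=(400, 2))

change_score = delta_bic(two_speakers, 200)
no_change_score = delta_bic(one_speaker, 200)
```

In practice the test would be evaluated at many candidate values of `t` inside a sliding window, with a boundary declared where the score peaks above zero; that search is omitted here to keep the sketch short.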

Paper Details

Date Published: 26 November 2003
PDF: 7 pages
Proc. SPIE 5242, Internet Multimedia Management Systems IV, (26 November 2003); doi: 10.1117/12.514397
Regunathan Radhakrishnan, Mitsubishi Electric Research Labs. (United States)
Ziyou Xiong, Mitsubishi Electric Research Labs. (United States)
Ajay Divakaran, Mitsubishi Electric Research Labs. (United States)
Bhiksha Raj, Mitsubishi Electric Research Labs. (United States)

Published in SPIE Proceedings Vol. 5242:
Internet Multimedia Management Systems IV
John R. Smith; Sethuraman Panchanathan; Tong Zhang, Editor(s)
