Proceedings Paper

Singing voice detection for karaoke application
Author(s): Arun Shenoy; Yuansheng Wu; Ye Wang

Paper Abstract

We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust lyrics transcriber for karaoke applications. The technique leverages a combination of low-level audio features and higher-level musical knowledge of rhythm and tonality. Knowledge of the song's key is used to create a song-specific filterbank that attenuates the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing of freely available lyrics is employed to approximate the duration of the sung passages, and this estimate is used to obtain a dynamic threshold for vocal/non-vocal segmentation. Pairing audio and text processing in this way helps create a more accurate system. Experimental evaluation on a small database of popular songs shows the validity of the proposed approach. Both holistic and per-component evaluations of the system are conducted, and various improvements are discussed.
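The abstract does not say how the lyrics-based duration estimate becomes a threshold. The Python sketch below shows one plausible reading and is not the authors' method.

It rests on these assumptions:

- Each analysis frame already has a vocal-likelihood score.
- The sung duration is approximated by a hypothetical fixed `seconds_per_word` constant.
- The threshold is chosen so that roughly that share of frames is labeled vocal.

The helper names `estimate_vocal_fraction`, `dynamic_threshold` and `segment` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def estimate_vocal_fraction(lyrics: str, song_seconds: float,
                            seconds_per_word: float = 0.4) -> float:
    """Hypothetical estimate of the share of a song that is sung.

    Assumes every lyric word takes a fixed `seconds_per_word`; this
    constant is an illustrative assumption, not a value from the paper.
    """
    sung_seconds = len(lyrics.split()) * seconds_per_word
    return min(1.0, sung_seconds / song_seconds)


def dynamic_threshold(scores: Sequence[float], vocal_fraction: float) -> float:
    """Pick a threshold so that about `vocal_fraction` of frames pass it."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 <= vocal_fraction <= 1.0:
        raise ValueError("vocal_fraction must lie in [0, 1]")
    ranked = sorted(scores, reverse=True)
    n_vocal = round(vocal_fraction * len(ranked))
    if n_vocal == 0:
        return math.inf  # nothing is labeled vocal
    # Frames scoring at least the n_vocal-th highest value are vocal
    # (ties can admit a few extra frames).
    return ranked[n_vocal - 1]


def segment(scores: Sequence[float], vocal_fraction: float,
            hop_seconds: float) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of contiguous vocal runs."""
    threshold = dynamic_threshold(scores, vocal_fraction)
    segments: List[Tuple[float, float]] = []
    start = None
    for i, score in enumerate(scores):
        is_vocal = score >= threshold
        if is_vocal and start is None:
            start = i  # a vocal run begins
        elif not is_vocal and start is not None:
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None  # the vocal run ended at frame i
    if start is not None:  # close a run that lasts to the final frame
        segments.append((start * hop_seconds, len(scores) * hop_seconds))
    return segments


if __name__ == "__main__":
    frame_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.05, 0.95, 0.85, 0.2]
    print(segment(frame_scores, vocal_fraction=0.5, hop_seconds=0.5))
    # prints [(1.0, 2.5), (3.5, 4.5)]
```

The threshold adapts to each song because it is tied to how much of that song the lyrics suggest is sung, rather than being a single global constant. That is the intuition behind calling it "dynamic".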

Paper Details

Date Published: 31 July 2006
PDF: 11 pages
Proc. SPIE 5960, Visual Communications and Image Processing 2005, 596028 (31 July 2006); doi: 10.1117/12.631645
Author Affiliations
Arun Shenoy, National Univ. of Singapore (Singapore)
Yuansheng Wu, National Univ. of Singapore (Singapore)
Ye Wang, National Univ. of Singapore (Singapore)

Published in SPIE Proceedings Vol. 5960:
Visual Communications and Image Processing 2005
Shipeng Li; Fernando Pereira; Heung-Yeung Shum; Andrew G. Tescher, Editor(s)

© SPIE.