Proceedings Paper

Defining properties of speech spectrogram images to allow effective pre-processing prior to pattern recognition
Author(s): Mohammed Al-Darkazali; Rupert Young; Chris Chatwin; Philip Birch

Paper Abstract

The speech signal of a word is a combination of frequencies that can produce specific frequency transition shapes. These can be regarded as written text in some unknown ‘script’. Before attempting to read the speech spectrogram image using image processing techniques, we first need to define the properties of the spectrogram image, reduce its clutter, and select the methods to be employed for image matching. The speech signal is therefore first converted to a spectrogram image, and noise in the signal is then reduced by capturing the energy associated with the formants of the speech signal. This is followed by normalisation of the size of the image and of its resolution in both the frequency and time axes. Finally, template matching methods are employed to recognise portions of text and isolated words. The paper describes the pre-processing methods employed and outlines the use of normalised grey-level correlation for the recognition of words.
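
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, 64 x 64 target size, nearest-neighbour resampling, and the synthetic test tones are all assumptions chosen for illustration, and the formant-energy noise reduction step is omitted because the abstract does not specify it. Normalised grey-level correlation is taken here to mean zero-mean normalised correlation between two equal-sized images, which is also an assumption.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram, shape (frequency bins, time frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

def resize(image, shape):
    """Nearest-neighbour resampling to a fixed (rows, cols) size,
    so that frequency and time resolution are the same for every image."""
    rows = np.arange(shape[0]) * image.shape[0] // shape[0]
    cols = np.arange(shape[1]) * image.shape[1] // shape[1]
    return image[np.ix_(rows, cols)]

def normalised_correlation(a, b):
    """Zero-mean normalised correlation of two equal-sized grey-level images.

    Returns a value in [-1, 1]; 0.0 is returned if either image is flat.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Synthetic stand-ins for two different "words": pure tones at 8 kHz sampling.
fs = 8000
t = np.arange(fs) / fs
tone_a = np.sin(2 * np.pi * 440 * t)
tone_b = np.sin(2 * np.pi * 1000 * t)

template = resize(spectrogram(tone_a), (64, 64))
candidate = resize(spectrogram(tone_b), (64, 64))

score_same = normalised_correlation(template, template)
score_diff = normalised_correlation(template, candidate)
print(score_same, score_diff)
```

In this sketch a stored template would be compared against each candidate image and the highest score taken as the match. Subtracting the mean and dividing by the energy makes the score insensitive to overall brightness and contrast of the spectrogram image.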

Paper Details

Date Published: 29 April 2013
PDF: 11 pages
Proc. SPIE 8748, Optical Pattern Recognition XXIV, 87480G (29 April 2013); doi: 10.1117/12.2014511
Author Affiliations
Mohammed Al-Darkazali, Univ. of Sussex (United Kingdom)
Rupert Young, Univ. of Sussex (United Kingdom)
Chris Chatwin, Univ. of Sussex (United Kingdom)
Philip Birch, Univ. of Sussex (United Kingdom)

Published in SPIE Proceedings Vol. 8748:
Optical Pattern Recognition XXIV
David Casasent; Tien-Hsin Chao, Editor(s)

© SPIE