
Proceedings Paper

Subjective analysis of a HMM-based visual speech synthesizer
Author(s): Jay J. Williams; Aggelos K. Katsaggelos; Dean C. Garstecki

Paper Abstract

Emerging broadband communication systems promise a future of multimedia telephony. The addition of visual information during telephone conversations, for example, would be most beneficial to people with impaired hearing who rely on speechreading, and could complement the existing narrowband communication systems used for the speech signal. A Hidden Markov Model (HMM)-based visual speech synthesizer is designed to improve speech understanding. The key elements in the application of HMMs to this problem are: a) the decomposition of the overall modeling task into key stages; and b) the judicious determination of the components of the observation vector for each stage. The main contribution of this paper is the development of a novel correlation HMM that is able to integrate independently trained acoustic and visual HMMs for speech-to-visual synthesis. This model allows increased flexibility in choosing model topologies for the acoustic and visual HMMs. It also reduces the amount of required training data compared to early-integration modeling techniques. Results from objective and subjective analysis show that an HMM correlation model can significantly decrease audio-visual synchronization errors and increase speech understanding.
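The core idea of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hypothetical toy in plain NumPy, assuming each HMM has already decoded its training utterances into discrete state sequences, and modeling the correlation between the two HMMs as a conditional distribution P(visual state | acoustic state) estimated from paired sequences:

```python
import numpy as np

# Hypothetical sketch of a correlation model linking an acoustic HMM and a
# visual HMM that were trained independently. State counts and data are
# illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
N_ACOUSTIC, N_VISUAL = 4, 3

# Paired state sequences, as if decoded from the same training utterances.
acoustic_states = rng.integers(0, N_ACOUSTIC, size=200)
visual_states = acoustic_states % N_VISUAL  # toy deterministic coupling

# Estimate P(visual state | acoustic state) by counting co-occurrences.
counts = np.zeros((N_ACOUSTIC, N_VISUAL))
for a, v in zip(acoustic_states, visual_states):
    counts[a, v] += 1
corr = counts / counts.sum(axis=1, keepdims=True)  # row a is P(v | a)

def synthesize_visual(acoustic_seq):
    """Map a decoded acoustic state sequence to the most likely visual states."""
    return corr[acoustic_seq].argmax(axis=1)

print(synthesize_visual(np.array([0, 1, 2, 3])))
```

Because the two HMMs are only coupled through this conditional table, each can keep its own topology, and the table needs far less paired data than retraining a joint audio-visual model, which is the flexibility and data-efficiency argument the abstract makes.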

Paper Details

Date Published: 8 June 2001
PDF: 12 pages
Proc. SPIE 4299, Human Vision and Electronic Imaging VI, (8 June 2001); doi: 10.1117/12.429527
Author Affiliations:
Jay J. Williams, Northwestern Univ. (United States)
Aggelos K. Katsaggelos, Northwestern Univ. (United States)
Dean C. Garstecki, Northwestern Univ. (United States)

Published in SPIE Proceedings Vol. 4299:
Human Vision and Electronic Imaging VI
Bernice E. Rogowitz; Thrasyvoulos N. Pappas, Editor(s)

© SPIE. Terms of Use