
Proceedings Paper

Face and lip tracking in unconstrained imagery for improved automatic speech recognition
Author(s): Brandon Crow; Jane Xiaozheng Zhang

Paper Abstract

When combined with acoustic speech information, visual speech information (lip movement) significantly improves Automatic Speech Recognition (ASR) in acoustically noisy environments. Previous research has demonstrated that the visual modality is a viable tool for identifying speech. However, visual information has yet to be adopted in mainstream ASR systems because of the difficulty of accurately tracking lips in real-world conditions. This paper presents our current progress in tracking the face and lips in visually challenging environments. Our findings suggest that the mean shift algorithm performs poorly on small regions, in this case the lips, but achieves nearly 80% accuracy for face tracking.

Paper Details

Date Published: 19 January 2009
PDF: 11 pages
Proc. SPIE 7257, Visual Communications and Image Processing 2009, 72571Y (19 January 2009); doi: 10.1117/12.817092
Author Affiliations:
Brandon Crow, California Polytechnic State Univ. (United States)
Jane Xiaozheng Zhang, California Polytechnic State Univ. (United States)

Published in SPIE Proceedings Vol. 7257:
Visual Communications and Image Processing 2009
Majid Rabbani; Robert L. Stevenson, Editor(s)

© SPIE. Terms of Use