
Proceedings Paper

RGB-D dynamic facial dataset capture for visual speech recognition
Author(s): Naveed Ahmed

Paper Abstract

We present a new comprehensive RGB-D dynamic facial dataset capture system that can be used for facial recognition, emotion recognition, or visual speech processing. The dataset is recorded with an RGB-D (Kinect) camera and contains 20 individuals saying 20 common English words or phrases. Using Kinect facial tracking, we record not only the facial features but also the facial outline, RGB data, depth data, the mapping between RGB and depth data, facial animation units, facial shape units, and 2D and 3D representations of the face along with the 3D head orientation. The captured RGB-D dynamic facial dataset can be employed in several applications. We demonstrate its effectiveness by presenting a new visual speech recognition system that employs three-dimensional spatial and temporal data of different facial feature points. The results demonstrate that our RGB-D dynamic facial dataset can be effectively employed in a visual speech recognition system.
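To make the listed per-frame data concrete, the following is a minimal Python sketch of how one captured frame might be represented and how a simple temporal feature could be derived from the 3D feature points. It is an illustration only, not the paper's format or method: the class `FaceFrame`, its field names, the function `temporal_displacements`, and the interpretation of head orientation as three angles are all assumptions, and RGB and depth images are omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class FaceFrame:
    """One captured frame. Field names are hypothetical; they only mirror
    the kinds of data the abstract lists."""
    feature_points_3d: List[Point3D]  # tracked 3D facial feature points
    head_orientation: Point3D         # 3D head orientation (assumed: three angles)
    animation_units: List[float]      # facial animation units
    shape_units: List[float]          # facial shape units


def temporal_displacements(frames: List[FaceFrame]) -> List[List[float]]:
    """For each pair of consecutive frames, return the Euclidean distance
    moved by each feature point. This is one simple spatio-temporal
    descriptor, chosen here for illustration."""
    rows = []
    for prev, curr in zip(frames, frames[1:]):
        rows.append([
            math.dist(p, q)
            for p, q in zip(prev.feature_points_3d, curr.feature_points_3d)
        ])
    return rows
```

A sequence of such displacement rows could then be passed to any sequence classifier; which classifier the paper uses is not stated in the abstract.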

Paper Details

Date Published: 27 November 2019
PDF: 5 pages
Proc. SPIE 11321, 2019 International Conference on Image and Video Processing, and Artificial Intelligence, 1132108 (27 November 2019); doi: 10.1117/12.2538762
Author Affiliations
Naveed Ahmed, Univ. of Sharjah (United Arab Emirates)

Published in SPIE Proceedings Vol. 11321:
2019 International Conference on Image and Video Processing, and Artificial Intelligence
Ruidan Su, Editor(s)

© SPIE.