
Proceedings Paper

Implicit prosody mining based on the human eye image capture technology
Author(s): Pei-pei Gao; Feng Liu

Paper Abstract

Eye tracking has become a primary method for analyzing recognition issues in human-computer interaction, and human eye image capture is the key problem in eye tracking. Building on this research, a new human-computer interaction method is introduced to enrich the forms of speech synthesis. We propose a method of implicit prosody mining based on human eye image capture technology: parameters are extracted from images of the human eyes during reading, used to control and drive prosody generation in speech synthesis, and a prosodic model with high simulation accuracy is established. The duration model is a key issue in prosody generation. For the duration model, this paper puts forward a new approach that obtains eye gaze duration during reading from captured eye images and synchronously controls this duration and the pronunciation duration in speech synthesis. Eye movement during reading is a comprehensive, multi-factor interactive process involving fixations, saccades, and regressions. Therefore, how to extract the appropriate information from eye images must be considered, and the regularity of gaze must be obtained as a reference for modeling. Based on an analysis of three current eye movement control models and the characteristics of implicit prosody in reading, the relative independence between the text speech-processing system and the eye movement control system is discussed. It is shown that, under the same level of text familiarity, eye gaze duration during reading and internal voice pronunciation duration are synchronous. An eye gaze duration model based on the multi-level prosodic structure of Chinese is presented; it replaces previous machine learning and probability forecasting methods, captures the reader's real internal reading rhythm, and synthesizes speech with a personalized rhythm. This research enriches the forms of human-computer interaction and has practical significance and application prospects for assistive speech interaction for people with disabilities. Experiments show that implicit prosody mining based on human eye image capture technology gives the synthesized speech more flexible expression.
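As an illustrative sketch only (not the authors' implementation), the following Python snippet shows one way a gaze-driven duration model of this kind might be realized: per-word fixation durations from an eye tracker are aggregated into gaze durations and used to rescale baseline phone durations in a synthesis front end. All class names, the data layout, and the proportional-scaling rule are assumptions made for illustration.

```python
# Illustrative sketch: map eye-gaze fixation durations per prosodic word onto
# synthesis phone durations, so that synthesized timing follows the reader's
# internal reading rhythm. Assumes a 1:1 synchrony between gaze duration and
# pronunciation duration, as the abstract argues for equal text familiarity.

from dataclasses import dataclass
from typing import List

@dataclass
class Fixation:
    word_index: int       # index of the prosodic word being fixated
    duration_ms: float    # fixation duration reported by the eye tracker

@dataclass
class ProsodicWord:
    text: str
    phone_durations_ms: List[float]   # baseline durations from a TTS front end

def gaze_duration_per_word(fixations: List[Fixation], n_words: int) -> List[float]:
    """Sum fixation durations (including refixations) for each prosodic word."""
    totals = [0.0] * n_words
    for f in fixations:
        totals[f.word_index] += f.duration_ms
    return totals

def rescale_durations(words: List[ProsodicWord],
                      gaze_ms: List[float]) -> List[ProsodicWord]:
    """Stretch or compress each word's phone durations so their sum matches
    the reader's gaze duration on that word (assumed synchrony)."""
    rescaled = []
    for word, target in zip(words, gaze_ms):
        base = sum(word.phone_durations_ms)
        scale = target / base if base > 0 and target > 0 else 1.0
        rescaled.append(ProsodicWord(
            text=word.text,
            phone_durations_ms=[d * scale for d in word.phone_durations_ms],
        ))
    return rescaled

if __name__ == "__main__":
    words = [ProsodicWord("你好", [90.0, 110.0]), ProsodicWord("世界", [100.0, 120.0])]
    fixations = [Fixation(0, 180.0), Fixation(1, 150.0), Fixation(1, 90.0)]
    gaze = gaze_duration_per_word(fixations, len(words))
    for w in rescale_durations(words, gaze):
        print(w.text, [round(d, 1) for d in w.phone_durations_ms])
```

In this sketch, a word the reader dwells on longer than the baseline predicts is synthesized more slowly, and a quickly read word is compressed, which is one simple way the gaze rhythm could drive personalized prosody.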

Paper Details

Date Published: 21 August 2013
PDF: 6 pages
Proc. SPIE 8908, International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Sensors and Applications, 89081X (21 August 2013); doi: 10.1117/12.2034656
Author Affiliations:
Pei-pei Gao, Nankai Univ. (China)
Feng Liu, Tianjin Univ. (China)


Published in SPIE Proceedings Vol. 8908:
International Symposium on Photoelectronic Detection and Imaging 2013: Imaging Sensors and Applications
Jun Ohta; Nanjian Wu; Binqiao Li, Editor(s)
