
Proceedings Paper

Robust speech separation using visually constructed speech signals
Author(s): Parham Aarabi; Negar Habibi Khameneh

Paper Abstract

A technique is proposed for virtually recreating speech signals entirely from the visual lip motions of a speaker. Using six geometric lip parameters obtained from the Tulips1 database, a virtual speech signal is recreated with a 3.6 s audiovisual training segment as its basis. It is shown that the envelope of the virtual speech signal is directly related to the envelope of the original acoustic signal. This visual reconstruction of the signal envelope is then used as a basis for robust speech separation in settings where the visual parameters of all speakers are available. It is shown that, unlike previous signal separation techniques, which required an ideal mixture of independent signals, the proposed technique can estimate the mixture coefficients very accurately even in non-ideal situations.
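
The abstract does not spell out how the mixture coefficients are estimated, so the following is only an illustrative sketch of one way envelope information could be used, not the authors' procedure. It assumes that the sources are roughly uncorrelated, so that the short-time power of a mixture is approximately a weighted sum of the per-speaker power envelopes, and it fits those weights by ordinary least squares. The function names (`short_time_rms`, `estimate_mixing_magnitudes`, `demo`), the synthetic envelopes, and all parameter values are hypothetical; in the setting described above, the per-speaker envelopes would come from the visually reconstructed signals rather than being known exactly.

```python
import numpy as np


def short_time_rms(x, win):
    """RMS envelope over non-overlapping frames of `win` samples."""
    n_frames = len(x) // win
    frames = x[: n_frames * win].reshape(n_frames, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def estimate_mixing_magnitudes(mixture, envelopes, win):
    """Estimate |a_k| for mixture = sum_k a_k * s_k.

    Assumption (not from the paper): sources are uncorrelated, so the
    mixture power per frame is about sum_k a_k^2 * envelope_k^2.
    `envelopes` has shape (num_speakers, num_frames).
    Only magnitudes are recovered; signs are not identifiable this way.
    """
    mix_power = short_time_rms(mixture, win) ** 2
    design = (np.asarray(envelopes) ** 2).T  # (num_frames, num_speakers)
    coeffs_sq, *_ = np.linalg.lstsq(design, mix_power, rcond=None)
    return np.sqrt(np.clip(coeffs_sq, 0.0, None))


def demo(seed=0):
    """Synthetic check with envelope-modulated noise as stand-in speech."""
    rng = np.random.default_rng(seed)
    fs, win, duration = 8000, 400, 8.0       # hypothetical settings
    n_frames = int(duration * fs) // win
    t = np.arange(n_frames) * win / fs       # frame start times in seconds

    # Stand-ins for the visually derived envelopes of two speakers.
    env1 = 0.2 + 0.8 * np.sin(np.pi * 1.0 * t) ** 2
    env2 = 0.2 + 0.8 * np.cos(np.pi * 1.7 * t) ** 2

    # Each source is unit-variance noise shaped by its envelope.
    s1 = np.repeat(env1, win) * rng.standard_normal(n_frames * win)
    s2 = np.repeat(env2, win) * rng.standard_normal(n_frames * win)

    mixture = 0.8 * s1 + 0.3 * s2            # true coefficients: 0.8 and 0.3
    return estimate_mixing_magnitudes(mixture, np.stack([env1, env2]), win)


if __name__ == "__main__":
    print(demo())
```

Fitting in the power domain is a deliberate simplification: it avoids needing the source waveforms themselves, at the cost of recovering only coefficient magnitudes and relying on the cross-terms between speakers averaging out within each frame.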

Paper Details

Date Published: 6 March 2002
PDF: 9 pages
Proc. SPIE 4731, Sensor Fusion: Architectures, Algorithms, and Applications VI, (6 March 2002); doi: 10.1117/12.458389
Parham Aarabi, Univ. of Toronto (Canada)
Negar Habibi Khameneh, Univ. of Toronto (Canada)

Published in SPIE Proceedings Vol. 4731:
Sensor Fusion: Architectures, Algorithms, and Applications VI
Belur V. Dasarathy, Editor

© SPIE