
Proceedings Paper

Non-native speech recognition using audio style transfer
Author(s): Kacper Radzikowski; Mateusz Forc; Le Wang; Osamu Yoshie; Robert M. Nowak

Paper Abstract

Automatic speech recognition (ASR) systems have recently achieved increasingly high accuracy rates. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of the speaker's specific pronunciation and accent features. The limited volume of labeled datasets containing non-native speech makes it difficult to train new ASR systems targeted at non-native speakers. In our research, we tackled the problem of non-native accent and its influence on the accuracy of ASR systems using the style transfer methodology. We designed a pipeline that modifies speech produced by a non-native speaker so that it more closely resembles native speech, i.e. a method for accent neutralization. Our methodology can be used as a wrapper around any existing ASR system, which reduces the need to train new speech recognizers adapted to non-native speech. The modification can thus be performed on the fly, before the data is passed on to the speech recognition system itself.
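
The wrapper arrangement described in the abstract can be sketched as follows. This is only an illustration of the composition, not the authors' implementation: the names wrap_asr, identity_transfer, and dummy_asr are hypothetical, the waveform is assumed to be a plain sequence of samples, and the two stand-in functions exist only so that the sketch runs.

```python
from typing import Callable, List, Sequence

# Assumption: audio is a mono waveform given as a sequence of float samples.
Audio = Sequence[float]


def wrap_asr(
    recognize: Callable[[Audio], str],
    neutralize_accent: Callable[[Audio], Audio],
) -> Callable[[Audio], str]:
    """Return a recognizer that modifies the audio before recognition.

    The existing recognizer is used unchanged; only its input is altered.
    """
    def wrapped(audio: Audio) -> str:
        # Accent neutralization happens on the fly, before recognition.
        return recognize(neutralize_accent(audio))

    return wrapped


# Hypothetical stand-ins; neither reflects the models used in the paper.
def identity_transfer(audio: Audio) -> List[float]:
    """Placeholder for a style transfer model: returns the audio unchanged."""
    return list(audio)


def dummy_asr(audio: Audio) -> str:
    """Placeholder for an existing ASR system: reports the input length."""
    return f"<{len(audio)} samples>"


recognizer = wrap_asr(dummy_asr, identity_transfer)
result = recognizer([0.0, 0.1, -0.1])
```

Because the recognizer is passed in as a plain callable, the same wrapper could in principle sit in front of any existing ASR system without retraining it, which is the property the abstract emphasizes.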

Paper Details

Date Published: 6 November 2019
PDF: 6 pages
Proc. SPIE 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, 111762J (6 November 2019); doi: 10.1117/12.2536535
Author Affiliations
Kacper Radzikowski, Waseda Univ. (Japan)
Mateusz Forc, Warsaw Univ. of Technology (Poland)
Le Wang, Waseda Univ. (Japan)
Osamu Yoshie, Waseda Univ. (Japan)
Robert M. Nowak, Warsaw Univ. of Technology (Poland)


Published in SPIE Proceedings Vol. 11176:
Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019
Ryszard S. Romaniuk; Maciej Linczuk, Editor(s)

© SPIE.