Proceedings Paper

A Chinese acoustic model based on convolutional neural network
Author(s): Qian Zhang; Jun Sang; Mohammad S. Alam; Bin Cai; Li Yang

Paper Abstract

Speech recognition has long been a research focus in the field of human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveform signals into text. The acoustic model is a main component of ASR; it connects the observed features of the speech signal with the speech modeling units. In recent years, deep learning has become the mainstream technology in speech recognition. In this paper, a convolutional neural network architecture combining VGG with the Connectionist Temporal Classification (CTC) loss function is proposed as the acoustic model for speech recognition. Traditional acoustic model training relies on frame-level labels with a cross-entropy criterion, which requires a tedious label alignment procedure. The CTC loss is adopted to learn the alignments between speech frames and label sequences automatically, so that training is end-to-end. The architecture can exploit the temporal and spectral structure of speech signals simultaneously. Batch normalization (BN) is used to normalize each layer's input and reduce internal covariate shift. To prevent overfitting, dropout is applied during training to improve the network's generalization ability. The speech signal is transformed, through a series of processing steps, into a spectral image that serves as the input to the neural network. The input features are 200-dimensional, and the output labels of the acoustic model are 415 Chinese syllable pronunciations without tone. The experimental results demonstrate that the proposed model achieves character error rates (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
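To make the CTC output convention and the CER metric mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the network output is decoded, so greedy (best-path) collapsing is assumed here, the blank index 0 is an assumption, and the function names `ctc_greedy_collapse`, `edit_distance`, and `error_rate` are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse per-frame best-path labels the CTC way:
    merge consecutive repeats, then drop blanks.
    The blank index (0) is an assumption, not taken from the paper."""
    out: List[int] = []
    prev = None
    for label in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions, deletions."""
    prev_row = list(range(len(hyp) + 1))
    for r_idx, r in enumerate(ref, start=1):
        row = [r_idx]
        for h_idx, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            row.append(min(
                prev_row[h_idx] + 1,         # deletion from ref
                row[h_idx - 1] + 1,          # insertion into ref
                prev_row[h_idx - 1] + cost,  # match or substitution
            ))
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length.
    With characters as tokens this is the character error rate (CER)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

In this sketch a repeated label separated by a blank survives as two labels, which is how CTC represents genuinely repeated units, and the same `error_rate` function works on lists of syllable IDs or on strings of characters.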

Paper Details

Date Published: 13 May 2019
PDF: 7 pages
Proc. SPIE 10995, Pattern Recognition and Tracking XXX, 109950U (13 May 2019); doi: 10.1117/12.2520356
Qian Zhang, Chongqing Univ. (China)
Jun Sang, Chongqing Univ. (China)
Mohammad S. Alam, Texas A&M Univ.-Kingsville (United States)
Bin Cai, Chongqing Univ. (China)
Li Yang, Chongqing Univ. (China)

Published in SPIE Proceedings Vol. 10995:
Pattern Recognition and Tracking XXX
Mohammad S. Alam, Editor

© SPIE.