
Proceedings Paper

A comparison study between MLP and convolutional neural network models for character recognition

Paper Abstract

Optical Character Recognition (OCR) systems are designed to operate on text contained in scanned documents and images. They include text detection and character recognition, in which characters are first described and then classified. In the classification step, characters are identified according to their features or template descriptions, and a given classifier is employed to identify them. In this context, we have previously proposed the unified character descriptor (UCD) to represent characters based on their features, with matching used to perform the classification. This recognition scheme achieves good OCR accuracy on homogeneous scanned documents; however, it cannot discriminate characters with high font variation and distortion. To improve recognition, classifiers based on neural networks can be used. The multilayer perceptron (MLP) ensures high recognition accuracy when robustly trained. Moreover, the convolutional neural network (CNN) has recently gained considerable popularity for its high performance. However, both CNN and MLP may suffer from the large amount of computation required in the training phase. In this paper, we establish a comparison between MLP and CNN. We provide the MLP with the UCD descriptor and an appropriate network configuration. For the CNN, we employ the convolutional network designed for handwritten and machine-printed character recognition (LeNet-5) and adapt it to support 62 classes, covering both digits and letters. In addition, GPU parallelization is studied to speed up both the MLP and CNN classifiers. Based on our experiments, we demonstrate that the real-time CNN outperforms the MLP by a factor of 2 when classifying characters.
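The abstract's adaptation of LeNet-5 to 62 output classes (10 digits + 26 uppercase + 26 lowercase letters) can be illustrated by tracing the feature-map sizes through a LeNet-5-style pipeline. The sketch below is illustrative only: it assumes the classic LeNet-5 hyperparameters (32x32 input, 5x5 convolutions, 2x2 subsampling), not the authors' exact configuration, and the function names are hypothetical.

```python
def conv2d_out(size, kernel, stride=1, pad=0):
    """Spatial size of a square feature map after a convolution or pooling step."""
    return (size + 2 * pad - kernel) // stride + 1

def lenet5_shapes(input_size=32, num_classes=62):
    """Trace (layer, channels, spatial size) through a LeNet-5-style network.

    Only the final layer differs from classic LeNet-5: it is widened
    from 10 outputs to `num_classes` (62 = digits + upper/lowercase letters).
    """
    shapes = []
    s = conv2d_out(input_size, kernel=5)    # C1: 6 maps, 5x5 conv -> 28x28
    shapes.append(("C1", 6, s))
    s = conv2d_out(s, kernel=2, stride=2)   # S2: 2x2 subsampling -> 14x14
    shapes.append(("S2", 6, s))
    s = conv2d_out(s, kernel=5)             # C3: 16 maps, 5x5 conv -> 10x10
    shapes.append(("C3", 16, s))
    s = conv2d_out(s, kernel=2, stride=2)   # S4: 2x2 subsampling -> 5x5
    shapes.append(("S4", 16, s))
    shapes.append(("C5", 120, 1))           # C5: 120 units (5x5 conv acts as FC)
    shapes.append(("F6", 84, 1))            # F6: 84 fully connected units
    shapes.append(("OUT", num_classes, 1))  # output widened to 62 classes
    return shapes
```

Widening only the output layer keeps the convolutional feature extractor unchanged, which is why the same architecture can move from digit-only recognition to the full alphanumeric character set.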

Paper Details

Date Published: 1 May 2017
PDF: 11 pages
Proc. SPIE 10223, Real-Time Image and Video Processing 2017, 1022306 (1 May 2017); doi: 10.1117/12.2262589
Author Affiliations:
S. Ben Driss, ESIEE Paris, IGM, A3SI (France)
M. Soua, ESIEE Paris, IGM, A3SI (France)
R. Kachouri, ESIEE Paris, IGM, A3SI (France)
M. Akil, ESIEE Paris, IGM, A3SI (France)

Published in SPIE Proceedings Vol. 10223:
Real-Time Image and Video Processing 2017
Nasser Kehtarnavaz; Matthias F. Carlsohn, Editors

© SPIE.