Proceedings Paper

Arabic character recognition
Author(s): May Allam

Paper Abstract

This paper presents a complete system for learning and recognizing Arabic characters. Arabic OCR faces technical problems not encountered in other languages, such as cursive writing, overriding and overlapping of characters, multiple shapes per character, and vowels placed above and below the characters. The proposed approach relies on the observation that connecting Arabic characters to produce cursive writing tends to form a fictitious baseline. During preprocessing, contour analysis provides both component isolation and baseline location. In the feature extraction phase, words are processed from right to left to generate a sequence of labels. Each label is drawn from a predetermined codebook that represents all possible bit distributions with respect to the baseline. At a position that depends on the label context, a segmentation decision is made. During training, a model is generated for each character; this model describes the probability of occurrence of each label at each vertical position. During recognition, the probability of the observed label sequence is computed and accumulated. The system has been tested on several typewritten and typeset fonts, as well as on diacriticized versions of both, and the evaluation results are presented.
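To make the label-based pipeline in the abstract more concrete, here is a minimal sketch written under explicit assumptions. It is not the author's implementation. The baseline is estimated with a horizontal projection (the row containing the most ink), which stands in for the paper's contour analysis. Each column is mapped to a hypothetical 3-bit code that records whether ink lies above, on, or below the baseline. Characters are assumed to be already segmented and normalized to a fixed width, so the context-dependent segmentation step is omitted. Each per-character model stores a Laplace-smoothed label distribution for every column position, and recognition picks the model with the highest accumulated log-probability. All function names (`baseline_row`, `column_label`, `label_sequence`, `train_model`, `score`, `recognize`) are hypothetical.

```python
import math

NUM_LABELS = 8  # hypothetical codebook: 3 bits -> labels 0..7


def baseline_row(img):
    """Estimate the baseline as the row with the most ink (horizontal projection).
    Assumption: a common heuristic, swapped in for the paper's contour analysis."""
    counts = [sum(row) for row in img]
    return max(range(len(counts)), key=counts.__getitem__)


def column_label(img, col, base):
    """Hypothetical codebook entry: bits for ink above / on / below the baseline."""
    above = any(img[r][col] for r in range(base))
    on = bool(img[base][col])
    below = any(img[r][col] for r in range(base + 1, len(img)))
    return (int(above) << 2) | (int(on) << 1) | int(below)


def label_sequence(img):
    """Scan columns right to left (Arabic writing order), one label per column."""
    base = baseline_row(img)
    width = len(img[0])
    return [column_label(img, c, base) for c in range(width - 1, -1, -1)]


def train_model(sequences, smoothing=1.0):
    """Estimate P(label | column position) for one character from training sequences.
    Assumes all sequences have the same length (width-normalized samples)."""
    model = []
    for pos in range(len(sequences[0])):
        counts = [smoothing] * NUM_LABELS  # Laplace smoothing avoids log(0)
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts)
        model.append([c / total for c in counts])
    return model


def score(model, seq):
    """Accumulate the log-probability of an observed label sequence under a model."""
    if len(seq) != len(model):
        raise ValueError("this sketch assumes width-normalized samples")
    return sum(math.log(probs[label]) for probs, label in zip(model, seq))


def recognize(models, seq):
    """Return the character whose model gives the highest accumulated score."""
    return max(models, key=lambda ch: score(models[ch], seq))


# Toy 5x4 bitmaps (1 = ink) for two hypothetical characters.
img_a = [[0, 0, 0, 0],
         [0, 1, 0, 0],   # a dot above the baseline
         [0, 0, 0, 0],
         [1, 1, 1, 1],   # stroke along the baseline
         [0, 0, 0, 0]]
img_b = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1],   # stroke along the baseline
         [1, 0, 0, 0]]   # a descender below the baseline

models = {
    "A": train_model([label_sequence(img_a)]),
    "B": train_model([label_sequence(img_b)]),
}
print(recognize(models, label_sequence(img_a)))
```

With a single training sample per character, each model assigns 2/9 to the observed label at every position and 1/9 to every other label, so an input scores highest under the model whose per-position labels it matches most closely. A real system would train on many samples per character and would have to deal with variable character widths, which this sketch sidesteps.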

Paper Details

Date Published: 23 March 1994
PDF: 9 pages
Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171123
May Allam, Cairo Univ. (Egypt)

Published in SPIE Proceedings Vol. 2181:
Document Recognition
Luc M. Vincent; Theo Pavlidis, Editor(s)

© SPIE.