
Proceedings Paper

Machine-printed Arabic OCR
Author(s): Khosrow M. Hassibi

Paper Abstract

This paper presents a brief overview of our research on developing an OCR system for recognizing machine-printed text in languages that use the Arabic alphabet. The cursive nature of machine-printed Arabic makes segmenting words into letters a challenging problem. In our approach, a novel preliminary segmentation technique breaks each word into pieces, where a piece does not necessarily represent a valid letter. Neural networks trained on a sample set of about 500 Arabic text images are used to recognize these pieces. The rules governing the alphabet, together with character-level contextual information, are then used to recombine the pieces into valid letters. Higher-level contextual analysis schemes, including the use of an Arabic lexicon and n-grams, are also under development and are expected to improve word recognition accuracy. The segmentation, recognition, and contextual analysis processes are closely integrated through a feedback scheme. The details of the preparation of the training set and some recent results on training the networks will be presented.
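
The recombination step described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the rules or the piece classes, so the piece labels, the `RULES` table, and the `recombine` function below are all hypothetical. The sketch assumes each piece has already been assigned a single label by a classifier, and it uses dynamic programming to find a way of grouping consecutive pieces so that every group forms a valid letter.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table (an assumption for illustration only):
# each key is a sequence of piece labels that together form one valid letter.
RULES: Dict[Tuple[str, ...], str] = {
    ("stroke",): "alef",
    ("bowl", "dot_below"): "beh",
    ("bowl", "two_dots_above"): "teh",
    ("tooth", "tooth", "tooth_tail"): "seen",
}


def recombine(
    pieces: List[str],
    rules: Dict[Tuple[str, ...], str] = RULES,
) -> Optional[List[str]]:
    """Group consecutive piece labels into valid letters.

    Returns the letter sequence that uses the fewest letters, or None
    when the pieces cannot be fully covered by valid letters.
    """
    n = len(pieces)
    max_len = max((len(key) for key in rules), default=0)

    # best[i] is a letter sequence covering pieces[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []

    for i in range(1, n + 1):
        # Try every group of k pieces that ends at position i.
        for k in range(1, min(max_len, i) + 1):
            prefix = best[i - k]
            letter = rules.get(tuple(pieces[i - k:i]))
            if prefix is None or letter is None:
                continue
            candidate = prefix + [letter]
            if best[i] is None or len(candidate) < len(best[i]):
                best[i] = candidate

    return best[n]


if __name__ == "__main__":
    word = ["bowl", "dot_below", "stroke", "tooth", "tooth", "tooth_tail"]
    print(recombine(word))
```

A `None` result corresponds to the case where the pieces do not form any valid letter sequence. In an integrated system of the kind the abstract describes, such a failure could be a signal fed back to the segmentation stage, although how the paper's feedback scheme works is not stated here.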

Paper Details

Date Published: 25 February 1994
PDF: 9 pages
Proc. SPIE 2103, 22nd AIPR Workshop: Interdisciplinary Computer Vision: Applications and Changing Needs, (25 February 1994); doi: 10.1117/12.169463
Khosrow M. Hassibi, Mitek Systems, Inc. (United States)


Published in SPIE Proceedings Vol. 2103:
22nd AIPR Workshop: Interdisciplinary Computer Vision: Applications and Changing Needs
