
Proceedings Paper

New statistical method for machine-printed Arabic character recognition
Author(s): Hua Wang; Xiaoqing Ding; Jianming Jin; M. Halmurat

Paper Abstract

Although about 300 million people worldwide write in Arabic script, across several different languages, Arabic OCR has not been researched as thoroughly as OCR for other widely used scripts such as Latin or Chinese. In this paper, a new statistical method is developed to recognize machine-printed Arabic characters. First, the entire Arabic character set is pre-classified into 32 subsets according to character form (Isolated, Final, Initial, Medial), the special zones that a character occupies (divided according to the headline and the baseline of a text line), and component information (with or without secondary parts such as diacritical marks, movements, etc.). Then 12 types of directional features are extracted from character profiles. After dimension reduction by linear discriminant analysis (LDA), the features are passed to a modified quadratic discriminant function (MQDF), which serves as the final classifier. Finally, similar characters are discriminated before the recognition results are output. With the parameters involved properly selected, encouraging experimental results on test sets demonstrate the validity of the proposed approach.
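
The abstract names MQDF as the final classifier but does not give its form or parameters. As an illustration only, the following is a minimal Python sketch of the commonly used MQDF formulation, in which each class keeps its k leading covariance eigenvectors and replaces the remaining eigenvalues with a constant delta. It assumes NumPy, feature vectors that have already been reduced (for example by LDA), more than one feature dimension, and k smaller than that dimension. The function names, the choice of delta as the mean of the discarded eigenvalues, and the synthetic data are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np


def fit_mqdf(X, y, k):
    """Estimate per-class MQDF parameters.

    X: (n_samples, d) feature matrix, y: (n_samples,) class labels,
    k: number of principal eigenvectors kept per class (assumes 1 <= k < d).
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        # eigh returns eigenvalues in ascending order; flip to descending.
        eigvals, eigvecs = np.linalg.eigh(cov)
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
        # Assumption: the minor eigenvalues are replaced by their mean.
        delta = max(eigvals[k:].mean(), 1e-6)
        lam = np.maximum(eigvals[:k], 1e-12)
        params[label] = (mean, lam, eigvecs[:, :k], delta)
    return params


def mqdf_distance(x, mean, lam, phi, delta):
    """MQDF value of x for one class; smaller means a better match."""
    d, k = x.shape[0], lam.shape[0]
    diff = x - mean
    proj = phi.T @ diff                    # coordinates in the principal subspace
    residual = diff @ diff - proj @ proj   # squared distance outside that subspace
    return (np.sum(proj ** 2 / lam) + residual / delta
            + np.sum(np.log(lam)) + (d - k) * np.log(delta))


def predict(params, x):
    """Return the label whose MQDF value is smallest for x."""
    return min(params, key=lambda label: mqdf_distance(x, *params[label]))


# Synthetic two-class data standing in for reduced character features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),
               rng.normal(6.0, 1.0, size=(200, 4))])
y = np.array([0] * 200 + [1] * 200)
params = fit_mqdf(X, y, k=2)
label_near_origin = predict(params, np.zeros(4))
label_near_six = predict(params, np.full(4, 6.0))
```

In a pipeline like the one described, a classifier of this kind would presumably be trained separately within each pre-classified subset, which keeps the number of candidate classes per decision small; the abstract does not state how the subsets and the classifier are combined.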

Paper Details

Date Published: 17 January 2005
PDF: 9 pages
Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); doi: 10.1117/12.586491
Hua Wang, Tsinghua Univ. (China)
Xiaoqing Ding, Tsinghua Univ. (China)
Jianming Jin, Tsinghua Univ. (China)
M. Halmurat, Xinjiang Univ. (China)


Published in SPIE Proceedings Vol. 5676:
Document Recognition and Retrieval XII
Elisa H. Barney Smith; Kazem Taghva, Editor(s)
