Arabic character recognition (proceedings paper)
This paper presents a complete system for learning and recognizing Arabic characters. Arabic OCR faces technical problems not encountered in other languages, such as cursive writing, the overriding and overlapping of characters, multiple shapes per character, and vowel marks placed above and below the characters. The proposed approach relies on the observation that connecting Arabic characters to produce cursive writing tends to form a fictitious baseline. During preprocessing, contour analysis provides both component isolation and baseline location. In the feature extraction phase, each word is processed from right to left to generate a sequence of labels. Each label is drawn from a predetermined codebook that represents all possible bit distributions with respect to the baseline. A segmentation decision is taken at a position that depends on the label context. During training, a model is generated for each character; this model describes the probability of occurrence of each label at each vertical position. During recognition, the probability of the observed label sequence is computed and accumulated. The system has been tested on several typewritten and typeset fonts, as well as diacritized versions of both, and the evaluation results are presented.
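The training and recognition steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: each character model is a table of label probabilities indexed by vertical position (column); label sequences are resampled to a fixed number of positions `N_POS` because the abstract does not say how width variation is handled; probabilities are Laplace-smoothed; and the accumulated score is a sum of log-probabilities. `N_POS`, `CODEBOOK_SIZE`, and all function names are hypothetical, and baseline detection, codebook labeling, and segmentation are not modeled.

```python
import math

N_POS = 8           # hypothetical number of vertical positions per character model
CODEBOOK_SIZE = 16  # hypothetical codebook size (labels are ints in [0, CODEBOOK_SIZE))


def resample(labels, n=N_POS):
    """Nearest-index resampling of a label sequence to exactly n positions.
    Assumption: used here only to give every character model a fixed length."""
    if not labels:
        raise ValueError("empty label sequence")
    return [labels[(i * len(labels)) // n] for i in range(n)]


def train(samples, n=N_POS, k=CODEBOOK_SIZE, alpha=1.0):
    """samples: dict mapping a character name to a list of label sequences.
    Returns, per character, an n x k table of P(label | position),
    smoothed by adding alpha to every count."""
    models = {}
    for char, seqs in samples.items():
        counts = [[alpha] * k for _ in range(n)]
        for seq in seqs:
            for pos, label in enumerate(resample(seq, n)):
                counts[pos][label] += 1
        table = []
        for row in counts:
            total = sum(row)
            table.append([c / total for c in row])
        models[char] = table
    return models


def score(model, labels):
    """Accumulated log-probability of an observed label sequence under one model."""
    n = len(model)
    return sum(math.log(model[pos][label])
               for pos, label in enumerate(resample(labels, n)))


def recognize(models, labels):
    """Return the character whose model yields the highest accumulated score."""
    return max(models, key=lambda ch: score(models[ch], labels))


if __name__ == "__main__":
    # Toy label sequences, invented for illustration (not data from the paper).
    samples = {
        "alef": [[1, 1, 1, 1], [1, 1, 1]],
        "ba": [[3, 4, 4, 4, 5], [3, 4, 4, 5]],
    }
    models = train(samples)
    print(recognize(models, [1, 1, 1, 1, 1]))  # alef
    print(recognize(models, [3, 4, 4, 5, 5]))  # ba
```

Working in log space keeps the accumulated product of many small probabilities from underflowing, and smoothing keeps a label never seen at a given position during training from driving a model's score to negative infinity.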