
Proceedings Paper

A unified approach for development of Urdu Corpus for OCR and demographic purpose
Author(s): Prakash Choudhary; Neeta Nain; Mushtaq Ahmed

Paper Abstract

This paper presents a methodology for developing an Urdu handwritten text image corpus and for applying corpus linguistics to OCR and information retrieval from handwritten documents. Compared with other scripts, Urdu is somewhat complicated for data entry: entering a single character requires a combination of multiple keystrokes. A mixed approach is proposed and demonstrated for building an Urdu corpus for OCR and demographic data collection. The demographic part of the database could be used to train a system to extract data automatically, which would help simplify the manual data-processing tasks currently involved in data collection from input forms such as Passport, Ration Card, Voting Card, AADHAR, Driving Licence, Indian Railway Reservation, and Census forms. This would increase the participation of the Urdu language community in understanding and benefiting from Government schemes. To make the database available and applicable across a wide area of corpus linguistics, we propose a methodology for data collection, mark-up, digital transcription, and XML metadata information for benchmarking.

Paper Details

Date Published: 14 February 2015
PDF: 5 pages
Proc. SPIE 9445, Seventh International Conference on Machine Vision (ICMV 2014), 944526 (14 February 2015); doi: 10.1117/12.2180903
Author Affiliations
Prakash Choudhary, National Institute of Technology (India)
Neeta Nain, Malaviya National Institute of Technology (India)
Mushtaq Ahmed, Malaviya National Institute of Technology (India)

Published in SPIE Proceedings Vol. 9445:
Seventh International Conference on Machine Vision (ICMV 2014)
Antanas Verikas; Branislav Vuksanovic; Petia Radeva; Jianhong Zhou, Editor(s)

© SPIE.