
Proceedings Paper

Digitizing physical documents using optical character recognition
Author(s): Abhinav Kaushal Keshari; Rajat Sharma; Madhav J. Nigam

Paper Abstract

The need to convert printed text into an editable digital form has grown rapidly in recent years, and Optical Character Recognition (OCR) meets this need. The challenge is to develop a character recognition mechanism that converts scanned images into an electronic form in which the text can be reused and every line and word of the document is accessible. This paper analyzes the architecture and methods that Tesseract uses for text recognition in OCR. It then extends them to an application that converts large collections of printed documents, such as magazines, books, and newspapers, into an editable electronic format. The resulting application system makes the digitization of physical documents faster and more accurate.
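The abstract stresses access to every line and word of a recognized document. The sketch below is a minimal illustration under stated assumptions and not the authors' implementation. It assumes Tesseract's tab-separated output, as written by `tesseract page.png out tsv`, in which level-5 rows are individual words tagged with page, block, paragraph, and line numbers. It groups those word rows back into text lines and can optionally drop low-confidence words. The function name `lines_from_tesseract_tsv` and the `min_conf` threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def lines_from_tesseract_tsv(tsv: str, min_conf: float = 0.0) -> List[str]:
    """Group word rows (level 5) of Tesseract TSV output into text lines.

    Hypothetical helper: assumes the standard TSV header
    (level, page_num, block_num, par_num, line_num, word_num,
    left, top, width, height, conf, text).
    """
    rows = [r for r in tsv.splitlines() if r.strip()]
    header = rows[0].split("\t")
    # dicts keep insertion order, so lines come out in Tesseract's reading order
    lines: Dict[Tuple[int, int, int, int], List[str]] = {}
    for row in rows[1:]:
        fields = row.split("\t")
        # pad rows whose empty trailing text field was trimmed
        fields += [""] * (len(header) - len(fields))
        rec = dict(zip(header, fields))
        if rec["level"] != "5":  # only word-level rows carry recognized text
            continue
        text = rec["text"].strip()
        if not text or float(rec["conf"]) < min_conf:
            continue  # skip empty or low-confidence words
        key = (int(rec["page_num"]), int(rec["block_num"]),
               int(rec["par_num"]), int(rec["line_num"]))
        lines.setdefault(key, []).append(text)
    return [" ".join(words) for words in lines.values()]
```

Keying on the (page, block, paragraph, line) tuple instead of on pixel coordinates relies on Tesseract's own layout analysis. The same grouping can therefore serve both a plain-text export and a line-by-line editing view.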

Paper Details

Date Published: 17 April 2019
PDF: 5 pages
Proc. SPIE 11071, Tenth International Conference on Signal Processing Systems, 110710H (17 April 2019); doi: 10.1117/12.2516743
Author Affiliations
Abhinav Kaushal Keshari, Indian Institute of Technology Roorkee (India)
Rajat Sharma, Indian Institute of Technology Roorkee (India)
Madhav J. Nigam, Indian Institute of Technology Roorkee (India)

Published in SPIE Proceedings Vol. 11071:
Tenth International Conference on Signal Processing Systems
Kezhi Mao; Xudong Jiang, Editor(s)

© SPIE.