Proceedings Paper

Text-image alignment for historical handwritten documents
Author(s): S. Zinger; J. Nerbonne; L. Schomaker

Paper Abstract

We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim to align images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines that detect words in an image of handwritten text, we need a training set: images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines with their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting serves as a baseline. We then show that relative lengths, i.e., the proportions of words in their lines, can be used to improve the alignment results considerably. To take the relative word length into account, we define expressions for the cost function that has to be minimized when aligning text words with their images. We apply right-to-left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows that 69% of the words from 100 lines are aligned correctly, rising to 90% when partially correct alignments are counted together with correct ones.
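The two strategies named in the abstract, the longest-gap baseline and the exhaustive search over a relative-length cost, can be illustrated with a minimal sketch. The abstract does not give the paper's cost expressions, so the cost below (sum of absolute differences between each segment's share of the line width and each word's share of the transcription's characters) is an assumption made for illustration only. The function names, the gap representation as `(start, end)` pixel intervals, and the sample values are likewise hypothetical.

```python
from itertools import combinations


def relative_lengths(words):
    """Share of the line's characters taken by each transcribed word."""
    total = sum(len(w) for w in words)
    return [len(w) / total for w in words]


def segments_from_cuts(line_start, line_end, cuts):
    """Turn sorted cut positions into consecutive (start, end) segments."""
    bounds = [line_start, *cuts, line_end]
    return list(zip(bounds[:-1], bounds[1:]))


def baseline_cuts(gaps, n_words):
    """Baseline: cut at the centres of the n_words - 1 widest gaps."""
    widest = sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)[:n_words - 1]
    return sorted((a + b) / 2 for a, b in widest)


def cost(segments, words):
    """Assumed cost (not the paper's): mismatch between the relative
    width of each image segment and the relative length of its word."""
    line_width = segments[-1][1] - segments[0][0]
    targets = relative_lengths(words)
    return sum(abs((end - start) / line_width - t)
               for (start, end), t in zip(segments, targets))


def exhaustive_cuts(line_start, line_end, gaps, words):
    """Try every choice of len(words) - 1 gap centres and keep the
    cheapest one. Assumes at least len(words) - 1 candidate gaps."""
    centres = sorted((a + b) / 2 for a, b in gaps)
    best = min(
        combinations(centres, len(words) - 1),
        key=lambda cuts: cost(
            segments_from_cuts(line_start, line_end, cuts), words),
    )
    return list(best)


# Hypothetical line of 100 pixels with two candidate gaps: the true word
# gap (15, 20) and a wider gap (60, 68) inside the second word.
gaps = [(15, 20), (60, 68)]
words = ["in", "Amsterdam"]
print(baseline_cuts(gaps, len(words)))         # cut at the widest gap
print(exhaustive_cuts(0, 100, gaps, words))    # cut chosen by relative length
```

In this constructed case the baseline cuts inside the long word because that gap happens to be the widest, while the relative-length cost prefers the cut that leaves a short first segment for the short first word. The search is exponential in the number of gaps, which is tolerable for single handwritten lines but would need pruning for longer inputs.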

Paper Details

Date Published: 19 January 2009
PDF: 8 pages
Proc. SPIE 7247, Document Recognition and Retrieval XVI, 724703 (19 January 2009); doi: 10.1117/12.805511
Author Affiliations
S. Zinger, Eindhoven Univ. of Technology (Netherlands)
J. Nerbonne, Univ. of Groningen (Netherlands)
L. Schomaker, Univ. of Groningen (Netherlands)


Published in SPIE Proceedings Vol. 7247:
Document Recognition and Retrieval XVI
Kathrin Berkner; Laurence Likforman-Sulem, Editor(s)

© SPIE