Proceedings Paper

Form classification and retrieval using bag of words with shape features of line structures
Author(s): Florian Kleber; Markus Diem; Robert Sablatnig

Paper Abstract

This paper proposes a document form classification and retrieval method that combines a Bag of Words (BoW) model with newly introduced local shape features of form lines. In a preprocessing step, the document is binarized and its form lines (solid and dotted) are detected. The shape features are derived from the detected lines and describe local line structures, e.g., line endings, crossings, and boxes. The dominant line structures form a vocabulary for each form class. Based on this vocabulary, an occurrence histogram of the line structures in a form document is computed and used for classification and retrieval. The proposed method was evaluated on a set of 489 documents from 9 different form classes.
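The vocabulary, histogram, and matching steps summarized above can be illustrated with a minimal sketch. It assumes that binarization, line detection, and shape-feature extraction have already reduced each document to a list of discrete structure labels. The labels, the `top_k` vocabulary size per class, L1 normalization, histogram intersection, and 1-nearest-neighbour matching are illustrative assumptions, not the authors' published choices.

```python
from collections import Counter

# Assumption: each document is a list of quantized local line-structure labels
# (hypothetical tokens such as "ending", "crossing", "box"); the paper's actual
# shape descriptors and their quantization are not reproduced here.

def build_vocabulary(docs_by_class, top_k=3):
    """Union of the top_k most frequent structures of each form class."""
    vocab = set()
    for docs in docs_by_class.values():
        counts = Counter(tok for doc in docs for tok in doc)
        vocab.update(tok for tok, _ in counts.most_common(top_k))
    return sorted(vocab)

def histogram(doc, vocab):
    """L1-normalized occurrence histogram of vocabulary structures in doc."""
    counts = Counter(tok for tok in doc if tok in vocab)
    total = sum(counts.values()) or 1  # avoid division by zero for empty docs
    return [counts[tok] / total for tok in vocab]

def intersection(h1, h2):
    """Histogram intersection similarity (assumed measure, 1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def retrieve(query, gallery, vocab):
    """Rank (label, doc) gallery entries by similarity to the query document."""
    hq = histogram(query, vocab)
    scored = [(intersection(hq, histogram(doc, vocab)), label)
              for label, doc in gallery]
    return sorted(scored, key=lambda s: s[0], reverse=True)

def classify(query, gallery, vocab):
    """Assign the label of the most similar gallery document (1-NN)."""
    return retrieve(query, gallery, vocab)[0][1]

# Toy usage with hypothetical training documents for two form classes.
train = {
    "invoice": [["box", "box", "crossing", "ending"], ["box", "box", "ending"]],
    "survey": [["dotted", "dotted", "ending", "crossing"], ["dotted", "crossing"]],
}
vocab = build_vocabulary(train, top_k=3)
gallery = [(label, doc) for label, docs in train.items() for doc in docs]
print(classify(["box", "ending", "box"], gallery, vocab))  # → invoice
```

Because each class contributes only its most frequent structures, the shared vocabulary stays small, and the histogram of a document over that vocabulary acts as a compact signature that can be compared either for classification (best match) or retrieval (full ranking).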

Paper Details

Date Published: 24 March 2014
PDF: 9 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 902107 (24 March 2014); doi: 10.1117/12.2037210
Author Affiliations:
Florian Kleber, Technische Univ. Wien (Austria)
Markus Diem, Technische Univ. Wien (Austria)
Robert Sablatnig, Technische Univ. Wien (Austria)

Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editor(s)

© SPIE.