Proceedings Paper

Semi-automated document image clustering and retrieval
Author(s): Markus Diem; Florian Kleber; Stefan Fiel; Robert Sablatnig

Paper Abstract

This paper presents a semi-automated method for document image clustering and retrieval that creates links between documents based on their content. Ideally, the initial bundling of shuffled document images can be reproduced, allowing large document databases to be explored. Structural and textural features, which describe visual similarity, are extracted and used by experts (e.g. registrars) to interactively cluster the documents with a manually defined feature subset (e.g. checked paper, handwritten). The methods presented allow for the analysis of heterogeneous documents that contain printed and handwritten text, and for hierarchical clustering with different feature subsets in different layers.

Paper Details

Date Published: 24 March 2014
PDF: 10 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210M (24 March 2014); doi: 10.1117/12.2043010
Author Affiliations
Markus Diem, Technische Univ. Wien (Austria)
Florian Kleber, Technische Univ. Wien (Austria)
Stefan Fiel, Technische Univ. Wien (Austria)
Robert Sablatnig, Technische Univ. Wien (Austria)

Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editor(s)

© SPIE