Proceedings Paper

Rule-based versus training-based extraction of index terms from business documents: how to combine the results
Author(s): Daniel Schuster; Marcel Hanke; Klemens Muthmann; Daniel Esser

Paper Abstract

Current systems for automatic extraction of index terms from business documents take either a rule-based or a training-based approach. As both approaches have their advantages and disadvantages, it seems natural to combine them to get the best of both worlds. We present a combination method with the steps selection, normalization, and combination, based on comparable scores produced during extraction. Furthermore, novel evaluation metrics are developed to support the assessment of each step in an existing extraction system. Our methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000 scanned business documents.

Paper Details

Date Published: 4 February 2013
PDF: 10 pages
Proc. SPIE 8658, Document Recognition and Retrieval XX, 865813 (4 February 2013); doi: 10.1117/12.2002509
Author Affiliations:
Daniel Schuster, TU Dresden (Germany)
Marcel Hanke, TU Dresden (Germany)
Klemens Muthmann, TU Dresden (Germany)
Daniel Esser, TU Dresden (Germany)

Published in SPIE Proceedings Vol. 8658:
Document Recognition and Retrieval XX
Richard Zanibbi; Bertrand Coüasnon, Editor(s)
