
Proceedings Paper
Rule-based versus training-based extraction of index terms from business documents: how to combine the results
Paper Abstract
Current systems for automatic extraction of index terms from business documents take either a rule-based
or a training-based approach. As each approach has its own advantages and disadvantages, it seems natural to
combine the two to get the best of both worlds. We present a combination method consisting of the steps selection,
normalization, and combination, based on comparable scores produced during extraction. Furthermore, novel
evaluation metrics are developed to support the assessment of each step in an existing extraction system. Our
methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000
scanned business documents.
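The abstract names three steps (selection, normalization, combination) but does not spell out how each one works. The following is only a minimal sketch of how such a pipeline could look, not the authors' method. Everything in it is an assumption made for illustration: the threshold-based selection, the min-max normalization, the summing of scores per candidate value, the function names, the score ranges, and the sample invoice numbers.

```python
from collections import defaultdict


def select(candidates, min_score):
    """Selection (assumed rule): keep candidates whose raw score reaches min_score."""
    return [(value, score) for value, score in candidates if score >= min_score]


def normalize(candidates, lo, hi):
    """Normalization (assumed rule): min-max scale raw scores from [lo, hi] to [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [(value, (score - lo) / (hi - lo)) for value, score in candidates]


def combine(per_extractor):
    """Combination (assumed rule): sum normalized scores per value, return the best pair."""
    totals = defaultdict(float)
    for candidates in per_extractor:
        for value, score in candidates:
            totals[value] += score
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical output of three extractors for the index term "invoice number".
# Each extractor reports scores on its own scale, so they are not comparable yet.
extractors = [
    # (candidates, selection threshold, score range)
    ([("INV-1001", 0.5), ("INV-1007", 0.25)], 0.3, (0.0, 1.0)),     # rule-based
    ([("INV-1007", 75.0), ("INV-1001", 50.0)], 40.0, (0.0, 100.0)),  # training-based A
    ([("INV-1007", 5.0)], 2.0, (0.0, 10.0)),                         # training-based B
]

comparable = [
    normalize(select(candidates, threshold), lo, hi)
    for candidates, threshold, (lo, hi) in extractors
]
best = combine(comparable)
print(best)
```

In this sketch the rule-based extractor prefers one candidate, but the two training-based extractors together outweigh it once all scores share a common scale. Other combination rules, such as weighted sums or taking the maximum, would fit the same structure.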
Paper Details
Date Published: 4 February 2013
PDF: 10 pages
Proc. SPIE 8658, Document Recognition and Retrieval XX, 865813 (4 February 2013); doi: 10.1117/12.2002509
Published in SPIE Proceedings Vol. 8658:
Document Recognition and Retrieval XX
Richard Zanibbi; Bertrand Coüasnon, Editor(s)
Klemens Muthmann, TU Dresden (Germany)
Daniel Esser, TU Dresden (Germany)
