Proceedings Paper

Document page structure learning for fixed-layout e-books using conditional random fields
Author(s): Xin Tao; Zhi Tang; Canhui Xu
Paper Abstract

In this paper, a model is proposed to learn the logical structure of fixed-layout document pages by combining a support vector machine (SVM) with conditional random fields (CRF). Features related to each logical label and to the dependencies between labels are extracted from various original Portable Document Format (PDF) attributes. The proposed model integrates both local evidence and contextual dependencies to achieve better logical labeling performance. With the SVM serving as a local discriminative classifier and the CRF modeling contextual correlations between adjacent fragments, the model is capable of resolving ambiguities among semantic labels. The experimental results show that CRF-based models with both tree and chain graph structures outperform the SVM model, increasing macro-averaged F1 by about 10%.
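
To illustrate how contextual dependencies can override an ambiguous local decision, the following is a minimal sketch of Viterbi decoding on a chain-structured model. It is not the authors' implementation. The label set, the per-fragment local scores (standing in for SVM outputs), and the pairwise transition scores are all hypothetical values chosen for illustration, and the tree-structured variant mentioned in the abstract is not covered.

```python
from typing import Dict, List, Tuple

# Hypothetical local scores for three page fragments (stand-ins for SVM outputs).
# Fragment 1 is locally ambiguous: "caption" scores slightly higher than "body".
UNARY: List[Dict[str, float]] = [
    {"title": 2.0, "body": 0.5, "caption": 0.0},
    {"title": 0.2, "body": 1.0, "caption": 1.1},
    {"title": 0.0, "body": 1.5, "caption": 0.3},
]

# Hypothetical pairwise scores for adjacent labels; unlisted pairs score 0.
PAIRWISE: Dict[Tuple[str, str], float] = {
    ("title", "body"): 1.0,
    ("body", "body"): 0.8,
    ("title", "caption"): -1.0,
    ("caption", "body"): -0.2,
}


def viterbi(unary: List[Dict[str, float]],
            pairwise: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the label sequence maximizing the sum of unary and pairwise scores."""
    labels = list(unary[0].keys())
    best = [dict(unary[0])]                 # best[t][label]: best score ending in label
    back: List[Dict[str, str]] = []         # back-pointers for positions 1..n-1
    for t in range(1, len(unary)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for cur in labels:
            # Pick the previous label that gives the best score when followed by cur.
            prev = max(labels,
                       key=lambda p: best[-1][p] + pairwise.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + pairwise.get((prev, cur), 0.0)
                           + unary[t][cur])
            ptrs[cur] = prev
        best.append(scores)
        back.append(ptrs)
    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


# Labeling each fragment independently versus decoding the whole chain.
local_only = [max(scores, key=scores.get) for scores in UNARY]
decoded = viterbi(UNARY, PAIRWISE)
print(local_only)  # ['title', 'caption', 'body']
print(decoded)     # ['title', 'body', 'body']
```

With these made-up scores, the independent per-fragment decision labels the second fragment as a caption, while chain decoding relabels it as body text because a body fragment following a title is scored as more plausible.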

Paper Details

Date Published: 24 March 2014
PDF: 9 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210I (24 March 2014); doi: 10.1117/12.2039492
Author Affiliations
Xin Tao, Peking Univ. (China)
Zhi Tang, Peking Univ. (China)
State Key Lab. of Digital Publishing Technology (China)
Canhui Xu, Peking Univ. (China)
State Key Lab. of Digital Publishing Technology (China)

Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editor(s)