
Proceedings Paper
Document flow segmentation for business applications
Paper Abstract
This paper proposes a supervised approach to document flow segmentation, applied to real-world
heterogeneous documents. The algorithm treats the flow as a sequence of pairs of consecutive pages and studies the
relationship between the two pages of each pair. First, sets of features are extracted from the pages, and each pair of
pages is modeled as a single feature vector. This vector is passed to a binary classifier, which labels the relationship
as either segmentation or continuity. In the case of segmentation, the current document is considered complete and the
analysis continues by starting a new document. In the case of continuity, the two pages are assigned to the same
document and the analysis continues along the flow. If it is uncertain whether the relationship should be classified as
continuity or segmentation, the pair is rejected and the pages analyzed up to that point are considered a "fragment".
This first classification stage already gives good results, approaching 90% on certain documents, which is high for this
stage of the system.
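The pairwise decision loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the two thresholds (`low`, `high`), the toy scoring rule, the choice to start a new group after a rejection, and the label given to the last group are all assumptions made for illustration. In a real system, `pair_score` would wrap a trained binary classifier applied to the feature vector of the page pair.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[List[str], str]  # (pages, "document" or "fragment")


def segment_flow(
    pages: Sequence[str],
    pair_score: Callable[[str, str], float],
    low: float = 0.3,   # hypothetical threshold: at or below means continuity
    high: float = 0.7,  # hypothetical threshold: at or above means segmentation
) -> List[Segment]:
    """Split a page flow by scoring each pair of consecutive pages.

    pair_score(prev, nxt) is assumed to return the estimated probability
    that a document boundary lies between the two pages.
    """
    if not pages:
        return []
    segments: List[Segment] = []
    current = [pages[0]]
    for prev, nxt in zip(pages, pages[1:]):
        p = pair_score(prev, nxt)
        if p >= high:
            # Segmentation: the current document is complete.
            segments.append((current, "document"))
            current = [nxt]
        elif p <= low:
            # Continuity: the next page belongs to the same document.
            current.append(nxt)
        else:
            # Rejection: pages seen so far are kept as a "fragment".
            # Assumption: analysis restarts from the next page.
            segments.append((current, "fragment"))
            current = [nxt]
    # Assumption: the end of the flow closes the last group as a document.
    segments.append((current, "document"))
    return segments


def toy_score(prev: str, nxt: str) -> float:
    """Hypothetical stand-in for a trained classifier on page pairs."""
    if not prev or not nxt:
        return 0.5  # missing identifier: uncertain
    return 0.1 if prev == nxt else 0.9


if __name__ == "__main__":
    flow = ["inv-1", "inv-1", "inv-2", "", "inv-3"]
    for pages, status in segment_flow(flow, toy_score):
        print(status, pages)
```

Here each page is reduced to a single identifier string so that the example stays self-contained. The rejection band between `low` and `high` is one simple way to express classifier uncertainty, and other rejection rules are possible.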
Paper Details
Date Published: 24 March 2014
PDF: 11 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210G (24 March 2014); doi: 10.1117/12.2043141
Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editor(s)
Hani Daher, LORIA (France)
Abdel Belaïd, LORIA, Univ. de Lorraine (France)
