Proceedings Paper

Document flow segmentation for business applications
Paper Abstract

The aim of this paper is to propose a supervised segmentation approach for document flows, applied to real-world heterogeneous documents. Our algorithm treats the document flow as a sequence of pairs of consecutive pages and studies the relationship between the two pages of each pair. First, sets of features are extracted from the pages, and we propose an approach that models each pair of pages as a single feature vector. This representation is provided to a binary classifier, which labels the relationship as either segmentation or continuity. In the case of segmentation, the current document is considered complete, and the analysis of the flow continues by starting a new document. In the case of continuity, both pages are assigned to the same document and the analysis continues along the flow. If it is uncertain whether the relationship between the two pages should be classified as continuity or segmentation, the pair is rejected and the pages analyzed up to that point are considered a "fragment". This first classification stage already yields results approaching 90% on certain documents, which is high for this early stage of the system.
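The control flow described in the abstract (classify each pair of consecutive pages, then close a document, extend it, or reject and emit a fragment) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the pair representation (`pair_features`, concatenating both page vectors with their element-wise absolute difference) is a hypothetical stand-in for the representation the paper proposes, the classifier is any callable returning a probability of segmentation, and the rejection band (`reject_low`, `reject_high`) and the rule that the second page of a rejected pair starts a new unit are assumptions. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = List[float]


def pair_features(left: Vector, right: Vector) -> Vector:
    """Hypothetical pair representation: left page, right page, |left - right|."""
    if len(left) != len(right):
        raise ValueError("page vectors must have the same length")
    diff = [abs(a - b) for a, b in zip(left, right)]
    return list(left) + list(right) + diff


@dataclass
class Segment:
    pages: List[int] = field(default_factory=list)  # page indices in the flow
    kind: str = "document"  # "document" or "fragment"


def segment_flow(
    pages: Sequence[Vector],
    p_segmentation: Callable[[Vector], float],
    reject_low: float = 0.4,   # assumed lower bound of the uncertainty band
    reject_high: float = 0.6,  # assumed upper bound of the uncertainty band
) -> List[Segment]:
    """Walk the flow pair by pair and split it into documents and fragments."""
    segments: List[Segment] = []
    if not pages:
        return segments
    current = Segment(pages=[0])
    for i in range(len(pages) - 1):
        p = p_segmentation(pair_features(pages[i], pages[i + 1]))
        if reject_low <= p <= reject_high:
            # Uncertain pair: pages analyzed so far become a fragment.
            current.kind = "fragment"
            segments.append(current)
            current = Segment(pages=[i + 1])
        elif p > reject_high:
            # Segmentation: close the current document, start a new one.
            segments.append(current)
            current = Segment(pages=[i + 1])
        else:
            # Continuity: the next page belongs to the current document.
            current.pages.append(i + 1)
    segments.append(current)
    return segments


def toy_scorer(v: Vector) -> float:
    """Toy classifier: returns the first feature of the right-hand page."""
    return v[len(v) // 3]


if __name__ == "__main__":
    flow = [[0.9], [0.1], [0.95], [0.5], [0.2]]  # one made-up feature per page
    result = segment_flow(flow, toy_scorer)
    print([(s.kind, s.pages) for s in result])
    # → [('document', [0, 1]), ('fragment', [2]), ('document', [3, 4])]
```

Keeping the classifier as a plain callable separates the flow logic from the model; in practice `p_segmentation` would wrap whatever trained binary classifier produces the segmentation probability.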

Paper Details

Date Published: 24 March 2014
PDF: 11 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210G (24 March 2014); doi: 10.1117/12.2043141
Hani Daher, LORIA (France)
Abdel Belaïd, LORIA, Univ. de Lorraine (France)

Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editors