Share Email Print

Proceedings Paper

Use of document structure analysis to retrieve information from documents in digital libraries
Author(s): Debashish Niyogi; Sargur N. Srihari
Format Member Price Non-Member Price
PDF $14.40 $18.00
cover GOOD NEWS! Your organization subscribes to the SPIE Digital Library. You may be able to download this paper for free. Check Access

Paper Abstract

This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized in terms of the type of information that is desired, and are parsed to determine the type of document from which information is desired, the syntactic level of the information desired, and the level of analysis required to extract the information. Using these clauses in the query, a set of salient documents are retrieved, layout analysis and logical structure derivation are performed on the retrieved documents, and the documents are then analyzed in detail to extract the relevant logical components. A 'document browser' application, being developed based on this approach, allows a user to interactively specify queries on the documents in the digital library using a graphical user interface, provides feedback about the candidate documents at each stage of the retrieval process, and allows refinements of the query based on the intermediate results of the search. Results of a query are displayed either as an image or as formatted text.

Paper Details

Date Published: 3 April 1997
PDF: 12 pages
Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270074
Show Author Affiliations
Debashish Niyogi, SUNY/Buffalo (United States)
Sargur N. Srihari, SUNY/Buffalo (United States)

Published in SPIE Proceedings Vol. 3027:
Document Recognition IV
Luc M. Vincent; Jonathan J. Hull, Editor(s)

© SPIE. Terms of Use
Back to Top