Optical Engineering

Document page segmentation based on pattern spread analysis
Author(s): Phillip E. Mitchell; Hong Yan

Paper Abstract

This paper introduces an algorithm designed to segment black-and-white documents for the purpose of compression. A single document is segmented into two documents suitable for pattern-based and run-length-based compression. With some modification the same algorithm may also be used for optical character recognition. The segmentation is performed in two main steps: pattern extraction and classification. Patterns are extracted using a fast scan method that does not need to scan every pixel, and classification uses pattern characteristics such as spread and pattern context to segment the patterns. Documents may be segmented with an accuracy of at least 98%, depending on the content. Furthermore, text of any size and orientation may be successfully classified without the need for skew estimation or correction. This paper presents the segmentation algorithm and discusses the complete compression system.
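The two steps named in the abstract, pattern extraction and classification, can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define the fast scan or the spread measure, so everything below is an assumption made for illustration. Here the scan starts searches only on every `step`-th row (so patterns shorter than `step` that fall between sampled rows are missed), a pattern is taken to be an 8-connected group of black pixels, and "spread" is assumed to be the larger side of a pattern's bounding box. The names `extract_patterns`, `spread`, `classify`, and the `max_spread` threshold are hypothetical.

```python
from collections import deque


def extract_patterns(img, step=2):
    """Collect 8-connected black patterns from a binary image (1 = black).

    Assumption: searches start only on every `step`-th row, as a stand-in
    for a scan that does not visit every pixel.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    patterns = []
    for y in range(0, h, step):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill the whole pattern that this black pixel belongs to.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            patterns.append(pixels)
    return patterns


def spread(pixels):
    """Hypothetical spread measure: the larger side of the bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)


def classify(pixels, max_spread=4):
    """Assumed rule: compact patterns go to pattern-based compression,
    widely spread ones go to run-length-based compression."""
    return "pattern" if spread(pixels) <= max_spread else "run_length"
```

Calling `extract_patterns` on a small binary image and mapping `classify` over the result splits the patterns into the two output groups. A real system would also need the pattern context mentioned in the abstract, which this sketch leaves out.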

Paper Details

Date Published: 1 March 2000
PDF: 11 pages
Opt. Eng. 39(3) doi: 10.1117/1.602419
Published in: Optical Engineering Volume 39, Issue 3
Author Affiliations:
Phillip E. Mitchell, Univ. of Sydney (Australia)
Hong Yan, Univ. of Sydney (Australia)

© SPIE