Proceedings Paper

Geometrical approach to skew detection for documents containing the Latin/Cyrillic characters
Author(s): Oleg G. Okun

Paper Abstract

Document skew is a distortion that mainly affects the orientation of text lines and arises when paper documents are digitized. Its visible effect is that text lines, which are normally horizontal for scripts such as Latin or Cyrillic, slope with respect to the X-axis. Many available document recognition systems, however, require properly aligned text lines for accurate text segmentation and recognition, so any skew present must be estimated and compensated before further processing. The Hough transform is one of the popular techniques for skew detection. To lower its computational cost, it is usually applied to a small number of representative points of each character or of its bounding box. A problem with this method, however, is that different characters have different heights. As a result, the representative points of characters belonging to the same line often do not fit a straight line well, which often leads to errors in Hough-based skew detection. In this paper, we propose a new algorithm to overcome this problem. It uses only the bounding boxes of the connected components of characters, together with a number of simple tests, to estimate the skew angle.
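To make the baseline concrete, the following is a minimal sketch of Hough-style skew estimation from bounding boxes. It illustrates the conventional technique the abstract refers to, not the algorithm proposed in the paper, whose details are not given here. The function names, the choice of the bottom-edge midpoint as the representative point, and the parameters `max_angle`, `step`, and `rho_bin` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bottom_centers(boxes):
    """One representative point per box: the midpoint of its bottom edge.

    Boxes are (x0, y0, x1, y1) with y1 the bottom edge (assumed convention).
    """
    return [((x0 + x1) / 2.0, y1) for (x0, y0, x1, y1) in boxes]


def estimate_skew(points, max_angle=15.0, step=0.5, rho_bin=2.0):
    """Return the candidate angle (degrees) whose projection is most peaked.

    For each candidate angle, every point votes for the offset rho of the
    line through it at that angle. Points lying on a common line at the
    true skew angle share one rho bin, so the sum of squared bin counts
    is largest there.
    """
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        votes = Counter()
        for x, y in points:
            rho = y * math.cos(t) - x * math.sin(t)
            votes[math.floor(rho / rho_bin)] += 1
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic check: three text lines of equal-height boxes, skewed by 5 degrees.
slope = math.tan(math.radians(5.0))
boxes = [
    (x, c + x * slope - 10.0, x + 12.0, c + x * slope)
    for c in (100.0, 200.0, 300.0)
    for x in range(0, 600, 20)
]
print(estimate_skew(bottom_centers(boxes)))
```

In this synthetic case every box has the same height, so the representative points are exactly collinear. With real text, descenders such as "p" or "y" push some bottom edges below the baseline, which spreads the votes across neighbouring bins; this is the weakness of the baseline that the abstract describes. The sign of the returned angle depends on whether the Y-axis points up or down in the coordinate system used.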

Paper Details

Date Published: 23 September 1999
PDF: 9 pages
Proc. SPIE 3811, Vision Geometry VIII, (23 September 1999); doi: 10.1117/12.364111
Oleg G. Okun, Univ. of Oulu (Finland) and Institute of Engineering Cybernetics (Belarus)

Published in SPIE Proceedings Vol. 3811:
Vision Geometry VIII
Longin Jan Latecki; Robert A. Melter; David M. Mount; Angela Y. Wu, Editor(s)

© SPIE