Proceedings Paper

Symbolic document image compression based on pattern matching techniques
Author(s): Chwan-Yi Shiah; Yun-Sheng Yen

Paper Abstract

This paper proposes a novel compression algorithm for Chinese document images. First, each document is segmented into readable components such as characters and punctuation marks. Similar patterns within the text are then found by shape context matching and grouped to form a set of prototype symbols. Redundancy in the text is removed by replacing each repeated symbol with its corresponding prototype symbol. To keep the compression visually lossless, a multi-stage symbol clustering procedure groups similar symbols while ensuring that the decompressed image contains no visible error. In the encoding phase, the resulting data streams are encoded by adaptive arithmetic coding. The results show that the average compression ratio is better than that of the international standard JBIG2, and that the compressed form of a document image is suitable for content-based keyword searching.
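
The prototype-replacement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes symbols are already segmented into equal-sized binary bitmaps, it substitutes a simple pixel mismatch ratio for shape context matching, it clusters greedily in a single pass rather than in multiple stages, and it omits the arithmetic coding step. The names `mismatch_ratio`, `build_prototypes`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Tuple

# A segmented symbol: rows of 0/1 pixels (assumed representation).
Bitmap = Tuple[Tuple[int, ...], ...]


def mismatch_ratio(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ; 1.0 when the bitmaps differ in size.

    Stand-in for shape context matching, which the abstract names
    but does not specify in detail.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 1.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def build_prototypes(
    symbols: List[Bitmap], threshold: float = 0.1
) -> Tuple[List[Bitmap], List[int]]:
    """Greedy single-pass clustering (an assumption, not the paper's procedure).

    Returns the prototype list and, for each input symbol, the index of
    the prototype that replaces it.
    """
    prototypes: List[Bitmap] = []
    indices: List[int] = []
    for sym in symbols:
        for i, proto in enumerate(prototypes):
            if mismatch_ratio(sym, proto) <= threshold:
                indices.append(i)  # reuse an existing prototype
                break
        else:
            prototypes.append(sym)  # first occurrence becomes a prototype
            indices.append(len(prototypes) - 1)
    return prototypes, indices
```

In a sketch like this, the compressed representation would consist of the prototype bitmaps plus the index stream (and symbol positions, not modeled here); a lower `threshold` trades compression for fidelity.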

Paper Details

Date Published: 30 September 2011
PDF: 7 pages
Proc. SPIE 8285, International Conference on Graphic and Image Processing (ICGIP 2011), 82851D (30 September 2011); doi: 10.1117/12.913413
Chwan-Yi Shiah, Fo Guang Univ. (Taiwan)
Yun-Sheng Yen, Fo Guang Univ. (Taiwan)

Published in SPIE Proceedings Vol. 8285:
International Conference on Graphic and Image Processing (ICGIP 2011)
Yi Xie; Yanjun Zheng, Editor(s)
