Proceedings Paper

XML data compression in web publishing
Author(s): Ruiheng Qiu; Wei Hu; Zhi Tang; Xiaoqing Lu; Lei Zhang

Paper Abstract

XML is widely used in document formats on the web, but it has drawbacks: long document distribution times over the web, and long content-jumping and rendering delays, especially on mobile devices. We therefore propose XTrim, a schema-based, efficient, queryable XML compressor that significantly improves the compression ratio by exploiting optimized information from the XML Schema while still supporting efficient queries. First, XTrim extracts structure information from the XML document and its corresponding XML Schema. A novel technique then transforms the tree-like XML structure into a compact indexed form that supports efficient queries. At the same time, the text values are extracted, and a language-based text trim method (LTT), which facilitates the use of language-specific text compressors, is applied to reduce the size of text values in various languages. Within LTT, a word composition detection method is proposed to better process text in non-Latin languages. To evaluate the performance of XTrim, we implemented a prototype compressor and query engine. Extensive experiments show that XTrim outperforms XMill and existing queryable alternatives in both compression ratio and query efficiency. Applying XTrim to documents reduces storage space by up to 30% and cuts the content-jumping and rendering delay from 4 seconds to less than 100 ms.
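
The abstract does not give XTrim's algorithms, so the following is only a minimal Python sketch of the general idea it builds on: separating XML structure from text values so that each can be compressed on its own. It is not XTrim's schema-based method, indexed form, or LTT. All names here (`split_structure_and_text`, `compress_containers`, `rebuild`, `END`) are hypothetical, and the sketch assumes a document with fewer than 256 distinct tags, no attributes, and no mixed content.

```python
import xml.etree.ElementTree as ET
import zlib

END = 0  # hypothetical structure token marking the end of an element


def split_structure_and_text(xml_source):
    """Separate a document into tag-ID structure tokens and text values."""
    tag_ids = {}    # tag name -> small integer ID (IDs start at 1)
    structure = []  # tag IDs and END markers, in document order
    texts = []      # one text value per element, in document order

    def visit(elem):
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids) + 1)
        structure.append(tag_id)
        texts.append((elem.text or "").strip())
        for child in elem:
            visit(child)
        structure.append(END)

    visit(ET.fromstring(xml_source))
    return tag_ids, structure, texts


def compress_containers(structure, texts):
    """Compress the structure and text containers independently with zlib."""
    structure_blob = zlib.compress(bytes(structure))  # assumes IDs < 256
    text_blob = zlib.compress("\x00".join(texts).encode("utf-8"))
    return structure_blob, text_blob


def rebuild(tag_ids, structure, texts):
    """Reconstruct the element tree from the two containers."""
    names = {tag_id: name for name, tag_id in tag_ids.items()}
    text_iter = iter(texts)
    stack, root = [], None
    for token in structure:
        if token == END:
            root = stack.pop()  # the last element popped is the root
        else:
            elem = ET.Element(names[token])
            elem.text = next(text_iter) or None
            if stack:
                stack[-1].append(elem)
            stack.append(elem)
    return root


doc = "<book><title>XML</title><chapter><title>Intro</title></chapter></book>"
tag_ids, structure, texts = split_structure_and_text(doc)
structure_blob, text_blob = compress_containers(structure, texts)
restored = ET.tostring(rebuild(tag_ids, structure, texts), encoding="unicode")
```

In this sketch the structure container holds only small integers, and the text container could be handed to a compressor chosen for the text's language, which is the role the abstract assigns to LTT. A schema-aware compressor could go further by omitting structure that the schema already determines, but how XTrim does so is not described here.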

Paper Details

Date Published: 21 February 2012
PDF: 8 pages
Proc. SPIE 8302, Imaging and Printing in a Web 2.0 World III, 83020I (21 February 2012); doi: 10.1117/12.905400
Author Affiliations
Ruiheng Qiu, Peking Univ. (China)
Peking Univ. Founder Group Co., Ltd. (China)
Wei Hu, Peking Univ. (China)
Zhi Tang, Peking Univ. (China)
Xiaoqing Lu, Peking Univ. (China)
Lei Zhang, State Key Lab. of Digital Publishing Technology (China)


Published in SPIE Proceedings Vol. 8302:
Imaging and Printing in a Web 2.0 World III
Qian Lin; Jan P. Allebach; Zhigang Fan, Editor(s)
