
Proceedings Paper

Information extraction from semi-structured web page based on DOM tree
Author(s): Wei-Dong Li; Yi-bing Dong

Paper Abstract

To extract information automatically from semi-structured web pages, this paper proposes IESS, a method that discovers the record model using the DOM tree and the Maximal Similar Sub Tree. The method identifies records automatically and correctly even when records of the same type differ in their expression models. The system can also extract information automatically from the result pages of paper-search websites. Experiments on several common paper-search websites show that the system achieves high efficiency and accuracy.
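
The abstract does not spell out the IESS algorithm or how the Maximal Similar Sub Tree is computed, so the following is only a rough sketch of the general idea of finding repeated records in a DOM tree. It is not the authors' method: it swaps in the well-known Simple Tree Matching measure to score how similar two sibling subtrees are, and treats the longest run of adjacent, similar siblings as the record list. All names here (Node, TreeBuilder, matching, similarity, find_records, SAMPLE_HTML) and the 0.6 threshold are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they become leaf nodes.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A DOM element reduced to its tag name and child elements."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def size(self):
        return 1 + sum(child.size() for child in self.children)


class TreeBuilder(HTMLParser):
    """Builds a tag-only tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close the nearest open element with this tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def matching(a, b):
    """Simple Tree Matching: number of node pairs in the best ordered match."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = table[i - 1][j - 1] + matching(a.children[i - 1], b.children[j - 1])
            table[i][j] = max(table[i - 1][j], table[i][j - 1], pair)
    return table[m][n] + 1


def similarity(a, b):
    """Matched nodes normalised by the average subtree size, in [0, 1]."""
    return 2.0 * matching(a, b) / (a.size() + b.size())


def find_records(root, threshold=0.6, min_size=2):
    """Return the longest run of adjacent sibling subtrees that look alike."""
    best = []
    pending = [root]
    while pending:
        node = pending.pop()
        pending.extend(node.children)
        run = []
        for child in node.children:
            if child.size() < min_size:  # skip bare leaves such as <br>
                run = []
                continue
            if run and similarity(run[-1], child) >= threshold:
                run.append(child)
            else:
                run = [child]
            if len(run) > len(best):
                best = list(run)
    return best if len(best) >= 2 else []


# Hypothetical result page: the second record lacks the optional <em> field.
SAMPLE_HTML = """
<html><body><h1>Results</h1>
<ul>
  <li><a>Title A</a><span>Authors</span><em>2009</em></li>
  <li><a>Title B</a><span>Authors</span></li>
  <li><a>Title C</a><span>Authors</span><em>2008</em></li>
</ul>
<p>footer</p></body></html>
"""

records = find_records(parse(SAMPLE_HTML))
```

In this sketch the three list items are grouped together even though one omits a field, which mirrors the abstract's point about tolerating differences between records of the same type. A real system would also need to map the matched nodes to named fields, which this example does not attempt.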

Paper Details

Date Published: 10 July 2009
PDF: 6 pages
Proc. SPIE 7490, PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering, 749015 (10 July 2009); doi: 10.1117/12.837215
Wei-Dong Li, Hebei Univ. of Economics & Business (China)
Yi-bing Dong, Hebei Univ. of Economics & Business (China)


Published in SPIE Proceedings Vol. 7490:
PIAGENG 2009: Intelligent Information, Control, and Communication Technology for Agricultural Engineering
Honghua Tan; Qi Luo, Editor(s)

© SPIE.