Proceedings Paper

Web entity extraction based on entity attribute classification
Author(s): Chuan-Xi Li; Peng Chen; Ru-Jing Wang; Ya-Ru Su
Paper Abstract

Large amounts of entity data are continuously published on web pages, and extracting these entities automatically for further applications is of considerable value. Rule-based entity extraction methods yield promising results; however, they are labor-intensive and hard to scale. This paper proposes a web entity extraction method based on entity attribute classification, which avoids manual annotation of samples. First, web pages are segmented into blocks by the Vision-based Page Segmentation (VIPS) algorithm, and a binary classifier built with LibSVM is trained to retrieve the candidate blocks that contain entity content. Second, the candidate blocks are partitioned into candidate items, LibSVM classifiers annotate the attributes of the items, and the annotation results are aggregated into an entity. Results show that the proposed method performs well at extracting agricultural supply and demand entities from web pages.
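The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block segmentation (VIPS) and the LibSVM classifiers are replaced by hypothetical keyword rules so that the example runs without external dependencies, and all function names, attribute labels, and sample data are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-ins: the paper uses VIPS for segmentation and LibSVM
# classifiers. Here blocks are given as plain strings and the classifiers
# are simple keyword rules, purely to show the shape of the pipeline.

def is_entity_block(block: str) -> bool:
    """Stage 1 stand-in for the binary block classifier."""
    return ":" in block  # assumed cue: "label: value" style text

def split_items(block: str) -> List[str]:
    """Partition a candidate block into candidate items (assumed: one per line)."""
    return [line.strip() for line in block.splitlines() if line.strip()]

def classify_attribute(item: str) -> str:
    """Stage 2 stand-in for the attribute classifiers."""
    lowered = item.lower()
    if "price" in lowered:
        return "price"
    if "tel" in lowered or "phone" in lowered:
        return "contact"
    if "product" in lowered:
        return "product"
    return "other"

def extract_entities(blocks: List[str]) -> List[Dict[str, str]]:
    """Filter blocks, annotate their items, and aggregate labels into entities."""
    entities = []
    for block in blocks:
        if not is_entity_block(block):
            continue  # block judged not to contain entity content
        entity: Dict[str, str] = {}
        for item in split_items(block):
            label = classify_attribute(item)
            if label != "other":
                value = item.split(":", 1)[-1].strip()
                entity.setdefault(label, value)  # keep first value per attribute
        if entity:
            entities.append(entity)
    return entities

# Made-up sample blocks: a navigation bar and a supply listing.
blocks = [
    "Home | About | Contact us",
    "Product: wheat seed\nPrice: 3.2 yuan/kg\nTel: 000-0000",
]
entities = extract_entities(blocks)
```

In this sketch the navigation block is discarded in the first stage, and the listing block yields one entity with product, price, and contact attributes. A real system would swap the keyword rules for trained classifiers over block and item features.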

Paper Details

Date Published: 13 January 2012
PDF: 6 pages
Proc. SPIE 8350, Fourth International Conference on Machine Vision (ICMV 2011): Computer Vision and Image Analysis; Pattern Recognition and Basic Technologies, 835014 (13 January 2012); doi: 10.1117/12.920237
Author Affiliations
Chuan-Xi Li, Institute of Intelligent Machines (China); Univ. of Science and Technology of China (China)
Peng Chen, Institute of Intelligent Machines (China); Univ. of Science and Technology of China (China)
Ru-Jing Wang, Institute of Intelligent Machines (China); Univ. of Science and Technology of China (China)
Ya-Ru Su, Institute of Intelligent Machines (China); Univ. of Science and Technology of China (China)


Published in SPIE Proceedings Vol. 8350:
Fourth International Conference on Machine Vision (ICMV 2011): Computer Vision and Image Analysis; Pattern Recognition and Basic Technologies
Safaa S. Mahmoud; Zhu Zeng; Yuting Li, Editor(s)
