Share Email Print

Proceedings Paper

Automatic generation of Web mining environments
Author(s): Maurizio Cibelli; Gennaro Costagliola
Format Member Price Non-Member Price
PDF $17.00 $21.00

Paper Abstract

The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.

Paper Details

Date Published: 25 February 1999
PDF: 11 pages
Proc. SPIE 3695, Data Mining and Knowledge Discovery: Theory, Tools, and Technology, (25 February 1999); doi: 10.1117/12.339984
Show Author Affiliations
Maurizio Cibelli, Univ. of Salerno (Italy)
Gennaro Costagliola, Univ. of Salerno (Italy)

Published in SPIE Proceedings Vol. 3695:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology
Belur V. Dasarathy, Editor(s)

© SPIE. Terms of Use
Back to Top
Sign in to read the full article
Create a free SPIE account to get access to
premium articles and original research
Forgot your username?