Proceedings Paper

Exploring a hybrid of support vector machines (SVMs) and a heuristic-based system in classifying web pages
Author(s): Ahmad Rahman; Yuliya Tarnikova; Hassan Alam

Paper Abstract

Due to the proliferation of devices used to browse the web and the shift toward accessing documents through web interfaces, it is becoming very important to classify web pages into pre-selected types. This classification often forms the pre-processing stage of web applications. However, classifying web pages is known to be a difficult problem: it is inherently difficult to identify features of web pages that are distinctive, and it is therefore equally difficult to devise a set of heuristics that accomplishes the task. This paper describes a solution that combines a heuristic-based system with a Support Vector Machine (SVM). The hybrid system is found to achieve very high accuracy compared with using SVMs on their own.

Paper Details

Date Published: 13 January 2003
PDF: 8 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.472836
Author Affiliations
Ahmad Rahman, BCL Technologies Inc. (United States)
Yuliya Tarnikova, BCL Technologies Inc. (United States)
Hassan Alam, BCL Technologies Inc. (United States)

Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editor(s)

© SPIE