Proceedings Paper

Automated identification of biomedical article type using support vector machines
Author(s): In Cheol Kim; Daniel X. Le; George R. Thoma

Paper Abstract

Authors of short papers such as letters or editorials often express complementary opinions, and sometimes contradictory ones, on related work in previously published articles. The MEDLINE® citations for such short papers are required to list bibliographic data on these "commented on" articles in a "CON" field. The challenge is to automatically identify the CON articles referred to by the author of the short paper (called a "Comment-in" or CIN paper). Our approach is to use support vector machines (SVMs) to first classify a paper as either a CIN paper or a regular full-length article (which is exempt from this requirement), and then to extract from the CIN paper the bibliographic data of the CON articles. A solution to the first part of the problem, identifying CIN articles, is addressed here. We implement and compare the performance of two types of SVM, one with a linear kernel function and the other with a radial basis function (RBF) kernel. Input feature vectors for the SVMs are created by combining four types of features based on statistics of words in the article title, words that suggest the article type (letter, correspondence, editorial), size of body text, and cue phrases. Experiments conducted on a set of online biomedical articles show that the SVM with a linear kernel function yields a significantly lower false negative error rate than the one with an RBF kernel. Our experiments also show that the SVM with a linear kernel function achieves a significantly higher level of accuracy, and lower false positive and false negative error rates, when its input feature vectors combine all four types of features rather than using any single type.
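To make the feature combination and the two kernels concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not list the actual vocabularies, scaling, or kernel parameters, so the word lists (`TITLE_WORDS`, `CUE_PHRASES`), the log scaling of body size, the function names, and the `gamma` value are all assumptions made for illustration. Only the three article-type words come from the abstract.

```python
import math
import re

# Article-type words named in the abstract.
TYPE_WORDS = ("letter", "correspondence", "editorial")
# Hypothetical vocabularies: the abstract does not give the real ones.
TITLE_WORDS = ("reply", "comment", "re")
CUE_PHRASES = ("in response to", "we read with interest", "the article by")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_feature_vector(title, type_label, body):
    """Concatenate the four feature groups into one input vector."""
    title_tokens = tokenize(title)
    # 1. Title word statistics: counts of assumed title vocabulary words.
    title_feats = [float(title_tokens.count(w)) for w in TITLE_WORDS]
    # 2. Article-type words: 1.0 if the word appears in the type label.
    label_tokens = tokenize(type_label)
    type_feats = [1.0 if w in label_tokens else 0.0 for w in TYPE_WORDS]
    # 3. Body text size: log-scaled token count (the scaling is an assumption).
    size_feat = [math.log1p(len(tokenize(body)))]
    # 4. Cue phrases: occurrence counts of assumed phrases in the body.
    body_lower = body.lower()
    cue_feats = [float(body_lower.count(p)) for p in CUE_PHRASES]
    return title_feats + type_feats + size_feat + cue_feats


def linear_kernel(x, y):
    """Linear kernel: the dot product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2); gamma here is arbitrary."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


vector = build_feature_vector(
    "Reply to Smith",
    "Letter",
    "We read with interest the article by Smith.",
)
```

In this sketch each feature group occupies a fixed slice of the vector, so a single-type baseline like the one the abstract compares against could be formed by keeping only one slice. An SVM trained on such vectors would use one of the two kernel functions to measure similarity between articles.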

Paper Details

Date Published: 24 January 2011
PDF: 9 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 787403 (24 January 2011); doi: 10.1117/12.873023
In Cheol Kim, National Library of Medicine (United States)
Daniel X. Le, National Library of Medicine (United States)
George R. Thoma, National Library of Medicine (United States)


Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Gady Agam; Christian Viard-Gaudin, Editor(s)

© SPIE