
Proceedings Paper

Naïve Bayes and SVM classifiers for classifying databank accession number sentences from online biomedical articles
Author(s): Jongwoo Kim; Daniel X. Le; George R. Thoma

Paper Abstract

This paper describes two classifiers, Naïve Bayes and Support Vector Machine (SVM), that identify sentences containing Databank Accession Numbers, a key piece of bibliographic information, in online biomedical articles. Correctly identifying these sentences is necessary for the subsequent extraction of the numbers. The classifiers use the words that occur most frequently in sentences as classification features. Twelve sets of word features are collected to train and test the classifiers, with set sizes ranging from 100 to 1,200 word features. The performance of each classifier is evaluated using four measures: Precision, Recall, F-Measure, and Accuracy. The Naïve Bayes classifier scores above 93.91% on all four measures at 200 word features. The SVM achieves 98.80% Precision at 200 word features, 94.90% Recall at 500 and 700, 96.46% F-Measure at 200, and 99.14% Accuracy at 200 and 400. To improve classification performance, we propose two merging operators, Max and Harmonic Mean, to combine the results of the two classifiers. The final results show a measurable improvement in Recall, F-Measure, and Accuracy rates.
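As an illustration of the four evaluation measures and the two merging operators named in the abstract, the following is a minimal Python sketch. The abstract does not say how the operators are applied, so this sketch assumes each classifier emits a per-sentence score in [0, 1] and that the merged score is thresholded at 0.5; the function names, the score range, and the threshold are all hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence


def max_merge(score_nb: float, score_svm: float) -> float:
    """Max operator: keep the larger of the two classifier scores (assumed in [0, 1])."""
    return max(score_nb, score_svm)


def harmonic_mean_merge(score_nb: float, score_svm: float) -> float:
    """Harmonic Mean operator: 2ab / (a + b), defined as 0 when both scores are 0."""
    total = score_nb + score_svm
    if total == 0:
        return 0.0
    return 2 * score_nb * score_svm / total


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute Precision, Recall, F-Measure, and Accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


def merged_label(score_nb: float, score_svm: float, merge, threshold: float = 0.5) -> int:
    """Assumed decision rule: label a sentence positive when the merged score reaches the threshold."""
    return 1 if merge(score_nb, score_svm) >= threshold else 0
```

Under these assumptions, the Max operator labels a sentence positive when either classifier is confident, which tends to favor Recall, while the Harmonic Mean operator requires both scores to be reasonably high.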

Paper Details

Date Published: 18 January 2010
PDF: 8 pages
Proc. SPIE 7534, Document Recognition and Retrieval XVII, 75340U (18 January 2010); doi: 10.1117/12.838961
Jongwoo Kim, National Library of Medicine (United States)
Daniel X. Le, National Library of Medicine (United States)
George R. Thoma, National Library of Medicine (United States)


Published in SPIE Proceedings Vol. 7534:
Document Recognition and Retrieval XVII
Editor(s): Laurence Likforman-Sulem; Gady Agam
