
Proceedings Paper

Automated labeling of bibliographic data extracted from biomedical online journals
Authors: Jongwoo Kim; Daniel X. Le; George R. Thoma

Paper Abstract

A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, affiliations, and abstract) from online biomedical journals to populate the National Library of Medicine’s MEDLINE database. This paper describes a key module in this system: the labeling module, which uses statistical and fuzzy rule-based algorithms to identify zones segmented from an article’s HTML pages as specific bibliographic fields. Results from experiments conducted on 1,149 medical articles from forty-seven journal issues are presented.
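The abstract does not spell out the labeling rules. The following is a minimal sketch of how fuzzy rule-based labeling of segmented zones can work in general, assuming hypothetical zone features (font size, vertical rank on the page, word count, presence of commas) and hand-picked thresholds. It is not the authors' algorithm, and every name and number in it is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A segmented text zone with hypothetical layout features."""
    text: str
    font_size: float  # assumed feature: rendered font size in points
    top_rank: int     # assumed feature: 0 = topmost zone on the page


def ramp(x: float, lo: float, hi: float) -> float:
    """Linear fuzzy membership rising from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def label_zone(zone: Zone, threshold: float = 0.5) -> str:
    """Label a zone by evaluating fuzzy rules; thresholds are illustrative."""
    words = len(zone.text.split())
    large_font = ramp(zone.font_size, 12, 18)
    near_top = 1.0 - ramp(zone.top_rank, 0, 5)
    short = 1.0 - ramp(words, 10, 30)
    long_text = ramp(words, 50, 150)
    has_commas = 1.0 if "," in zone.text else 0.0

    # Fuzzy AND is taken as min; each rule gives a degree of support.
    scores = {
        "title": min(large_font, near_top, short),
        "authors": min(near_top, short, has_commas, 1.0 - large_font),
        "abstract": min(long_text, 1.0 - large_font),
    }
    label, score = max(scores.items(), key=lambda kv: kv[1])
    # Zones that no rule supports strongly enough fall back to "other".
    return label if score > threshold else "other"


if __name__ == "__main__":
    print(label_zone(Zone("Automated labeling of bibliographic data", 20, 0)))
    print(label_zone(Zone("Jongwoo Kim, Daniel X. Le, George R. Thoma", 12, 1)))
```

Taking the minimum as fuzzy AND and the maximum across rules is one common convention; a real labeler would also need statistically derived thresholds, which the sketch simply hard-codes.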

Paper Details

Date Published: 13 January 2003
PDF: 10 pages
Proc. SPIE 5010, Document Recognition and Retrieval X, (13 January 2003); doi: 10.1117/12.476047
Author Affiliations
Jongwoo Kim, National Library of Medicine (United States)
Daniel X. Le, National Library of Medicine (United States)
George R. Thoma, National Library of Medicine (United States)

Published in SPIE Proceedings Vol. 5010:
Document Recognition and Retrieval X
Tapas Kanungo; Elisa H. Barney Smith; Jianying Hu; Paul B. Kantor, Editors

© SPIE.