
Proceedings Paper

Keyword and image-based retrieval of mathematical expressions
Author(s): Richard Zanibbi; Bo Yuan

Paper Abstract

Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) was higher for the keyword-based approach (keyword: μ = 84.0, σ = 19.0; image-based: μ = 32.0, σ = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
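
The expression-level TF-IDF idea and the precision-at-k measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the cosine scoring, and all function names (`tokenize`, `build_index`, `search`, `precision_at_k`) are assumptions chosen for illustration. The only point carried over from the abstract is that each indexed "document" is a single expression's LaTeX string.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: LaTeX commands (e.g. \frac), alphanumeric runs,
# or single non-brace symbols. The paper's keyword extraction may differ.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]+|[^\sA-Za-z0-9{}]")


def tokenize(latex):
    """Split a LaTeX string into keyword tokens."""
    return TOKEN_RE.findall(latex)


def build_index(expressions):
    """Build unit-length TF-IDF vectors, one per expression."""
    counts = [Counter(tokenize(e)) for e in expressions]
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # number of expressions containing each token
    # Smoothed IDF (an assumption; other variants are equally plausible).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for c in counts:
        v = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vectors.append({t: w / norm for t, w in v.items()})
    return idf, vectors


def search(query, idf, vectors, k=20):
    """Return indices of the top-k expressions by cosine similarity."""
    q = Counter(tokenize(query))
    qv = {t: tf * idf[t] for t, tf in q.items() if t in idf}
    qnorm = math.sqrt(sum(w * w for w in qv.values())) or 1.0
    scored = []
    for i, v in enumerate(vectors):
        s = sum(w * v.get(t, 0.0) for t, w in qv.items()) / qnorm
        scored.append((s, i))
    scored.sort(key=lambda p: (-p[0], p[1]))
    # Expressions sharing no token with the query are dropped.
    return [i for s, i in scored[:k] if s > 0]


def precision_at_k(ranked, relevant, k=20):
    """Fraction of the top-k ranked results judged relevant."""
    return sum(1 for i in ranked[:k] if i in relevant) / k
```

With a toy collection such as `[r"\frac{a}{b}", r"x^2 + y^2 = z^2", r"\frac{x}{y} + 1"]`, the query `\frac{a}{b}` ranks the identical expression first and excludes the second expression, which shares no tokens with it. Relevance judgments passed to `precision_at_k` would have to come from human assessment.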

Paper Details

Date Published: 24 January 2011
PDF: 9 pages
Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740I (24 January 2011); doi: 10.1117/12.873312
Richard Zanibbi, Rochester Institute of Technology (United States)
Bo Yuan, Rochester Institute of Technology (United States)


Published in SPIE Proceedings Vol. 7874:
Document Recognition and Retrieval XVIII
Editor(s): Gady Agam; Christian Viard-Gaudin
