
Proceedings Paper

Math expression retrieval using an inverted index over symbol pairs
Author(s): David Stalnaker; Richard Zanibbi

Paper Abstract

We introduce a new method for indexing and retrieving mathematical expressions, and a new protocol for evaluating math formula retrieval systems. The Tangent search engine uses an inverted index over pairs of symbols in math expressions. Each key in the index is a pair of symbols along with their relative distance and vertical displacement within an expression. Matched expressions are ranked by the harmonic mean of two percentages: the percentage of symbol pairs in the query that are matched, and the percentage of symbol pairs in the candidate expression that are matched. We have found that our method is fast enough for real-time use and finds partial matches well, such as when subexpressions are rearranged (e.g., expressions moved from the left to the right of an equals sign) or when individual symbols (e.g., variables) differ from those in the query expression. In an experiment using expressions from English Wikipedia, student and faculty participants (N=20) rated expressions returned by Tangent as significantly more similar to the query than those returned by a text-based retrieval system (Lucene) adapted for mathematical expressions. Participants provided similarity ratings on a 5-point Likert scale, evaluating expressions from both algorithms one at a time in randomized order to avoid bias from the position of hits in search result lists. For the Lucene-based system, mean precision across queries for the top 1 and top 10 hits was 60% and 39% respectively, while for Tangent mean precision at 1 and 10 was 99% and 60%. A demonstration and source code are publicly available.
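The indexing and ranking scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the Tangent implementation. It assumes a simplified, hypothetical encoding in which each expression is a left-to-right list of (symbol, column, row) tuples, forms a pair from every symbol and each symbol that follows it, and uses the column and row differences as the relative distance and vertical displacement. The names `symbol_pairs` and `PairIndex` are invented for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def symbol_pairs(expr):
    """Count the symbol pairs in an expression.

    expr is a list of (symbol, column, row) tuples in left-to-right order.
    This flat encoding is an assumption made for illustration only.
    """
    pairs = Counter()
    for (s1, c1, r1), (s2, c2, r2) in combinations(expr, 2):
        # Key: the two symbols, their relative distance, and vertical displacement.
        pairs[(s1, s2, c2 - c1, r2 - r1)] += 1
    return pairs


class PairIndex:
    """Inverted index mapping each symbol-pair key to the expressions containing it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # pair key -> {expr_id: count}
        self.sizes = {}                    # expr_id -> total number of pairs

    def add(self, expr_id, expr):
        pairs = symbol_pairs(expr)
        self.sizes[expr_id] = sum(pairs.values())
        for key, count in pairs.items():
            self.postings[key][expr_id] = count

    def search(self, query):
        q_pairs = symbol_pairs(query)
        q_size = sum(q_pairs.values())
        matched = Counter()  # expr_id -> number of matched pairs
        for key, q_count in q_pairs.items():
            for expr_id, count in self.postings.get(key, {}).items():
                matched[expr_id] += min(q_count, count)
        results = []
        for expr_id, m in matched.items():
            in_query = m / q_size                    # share of query pairs matched
            in_candidate = m / self.sizes[expr_id]   # share of candidate pairs matched
            score = 2 * in_query * in_candidate / (in_query + in_candidate)
            results.append((expr_id, score))
        return sorted(results, key=lambda item: item[1], reverse=True)


index = PairIndex()
index.add("x+y", [("x", 0, 0), ("+", 1, 0), ("y", 2, 0)])
index.add("x+z", [("x", 0, 0), ("+", 1, 0), ("z", 2, 0)])
index.add("a-b", [("a", 0, 0), ("-", 1, 0), ("b", 2, 0)])
results = index.search([("x", 0, 0), ("+", 1, 0), ("y", 2, 0)])
print(results)
```

In this toy collection the exact match "x+y" ranks first, "x+z" is returned as a partial match because it shares one of three pairs with the query, and "a-b" is not returned at all because it shares none.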

Paper Details

Date Published: 8 February 2015
PDF: 12 pages
Proc. SPIE 9402, Document Recognition and Retrieval XXII, 940207 (8 February 2015); doi: 10.1117/12.2074084
David Stalnaker, Rochester Institute of Technology (United States)
Richard Zanibbi, Rochester Institute of Technology (United States)

Published in SPIE Proceedings Vol. 9402:
Document Recognition and Retrieval XXII
Eric K. Ringger; Bart Lamiroy, Editors

© SPIE