
Proceedings Paper
The Lehigh Steel Collection: a new open dataset for document recognition research
Paper Abstract
Document image analysis is a data-driven discipline. For a number of years, research was focused on small,
homogeneous datasets such as the University of Washington corpus of scanned journal pages. More recently, library
digitization efforts have raised many interesting problems with respect to historical documents and their recognition. In
this paper, we present the Lehigh Steel Collection (LSC), a new open dataset we are currently assembling that will be,
in many ways, unique to the field. LSC is an extremely large, heterogeneous set of documents dating from the 1960s
through the 1990s relating to the wide-ranging research activities of Bethlehem Steel, a now-bankrupt company that was
once the second-largest steel producer and the largest shipbuilder in the United States. As a result of the bankruptcy
process and the disposition of the company's assets, an enormous quantity of documents (we estimate hundreds of
thousands of pages) was left abandoned in buildings recently acquired by Lehigh University. Rather than see this
history destroyed, we stepped in to preserve a portion of the collection via digitization. Here we provide an overview of
LSC, including our efforts to collect and scan the documents, a preliminary characterization of what the collection
contains, and our plans to make this data available to the research community for non-commercial purposes.
Paper Details
Date Published: 24 March 2014
PDF: 9 pages
Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210O (24 March 2014); doi: 10.1117/12.2042615
Published in SPIE Proceedings Vol. 9021:
Document Recognition and Retrieval XXI
Bertrand Coüasnon; Eric K. Ringger, Editor(s)
Author Affiliations
Barri Bruno, Lehigh Univ. (United States)
Daniel Lopresti, Lehigh Univ. (United States)
© SPIE.
