Automated image processing of historical maps
There are many important questions to be answered based on the cartographic information hidden in maps from the past several hundred years. For example, historians study how knowledge of geographic regions varied between nations and over time. However, while visual inspection is frequently used to identify properties of cartographic objects, it is very time consuming, does not scale well with the increasing number of maps, and sometimes yields only approximate properties because the extraction requires pixel-level counting (e.g., area or boundary length estimation). The number of image scans of historical and contemporary maps is increasing, so automating the extraction of cartographic information from these scans would substantially reduce the labor involved.
As a specific example of such a question, we chose to study the differences in geographic knowledge of the Great Lakes region possessed by the French and British from the 16th to the 19th century [1]. We first gathered French and British historical maps from different points in this time period (see Figure 1 for an example) and then designed a supervised segmentation algorithm to detect the Great Lakes. We next devised an algorithm to estimate the map scale based on neatlines and computed the shape and area characteristics of the lakes in physical units. (See Figure 2 for an illustration of how the characteristics vary over time.)
The current underlying assumptions of our technical approach are that objects of interest are located around the center of a cropped image from the original historical map and that the shape of the objects of interest can be modeled by a shape template. We also assume that the interior colors of cartographic objects tend to be uniform, and that the time of map creation and the map scale can be extracted from metadata. In addition, we assume that the shape of an object on a map accurately reflects the cartographers’ knowledge. Our approach uses a priori knowledge about the true shape of a geographic object for segmentation, as well as the presence of neatlines around the border of each map for scale estimation.
First, we developed a segmentation algorithm for extracting regions whose shape is most similar to that of a given example. The algorithm uses ball-based region-growing segmentation combined with the seven Hu moments (derived from standard image moments to be invariant under translation, rotation, and scaling [2]) to evaluate shape similarity. The ball-based segmentation places a circular region at a seed location and grows the region subject to color homogeneity and spatial contiguity constraints. Each resulting region is described by its Hu moments and compared to the Hu moments of the given example. The algorithm searches over a space of parameters that includes the region-growing criteria and the seed placement. Segmentation performance can be evaluated through comparison with manually segmented masks. By this criterion, we regarded 28 out of 40 segmentation results for Lake Ontario as satisfactory, with less than 10% mean square error (calculated as the average of the squared differences between corresponding pixels in the manually and automatically segmented images).
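The segmentation and shape-matching steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our implementation: a plain 4-connected flood fill stands in for the ball-based growth, the image is a grayscale list of rows, the shape distance is a simple Euclidean distance between Hu vectors, and all function names (grow_region, hu_moments, shape_distance, mask_mse, best_segmentation) are hypothetical.

```python
from collections import deque

def grow_region(image, seed, tol):
    """4-connected region growing on a grayscale image (list of rows).

    Assumption: stand-in for ball-based growth. A pixel joins the region if it
    touches the region and its value is within `tol` of the seed value.
    """
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and abs(image[nr][nc] - ref) <= tol:
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask

def hu_moments(mask):
    """Seven Hu invariants of a non-empty binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = float(len(pts))
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Normalized central moment: translation- and scale-invariant.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

def shape_distance(mask, template):
    """Euclidean distance between Hu vectors (one simple choice of metric)."""
    pairs = zip(hu_moments(mask), hu_moments(template))
    return sum((p - q) ** 2 for p, q in pairs) ** 0.5

def mask_mse(auto_mask, manual_mask):
    """Mean squared per-pixel difference; for 0/1 masks, the mismatch fraction."""
    diffs = [(a - m) ** 2
             for row_a, row_m in zip(auto_mask, manual_mask)
             for a, m in zip(row_a, row_m)]
    return sum(diffs) / len(diffs)

def best_segmentation(image, template, seeds, tolerances):
    """Search seeds and tolerances; keep the region closest in shape to the template."""
    candidates = (grow_region(image, s, t) for s in seeds for t in tolerances)
    return min(candidates, key=lambda m: shape_distance(m, template))
```

In this sketch, a result would count as satisfactory when mask_mse against the manually segmented mask falls below 0.10. A production version would more likely rely on an image-processing library for the moments and would use a richer homogeneity test than a single tolerance around the seed value.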
Second, we designed a map scale estimation method to determine the conversion between miles and pixels for each map. The conversion can be obtained either from two known points (such as cities) or by neatline analysis, that is, by examining the dashed neatline around the border of a map and counting the number of dashes between successive intersections with latitude or longitude lines (see Figure 3). Owing to mapmaking conventions, this number corresponds uniquely to the desired scale. Neatline analysis can be broken down into several constituent steps, including boundary selection, line detection, line classification (to distinguish the dashed neatline from other lines), dash-length calculation, and transversal detection (to find intersections with latitude/longitude lines).
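Both routes to the scale can be illustrated with a short sketch. It makes several assumptions that are not part of our method as described: the two-point route uses a haversine great-circle distance on a spherical Earth and ignores map projection distortion, and the neatline route assumes that the neatline has already been detected and sampled into a binary profile (1 for a dash pixel, 0 for a gap). The function names are hypothetical, and the convention-dependent mapping from dash count to scale is deliberately left out.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in statute miles

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))

def miles_per_pixel(pixel_a, pixel_b, latlon_a, latlon_b):
    """Scale from two landmarks with known pixel and geographic positions."""
    pixel_distance = math.hypot(pixel_b[0] - pixel_a[0], pixel_b[1] - pixel_a[1])
    return great_circle_miles(*latlon_a, *latlon_b) / pixel_distance

def dash_runs(profile):
    """(start, length) of each run of 1s in a binary profile along the neatline."""
    runs, start = [], None
    for i, v in enumerate(list(profile) + [0]):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - start))
            start = None
    return runs

def dashes_between(profile, crossing_a, crossing_b):
    """Number of dashes that start between two graticule crossings (profile indices)."""
    return sum(1 for start, _ in dash_runs(profile)
               if crossing_a <= start < crossing_b)
```

For example, miles_per_pixel((120, 340), (610, 355), (43.65, -79.38), (43.16, -77.61)) would give the scale implied by two cities marked on a scan; the pixel coordinates here are invented purely for illustration.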
Third, we extracted the area of the lake in pixels and converted it to physical units for each historical map. This process has been completed for 40 digitized historical maps of Lake Ontario (18 British and 22 French; see Figure 4). In each case, the algorithm segments Lake Ontario from a cropped image and computes its surface area. The differences between these calculated areas and the modern figure of 7540 mi² can be taken as a measure of the accuracy of regional geographic knowledge at the time each historical map was made.
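The unit conversion itself is simple arithmetic: the pixel count is multiplied by the square of the linear scale. A minimal sketch follows, assuming a binary lake mask stored as a list of rows and a scale expressed in miles per pixel; the function names are hypothetical, and the reference value is the modern figure quoted above.

```python
MODERN_AREA_SQ_MI = 7540.0  # modern figure for Lake Ontario quoted in the text

def lake_area_sq_miles(mask, miles_per_pixel):
    """Area of the segmented region: pixel count times (miles per pixel) squared."""
    pixel_count = sum(1 for row in mask for v in row if v)
    return pixel_count * miles_per_pixel ** 2

def relative_area_error(area_sq_mi, reference=MODERN_AREA_SQ_MI):
    """Signed relative deviation from the reference area (negative = underestimate)."""
    return (area_sq_mi - reference) / reference
```

Because the area scales with the square of the linear scale, a 5% error in the estimated miles-per-pixel value produces roughly a 10% error in the computed area, so the scale estimate matters at least as much as the segmentation.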
In automating the extraction of cartographic information from images of historical maps, we had to overcome two obstacles: segmenting the cartographic objects of interest and estimating the map scale so that characteristics can be reported in physical units. Our preliminary work indicates that, although the French occupied the Great Lakes region sooner, French maps do not depict Lake Ontario any more accurately than British maps (see Figure 4).
The broader impact of the work is to provide information for more advanced searching of historical maps. Integrating existing map repositories and improving accuracy remain future work.
This project has been funded by NCSA and NSF. We would like to acknowledge the NCSA Faculty Fellow program and the NSF ITS 09-10562 EAGER program for providing the funding. We also thank Betsy Kruger of the UIUC Library for providing the images of historical maps, and Michael Simeone and Robert Markley for consultations and insights.
Tenzing Shaw is a graduate student and research assistant to Peter Bajcsy.
Peter Bajcsy is a research scientist at ISDA and works on problems related to automatic transfer of image content to knowledge. His scientific interests include image processing, novel sensor technology, and computer and machine vision.