Multicore speedup for automated stitching of large images
Gigapixel and terapixel images are commonly viewed using a mosaic of smaller megapixel images. Stitching is used to create Google Maps from satellite imagery, panoramic views from photographic images, and high resolution images from microscopy tiles. Creating and disseminating these mosaics requires significant computational capacity. We deployed existing image-pyramid stitching methods onto multicore and parallel architectures to benchmark how performance improves with the addition of computing nodes. Our motivation is to explore the benefits of multiple hardware architectures (such as multicore servers and computer clusters) and parallel computing to reduce the time needed to stitch very large images.
Our sample case is the processing and dissemination of airborne images acquired over multiple flight paths of Costa Rica in 2005 [2,3] (see Figure 1). The input set of 10,158 images, each 4072×4072 pixels, has very coarse georeferencing information (latitude and longitude of each image). As a result, processing these photos required us to transform each airborne image to its georeferenced location, refine its location with respect to its neighboring images, and build an image pyramid for fast access and retrieval.
Given the spatial coverage and resolution of our input images, the final stitched color image was 294,847 by 269,195 pixels (79.3 gigapixels) and 238.2 gigabytes of data (see Figure 2). This amount of data requires either hardware with a large shared memory or algorithms that combine disk access with enough volatile memory for local image operations. The problem is more challenging when working with film-digitized images rather than directly acquired digital images. In our example, the georeferencing information available for stitching is highly uncertain, since it was obtained manually during film scanning.
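The size figures above can be checked with a short calculation. The sketch below is a minimal check that assumes an uncompressed raster with 3 bytes per pixel (8-bit RGB) and decimal gigabytes; neither assumption is stated in the text, and the function name is purely illustrative. Under these assumptions the result lands close to the quoted values, with small differences that likely come from rounding.

```python
# Dimensions of the final stitched image, as reported in the text.
WIDTH_PX = 294_847
HEIGHT_PX = 269_195
BYTES_PER_PIXEL = 3  # assumption: uncompressed 8-bit RGB


def image_footprint(width, height, bytes_per_pixel):
    """Return (pixel count, byte count) for an uncompressed raster."""
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


pixels, size_bytes = image_footprint(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
print(f"{pixels / 1e9:.2f} gigapixels")
print(f"{size_bytes / 1e9:.2f} GB (decimal)")
```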
We used the coarse georeferencing information for the initial image grouping. Then we applied an intensity-based stitching of image groups and created image-pyramid representations. Image pyramids, introduced by Burt & Adelson [4], are built by cutting the original image into 256×256 tiles, saving the tiles, then sub-sampling the original image and repeating the cut and save steps until a 1×1 pixel tile is formed. The process of building an image pyramid can be easily separated into parallel processes because each tile is independent of the others at the same level. We applied image-pyramid methods to very large images, using the Microsoft Seadragon library [5], in order to understand the computational requirements for stitching and dissemination.
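The cut, save, and sub-sample loop described above can be sketched in a few lines. This is an illustrative toy, not the Seadragon library's API: the image is an in-memory list of pixel rows, "saving" means storing tiles in a dictionary, and the names `cut_tile`, `subsample`, and `build_pyramid` are hypothetical. A thread pool is used only to show that tiles within a level are independent; in pure Python, real speedups would more likely come from separate processes or native image libraries.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # tile edge length in pixels, as in the text


def cut_tile(image, col, row, tile=TILE):
    """Crop one tile; tiles on the right and bottom edges may be smaller."""
    y0, x0 = row * tile, col * tile
    return [r[x0:x0 + tile] for r in image[y0:y0 + tile]]


def subsample(image):
    """Halve each dimension by keeping every second pixel."""
    return [r[::2] for r in image[::2]]


def build_pyramid(image, tile=TILE, workers=4):
    """Return {(level, col, row): tile}; level 0 is full resolution."""
    tiles = {}
    level = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            height, width = len(image), len(image[0])
            cols = -(-width // tile)   # ceiling division
            rows = -(-height // tile)
            keys = [(level, c, r) for r in range(rows) for c in range(cols)]
            # Tiles at the same level do not depend on each other,
            # so they can be cut concurrently.
            cut = pool.map(
                lambda k, img=image: cut_tile(img, k[1], k[2], tile), keys
            )
            tiles.update(zip(keys, cut))
            if width == 1 and height == 1:
                break  # the 1x1 level has been stored; the pyramid is done
            image = subsample(image)
            level += 1
    return tiles


# Small synthetic grayscale image: 600 pixels wide, 500 pixels high.
demo = [[(x + y) % 256 for x in range(600)] for y in range(500)]
pyramid = build_pyramid(demo)
print(len(pyramid), "tiles across", max(k[0] for k in pyramid) + 1, "levels")
```

Each level must be finished before the next one is sub-sampled, so the parallelism in this sketch is across tiles within a level rather than across levels.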
Group-based stitching is ideally suited for multicore hardware and computer clusters. The stitching process results in image patches that can be cropped to fit an image-pyramid tile for fast image access and retrieval. Processing involves multiple computational steps: at least one image transformation, multiple comparisons to place the pixels into a pyramid representation, and an averaging of four neighboring pixels to create the pyramid layers. The initial transformation converts the image location to a geospatial location. Next, all images inside the geospatial area of a pyramid tile are identified, then scaled and rotated to match the flight path and scale of that tile level. Finally, all the images are combined to create the pyramid representation tile.
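Two of the steps above lend themselves to a short illustration: selecting the images that fall inside a tile's geospatial area, and averaging four neighboring pixels to form the next pyramid layer. The sketch below makes simplifying assumptions that are not in the text: image footprints are treated as axis-aligned boxes (real footprints follow the flight path and may be rotated), pixels are single grayscale values, and the names `Box`, `images_for_tile`, and `average_2x2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned geospatial extent (assumed units: degrees)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True when the two extents share a non-empty area."""
        return (self.west < other.east and other.west < self.east
                and self.south < other.north and other.south < self.north)


def images_for_tile(tile_box, image_boxes):
    """Return names of images whose footprint overlaps the tile extent."""
    return [name for name, box in image_boxes.items()
            if box.intersects(tile_box)]


def average_2x2(image):
    """Downsample by averaging each 2x2 block of pixels.

    Blocks on an odd right or bottom edge average the pixels that exist.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = [image[yy][xx]
                     for yy in range(y, min(y + 2, height))
                     for xx in range(x, min(x + 2, width))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical footprints; the coordinates are made up for illustration.
footprints = {
    "img_a": Box(-84.2, 9.8, -84.1, 9.9),
    "img_b": Box(-84.15, 9.85, -84.05, 9.95),
    "img_c": Box(-83.5, 10.2, -83.4, 10.3),
}
tile_extent = Box(-84.12, 9.86, -84.08, 9.90)
print(images_for_tile(tile_extent, footprints))
print(average_2x2([[1, 3], [5, 7]]))
```

In a full pipeline, the selected images would still need to be scaled and rotated to the tile's level before being combined; that step is omitted here.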
Our resulting image pyramid consisted of 19 levels. The bottom level accounted for 1,211,904 of the 1,616,015 tiles in the full pyramid, which required 317.6 gigabytes of storage. We benchmarked the stitching of a subset of 23 images (73,712×67,298 pixels) using multiple threads (ranging from 1 to 100) on an eight-core server with a RAID-5 disk array. The processing took 25 minutes using a single thread and only 26 seconds using 10 or more threads. As we added threads, processing became increasingly input/output (I/O) limited, until the processor spent 100% of its time waiting for I/O. Profiling showed a disk throughput of 300 MB/s.
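The tile counts and the speedup quoted above can be reproduced approximately with simple arithmetic. The sketch below assumes 256×256 tiles without overlap and dimensions that are halved and rounded up at each level; these conventions are assumptions, as are the function names. Under them, the full-resolution level comes out at exactly 1,211,904 tiles, while the total and the number of levels may differ slightly from the reported figures depending on rounding, tile overlap, and whether the 1×1 level is counted.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def tiles_per_level(width, height, tile=256):
    """List tile counts from full resolution down to a 1x1 image."""
    counts = []
    while True:
        counts.append(ceil_div(width, tile) * ceil_div(height, tile))
        if width == 1 and height == 1:
            break
        # Assumed convention: halve each dimension, rounding up.
        width, height = ceil_div(width, 2), ceil_div(height, 2)
    return counts


def speedup(serial_seconds, parallel_seconds):
    """Ratio of single-thread time to multi-thread time."""
    return serial_seconds / parallel_seconds


counts = tiles_per_level(294_847, 269_195)
print("full-resolution tiles:", counts[0])
print("levels:", len(counts), "total tiles:", sum(counts))
print(f"speedup: {speedup(25 * 60, 26):.1f}x")
```

The measured ratio is well above the eight cores of the server, which may indicate that the single-thread run spent much of its time waiting on I/O rather than computing; this reading is an interpretation, not a result stated in the text.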
To support the hypothesis that the code is parallelizable and benefits from high-performance computing (HPC), we tested on an 80-core cluster using 10 nodes, each with 8 cores and 8 GB of memory. Storage was a shared Samba drive. We were able to form the entire 79.3 gigapixel image and its pyramid representation with a network throughput of 2 GB/s, resulting in performance similar to the multicore tests.
Gigapixel and terapixel images require new solutions for efficient storage, processing, searching, and image understanding, as well as new hardware architectures that can minimize computation times. Our work can inform decisions and cost cases for allocating computational resources. We recommend using a large number of cores with parallel algorithms. In the future, we plan to design parallel algorithms that will allow us to use more cores.
This research has been funded through the National Science Foundation Cooperative Agreement NSF OCI 05-25308 and Cooperative Support Agreement NSF OCI 05-04064 by the National Archives and Records Administration (NARA). We would like to acknowledge the Costa Rica Center for Advanced Technology Studies (CeNAT) for providing the airborne imagery.
Peter Bajcsy is a research scientist working on automatic transfer of image content to knowledge. His scientific interests include image processing, novel sensor technology, and computer and machine vision.
Rob Kooper is a research programmer working on computational scalability of image analyses, workflow execution, and distributed computing. His research interests include computational scalability of various applications, human-computer interaction, and graphics.