Proceedings Paper

Scalable storage of whole slide images and fast retrieval of tiles using Apache Spark
Author(s): Daniel E. Lopez Barron; Dig Vijay Kumar Yarlagadda; Praveen Rao; Ossama Tawfik; Deepthi Rao

Paper Abstract

Whole slide images (WSIs) can greatly improve the workflow of pathologists through the development of software for automatic detection and analysis of cellular and morphological features. However, the gigabyte size of a WSI poses a serious challenge for scalable storage and fast retrieval, both of which are essential for next-generation image analytics. In this paper, we propose a system for scalable storage of WSIs and fast retrieval of image tiles using Apache Spark, a space-filling curve, and popular data storage formats. We investigate two schemes for storing the tiles of WSIs. In the first scheme, all the WSIs are stored in a single table (partitioned by certain table attributes for fast retrieval). In the second scheme, each WSI is stored in a separate table. The records in each table are sorted using the index values assigned by the space-filling curve. We also study two data storage formats for storing WSIs: Parquet and ORC (Optimized Row Columnar). Through performance evaluation on a 16-node cluster in CloudLab, we observed that ORC enables faster retrieval of tiles than Parquet and requires 6 times less storage space. We also observed that the two schemes for storing WSIs achieved comparable performance. On average, our system took 2 seconds to retrieve a single tile and less than 6 seconds to retrieve 8 tiles on up to 80 WSIs. We also report the tile retrieval performance of our system on Microsoft Azure to gain insight into how the underlying computing platform can affect the performance of our system.
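
The idea of sorting tile records by a space-filling curve index can be illustrated with a small sketch. The abstract does not say which curve the system uses, so the Z-order (Morton) curve below is an assumption chosen for simplicity. The function names, the `(index, x, y)` record layout, and the in-memory list standing in for a Parquet or ORC table are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_left

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) index: bit i of x goes to position 2i, bit i of y to 2i+1.

    Assumption: the paper's space-filling curve is not named in the abstract;
    Z-order is used here only as a simple stand-in.
    """
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

def build_tile_table(cols: int, rows: int):
    """Return hypothetical tile records (index, x, y) sorted by curve index."""
    records = [(interleave_bits(x, y), x, y)
               for y in range(rows) for x in range(cols)]
    records.sort()  # sorting by index keeps spatially close tiles near each other
    return records

def find_tile(records, x: int, y: int):
    """Binary-search the sorted records for the tile at grid position (x, y)."""
    key = interleave_bits(x, y)
    pos = bisect_left(records, (key,))  # first record whose index is >= key
    if pos < len(records) and records[pos][0] == key:
        return records[pos]
    return None  # tile lies outside the stored grid

if __name__ == "__main__":
    table = build_tile_table(cols=4, rows=4)
    print(find_tile(table, 2, 1))
```

Because the records are ordered by curve index, a lookup reduces to a search on one sorted key, and tiles that are adjacent on the slide tend to sit close together in storage. In a columnar format this kind of ordering generally lets a reader skip blocks whose index range cannot contain the requested tile, although the abstract does not describe the system's actual retrieval path.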

Paper Details

Date Published: 6 March 2018
PDF: 6 pages
Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 1058113 (6 March 2018); doi: 10.1117/12.2290380
Author Affiliations
Daniel E. Lopez Barron, Univ. of Missouri-Kansas City (United States)
Dig Vijay Kumar Yarlagadda, Univ. of Missouri-Kansas City (United States)
Praveen Rao, Univ. of Missouri-Kansas City (United States)
Ossama Tawfik, The Univ. of Kansas Medical Ctr. (United States)
Deepthi Rao, ProPath (United States)

Published in SPIE Proceedings Vol. 10581:
Medical Imaging 2018: Digital Pathology
John E. Tomaszewski; Metin N. Gurcan, Editor(s)

© SPIE