
Proceedings Paper

Data processing factory for the Sloan Digital Sky Survey
Author(s): Christopher Stoughton; Jennifer Adelman; James T. Annis; John Hendry; John Inkmann; Sebastian Jester; Steven M. Kent; Nickolai Kuropatkin; Brian Lee; Huan Lin; John Peoples; Robert Sparks; Douglas Tucker; Dan Vanden Berk; Brian Yanny; Dan Yocum

Paper Abstract

Data handling for the Sloan Digital Sky Survey (SDSS) presents two challenges: large data volume and timely production of spectroscopic plates from imaging data. A data processing factory, using technologies both old and new, handles this flow. Distribution to end users is via disk farms, which serve corrected images and calibrated spectra, and a database, which efficiently processes catalog queries. For distribution of modest amounts of data from Apache Point Observatory to Fermilab, scripts use rsync to update files, while larger data transfers are accomplished by shipping magnetic tapes commercially. All data processing pipelines are wrapped in scripts to address consecutive phases: preparation, submission, checking, and quality control. We constructed the factory by chaining these pipelines together while using an operational database to hold processed imaging catalogs. The science database catalogs all imaging and spectroscopic objects, with pointers to the various external files associated with them. Diverse computing systems address particular processing phases. UNIX computers handle tape reading and writing, as well as calibration steps that require access to a large amount of data with relatively modest computational demands. Commodity CPUs process steps that require access to a limited amount of data with more demanding computational requirements. Disk servers optimized for cost per Gbyte serve terabytes of processed data, while servers optimized for disk read speed run SQL Server software to process queries on the catalogs. This factory produced data for the SDSS Early Data Release in June 2001, and it is currently producing Data Release One, scheduled for January 2003.
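
The wrapping and chaining described in the abstract can be illustrated with a minimal Python sketch. Only the four phase names (preparation, submission, checking, quality control) come from the abstract; the class `WrappedPipeline`, the function `run_factory`, the stage names "imaging" and "targeting", and all input and output keys are hypothetical and do not reflect the actual SDSS scripts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Phase names taken from the abstract; everything else below is hypothetical.
PHASES = ("preparation", "submission", "checking", "quality control")


@dataclass
class WrappedPipeline:
    name: str
    process: Callable[[Dict], Dict]        # stand-in for a real pipeline
    required_inputs: Tuple[str, ...]       # keys that preparation must find
    quality_check: Callable[[Dict], bool]  # stand-in for a QC criterion

    def run(self, inputs: Dict, log: List[str]) -> Dict:
        # Preparation: confirm that every required input is present.
        log.append(f"{self.name}: {PHASES[0]}")
        missing = [k for k in self.required_inputs if k not in inputs]
        if missing:
            raise ValueError(f"{self.name}: missing inputs {missing}")

        # Submission: hand the job to the pipeline (here, a plain function call).
        log.append(f"{self.name}: {PHASES[1]}")
        outputs = self.process(inputs)

        # Checking: confirm that the job produced some output.
        log.append(f"{self.name}: {PHASES[2]}")
        if not outputs:
            raise RuntimeError(f"{self.name}: no output produced")

        # Quality control: apply a criterion to the output.
        log.append(f"{self.name}: {PHASES[3]}")
        if not self.quality_check(outputs):
            raise RuntimeError(f"{self.name}: failed quality control")
        return outputs


def run_factory(pipelines: List[WrappedPipeline],
                inputs: Dict) -> Tuple[Dict, List[str]]:
    """Chain wrapped pipelines: each consumes the previous one's output."""
    log: List[str] = []
    data = inputs
    for pipeline in pipelines:
        data = pipeline.run(data, log)
    return data, log


if __name__ == "__main__":
    imaging = WrappedPipeline(
        name="imaging",
        process=lambda d: {"catalog": [f"object-{i}" for i in range(d["n_frames"])]},
        required_inputs=("n_frames",),
        quality_check=lambda out: len(out["catalog"]) > 0,
    )
    targeting = WrappedPipeline(
        name="targeting",
        process=lambda d: {"targets": d["catalog"][:2]},
        required_inputs=("catalog",),
        quality_check=lambda out: len(out["targets"]) > 0,
    )
    result, log = run_factory([imaging, targeting], {"n_frames": 3})
    print(result)
    print("\n".join(log))
```

In this sketch a failure in any phase raises an exception and stops the chain, so a later stage never runs on unchecked output. A production system would more plausibly record the phase status in a database and allow resubmission, which the sketch does not attempt.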

Paper Details

Date Published: 24 December 2002
PDF: 11 pages
Proc. SPIE 4836, Survey and Other Telescope Technologies and Discoveries, (24 December 2002); doi: 10.1117/12.457014
Author Affiliations
Christopher Stoughton, Fermi National Accelerator Lab. (United States)
Jennifer Adelman, Fermi National Accelerator Lab. (United States)
James T. Annis, Fermi National Accelerator Lab. (United States)
John Hendry, Fermi National Accelerator Lab. (United States)
John Inkmann, Fermi National Accelerator Lab. (United States)
Sebastian Jester, Fermi National Accelerator Lab. (United States)
Steven M. Kent, Fermi National Accelerator Lab. (United States)
Nickolai Kuropatkin, Fermi National Accelerator Lab. (United States)
Brian Lee, Lawrence Berkeley National Lab. (United States)
Huan Lin, Fermi National Accelerator Lab. (United States)
John Peoples, Fermi National Accelerator Lab. (United States)
Robert Sparks, Fermi National Accelerator Lab. (United States)
Douglas Tucker, Fermi National Accelerator Lab. (United States)
Dan Vanden Berk, Univ. of Pittsburgh (United States)
Brian Yanny, Fermi National Accelerator Lab. (United States)
Dan Yocum, Fermi National Accelerator Lab. (United States)


Published in SPIE Proceedings Vol. 4836:
Survey and Other Telescope Technologies and Discoveries
J. Anthony Tyson; Sidney Wolff, Editor(s)
