Until recently, image archiving was about preserving physical media. We are able to view 17,000-year-old cave paintings and 150-year-old photographs, for example, merely through physical access. In these cases, archiving is concerned with the stability and physical preservation of the image media (rock surfaces, photographic paper, etc.).
All this has changed within the last two decades in the digital world. Digital media provide access to electronic versions of images, but require a digital infrastructure (capture systems, networks, digital media, file systems, file formats, display devices, search engines) to connect the viewer to the original. Archiving in this case is concerned with preservation of the digital files. Part of this effort involves making sure the digital infrastructure can display the digital files. More and more, such systems will also be dealing with pictures that were created digitally, and hence do not have physical references.
A digital project starts well before the scanning of the first picture or saving of the first file. Careful planning to define the aims, priorities, technical requirements, procedures, and future use is essential for efficient workflow and results that meet expectations. Digital images constitute active collections that require regular maintenance. Provisions to upgrade digital collections to keep pace with the changing computer infrastructure should be made at the start of a project, to avoid the prospect of digital collections created at considerable cost becoming inaccessible over time. A good digital archiving project is conceived as teamwork, combining expertise in imaging, collection management, information technology, conservation, descriptive methods, and preservation strategies.1
Quality
Despite all the possibilities in digital image manipulation, choices about image quality, usability, and functionality made at file creation carry the same finality as in conventional photography. Such choices have a profound effect on project cost, value to users, and long-term image usability. Requirements for all of these aspects therefore have to be established carefully before a digital project starts. Ensuring long-term value of digitized files necessitates developing quality-control tools to check imaging systems, digital masters, and derivatives.
Converting an original hard copy image to a digital master is an expensive operation, so you want to get it right the first time. This means capturing the original and then creating and storing a version with as many pixels and bits as needed to meet foreseeable uses over the lifetime of the master (see figure 1). In the case of a system designed initially for access via web browsers, it is tempting to archive images matched to browser capabilities: typically 24-bit images compressed using JPEG and encoded in sRGB, a standard color space based on the red, green, and blue primaries of a typical monitor. The range of colors that a hard copy original can generate exceeds the range that the sRGB color space can represent, however, so digital archivists must make compromises at the cost of color fidelity. It is better to make these compromises at the time of use, in a way that is matched to each use, than at the time of capture and storage, when they would limit future uses in a way that could not be undone.
Figure 1. Depending on the use, an image may be rendered as (from left) normal resolution, gray scale, low resolution, or adjusted tone.
Covering the full range of hard copy colors requires an extended- or wide-gamut color space. It also requires more bits to obtain sufficiently fine color quantization as the number of available bits is stretched to cover a bigger region of the color space. The extended-gamut color spaces based on sRGB and wide-gamut spaces such as reference output media metric (ROMM) and Adobe RGB now support more than 8 bits per component (see figure 2). These are all 3-D color spaces, since the eye has three color receptors and thus only three degrees of freedom as a sensor. Some hard copy materials, such as lithographic prints and oil paintings, can have many more degrees of freedom, however. As a result, several researchers are working on spectral capture systems to recreate the multi-dimensional spectral reflectance (or transmittance) of the original, rather than an equivalent RGB produced under specific viewing conditions (see oemagazine, January 2003, p. 24).
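As a rough illustration of why a wider gamut calls for more bits, the sketch below computes the size of one quantization step along a single color axis. The axis ranges are illustrative assumptions (an sRGB-like axis normalized to 1.0 and a wide-gamut axis assumed to span 1.5 times that); they are not measured values for ROMM, Adobe RGB, or any other color space, and the helper names are hypothetical.

```python
def quantization_step(axis_range: float, bits: int) -> float:
    """Size of one code-value step when `bits` bits cover `axis_range`."""
    levels = 2 ** bits
    return axis_range / (levels - 1)


def bits_needed(axis_range: float, max_step: float) -> int:
    """Smallest bit depth whose step size does not exceed `max_step`."""
    bits = 1
    while quantization_step(axis_range, bits) > max_step:
        bits += 1
    return bits


# Assumed, normalized axis lengths (illustrative only).
SRGB_RANGE = 1.0
WIDE_RANGE = 1.5

step_srgb_8 = quantization_step(SRGB_RANGE, 8)    # baseline step
step_wide_8 = quantization_step(WIDE_RANGE, 8)    # same bits, coarser step
step_wide_16 = quantization_step(WIDE_RANGE, 16)  # more bits, finer step

print(f"8-bit step, sRGB-like axis : {step_srgb_8:.6f}")
print(f"8-bit step, wide axis      : {step_wide_8:.6f}")
print(f"16-bit step, wide axis     : {step_wide_16:.6f}")
print("bits to match the 8-bit sRGB-like step on the wide axis:",
      bits_needed(WIDE_RANGE, step_srgb_8))
```

Under these assumed ranges, stretching 8 bits over the wider axis makes each step coarser, and at least one extra bit per component is needed just to get back to the original step size.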
Compression and Rendition
Figure 2. The graph compares the range of colors covered by different RGB spaces with the gamut of real-world surface colors, shown in gray.
Carrying sufficient spatial resolution and bit depth in the digital master to meet foreseeable uses comes at some cost. The master itself may not meet the needs of any one user, so serving a request requires processing the master to produce the image actually delivered to the user. If processing time is an issue, then one option is pre-computing the multiple renditions and derivatives that would satisfy expected requests. Although multiple renditions save processing, the archiving system would need to track and synchronize multiple objects, which increases the management burden. The other cost associated with a digital master (and with multiple renditions) is that of storage.
The obvious solution is image compression. Lossless image compression preserves the original information while reducing the data volume by roughly half; precisely how much depends on the image content. Lossy compression can do an order of magnitude or more better, but is an irreversible process that discards data, even when the visual effects are not noticeable. A risk associated with typical methods that use variable-length codes is that if an error appears somewhere in the middle of the compressed image, the decompressor loses synchronization and all the data from that point on is lost. The risk may outweigh the benefits of reduced storage.
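To put the storage trade-off in numbers, here is a back-of-the-envelope sketch. The image dimensions and bit depth are hypothetical, the 2:1 lossless ratio follows the "roughly half" figure above, and the 20:1 lossy ratio is one assumed reading of "an order of magnitude better" than lossless; actual ratios depend on image content and codec settings.

```python
def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int, bits_per_channel: int) -> int:
    """Raw size of an image with no compression and no file-format overhead."""
    return width_px * height_px * channels * bits_per_channel // 8


# Hypothetical digital master: 6000 x 4000 pixels, 3 channels, 16 bits each.
master = uncompressed_bytes(6000, 4000, 3, 16)

# Assumed compression ratios (illustrative, not measured).
LOSSLESS_RATIO = 2
LOSSY_RATIO = 20

lossless = master // LOSSLESS_RATIO
lossy = master // LOSSY_RATIO

for label, size in [("uncompressed", master),
                    ("lossless (assumed 2:1)", lossless),
                    ("lossy (assumed 20:1)", lossy)]:
    print(f"{label:24s} {size / 1e6:8.1f} MB")
```

Multiplying these per-image figures by the size of a collection, and by the number of pre-computed renditions, gives a first estimate of the storage budget.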
An emerging technology that can simultaneously meet the needs of image compression and multiple renditions while managing the risk associated with errors is JPEG 2000 (see oemagazine, January 2002, p. 38). JPEG 2000 is a wavelet-based compression method that stores a compressed image as a collection of packets. Each packet contains layers of compressed data at a specific spatial resolution and from a specific location in the image. The JPEG 2000 architecture also supports lossless and lossy compression. With JPEG 2000, the lossless version of an image can be turned into a lossy version by selectively discarding packets. The number of packets discarded determines the amount of loss; which packets are discarded determines whether the loss takes the form of reduced resolution or fewer quality layers. On the other hand, progressively higher resolutions of an image (or portions of an image) can be obtained by decompressing packets incrementally by resolution.
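The idea of deriving renditions by discarding packets can be sketched with a toy model. This is not a JPEG 2000 codestream parser: the `Packet` class, the packet sizes, and the helper functions are hypothetical, and the spatial-location dimension of real packets is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    resolution: int   # 0 = coarsest resolution level
    layer: int        # 0 = first (most significant) quality layer
    size_bytes: int


def build_toy_codestream(num_resolutions: int, num_layers: int,
                         base_size: int) -> List[Packet]:
    """Make one packet per (resolution, layer) pair.

    Sizes are invented: each resolution level is assumed to hold
    four times the data of the level below it.
    """
    return [Packet(r, l, base_size * 4 ** r)
            for r in range(num_resolutions)
            for l in range(num_layers)]


def keep_up_to(packets: List[Packet],
               max_resolution: Optional[int] = None,
               max_layer: Optional[int] = None) -> List[Packet]:
    """Discard packets above the requested resolution and/or quality layer."""
    kept = []
    for p in packets:
        if max_resolution is not None and p.resolution > max_resolution:
            continue
        if max_layer is not None and p.layer > max_layer:
            continue
        kept.append(p)
    return kept


def total_size(packets: List[Packet]) -> int:
    return sum(p.size_bytes for p in packets)


full = build_toy_codestream(num_resolutions=4, num_layers=3, base_size=1000)
thumbnail = keep_up_to(full, max_resolution=1)   # reduced resolution
lower_quality = keep_up_to(full, max_layer=0)    # fewer quality layers

print("full master      :", total_size(full), "bytes")
print("reduced resolution:", total_size(thumbnail), "bytes")
print("fewer layers      :", total_size(lower_quality), "bytes")
```

In this model both renditions are subsets of the same stored packets, so nothing has to be recompressed or stored twice; only the selection rule differs.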
This scalability is an important characteristic of JPEG 2000 and the basis for its ability to serve multiple renditions from a single compressed image. Another significant property of JPEG 2000 is its error resilience; for example, resynchronization markers at the boundaries between packets limit the effect of errors and prevent them from rippling through the entire image.
Preservation Through Metadata
The principles of secure preservation for digital data are fundamentally different from those for traditional analog data. First, in traditional preservation there is a more or less slow decay of image quality, whereas a digital image can either be read accurately or not at all. Second, every analog duplication process results in a slight deterioration of the quality of the copy, whereas digital images can be duplicated without any loss at all. Traditional image archives should be stored under optimal climatic conditions and ideally never be touched again, which hinders access to the images while only slowing their decay. In comparison, the safekeeping of digital information requires active and regular maintenance of the data, which have to be copied to new media before the old media become unreadable. Since information technology evolves rapidly, the lifetime of both software and hardware formats is generally less than the lifetime of the recording media.
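One common way to confirm that a copy to new media really was lossless is to compare checksums before and after the copy. The sketch below uses SHA-256 from the Python standard library; the choice of checksum, the function names, and the file names are assumptions for illustration, not a procedure prescribed by any particular repository.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file and verify that the copy is bit-identical to the source."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"fixity check failed for {destination}")
    return after


# Demonstration with a stand-in file; names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    old_copy = Path(tmp) / "master_old_media.bin"
    new_copy = Path(tmp) / "master_new_media.bin"
    old_copy.write_bytes(bytes(range(256)) * 64)
    checksum = migrate_with_fixity(old_copy, new_copy)
    copies_match = old_copy.read_bytes() == new_copy.read_bytes()

print("copy verified:", copies_match)
```

Storing the returned checksum alongside the file lets later migrations verify the data against the same reference value.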
A major challenge in creating digital collections that will survive for a long time is to build digital repositories that maintain the functionality and quality intrinsic to images. Often, project planning and budgeting stop after the creation of the digital assets. In specifying requirements for archival digital still images and building digital repositories, however, a budget that includes the cost of maintaining the images over time is mandatory. Certified archival repositories that will be able to guarantee the storage of a file for a specified number of years for a specified cost are being created.
It is imperative that the parties involved in creating digital repositories are clear about the difference between "archival" and "deliverable." An archival file has a very low risk factor, meaning that we are confident that neither its integrity nor its functionality will be lost when the format must be migrated in order to remain compatible with image processing applications. A deliverable file can have the same image quality but, depending on file format and compression choices, carries a higher risk of obsolescence. That risk does not amount to total loss if an archival version has also been created and saved. Another important issue is the obsolescence of the use requirements. Users of today demand more and more dynamic images. This is another requirement that JPEG 2000 can fulfill. Images also have to be delivered to a wider variety of output devices.
Metadata is often defined in accordance with its literal interpretation: data about data. In the context of digital objects, we can categorize metadata as either descriptive (facilitating resource discovery and identification), administrative (supporting resource management within a collection), or structural (representing the components of complex information objects).
Of these three categories, descriptive metadata has received the most attention, most notably through the Dublin Core metadata initiative; however, increasing awareness of the challenges posed by digital preservation has underscored metadata needs for digital objects far beyond resource discovery.
Effective management of the preservation of digital objects is likely to be facilitated by the creation, maintenance, and evolution of detailed metadata. Metadata documents the technical processes associated with preservation, specifies rights-management information, and establishes the authenticity of digital content. Metadata records the chain of custody for a digital object and uniquely identifies it within the archive in which it resides. In short, the creation and employment of preservation metadata is likely to be a key component of most digital preservation strategies.2
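A minimal sketch of how such a record might be organized is shown below. The class names, field names, and example values are all hypothetical; the descriptive keys are loosely modeled on Dublin Core element names, and a real repository would follow an agreed schema rather than free-form dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CustodyEvent:
    timestamp: str   # ISO 8601, UTC
    agent: str       # who or what performed the action
    action: str      # e.g. "scanned", "migrated"


@dataclass
class PreservationRecord:
    identifier: str                  # unique within the archive
    descriptive: Dict[str, str]      # discovery and identification
    administrative: Dict[str, str]   # rights and resource management
    structural: Dict[str, str]       # how the object's parts relate
    custody: List[CustodyEvent] = field(default_factory=list)

    def log_event(self, agent: str, action: str) -> None:
        """Append an entry to the chain of custody."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.custody.append(CustodyEvent(stamp, agent, action))


# Example values are invented for illustration.
record = PreservationRecord(
    identifier="archive-0001",
    descriptive={"title": "Harbor view", "creator": "Unknown", "date": "1890"},
    administrative={"rights": "public domain", "master_format": "TIFF"},
    structural={"part_of": "album-12", "sequence": "3"},
)
record.log_event(agent="imaging lab", action="scanned")
record.log_event(agent="repository", action="migrated")

for event in record.custody:
    print(event.timestamp, event.agent, event.action)
```

Keeping the custody log as an append-only list mirrors the idea that preservation metadata accumulates over the life of the object rather than being written once.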
To date, cultural institutions have focused primarily on defining descriptive metadata for discovery and identification, and comparatively little work has been done to codify the technical attributes of digital images and their production. Technical metadata, a part of the preservation metadata, is necessary to support two fundamental goals: documenting image provenance and history (production metadata) and ensuring accurate rendering of image data to screen, print, or film. These functions will require the development of applications to validate, process, and migrate image data according to criteria encoded as technical metadata.
A couple of years ago, the U.S. National Information Standards Organization (NISO) started work on defining the necessary technical metadata.3 Two overarching goals led NISO to develop this data dictionary. The first was to identify the data elements that would be used to control transformations of images according to metrics for meaningful quality attributes such as detail, tone, color, and size. The second was to propose elements that would be used by digital repository managers, curators, or imaging specialists to assess the current value of a given image or collection of images. Mapping the metadata in an image file to these dictionary elements will be essential to automate the collection of technical metadata.
The digital archiving effort still faces many challenges. According to those responsible for some of the big digital reformatting projects, rapid changes in the technology make it difficult to choose the best time to set up a reformatting policy that will not be outdated tomorrow. The lack of communication between the technical field and institutions remains a formidable obstacle. It cannot be emphasized enough that if institutions fail to communicate their needs to the hardware and software industries, they will not get the tools they need for their special applications. Metadata, and specifically technical metadata, is important to make these tools work.
1. F. Frey and S. Chapman, "Developing Specifications for Archival Digital Still Images," Proceedings IS&T's PICS Conference, pp. 166-172, April 2001.
2. OCLC/RLG, "Preservation Metadata for Digital Objects: A Review of the State of the Art," http://www.oclc.org/research/projects/pmwg/presmeta_wp.pdf.
3. NISO, "Technical Metadata for Digital Still Images," 2002, http://www.niso.org/committees/committee_au.html.
For more information, attend Storage and Retrieval Methods and Applications for Multimedia 2004 at the Electronic Imaging symposium, 18-22 January, San Jose, CA (electronicimaging.org/program/04/).
Rob Buckley is a research fellow in the Xerox Imaging & Services Technology Center in Webster, NY.
Franziska Frey is an associate professor in the School of Print Media at the Rochester Institute of Technology, Rochester, NY.