Image compression plays a central role in modern multimedia communications, and compressed images arguably represent the dominant source of Internet traffic today. Typical lossy image compression factors range from 10:1 to 50:1, without which image communications, digital cameras, and other applications would be either much less appealing or exorbitantly expensive.
The JPEG image compression standard has found wide acceptance on the Internet and in digital cameras, printing equipment, and scanning equipment. The JPEG2000 standard that succeeds JPEG is the latest image compression standard to emerge from Working Group 1 of ISO/IEC JTC1/SC29, popularly known as the Joint Photographic Experts Group (JPEG) [1]. JPEG2000 is motivated by the need for compressed image representations that offer features increasingly demanded by modern applications while also offering better compression performance.
Fundamentally different from the original JPEG, JPEG2000 provides a scalable way to represent and interact with digital imagery. Scalability is a modern buzzword, but for the image and video compression communities the term has a precise meaning with great practical relevance in communicating and storing digital media. A compressed data stream is considered scalable if it consists of an embedded collection of smaller streams, each representing an efficient compression of the original source material.
Scalability allows one to compress once but decompress in many ways. The person who compresses the image need not know the resolution and quality required by a consumer. The complete data stream, as well as each of its subsets, represents the source as efficiently as if the compressor had known the consumer's requirements.
scalability in JPEG2000
A compressed data stream is resolution scalable if it contains identifiable subsets that represent successively lower resolution versions of the source (see the sidebar, "know the code"); it is distortion scalable (or SNR scalable) if it contains identifiable subsets that represent the source at full resolution, but with successively lower quality (in other words, with more distortion and with coarser quantization). Scalability can also describe other attributes.
Subsets of a JPEG2000 code-stream may be extracted to represent the original image at a reduced resolution, a reduced quality (higher distortion), or over a reduced spatial region. At the same time, JPEG2000 offers state-of-the-art compression efficiency and allows the user to selectively emphasize image quality within any particular region of interest.
JPEG2000 is based on the discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). An image is decomposed into a collection of sub-sampled spatial frequency bands, known as subbands. The subbands belong to a multi-resolution hierarchy in which each successively higher-resolution version of the image is formed by combining the immediately lower-resolution version (denoted LLd) with three spatial detail subbands (denoted LHd, HLd, and HHd), using a process known as "wavelet synthesis."
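To make the analysis and synthesis steps concrete, here is a minimal one-dimensional sketch in Python. It uses the reversible 5/3 lifting filter, which is one of the wavelet kernels defined by JPEG2000 but is not named in this article, so treat the choice as an assumption made for illustration. The function names and the restriction to even-length signals are also ours. A two-dimensional transform would apply the same step along rows and then columns to produce the LL, LH, HL, and HH subbands.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension: reflect index i back into [0, n-1]."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def analyze_53(x):
    """One level of 5/3 lifting analysis on an even-length list of integers.

    Returns (low, high): the half-length low-pass and detail sequences.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch handles even-length signals only")
    half = n // 2
    # Predict step: each odd sample minus the average of its even neighbours.
    high = [x[2 * k + 1] - (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
            for k in range(half)]
    # Update step: each even sample plus a rounded quarter of nearby details.
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4
           for k in range(half)]
    return low, high


def synthesize_53(low, high):
    """Invert analyze_53 exactly (wavelet synthesis)."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(half):
        x[2 * k] = low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4
    # Undo the predict step to recover the odd samples.
    for k in range(half):
        x[2 * k + 1] = high[k] + (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
    return x
```

Because every lifting step is undone with the same integer arithmetic, synthesis recovers the input exactly, and the low-pass output alone is a half-length approximation of the signal.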
The samples describing each subband are partitioned into rectangular blocks, known as code-blocks, each of which is independently coded into a finely embedded bitstream. Truncating the embedded bitstream associated with any given code-block has the effect of quantizing the samples in that block more coarsely. Each block of each subband in each image component may be independently truncated to any desired length after the compression is complete.
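The link between truncation and coarser quantization can be seen in a toy bit-plane coder. The Python sketch below is not EBCOT, which codes each bit-plane in several passes with an arithmetic coder; it only assumes that sample magnitudes are sent most significant bit first, and all names in it are hypothetical.

```python
def encode_bitplanes(samples, num_planes):
    """Split integer samples into signs plus bit-planes, most significant first."""
    signs = [s < 0 for s in samples]
    mags = [abs(s) for s in samples]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes):
    """Rebuild samples from however many leading bit-planes were kept.

    Missing low-order planes are treated as zero, which is the same as
    quantizing the magnitudes with a step of 2 ** (planes dropped).
    """
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if neg else m for m, neg in zip(mags, signs)]
```

For example, encoding `[37, -5, 0, 100, -64]` with seven planes and decoding only the first three gives `[32, 0, 0, 96, -64]`, which is the original data quantized with a step size of 16.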
Resolution scalability in JPEG2000 is a direct consequence of the multiresolution properties of the DWT. By dropping the code-blocks corresponding to the highest resolution detail subbands and omitting the final stage of DWT synthesis, it is possible to reconstruct a half-resolution image from the remaining subbands. Dropping the next lower resolution subbands leaves a quarter resolution image, and so forth.
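The bookkeeping behind this can be sketched in a few lines of Python. The function names are hypothetical, levels are numbered so that level 1 holds the highest-resolution detail subbands, and the size calculation assumes the image is anchored at the origin of the sampling grid so that each discarded level halves the dimensions, rounding up.

```python
def resolution_size(width, height, levels_dropped):
    """Image dimensions after discarding `levels_dropped` resolution levels."""
    scale = 1 << levels_dropped
    return ((width + scale - 1) // scale, (height + scale - 1) // scale)


def subbands_needed(total_levels, levels_dropped):
    """Labels of the subbands required for the reduced-resolution image.

    The coarsest LL band is always needed; detail subbands are needed only
    for the levels that have not been dropped.
    """
    if not 0 <= levels_dropped <= total_levels:
        raise ValueError("levels_dropped must lie between 0 and total_levels")
    needed = ["LL%d" % total_levels]
    for d in range(total_levels, levels_dropped, -1):
        needed += ["LH%d" % d, "HL%d" % d, "HH%d" % d]
    return needed
```

With three decomposition levels, dropping one level from a 4096 x 3072 image leaves a 2048 x 1536 image built from LL3 and the detail subbands of levels 3 and 2.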
Distortion scalability is introduced into JPEG2000 code-streams through a "quality layer" abstraction. Each quality layer represents an incremental contribution (possibly empty) from the embedded bitstream associated with each code-block in the image. The sizes of these incremental layer contributions are determined during compression so that any leading set of quality layers corresponds to an efficient compressed representation of the original image. One can then scale the distortion by discarding one or more final quality layers. One can also discard partial layers, although this is not optimal.
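One way to picture how layer contributions might be sized is the following Python sketch. It assumes each code-block offers a list of candidate truncation points, each with a cumulative length and a distortion-rate slope that decreases along the list, and that each layer is defined by a slope threshold. This threshold rule and every name in the code are illustrative assumptions; the article does not say how a particular compressor assigns data to layers.

```python
def layer_contributions(block_points, thresholds):
    """Split each code-block's embedded bitstream across quality layers.

    block_points: {block_id: [(cumulative_bytes, slope), ...]} with slopes
                  decreasing along each list.
    thresholds:   one slope threshold per layer, in decreasing order.
    Returns one dict per layer mapping block_id to the number of new bytes
    that layer contributes for that block (possibly zero).
    """
    sent = {block: 0 for block in block_points}
    layers = []
    for threshold in thresholds:
        layer = {}
        for block, points in block_points.items():
            cut = 0
            # Keep the last truncation point whose slope meets the threshold.
            for nbytes, slope in points:
                if slope < threshold:
                    break
                cut = nbytes
            cut = max(cut, sent[block])  # a layer never takes bytes back
            layer[block] = cut - sent[block]
            sent[block] = cut
        layers.append(layer)
    return layers
```

Keeping only the first few dicts returned by `layer_contributions` corresponds to discarding the final quality layers: every block is cut at a point chosen by the same threshold, which is what keeps each leading set of layers efficient in this model.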
JPEG2000 compressed data streams also offer spatial random access into the image because each code-block is associated with a limited spatial region and is coded independently. Typical code-block dimensions are 32 × 32 or 64 × 64 subband samples. The size of the reconstructed image region affected by any given code-block depends on the particular subband to which it belongs. Also, adjacent code-blocks from any given subband have overlapping regions of influence in the reconstructed image (because wavelet synthesis is a spatially expansive operation). This property tends to blur the boundaries between code-blocks in the reconstructed image, avoiding the appearance of hard boundary artifacts even when individual block bitstreams are aggressively truncated.
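A rough, one-dimensional Python sketch of this geometry follows. It assumes a dyadic decomposition in which a subband at level d is sub-sampled by 2^d relative to the full image, and it models the overlap caused by wavelet synthesis with a hypothetical `margin` parameter measured in subband samples. The function names and the margin model are ours, not part of the standard.

```python
def nominal_footprint(block_size, level):
    """Approximate side length, in image pixels, of the region one code-block
    covers, ignoring the extra overlap added by the synthesis filters."""
    return block_size << level


def blocks_for_region(x0, x1, level, block_size, subband_len, margin=0):
    """Indices of the code-blocks (along one axis) that influence pixels
    x0 <= x < x1 of the full-resolution image.

    subband_len is the subband's length in samples along this axis; margin
    widens the range to allow for the support of the synthesis filters.
    """
    if x1 <= x0:
        return []
    first = max(0, (x0 >> level) - margin)
    last = min(subband_len - 1, ((x1 - 1) >> level) + margin)
    return list(range(first // block_size, last // block_size + 1))
```

For pixels 1000 to 1499 and 64-sample blocks, this gives blocks 7 through 11 in a level-1 subband but only blocks 1 and 2 at level 3, where each block has a nominal footprint of 512 pixels.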
Figure: In a JPIK image browsing session, the user can select a region of interest and extract a detailed view (below) from the overall reconstructed image (above). (Image: University of New South Wales)
Kakadu and JPIK
JPEG2000 offers tremendous flexibility in the sequencing and emphasis of compressed data from different frequency bands and spatial regions. One of the most popular implementations of the JPEG2000 standard is the Kakadu software toolkit developed by our group at the University of New South Wales (Sydney, Australia).
Apart from its emphasis on execution speed and dynamic memory minimization, Kakadu exposes the spatial accessibility of JPEG2000 code-streams in a number of attractive ways. Applications based around Kakadu can choose to decompress any selected spatial region of any desired image components at any desired resolution. The high level interfaces offered by Kakadu allow applications to work with the compressed image from a geometric perspective that can be rotated, flipped, windowed, or zoomed relative to the original image. Image data is decompressed incrementally in accordance with the requested geometry in a manner that avoids unnecessary computation or buffering.
These facilities allow enormous compressed images to be interactively decompressed and rendered with comparative ease. The "kdu_show" application has been used to interactively view images as large as 3 Gb, after JPEG2000 compression. This application is free to download along with other tools for demonstrating Kakadu and JPEG2000.
While the services offered by Kakadu may be exploited in a variety of different communication paradigms, our recent experience with remote browsing of images has been in the context of the JPEG2000 Interactive, Kakadu (JPIK) protocol [2]. JPIK is a network communication protocol that uses transmission control protocol (TCP) and, optionally, user datagram protocol (UDP) as the underlying network transport protocols. The client communicates changes in its current region, resolution, or components of interest, which the server uses to customize its transmission of compressed data. The server maintains a mirror image of the state of the client's cache, transmitting only data that is not already available to the client.
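The cache-mirroring idea can be sketched in Python as follows. This is not the JPIK message format or server logic, which the white paper describes; the class, its method, and the simple byte-budget policy are hypothetical and serve only to show how a server can avoid resending data the client already holds.

```python
class ServerSession:
    """Toy model of a server that mirrors one client's cache."""

    def __init__(self, block_lengths):
        # Total embedded bytes available for each code-block.
        self.block_lengths = dict(block_lengths)
        # Mirror of the client cache: bytes already delivered per block.
        self.client_has = {block: 0 for block in block_lengths}

    def respond(self, blocks_of_interest, byte_budget):
        """Return (block, start, end) byte ranges to send for this request.

        Only bytes the client does not yet hold are sent, and the total
        never exceeds byte_budget.
        """
        message = []
        for block in blocks_of_interest:
            if byte_budget <= 0:
                break
            start = self.client_has[block]
            end = min(self.block_lengths[block], start + byte_budget)
            if end > start:
                message.append((block, start, end))
                byte_budget -= end - start
                self.client_has[block] = end
        return message
```

A realistic server would also order the increments so that quality improves evenly across the requested region, for instance layer by layer, rather than finishing one block before starting the next.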
In this way, the client gradually accumulates information concerning the image, with higher image quality in those regions that the user has spent more time browsing. The figure above shows the image quality obtained after a brief browsing session in which the user quickly zooms into an initial low-resolution version of the image, focusing attention on the 10-digit number displayed on the banner flown by the left-most boat in the image. The original uncompressed image is 17.4 Mb in size, whereas the total amount of data transmitted (at 4 kb/s) in this brief browsing session is only 73 kb.
the JPEG2000 advantage
Efficient interactive browsing of large images has been of interest for some time. "Flashpix," developed by the Digital Imaging Consortium (DIC), provides one reasonably well-known alternative approach to achieve exactly this type of functionality. The Flashpix format is based on JPEG compression of small, independent image tiles, each of which can be accessed and rendered on demand.
Apart from its compression efficiency, JPEG2000 offers a number of important advantages over Flashpix. Among these is the ability to have image quality progress smoothly all the way up to a truly lossless representation within any spatial region of interest. Also, the overlapping properties of the DWT synthesis operation tend to prevent the appearance of hard boundaries between the region of interest and the rest of the image. JPEG2000 is widely regarded as the natural replacement for Flashpix-based applications.
The JPEG2000 image compression standard is likely to benefit a number of applications in which an image, video, or other source must be delivered to multiple consumers or a single consumer with varying requirements.
1. D. Taubman and M. Marcellin, JPEG2000: image compression fundamentals, standards and practice, Kluwer Academic Publishers, 2002.
2. D. Taubman, The JPIK Protocol, white paper available from http://maestro.ee.unsw.edu.au/~taubman/.
know the code
Distortion is a numerical measure of the difference between the original image and its reproduction, after lossy compression.
Image component is a two-dimensional array of scalar sample values. Color images typically have three components (or planes), corresponding to the red, green, and blue intensities, or the luminance and two chrominance components.
Lossless compression means that a compressed representation can be decompressed to recover the original image sample values exactly.
Lossy compression does not preserve all of the original information in the image, but it preserves as much as possible of the visually relevant information, subject to constraints on the compressed file size.
Multiresolution hierarchy refers to a family of images having successively lower resolutions. In this article, the image resolution is reduced by a factor of two between successive levels in the hierarchy.
Image resolution may be defined as the minimum number of rows and columns required to capture the spatial details in the image.
David Taubman is a senior lecturer in the School of Electrical Engineering and Telecommunications at the University of New South Wales, Australia.