
Proceedings Paper

Is the CCSDS Rice coding suitable for GPU massively parallel implementation?

Paper Abstract

The Consultative Committee for Space Data Systems (CCSDS) Rice coding is a recommendation for lossless compression of satellite data. It has also been integrated into the HDF (Hierarchical Data Format) software for lossless compression of scientific data, and it has been proposed for lossless compression of medical images. The CCSDS Rice coding is an approximate adaptive entropy coder. It uses a subset of the family of Golomb codes to produce a simpler, suboptimal prefix code. The default preprocessor is a unit-delay predictor with positive mapping. The adaptive entropy coder concurrently applies a set of variable-length codes to a block of consecutive preprocessed samples. The code option that yields the shortest codeword sequence for the current block of samples is then selected for transmission. A unique identifier bit sequence is attached to the code block to tell the decoder which decoding option to use. In this paper, we explore the parallel efficiency of the CCSDS Rice coding running on Graphics Processing Units (GPUs) with Compute Unified Device Architecture (CUDA). The GPU-based CCSDS Rice encoder processes several codeword blocks in a massively parallel fashion on different GPU multiprocessors. We parallelized the CCSDS Rice coding by using a reduction sum for code option selection, prefix sums for intra-block and inter-block bit stream concatenation, and asynchronous data transfer. For NASA AVIRIS hyperspectral data, the speedup is nearly 6× compared with the single-threaded CPU counterpart. The CCSDS Rice coding contains many flow control instructions, which significantly reduce instruction throughput by causing threads of the same CUDA warp to diverge. Consequently, the different execution paths must be serialized, increasing the total number of instructions executed for that warp. We conclude that this branching and divergence issue is the bottleneck of the Rice coding and leads to a smaller speedup than that of other entropy coders on GPUs.
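The three steps named in the abstract (preprocessing, code option selection, and bit stream concatenation) can be illustrated with a small sequential sketch. It is not the authors' CUDA implementation. It assumes a simplified zigzag mapping in place of the standard's bounded mapper, a predictor that starts from 0, and only the split-sample (Rice parameter k) options; the function names and the `max_k` default are hypothetical. The per-option sum is the quantity a GPU reduction would compute, and the exclusive prefix sum gives the bit offset at which each block would be written.

```python
from itertools import accumulate


def preprocess(samples):
    """Unit-delay prediction, then a zigzag map to non-negative integers.

    Assumption: the predictor starts at 0 and the mapping is a plain
    zigzag, which simplifies the mapper described in the CCSDS standard.
    """
    mapped, prev = [], 0
    for x in samples:
        delta = x - prev  # prediction residual
        mapped.append(2 * delta if delta >= 0 else -2 * delta - 1)
        prev = x
    return mapped


def option_lengths(block, max_k):
    """Coded length in bits of one block for each Rice parameter k.

    Each sample costs (d >> k) unary bits, 1 terminator bit, and k
    low-order bits. On a GPU, each sum would be a parallel reduction.
    """
    return [sum((d >> k) + 1 + k for d in block) for k in range(max_k + 1)]


def select_option(block, max_k=7):
    """Return (k, bits) for the option giving the shortest block."""
    lengths = option_lengths(block, max_k)
    best_k = min(range(len(lengths)), key=lengths.__getitem__)
    return best_k, lengths[best_k]


def block_offsets(bit_lengths):
    """Exclusive prefix sum: starting bit offset of each coded block."""
    return list(accumulate(bit_lengths, initial=0))[:-1]
```

For example, `select_option(preprocess(samples))` returns the chosen parameter and the block's coded length, and `block_offsets` applied to the list of per-block lengths gives the position where each block starts in the output stream. The identifier bits that signal the chosen option are left out of the lengths here for brevity.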

Paper Details

Date Published: 2 November 2011
PDF: 9 pages
Proc. SPIE 8183, High-Performance Computing in Remote Sensing, 818308 (2 November 2011); doi: 10.1117/12.896893
Xianyun Wu, Xidian Univ. (China)
Yunsong Li, Xidian Univ. (China)
Chengke Wu, Xidian Univ. (China)
Bormin Huang, Univ. of Wisconsin-Madison (United States)

Published in SPIE Proceedings Vol. 8183:
High-Performance Computing in Remote Sensing
Bormin Huang; Antonio J. Plaza, Editors
