New space-mission concepts often require the generation of large amounts of data, but the capacity of communications channels has not increased proportionally. Thus, data compression has become a crucial aspect in designing new missions and instruments. In addition, many modern ground-based systems that must transfer impressive amounts of data—both between distant locations and within local networks of high-performance computers—are currently in the planning or operation stages. These systems will also benefit from highly efficient data compressors.

Existing data-compression solutions either require large amounts of computational resources or are unable to efficiently compress unexpected values that may be found in data streams. This applies to general-purpose compressors that are based on dictionary coding^{1} (such as .zip or .rar), which additionally require excessively long data blocks for adequate operation. They are, therefore, not suitable for use onboard satellites. Even in ground-based systems involving high throughputs, these solutions are inefficient. Some alternative codes, including arithmetic,^{2} range,^{3} and Huffman,^{4} offer optimal or close-to-optimal efficiencies but at the price of excessive computational loads.

Currently, the solution generally adopted for space systems^{5} is based on two-stage processing: the data are first pre-processed (often with a data predictor), and the prediction errors are then coded with a simple entropy coder. Although this is an appropriate approach, it is too sensitive to outliers in the data stream (i.e., values outside the expected statistical distribution).^{6} The most problematic situation arises when the compressor receives values that are much larger than expected, which often leads to a significant decrease in the compression ratio. This occurs frequently for space-based instruments because of the impact of energetic particles. An entropy coder must therefore strike a compromise between high compression ratios for frequent values and small expansion ratios for the least-frequent values, pursuing the best overall ratio at an acceptable computing cost. The compressor should additionally adapt to changing statistics, but for spaceborne solutions this should be done using small, independent data blocks to minimize packet losses when transmission errors occur.^{7} In general, common solutions perform fairly well, but they cannot satisfy all of these compromises at once. In addition, an efficient entropy coder can be used both as a general-purpose compressor (when combined with a generic pre-processing stage) and as the coding stage of more sophisticated existing solutions, such as image compression^{8} or hyperspectral-data compression.^{9}
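The two-stage scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predictor is a plain "previous sample" predictor, signed errors are folded to non-negative integers with a zigzag mapping, and the entropy coder is a textbook Rice code with a fixed parameter `k`. None of the function names come from the CCSDS standard or from the authors' software, and a real implementation would pack bits rather than build strings.

```python
from typing import List


def predict_errors(samples: List[int]) -> List[int]:
    """Stage 1: predict each sample as the previous one and keep the error."""
    errors = []
    previous = 0  # assumed initial prediction
    for s in samples:
        errors.append(s - previous)
        previous = s
    return errors


def zigzag(e: int) -> int:
    """Fold signed errors to non-negative integers: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(n: int) -> int:
    """Inverse of zigzag()."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def rice_encode(values: List[int], k: int) -> str:
    """Stage 2: unary quotient, '0' terminator, then a k-bit remainder."""
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, f"0{k}b"))
    return "".join(bits)


def rice_decode(bits: str, k: int, count: int) -> List[int]:
    """Read back `count` Rice-coded values from a bit string."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


def compress(samples: List[int], k: int) -> str:
    """Predictor followed by the entropy coder."""
    return rice_encode([zigzag(e) for e in predict_errors(samples)], k)


def decompress(bits: str, k: int, count: int) -> List[int]:
    """Undo the entropy coder, then undo the predictor."""
    samples, previous = [], 0
    for n in rice_decode(bits, k, count):
        previous += unzigzag(n)
        samples.append(previous)
    return samples


samples = [100, 102, 101, 105, 104]
bits = compress(samples, k=2)
print(len(bits), "coded bits;", "round trip OK:", decompress(bits, 2, len(samples)) == samples)
```

In this toy run the first prediction error (100) dominates the output size because `k` was chosen for the small errors that follow, which is a miniature version of the outlier sensitivity discussed in the text.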

We focused on the second stage of the two-stage data-compression strategy. Specifically, our goal was to design an optimal entropy coder, initially aimed at lossless data compression (although it may eventually be used for lossy compression as well). We designed the prediction error coder^{11} (PEC) based on a segmentation strategy that makes it suitable for data that follow unusual statistical distributions (unlike Golomb-based coders). Figure 1 illustrates the coder's robustness compared to the Rice-Golomb coder^{12,13} used as the space standard for lossless data compression. The efficiency of the PEC never drops below 40%, even when very large values are received (and it limits the maximum code length to less than twice the original size), while the Rice compressor can sometimes lead to prohibitive code lengths.
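To see why a Rice code can produce prohibitive code lengths, consider the length of a plain Rice codeword: the unary quotient grows linearly with the coded value once the parameter is fixed. The sketch below assumes 16-bit samples and a parameter `K = 3` tuned for small prediction errors. Both numbers are illustrative assumptions, and the code covers only the basic Rice code, without any fallback option a complete standard implementation may provide.

```python
SAMPLE_BITS = 16  # assumed raw sample size
K = 3             # assumed Rice parameter, tuned for small prediction errors


def rice_code_length(value: int, k: int) -> int:
    """Bits in a plain Rice codeword: unary quotient + terminator + k-bit remainder."""
    return (value >> k) + 1 + k


for value in (2, 10, 100, 1000, 60000):
    length = rice_code_length(value, K)
    print(f"value={value:6d}  code length={length:5d} bits  "
          f"expansion={length / SAMPLE_BITS:7.2f}x")

# For comparison, the text states that PEC keeps every codeword below
# twice the original sample size, i.e. below this many bits here:
print("PEC bound quoted in the text:", 2 * SAMPLE_BITS, "bits")
```

A single outlier near the top of the 16-bit range costs thousands of bits with this fixed parameter, whereas a coder with a bounded maximum length pays at most a small, known penalty.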

**Figure 1.** Compression efficiency of Rice-Golomb and prediction error coder (PEC) codes for discrete Laplacian distributions, using only three calibration points (at data entropies of 3, 5, and 11 bits/sample).

We also designed a highly optimized adaptive layer, resulting in the fully adaptive PEC^{14} (FAPEC). This solution requires nearly the same computing resources as the current space standard, while offering much better resilience to outliers. Figure 2 shows the ratios achievable by FAPEC when applied to synthetic data with 10% outliers, a situation that may be found for several space instruments.^{15} The current standard cannot even reach ratios of two, while FAPEC may exceed four. Table 1 shows some tests done on real data using a very simple pre-processing stage, where the potential of FAPEC can be better appreciated. In the worst case, FAPEC is 5% below the standard, while in some cases it can double the compression ratios. FAPEC has been prototyped in a field-programmable gate array,^{16} offering throughputs of 32 Mb/s with just 35 mW of power consumption.
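As a point of reference for what block-based adaptation means, the sketch below picks, for each small independent block, the Rice parameter that minimizes the coded size. This is a generic illustration of adapting to changing statistics, not the FAPEC adaptive layer, whose internals are not described here. The block size, the parameter range, and the function names are assumptions, and the few side-information bits needed to signal `k` per block are ignored.

```python
from typing import List, Tuple


def rice_block_length(block: List[int], k: int) -> int:
    """Total bits needed to Rice-code a block of non-negative values with parameter k."""
    return sum((v >> k) + 1 + k for v in block)


def best_k(block: List[int], max_k: int = 15) -> Tuple[int, int]:
    """Return (k, coded_bits) for the parameter that gives the shortest block."""
    candidates = ((k, rice_block_length(block, k)) for k in range(max_k + 1))
    return min(candidates, key=lambda pair: pair[1])


def adaptive_lengths(values: List[int], block_size: int = 16) -> List[Tuple[int, int]]:
    """Choose the parameter independently for every block of `block_size` values."""
    return [
        best_k(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]


# A quiet stretch followed by a noisy one: each block gets its own parameter.
values = [1] * 16 + [200] * 16
for index, (k, bits) in enumerate(adaptive_lengths(values)):
    print(f"block {index}: k={k}, {bits} bits")
```

Because every block is coded on its own, a transmission error damages only that block, which is the reason the text gives for preferring small, independent blocks in space applications.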

**Figure 2.** Compression ratios achieved with the current standard for space, Consultative Committee for Space Data Systems (CCSDS) 121.0, an optimally configured PEC, and the fully adaptive PEC (FAPEC). A discrete Laplacian distribution with 10% of flat, random noise was used to simulate outliers. Small (large) values of b mean low (high) dispersions.
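Test data of the kind described in the Figure 2 caption can be generated along the following lines. The exact recipe used for the figure is not given, so this sketch makes several assumptions: the discrete Laplacian is obtained by rounding a continuous Laplacian of scale `b`, outliers are drawn uniformly over the full signed 16-bit range, and the empirical entropy is computed from the sample histogram.

```python
import math
import random
from collections import Counter
from typing import List


def synthetic_samples(n: int, b: float, outlier_fraction: float = 0.10,
                      sample_bits: int = 16, seed: int = 0) -> List[int]:
    """Rounded Laplacian samples of scale b, mixed with flat random outliers."""
    rng = random.Random(seed)
    limit = 1 << (sample_bits - 1)
    samples = []
    for _ in range(n):
        if rng.random() < outlier_fraction:
            # Outlier: flat noise over the whole signed range.
            samples.append(rng.randrange(-limit, limit))
        else:
            # The difference of two exponentials with mean b is Laplacian with scale b.
            x = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
            samples.append(max(-limit, min(limit - 1, round(x))))
    return samples


def empirical_entropy(samples: List[int]) -> float:
    """Shannon entropy of the sample histogram, in bits per sample."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


clean = synthetic_samples(20000, b=5.0, outlier_fraction=0.0)
noisy = synthetic_samples(20000, b=5.0, outlier_fraction=0.10)
print(f"entropy without outliers: {empirical_entropy(clean):.2f} bits/sample")
print(f"entropy with 10% outliers: {empirical_entropy(noisy):.2f} bits/sample")
```

Feeding such a sequence to any entropy coder and dividing the raw size by the coded size yields a compression ratio for a given `b`; the histogram entropy of a finite sample is only an estimate of the source entropy.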

**Table 1.** Data-compression results obtained with FAPEC, compared to the equivalent values obtained with the current CCSDS standard for lossless compression in space. The maximum theoretical ratio (Shannon limit) is also shown, thus revealing the achievable compression efficiencies. FAPEC performs excellently with large sample sizes. LISA PF: Laser Interferometer Space Antenna Pathfinder. GPS: Global Positioning System.

| File | CCSDS 121.0 ratio | FAPEC ratio | Shannon limit^{10} | Sample size |
|---|---|---|---|---|
| Astronomical imaging | 4.16 | 4.41 | 5.44 | 32 bits |
| Galaxy imaging | 1.14 | 2.58 | 4.84 | 64 bits |
| Spectroscopic data | 1.62 | 1.61 | 1.63 | 16 bits |
| Hyperspectral data (Moffett) | 1.96 | 1.97 | 1.96 | 16 bits |
| LISA PF data | 3.86 | 4.00 | 4.12 | 24 bits |
| GPS data | 4.45 | 4.64 | 4.93 | 24 bits |
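The compression efficiency implied by Table 1 is simply the achieved ratio divided by the Shannon limit. The short sketch below computes it for three of the rows, using the values exactly as printed in the table. The `efficiency` helper is an illustrative name, and defining efficiency as this quotient is an assumption consistent with the caption rather than a formula stated in the text.

```python
# (CCSDS 121.0 ratio, FAPEC ratio, Shannon limit), copied from Table 1.
ROWS = {
    "Astronomical imaging": (4.16, 4.41, 5.44),
    "Galaxy imaging": (1.14, 2.58, 4.84),
    "GPS data": (4.45, 4.64, 4.93),
}


def efficiency(ratio: float, shannon_limit: float) -> float:
    """Fraction of the maximum theoretical compression ratio actually achieved."""
    return ratio / shannon_limit


for name, (ccsds, fapec, limit) in ROWS.items():
    print(f"{name:22s} CCSDS 121.0: {efficiency(ccsds, limit):5.1%}   "
          f"FAPEC: {efficiency(fapec, limit):5.1%}")
```

The galaxy-imaging row shows the largest gap between the two coders, in line with the caption's remark about large sample sizes.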

In summary, we have designed and implemented, in both software and hardware, a highly efficient coding stage for data compression. It is a reliable, demonstrated alternative to existing coders and can be integrated into more complex existing solutions, such as image or hyperspectral-data compression systems. The FAPEC coder can be used onboard satellites and also in ground systems, for example in high-performance computing. We intend to further improve the coder and test its performance as the coding stage of sophisticated compression systems. We also plan to parallelize its hardware implementation to reach higher throughputs.

*Part of this work was supported by the Spanish Ministries of Science and Innovation (MICINN: grants AYA2009-14648-C02-01 and ESP2006-13855-C02-01), and of Education and Science (MEC: grant TME2008-01214), European Union regional development (FEDER) funds, the Agència de Gestió d'Ajuts Universitaris i de Recerca (Agency for Administration of University and Research Grants), and the Institut d'Estudis Espacials de Catalunya (Institute for Space Studies of Catalonia).*

Jordi Portell de Mora

Institute of Cosmos Sciences, University of Barcelona

Barcelona, Spain

Jordi Portell is a postdoctoral research associate. He has been working on the European Space Agency's Gaia mission since 2000, including on data-compression issues.

Alberto G. Villafranca

Institute for Space Studies of Catalonia

Barcelona, Spain

Alberto Villafranca joined the data-compression group in 2005. He now works at the Cartographic Institute of Catalonia.

Enrique García-Berro

Department of Applied Physics, Technical University of Catalonia (UPC)

Castelldefels, Spain

Enrique García-Berro has been at UPC since 1991. He has also been a research associate at the Institute for Space Studies of Catalonia since 1996. He has published more than 130 papers in refereed journals.

References:

1. T. A. Welch, A technique for high-performance data compression, *IEEE Comput*. 17, pp. 8-19, 1984.

2. I. H. Witten, R. M. Neal, J. G. Cleary, Arithmetic coding for data compression, *Commun. ACM* 30, pp. 520-540, 1987. doi:10.1145/214762.214771

3. G. N. N. Martin, Range encoding: an algorithm for removing redundancy from a digitized message, *Proc. Video Data Record. Conf*., pp. 173-180, 1979.

4. D. Huffman, A method for the construction of minimum redundancy codes, *Proc. IRE* 40, pp. 1098-1101, 1952.

5. Lossless data compression, blue book tech. rep., 1993. CCSDS 121.0-B-1.

6. M. Evans, N. Hastings, B. Peacock, *Statistical Distributions*, Wiley-Interscience, 2000.

7. Telemetry channel coding, blue book tech. rep., 2001. CCSDS 101.0-B-5.

8. P.-S. Yeh, P. Armbruster, A. Kiely, B. Masschelein, G. Moury, C. Schaefer, C. Thiebaut, The new CCSDS image compression recommendation, *Proc. IEEE Aerosp. Conf*., pp. 4138-4145, 2005.

9. G. Motta, F. Rizzo, *Hyperspectral Data Compression*, Springer-Verlag, 2006.

10. C. E. Shannon, W. Weaver, *The Mathematical Theory of Communication*, Univ. of Illinois Press, 1949.

11. J. Portell, A. G. Villafranca, E. García-Berro, A resilient and quick data compression method of prediction errors for space missions, *Proc. SPIE* 7455, pp. 745505, 2009. doi:10.1117/12.829410

12. R. F. Rice, Some practical universal noiseless coding techniques, tech. rep., 1979. Jet Propulsion Lab.

14. J. Portell, A. G. Villafranca, E. García-Berro, Quick outlier-resilient entropy coder for space missions, *J. Appl. Remote Sens*. 4, pp. 041784, 2010. doi:10.1117/1.3479585

15. M. A. Nieto-Santisteban, D. J. Fixsen, J. D. Offenberg, R. J. Hanisch, H. S. Stockman, Data compression for NGST, *Astron. Data Anal. Softw. Syst. VIII*, pp. 137-140, 1999.

16. A. G. Villafranca, S. Mignot, J. Portell, E. García-Berro, Hardware implementation of the FAPEC lossless data compressor for space, *NASA/ESA Conf. Adapt. Hardw. Syst*., pp. 170-176, 2010.