A recently invented coding technique for data compression is based on recursive interleaving of variable-to-variable-length binary source codes. The technique can be used as a key component in data compressors for a wide variety of data types, including image, video, audio, and text data.
The technique is adaptable in that the probability estimate used to encode a data bit can be updated with each new bit. This can result in better compression performance compared to encoders that require a fixed or slowly changing prob- ability estimate.
The technique seems to have advantages over the prior entropy coding method called arithmetic coding. The technique is particularly amenable to the design of relatively simple, fast decoders. Moreover, because the technique offers flexibility in the choice of parameters for a particular design, it is possible to trade compression performance versus speed. A straightforward code design procedure can be used to produce an encoder with compression efficiency arbitrarily close to the theoretical limit, with increasing complexity as the limit is approached.
The coding problem solved by this technique is that of compressing a sequence of source bits. The technique allows the estimated probability (pi) that the ith source bit (bi) is zero to depend on the values of the preceding source bits, correlations or memory in the source, and any other information available to both the encoder and the decoder. The technique efficiently encodes the source sequence by recursively encoding groups of bits with similar probabilities, ordering the output in a way that is suited to the decoder.
Before encoding, input bits are inverted as needed to force pi≥1/2. For the purpose of encoding, the probability range from 1/2 to 1 is partitioned into several narrow intervals. Associated with each interval is a bin that is used to store bits. When bi arrives, it is placed in the bin that corresponds to the probability interval that contains pi. Because each interval spans a small probability range, each bin can be regarded as corresponding to some nominal probability value. For each bin except the leftmost one (which contains probability 1/2), an exhaustive prefix-free set of binary code words is specified. When the bits collected in a bin form one of the code words, these bits are deleted from the bin and the value of the code word is encoded by placing one or more new bit(s) in other bins. (The bits in each bin are arranged in a specific order that make decoding possible.) Thus, bits arrive in various bins either directly from the source or as a result of processing code words in other bins. The net effect of this process is to cause bits to migrate to the leftmost (uncoded) bin, from whence they are transmitted.
The formation of code words is represented by a binary tree (see figure) for each bin except the leftmost one. Each code word is assigned a terminal node in the tree. Branch nodes of the tree are labeled to represent destination bins, and the branch labels (zeroes and ones) correspond to output bits that are placed in the bins.
Once the end of the source bit sequence has been reached and no code words remain in any bin, there are still likely to be partially formed code words in one or more bins. Because these bits are needed for decoding, one or more extra bits are appended to each partial code word to form complete code words, which are then processed in the normal manner. The extra bits can be regarded as being used to “flush” the encoder.
In practice, the encoder and decoder do not track probability values. Instead, each bin is assigned an index, starting with 1 for the leftmost (uncoded) bin. The label for each node in the binary tree is the index (rather than the nominal probability value) of the bin to which the next output bit is mapped. There is imposed a requirement that each output bit from the tree for bin j must be mapped to a bin with index strictly less than j. No computations involving prob- ability values are needed, apart form those that may be required to map each input bit to the bin of the appropriate index. (Although probability values are largely unneeded for the operation of an encoder and decoder, it is useful to track probability values to design a good encoder.)
If the trees and bins are well designed, then on the average, the number of bits used to describe a code word is less than the code word length, and data compression occurs. Some redundancy is present because the bins have positive width; in other words, the probability associated with a bit that arrives in a bin usually does not exactly equal the nominal probability for that bin, and bits in the leftmost bin are transmitted uncompressed, even though many do not have probability exactly equal to 1/2. However, by increasing the number of bins and/or the size of the trees, one can trade complexity for performance and decrease the maximum redundancy to arbitrarily small values.
Although the underlying compression process relies on binary encoding, any nonbinary source can be first converted to a binary one before encoding. Thus, the technique can be applied to nonbinary sources as well.
This work was done by Aaron Kiely and Matthew Klimesh of Caltech for NASA’s Jet Propulsion Laboratory.
In accordance with Public Law 96-517, the contractor has elected to retain title to this invention. Inquiries concerning rights for its commercial use should be addressed to
Intellectual Property group JPL Mail Stop 202-233 4800 Oak Grove Drive Pasadena, CA 91109 (818) 354-2240
Refer to NPO-21101, volume and number of this NASA Tech Briefs issue, and the page number.