Data compression exploits redundancy in data to free up storage capacity, boost computing speeds, and provide other benefits. In modern computer systems, accessing main memory is very expensive compared to actual computation. Because of this, using data compression in the memory helps improve performance, as it reduces the frequency and amount of data programs need to fetch from main memory.
Memory in modern computers manages and transfers data in fixed-size chunks, on which traditional compression techniques must operate. Software, however, doesn't naturally store its data in fixed-size chunks. Instead, it uses “objects” — data structures that contain various types of data and have variable sizes. Therefore, traditional hardware compression techniques handle objects poorly.
A new technique was developed that frees up memory in computers and mobile devices, allowing them to run faster and perform more tasks simultaneously. In tests using a modified Java virtual machine, the technique compressed twice as much data and reduced memory usage by half compared with traditional cache-based methods.
Traditional architectures store data in blocks in a hierarchy of progressively larger and slower memories, called caches. Recently accessed blocks rise to the smaller, faster caches while older blocks are moved to slower and larger caches, eventually ending back in main memory. While this organization is flexible, it is costly: To access memory, each cache needs to search for the address among its contents.
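The lookup cost described above can be sketched in software. This is an illustrative model only (the class and function names are ours; real hardware performs the search with tag arrays, not dictionaries): every access must check each level's contents in turn before falling through to main memory.

```python
# Toy software model of an address-searched cache hierarchy.
# Real hardware uses tag arrays; the dictionary stands in for that search.

class Cache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}              # address -> data

    def lookup(self, addr):
        return self.blocks.get(addr)  # every access must search the contents

def access(hierarchy, addr):
    """Search each level in turn; report how many searches were needed."""
    for searches, cache in enumerate(hierarchy, start=1):
        data = cache.lookup(addr)
        if data is not None:
            return data, searches
    return None, len(hierarchy)       # miss everywhere: go to main memory

l1, l2 = Cache(64), Cache(512)
l2.blocks[0x40] = b"block"
print(access([l1, l2], 0x40))         # hit in L2 after searching both levels
```

The per-level search is the cost Hotpads, described next, is designed to avoid.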
A system called Hotpads stores entire objects, tightly packed into hierarchical levels or “pads.” These levels reside entirely on efficient, on-chip, directly addressed memories with no sophisticated searches required. Programs then directly reference the location of all objects across the hierarchy of pads. Newly allocated and recently referenced objects, and the objects they point to, stay in the faster level.
When the faster level fills, it runs an “eviction” process that keeps recently referenced objects but kicks down older objects to slower levels and recycles objects that are no longer useful, to free up space. Pointers are then updated in each object to point to the new locations of all moved objects. In this way, programs can access objects much more cheaply than searching through cache levels.
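The eviction pass can be sketched as follows. This is a simplified software model under our own assumptions (Hotpads is a hardware design; the `Obj` fields and the pointer table are illustrative): recently referenced objects stay, older live objects are demoted and their pointers patched, and dead objects are recycled.

```python
# Minimal sketch of Hotpads-style eviction; data structures are our own
# simplification of what the hardware tracks per object.

class Obj:
    def __init__(self, name, referenced=False, live=True):
        self.name, self.referenced, self.live = name, referenced, live

def evict(fast_pad, slow_pad, pointers):
    """Keep recently referenced objects, demote older live objects to the
    slower pad, recycle dead ones, then update pointers to new locations."""
    keep = []
    for obj in fast_pad:
        if obj.referenced:
            obj.referenced = False             # clear for the next round
            keep.append(obj)
        elif obj.live:
            slow_pad.append(obj)               # kick down to the slower pad
            pointers[obj.name] = ("slow", len(slow_pad) - 1)
        # dead objects are simply dropped, freeing their space
    fast_pad[:] = keep

fast = [Obj("a", referenced=True), Obj("b"), Obj("c", live=False)]
slow = []
ptrs = {"a": ("fast", 0), "b": ("fast", 1), "c": ("fast", 2)}
evict(fast, slow, ptrs)
print([o.name for o in fast], [o.name for o in slow], ptrs["b"])
```

After the pass, "a" remains in the fast pad, "b" has moved down with its pointer updated, and "c" has been recycled.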
The new technique, called Zippads, leverages the Hotpads architecture to compress objects. When objects first start at the faster level, they're uncompressed. But when they're evicted to slower levels, they're all compressed. Pointers in all objects across levels then point to those compressed objects, which makes them easy to recall back to the faster levels and able to be stored more compactly than prior techniques.
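The compress-on-eviction policy can be sketched like this, using `zlib` purely as a stand-in for the hardware compressor (Zippads uses its own algorithm, described below): objects live uncompressed in the fast level, are compressed when demoted, and are decompressed when recalled.

```python
# Sketch of Zippads' compress-on-eviction policy. zlib is a stand-in for
# the hardware compressor; the level layout here is our simplification.
import zlib

fast_level = {}   # object id -> raw bytes (uncompressed)
slow_level = {}   # object id -> compressed bytes

def demote(oid):
    """Objects are compressed as they are evicted to a slower level."""
    slow_level[oid] = zlib.compress(fast_level.pop(oid))

def recall(oid):
    """Pointers lead straight to the compressed object, which is
    decompressed as it returns to the fast level."""
    fast_level[oid] = zlib.decompress(slow_level.pop(oid))
    return fast_level[oid]

obj = b"name=alice;role=admin;" * 8
fast_level["user:1"] = obj
demote("user:1")
print(len(slow_level["user:1"]), "<", len(obj))   # stored more compactly
assert recall("user:1") == obj                    # round-trips intact
```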
A compression algorithm then leverages redundancy across objects efficiently. This technique uncovers more compression opportunities than previous techniques, which were limited to finding redundancy within each fixed-size block. The algorithm first picks a few representative objects as "base" objects. Then, for each new object, it stores only the data that differs between that object and its base object.
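The base+delta idea can be sketched as follows. This is our simplification, assuming for brevity that the new object and its base have the same length (the real algorithm handles variable-size objects): a new object similar to the base collapses to a short list of differing bytes.

```python
# Sketch of cross-object base+delta compression: store a new object as
# only its byte-level differences from a representative "base" object.
# Assumes equal-length objects for simplicity.

def delta_encode(obj, base):
    """Record (position, byte) only where obj differs from the base."""
    return [(i, o) for i, (o, b) in enumerate(zip(obj, base)) if o != b]

def delta_decode(diffs, base):
    """Rebuild the object by patching the base with the stored diffs."""
    out = bytearray(base)
    for i, o in diffs:
        out[i] = o
    return bytes(out)

base = b"point{x=00,y=00}"    # representative base object
new  = b"point{x=07,y=00}"    # a similar new object
diffs = delta_encode(new, base)
print(diffs)                   # one differing byte instead of 16
assert delta_decode(diffs, base) == new
```

Because most objects of the same type share layout and much of their content, the stored delta is usually far smaller than the object itself.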