What is Program Compression?

Program compression is a method of reducing the size of software, data, or images stored on disk. Smaller files are a great benefit when sending information back and forth over the Internet, particularly over slower connections, and they save storage space when archiving material. Compression works by removing redundancy from data, a process that several well-known software packages carry out automatically. In this sense, redundancy refers to the repeated use of common components, such as words or byte sequences, within a block of data. When the compression is lossless, as it must be for programs and documents, the original data is not harmed in any way; once decompressed, it retains all of its original content and formatting.
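The claim that compression shrinks repetitive data without changing it can be checked with a short round trip. This sketch uses Python's standard `zlib` module (which implements DEFLATE, a combination of LZ77 and Huffman coding) as one example of a well-known compressor; it is not necessarily the package the article has in mind, and the sample input is an arbitrary, deliberately repetitive string.

```python
import zlib

# Deliberately repetitive input: one phrase repeated 200 times (9000 bytes).
original = b"the quick brown fox jumps over the lazy dog. " * 200

# Level 9 asks zlib for its strongest (slowest) compression setting.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after decompression:", restored == original)
```

The exact compressed size depends on the zlib version and settings, but highly repetitive input like this shrinks dramatically, and the restored bytes always match the original exactly.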

Data or image file size can be a problem when trying to send information via the Internet or when large numbers of files or applications are stored. Large files transfer slowly over slower Internet connections, and users who pay for bandwidth by usage benefit from smaller file sizes as well. The question is, how do you take a program, file, or image and squish it down to a more manageable size without destroying its content? Data or program compression applications do this by removing much of the repetition found in typical electronic information.

For example, consider the previous paragraph. The word “data” appears twice and the word “or” five times. A compression application based on an LZ (Lempel-Ziv) adaptive dictionary-based algorithm keeps the first occurrence of each repeated piece of information and replaces later occurrences with a short, unique identifier that refers back to the application's dictionary. The same mechanism also captures shorter common strings, such as certain letter groups or specific letters followed by spaces, because LZ methods match repeated sequences of characters rather than only whole words.
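The idea of swapping repeated words for short identifiers can be shown with a toy program. This is a minimal sketch, assuming whitespace-separated words; the function names `encode_words` and `decode_words` are hypothetical, and real LZ compressors work on arbitrary byte sequences rather than whole words. The sketch also discards the original spacing, which a real lossless compressor would preserve.

```python
def encode_words(text):
    """Replace every word with a numeric ID pointing into a word dictionary.

    Each distinct word is stored once in `dictionary`; repeated words
    become repeated IDs. (Hypothetical helper for illustration.)
    """
    dictionary = []   # each distinct word, stored once
    ids = {}          # word -> its position in `dictionary`
    encoded = []
    for word in text.split():
        if word not in ids:
            ids[word] = len(dictionary)
            dictionary.append(word)
        encoded.append(ids[word])
    return dictionary, encoded


def decode_words(dictionary, encoded):
    """Reverse the substitution by looking each ID up in the dictionary."""
    return " ".join(dictionary[i] for i in encoded)


sentence = "data or image or program or file or data"
dictionary, encoded = encode_words(sentence)
print(dictionary)  # ['data', 'or', 'image', 'program', 'file']
print(encoded)     # [0, 1, 2, 1, 3, 1, 4, 1, 0]
print(decode_words(dictionary, encoded) == sentence)  # True
```

Nine words shrink to five dictionary entries plus nine small integers; the savings grow as the same words recur more often.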

Matching continues through the whole file, so each repeated string is stored only once and every later occurrence becomes a reference. In LZ methods the dictionary is not stored in a separate file; the decompressor rebuilds the same dictionary from the compressed data as it reads it. When the program is called upon to decompress the file, it reverses the process and replaces each identifier with the word or part of a word it stands for. Because none of the original data is omitted, this is known as lossless compression. Images can also be compressed losslessly, but they are often compressed with a somewhat different approach that deliberately discards some of the original material; this is known as lossy compression.
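LZW (Lempel-Ziv-Welch), a member of the LZ dictionary family used in GIF images and the Unix `compress` utility, shows how the decompressor can rebuild the dictionary on its own without receiving a copy. This is a minimal sketch, not a production codec: the function names are hypothetical, the input is assumed to contain only characters with code points below 256, and codes are returned as plain integers rather than packed into bits.

```python
def lzw_compress(data):
    """Encode a string as a list of LZW codes (characters must be < 256)."""
    dictionary = {chr(i): i for i in range(256)}  # one entry per single character
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                  # keep extending the match
        else:
            output.append(dictionary[current])   # emit code for longest match
            dictionary[candidate] = next_code    # learn the new string
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes):
    """Rebuild the text, reconstructing the dictionary from the codes alone."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry being defined at this very step.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]  # same entry the encoder added
        next_code += 1
        previous = entry
    return "".join(pieces)


text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "characters ->", len(codes), "codes")  # 24 characters -> 16 codes
print(lzw_decompress(codes) == text)                    # True
```

Note that each code needs more than 8 bits once the dictionary grows past 256 entries, so the byte savings are smaller than the raw count of codes suggests; on longer, more repetitive input the gap widens in LZW's favor.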

Image files are typically made up of many different gradations of color and value. When viewed normally, many areas of an image appear to be a single color, but they are actually made up of many subtle hues, each stored digitally as a different color value. A simple lossy compressor replaces all pixels whose color values are similar with a single shared color. The resulting repeated values can then be encoded with the same sort of references used in program compression. In this way, the amount of data that makes up the image is significantly reduced, often with little visible change in quality.
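Replacing similar colors with one shared color can be sketched on a single row of grayscale pixels. This minimal sketch assumes 8-bit gray values (0 to 255) and a hypothetical bucket width of 32; the function names are hypothetical, and run-length encoding is swapped in here as a simple stand-in for the reference step. Real formats differ: JPEG, for example, quantizes frequency coefficients rather than raw pixel values.

```python
def quantize(pixels, step=32):
    """Lossy step: map each 0-255 value to the middle of its bucket of width `step`."""
    return [min(255, (p // step) * step + step // 2) for p in pixels]


def run_length_encode(values):
    """Lossless step: collapse runs of identical values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


# A row with two "flat-looking" areas that actually contain subtle variations.
row = [200, 201, 203, 202, 205, 204, 206, 207, 90, 91, 93, 92, 94, 95, 89, 88]
quantized = quantize(row)
max_error = max(abs(p - q) for p, q in zip(row, quantized))

print(len(run_length_encode(row)), "runs before")  # 16 runs before
print(run_length_encode(quantized))                # [(208, 8), (80, 8)]
print("largest error:", max_error)                 # largest error: 15
```

Before quantization no two neighboring pixels match, so run-length encoding gains nothing; afterward the row collapses to two runs, at the cost of each pixel shifting by at most half the bucket width.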