Introduction To Data Compression

If you are interested in learning more about data compression, check out this ebook, [PDF] Introduction to Data Compression, that I just came across. It provides a good introduction to information theory, probability coding, applications of probability coding, data compression algorithms, and more. I haven't had a chance to read the entire ebook, but it seems to be a solid introduction to data compression topics. If you are still curious and want to learn more, check out the books listed under Additional Resources below.

So what is data compression? In theoretical computer science and information theory, it is the process of encoding information using fewer bits than the original representation would require. It is really useful because it reduces resource consumption, most notably network bandwidth and disk space. There is a downside, though: the data must be compressed and later decompressed, which requires additional processing. In some cases this can take quite a bit of time and use a lot of system resources, negatively affecting other applications on the system.
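As a quick, concrete illustration (this example is my own, not from the ebook), Python's standard `zlib` module shows both sides of the trade-off: the compressed form takes far fewer bytes, but you pay CPU time to encode and decode it.

```python
import zlib

# A repetitive payload compresses well because it contains
# plenty of statistical redundancy for the encoder to exploit.
data = b"the quick brown fox jumps over the lazy dog " * 100

compressed = zlib.compress(data)        # encode using fewer bits
restored = zlib.decompress(compressed)  # decode back to the original

print(len(data))         # 4500 bytes uncompressed
print(len(compressed))   # dramatically fewer bytes compressed
assert restored == data  # the round trip reproduces the input exactly
```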

There are many different compression algorithms, and they fall into two main categories: lossless and lossy. Lossless algorithms represent the original data more concisely without losing any information, by exploiting statistical redundancy (commonly occurring patterns of bits are replaced with shorter codes during compression and fully restored during decompression). Lossy compression is a bit more extreme: it takes human perception into account and permanently discards data judged to be less noticeable, so decompression produces a slightly altered result. Because some data is removed outright, lossy compression typically achieves much smaller sizes, and therefore conserves more system resources, than lossless compression can.
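To make the lossless idea concrete, here is a minimal sketch of run-length encoding (RLE), one of the simplest lossless schemes. RLE is my choice of example rather than one named in the text: it replaces each run of a repeated symbol with the symbol and a count, and decompression expands the counts back so nothing is altered.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of a repeated character with (char, run_length)."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand every (char, run_length) pair back into the original run."""
    return "".join(char * count for char, count in pairs)

original = "aaaabbbccd"
encoded = rle_encode(original)          # [('a', 4), ('b', 3), ('c', 2), ('d', 1)]
assert rle_decode(encoded) == original  # lossless: the data is fully recovered
```

A lossy algorithm, by contrast, would have no such round-trip guarantee: once the less-perceptible detail is thrown away (as in JPEG images or MP3 audio), decompression can only approximate the original.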

Additional Resources

Managing Gigabytes: Compressing and Indexing Documents and Images

Written on September 8, 2010