Here's a way to compress images that has many desirable properties for network transmissions.
Introduction
Laplacian pyramid encoding is a technique used for compressing digital images [1] . Its characteristics are speed, good compression, and moderate quality of expanded image. The Laplacian pyramid encoding algorithm (I'll call it LPE throughout the rest of the article) exploits the common property of most images that adjacent pixels are highly correlated. The algorithm attempts to remove redundancy from the image by decorrelating the pixels. It also separates the image into different frequency levels and quantizes each level such that fewer bits per pixel represent higher frequency levels.
In this article I discuss why you might want to use LPE in your own imaging applications, explain how LPE works, and present an implementation in C++.
Why Use Laplacian Pyramid Encoding?
LPE is not the only image compression algorithm around, not even within its class. Wavelet encoding, for example, is an image compression method very similar to LPE. In fact, Wavelet theory was spawned from Laplacian pyramids. Therefore, in order to understand Wavelets, you might begin by first understanding Laplacian pyramids. Wavelet encoding may yield higher compression ratios, or, in some instances, it may produce better quality images. However, LPE still has a few advantages over Wavelets, and in certain circumstances may be the more appropriate algorithm.
Real-time compression is an example of an application more appropriate for LPE. This is especially true when high throughput is more important than image quality. That's because LPE implementations can be much faster than Wavelet implementations. Current popular Wavelet algorithms use sample blocks of 9 X 9 pixels for each pixel in the image, whereas Laplacian pyramids use blocks of 5 X 5 or 3 X 3 three pixels. So the Wavelet algorithm requires 81 calculations per pixel vs. LPE's 25 for 5 X 5 pixels, or 9 for 3 X 3 pixels. Thus, for the 3 X 3 format, LPE requires 1/7 the calculations per pixel overall (not 1/9, because a Laplacian pyramid structure covers 33% more pixels than a Wavelet structure).
The Center for Vision and Imaging Sciences at the University of Texas implements Laplacian pyramids in one of its Windows 95/NT Foveated Imaging Systems for precisely this reason [2] . A Wavelet implementation of this system is in development, but it can't hope to achieve the real-time performance of the Laplacian pyramid implementation.
Another advantage to Laplacian pyramids is that they are very easy to understand, so they are subsequently easy to implement. You don't need to know about sub-band filtering or multi-resolution analysis to understand Laplacian pyramids. Simply knowing the algorithm and understanding how an image is encoded and decoded is enough to understand why the algorithm is effective. For that reason I'll get right down to the algorithm and skip all that other stuff. You can find a more thorough explanation of LPE in [1] .
The Building Blocks Reduce and Expand
The heart of the Laplacian pyramid encoding algorithm lies in two operations called Reduce and Expand. Contrary to their names, Reduce and Expand do not perform the image compression/expansion in themselves. They are merely steps along the way.
Reduce takes an N X N image and reduces it to an N/2 X N/2 image. In other words, it reduces an image to 1/4 its original size in number of pixels. Reduce operates by taking a weighted average of pixel values over a block of pixels to produce a single pixel. Expand is just the opposite: it takes an N X N image and expands it to N*2 X N*2. Expand works by interpolating to create new pixels in the regions between pixels. Reduce and Expand come into play later, in building the Laplacian pyramid required for image compression. I will explain the Laplacian pyramid in detail, but first let's look at how these two operations work.
Reduce
LPE typically uses a 5 X 5 pixel block. However, I use a 3 X 3 block in this article because it's a little simpler than 5 X 5 and it achieves comparable results. 3 X 3 blocks will overlap one another by one pixel to produce a reduction ratio of 4:1. (The reduction ration is always 4:1 in LPE.) Figure 1 represents a piece of an image that is five pixels wide and five pixels tall. This part of the image will be reduced to 5/2 X 5/2, or 2 X 2 pixels.
The implementation that I present uses a function called Reduce, which iterates over the 3 X 3 blocks in the image. The code for Reduce is shown in Listing 1. Reduce calls Reduce3x3 for each 3 X 3 block in the image to compute the weighted average. Reduce3x3 returns the result of each weighted average to Reduce so that it may assign the average to a pixel in the reduced image.
Reduce also examines the blocks on the right and bottom edges of the image to determine whether or not they require special treatment:
- If the image width is a multiple of two, then blocks on the right edge are only two pixels wide, so Reduce calls Reduce2x3 to find the weighted average for these blocks.
- If the image height is a multiple of two, then blocks on the bottom edge are only two pixels tall, so Reduce calls Reduce3x2 to find the weighted average for these blocks.
- If both width and height are multiples of two, then the bottom right corner is only two pixels wide by two pixels tall, so Reduce calls Reduce2x2 to find the weighted average for this block.
The weighted averaging routines, Reduce3x3, Reduce2x3, Reduce3x2, and Reduce2x2, are shown in Listing 2. Each accepts a pointer to a pixel array, which points to the top left corner of a block. These routines apply the weighting function to each pixel in the block by scanning the block row by row; the result for each pixel is added to a common sum. Note that these routines must know the width of the entire image to find the next row of the patch.
The weighting function is symmetric about the center pixel. Pixels further from the center pixel are weighted less than pixels closer to the center pixel. Therefore, the four corner pixels each get a weight of 1/16, the four pixels on the side each get a weight of 1/8, and the center pixel gets a weight of 1/4. The sum of these nine weights is equal to one. The sum of the weights must equal one so that the average energy in the source image is the same as in the reduced image.
Edge blocks that are only two pixels wide or two pixels tall will reflect themselves to create a block that is fully three by three. For example, a 2 X 3 block is missing pixels in the right side column; therefore, the pixels in the left side column are used to create the right side column. The code carries out this replication simply by multiplying the pixel values in the left side column by two.
Expand
The Expand process is the opposite of Reduce: it takes an N x N image and expands it to an N*2 x N*2 image. In other words, Expand increases an image to four times its original size in number of pixels. The Expand process works by interpolating to create new pixels in the regions between pixels. The new pixel's values are found by averaging the pixels between which the new pixels are placed.
Figure 2 represents four pixels that are expanded to a 5 X 5 block. Pixels a, b, c, and d will retain their values after the expansion. Pixel x will be interpolated by averaging together pixels a and b. Pixel y will be interpolated by averaging together pixels a, b, c, and d. The implementation presented here uses a function called Expand (Listing 3) that iterates over the pixels in a reduced image and expands them each to 3 X 3 overlapping blocks.
Expand calls Expand3x3 for each pixel in the image, except those on the right and bottom edges. Expand3x3 interpolates as follows: first, it assigns the full value of the pixel passed to the routine to the center pixel. Next, 1/2 the value of the pixel passed to the routine is added to each of the four side pixels. Finally, 1/4 the value of the pixel passed to the routine is added to each of the four corner pixels. Referring again to Figure 2, look at the pixel labeled y. Four different blocks overlap this pixel, so by adding 1/4 from each of a, b, c, and d, this pixel becomes the average of the four pixels. The pixel labeled x is overlapped by two blocks, so by adding 1/2 from each of a and b, this pixel becomes the average of a and b. Pixels a, b, c, and d aren't overlapped, so they are just assigned the value of the pixel that is to be expanded.
Similarly to Reduce, Expand must also examine the right and bottom edges to determine what sort of blocks they must be expanded to 3 X 3, 2 X 3, 3 X 2, or 2 X 2. These expansions are carried out by the routines Expand3x3, Expand2x3, Expand3x2, and Expand2x2 respectively (see Listing 4).
Expand has one additional thing to do before it's finished. Notice that all the expanded pixels on the edges aren't completely interpolated. The top left pixel of an image, for example, does not have neighbors above or to the left. Therefore, when it gets expanded, the top left pixel of the expanded image gets contributions from only one pixel instead of four. Likewise, all the pixels on the top and left edges of an expanded image receive contributions from only 1/2 as many pixels as they need. To remedy this, all the pixels on the top and left edges of an expanded image are multiplied by two. In carrying out this multiplication, the top left corner is multiplied by two twice, so it is effectively multiplied by four, just as it should be. The right and bottom edges require a little more thought.
Depending upon the dimensions of the image, the right and bottom edges may not require correction. Odd dimensions must be multiplied; even dimensions may be left alone. Again, the corners get multiplied by either 1, 2, or 4 automatically, with no special consideration.
Building a Laplacian Pyramid
While Reduce and Expand are the most basic processes in LPE, the fundamental data structure in LPE is the Laplacian pyramid. It serves as an intermediary between the original image and the final compressed file. The Laplacian pyramid consists of layers of computed images known as Laplacian images. I explain the process of computing Laplacian images, and how they are stacked to form a pyramid, as follows.
Given an image I, that image can be Reduced and Expanded again to produce an image G of the same size:
G = Expand(Reduce(I))The process that produces G is like a Gaussian weighting function, hence the label G. G is not exactly the same as the original image. Instead, it is an approximation of the original image with some of the high frequency information removed. Another way of saying this is that G is a low-pass filtered image.
Subtracting G from I yields a difference image, L, which shows how much information was removed:
L = I - GThe zero values in L represent places where the original image was highly correlated or smooth. Large negative or positive values represent edges or anomalies in the original image. For example, referring to Figure 3, the original image is labeled I. The image labeled R is obtained by applying the Reduce operation to I. G is the low-pass filtered image the expanded version of R. Finally, L is the difference between G and I, the Laplacian image.
L by itself is commonly used for edge detection. Dark pixels in this image represent low magnitude values and correspond to smooth areas in I. Bright pixels represent high magnitude values and correspond to edges in I. Image compression, however, calls for more than just these two images; we need many of them representing different frequency scales. We need a Laplacian pyramid.
To show how to construct a Laplacian pyramid, I will first restate the above formula
L = I - Gas
L = I - Expand (Reduce (I))Or, if R = Reduce (I),
L = I - Expand (R)which can be rearranged to yield
I = L + Expand (R)This equation shows that restoring the original image, I, requires only L and R.
Refer to Figure 4. The original is reduced to R1 and then expanded again. The expanded image is then subtracted from the original to produce L0. Now the original can be reproduced using only L0 and R1. We can also encode R1. R1 gets reduced to produce R2. R2 is then expanded and subtracted from R1 to produce L1. R1 can now be reproduced using only R2 and L1. If R2 is subsequently encoded, the only images needed to restore it will be R3 and L2. Finally, the only images we will need to restore the original image are L0, L1, L2 and R3.
This collection of images constitutes the Laplacian pyramid. L0 is the base or bottom level of the pyramid, and R3 is the top of the pyramid. Notice that R3 is very small; it's 1/64 the size of the original. Also, because L0, L1, and L2 are difference images they contain a high number of zeroes and have very peaked distributions. These characteristics enable the Laplacian images to be compressed very effectively. Also notice that the base level of the pyramid, because it is the largest, represents finer details in the image. It is therefore the image containing the highest frequencies. As the images get smaller, each of their pixels represent a greater area in the original image. Thus, higher levels in the pyramid represent lower frequencies in the image.
The Laplacian pyramid data structure consists of a width, a height, the number of levels in the pyramid, and two arrays of pointers to arrays used to store the reduced and expanded images. The constructor for the Laplacian pyramid object (Listing 5) takes width, height, number of levels, and a pointer to the original image as parameters. The constructor then allocates the memory necessary for the reduced and expanded image arrays, and copies the original image into the reduction array at index 0. A faster implementation would not make this copy and instead would have a special case for reducing the original image. However, in this case I choose compactness over speed. You can ignore the _bpp array for now. I show how it is used later in the Quantization section.
The constructor allocates the Laplacian pyramid structure, but fills in only level 0. The Compress routine (shown below) is what fills in the remaining levels of the pyramid. Compress indexes through each level of the pyramid, starting with level 0, calling Reduce, Expand, and Subtract for that level.
void LP::Compress () { // Compute the Laplacian Pyramid for (unsigned level = 0; level < _levels - 1; level++) { Reduce (level); Expand (level); Subtract (level); } }To read a Laplacian image file, I've created a second constructor for the LP class. This constructor opens a file, reads the necessary parameters, allocates the level arrays, and reads the values from the file into the arrays. To decompress the image, Decompress (shown below) expands the top-level image and adds in the next level Laplacian image. It continues to do this down the pyramid until it reaches the base level image.
void LP::Decompress () { // Build the original image from the Laplacian pyramid for (int level = _levels - 2; level >= 0; level--) { Expand (level); Add (level); } }Achieving Compression
So far I've not shown anything that will produce compression. In fact, the Laplacian pyramid without further processing is about 4/3 the size of the original image. What's missing is a step within the Compress routine to perform entropy encoding of each level. Two examples of entropy encoding are Huffman encoding and Arithmetic encoding. For more information on Huffman encoding, see the CUJ articles listed in references [3] through [6]. The source code submitted with these articles is available with this month's ftp listings (see p. 3 for downloading instructions). Even with entropy encoding, the compression efficiency relies heavily on the smoothness of the image. You can compress smooth images a lot; unsmooth images won't compress very much they may even expand.
That's okay. I'm not done yet. There's an additional technique, called quantization, that will boost compression ratios up much further. So far the encoding has been lossless: the encoded image contains just as much information as the original. Quantization is lossy, meaning that the encoded image contains less information than the original. However, if the right information is removed in small amounts, our human visual system won't be able to perceive the difference. Removing information in greater amounts, of course, causes image quality to deteriorate, but increases the compression ratios.
Quantization
Have you ever seen those pictures that, from far away, portray the Mona Lisa or George Washington, but upon close inspection turn out to be just type-written letters on a sheet of paper? From far away we see an image, but up close the image disappears, and we see only black symbols on white paper. The reason for this is that the human visual system is very sensitive to contrast changes at low frequencies but is less sensitive to contrast changes at high frequencies.
From far away, as you scan across the page the image switches between black and white values at a very high frequency. (More precisely, the image striking your retina is high in spacial frequency content, but the simpler explanation will suffice.) The switching between white and black occurs so rapidly that we see varying shades of gray. Thus, at a distance just two colors, black and white, can create a relatively smooth image. Up close, however, the changes from black to white become more easily discernible as their frequency becomes lower until finally the image fades and we just see a bunch of random letters. At low frequencies, black and white cannot create a smooth image. We need more shades of gray to render George Washington.
The Laplacian pyramid can exploit the human eye's variation in sensitivity to contrast changes versus frequency. The lowest level in the pyramid contains the highest frequencies. Quantizing this image thus requires fewer bits per pixel. The exact number required is a little arbitrary: as few bits as possible without noticeably changing the image quality. Finding the right number of bits per pixels requires some experimentation. In general, the lower levels need fewer bits than the higher levels, and this is fortunate because the lower levels are the biggest levels in the pyramid.
The code in listing 6 shows how quantization is achieved in the pyramid. The bpp array specifies the number of bits per pixel each level of the pyramid should receive, from lowest level to highest level. The Quantize routine just goes through each level and shifts off any bits greater than this number. There are fancier ways of quantizing, but this is the simplest.
One More Useful Technique
There is another thing that makes LPE and other multi-resolution image coding schemes desirable for real-time image processing. Since a single frame (or image) is broken down into different frequencies, it's possible to send the low frequency difference images before the high frequency difference images. If the bandwidth should drop, the encoder can adapt by sending fewer Laplacian images per frame. This technique is called progressive transmission.
For example, suppose you are sending video frames across a network. If the transmitter is able to detect that bandwidth has dropped for any given frame, (e.g. due to noise in the channel), it could decide to not send the lowest image in the pyramid. This image usually represents a very large percentage of the pyramid, depending on how the image has been quantized. So, by dropping the lowest level, throughput could drastically increase.
Of course, the decoder will produce a much blurrier image in this case, but suppose this only happens in one frame out of 10. The viewer on the decoding end would not even notice a difference. Or suppose the channel is very noisy, and all of the level-0 images must be dropped. The viewer will certainly notice the difference, but a blurry image is better than no image.
Results
To close, I present some images that have been Laplacian pyramid encoded and their corresponding compression ratios. Note that the compression ratios listed are those that would typically be achieved by inserting an entropy encoding step. See references [3] through [6] for more information on Huffman encoding, an entropy encoding technique. I'll continue using the famous Lena image as the original image. Since many popular publications involving image compression use this image, you may be able to compare these results with other compression schemes to see how it measures up. Figure 5 shows a series of images that have been encoded with a five-level Laplacian pyramid. Below each image is the average number of bits per byte in the image, plus the five bpp parameters used to quantize the pyramid, listed from lowest level to highest level. The more the image is quantized, the blurrier the image appears, but the more compression achieved. This trade-off allows any compression ratio to be achieved by sacrificing image quality.
Image A has not been quantized, so no information is lost in the encoding. Thus, the encoded image is identical to the original, but pyramid encoding with entropy encoding only reduces the image size by about 25%. In images B and C, degradations in image quality begin to appear, especially in C. The compression ratio also improves to about 4:1. In image D, the lowest level in the pyramid has been dropped by quantizing away all the bits; the compression jumps up to 16:1. Image E drops even more levels away. Finally in F, all but the last level are quantized away. Image F is barely discernable, but its compression ratio is now 267:1.
References
[1] P. J. Burt and E. H. Adelson. "The Laplacian Pyramid as a Compact Image Code," IEEE Transactions on Communications, vol. 31, pp. 532-540, 1983.
[2] W. S. Geisler and J. S. Perry. "A Real-Time Foveated Multiresolution System for Low-bandwidth Video Communication," to appear in Human Vision and Electronic Imaging II (The International Society of Optical Engineering, 1998).
[3] Dwayne Phillips. "Data Compression Using Huffman Encoding," The C Users Journal, February 1992.
[4] Ludger Engbert. "On-the-Fly Huffman Encoding," The C Users Journal, December 1993.
[5] John W. Ross. "Record-Oriented Data Compression," The C Users Journal, April 1994.
[6] Richard Zigler. Improves Upon Ross's Huffman Encoding Implementation. "We Have Mail," C/C++ Users Journal, September 1994, p. 111.
Jeff Perry is a research assistant at The University of Texas at Austin Center for Vision & Image Sciences. Jeff can be reached at sp@mail.utexas.edu.