Mark is vice president of software development at Greenleaf Software and author of the upcoming The Data Compression Book (M&T Publishing). Mark can be contacted at 16479 Dallas Parkway, Suite 570, Dallas, TX 75248.
When we announced the DDJ Data Compression Contest in the February 1991 issue, we expected a big response from readers -- and we weren't disappointed. Over the course of the following months, compression programs from all over the world poured into DDJ's offices.
As you might expect from a contest like this, the entries covered the whole spectrum, from the good and bad to the downright ugly. However, as I reviewed the code coming in, the one thing that stood out was a refreshingly high level of creativity and innovation in the approaches to data compression. There were virtually no "me too" rehashed versions of familiar programs such as UNIX Compress or LHarc. Instead, programmers who read DDJ seemed to be eager to try out new algorithms and techniques, frequently with good results. This provides an interesting counterpoint to those who tell us that to produce good software today you need $1,000,000 budgets and hordes of interchangeable programmers. Maybe it doesn't have to be that way.
After running all the submitted programs through the DDJ compression test suite, a handful distinguished themselves as exceptional contenders for the DDJ compression championship:
The final test suite for the contest consisted of approximately 2 Mbytes each of text, graphics, and executable files. Individual file sizes ranged from as little as 5 Kbytes up to 700 Kbytes. I judged programs based on their ability to compress each of the three different types of files, as well as overall compression ability. We also judged speed of compression and expansion.
It took literally weeks of continuous testing on the test machine (a 25-MHz 80386 PC from Everex Systems that had an 160-Mbyte hard disk and 8 Mbytes of RAM), but after a few false starts, I was finally able to complete testing and begin tabulation of the results. The final results confirmed what I had suspected during testing: The championship was going to be decided between two programs that were as different as night and day.
When I first began testing Charles Ashford's COMPRESS program, I knew it was going to be a problem. Charles's program had two characteristics that made it stand out: First, it compressed files significantly better than any other program submitted, or for that matter, any other program I have seen. Second, it was by far the slowest compression program submitted; in fact it was three orders of magnitude slower than the fastest entrant! Testing the Ashford code took almost four full days on a 25-MHz 386 machine. However, the contest didn't place any restrictions on the time a program could take, so this was all perfectly legal.
The second stand-out program was Gene Olson's COMPACT. This program was at the complete opposite end of the spectrum. COMPACT is written entirely in 32-bit C for UNIX targets, and by MS-DOS standards is quite a resource hog in its own right. At a minimum, COMPACT requires 1 Mbyte of RAM for internal table space. Gene's program makes good use of system resources for managing I/O, however, as his compression and expansion speeds dwarf any of the competition. Gene has an advantage in working on a platform that implements overlapping I/O, and he makes the most of it.
The source code for the programs discussed here is available electronically; see "Availability," page 3. In addition, I've included a complete listing of the test results in both text and Lotus formats.
Tables 1 and 2 show the final results in all of the categories judged. As you can see, the two stand-out programs dominate their respective categories, but fail to make even an appearance where they are weak. A few other programs, such as Gage's SIXPACK, show up consistently high in the standings, but are generally not able to defeat the Ashford code.
Overall Graphics Text Executables
---------------------------------------------------------------------------
1st Ashford 37.5% Koistinen 38.6% Ashford 27.1% Ashford 45.7%
2nd Koistinen 40.5% Ashford 29.7% Gage 31.1% Gage 48.0%
3rd Gage 41.8% S. Heller 41.8% Koistinen 33.2% Koistinen 48.6%
4th Ehlert 43.8% T. Isaac 42.61% Ehlert 34.3% Ehlert 49.5%
5th E. Hatton 47.1% Gage 47.7% E. Hatton 34.3% E. Hatton 56.1%
NOTE: Ratios are expressed as 100 * (compressed size / original size)
Compression Expansion
--------------------------------------------------------------
1st Olson 85,613 bytes/sec Olson 131,968 bytes/sec
2nd U. Turku 67,003 bytes/sec U. Turku 90,575 bytes/sec
3rd Ehlert 15,564 bytes/sec A. Tam 63,668 bytes/sec
4th S. Boyd 14,962 bytes/sec Ehlert 43,341 bytes/sec
5th C. Wong 11,862 bytes/sec C. Wong 27,389 bytes/sec
As neither program is able to claim a decisive victory, the judge's final decision is that Gene Olson and Charles Ashford will share the Grand Prize. I hope that Gene and Charles combine their talents in the future, as between the two of them they have an unbeatable program.
During the course of testing the submissions to the DDJ compression contest, I also ran some well-known freeware and shareware programs through the test suite to see how well they performed. In every case, I used the default options to add files to an archive, and I took the archiving program's word for how many bytes it used to create the compressed file. The programs tested were:
Compression Speed Expansion Speed Compression Ratio ------------------------------------------------------------------------- PKZIP 13987 bytes/sec PKZIP 39745 bytes/sec ARJ 41.4% ARJ 13623 bytes/sec ARJ 34652 bytes/sec LHA 41.8% LHA 12168 bytes/sec LHA 23090 bytes/sec PKZIP 43.8% PAK 9452 bytes/sec PAK 20979 bytes/sec PAK 45.0% COMPRESS 6132 bytes/sec COMPRESS 8516 bytes/sec COMPRESS 52.6%
--M.N.
Copyright © 1991, Dr. Dobb's Journal