DDJ DATA COMPRESSION CONTEST RESULTS

And the winner is...

Mark Nelson

Mark is vice president of software development at Greenleaf Software and author of the upcoming The Data Compression Book (M&T Publishing). Mark can be contacted at 16479 Dallas Parkway, Suite 570, Dallas, TX 75248.


When we announced the DDJ Data Compression Contest in the February 1991 issue, we expected a big response from readers -- and we weren't disappointed. Over the course of the following months, compression programs from all over the world poured into DDJ's offices.

As you might expect from a contest like this, the entries covered the whole spectrum, from the good and bad to the downright ugly. However, as I reviewed the code coming in, the one thing that stood out was a refreshingly high level of creativity and innovation in the approaches to data compression. There were virtually no "me too" rehashed versions of familiar programs such as UNIX Compress or LHarc. Instead, programmers who read DDJ seemed to be eager to try out new algorithms and techniques, frequently with good results. This provides an interesting counterpoint to those who tell us that to produce good software today you need $1,000,000 budgets and hordes of interchangeable programmers. Maybe it doesn't have to be that way.

The Cream of the Crop

After running all the submitted programs through the DDJ compression test suite, a handful distinguished themselves as exceptional contenders for the DDJ compression championship:

The Results

The final test suite for the contest consisted of approximately 2 Mbytes each of text, graphics, and executable files. Individual file sizes ranged from as little as 5 Kbytes up to 700 Kbytes. I judged programs based on their ability to compress each of the three different types of files, as well as overall compression ability. We also judged speed of compression and expansion.

It took literally weeks of continuous testing on the test machine (a 25-MHz 80386 PC from Everex Systems that had an 160-Mbyte hard disk and 8 Mbytes of RAM), but after a few false starts, I was finally able to complete testing and begin tabulation of the results. The final results confirmed what I had suspected during testing: The championship was going to be decided between two programs that were as different as night and day.

When I first began testing Charles Ashford's COMPRESS program, I knew it was going to be a problem. Charles's program had two characteristics that made it stand out: First, it compressed files significantly better than any other program submitted, or for that matter, any other program I have seen. Second, it was by far the slowest compression program submitted; in fact it was three orders of magnitude slower than the fastest entrant! Testing the Ashford code took almost four full days on a 25-MHz 386 machine. However, the contest didn't place any restrictions on the time a program could take, so this was all perfectly legal.

The second stand-out program was Gene Olson's COMPACT. This program was at the complete opposite end of the spectrum. COMPACT is written entirely in 32-bit C for UNIX targets, and by MS-DOS standards is quite a resource hog in its own right. At a minimum, COMPACT requires 1 Mbyte of RAM for internal table space. Gene's program makes good use of system resources for managing I/O, however, as his compression and expansion speeds dwarf any of the competition. Gene has an advantage in working on a platform that implements overlapping I/O, and he makes the most of it.

The source code for the programs discussed here is available electronically; see "Availability," page 3. In addition, I've included a complete listing of the test results in both text and Lotus formats.

The Judge's Decision

Tables 1 and 2 show the final results in all of the categories judged. As you can see, the two stand-out programs dominate their respective categories, but fail to make even an appearance where they are weak. A few other programs, such as Gage's SIXPACK, show up consistently high in the standings, but are generally not able to defeat the Ashford code.

Table 1: Compression ratio results

       Overall          Graphics          Text             Executables
---------------------------------------------------------------------------

  1st  Ashford   37.5%  Koistinen  38.6%  Ashford   27.1%  Ashford    45.7%
  2nd  Koistinen 40.5%  Ashford    29.7%  Gage      31.1%  Gage       48.0%
  3rd  Gage      41.8%  S. Heller  41.8%  Koistinen 33.2%  Koistinen  48.6%
  4th  Ehlert    43.8%  T. Isaac  42.61%  Ehlert    34.3%  Ehlert     49.5%
  5th  E. Hatton 47.1%  Gage       47.7%  E. Hatton 34.3%  E. Hatton  56.1%

NOTE: Ratios are expressed as 100 * (compressed size / original size)

Table 2: Compression speed results

       Compression                 Expansion
--------------------------------------------------------------

  1st  Olson     85,613 bytes/sec  Olson     131,968 bytes/sec
  2nd  U. Turku  67,003 bytes/sec  U. Turku   90,575 bytes/sec
  3rd  Ehlert    15,564 bytes/sec  A. Tam     63,668 bytes/sec
  4th  S. Boyd   14,962 bytes/sec  Ehlert     43,341 bytes/sec
  5th  C. Wong   11,862 bytes/sec  C. Wong    27,389 bytes/sec

As neither program is able to claim a decisive victory, the judge's final decision is that Gene Olson and Charles Ashford will share the Grand Prize. I hope that Gene and Charles combine their talents in the future, as between the two of them they have an unbeatable program.

How do Well-Known Compression Programs Stack Up?

During the course of testing the submissions to the DDJ compression contest, I also ran some well-known freeware and shareware programs through the test suite to see how well they performed. In every case, I used the default options to add files to an archive, and I took the archiving program's word for how many bytes it used to create the compressed file. The programs tested were:

As the results listed in Table 3 show, PKWare may be losing its edge as the data compression leader in the PC world. ARJ is a relative newcomer, but it appears that Robert Jung is coming close in his efforts to claim the number one position in the compression contest. We will have to wait to see if a new release of PKZIP is launched any time soon.

Table 3: Off-the-shelf compression program results

  Compression Speed          Expansion Speed            Compression Ratio
-------------------------------------------------------------------------
  PKZIP     13987 bytes/sec  PKZIP     39745 bytes/sec  ARJ       41.4%
  ARJ       13623 bytes/sec  ARJ       34652 bytes/sec  LHA       41.8%
  LHA       12168 bytes/sec  LHA       23090 bytes/sec  PKZIP     43.8%
  PAK        9452 bytes/sec  PAK       20979 bytes/sec  PAK       45.0%
  COMPRESS   6132 bytes/sec  COMPRESS   8516 bytes/sec  COMPRESS  52.6%

--M.N.


Copyright © 1991, Dr. Dobb's Journal