v14, i06: Testing the Reverse Options

Sidebar



Out of curiosity, we compared the performance of the various techniques for reversing the order of the lines in a file. The results were interesting and sometimes unexpected.

We created seven data files. The file names and the number of lines in each file were as follows:

File       Lines
--------   ---------
enormous   1,000,000
huge         100,000
large         10,000
medium         1,000
small            100
tiny              10
trivial            1

Each line of every data file consisted of a line number followed by the sentence "A rat in the house may eat the ice cream."
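The scripts that actually built the data files are in the tarball and are not reproduced here. As a rough sketch only, assuming a single space separates the line number from the sentence (the exact format is not stated), files like these could be generated with awk. The function names make_data_file and make_all_data_files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT numbered copies of the test sentence to OUT.
# Assumed line format: "<line number> <sentence>", separated by one space.
make_data_file() {
    count=$1    # number of lines to generate
    out=$2      # output file name
    awk -v n="$count" 'BEGIN {
        for (i = 1; i <= n; i++)
            print i, "A rat in the house may eat the ice cream."
    }' > "$out"
}

# Hypothetical driver: build all seven files in directory DIR.
# Each spec is "<lines>:<name>"; the expansions split it at the colon.
make_all_data_files() {
    dir=$1
    mkdir -p "$dir" || return 1
    for spec in 1000000:enormous 100000:huge 10000:large 1000:medium \
                100:small 10:tiny 1:trivial
    do
        make_data_file "${spec%%:*}" "$dir/${spec#*:}"
    done
}
```

For example, running make_all_data_files files from the directory above the test scripts would create the layout that test_awk expects. Generating the lines inside a single awk process keeps even the enormous file quick to build.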

We created a test script for each of the different ways to reverse a file. For example, the following script, called test_awk, tested the awk command:

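#!/bin/ksh
# test_awk: time the awk reversal technique on every data file.
cd "$(dirname "$0")/../files" || exit 1
# ls -r lists the files in reverse alphabetical order, which here runs
# from the smallest (trivial) to the largest (enormous).
for file in $(ls -r)
do
   # Save each line in a[], then print the array from the last line back
   # to the first; output is discarded so only the timing matters.
   time awk '{ a[NR]=$0 } END { for(i=NR; i; --i) print a[i] }' \
     "$file" > /dev/null
done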
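The nl technique mentioned in the results is not shown in this sidebar. One common form of it, offered here as an illustration and not necessarily the version that was timed, numbers every line, sorts on that number in descending order, and then strips the number off again. The function name reverse_nl is hypothetical.

```shell
#!/bin/sh
# One possible nl-based reversal (an assumption, not the tarball's script).
# Reads standard input and writes the lines in reverse order.
#   nl -ba    number every line, blank ones included (number, TAB, text)
#   sort -rn  sort numerically on the leading line number, largest first
#   cut -f2-  drop the number, keeping everything after the first TAB
reverse_nl() {
    nl -ba | sort -rn | cut -f2-
}
```

To time it the same way as test_awk, the awk command in the loop could be replaced with reverse_nl < "$file" > /dev/null. The -ba option matters: by default nl numbers only non-empty lines, so blank lines would be misplaced.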

The test scripts and the scripts used to create the data files are located in the tarball (see http://www.sysadminmag.com/code/).

We executed the scripts on Solaris 7 running on an Intel AD450NX server with four 500-MHz processors and 2 GB of memory, with minimal other system activity. Table 1 shows the average real (elapsed) times reported by the time command for each script tested against the seven data files.

The script using shell arrays failed on the large, huge, and enormous files with the error "subscript out of range". We gave up waiting for the sed scripts to finish on the huge and enormous files. Similarly, the vi script seemed destined to run forever while processing the enormous file.
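The sed scripts that were timed are not listed here. A widely used sed idiom for reversing a file, shown only as an illustration (reverse_sed is a hypothetical name), keeps every line read so far in the hold space. Because each new line causes the entire accumulated text to be copied again, the total work grows roughly with the square of the number of lines, which is one plausible reason sed stalled on the larger files.

```shell
#!/bin/sh
# Reverse lines using sed's hold space (the classic "tac" one-liner).
#   1!G  on every line except the first, append the hold space (the lines
#        seen so far, already reversed) to the current line
#   h    copy the result back into the hold space
#   $p   on the last line, print the fully reversed text
reverse_sed() {
    sed -n '1!G;h;$p'
}
```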

Of the scripts that could handle all seven data files, tail was by far the fastest, processing the enormous file in 2.6 seconds. Perl handled the enormous file in 5.2 seconds; tac took 18 seconds; the nl technique finished in 43 seconds; and our cool shell script that uses dynamic variables finally finished processing the enormous file after 223 seconds. In this case, cool doesn't equate to fast.
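The dynamic-variable script itself is in the tarball. Assuming "dynamic variables" means variable names such as line1, line2, and so on that are built at run time with eval, the idea can be sketched as follows; reverse_dynamic is a hypothetical name, and this is not the script that was timed. Running eval once per line on the way in and once per line on the way out is costly, which fits the slow result above.

```shell
#!/bin/sh
# Reverse lines by storing line N in a variable named lineN (created with
# eval), then expanding those variables from the last one back to the first.
reverse_dynamic() {
    n=0
    # IFS= keeps leading blanks, -r keeps backslashes; the || test also
    # keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        n=$((n + 1))
        eval "line$n=\$line"            # e.g. line3=$line
    done
    while [ "$n" -gt 0 ]
    do
        eval "printf '%s\n' \"\$line$n\""   # e.g. printf '%s\n' "$line3"
        n=$((n - 1))
    done
}
```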