ALGORITHM ALLEY

Algorithm Grab Bag

I was fascinated by your description of palindrome encryption, particularly the fact that you doubted the existence of an equivalent decryption method. So, I wrote a small C program (see Listing One, PALTEST.C, page 127) to see what I could learn. The program's output explains why you can't decrypt the text. In the test, the encryption function maps 676 possible inputs (Aa through Zz) onto 51 outputs. The second table from the output is the frequency of each of the 51 outputs. BB, for example, occurs 26 times.

To generalize on your observation that ABCD produces the encrypted result A5 A5 A5 A5, any alphabetically sequential input AB through ABC_XYZ always produces an output string in which all of the characters are the same. Thus ABCDEFG results in A8 A8 A8 A8 A8 A8 A8 and ABC_XYZ results in a string of 26 BBs. This means it is highly probable that two different inputs of the same length can result in the exact same output. That's real information loss! It would be interesting to find [the algorithm for] a set of inputs that result in the same output. Perhaps some of your readers would like to give it a try.

Open-ended Searching

Dean Gienger wrote to ask whether I knew of a method for performing a binary search on a sorted database of unknown length, containing variable-size records. The question is interesting because, using the classic binary-search algorithm in which data is repeatedly divided into chunks until the target key is found, it is necessary to know how many records the database contains. Before I could come up with a possible answer, however, Dean wrote back with his own solution. If any readers know of other algorithms for open-ended binary searching, please let me know. Dean writes:

I've found an interesting problem you may wish to present to your readers. The problem is, given a sorted list L in which the number of records is unknown, I need a function get_record(index,key) that returns true if an indexed record exists, and also returns that record's key. Assume the list contains variable-length records in a disk file, meaning you can't read to the end of file to determine the

number of records it contains. FINDKEY.TXT (Listing Two, page 127) shows some sample code I developed to investigate possible answers. I need to search an ordered list stored on disk along with a sparse index (roughly a pointer to every 2000 records), so the code doesn't have to search every record sequentially.

Sample Bug

Alexander J. Oss took me to task (rightly) for a bug in Algorithm #12, Selection Sampling, from the October 1993 column. He explains:

The problem [with the code] is that the distribution of odds is incorrect. In this case, it's a relatively minor mistake (one of those "off-by-one" errors). Try selecting three of four records. If your stopping condition is that the number to select has been selected, you will always pick the first three, the likelihood of which should be 0.25. If your stopping condition is end of file, you will always pick all four (even though you said you only wanted three). When implemented correctly, the algorithm should work properly with either stopping condition--that is, after the correct number of records has been selected, the likelihood of choosing more should be 0.

There are actually two errors in the original algorithm, and they are repaired and reprinted in Example 1. The While loop must test for two nonterminating conditions: that the number of selected records is less than the number requested, and that the end of file has not been reached. A more serious mistake occurs in the If statement. Because r is a random number between 0 and 1 (but never equal to 1), if no records have been selected, it is not possible for the next record to be skipped. Selecting nine out of ten records, therefore, always causes the first record to be selected, resulting in the same nine selections on every run. In order to select a different set of nine out of ten, the corrected program uses M+1 rather than M in the If statement control expression.

Windows Bitmap Compression

Neil Galarneau questioned the necessity of my bitmap-compression utility in the August 1993 column. He writes, "I assume that when Windows reads a compressed bitmap into memory, it decompresses the image, thus compression saves only disk space, not memory. Do you agree?"

Sorry, no. Strangely, Windows does not recognize its own bitmap-compression algorithm. For example, the Paintbrush utility cannot read a compressed bitmap, nor can it create one. Bitmap compression is the responsibility of the output device driver to which the bitmap is passed. Bitmap-compression routines are not built into Windows, and therefore, you need separate compression and decompression utilities, or a compatible driver, for reading and writing compressed .BMP files. Unfortunately, not all output device drivers handle compressed bitmaps correctly.

More on Bitmaps

Waldeck Schutzer from Brazil sent the following comments and suggested repair to the BPACK program in that same article on Windows bitmaps. He writes:

I'm a fourth-year mathematics student at the University of S~ao Paulo, and my interests include data compression, digital image processing, cryptography, and correlates. Since 1988 I've been a Dr. Dobb's reader, and I enjoy its fine articles.

The length of a line in a Windows bitmap file is not even, but is a multiple of four. I tested the BPACK program with a scanned image of 382x400 and the output was damaged by the loss of two bytes at the end of each line. After making the following adjustment, the program worked correctly. In function WriteBitmapBits, replace Example 2(a) with 2(b).

The expression, (4--np&3), is equivalent to (4--np)%4, and produces the extra length value needed to complete a double word. By the way, I'm seeking documentation for the TIFF file format. Can you help?

Steve Rimmer's book, Bit-Mapped Graphics, Second Edition, (Windcrest/

McGraw-Hill, 1993) explains the TIFF (Tagged Image File Format) layout and includes sample code in C. I highly recommend the book to anyone who wants to know more about TIFF files and other graphical image formats.

Your Turn

That's it. My grab bag runneth empty. Send me your algorithms in care of DDJ, or send CompuServe mail to 73627,3241. By the way, I occasionally have trouble with my CompuServe connection (must have something to do with the fact that I've moved aboard a sailboat, and data communications offshore are iffy at best). If you sent a message and haven't received a reply, try again, or send mail to me at DDJ. Until someone comes up with a reliable and affordable method for mobile data communications, I'm temporarily offline. Am I the only programmer who wants to get away from it all but still have e-mail?

Example 1: Algorithm #12, Selection Sampling (rev. 1.01).

const
  M=1000;  { Input records }
  N=128;   { Subset (N<=M) }
var
  requested,
  examined,
  selected: Integer;
  r: Real;
begin
  requested<-N;
  examined<-0;
  selected<-0;
  while (selected<requested) and
(not EOF) do
  begin
    examined<-examined+1;
    r<-Random;
    if (M+1--examined)*r
      >=(requested--selected)
      then skip next input record
    else begin
      selected<-selected+1;
      use next input record
    end
  end
end.

Example 2: Correcting Windows bitmap compression.

(a)
     if (Odd(np))
       slSize=np+1;
     else
       slSize=np;
(b)
     slSize=np+(4--np&3);

[LISTING ONE]


/**************************************************************************
This program computes all possible inputs to a Palindromic encoder and the
frequency of all of the possible encrypted results. (C) 1993 Mort Bernstein
**************************************************************************/
#include <stdio.h>

int i, j, k = 0;
unsigned char U[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
unsigned char L[] = "abcdefghijklmnopqrstuvwxyz";
unsigned char F[256], t;

void main(void)
{
   for(i = 0; i < 256; i++) F[i] = 0;
   printf(" ");
   for(j = 0; j < 26; j++) printf("  %c", L[j]);
   printf("\n");
   for(i = 0; i < 26; i++)
   {
      printf("%c", U[i]);
      for(j = 0; j < 26; j++)
      {
         t = (U[i] + L[j]);
         F[(int)t] += 1;
         printf(" %02X", (unsigned char)t);
      }
      printf("\n");
   }
   printf("\n");
   k = 0;
   for(i = 0; i < 256; i++)
   {
      if(F[i] == 0) continue;
      printf("[%02X]%3u", i, F[i]);
      if(++k > 9)
      {
         printf("\n");
         k = 0;
      }
      else printf(" ");
   }
}

[LISTING TWO]



!===========================================================================
! Find key record in sorted list of unknown length.
! Dean Gienger, 1993
!===========================================================================

integer FIND_KEY(the_key)
!===========================================================================
! Perform a modified binary search (since we don't know where end of list is!)
! Using step values of 1,2,4,8,16,... until we pass the value or pass the end
! of the list. When we pass value or get to end of list, start cutting the step
! in half. Return 0 if key not found, else place in list where key is found.
!===========================================================================

integer lo,hi,step,maximum_hi

lo = 1
hi = 1
step = 1
maximum_hi = MAX_INT

! Get first record and bail out if there isn't one
record = GET_RECORD(lo)
IF record.number = 0 THEN RETURN 0

! If desired key is before first message, bail out
IF the_key < record.key THEN RETURN 0

LOOP
   ! fetch record [hi]
   record = GET_RECORD(hi)
   ! double the step
   step = step+step
   ! if there is no such message or we passed the desired key, step back
   IF (record.number = 0) OR (the_key < record.key) THEN
      step = step/4
      IF step = 0 THEN RETURN 0 ! either we found it or it's beyond eof
      maximum_hi = hi
      hi = lo
      ENDIF
   ! choose next probe point (hi)
   lo = hi
   hi = hi + step
   ! don't go beyond the highest reasonable boundary known to date
   IF hi > maximum_hi THEN
      hi = maximum_hi
      ! cut step back to reasonable value
      LOOP
         WHILE (lo+step) > hi
         step = step/2
         ENDLOOP
      ENDIF
   ENDLOOP
RETURN lo
END PROCEDURE