What I like best about writing this column is hearing from many readers who have suggestions, tips, and sample code to contribute. Over the past several months, I've collected a grab bag of algorithms and comments passed along by "Algorithm Alley" readers. So, this month, it's time to clean house (or disk, that is) and share some mail I've received on past columns.
In November's column on palindrome text encryption, I questioned the value of an encryption algorithm for which there's no recovery method. Jeff Pipkins reminded me that such algorithms may be put to good use after all. He writes:
An irreversible encryption method can be used to encrypt passwords before storing them in a file. It's mathematically impossible to figure out the password, given only the ciphertext. You merely encrypt passwords entered by users and compare the results against the encrypted words stored in the file. Since the algorithm that checks a password for validity always has the encrypted passwords, it's never necessary to decode the text.
There's a flaw in this logic, however. Since the encryption isn't reversible, the encryption function is not one-to-one. That means there's necessarily more than one password that will produce identical encryptions. A hacker doesn't have to figure out a "real" password; it's only necessary to determine one of the many "valid" entries for any given account.
My favorite palindromes, by the way, are "Flee to me, remote elf" and "A man, a plan, a canal--Panama!" I'm an adjunct to a local college where I sometimes teach a course in C. One of my assignments is to write a function that detects palindromes. Before the assignment is complete, some of the students develop aibohphobia, "an irrational fear of palindromes."
Mort Bernstein, who wrote on that same topic, offered this explanation of why a decryption function was unlikely to be found:
I was fascinated by your description of palindrome encryption, particularly the fact that you doubted the existence of an equivalent decryption method. So, I wrote a small C program (see Listing One, PALTEST.C, page 127) to see what I could learn. The program's output explains why you can't decrypt the text. In the test, the encryption function maps 676 possible inputs (Aa through Zz) onto 51 outputs. The second table from the output is the frequency of each of the 51 outputs. BB, for example, occurs 26 times.
To generalize on your observation that ABCD produces the encrypted result A5 A5 A5 A5, any alphabetically sequential input AB through ABC_XYZ always produces an output string in which all of the characters are the same. Thus ABCDEFG results in A8 A8 A8 A8 A8 A8 A8 and ABC_XYZ results in a string of 26 BBs. This means it is highly probable that two different inputs of the same length can result in the exact same output. That's real information loss! It would be interesting to find [the algorithm for] a set of inputs that result in the same output. Perhaps some of your readers would like to give it a try.
Dean Gienger wrote to ask whether I knew of a method for performing a binary search on a sorted database of unknown length, containing variable-size records. The question is interesting because, using the classic binary-search algorithm in which data is repeatedly divided into chunks until the target key is found, it is necessary to know how many records the database contains. Before I could come up with a possible answer, however, Dean wrote back with his own solution. If any readers know of other algorithms for open-ended binary searching, please let me know. Dean writes:
I've found an interesting problem you may wish to present to your readers. The problem is, given a sorted list L in which the number of records is unknown, I need a function get_record(index,key) that returns true if an indexed record exists, and also returns that record's key. Assume the list contains variable-length records in a disk file, meaning you can't read to the end of file to determine the
number of records it contains. FINDKEY.TXT (Listing Two, page 127) shows some sample code I developed to investigate possible answers. I need to search an ordered list stored on disk along with a sparse index (roughly a pointer to every 2000 records), so the code doesn't have to search every record sequentially.
Alexander J. Oss took me to task (rightly) for a bug in Algorithm #12, Selection Sampling, from the October 1993 column. He explains:
The problem [with the code] is that the distribution of odds is incorrect. In this case, it's a relatively minor mistake (one of those "off-by-one" errors). Try selecting three of four records. If your stopping condition is that the number to select has been selected, you will always pick the first three, the likelihood of which should be 0.25. If your stopping condition is end of file, you will always pick all four (even though you said you only wanted three). When implemented correctly, the algorithm should work properly with either stopping condition--that is, after the correct number of records has been selected, the likelihood of choosing more should be 0.
There are actually two errors in the original algorithm, and they are repaired and reprinted in Example 1. The While loop must test for two nonterminating conditions: that the number of selected records is less than the number requested, and that the end of file has not been reached. A more serious mistake occurs in the If statement. Because r is a random number between 0 and 1 (but never equal to 1), if no records have been selected, it is not possible for the next record to be skipped. Selecting nine out of ten records, therefore, always causes the first record to be selected, resulting in the same nine selections on every run. In order to select a different set of nine out of ten, the corrected program uses M+1 rather than M in the If statement control expression.
Neil Galarneau questioned the necessity of my bitmap-compression utility in the August 1993 column. He writes, "I assume that when Windows reads a compressed bitmap into memory, it decompresses the image, thus compression saves only disk space, not memory. Do you agree?"
Sorry, no. Strangely, Windows does not recognize its own bitmap-compression algorithm. For example, the Paintbrush utility cannot read a compressed bitmap, nor can it create one. Bitmap compression is the responsibility of the output device driver to which the bitmap is passed. Bitmap-compression routines are not built into Windows, and therefore, you need separate compression and decompression utilities, or a compatible driver, for reading and writing compressed .BMP files. Unfortunately, not all output device drivers handle compressed bitmaps correctly.
Waldeck Schutzer from Brazil sent the following comments and suggested repair to the BPACK program in that same article on Windows bitmaps. He writes:
I'm a fourth-year mathematics student at the University of S~ao Paulo, and my interests include data compression, digital image processing, cryptography, and correlates. Since 1988 I've been a Dr. Dobb's reader, and I enjoy its fine articles.
The length of a line in a Windows bitmap file is not even, but is a multiple of four. I tested the BPACK program with a scanned image of 382x400 and the output was damaged by the loss of two bytes at the end of each line. After making the following adjustment, the program worked correctly. In function WriteBitmapBits, replace Example 2(a) with 2(b).
The expression, (4--np&3), is equivalent to (4--np)%4, and produces the extra length value needed to complete a double word. By the way, I'm seeking documentation for the TIFF file format. Can you help?
Steve Rimmer's book, Bit-Mapped Graphics, Second Edition, (Windcrest/ McGraw-Hill, 1993) explains the TIFF (Tagged Image File Format) layout and includes sample code in C. I highly recommend the book to anyone who wants to know more about TIFF files and other graphical image formats.
That's it. My grab bag runneth empty. Send me your algorithms in care of DDJ, or send CompuServe mail to 73627,3241. By the way, I occasionally have trouble with my CompuServe connection (must have something to do with the fact that I've moved aboard a sailboat, and data communications offshore are iffy at best). If you sent a message and haven't received a reply, try again, or send mail to me at DDJ. Until someone comes up with a reliable and affordable method for mobile data communications, I'm temporarily offline. Am I the only programmer who wants to get away from it all but still have e-mail?
Copyright © 1994, Dr. Dobb's JournalAibohphobia
Loss Leader
Open-ended Searching
Sample Bug
Windows Bitmap Compression
More on Bitmaps
Your Turn
Example 1: Algorithm #12, Selection Sampling (rev. 1.01).
const
M=1000; { Input records }
N=128; { Subset (N<=M) }
var
requested,
examined,
selected: Integer;
r: Real;
begin
requested<-N;
examined<-0;
selected<-0;
while (selected<requested) and
(not EOF) do
begin
examined<-examined+1;
r<-Random;
if (M+1--examined)*r
>=(requested--selected)
then skip next input record
else begin
selected<-selected+1;
use next input record
end
end
end.Example 2: Correcting Windows bitmap compression.
(a)
if (Odd(np))
slSize=np+1;
else
slSize=np;
(b)
slSize=np+(4--np&3);
[LISTING ONE]
/**************************************************************************
This program computes all possible inputs to a Palindromic encoder and the
frequency of all of the possible encrypted results. (C) 1993 Mort Bernstein
**************************************************************************/
#include <stdio.h>
int i, j, k = 0;
unsigned char U[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
unsigned char L[] = "abcdefghijklmnopqrstuvwxyz";
unsigned char F[256], t;
void main(void)
{
for(i = 0; i < 256; i++) F[i] = 0;
printf(" ");
for(j = 0; j < 26; j++) printf(" %c", L[j]);
printf("\n");
for(i = 0; i < 26; i++)
{
printf("%c", U[i]);
for(j = 0; j < 26; j++)
{
t = (U[i] + L[j]);
F[(int)t] += 1;
printf(" %02X", (unsigned char)t);
}
printf("\n");
}
printf("\n");
k = 0;
for(i = 0; i < 256; i++)
{
if(F[i] == 0) continue;
printf("[%02X]%3u", i, F[i]);
if(++k > 9)
{
printf("\n");
k = 0;
}
else printf(" ");
}
}
[LISTING TWO]
!===========================================================================
! Find key record in sorted list of unknown length.
! Dean Gienger, 1993
!===========================================================================
integer FIND_KEY(the_key)
!===========================================================================
! Perform a modified binary search (since we don't know where end of list is!)
! Using step values of 1,2,4,8,16,... until we pass the value or pass the end
! of the list. When we pass value or get to end of list, start cutting the step
! in half. Return 0 if key not found, else place in list where key is found.
!===========================================================================
integer lo,hi,step,maximum_hi
lo = 1
hi = 1
step = 1
maximum_hi = MAX_INT
! Get first record and bail out if there isn't one
record = GET_RECORD(lo)
IF record.number = 0 THEN RETURN 0
! If desired key is before first message, bail out
IF the_key < record.key THEN RETURN 0
LOOP
! fetch record [hi]
record = GET_RECORD(hi)
! double the step
step = step+step
! if there is no such message or we passed the desired key, step back
IF (record.number = 0) OR (the_key < record.key) THEN
step = step/4
IF step = 0 THEN RETURN 0 ! either we found it or it's beyond eof
maximum_hi = hi
hi = lo
ENDIF
! choose next probe point (hi)
lo = hi
hi = hi + step
! don't go beyond the highest reasonable boundary known to date
IF hi > maximum_hi THEN
hi = maximum_hi
! cut step back to reasonable value
LOOP
WHILE (lo+step) > hi
step = step/2
ENDLOOP
ENDIF
ENDLOOP
RETURN lo
END PROCEDURE