February 1999/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@mfi.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Dear CUJ,

I was disconcerted with reading Lukas Knutsson's article on generated optimized GIFs from DIBs (December 1998) and seeing no mention of the legal issues surrounding GIF.

Unisys holds a patent on the LZW algorithm, and use of this algorithm requires a license from Unisys. From Unisys' own documents at http://corp2.unisys.com/LeadStory/lzwfaq.html, it appears that even using software for personal use is infringing on the patent. Another analysis of this problem done by the Free Software Foundation can be found on the web at http://www.gnu.org/philosophy/gif.html.

Considering the amount of legal problems surrounding GIF, I was surprised that the format was even selected as the subject of the article. Most everything discussed in the article can just as easily apply to another, patent free, World Wide Web Consortium recommended, better-designed file format known as the Portable Network Graphics (PNG) format. More information can be found on the PNG Home Page at http://www.cdrom.com/pub/png/ and at W3C website at http://www.w3.org/Graphics/PNG/.

Thank you,

Mike Castle
Milwaukee, Wisconsin

Yes, other methods exist for compressing images, but GIF is widely used and hence of widespread interest. I know that use of the GIF format is actively encumbered by patents, but I'm not worried that the Patent Police will roust our readers out of their dens for tinkering with the code. — pjp

Dear PJP,

In October's "We Have Mail," Kishore Meduri wrote a letter lamenting the lack of C++ updates to the switch statement. Since I assume that other readers may also share his view, I thought I'd try to show why I support your assertion that the C++ standards committee was wise to leave this statement alone.

Mr. Meduri expressed a desire for use of any data type (presumably with apppropriate comparison operators) to be used in a switch statement:
switch(str1)
{
    case str2:
        // Do something...
        break;
    case str3:
        // Do something else...
        break;
}
rather than
if(str1 == str2)
    // Do Something...
else if(str1 == str3)
    // Do something else...
However, applying the limitations originally imposed on the case values in C which were brought forward to C++ (i.e., replacing the strings above with literals or constants), the second method does not necessarily generate the same code as the first! Modern compilers can take many liberties with these values to generate more efficient code. By looking at the resulting assembly code, it becomes immediately obvious that the compiler does not use this obvious, but inefficient method [if-else chains].

Since I'm not a compiler expert, and I didn't attempt to disassemble the code completely (besides, your compiler will handle things differently than mine), I can't tell you the specific methods employed. However, it's obvious that a compiler could use any of a huge variety of methods to reduce the number of actual comparisons. Whether that involves sequentially ordering the case values to divide and conquer, a hash table, or something completely different, the bottom line is that there is enough information at compile time for the compiler to take such liberties.

Run-time comparisons obviously do not afford this luxury, so the compiler has no choice but to generate the most inefficient code possible. In fact, I would venture that the C++ standards committee discussed this and came to a similar conclusion as occurred to me: if there is any manner of making the switch more efficient at run time, only the programmer has this information. Therefore, by disallowing the use of the switch statement for such situations, the programmer is encouraged to use this information to the benefit of the program.

As an example, if the desire was a switch statement to identify a keyword as part of a parser, and there were 100 keywords, the compiler would be forced to generate code that would average 50 comparisons for each execution. However, the programmer presumably knows some inherent constraints that can be used. For instance, it may be known that the strings all begin with lower-case letters. Even if no more is known, this immediately allows dividing the keywords into 26 groups that a single operation (even a switch on the first letter) could handle. Assuming an average of four keywords per initial letter, and assuming that the subsequent comparison is brute force as described above, the average number of operations/comparisons immediately goes from 50 to three (one switch/hash plus two comparisons)!

I hope this shows Mr. Meduri and others that while making such a change in the switch statement might appear good on the surface, it would have been a great deal of work for basically no gain. I think this was a good "no-call" by the C++ standards committee.

Rick Tillery
Senior Software Engineer
Texas Instruments, Inc.
Dallas, TX

I think the main reason the C++ committee didn't address this issue is that nobody submitted a proposal on the topic. That's the rationale behind most things that didn't happen. In any event, I'm quite content that C++ didn't tinker much with the control flow machinery inherited from C. — pjp

Dear CUJ:

I read with great interest Warren Ward's fantastic article on Stream Encryption and have already incorporated his Cryptor class into a project. Talk about Great Timing!

I have two questions about encryption. First, am I correct in my assumption that by limiting the key submitted to the Transform_String method to eight characters (i.e., 64 bits) the Cryptor class could be used in a project for export outside the United States? I am not currently working on such a project, but am in line for one in the near future.

The second question is really a request. Does Mr. Ward have any articles in the works describing asynchronous or public-key encryption techniues? A simplified version of public-key encryption would be a great help to those of us that send messages across the Internet as part of our code.

In any case, thanks again for a great, and very useful article.

Respectfully,

Wayne W. Shearer
TELEDAT SYSTEMS

Warren Ward replies:

Dear Wayne:

The C/C++ Users Journal forwarded your letter to me, and I'd like to address your questions as best I can.

1) Key Length and Cryptographic Export

The export of cryptographic technology is covered by the ITAR (the International Traffic in Arms Regulations). In brief, any software containing strong cryptography must be approved by the State Department before export. The laws are in flux, and the administration seems to be warming to the idea of relaxing the regulations for some product families, such as health software and e-commerce products. Some bills have recently been proposed to allow broader export of strong encryption. (For example, Rep. Bob Goodlatte has authored one that did not get through Congress in the recent session.) But none has yet been approved.

In previous years, the rules seemed simple: "short key good, long key bad." Now they are much more complex, and changing rapidly. To get a clear idea of whether software from the project you mentioned could be legally exported, I'm afraid you will need information from a lawyer or legal expert who is currently working in this area.

2) Future Articles on Public-Key Encryption

Public-key encryption has long been encumbered with US patents held by Public Key Partners (PKP), which we conservatively assume to cover any form of public-key cryptography in the US. The grandparent of these patents, the Diffie-Hellman patent, expired last year. Some public-key encryption techniques not covered by other patents (such as the ElGamal approach) may now be open for use.

I haven't considered developing a public-key Cryptor yet, mostly out of concern for patents and other intellectual-property issues. Moreover, public-key systems require much more than the basic encryption engine to work well. Key exchange protocols, digital signatures, and several other complex behaviors must all be covered before you can use a public-key system properly.

I don't think all the detail needed for a successful public-key system could be compressed into a single article. On the other hand, I believe that serious developers who plan to offer Internet services in their software need to be aware of the basic public-key protocols. These protocols support safe commerce and secure data exchange, and there is no way to avoid them — no other approaches allow secure communication over a public network.

I'd like to offer two suggestions. First, you should search the Web for current information on PKI (Public-Key Infrastructure) efforts. (A massive links page can be found at http://www.excelmail.com/.) These PKI products and services reflect the real use of public-key methods, and let you see how complex the protocols and administration become when they're used seriously. Second, you may want to buy a commercial encryption library for public-key services. In addition to the encryption algorithms, the libraries generally support all the necessary public-key infrastructure, including digital identification certificates, secure key exchange, message digests, digital signatures, and so on. Without all those extra services, the encryption code alone is not very useful.

I'm sorry I couldn't answer your questions more constructively — I know it's annoying to have a simple question you can't get a straight answer to. Unfortunately, the legalities of cryptographic export are complicated and evolving even as I write, which frustrates all of us involved in developing secure products. I suspect you'll agree with me after reviewing PKI that it's a lot easier to ride someone else's coat-tails in putting together public-key software than it is to roll your own. I wish you the best of luck in your development projects, and thank you for your kind words about the article.

Warren Ward
vpresearch@winterforce.com
(403) 452-1328

Dear CUJ:

The article "Skip Lists in C++" by Bill Whitney, in the November 1998 CUJ contains a number of problems.

First, the statement "In my SkipList class, new memory for the key will be allocated in the SkipNode object, but productData's memory will not be copied (so you should not delete that memory after calling insert)." This appears in the "Inserting Objects into the SkipList" section of the article. It is incorrect. The treatment of the key and the object are identical — neither is allocated within the SkipList, so neither can be deleted after calling insert. In fact, after calling insert, the memory for the key and object are both owned by the SkipList, and should never be deleted by the user. Furthermore, both the key and object must be in heap allocated memory, since they will be deleted by the SkipList implementation.

Second, there are several memory leaks in the implementation. Local variable updateVec in routines insert, remove, and retrieve points to allocated memory which is never deleted. The statement delete [] updateVec; should be inserted before each return statement in each of these routines. Also, the member randGen of the SkipList class is not deleted by the destructor, and should be. Insert delete randGen; as the last statement of the destructor for SkipList.

Finally, it's possible for the find code to be moved into a private or protected routine that would then be called by insert, remove, and retrieve. (This code is almost identical in the three routines — for some reason the for loop in remove was written slightly differently, but the two versions are equivalent.) I'd suggest doing it, for efficiency. (If the three copies are left as is, it's possible to remove all uses of updateVec from the retrieve routine, since it is set up there, but never used. By writing the find routine to make this computation optional, you can still get this advantage with common code, however.)

My find routine would be as follows (declared as protected in the class declaration):
template <class Key, class Obj>
SkipNode<Key,Obj> *
SkipList<Key,Obj>::find(
    Key *k,
    SkipNode<Key,Obj>**updateVec)
{
  SkipNode<Key,Obj>*tmp = head;
  Key * cmpKey;
     
  // Find the key and return the node
  for (h = curHeight; h >= 1; h-)
  {
    cmpKey = tmp->fwdNodes[h]->getKey();
     
    while (*cmpKey < *k)
    {
      tmp = tmp->fndNodes[h];
      cmpKey = tmp->fwdNodes[h]->getKey();
    }
     
    if (updateVec)
      updateVec[h] = tmp;
  }
     
  return tmp->fwdNodes[1];
}
and the lines SkipNode<Key,Obj>*tmp = head; through cmpKey = tmp->getKey(); in the insert and remove routines would be replaced by:
SkipNode<Key,Obj>*tmp = find(k, updateVec);
Key *cmpKey = tmp->getKey();
In the retrieve routine, these same lines would be replaced by:
SkipNode<Key,Obj>*tmp = find(k, 0);
Key *cmpKey = tmp->getKey();
and the declaration and allocation (and deletions, if you've made the correction suggested above) of updateVec would be removed from that routine.

Frederick N. Webb
179 Harvard Rd.
Littleton, MA 01460
fnwebb@pobox.com

Thanks for your suggestions. — mb

Dear CUJ,

First, I would like to comment on how annoying it is that CUJ doesn't keep a single domain for all email, ftp, and web accesses. Right now I counted four different domains that you guys use, mfi.com, neodata.com, compuserve.com, and cuj.com. All these should be compressed down to a single cuj.com domain, which isn't even funny on how easy it is to get up.

The fix for the ftp site is to use a simple CNAME to point to ftp.mfi.com. For mail, it is a bit more involved. You need to set up a site to accept mail for cuj.com and then have a simple forwarding list for each of the addresses in the cuj.com domain. Sendmail supports this with very little modification on the admin part. If the admin looks at you strangely when you suggest this, get a new admin.

Now for the real reason that I am writing this letter.

The article in Vol. 16, No. 11 on skip lists (Bill Whitney, "Skip Lists in C++," CUJ, November 1998.) is seriously lacking some very important features. First off, the skip list node should not contain separate Key and Obj pointers. The Obj class should be a descendant of a common class that contains pure virtual functions for key comparisons. This way you don't disassociate your key from your object. If you really want to disassociate an object's key for itself, then you should encapsulate it. This is just more problems with the design than anything else.

Now we actually evaluate the memory required to use this library. Sure, you will save space on code size, but this is actually minimal compared to the bloat the skip lists take up. Each node will only contain a single object. In addition, it will also require sizeof (void *) * height bytes to store all the required pointers for the skip lists. This is terribly inefficient when compared with B-Trees (multiple keys per node.) This is really significant when you use it to store large numbers of objects.

I know this from personal experience as I have written a C library set to handle both skip lists and B-Trees. I did a performance/memory comparison between the two, and B-Trees were significantly smaller (by about 4-5 times), which means that when you exhausted available memory (and forced the operating system to swap), skip lists hit swap much sooner than B-Trees, which drasticly hurt performance for large data sets. This is contrary to Michael D. Black's claim ("Skip Lists vs. B-Trees," Michael D. Black, http://www.csihq.com/analysis/skiplist.html.)

With my personal code, the B-Trees use up half the memory that skip lists do to store one million integer keys. The times for insert and delete are about 50% and 17% slower for skip lists than for B-Trees respectively. The time to search was about equal, which isn't that surprising.

Then of course the article doesn't even touch on the fact that the one main advantage of skip lists is the fact that they are easier to make multi-thread safe then B-Trees. With B-Trees, you have to make sure that you lock a node, and that you only traverse the tree going down to be safe from race conditions. Skip lists are a different ball game. There is almost no locking required (actually I don't think any major locking is required, it's been a while since I last dealt with this topic) to make skip lists multithreaded capable.

Of course, this doesn't touch on the fact that you should use a library that has been provided for you, or you have already written and completely tested before hand. Eliminating the requirement that the code be easy to implement. It simply needs to be able to work, and quickly at that.

John-Mark Gurney
Cu Networking

I'll take your domain-name suggestions up with our system administrator. I think, though, there may be a more reasonable solution than firing the poor guy if he happens to look at me strangely. I agree that Whitney's article would have been more complete if it had offered some guidance in evaluating skip lists vs. B-trees and other key-based storage schemes. But that was not the primary purpose of the article; the primary purpose was to acquaint readers with skip lists and how they work. It would sure be nice if there was one right way to design a skip list, or a one-size-fits-all library that eliminated our need to understand how things worked. But there is no such thing, and I doubt that there ever will be. Thanks for writing. — mb