February 2000/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@mfi.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Dear CUJ,

In Mr. Plauger's column, A Better Sort (CUJ, October 1999), I couldn't help wondering about the time it would take to call the little rotate function on an array of large elements. The trouble is, the memmove in that function is called n times, where n is the result of size/BUF_SIZE rounded up. If the area to move is large, this could be overly penalizing.

The problem is that rotate needs to store the last (rightmost) element at the front. So you need some temporary store for that last element which gets overwritten in the move. rotate does it by pretending that the element size is BUF_SIZE (or less), then rotating n times.

I have two other solutions. The first stores the last element on the program stack, in pieces, by using recursion. The second uses repeated reversal, a technique I saw for the first time, if I remember correctly, in Software Tools in Pascal, a book which Mr. Plauger wrote with B. W. Kernighan, where the discussion was about moving lines of text in an editor using a minimal amount of intermediate memory.

The stack storage version works as follows: copy part of the last element into a local (stack) buffer at each level of recursion, until it is (in pieces) completely copied to the stack. Once copied, perform the shuffling of the rest of the table to its final position (one call to memmove), then start copying back the bits of the element saved. At each function return, another chunk is copied back. The function is split into two: rotate simply calls the rotate_r with the initial parameters set up.
static void
rotate_r(char *qb, char *qe,
   size_t size, size_t stored)
   {
   /* buffer chunk (local) */
   char buf[BUF_SIZE]; 

   /* how much remains to buffer */
   size_t left = size - stored;    

   /* how much to put in buf */
   size_t sbuf = left < sizeof buf ?
         left : sizeof buf;   
   if (sbuf < left)
      {

      /* put aside chunk from end */
      memcpy(buf, qe + stored, sbuf); 

      /* do the rest */
      rotate_r(qb, qe, size,
         stored + sbuf);  
                
      /* copy chunk to beginning */
      memcpy(qb + stored, buf, sbuf);
      }
   else /* sbuf == left */
      {

      /* put aside chunk from end */
      memcpy(buf, qe + stored, sbuf); 

      /* shuffle up by size */
      memmove(qb + size, qb,
         qe - qb - size); 

      /* copy chunk to beginning */
      memcpy(qb + stored, buf, sbuf);
      }
   }

static void
rotate(char *qb, char *qe,
   size_t size)
   {
   /* right rotate [*qb, *qe]
      one place */
   rotate_r(qb, qe, size, 0);
   }
The reversing version (which uses the swap routine for elements) works as follows: to rotate the last element to the front, reverse the order of all elements apart from the last. The result is the reverse of the desired array, so now reverse the whole lot. I've added the reverse function to help. (I like this stack storage technique: I've used it in the past to read arbitrarily long text lines into buffers which are allocated once each line has been read completely and its length established. However, I've never seen anybody else use it.)
static void
reverse(char *qb, char *qe,
   size_t size)
   {
   /* reverse the order of elements
      in a (sub)array */
   while (qb < qe)
      {
      swap(qb, qe, size);
      qb += size;
      qe -= size;
      }
   }

static void
rotate(char *qb, char *qe,
   size_t size)
   {
   /* right rotate [*qb, *qe]
      one place */
   reverse(qb, qe - size, size);
   reverse(qb, qe, size);
   }
Both Mr. Plauger's and the recursive rotates allow arbitrarily sized rotations: they can let you rotate more than one element from the end to the front. It is fairly simple to generalize the reversing rotate to do the same: reverse the first part of the array, reverse the second part of the array, then reverse the whole array. (That was what was in the book.) Note that pretty well every element in the first part of the array gets moved (swapped, involving two copies) twice; the stack storage method only copies one element twice, (and memmoves the others) but has the possible disadvantage of requiring an arbitrary amount of storage.

Regards,

Tony Balinski

I like the recursive move too. Never seen that trick before. Thanks for showing it to us. But I'll probably leave qsort as is, gambling that most uses will never require multiple copies. — pjp

Dear CUJ,

Many of us might be aware of the recent popularity of Perl. I believe this is mainly due to CPAN (Central Perl Archive Network) at http://www.cpan.org. CPAN has modules for almost any problem that you can think of. They have a DBI module which talks to almost any database in any platform (unlike ODBC). They have similar modules for HTTP, Telnet, Parsing etc. Of course all of them are Free and Open Source.

As the most respected and reputed source for C/C++ articles, you (CUJ) and your readers are in a unique position to make something similar happen. SFL (Standard Function Library) from www.imatix.com is a good start but it is in C and is not broken into neat independent modules. What is needed is well-tested classes with very good interfaces which can be just plugged and played with. Of course the modules will not satisfy everybody, but it will give a good quick start for all programmers. Experienced programmers can customize only the modules that are creating a problem for them. A tool like perldoc, which extracts the documentation for the module from the source code itself, would be great!

For starters, we could have your editor's code for composite traversing, in a neatly encapsulated code for handling directories and files.

I think these free modules presented in an easy-to-search location will really bring about the code reuse promise of OOP with C++.

Shivakumar Gopalakrishnan
Structural Analysis Technologies, Inc.
http://www.sat-i.com
kumar@sat-i.com

You flatter us, but what you are proposing sounds suspiciously like work! Actually, there are two sources of C/C++ source code you ought to be aware of, which are somewhat along the lines of your idea. The first is simply the source code that accompanies our articles. It is freely available from our ftp site (ftp://ftp.mfi.com/pub/cuj) or our website (www.cuj.com). If you purchase the CUJ CD-ROM (ten years' worth of articles and code — you can order it from our website), you not only receive all the source code, but you can even read the articles about it to figure out what it does.

The second source is the C/C++ Users' Group website (www.hal9k.com/cug) and CD-ROM. The CUG is in fact the entity which birthed CUJ. (Don't feel bad if you have trouble keeping this straight — so do I.) Since 1981 the CUG's mission has been to provide low-cost distribution of shareware and freeware C/C++ source code. The CUG library consists of over 400 volumes of source code covering everything from AI to Z-80.

I am not sure if what you are suggesting is radically different from what I discussed above. Having been involved with the CUG in the past, I can tell you that maintaining a repository of source code is a lot more labor-intensive than most people realize. CUG's main guru, Victor Volkman, already does an astounding job. I think we'll just stick to publishing articles. — mb

Dear CUJ,

I just wanted to express a comment regarding a letter to the editor by Giovanni Bavestrelli in the September 1999 issue. What he suggested [a redefine keyword for C++ to signify intent to override a virtual function] is already part of the Object Pascal language, under the keyword override. I've found this to be a very important feature of that language (implemented by Inprise's Delphi suite), and I'd like to have it as part of Standard C++.

The Object Pascal override modifier acts exactly as you proposed. You cannot declare a virtual function on a derived class without this modifier. If you use virtual you are declaring a new virtual function that hides the base's definition:
class Base
   procedure foo ; virtual ;
class Derived
   // OK, override virtual function
   // base.foo
   procedure foo ; override ; 

   // WARNING: virtual Derived.foo
   // hides virtual Base.foo
   procedure foo ; virtual ;       

   // WARNING: Derived.foo hides
   // virtual Base.foo
   procedure foo ; 

   // Derived. foo hides virtual
   // function Base.foo, but that's
   // what the programmer intended
   procedure foo ; reintroduce ;
The reintroduce modifier tells the compiler that we are deliberately hiding a base class. It simply disables the warning.

Now, if you specify override you must match exactly the base function signature (including arguments and return types). It may be a bit forcing to insist on matching return types, however. Some compilers handle correctly the case of a virtual function returning different types, which is specifically useful when implementing:
some_class* clone() const
some_class is the type of the class where the clone is implemented. Those compilers create stub functions which add an implicit dynamic_cast on the return type, converting from the type (some_class*) as seen by the caller and the type (some_class*) returned by the particular function implemented.

P.S: I liked Bavestrelli's article about assertions. Unlike one reader, I think the template semantics must evolve a bit more before they can successfully replace preprocessor usage. Currently, template functions cannot reproduce what a macro function can do. For example, I don't know of any portable way to hide the arguments (const char * file, int line) to some notification function the way assert does using templates. I've tried different forms with my compiler but none of them worked because __FILE__ and __LINE__ are expanded by the preprocessor and not by the template instanting procedures inside the compiler, so a construct like:
template
   <
   cost char * aFILE=__FILE__,
   int aLINE=__LINE__
   >
doesn't work.

Regards,

FLC

That Mr. Bavestrelli gets around, doesn't he? Whenever he writes in, I can almost be sure someone else will have something to say about it. Thanks for writing. — mb

Dear Mr. Plauger,

This is in response to your November 1999 article, "Standard C/C++: Frequently Answered Questions." You were asked how to recover from a failure of operator new now that it no longer is required to return zero on failure, and instead throws an exception. Your response showed two methods for adding recovery logic. There is a third method that the questioner's recovery logic could be used in. That is, to use the "new handler." The new handler is a pointer to a function prototyped as void (*new_handler)() which can be set using the set_new_handler function. Whenever operator new fails, this new handler function is called to see if it can "fix" the situation so that operator new can try to allocate the memory again. The default is to do nothing and let operator new throw a bad_alloc exception.

The benefits of using the new handler are twofold. The first is that your handler function will be called if new fails anywhere in your program — you don't have to write the same logic (i.e., logging the error) for each call of new. The more important benefit is that if you are able to do something about the failure of new (i.e., freeing memory somewhere else) then you can have new try to allocate the memory again.

Though the new handler is very specialized in its uses, I believe that it is necessary to mention it when discussing recovery logic options for the failure of operator new.

Note: I have had difficulties using set_new_handler with the Microsoft Visual C++ 6.0 compiler. They use their own function _set_new_handler and their new handler is prototyped as int (*new_handler)(size_t).

Sincerely,

Eric A. Harrs
eharrs@earthlink.net

A new handler does indeed address issues related to allocation failures, but as you observe, its use is very specialized. That's why I omitted it from my discussion of changes in the behavior of operator new. As for Visual C++, Microsoft chose to preserve a degree of backward compatibility with their existing code base, rather than track the draft C++ Standard more closely. Now that the C++ Standard has settled down, they will have to revisit this decision, I suspect. — pjp