July 1999/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@mfi.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Editor,

I have a program that uses the <string> class to construct SQL statements on the fly. Performance is very important as we are creating millions of these at a time. In benchmarks I've created it appears that basic_string is quite a bit slower (greater than 10X) than using char[] and standard library functions like strcat. I suspect the problem is that memory is getting allocated/deallocated every time an assignment is performed on the string.

My question is, is there anything you can do with basic_string to specify a maximum size so that new/deletes don't get done all the time? In other words, can you declare an arbitrary size not to exceed? Or should I not be using basic_string for things like this? I've tried going to the source (MS VC++ v6.0) but I'm still in the process of learning/understanding the subtleties of the STL programming style. Your thoughts?

efisher4@csc.com

Template class basic_string does indeed have a reserve member function that preallocates storage. But the class tends to trim storage on an assignment, so it may not give you what you want, unless you replace assignment with an erase plus an insert (kind of a kludge). You can also try a vector<char>, which tends not to trim quite so often. In any case, you'll have to experiment. Good luck. — pjp
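The reserve-then-refill approach suggested in the reply might look like the sketch below. The function names build_select and build_many, the table and key values, and the 256-character bound are all assumptions made for illustration. The Standard does not strictly promise that erase keeps the reserved storage, so the effect should be measured on the library in use.

```cpp
#include <string>

// Hypothetical helper, not from the letter: rebuilds one statement in a
// buffer owned by the caller. erase() empties the string and append()
// refills it, so storage set aside by reserve() is normally reused
// instead of being allocated again.
void build_select(std::string& sql, const std::string& table,
                  const std::string& key)
{
    sql.erase();                     // length becomes 0; storage is typically kept
    sql.append("SELECT * FROM ");
    sql.append(table);
    sql.append(" WHERE id = ");
    sql.append(key);
}

// Hypothetical driver: reserves once, then rebuilds the statement n times.
// Returns the last statement built (empty if n is 0).
std::string build_many(int n)
{
    std::string sql;
    sql.reserve(256);                // assumed upper bound on statement length
    for (int i = 0; i < n; ++i)
        build_select(sql, "orders", "42");
    return sql;
}
```

The point of the design is that the buffer outlives each statement: the caller declares it once outside the loop and passes it by reference, rather than assigning a newly built string on every pass.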

Hello,

I read your article in the April 1999 C/C++ Users Journal about Unicode files. It was a pretty good introduction, and considering how most college grads never get exposed to this material, I'm glad someone addressed the issue.

However, I did notice a problem with your description of the Unicode byte order mark. The Unicode Standard Version 2.0, Section 2.4 defines the byte order mark as 0xfeff, not 0xfffe. Stream implementations should always write out the character 0xfeff, letting the byte order on disk define the endianness of the file. So the rule is if a reader sees 0xfffe then bytes should be swapped, but seeing 0xfeff requires no byte swapping, allowing stream implementations to use memcpy for any internal buffer copying. The performance gains are generally worth the added code complexity.

Interestingly, 0xfeff is also defined as a non-breaking space character that can occur in the middle of a stream, although I hear this usage has been deprecated in version 3 of the Unicode Standard.

Brian Grunkemeyer
Software Design Engineer

That'll teach me to write code from memory, instead of looking up the details of a specification once again. Thanks for pointing out the error. — pjp
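The reader-side rule described in the letter above can be sketched as follows. The names ByteOrder, classify_bom, swap_bytes, and normalize are hypothetical, and the sketch assumes the file has already been read into 16-bit units in the reader's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteOrder { Native, Swapped, Unknown };

// Classify the first 16-bit unit of a stream, as read in native order.
// 0xFEFF means the file matches this machine; 0xFFFE means every unit
// has its two bytes reversed.
ByteOrder classify_bom(std::uint16_t first)
{
    if (first == 0xFEFF) return ByteOrder::Native;
    if (first == 0xFFFE) return ByteOrder::Swapped;
    return ByteOrder::Unknown;       // no byte order mark present
}

// Exchange the high and low bytes of one 16-bit unit.
std::uint16_t swap_bytes(std::uint16_t u)
{
    return static_cast<std::uint16_t>((u << 8) | (u >> 8));
}

// Swap the whole buffer in place only when the mark says it is needed.
// Returns true if a swap was performed. The mark itself is kept as the
// first unit, so it reads 0xFEFF afterwards.
bool normalize(std::vector<std::uint16_t>& units)
{
    if (units.empty() || classify_bom(units[0]) != ByteOrder::Swapped)
        return false;
    for (std::size_t i = 0; i < units.size(); ++i)
        units[i] = swap_bytes(units[i]);
    return true;
}
```

When the mark reads 0xFEFF the buffer is left untouched, which is what lets an implementation copy it with memcpy in the common case.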

Sir,

For information on the various methods for encoding Japanese characters, you and your readers might find the following book to be of value:

Ken Lunde. Understanding Japanese Information Processing (O'Reilly, 1993), ISBN: 1-56592-043-0.

Best regards,

Jim Butler
Cimetrics Inc.
jimbutler@cimetrics.com

Thanks — pjp

Dear CUJ,

I really liked Mr. Stroustrup's article on learning C++ (CUJ, May 1999). I totally agree with his thesis. However, his example is a bit poor. Reading a string into a variable using the scanf function is horrible. Does anybody do this? I use fgets simply to avoid the buffer overrun that he mentions. I would hope that most C programmers do. My code, for his example, would be:
#include <stdio.h>
int main(int argc, char *argv[])
{
  const int max=20; /* max length */
  char name[max];
  printf
    ("Please enter your name: \n");
  fgets(name,max,stdin);
  printf("Hello %s\n",name);
  return 0;
}
The only "problem" with this code is that the variable name may or may not contain a newline and/or a terminating null. So I guess to be totally fair, the code might be:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
  const int max=20; /* max length */
  int name2len;
  char name[max];
  char name2[max+1];
  printf
    ("Please enter your name: \n");
  fgets(name,max,stdin);
  strncpy(name2,name,max);

  /* make sure there's a null */
  name2[max]=0x00;
  name2len = strlen(name2);
  /* strip the trailing newline, if any */
  if (name2len>0 && '\n'==name2[name2len-1]) {
    name2[name2len-1]=0x00;
  }
  printf("Hello %s\n",name2);
  return 0;
}
Again, I really liked the article. Actually, I like all the articles that I understand. I'm a hobbyist, not a professional C programmer.

John McKown

Dear CUJ,

In the article "Learning Standard C++ as a New Language" by Bjarne Stroustrup (C/C++ Users Journal, May 1999), it is nice to see that even the experts can make mistakes. Even though he claims that "Thanks to long experience, I didn't make any of the obvious off-by-one or allocation errors," there are still a few beginner's errors in the program in the text of the article.

1. When checking for EOF, it will always break the way the code is written.

2. The memory allocated is never freed before calling quit.

3. i will never be incremented because of a typo at the end of the while loop.

It is nice to know that even the experts are not perfect when it comes to these little typos and mistakes.

Gary Krone
GK Software
gkrone@execpc.com

Dear CUJ,

The core of Bjarne Stroustrup's article in the May issue is excellently done; but the frayed edges were more interesting.

When I was learning my third dialect of Fortran in 1957, a programmer could expect to know: the machine instructions generated by any Fortran statement, the effect of any machine instruction on every gate in the computer, the exact functionality, buffering, and timing of all I/O drivers, the function and timing of every part of the storage devices, and the details of all error detection, correction, and reporting mechanisms of all hardware and software. About 30% of my time went to coding and 70% was focused on helping engineers understand the effects of Xenon accumulation on the power output of a nuclear reactor during any series of submarine maneuvers.

Nothing changed very much over the decades until the day I switched from C and DOS to C++ and Windows and MFC and STL. I still love programming and C++ is a hoot, but now 70% of my time is spent on coding.

Before I continue with this diatribe, I want to digress to make two points. Abstraction is not necessarily simplicity. Show me it is easy to get a good grasp on the proof of Gödel's theorem before you try to persuade me otherwise. Effective abstraction requires coherent, reliable, fully documented components. Is this your description of Windows?

Mr. Stroustrup says there is no decent graphical environment for learning C++. Since virtually all C++ programs use graphics, do you sense a small dissonance? Articles warn us not to use double underscores in variable names because the experts who write the really tough internals can't succeed using normal C++ syntax and have compilers put in special hacks based on the names of variables. C++ can't really do the abstractions for essential libraries so the horrendous, arcane templates were added to allow STL. The most trivial syntax error can get you a 500-character error message from unwound templates that defies analysis. Mr. Stroustrup made four errors writing a 24-line C++ program, starting from a debugged C program! Is this the great language he is talking about?

Finally, an attack from a different direction. Pretend for the moment that C++ has a clean, elegant design and syntax and includes all the functionality for writing good objects. Such a paragon would be part of the problem, not of the solution, for a programmer today. A programmer often occupies the role of a politician, negotiating a set of compromises among competing constituencies. Must a program be multilingual, run on Windows 98 and NT and Unix? How much Internet, maybe have a Java version? Are all files on one computer, should we use multitasking? You can easily add 20 items to this list. Magazines are full of dozens of special technologies you can buy to cope with this chaos. We have begot a horde of Frankensteins!

That often deprecated Cobol never caused a fraction of this mess. Clothes or no clothes, C++ is not an emperor; it is a pet cat. Aloof, house broken, untamable, fun to play with, nice purr, and a lesser member of the household.

Richard Smith
Spokane, WA 99202
rdsmith@cet.com

Bjarne Stroustrup replies:

Thanks to McKown for the compliments, and yes, people do use scanf in the way I outlined. See for example Kernighan & Pike, The Practice of Programming. For early stages of teaching/learning, fgets exposes about as much mechanism as scanf but it handles arrays of characters only, so you'll soon have to introduce several more functions — or scanf. As they relate to my main thesis, scanf and fgets are roughly equivalent. They lead students directly into problems of pointers, memory management, terminating zeros, etc.

In reply to Gary Krone:

1. Fortunately, the if (c = EOF) error was introduced by the typesetter. It does not occur in my code or in my original text. But, sorry, I should have caught it in the galley proof.

2. I relied on the free store being implicitly freed when a C or C++ program exits. Had I written a library routine, I would have had to be more careful. Similarly, I would have had to close the file in the later C style examples (the C++ style examples implicitly release all resources). On the other hand, I thought that being explicit about the resource management would have led some people to think that I was unfairly stacking the cards against the C style.

3. Again the itt; error was introduced by the typesetter. It does not occur in my code or in my original text. Again, sorry, I should have caught it in the galley proof.

And yes, experts make mistakes, even silly mistakes. That is one reason why I strongly prefer to work at a level of abstraction where there are fewer "silly mistakes" to make and in a language that provides good compile-time checking.

Finally, Richard Smith's letter covers a number of topics, many of which have nothing to do with C++. But let me respond to what I see as the two main threads — comments on templates and comments on Windows.

Templates are integral to C++ and essential. They were envisioned early in the design of C++ rather than "added to allow STL." Without templates, users of containers would have to resort to inefficient type tests and/or potentially unsafe type conversions — as they did in early C++ and as they do in languages with insufficiently expressive type systems. The elegance and efficiency of the STL demonstrate the utility of templates as they appear in C++.

In the context of teaching and learning C++, poor error messages relating to templates can indeed be a serious problem. Providing much better error reporting is not hard, though. I hope to see significant improvements next year when the main effort to reach ISO conformance will be complete.

I'm the designer of C++, not the designer of Windows and MFC. If you have problems with Windows, please take your complaints to Redmond rather than to me.

I'm quite aware of the problems of producing real-world software, but the article addressed approaches to C++ and programming. It is not an attempt to exhaustively enumerate the problems faced by software developers. One of the problems with some current environments is exactly that abstraction techniques have not been effectively applied. I agree that abstraction isn't a panacea. However, my paper demonstrates how abstraction supported by effective language constructs can package concepts (such as strings, vectors, and sorts) for ease of use. Anyone who claims to have a panacea is IMO peddling snake oil.

Bjarne Stroustrup
http://www.research.att.com/~bs
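For comparison with the fgets versions in John McKown's letter, the C++ style that the reply refers to might look like the sketch below. The function names greet and greet_from, and the use of stream parameters instead of the console, are assumptions made here so that the example can be exercised in isolation. They are not taken from the article.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Reads one whitespace-delimited word from `in` and writes a greeting to
// `out`. The string grows as needed, so there is no fixed buffer to
// overrun and no newline or terminating zero to manage by hand.
void greet(std::istream& in, std::ostream& out)
{
    out << "Please enter your first name:\n";
    std::string name;
    if (in >> name)                  // skip the greeting if no input arrived
        out << "Hello " << name << '\n';
}

// Hypothetical convenience wrapper: feeds `input` to greet() and returns
// everything it printed.
std::string greet_from(const std::string& input)
{
    std::istringstream in(input);
    std::ostringstream out;
    greet(in, out);
    return out.str();
}
```

An interactive program would call greet(std::cin, std::cout). A name longer than 20 characters needs no special handling, which is the contrast with the array-based versions.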

Hi,

I just want to make a comment regarding the use of the finalize method as a substitute for destructors in Java. I agree with what Chuck Allison said, and indeed the finalize method cannot be depended on to do any kind of important cleanup. But it should not be written off completely, because the finalize method is a useful debugging/checking tool.

Instead of relying on the finalize method, one should do the cleaning of any resources manually, but the finalize method can be used to check for conditions under which the cleanup was not done properly.

For example, suppose we have the following class:
public class XYZ {
     
protected SomeIOStream stream;
. ....
public void closeXYZ() {
   stream.close();
   stream = null;
}
     
public void finalize() {
   Assert.check( null == stream );
}
     
} // XYZ
On a JVM that has finalize implemented "correctly," it will do some checking to see if the user of the class has forgotten to call closeXYZ to clean things up.

Regards

Sheng-Te Tsao

Chuck Allison replies:

Thank you for your insight. I added a statement to that effect in my July article, since I speak of finalize again.

Chuck Allison
cda@freshsources.com

Dear Dr. Plauger,

When I pass a C string contained in the STL string class to a function accepting a const char*, I use c_str():
void func(const char* cstr)
{/*does something*/}
     
string str="my string";
func(str.c_str());
It would be more intuitive to have a user-defined conversion operator in basic_string:
template<class _E,
    class _Tr = char_traits<_E>,
    class _A = allocator<_E> >
class basic_string {
 public:
...
operator const _E* () const
{return c_str();}
...
};
In this case I could rewrite the call to func from above as:
func(str);
What was the reason not to include it in the basic_string class? I'm using MS Visual C++ 5.0, and STL files have your copyright. Maybe it is different from other versions of STL?

Respectfully,

Leonid Tochinski
leonid@enter.net

It is generally a dangerous practice to have implicit conversions both to and from a common data type, such as int or const char *. It's too easy to create ambiguities and unexpected conversions. The designers of the string class in the C++ Standard were well aware of this danger. Thus, they took the conventional prudent course of omitting the free conversion to a char pointer. — pjp
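The kind of unexpected conversion the reply warns about can be illustrated with a small sketch. The class Text and the functions same and tail are hypothetical and exist only for this example. Text is not std::string; it simply adds the conversion operator proposed in the letter to a class that also converts from const char *.

```cpp
#include <string>

// Hypothetical class for illustration only. It converts implicitly FROM
// const char* (the constructor) and TO const char* (the operator).
class Text {
public:
    Text(const char* s) : rep_(s) {}
    operator const char*() const { return rep_.c_str(); }
private:
    std::string rep_;
};

// Reads like a comparison of contents. Text has no operator==, so both
// operands silently convert to const char* and the ADDRESSES are compared.
bool same(const Text& a, const Text& b)
{
    return a == b;
}

// Reads like arithmetic on a string. The operand converts to const char*
// and the expression is pointer arithmetic that skips one character.
// Assumes t is not empty.
std::string tail(const Text& t)
{
    return t + 1;
}
```

With these definitions, two distinct Text objects that both hold "abc" do not compare equal, and tail applied to "abc" yields "bc". Both expressions compile without any diagnostic being required. Requiring an explicit c_str() call, as std::string does, keeps such conversions visible at the call site.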