Departments


We Have Mail


Letters to the editor may be sent via email to cujed@cmp.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.


Hi Guys,

First I would like to say thanks for the fantastic magazine. The information you provide is invaluable. I’ve only started reading it recently, but the number of times I’ve found myself reading an article and thinking “Aha!” or “of course, it is so simple when you think about it that way” is already too many to remember. Well done. Please keep up the great work.

Okay, now back to the point at hand. Is it just me, or does the use of typelists as described in Andrei Alexandrescu’s “Generic&lt;Programming&gt;: Typelists and Applications” (C++ Experts Forum, February 2002, <www.cuj.com/experts/2002/alexandr.htm>) result in very long compile times (due to the large number of template instantiations)? While I can live with that, a larger problem is the template instantiation depth.

In the article, Alexandrescu suggests techniques for using typelists of up to 50 elements; however, this seems to breach the ANSI/ISO maximum template instantiation depth limit of 17 (or at least it does when I try to use them in my code).

Am I missing something? Are there techniques to decrease compilation time and reduce the template instantiation depth? Was there a reason why these issues were not mentioned in the article (possibly because they are problems I’m having and not specific to typelists)?

Regards,

Tom Howard

Tom,

Thank you very much for your kind words and for the excellent questions.

First, it should be pointed out that there is no replacement for recursion when dealing with features such as scalable typelists. This is because the template engine is “pure functional” — it doesn’t support assignment, and therefore doesn’t support iteration either.
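To make that concrete, here is a minimal sketch of the recursion in question, using typelist names in the style of the column (a typelist node holds a head type and a tail, terminated by NullType). Computing even the length of the list must be done by recursive instantiation, one template per element:

```cpp
// Recursive typelist sketch in the style of the article.
struct NullType {};

template <class H, class T>
struct Typelist {
    typedef H Head;
    typedef T Tail;
};

// Length is computed by recursion, not iteration: each step
// instantiates a new template, one level deeper than the last.
template <class TList> struct Length;

template <>
struct Length<NullType> {
    enum { value = 0 };     // base case: the empty list
};

template <class H, class T>
struct Length< Typelist<H, T> > {
    enum { value = 1 + Length<T>::value };  // recursive step
};

// A three-element list instantiates Length three times plus the base case;
// a 50-element list nests 50 levels deep -- hence the depth concern.
typedef Typelist<char, Typelist<short, Typelist<int, NullType> > > Signed3;
```

This is why a 50-element typelist can collide with a compiler’s instantiation depth limit: the nesting depth grows linearly with the list length.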

On the template instantiation depth issue, indeed the theoretical limit is 17. If this looks like an artificial and odd limit to you, that makes two of us. Fortunately, all compiler implementers that I know of either allow command-line parameters to control this limit, or naturally allow template instantiation depth limited only by the available memory. In other words, 17 is a problem, but one that tends not to appear in practice; and as your other questions suggest, we have bigger fish to fry.

On the compilation speed when using recursive templates, that is largely dependent on the compiler architecture. Modern implementations of functional languages such as Haskell or ML — which do at run time what the C++ template engine does at compile time — prove that efficient implementation is possible. Now, if compiler implementers believe recursive templates are regularly used, they will put more effort into an efficient implementation. Fortunately, this is the current trend, and I have had positive experience with the Metrowerks compiler. So there is more that compiler implementers can do here than we can do in user-level code.

Andrei Alexandrescu


Dear CUJ,

I was surprised to see the article “A Multiple Substring Search Algorithm,” by Moishe Halibard and Moshe Rubin (June 2002). They give an O(n log n) average-time algorithm to search for a set of substrings in a given long string, where n is the length of the input. That algorithm runs in O(nm) worst-case time, where m is the length of the longest substring.

This is an old and well-studied problem; the first worst-case-optimal algorithm was published in 1975 [1]. This algorithm runs in O(n+t) time, where n is the length of the input and t is the number of occurrences reported. Several other algorithms have been proposed since then, improving this result. An optimal average-case algorithm can be found, for example, in [2]; it runs in O(n/m log(r)) expected time and in O(n+t) worst-case time, where m is the length of the shortest substring and r is the total length of the substrings. By using this algorithm and the well-known “pattern partitioning” technique, one can achieve O(n/m k log(r)) expected time for k = O(m/log(r)).
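For readers curious what the classic automaton of [1] looks like in practice, here is a compact, hypothetical C++ sketch (the class and member names are ours, not from the paper): it builds a trie over the patterns, wires up failure links with a breadth-first pass, and then scans the text exactly once, which is where the O(n+t) bound comes from.

```cpp
#include <queue>
#include <string>
#include <vector>

// Sketch of the Aho-Corasick automaton: a trie with failure links,
// converted into a deterministic goto table over single bytes.
class AhoCorasick {
    struct Node {
        int next[256];   // goto/transition table
        int fail;        // failure link (longest proper suffix state)
        int out;         // patterns ending here, including via fail chain
        Node() : fail(0), out(0) { for (int i = 0; i < 256; ++i) next[i] = -1; }
    };
    std::vector<Node> trie;
public:
    explicit AhoCorasick(const std::vector<std::string>& pats) {
        trie.push_back(Node());                         // root = state 0
        for (size_t p = 0; p < pats.size(); ++p) {      // build the trie
            int cur = 0;
            for (size_t i = 0; i < pats[p].size(); ++i) {
                unsigned char c = pats[p][i];
                if (trie[cur].next[c] == -1) {
                    trie[cur].next[c] = (int)trie.size();
                    trie.push_back(Node());
                }
                cur = trie[cur].next[c];
            }
            ++trie[cur].out;
        }
        // Breadth-first pass: set failure links and fill missing edges
        // so the scan below never has to backtrack.
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int v = trie[0].next[c];
            if (v == -1) trie[0].next[c] = 0;
            else { trie[v].fail = 0; q.push(v); }
        }
        while (!q.empty()) {
            int u = q.front(); q.pop();
            trie[u].out += trie[trie[u].fail].out;      // inherit matches
            for (int c = 0; c < 256; ++c) {
                int v = trie[u].next[c];
                if (v == -1) trie[u].next[c] = trie[trie[u].fail].next[c];
                else { trie[v].fail = trie[trie[u].fail].next[c]; q.push(v); }
            }
        }
    }
    // One pass over the text: O(n) transitions plus O(t) match reports.
    int countOccurrences(const std::string& text) const {
        int state = 0, matches = 0;
        for (size_t i = 0; i < text.size(); ++i) {
            state = trie[state].next[(unsigned char)text[i]];
            matches += trie[state].out;
        }
        return matches;
    }
};
```

With patterns {“he”, “she”, “his”, “hers”} and text “ushers”, the scan reports three occurrences (“she”, “he”, and “hers”), the standard worked example for this automaton.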

Reinventing the wheel can be very educational, but a literature search could save a lot of time and give the readers of CUJ more up-to-date information.

Regards,

Kimmo Fredriksson
Department of Computer Science
University of Helsinki
Finland


I would like to point out that fast (linear-time) algorithms for multiple substring searches are well-known [1]. Algorithms that work faster on average have also been developed. The authors might have saved some time designing and implementing their program if they had known of the GNU tool fgrep, which implements the Aho-Corasick algorithm.

Arvind Sankar
Department of Math
Massachusetts Institute of Technology

We’d like to thank the readers who pointed out that the first algorithm for multiple string search was published in the open literature by Aho and Corasick [1]. When attempting to solve our programming problem, we searched the Internet for an appropriate algorithm, but failed to find one. Our search was obviously not thorough enough. We fully agree that authors should do everything possible to avoid reinventing the wheel (and making it somewhat square in the process!). A timely posting to the appropriate newsgroup would verify whether a proposed algorithm is novel or not.

What is gratifying is the number of responses we received from readers thanking us for presenting our algorithm for solving the proposed problem. After a slight detour, they are ready to tackle the seminal article by Aho and Corasick.

The link in reference [1] of our article has since been removed from the Internet. Fortunately, we have since found that Christian Charras and others have opened a newer and very impressive site: <www-igm.univ-mlv.fr/~lecroq/string/index.html>. This site not only carries all the content of the previous one, but each algorithm now comes with a Java applet that shows it in action. We’re convinced you will find the site very informative.

See [3] for a taxonomy of articles related to multiple string searching.

Moshe Rubin

References

[1] A. V. Aho and M. J. Corasick. “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, June 1975, pp. 333-340, <www.win.tue.nl/~watson/2R080/opdracht/p333-aho-corasick.pdf>.

[2] M. Crochemore, et al. “Fast Practical Multi-Pattern Matching,” Information Processing Letters 71, 1999, pp. 107-113.

[3] B.W. Watson and G. Zwaan. “A Taxonomy of Sublinear Multiple Keyword Pattern Matching Algorithms,” Science of Computer Programming, 27(2), pp. 85-118, <http://citeseer.nj.nec.com/watson95taxonomy.html>.


Hi!

In reference to the article “Using Constructed Types in C++ Unions” by Kevin T. Manley (August 2002), I just wanted to point out that it is missing a very important consideration: alignment. In fact, I’m really surprised he didn’t even mention it.

Basically, the problem is that, when any of the “constructed types” has a specific alignment restriction, the solution he proposes might either be slower than it should be, or might not work at all, depending on the platform the program runs on.

The example he showed works on Microsoft Visual C++ 6.0 because the i member of the U union already enforces enough alignment for the rest (four bytes). And even if it weren’t there, the currtype member of the MYUNION structure would enforce the same alignment, because its size and alignment are also four bytes. (Note that other compilers might implement short-range enums using fewer bytes and weaker alignment, which would cause the union members to be aligned incorrectly.) So the solution presented might work under certain conditions, but it’s neither portable nor robust.

One simple partial solution would be to move the currtype member to the end of the structure (to save space and to help with the alignment of the union by making the union the first data member), and to have the programmer add a dummy POD to the union to ensure correct alignment for the emulated union members. This would require moderate knowledge of the particular platform’s alignment requirements and of the internals of the emulated union members, but it would fix the problem.
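A hypothetical sketch of that fix follows (all names are invented for illustration, and double stands in for whatever POD is most strictly aligned on the target platform — that choice is exactly the platform knowledge the programmer must supply):

```cpp
#include <new>
#include <string>

// Discriminated-union sketch: the raw buffer shares storage with a
// dummy POD whose alignment is at least as strict as any emulated
// member's, and the discriminator is moved to the end of the struct.
struct MyVariant {
    union {
        double alignDummy;                 // forces alignment of buf
        char   buf[sizeof(std::string)];   // storage for emulated members
    };
    int currtype;                          // discriminator, moved last
};
```

Usage is the same as in the article’s scheme: construct in place with placement new, and destroy manually before reusing the slot.

```cpp
MyVariant v;
v.currtype = 0;                                   // tag: "holds a string"
new (v.buf) std::string("hello");                 // placement new into buf
std::string* s = reinterpret_cast<std::string*>(v.buf);
// ... use *s ...
typedef std::string Str;
s->~Str();                                        // explicit destruction
```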

I should point out, too, that all these considerations have been discussed at length by Andrei Alexandrescu in his latest columns about discriminated unions in CUJ’s own C++ Experts Forum. He also proposes a much more sophisticated solution than the one I described above.

Salutations,

Juan Carlos Arevalo-Baeza
jcab@roningames.com

Editor’s Note: We received several letters from readers — including a letter from C++ Experts Forum columnist Andrei Alexandrescu — mentioning the possible alignment problem associated with the solution outlined in Kevin Manley’s article “Using Constructed Types in C++ Unions.” The author’s response to Alexandrescu’s comments follows.

Andrei,

Thanks for taking the time to read and send feedback on my article. I recently bought your book Modern C++ Design and I’ve been learning a lot from it.

Of course, you are correct about the alignment issue; I should have mentioned this in the article. While this is difficult to solve portably, I believe it can be solved satisfactorily for a given compiler/platform, so I don’t know that the issue completely invalidates my approach.

An advantage of the approach I discussed is its simplicity. The variant class you present requires a significant amount of supporting code (~80KB) and an advanced knowledge of C++ templates. It also takes three articles to explain instead of one!

Performance is another consideration. Herb Sutter wrote to me with concerns similar to yours and proposed the Boost any class as an alternative. However, the any class uses dynamic memory allocation and performs poorly in comparison. (I sent him a VC++ project demonstrating this.) I wanted to try your variant class and profile its performance, but I don’t have access to a compiler that can compile it.

I agree with you that I should have addressed portability and conformance to the C++ Standard in the article. However, as you know, working programmers have many concerns, and portability and conformance are not always at the top of the list (though perhaps they should be). In evaluating a solution, one might also consider compatibility with current compilers, readability, maintainability, dependencies/coupling, run-time performance, etc.

Perhaps we can agree there’s a need for a portable way to get the alignment information essential for using placement new with a static buffer. That would help us both.
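For what it is worth, later revisions of the language granted exactly this wish: C++11’s alignof queries a type’s alignment and alignas imposes it on a buffer, which makes placement new into static storage expressible portably. A hedged sketch (the class and its names are ours, purely illustrative):

```cpp
#include <new>
#include <string>

// A fixed slot of raw storage, correctly aligned for T, into which a
// T can be constructed with placement new and destroyed explicitly.
template <class T>
class StaticSlot {
    alignas(T) unsigned char buf[sizeof(T)];  // aligned raw storage
    bool occupied;
public:
    StaticSlot() : occupied(false) {}

    T* construct(const T& value) {
        T* p = new (buf) T(value);            // placement new, now well aligned
        occupied = true;
        return p;
    }
    T* get() {
        return occupied ? reinterpret_cast<T*>(buf) : 0;
    }
    void destroy() {
        if (occupied) {
            reinterpret_cast<T*>(buf)->~T();  // explicit destructor call
            occupied = false;
        }
    }
};
```

The alignas(T) on the buffer is precisely the “alignment information” the letter asks for, supplied by the compiler instead of by platform-specific guesswork.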

Best regards,

Kevin T. Manley

Editor’s Note: Andrei Alexandrescu responded that he still considers the solution dangerous and expressed the concern that it might cause undefined behavior. For further study of this matter of C++ unions, see Alexandrescu’s series on “Discriminated Unions” at the CUJ website: <www.cuj.com/experts/author_index.htm>.


RE: C and C++: A Case for Compatibility

I gather by the tone of the article that Mr. Stroustrup doesn’t like the recent changes that have driven C and C++ apart. I completely agree. C++ was originally presented to me as the C language with the addition of object-oriented constructs. The idea was to write C code that was object oriented. Since there is nothing in C that contradicts object-oriented design, it seemed natural to move to using C++.

I think that there are numerous advantages to keeping C a subset of C++:

  1. Anything that is useful in C is useful in C++. There was a recent addition to C of a new numeric type. Proponents of separation say that C++ allows users to create their own types, so they don’t need it. This is hogwash. If a type is so important that it becomes part of a language, then enough users must need it that it should be in both languages. Making all C++ users implement their own version just because they can is time consuming and error prone.
  2. More jobs. Yes, it is selfish, but I like the idea of being able to go from a C++ programming job to a C programming job without a learning curve. It makes me more marketable and will save my butt if the C++ work dries up in my area.
  3. I won’t have to buy two compilers. This could happen if the languages separate enough. There really only needs to be one compiler with switches to tell it to allow C++ constructs. Actually, the only reason to even tell the compiler that the code is C code is to reduce the extra code that might get generated. The compiler could easily decide the best way to generate the code.

So, going with that last comment, I think that C should become obsolete, and C programmers should just write their C-style code in C++. That’s right: one language, one compiler! I write C programs often, but I don’t bother to use a different file extension to tell the compiler to act differently; I just write non-object-oriented code without C++ features. Who’s to say that the C language should not have overloading? It’s not an object-oriented construct, and it might be meaningful to anyone using C as well. C was originally a subset of C++, so there was never a reason to make them separate. A few different words and an accent don’t really make American English a different language from UK English, do they?
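The point about overloading is easy to illustrate with a hypothetical example (the function name twice is ours). The following is legal C++ and involves no classes or objects at all, yet a C compiler would reject it, since C allows only one function per name:

```cpp
// Overloading is plain name resolution on argument types, not an
// object-oriented construct: the compiler picks the right definition
// from the call's argument types.
int twice(int x)       { return x * 2; }
double twice(double x) { return x * 2.0; }

// twice(3) calls the int version; twice(1.5) calls the double version.
```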

Thanks,

David Rector
Senior Software Engineer
TimeLogic Corporation
davidr@timelogic.com