November 2000/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@cmp.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Dear Marc,

In the June 2000 issue CUJ published an article by David Berry on "Combining Boyer-Moore String Search with Regular Expressions."

Although Regular Expressions have a slightly different syntax on different implementations, in no case do they exhibit a backslash inside brackets, except to mean the '\' char itself. I'm quite disappointed you didn't point that out when responding to Mark van Peteghem in the August 2000 "We Have Mail" department.

Since not everybody may be familiar with RegEx, let me recall that it is a string-matching technique featured in a vast number of utilities, from the classic Sed and Awk to Perl, and many more.

To give a real-life programming example, let me quote this [broken to fit column]:
"^(([^:/?#]+):)?(//([^/?#]*))?
([^?#]*)(\\?([^#]*))?(#(.*))?"
It is the RegEx used in PHP3 to parse a URL, according to appendix B of draft-fielding-url-syntax-09, http://www.ics.uci.edu/~fielding/

The example shows how a RegEx can be used for parsing. When applied to Roy Fielding's URL above, the RegEx will return the matches "http:", "//www.ics.uci.edu", "/~fielding/", and no match for the query string (anything after a question mark) and the anchor (anything after a hash sign).

Notice that the question mark of the query string is escaped twice ('\\'), as the string constant will be parsed by a C compiler before being passed to the regcomp function. The '?' needs no escapes when it is inside brackets.
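The double escaping and the regcomp call described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the POSIX <regex.h> interface (which is not part of standard C++), and the helper name split_url and the constant kUrlPattern are hypothetical, not taken from PHP3 or from the letter.

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree
#include <string>
#include <vector>

// The pattern from the letter. Adjacent string literals are joined by the
// compiler, which mirrors the column break in the printed version.
// "\\?" reaches regcomp as \? (a literal question mark).
static const char kUrlPattern[] =
    "^(([^:/?#]+):)?(//([^/?#]*))?"
    "([^?#]*)(\\?([^#]*))?(#(.*))?";

// Hypothetical helper: returns ten strings (the whole match plus the nine
// parenthesized groups). Groups that took no part in the match are empty.
// An empty vector means the pattern failed to compile or to match.
std::vector<std::string> split_url(const std::string& url)
{
    std::vector<std::string> parts;
    regex_t re;
    if (regcomp(&re, kUrlPattern, REG_EXTENDED) != 0)
        return parts;

    const size_t kGroups = 10;
    regmatch_t m[kGroups];
    if (regexec(&re, url.c_str(), kGroups, m, 0) == 0) {
        for (size_t i = 0; i < kGroups; ++i) {
            if (m[i].rm_so < 0)
                parts.push_back(std::string());   // group did not participate
            else
                parts.push_back(url.substr(m[i].rm_so,
                                           m[i].rm_eo - m[i].rm_so));
        }
    }
    regfree(&re);
    return parts;
}
```

For the URL quoted in the letter, elements 1, 3, and 5 of the result hold "http:", "//www.ics.uci.edu", and "/~fielding/", while elements 7 (query) and 9 (anchor) are empty.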

The example also shows the syntax for the so-called extended RegEx. Old-style RegEx needs more backslashes to mark stored patterns.

Finally, the example shows one case where applying a Boyer-Moore string search would lead to worse performance. The RegEx provides a minimal number of explicit characters, namely the single-character tokens mentioned in the URL syntax, and most of them are optional (i.e., followed by '?').

David Berry oversimplified the grammar for RegEx in order to demonstrate his approach. It is more challenging to take advantage of the Boyer-Moore algorithm in a general RegEx package such as Henry Spencer's, at ftp://ftp.zoo.toronto.edu/pub/regex.shar. Presumably, when compiling a RegEx, one should set a flag indicating which optimizations are worth trying when executing the RegEx.

Then, of course, until such a generalization has been developed, one may be happy to run the available code. However, care should be taken to ensure compatibility with standard RegEx. Inventing a new syntax for the sole purpose of applying an optimized algorithm won't pay.

Alessandro Vesely

Sorry my reply disappointed you; I am not a RegEx wizard so the backslash problem escaped me (pun intended). I still think Berry has a neat idea, even if his optimization is not universally applicable. Thanks for writing — mb

Dear Erik Nelson,

In the article, "Network Programming with Linux," CUJ, September 2000, you stated that TCP is built upon UDP. This is not correct — UDP and TCP are both built upon IP.

This is a minor error, and does not affect your article in any way; just thought you might want to know.

Cheers,

Claude Brown
Sydney, Australia

Dear Erik Nelson,

You presented a useful class for network programming in "Network Programming with Linux" in the September 2000 issue of CUJ. In your comparison of TCP to UDP, you seemed to imply that TCP is a connection-oriented protocol that uses virtual circuits ("...This connection, or virtual circuit,...").

TCP is connection-oriented, but it does not use virtual circuits. A virtual circuit implies that packets will take the same path through the network between the two end points and always arrive in order. TCP packets or segments can take different paths through the network and can arrive out of order. The protocol uses the sliding window algorithm and the sequence number of the first byte in the segment to re-order the bytes correctly before sending them to the application.
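The reordering step described above can be illustrated with a toy receiver. This is a deliberately simplified sketch, not how any real TCP stack is written: the class name Reassembler is hypothetical, and the code assumes whole-segment retransmissions, no overlapping segments, no sequence-number wraparound, and no limit on the receive window.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy receiver: holds out-of-order segments until the gap before them
// has been filled, then releases the bytes in sequence order.
class Reassembler {
public:
    explicit Reassembler(unsigned long initial_seq) : next_seq_(initial_seq) {}

    // Accept a segment whose first byte carries sequence number seq.
    // Returns the bytes that are now deliverable in order (possibly none).
    std::string accept(unsigned long seq, const std::string& bytes)
    {
        if (seq >= next_seq_)        // older segments are treated as duplicates
            pending_[seq] = bytes;

        std::string ready;
        std::map<unsigned long, std::string>::iterator it =
            pending_.find(next_seq_);
        while (it != pending_.end()) {
            ready += it->second;                 // deliver this segment
            next_seq_ += it->second.size();      // advance past its bytes
            pending_.erase(it);
            it = pending_.find(next_seq_);       // is the next one waiting?
        }
        return ready;
    }

private:
    unsigned long next_seq_;                        // next byte owed to the application
    std::map<unsigned long, std::string> pending_;  // segments keyed by first-byte seq
};
```

With an initial sequence number of 1000, a segment arriving first at 1005 is held back, and it is released together with the segment at 1000 once that one arrives.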

My point is that describing a TCP connection as a "virtual circuit" is misleading.

Regards,
Jerry Champeau
Datex-Ohmeda

I suspect we've run into a culture clash here, where one domain has a slightly different definition for "virtual circuit" than another. Nelson's "virtual circuit" illustration for TCP is not without precedent. It is also used in the book Internetworking with TCP/IP, Volume 1, Third Edition, by Douglas E. Comer (Prentice Hall, 1995 — there is a fourth edition out now, which I have not seen). Actually, what you are describing sounds more like a real circuit to me.

In any case, your letter points up the limitations of illustration and metaphor in technical literature. (In fact, Herb Sutter and I have had a friendly disagreement over the advisability of his Really Dead Parrot illustration. See p. 72.) Remember, all metaphors should be taken with a big grain of salt. — mb

Mr. Hanov,

Just a few words to thank you for the outstanding article ["A Lightweight Window Wrapper"] in the August 2000 issue of C/C++ Users Journal. As a longtime user of MFC searching for a simpler way of using C++ in small projects, I found that your techniques hit the mark: bull's eye. Your insights are clean and extremely useful. Keep up the good work and good luck in your career.

Lance Hagen
San Antonio, Texas

Hi,

While reading through recent C/C++ Users Journal issues, I was worried by the (lack of) quality in the code examples of several articles. Here are some examples from the August 2000 issue:

p. 16: 1) Failure of the function DfmConvert is signaled to the caller through some arbitrary negative values. 2) The file referred to by *fin isn't guaranteed to be closed if an exception is thrown between the calls to fopen and fclose.

p. 20: The simple Listing 3 does not compile because of missing #includes and std qualification, unless it's always #included from places where the missing pieces are already defined (same thing for Listings 4/5 on pp. 52/3).

p. 29: Identifiers of the form _[_A-Z].* are reserved to the implementation (compiler + Standard library).

p. 36: The function Leg::GetCurrent should be const.

p. 42: main has to return int.

p. 46: The body of function ABC::create looks very strange.
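The two points raised for p. 16 can be illustrated generically. The sketch below is not the DfmConvert code from the article, which is not reproduced here. The class File and the function count_bytes are hypothetical names chosen only to show a file handle that is closed on every exit path and a failure that is reported by exception rather than by an arbitrary negative value.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical wrapper: owns a FILE* and closes it in the destructor,
// so the file is released even if an exception propagates.
class File {
public:
    File(const char* name, const char* mode) : fp_(std::fopen(name, mode))
    {
        if (!fp_)
            throw std::runtime_error("cannot open file");
    }
    ~File() { std::fclose(fp_); }
    std::FILE* get() const { return fp_; }

private:
    File(const File&);             // copying is disabled: one owner per handle
    File& operator=(const File&);
    std::FILE* fp_;
};

// Hypothetical example function: counts the bytes in a file.
// Failure to open is reported by exception, not by a magic return value.
long count_bytes(const char* name)
{
    File in(name, "rb");
    long n = 0;
    while (std::fgetc(in.get()) != EOF)
        ++n;
    return n;   // in's destructor closes the file here and on any throw
}
```

The design choice is that cleanup lives in one place, the destructor, so no code path between opening and closing has to remember it.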

I don't mean that every article has to discuss the whole Software Engineering theory, Exception safety, const correctness etc. But I'd like you and your authors to understand that the code published serves as a model for a great part of your readership. I've seen too much code with errors like these, and I'd really like them not to be covered by what is now the major C/C++ magazine on the market.

Thanks,

Thomas Maeder

I don't know about you, but the only way I can improve my coding style is in an incremental fashion. If I'm focusing on const correctness, I will have to put off exception safety for another day. If authors waited to submit articles until their code was exemplary in every conceivable respect (software engineering, exception safety, const correctness, and every other form of correctness you can think of; how about i18n correct?), we would all be the poorer for it, because nobody would send in anything. So I don't get too upset if an author who wants to share his hard work unraveling the Borland VCL happens to use arbitrary return codes to signal errors. I kinda don't think most people are going to seize on that as their model for software engineering.

That said, some of your points are very well taken. The errors you found are indeed regrettable. I now have two good technical reviewers on my fledgling editorial board: Dan Saks and Herb Sutter. With their help, I anticipate CUJ will move closer to the ideal of providing model code for our readers. — mb

Dear CUJ,

I look forward eagerly to the arrival of [CUJ] each month because there always seems to be a perfect balance between code I can just "slot in" and articles which I have to mull over. Michael Bramley's article in the July 2000 issue on finding "nice" intervals for graph axes definitely falls into the first category.

His method as it stands suffers, however, from an unnecessary limitation. Ranges involving small magnitudes (e.g. 1e-12 to 5e-12) are disallowed. In my field such values are not uncommon and it seems unreasonable to force my clients to rescale as he blithely suggests. The great advantage of floating-point numbers is surely precisely their accommodation of different magnitudes.

Mr. Bramley observes that due to inexact floating-point representation and rounding errors, sometimes when the calculated axis minimum should be exactly zero, a very small number is produced instead (e.g. 1.4517e-17). To get around this, he clamps all tiny axis minima (< 1e-10) to zero.

This draconian measure is unnecessary, as one can distinguish easily whether a small axis minimum is real or merely an approximation to zero (fp artefact) by the range of the axis. The recognition of "true zeros" is merely a comparison to 0.0 in disguise. The standard way of testing two inexact floating-point numbers for equality is to examine their difference scaled appropriately. I believe the following code would do nicely:
if (fabs(Test_min) / Test_inc < 1e-5)
   Test_min = 0.0;

Llew Goodstadt
University Laboratory of Physiology
Oxford, UK
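
The scaled comparison in the letter above can be wrapped in a small function to see how it treats the two cases discussed. This is a sketch only: the function name snap_axis_min is hypothetical, and the guard against a non-positive increment is an added assumption, not part of the letter's code.

```cpp
#include <cmath>

// Hypothetical wrapper around the letter's test. The minimum is judged
// against the axis increment rather than an absolute threshold, so tiny
// but genuine minima survive when the whole axis is tiny.
double snap_axis_min(double test_min, double test_inc)
{
    // Guard (an assumption of this sketch): avoid dividing by a
    // zero or negative increment.
    if (test_inc > 0.0 && std::fabs(test_min) / test_inc < 1e-5)
        return 0.0;
    return test_min;
}
```

A rounding artifact such as 1.4517e-17 on an axis with increment 0.2 is snapped to zero, while a minimum of 1e-12 on an axis with increment 1e-12 is left unchanged.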