Letters to the editor may be sent via email to cujed@mfi.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.
Editor,
I read your article on ordering algorithms published in the October issue of C/C++ Users Journal. I found it very good and very informative. We're currently working with the 16-bit Borland C++ 4.52 compiler, which has limited stack space. Using the STL recursive implementation of the ordering algorithms we get stack overflow problems. This has forced us to rewrite the ordering algorithms non-recursively, which defeats the original purpose of STL. Has this been addressed in the development of STL implementations? Do you know of any way to keep the recursive algorithms from causing stack overflow problems in the 16-bit environment?
I would appreciate your response.
Thank you,
George Dedes
Topcon-Geocomp, Columbus, Ohio

A good recursive sort makes at most log2(N) recursive calls to sort N items. That's about 20 calls to sort one million items. The Hewlett-Packard version of STL, and mine, have this desirable behavior. I can't see how they can be causing stack overflows, given a decent-sized stack. pjp
Jerry Dwyer and K.B. Williams:
After reading your two-part article in the June and August issues of C/C++ Users Journal, I feel compelled to respond. I am a retired auditor, applied statistician, and computer programmer. About ten years ago I became involved with pseudo-random number generation in connection with a large-scale business audits research project. Desktop computer spreadsheet programs are my preferred audit control medium. I merely needed to verify the adequacy of their built-in rand functions for audit random sampling purposes. Little did I know what I was letting myself in for.
I began by phoning the companies behind the ten or so spreadsheets I possessed at the time. Not a single company would tell me anything about their built-in rand functions. Several claimed it was a trade secret; most just brushed me off.
I do not take kindly to that kind of behavior. I decided to attempt a "black-box" random-number evaluation system. It took me more than a year and much despair, but I succeeded. During this period I read Knuth's Volume 2 over and over again. Since I am not a mathematics major, I do not pretend to fully understand the theorems and proofs. I do understand the conclusions reached.
I subsequently evaluated all my spreadsheet rand functions, several thousand digits of pi, several thousand digits from the Rand Corporation book of one million random digits, and several high-level computer-language random-number functions. I wrote an article for Byte Magazine summarizing some of my results, but they rejected it as not new enough material. Since I uncovered two spreadsheets with adequate functions during this testing, I lost further interest in pursuing this subject.
While writing the Byte article, I phoned Knuth at Stanford University. He told me he was only the editor for that section of his books and knew nothing about random numbers. He asked me to phone another professor at Stanford whose name I do not recall. That professor was a pleasant man who told me random numbers were a very deep subject and that I should phone a professor at the University of Southern California. The USC professor was very unpleasant. He finally told me to send him $400 with my article and he would deign to read and critique it. I hung up wondering what the hell was going on.
Your article awakens those memories. Also, I now play duplicate bridge on my computer. I notice no current bridge program generates adequate pseudo-random numbers. I have spoken to several bridge program authors about their generators without success. They think two billion numbers are a lot. I think not, because that is only a speck in the ocean of all possible bridge deals. A good uniform generator with a period of about 1.0E32 would be a good step forward.
Using your example Lehmer generator on page 57 of the August CUJ, I laid out each multiplier from 2 through 127 in an Excel spreadsheet. Surprisingly, I find 36 full-cycle multipliers, not 12. They are: 3, 6, 7, 12, 14, 23, 29, 39, 43, 45, 46, 48, 53, 55, 56, 57, 58, 65, 67, 78, 83, 85, 86, 91, 92, 93, 96, 97, 101, 106, 109, 110, 112, 114, 116, 118. Many of these obviously share common factors with 126.
Page 10 of Knuth's Volume 2 asserts your example Lehmer generator will have a reduced period because you are holding the additive constant C to zero. It did not reduce this generator's period.
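The letter's count can be checked directly. For a prime modulus such as 127, the full-cycle multipliers are exactly the primitive roots, and there are phi(126) = 36 of them, matching the spreadsheet tally. A minimal sketch, assuming the example generator is x' = a*x mod 127 (consistent with the 1-126 seed range mentioned later in the letter):

```cpp
#include <cassert>

// Cycle length of the Lehmer recurrence x' = (a * x) % 127,
// starting from x = 1. A "full-cycle" multiplier visits all 126
// nonzero residues before the sequence repeats.
int cycle_len(int a) {
    int x = 1, n = 0;
    do {
        x = (a * x) % 127;
        ++n;
    } while (x != 1);
    return n;
}

// Count the full-cycle multipliers in 2..126. For a prime modulus
// these are exactly the primitive roots mod 127, phi(126) = 36 of them.
int count_full_cycle() {
    int count = 0;
    for (int a = 2; a <= 126; ++a)
        if (cycle_len(a) == 126)
            ++count;
    return count;
}
```

Running count_full_cycle() yields 36, agreeing with the letter's tally rather than the article's 12.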
Look at Figure 2 on page 61 of the August CUJ. Multiplier 53 is supposed to be practically good and multiplier 85 a worst case. I look at them and see they are different, but both show clear-cut patterns.
I did chi-square tests on my 36 full-cycle sequences. I broke each 126-number sequence into three groups of 42. I conclude some of the multipliers produce reasonably uniform results. The statistical best is 65. The worst is 116. Your best, 53, is near the bottom of my rankings.
I predict 65 will cause an imperfect-looking distribution graph after your spectral test. I think imperfect-looking results are the desired essential property in uniform random-number sequences.
Thanks to your article I can now visualize the spectral test. At the same time I think it does not do what it purports to do. Simply, it is not measuring the correct expected value. Therefore, its numeric conclusions do not coincide with reality.
You and Knuth's Volume 2 author seem to think uniform means an ultimate one-to-one result relationship. I think the chi-square table conforms to uniform distributions. It converts varying curved distribution possibilities into linear probability measurement. I currently think an equivalent conversion table is necessary before the spectral test can be usefully interpreted.
You state, "If you need more than a simple random-number generator, you need more than a simple test to choose a suitable candidate." I came at the issue in the opposite direction. I focused on "uniform" and "random." My "black box" system uses only two tests, the chi-square test and a creation of mine I call the decay test.
I note you think calculating a generator's cycle-length is "messy and not very helpful." I think it is crucial. My decay test measures the period and relative potency of any random-number sequence from whatever source. This is what makes it a "black box" device. It is simple enough to be done with a spreadsheet macro.
What we are talking about so far are some primitive elements in a larger issue. My main concern is the usability of pseudo-random numbers. At the moment, they are not very usable. Here is what is in the way.
Periods versus digit display:
Table 1 on page 63 of the August CUJ conforms to my general decay-test results. Most spreadsheet and high-level computer-language "standard" random-number generators have a period of nine to ten digits. A major difficulty is that those same programs display their results in 14 to 16 digits. That makes the last five to seven digits bogus.
When I speak of an adequate generator, my first requirement is for a period greater than the displayed-digit capacity of whatever program uses the generator output. Using Table 1, the Knuth subtractive generator has the shortest usable period for the double-precision floating-point arithmetic currently prevalent in spreadsheets and computer languages.
More is better. I want to test the Marsaglia-Zaman generators. If they produce satisfactory uniform sequences and anywhere near their claimed periods, they will do for anything I visualize in my lifetime.
Double-precision floating-point and 16 displayed-digit arithmetic are increasingly inadequate in the case of large-company statistical audits. For instance, a $50,000,000,000 sales year probability-proportional-to-size audit requires a minimum of 14 digits to properly select a sample from the complete books. I will feel relief when 18, 21, or even 24 displayed-digits become the norm.
Seeding control:
Currently generators rely on fixed-start seeds, limited-range input seeds, or out-of-operator-control randomization calls. Each is grossly incomplete.
I want complete control of the seeding. Using the example Lehmer generator, I want to be able to enter 1 through 126 manually. Also, I want the choice to automatically seed the generator outside my sight and control, and know that the computer is able to provide each and every seed from 1 through 126. Finally, I want the choice to retrieve that initial seed and final random-number for after-the-fact verification.
My Excel rand function is an example. It automatically seeds the generator when I use =RAND(). While I know it has a generator with an adequate period, I do not know if it has 360, 36,000, or whatever limited number of possible seeds. Even if it had the same number of possible seeds as the generator period, I cannot enter a particular seed. In this instance it would be fairly easy to implement. =RAND(123456789) could override the automatic seeding without damage to the program logic. It could also return the current random-number inside the formula brackets.
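The seeding interface the letter asks for can be sketched as a small class. The names are invented, and the 1-126 seed range assumes the mod-127 example generator:

```cpp
#include <cassert>
#include <stdexcept>

// A seed-controllable Lehmer generator sketch: the caller may set
// any valid seed, and may retrieve both the initial seed and the
// last value for after-the-fact verification.
class Lehmer127 {
public:
    explicit Lehmer127(int seed = 1) { reseed(seed); }

    void reseed(int seed) {
        if (seed < 1 || seed > 126)
            throw std::out_of_range("seed must be 1..126");
        initial_ = current_ = seed;
    }

    int next() { return current_ = (kMultiplier * current_) % 127; }

    int initial_seed() const { return initial_; }  // for audit trails
    int last_value()   const { return current_; }

private:
    static const int kMultiplier = 65;  // a full-cycle multiplier
    int initial_, current_;
};
```

With a full-cycle multiplier, every seed from 1 through 126 starts the same 126-long cycle at a different point, so all seeds are usable and the sequence returns to the seed after exactly 126 draws.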
I want to hear back from you. I am grateful to you for the clarity your article brought to me. At the very least, I would like to see a good standard C/C++ generator implemented. At the very most, like most, I would like everything my way!
Cheerfully yours,
Douglas McLean
Sparks, NV

Random-number generation seems to be another of those bottomless pits of complexity, like calendar calculations. pjp
Dear Mr. Plauger,
Thank you for The C/C++ Users Journal. It seems to fill a niche other journals don't. However, I'm writing this email to express a growing concern I have with the suitability of the main programming language for embedded systems, namely C.
In particular, when I tell a language that I want an array, a[10], I do not expect to be able to access a[12] and get away with it. At the very least, I would expect a run-time error while running debug versions of the code. This causes the needless waste of a lot of time tracking down bugs, and means that you are never sure that a seemingly good version of code hasn't been reading and writing from invalid bits of memory and will blow up at some unspecified time in the future.
It doesn't have to be this way (as Pascal, Java etc show), so how long will ANSI C/C++ allow it?
Regards,
Alasdair Johnson

Both ANSI and ISO C and C++ will probably allow it forever. But neither language forbids an implementation from generating a checked version of your program for debugging. Some, in fact, do. pjp
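A debug-build check of the kind the letter asks for can be layered on in C++ without any language change. A minimal sketch (the class name is invented; assert compiles away when NDEBUG is defined, so release builds pay nothing):

```cpp
#include <cassert>
#include <cstddef>

// checked_array: an array wrapper whose operator[] asserts that the
// index is in range. In a debug build, a[12] on a 10-element array
// stops the program at the faulty access instead of corrupting memory.
template <class T, std::size_t N>
class checked_array {
public:
    T& operator[](std::size_t i) {
        assert(i < N && "index out of range");
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < N && "index out of range");
        return data_[i];
    }
    std::size_t size() const { return N; }

private:
    T data_[N];
};
```

Usage is unchanged from a raw array: declare checked_array<int, 10> a; then a[5] = 1; is fine, while a[12] trips the assertion in a debug build.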
Dear Mr. Plauger,
I work for a software development company and we are currently beginning to define how we will develop our next generation of products. The biggest issue we are faced with is portability, and we view C++ as the truly portable language we will use. Other than C++, we are not biased toward any specific technology, and of course we don't want to limit our business opportunities because our applications don't run on this platform or that one. I have never been able to find an article that completely addresses portability issues from a business software developer's point of view. Maybe the issues are too complex, or maybe true portability is impossible. Anyway, I probably have a thousand questions, but I think I can distill them to the following two.
1. Can I write a non-GUI C++ program that will run on Windows95, WindowsNT, AIX, HP-UX, Solaris, OS/400, and MVS? If so, in which, if any, of the C++ development environments out there can I do it? For example, I would like to be able to work in, say, MS Visual C++ and not only write GUI programs that run on Windows95 but also non-GUI programs that I can send to an AS/400 to compile and execute there. Is this possible?
2. Can I have embedded SQL in my C++ programs that will allow me to access any of the following databases: SQL Server, Oracle, Informix, DB2/6000, DB2/400, and DB2/MVS? If so, does the SQL preprocessor typically come with the Visual C++ development environment, or do I have to get some other add-on tools? Do I need ODBC drivers?
I appreciate any help you can give me. Can you write an article in your magazine? Can you respond to my letter in your magazine? Can you write to me or call me with suggestions as to how I can get these questions answered? I think this is a very current and relevant topic.
Thank you.
Juan Perez
330 North Wabash Suite 2501
Chicago, IL 60611-3608

It's possible to write a non-GUI C++ program to run in multiple environments, but you will find few tools to help you do the job. Portability is, in the end, an economic term. Software is deemed portable if it's cheaper to move it between two platforms than to rewrite it for the new platform. Usually, you have to impose an added discipline on yourselves if you intend to write software that is portable in the sense you require. Leaving out the GUI requirement makes it much easier to write portable stuff (stick with the simplest uses of the C++ library), but it's still not easy.
I'm not long on DBMS knowledge, but my understanding is that you can write a sufficiently conservative subset of embedded SQL that will work the same on multiple systems. As far as I know, you have to acquire an SQL preprocessor, and drivers, for each environment, but I may well be wrong.
We'll happily print your letter in our Letters column. Maybe that will stimulate an article or two from people with experience in this area.
Thanks for writing, pjp
To the editor:
I can't really disagree with Pete Becker: const ints and static const ints are usually much better than #defines. You don't have the textual-substitution nightmare he describes so eloquently. And they have another advantage he doesn't mention: since they have a "real" existence in memory, you can see them in the debugger. Our project spent many hours in code reviews purging this "C idiom" from our thought processes.
And yet there's an exception that proves every rule. We ran into it on the large project we're (STILL) building using (ugh) Visual C++ v1.52. While the experts have gone on to bigger and better things, we've had to discover every trick in the book (and many not in the book) to keep the damned thing rolling. Microsoft never intended for this package to handle a system of the magnitude we're building, but somehow, we're doing it. I almost wish we weren't so successful at this our client might finally have agreed to go 32-bit. But I digress.
The first nit I have to pick with Pete's example is that he initializes his const int variable in the header. While legal, this will create a copy of this beast in every .cpp file that includes it. If you don't want that, it needs to be declared static const int or extern const int and defined in one .cpp file. #defines, being a figment of the preprocessor's imagination rather than real variables, pose no such problem.
But the problem that really bit us was when all else failed and we had to bite the bullet and split off some of our app into MFC Extension DLLs. If you've ever had to do this, you know the time to do it is before you make all your design decisions, not after. Live and learn. Some of our code ended up living in an application and some in extension DLLs. Now, all the sudden static const ints became a big problem. If the definition of the variable was in the .exe, the DLL would use a separate, uninitialized, copy. This made for all kinds of fun. Finally we realized that in this context, not occupying any real memory was a big advantage and went back to using #defines for those constants that had to cross the DLL wall.
By the way, this incident also provided some grist for Bobby Schmidt's anti-Hungarian notation tirade, although I'm not as strongly opposed as he is.
As part of the DLL conversion mentioned above, our team's finest minds, parents of young children all, sat huddled one day around a debugger trying to find the cause of a particular new GPF the conversion had caused. As we tried to delve into the ugly macros (yes, try building typesafe collection classes in VC 1.52 without them; templates, what are they?) used to tie this thing together, we eventually found that the problem was a variable that the compiler was perversely making a NEAR pointer. Its name: ppHead. At that point we all started rolling on the floor laughing. It had been a rough week.
Steve Cohen
Even the advice of experts must be tempered with reality now and then. Thanks for writing. pjp
Mr. Plauger:
In response to L. Zyka's query regarding timers for Windows NT (We Have Mail, November 1996): to my knowledge, there are two major classes of timers available as part of the Win32 API.
First, there's GetTickCount(), GetCurrentTime(), and timeGetTime(). These functions are, as far as I can determine, completely identical (except for one possible difference, which I'll get to in a minute). Each of these functions takes no arguments and returns a 32-bit unsigned value that represents the number of milliseconds since Windows (NT) was started. The catch is that the precision is not guaranteed to be anything in particular. This is where the (possible) distinction between timeGetTime() and the other two comes in. timeGetTime()'s documentation mentions that its precision can be adjusted by timeBeginPeriod() and timeEndPeriod(). These two functions take a UINT parameter that specifies the minimum timer resolution in milliseconds, and return TIMERR_NOERROR if successful and TIMERR_NOCANDO (I am not making this up!) if the specified resolution is unavailable. (The return type of each function is MMRESULT.)
The idea is that you call timeBeginPeriod() right before your timing procedure to crank up the resolution, call timeGetTime() as necessary, and then call timeEndPeriod() after you're done, specifying the same resolution as in timeBeginPeriod(). (For those of you that can manage to keep Windows NT running this long, you should note that the returned count will wrap over to 0 after 2^32 ms, or 49 days and some change.)

Second, there is the pair of functions QueryPerformanceFrequency() and QueryPerformanceCounter(). These functions access, if it exists, a "high-resolution performance counter." Each takes a pointer to a LARGE_INTEGER, which is a 64-bit structure as follows:
typedef struct _LARGE_INTEGER {
    DWORD LowPart;
    LONG  HighPart;
} LARGE_INTEGER;

(I've left some decoration out; this is intended merely as a guide to accessing the structure.) They return FALSE if no such counter is available. QueryPerformanceFrequency() returns the number of cycles per second that the counter executes; on my system this number happens to be 1193182, or slightly over a million per second.
QueryPerformanceCounter() returns the number of cycles since Windows was started. Since this is stored in a 64-bit integer, I don't expect that my machine will be able to stay up long enough to cause that to wrap, which it should do every 15 trillion seconds or so (close on to half a million years). I've found that the easiest way to manipulate the counter return values is to convert the LARGE_INTEGER into a double via a macro.
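The conversion the letter describes can be sketched portably, assuming LowPart holds an unsigned 32-bit value and HighPart a signed 32-bit value as in the structure above (the function name is invented):

```cpp
#include <cassert>

// Combine the two 32-bit halves of a 64-bit counter value into a
// double. 2^32 = 4294967296. A double holds only 53 significand
// bits, so the low bits of a very large count are rounded away,
// which is acceptable for interval timing.
double counter_to_double(unsigned long low, long high) {
    return static_cast<double>(high) * 4294967296.0
         + static_cast<double>(low);
}
```

Elapsed seconds then fall out as (end - start) / frequency, where start and end are converted counter readings and frequency is the converted QueryPerformanceFrequency() result.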
I've read several issues of CUJ, and I must say that I appreciate the emphasis that you all place on efficiency and robustness of code, something that seems to be deprecated these days.
Joshua Madden
Software Engineer
Imagenation Corporation
jmadden@imagenation.com
"Obscurium per Obscurius"

Thanks for your help. pjp