April 1995/An Error Manager With Message Text Replacement

User Interfaces

An Error Manager With Message Text Replacement

David Chapman

David Chapman is a Software Consultant specializing in VLSI CAD tools, compilers, and language design. He has a BSCS from California Polytechnic State University, San Luis Obispo, and an MSEE from Stanford University. He is currently working on a research project in VLSI layout synthesis. He may be reached at dchapman@netcom.com.

Introduction
Library developers often face a difficult problem: the need to report errors without knowing what kind of environment the library routines will run in. For example, VLSI Computer-Aided Design (CAD) software often runs under a graphical user interface (GUI) based on the X Window System. My code usually does straight number crunching with little user interaction. However, VLSI CAD software sells worldwide, wherever there is a local semiconductor industry. I want to be able to translate my software to the language used by my customers even though I don't speak any languages other than English (so far). Thus my code needs a simple way to translate the error messages it prints.
The more commonly-used techniques for making programs configurable don't support this need very well. Macintosh and Windows programmers are familiar with resource files, which contain the printable text and some of the graphics used by a program. Resource editors allow users to change the text associated with error messages and prompts. Resource files exemplify a "heavyweight" system, requiring a fair amount of source code and auxiliary files. Such a system requires editing and checking in of both the source code and the message file. On a project with only one message file being maintained by multiple programmers, updates may be lost unless extreme care is taken.
I've maintained code based on other message systems that export the text, but I've found them difficult to manage. Here's a sample message from one of those systems:

raiseWarning(runOutShape,baseName,cellName);
Since the original author didn't comment this particular message, I had to go look up the text so I could understand the code that used it.
Finally, in a CAD environment, the same code may run within a GUI or as a non-graphical program, selected at run-time, so even conditional compilation isn't enough. In such an environment, it's a good idea to pass all messages through a single centralized location. This makes it easier to enforce message presentation standards as well as to redirect error messages: I don't want to print to stderr while a GUI is running!

Designing an Error Manager
My CAD software is destined for a UNIX environment, so Windows and Macintosh resource tools won't help. Also, I'm working alone, so I need a system that's easy to develop and use. My design goals are as follows:
1. easy replacement of message text
2. multiple replacement dictionaries (for partial replacement or subsystems)
3. automatic error checking of replacement strings
4. 100% reliability and functionality even in the absence of replacement dictionaries
5. easy re-use of low-level library code that prints error messages
6. code readability during maintenance

General Approach
I route all error and warning messages through a single error manager. I don't try to handle dialogues here, because graphical and text-based interfaces often differ substantially, and my software isn't very interactive. However, I can replace prompts as well.
My system stores the default messages in the source code, so they can document some of its intent (after all, the user must be able to understand the message). Linking the program incorporates the messages into the program directly, so there are no resource files to get lost. (However, performing a translation requires a message dictionary, which is a separate file containing replacement text.)
Keyed error messages in the source code begin with a '$' followed by an alphanumeric message name and a colon. The error manager strips off these keys before printing the message (See Listing 1 for an example of a keyed message). If the error manager finds a message corresponding to the key name in the message dictionary, and the dictionary's parameter order is compatible with the original, then the error manager uses the replacement message from the dictionary. If it finds any errors in the replacement text, the manager prints a warning and uses the original text from the source code.
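As a minimal sketch of the key convention just described (this is not the code from Listing 1), the following splits a keyed message into its key and its printable body. The names keyed_message and split_key are hypothetical. Allowing '_' inside a key is an assumption, based on the "$illegal_char:" example that appears later in this article.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper type: the key (empty if none) and the printable body.
struct keyed_message {
    std::string key;
    std::string body;
};

// Split "$name:text" into {"name", "text"}. A message that does not match
// the pattern is returned unchanged with an empty key.
// Assumption: '_' is allowed in a key along with letters and digits.
keyed_message split_key(const std::string &fmt)
{
    if (fmt.empty() || fmt[0] != '$')
        return {"", fmt};
    std::size_t i = 1;
    while (i < fmt.size() &&
           (std::isalnum(static_cast<unsigned char>(fmt[i])) || fmt[i] == '_'))
        ++i;
    if (i == 1 || i >= fmt.size() || fmt[i] != ':')
        return {"", fmt};               // not a well-formed key
    return {fmt.substr(1, i - 1), fmt.substr(i + 1)};
}
```

Returning the original string when the key is malformed keeps the default text printable in every case, which matches the reliability goal stated earlier.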
I based the system on fprintf because it's very difficult to modify streams-based I/O to fit this approach. Every string constant in an output statement would require its own key, and formatting could not be changed. If you despise fprintf and switched to streams as soon as you learned C++, you're probably better off with a resource editor.

Message Dictionaries
Message or complaint dictionaries store replacement text for the keyed error messages. If a message has a keyed name, the dictionary manager retrieves the replacement text and the error manager compares it with the program text. The error manager then walks down both message bodies comparing the format specifications. If the two disagree, the error manager assumes the program text is correct and prints a warning message. The error manager also lists the key name so that the user can fix the dictionary text.
This process demonstrates the advantage of storing the original message in the source code: even if the dictionary files are damaged or deleted, the program can still print the original message text. The user might complain about messages suddenly appearing in English, but I consider this better than crashing with no message whatsoever.
Some fprintf format specifiers are equivalent, or at least read same-size objects from the argument list. The function format_specifier in errmgr.cpp returns a "canonical" format character for each type. For example, all floating-point arguments are passed to fprintf as double, so format_specifier maps the 'e', 'f', and 'g' types to the same letter. The user can change the way values are printed (for example, more digits of precision), but the type ordering is fixed in the program.
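The comparison of format specifications can be sketched roughly as follows. This is an illustration under stated assumptions, not the format_specifier routine from errmgr.cpp: the names canonical, spec_signature, and compatible are hypothetical, and the grouping of conversion characters is my own guess at a reasonable table. Each format string is reduced to the ordered list of argument types it consumes, and two messages are compatible when those lists match.

```cpp
#include <string>

// Map a conversion character to a canonical letter for its argument type.
// Assumption: this grouping is illustrative, not the table in errmgr.cpp.
char canonical(char c)
{
    switch (c) {
    case 'e': case 'E': case 'f': case 'g': case 'G': return 'f'; // double
    case 'd': case 'i': case 'c':                     return 'd'; // int
    case 'u': case 'o': case 'x': case 'X':           return 'u'; // unsigned
    default:                                          return c;   // s, p, ...
    }
}

// Reduce a format string to the ordered list of argument types it consumes,
// e.g. "%5.2f of %s" -> "fs". Length modifiers are kept ("%ld" -> "ld").
std::string spec_signature(const std::string &fmt)
{
    const std::string flags_width_precision("-+ #0123456789.*");
    std::string sig;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%') continue;
        ++i;
        if (i < fmt.size() && fmt[i] == '%') continue;   // literal "%%"
        // Skip flags, width, and precision; '*' consumes an int argument.
        while (i < fmt.size() &&
               flags_width_precision.find(fmt[i]) != std::string::npos) {
            if (fmt[i] == '*') sig += 'd';
            ++i;
        }
        while (i < fmt.size() && (fmt[i] == 'l' || fmt[i] == 'h' || fmt[i] == 'L'))
            sig += fmt[i++];
        if (i < fmt.size()) sig += canonical(fmt[i]);
    }
    return sig;
}

// A replacement is usable only if it consumes the same types in the same order.
bool compatible(const std::string &program_text, const std::string &replacement)
{
    return spec_signature(program_text) == spec_signature(replacement);
}
```

With this sketch, a translator may change "%f" to "%.3e" freely, but swapping the positions of a "%s" and a "%f" makes the signatures differ and the replacement is rejected.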
Each logical line in a message dictionary contains one message. Newlines are quoted with '\' and comment lines begin with '#'. The dictionary format is the same as the message format except that the leading '$' is omitted. (This is not program text, so you don't need to quote special characters.)
Note that an implicit newline resides at the end of each message, even if the program text does not contain one. If you use this system to retrieve prompt text using err_mgr.message, you'll want to strip off this newline or else the user's input will be on the next line.
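A rough sketch of reading such a dictionary follows. It is not the complaint_dict parser from the listings, and it keeps the text in memory, which the real dictionary manager does not. The name read_dictionary is hypothetical, and I am assuming that a '\' at the end of a physical line is what quotes a newline, that '#' must appear in the first column, and that lines without a key are silently skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read "key:text" entries from a dictionary stream into a map.
// Assumptions: a trailing '\' joins the next physical line with an embedded
// newline, '#' in column one marks a comment, and malformed lines are skipped.
std::map<std::string, std::string> read_dictionary(std::istream &in)
{
    std::map<std::string, std::string> dict;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#')
            continue;
        std::string logical = line;
        while (!logical.empty() && logical.back() == '\\') {
            logical.back() = '\n';              // quoted newline
            if (!std::getline(in, line)) break;
            logical += line;
        }
        std::size_t colon = logical.find(':');
        if (colon == std::string::npos || colon == 0)
            continue;                           // no key: ignore the line
        // Append the implicit newline that ends every dictionary message.
        dict[logical.substr(0, colon)] = logical.substr(colon + 1) + "\n";
    }
    return dict;
}
```

The function takes any std::istream, so it can be tried on a std::istringstream holding a few sample lines before pointing it at a real file.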

Message Handlers
The error manager handles four message types:
1) Fatal — the program prints the message and exits.
2) Serious — the program asks for permission to continue.
3) Warning — the message is printed and the program continues.
4) Posting — the message is simply printed.
The error manager routes all messages through a handler, which can be replaced at run time. For example, the startup code of a GUI would install a graphical error reporter (printing to a pop-up box, for example) and the termination code would restore the previous handler. All error handlers are based on the failure_handler class (see Listing 2).
The default failure_handler prints to stderr except for the post routine, which prints to stdout. The routines themselves are fairly simple (see Listing 3). The other routines in failure_handler appear on this month's source code disk.
The class error_mgr defines the error manager. Application code calls the member functions of this class's one-and-only instance, err_mgr. err_mgr then directs the message to the handler currently defined (see Listing 4).
Handlers stack; define_handler returns the previously defined handler to allow a sort of run-time inheritance. For example, an instance of the class counting_failure_handler, when created, will install itself in the handler stack and then tally each serious error or warning before passing the message to the previous handler. The counting_failure_handler will remove itself from the handler stack when it is destroyed. Listing 5 shows how counting_failure_handler installs and removes itself from the stack.
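The install-and-remove pattern can be sketched as follows. This is a simplified illustration, not Listing 5: the class and function names come from the article, but the one-function interface, the current_handler variable, and the member names are assumptions made to keep the example short.

```cpp
#include <cstdio>

// Simplified stand-in for the article's failure_handler (interface assumed).
class failure_handler {
public:
    virtual ~failure_handler() {}
    virtual void warn(const char *text) { std::fputs(text, stderr); }
};

static failure_handler default_handler;
static failure_handler *current_handler = &default_handler;

// Install a handler and return the one it replaces.
failure_handler *define_handler(failure_handler *h)
{
    failure_handler *previous = current_handler;
    current_handler = h;
    return previous;
}

// Counts warnings, then forwards them to whichever handler it displaced.
class counting_failure_handler : public failure_handler {
public:
    counting_failure_handler() : count_(0), previous_(define_handler(this)) {}
    ~counting_failure_handler() { define_handler(previous_); }
    void warn(const char *text) { ++count_; previous_->warn(text); }
    int count() const { return count_; }
private:
    int count_;
    failure_handler *previous_;
};
```

Because the destructor restores whatever handler was current at construction, handlers built this way must be destroyed in the reverse order of their creation; a local object in a block scope gets that ordering for free.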
The error_mgr class uses counting_failure_handler when reading dictionaries. The complaint_dict constructor (Listing 9) parses the dictionary file, reporting errors through the warn function. Class complaint_dict is shown in Listing 8. If complaint_dict reports any problems the error manager knows the dictionary is invalid. This technique gets around the "constructor returns no value" problem. Of course, if another failure_handler is stacked on top of a counting_failure_handler, errors won't be tallied unless that handler also passes its messages downward. Only the topmost handler receives messages. Listing 7 shows class ptr_stack, which manages the handler stack.
A program can use multiple error dictionaries. define_dictionary parses the named dictionary and returns 1 if it finds no errors. If define_dictionary finds errors the error manager ignores that dictionary. I pass in a filename rather than an open file because file descriptors are a precious commodity. To replace a message, the dictionary manager reopens the file and rereads the message using the stored file offset. If a message appears in more than one dictionary the last one defined takes precedence. Listing 6 shows function define_dictionary, as well as other member functions of class error_mgr.
There are two error_mgr routines for each type of error. The second, similar to vfprintf, accepts a va_list argument. I found out the hard way that it's not a good idea to overload these names; if va_list resolves to (char *) (e.g. in Borland C++), then the call
err_mgr.warn("Can't open %s\n", filename);
may be routed to the va_list version of the function instead! (ARM section 13.2 says "sequences that involve matches with the ellipsis are worse than all others.")
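One way to avoid the problem is to give the va_list variant its own name, as the C library does with fprintf and vfprintf. The sketch below assumes hypothetical names report_warning and vreport_warning (the article does not give the real ones) and formats into a fixed buffer rather than calling a handler, purely so the result is easy to inspect.

```cpp
#include <cstdarg>
#include <cstdio>

static char last_message[512];   // demonstration buffer; fixed length

// va_list form, analogous to vfprintf. Its distinct name means overload
// resolution can never pick it for a (const char *, char *) call.
int vreport_warning(const char *fmt, va_list args)
{
    return std::vsnprintf(last_message, sizeof last_message, fmt, args);
}

// Variadic form: packages its arguments and forwards to the va_list form.
int report_warning(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vreport_warning(fmt, args);
    va_end(args);
    return n;
}
```

A mid-level utility that accepts its own variable argument list can then forward to vreport_warning without any ambiguity about which routine is called.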
The message function searches for the replacement text for the fmt argument and, if found, stores it into the buffer passed along. message then returns a pointer to the start of this buffer, or to a position within the fmt argument if no replacement is found, so that the result can be used within an fprintf call (e.g. a prompt). In this demonstration version, the buffer length is passed in; for robustness, all routines of this sort should instead use a buffer object that grows when text is added.
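The return convention can be illustrated with a stripped-down stand-in. This sketch is not err_mgr.message: it is a free function, it looks up text in a hypothetical in-memory map named replacements instead of a dictionary file, and it omits the format-specifier check entirely. It only shows how the caller gets back either the buffer or a pointer into the original string.

```cpp
#include <cstring>
#include <map>
#include <string>

static std::map<std::string, std::string> replacements;  // hypothetical dictionary

// Return the text to print for fmt. If fmt is keyed ("$key:text") and a
// replacement that fits exists, copy it into buffer and return buffer;
// otherwise return a pointer to the default text inside fmt itself.
const char *message(const char *fmt, char *buffer, std::size_t buflen)
{
    if (fmt[0] != '$')
        return fmt;                           // unkeyed: nothing to look up
    const char *colon = std::strchr(fmt, ':');
    if (colon == 0)
        return fmt;                           // malformed key: print as-is
    std::string key(fmt + 1, colon);
    std::map<std::string, std::string>::const_iterator it = replacements.find(key);
    if (it == replacements.end() || it->second.size() >= buflen)
        return colon + 1;                     // default text, key stripped
    std::memcpy(buffer, it->second.c_str(), it->second.size() + 1);
    return buffer;
}
```

Falling back to the default text when the replacement does not fit is one possible policy for a fixed-length buffer; a growing buffer object would remove the need for it.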

Debugging Features
Last but not least this error manager provides an assertion capability. Assertions are always compiled into the code; the macro NDEBUG controls only the default value of the assertions_off function. The ASSERT macro is similar to assert except that it first checks err_mgr.assertions_off. An assertion failure is a serious but not necessarily fatal error in this system.
err_mgr.set_assert_flag(0) sets err_mgr.assertions_off, short-circuiting all assertion evaluations. Since assertions_off is an inline function returning the value of a static member variable, checking it doesn't slow down assertion evaluation. In the worst case assertions can slow execution by up to 20%, so users might not want assertions on unless the program is prone to crashes.
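A minimal sketch of such a run-time switchable assertion follows. The names ASSERT, assertions_off, and set_assert_flag come from the article; the class name error_mgr_sketch, the assert_failed and failures members, and the message wording are assumptions. A real version would report the failure as a serious error through the current handler rather than printing directly.

```cpp
#include <cstdio>

class error_mgr_sketch {
public:
    static int assertions_off() { return assert_off_; }       // inline, cheap
    static void set_assert_flag(int on) { assert_off_ = !on; }
    // Assumed reporting hook; counts failures so they can be inspected.
    static void assert_failed(const char *expr, const char *file, int line)
    {
        ++failures_;
        std::fprintf(stderr, "Assertion failed: %s, file %s, line %d\n",
                     expr, file, line);
    }
    static int failures() { return failures_; }
private:
    static int assert_off_;
    static int failures_;
};

// NDEBUG selects only the default; the check itself is always compiled in.
#ifdef NDEBUG
int error_mgr_sketch::assert_off_ = 1;
#else
int error_mgr_sketch::assert_off_ = 0;
#endif
int error_mgr_sketch::failures_ = 0;

// The expression is not evaluated at all while assertions are off.
#define ASSERT(expr) \
    ((error_mgr_sketch::assertions_off() || (expr)) ? (void)0 \
        : error_mgr_sketch::assert_failed(#expr, __FILE__, __LINE__))
```

Because the flag test comes first in the || expression, turning assertions off skips the cost of the asserted expression and leaves only one inline load and branch per ASSERT.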
Note that the private variables are all static integers or pointers. In errmgr.cpp these variables are all initialized to zero, because the ARM states that initialization to zero is performed first. Global constructors are called in an implementation-specific order, so if an error occurs in a constructor that executes prior to the err_mgr constructor, err_mgr must set itself up. Thus every routine in errmgr.cpp, including the constructor, calls setup first, directly or indirectly.
setup allocates all of the non-simple variables off the heap to ensure they are properly initialized as well.
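The set-up-on-first-use idea can be sketched as follows. The class lazy_mgr, its note and noted members, and the early_user type are hypothetical stand-ins; only the pattern (zero-initialized static flag and pointers, an idempotent setup called from every routine, heap allocation of anything non-trivial) is taken from the description above.

```cpp
#include <string>
#include <vector>

class lazy_mgr {
public:
    lazy_mgr() { setup(); }
    static void note(const char *text) { setup(); log_->push_back(text); }
    static std::size_t noted() { setup(); return log_->size(); }
private:
    // Idempotent: safe to call from any routine, before or after the
    // constructor of the global instance has run.
    static void setup()
    {
        if (ready_) return;
        log_ = new std::vector<std::string>;  // heap: no constructor-order dependency
        ready_ = 1;
    }
    static int ready_;                        // zero before any constructor runs
    static std::vector<std::string> *log_;
};

int lazy_mgr::ready_ = 0;
std::vector<std::string> *lazy_mgr::log_ = 0;

// A global whose constructor uses the manager before the manager's own
// instance has been constructed.
struct early_user {
    early_user() { lazy_mgr::note("constructed before the manager"); }
};
static early_user early;   // deliberately defined ahead of the manager instance
static lazy_mgr mgr;
```

Had log_ been a std::vector object rather than a pointer, its constructor might not yet have run when early_user called note, which is exactly the failure the pointer-plus-setup arrangement avoids.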

Automated Message Extraction
Once you've completed your application, you'll want to build the message dictionaries automatically. This month's code disk contains source code for a scanner written for flex, the fast replacement for the UNIX lex program. The scanner searches for string constants (merging adjacent string constants, of course) that appear to be keyed error messages.
I didn't try to write a C++ parser, or even to locate all err_mgr calls, because some of my mid-level utility routines implement their own error logging systems on top of err_mgr. For example, a parser will often print the language context in addition to the error message. Thus I'd have to analyze the program to determine that calls such as

lexer->complain("$illegal_char:" "Illegal character. \n");
were in fact calls to err_mgr. At worst the scanner will find a few extra "messages." Note that the scanner won't find keyed messages constructed at run time; they would be difficult for users to translate anyway.
The scanner in its current form requires flex because it uses some features that only flex provides. [A public-domain version of flex that can be compiled under both MS-DOS and UNIX is available from the C User's Group Library. See ordering information at the end of this article. —mb] I've included the output C file as well so that you can at least compile the program. The code that actually generates the dictionaries is in findmsgs.c, so if you want to change the format of the messages or dictionaries you won't need to edit the scanner itself.

Message Database Management
With every release of your software you will probably need to rebuild the message dictionaries, because you will most likely add new messages and modify old messages. If your customers (or a group within your company) have translated all of the messages into another language, you'll need to help them with the update. I recommend building a new set of dictionaries for the release, then comparing them message-by-message with the previous release. This process will result in a list of messages added or changed by the programmer. Customers must then write or edit translations for these messages.
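The message-by-message comparison might look like the sketch below, assuming both releases' dictionaries have already been loaded into key-to-text maps. The dictionary typedef and the added_or_changed function are hypothetical names; no such tool is described as being on the code disk.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> dictionary;   // key -> message text

// Return the keys that are new in current or whose text differs from
// previous. Keys that exist only in previous (deleted messages) are not
// reported here.
std::vector<std::string> added_or_changed(const dictionary &previous,
                                          const dictionary &current)
{
    std::vector<std::string> keys;
    for (dictionary::const_iterator it = current.begin();
         it != current.end(); ++it) {
        dictionary::const_iterator old = previous.find(it->first);
        if (old == previous.end() || old->second != it->second)
            keys.push_back(it->first);
    }
    return keys;   // in key order, because std::map iterates sorted
}
```

The resulting key list is what a translator would work from; everything not on the list can be carried over from the previous release unchanged.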
If your documentation or support groups are modifying message text (for example, to ensure consistent wording), then you'll need to compare the source code messages in the current release with the edited dictionaries. This step will give you a list of messages that the support groups have modified, to merge with the programmers' modifications.

Limitations
Naturally, any system based on fprintf is subject to parameter typing problems. Without a full C++ parser, plus program knowledge to determine which calls will be routed to the error manager, you can't guarantee against printing occasional garbage. Streams have an obvious advantage here, but again text replacement is a problem with them.
Currently, you can't edit messages in a way that rearranges the order of the parameters. At most you can rearrange the text around the parameters. You would have to define a whole new fprintf-style format and write routines to extract the parameters from the argument list in their original order, then shuffle them to meet the new requirements. It's not a simple task, so I didn't even try.
My CAD software is rather large, straining the limits of MS-DOS, so I don't keep the replacement text in memory. You could modify the dictionary manager very easily to keep the message text around so that it wouldn't have to re-open the file every time a message was requested.

Limitations in This Version
My production error manager uses a managed buffer that grows in length automatically as text is added to it. This demonstration implementation uses fixed-length (usually 512 characters) buffers instead. For safety you should use some kind of expanding buffer. I've marked the source code to show where the fixed-length buffers are used. Look for variables named linelen and the associated comments.
For speed, keyword lookup should use a hash table. I've used a singly-linked list here for simplicity.
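As one possible shape for that change, here is a sketch of a key index built on the standard library's hash map, which did not exist when this code was written. The class key_index and its members are hypothetical; it maps each message key to the file offset at which the message text could be reread.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical index: message key -> byte offset of its text in the
// dictionary file.
class key_index {
public:
    // A later add for the same key overwrites the earlier offset.
    void add(const std::string &key, long offset) { offsets_[key] = offset; }

    // Returns the stored offset, or -1 if the key is unknown.
    long find(const std::string &key) const
    {
        std::unordered_map<std::string, long>::const_iterator it =
            offsets_.find(key);
        return it == offsets_.end() ? -1L : it->second;
    }
private:
    std::unordered_map<std::string, long> offsets_;
};
```

Lookup cost then stays roughly constant as the number of messages grows, instead of growing with the length of the linked list.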
This month's code disk contains the following:

fully commented source for all modules and the C++ scanner

test programs for error_mgr, complaint_dict, ASSERT

makefiles for Zortech C++ 3.1, Symantec C++ 6.00, Borland C++ 4.00, and UNIX (the latter untested)

findmsgs.exe (the scanner) compiled using Zortech C++
All code described in this article is placed in the public domain. You may use it as you wish.

Acknowledgements
The flex scanner is based on one that Tony Sanders (once upon a time at cs.utexas.edu!ibmaus!auschs!sanders.austin.ibm.com!sanders) wrote in 1990. I've extended it to support floating-point numbers and quoted characters within string constants.

Obtaining a Version of Flex
Daniel R. Haney (MA) has ported flex to MS-DOS. Flex is available in CUJ Library volume 290. The disk contains a complete set of source code, documentation, a makefile, and a word count program as an example. Haney's implementation of flex can be compiled under MS-DOS and UNIX. An OS/2 executable is included.
Also check out Flex++, CUJ volume 405. For more information, contact:
R&D Publications
1601 W 23rd, Suite 200
Lawrence, KS 66046
(913)-841-1631. FAX: (913)-841-2624
e-mail: michelle@rdpub.com.