Testing with Paired Messages

John Torjo

A trace file can help debug a program, but only if you can decipher the trace file. This technique will help.

In complex applications, one of the reasons for logging is to track bugs. However, as logs grow larger, the task of finding errors becomes increasingly difficult. You'll have a better chance of discovering the problem if your log isn't clogged with miscellaneous unrelated messages. However, sometimes an innocuous message contains a clue. This article resolves these seemingly contradictory insights by introducing the concept of paired messages.

In complex applications, you end up logging a lot of messages in pairs: "Start receiving data for client jt2001," "End receiving data for client jt2001," etc. But most likely, the pairs are far apart. Between the two messages in a pair, you will find other messages, stating what happened between the start of an action and the end of the same action. A paired message is composed of two messages: one that describes the start of an action, and one that describes the end of the same action. The first message (the one describing the start of the action) must occur first. The second message is written later when the action ends.

If the second message in the pair is written to the log, the action completes successfully. The strategy described in this article eliminates completed pairs from the log so that the log contains only:

Un-paired messages. An un-paired message is an unfinished paired message. That is, the first message occurred but the second message has not yet occurred.
Non-paired messages. A non-paired message is a message that does not belong to a pair. For instance, it's an informational message, like "Found the word 'html' in file 'C:\test.html'."

In case of an error, you can consult the log -- the Isolated Messages Log -- and immediately see which actions were currently unfinished at the time the error occurred. Once you have determined the possible cause of the error, you can consult the ALL Messages Log to learn more about the context in which the error occurred. The ALL Messages Log contains all messages: paired, un-paired, and non-paired.

This article describes how you can hunt for errors in your programs using a Isolated Messages Log. I have implemented the Isolated Messages Log in Windows, using Visual C++ v6. However, it should be easily ported to other platforms as well.

Figure 1 shows an example of Isolated Messages Log and Figure 2 shows a small part of an ALL Messages Log.

Message Writing

The logging method described in this article requires two steps:

1. Write all messages to both the ALL Messages Log and the Isolated Messages Log.

2. Periodically refresh the Isolated Messages Log to remove completed pairs.

You can refresh the Isolated Messages Log either manually or automatically (see the later section titled "Refreshing").

The first step is to write messages to the logs. The CPairedMessagesLog class (available for download at <www.cuj.com/code>) writes each message to the Isolated Messages Log and the ALL Messages Log.

Say you have a reference to a CPairedMessagesLog object. How do you write to it?

In order to uniquely identify a paired message, you assign it a unique message ID. Non-paired messages don't need an ID.

When writing a paired message, you can do it in two ways:

1. Automatically -- a Message ID is automatically assigned:

CPairedMessagesLog<>::paired_log
  MessageLog = log.next_paired();
MessageLog
  << "Start receiving from [" 
  << nSatteliteNo
  << "] satellite" << std::endl;
// ... user code
MessageLog
  << "End receiving from [" 
  << nSatteliteNo
  << "] satellite" << std::endl;

2. Manually -- you get a message ID or provide your own unique message ID:

// get a Message ID, OR provide your own
CPairedMessagesLog<>::message_id_type
  MessageID = log.nextmessageid();
  
log.paired( MessageID)
  << "Start receiving from [" 
  << nSatteliteNo
  << "] satellite" << std::endl;
// ... user code
log.paired( MessageID)
  << "End receiving data from [" 
  << nSatteliteNo
  << "] satellite" << std::endl;

Writing a non-paired message is done like this:

log.not_paired()
  << "Satellite [" << nSatelliteNo
  << "] has been checked: no malfunctions."
  << std::endl;

As you can see, writing the message is done in an STL-stream manner. You use operator << to write any type of data. Terminating a message is done with std::endl. Actually, it must end with std::endl (see the sidebar, "std::endl Usage").

As a matter of fact, the functions paired, non_paired, and next_paired all return STL stream objects. That is, you can pass them to functions that accept std::basic_stream< char_type> arguments or even use manipulators to write to them.

You can also indent messages (see the sidebar, "Indenting Messages").

Construction

Constructing a CPairedMessagesLog can be done in two ways:

1) CPairedMessagesLog( strLogFileName, strAllMessagesLogFileName). This definition is almost self-explanatory, with one more detail: the refresh type is "Auto Refresh at 10 seconds" (see "Refreshing" below).

2) CPairedMessagesLog( CPairedMessagesLog::Settings settings). This further refines the settings of the CPairedMessagesLog. Listing 1 shows the definition of the Settings class (also, see "Refreshing" below).

Notes:

If strAllMessagesLogFileName is empty, there won't be an ALL Messages Log. However, I don't recommend using this.
strLogFileName must contain the full path to the file. This is because the file will be opened and closed multiple times. Therefore, even if by chance the current directory changes, strLogFileName should point to the same file on disk!

Refreshing

The ALL Messages Log does not need refreshing. You append each message to its end. But the Isolated Messages Log needs to be refreshed. Why? Once a full paired message is written (i.e., the end of an action has been written to the log), you must delete this paired message from the log.

The following types of refreshing are allowed:

Every time: this will refresh the log each time any message (paired message or non-paired message) is written to it. Note that you won't rewrite the whole log unless necessary. For instance, when a new non-paired message is written, you just append it to the log; you don't need to rewrite the log!

Advantage: at any given time, you will have an up-to-date log.

Disadvantage: this type of refreshing might be inefficient when large amounts of data get written to the log very fast.

Auto refresh at X seconds: this will refresh, on a dedicated thread, the log automatically at X seconds.

Advantage: this is very efficient.

Disadvantage: in case the application crashes, you might lose some data.

Also, you can manually refresh at any time by calling the Refresh function. You can even implement a mechanism to Refresh the Isolated Messages Log just before your application crashes.

The default refresh is auto refresh at 10 seconds.

Implementation Details

The most important functions from CPairedMessagesLog class are:

WritePairedMessage, which writes a part of a paired message (i.e., the start or the end of an action)
WriteNonPairedMessage, which writes a non-paired message

I have created an abstract class, CMessageHandlerStream, which is STL stream-derived. When the CMessageHandlerStream is flushed (e.g., when std::endl is written to it -- see the sidebar, "std::endl Usage"), it calls a virtual function, OnNewMessage( strMessage), where strMessage is the message written so far, up to the flush. Internally, in CPairedMessagesLog, I have created two classes: COnePairedMessageLog and CNonPairedLog. Both are derived from CMessageHandlerStream. COnePairedMessageLog's OnNewMessage calls WritePairedMessage to write a part of a paired message. CNonPairedLog's OnNewMessage calls WriteNonPairedMessage to write a non-paired message.

As I said before, the paired, non_paired, and next_paired functions all return STL stream objects. As a matter of fact, paired and next_paired return a COnePairedMessageLog object, and non_paired returns a CNonPairedLog object.

So, when you write:

log.next_paired();
  << "Start receiving from [" << nSatteliteNo
  << "] satellite" << std::endl;

when std::endl is written, the COnePairedMessageLog stream is flushed, which in turn calls OnNewMessage, which calls WritePairedMessage for this message.

And when you write:

log.not_paired()
  << "Satellite [" << nSatelliteNo
  << "] has been checked: no malfunctions." << std::endl;

when std::endl is written, the CNonPairedLog stream is flushed, which in turn calls OnNewMessage, which calls WriteNonPairedMessage for this message.

Internally, when the Isolated Messages Log is rewritten (refreshed), you must make sure you preserve the order of messages (un-paired and non-paired). For the un-paired messages, this is a simple task since each message has an ID. However, it's not so easy with the non-paired messages. Somehow, you must know where each non-paired message is to be written, related to the other un-paired messages. In order to accomplish this, I have made each non-paired message belong to a paired message. For example, the "satellite [5] has been checked" non-paired message will always be written after the paired message with the ID of 6. When refreshing, it will write all un-paired messages with IDs less or equal than 6 and then write the "satellite [5] has been checked" non-paired message, and so on. Having each non-paired message be written after a certain paired message is enough to preserve the order of messages while refreshing.

Note that I'm not saying anything about the full paired messages. This is because once a full paired message has been written to the log (i.e., the start and end of an action), the action will have completed successfully. So, it will be removed from the Isolated Messages Log. Therefore, you don't need to worry about full paired messages.

Testing the Test

I have developed a small test for the Isolated Messages Log (available for download at <www.cuj.com/code>).

I have a class, do_for_each_file_t, which does a certain action for all files in a given folder plus its subfolders. In order to improve its performance, I have allowed executing actions for different files in parallel (assuming that each action operates only on the file to which it is assigned).

On the main thread, I have created two threads. On the first thread, I examine all files in a given folder, checking if a word is found. After a while (five seconds), the expiry thread is started, which starts deleting files. Soon after, the application might crash. This is because it will most likely try to examine an already deleted file.

This simulates a very simple proxy with a cache mechanism. It allows a search function (search through cached files) and expires unaccessed files once in a while. Of course, it has a bug. Since searching is done with multithreading, it should check whether a file exists before trying to read from it!

Checking the Isolated Messages Log and ALL Messages Log!

In Figure 1, line 3, "Expiring files started" gives you an indication that the bug is related somehow to the expiration of files. Lines 11, 12, and 14 show that one of the files 1[1].swf, 24909;type=t;uniq=58381[1].htm, or 31[1].gif caused the crash.

By examining Figure 2 (which contains the last lines of an ALL Messages Log), you can see the context of the crash, which shows that most of the files have been processed correctly. The last line shows that the file that most likely caused the crash is 31[1].gif. This should give you an indication that the bug appeared while expiration and searching were executed in parallel. This will lead you, when searching, to first check if the file exists before reading from it.

Notes:

When running this test, compile in release mode -- otherwise the application might not crash at all! Make sure you run the test with at least 5,000 files, and the files should usually be larger than 10K. I suggest you copy and paste some of your "Temporary Internet Files" (from the IE cache).
If you decide to run the test, make sure you don't mistakenly erase important data!

Possible Improvements

A few improvements can be made to CPairedMessagesLog. For instance, you can argue that over time you might end up keeping a lot of messages (both un-paired messages and non-paired messages) internally, and that you should have some kind of expiring mechanism. Well, you are right. But, I chose not to implement this, since the CPairedMessages log is mainly for testing. For this reason, there won't be too many messages internally, since there will be very few un-paired messages at a given time. Note that having too many un-paired messages at a given time only means that there are a lot of bugs in the tested code.

Implementing an expiring mechanism is not an easy task. You can do it by having a maximum number of un-paired messages kept internally (say, 1,000 un-paired messages). If you reach this limit, you should remove half of the un-paired messages and log this information somewhere. You can choose to remove the oldest messages, but you have to keep an extra property for each paired message: the time when the start of the action has been written. You also should do the same for non-paired messages.

Another improvement would be to have an additional template parameter for the internal log classes. Instead of using std::basic_ofstream logs, you could use another log class. For instance, instead of writing to a file, you might want to write to a window.

And last but not least, you might want each message to be preceded by the time it's been written. This is useful only for the ALL Messages Log file. To do this, you should modify the WritePairedMessage/WriteNonPairedMessage functions.

I recommend you implement these improvements only if necessary.

Conclusion

The Isolated Messages Log is a valuable tool in the programmer's toolkit. I hope you will enjoy it as much as I have.

About the Author

John Torjo has loved C++ since he first used it. He has experience in large projects, using STL/templates. Currently he's a freelancer and can be reached at john.torjo@peoplefirstsoft.com.