April 2000/Building HTML Documents with C++


Building HTML Documents with C++

Giovanni Bavestrelli

The block structure of C++ can really help you get the block structure of HTML right.

Recently I had to build HTML documents from within a C++ server program. As usual at the beginning of a project, I thought about creating some low-level classes to lay the foundation for higher levels of abstraction. Building HTML documents with C++ streams, such as std::ostrstream and std::ostringstream, is simple and intuitive, but I saw an opportunity to automate the matching of opening and closing HTML tags.

The similarity between opening and closing HTML tags and opening and closing curly braces in C++ struck me immediately, and a simple, obvious solution followed: let C++ constructors emit opening tags, and destructors emit closing tags. This way a C++ programmer could forget the tedious work of matching nested HTML tags and leave it to the C++ compiler. What would otherwise require careful attention to a free stream of characters could now be done by logically organizing scopes with curly braces and function calls.

All C++ programmers are very familiar with scopes and curly braces, constructors and destructors, and arranging them logically is intuitive for us. It is also less error prone: if I forget to close a curly brace, the compiler tells me; if I forget to close an HTML tag, only the Web client finds out, at run time. Common errors such as mismatched and overlapping tags can be effectively avoided. My simple htmltag class lets me write the following "Hello World" HTML document:
<HTML>
<HEAD>
<TITLE>Simple Title</TITLE>
</HEAD>
<BODY>Hello World</BODY>
</HTML>
with code like this:
void HelloWorld(std::ostream & o)
{
   htmltag HTML(o,"<HTML>\r\n");
   {
      htmltag HEAD(o,"<HEAD>\r\n");
      htmltag TITLE(o,
         "<TITLE>Simple Title");
   }
   htmltag BODY(o,"<BODY>");
   o<<"Hello World";
}
The nice thing about this approach is that you can intermix HTML stream building with any other kind of operation. You can call functions that do part of the streaming and, if you let the htmltag class take care of all opening tags that need closing, your HTML building code will have a logical structure that is difficult to break. The htmltag class is extremely simple. It consists of just the following:

a constructor that takes a reference to a std::ostream and an opening tag (with attributes if necessary) to put to the stream

a private member function that, given the opening tag, builds the closing one

a destructor that puts the closing tag onto the stream when the object goes out of scope.
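The three pieces listed above can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the author's Listing 3: the destructor here writes only "</NAME>" with no trailing text, the tag name is assumed to end at the first whitespace or '>', and the constructor throws std::invalid_argument (writing nothing) when it finds no usable tag. The hello_world function is a hypothetical wrapper around the HelloWorld code shown earlier, capturing the output in a string.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Sketch of the htmltag idea: constructor writes the opening string,
// destructor writes the matching closing tag.
class htmltag {
public:
    // Derives the closing tag first (may throw), then writes the whole
    // opening string, including any text before or after the tag.
    htmltag(std::ostream& o, const std::string& opening)
        : out_(o), closing_(closing_tag(opening)) {
        out_ << opening;
    }

    // Writes the closing tag when the object goes out of scope.
    ~htmltag() { out_ << closing_; }

    // Copying would emit the closing tag twice, so it is disabled.
    htmltag(const htmltag&) = delete;
    htmltag& operator=(const htmltag&) = delete;

private:
    // Builds "</NAME>" from the first tag in the string,
    // e.g. "<TABLE BORDER=1>\r\n" gives "</TABLE>".
    static std::string closing_tag(const std::string& opening) {
        const std::string::size_type lt = opening.find('<');
        if (lt == std::string::npos)
            throw std::invalid_argument("no opening tag in: " + opening);
        const std::string::size_type end =
            opening.find_first_of(" \t\r\n>", lt + 1);
        if (end == std::string::npos || end == lt + 1)
            throw std::invalid_argument("malformed opening tag: " + opening);
        return "</" + opening.substr(lt + 1, end - lt - 1) + ">";
    }

    std::ostream& out_;
    std::string closing_;
};

// The HelloWorld example, captured in a string instead of a live stream.
std::string hello_world() {
    std::ostringstream o;
    {
        htmltag HTML(o, "<HTML>\r\n");
        {
            htmltag HEAD(o, "<HEAD>\r\n");
            htmltag TITLE(o, "<TITLE>Simple Title");
        }   // TITLE, then HEAD, write their closing tags here
        htmltag BODY(o, "<BODY>");
        o << "Hello World";
    }       // BODY, then HTML, write their closing tags here
    return o.str();
}
```

Because destructors run in reverse order of construction, the inner TITLE tag is always closed before the enclosing HEAD tag, which is exactly the nesting HTML requires.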

As I started using these classes, I found that adding scopes (with curly braces) was often a bit tedious when the opening and closing tags are very near each other, with little text between them. So I saw another opportunity to let a C++ language feature do some work for me: expressions. I can nest expressions the way I nest HTML tags, one within the other, and again the C++ compiler helps me get right what a free stream of characters would leave to my diligence alone.

So I added a single static function, named str, to the htmltag class. It takes the opening tag (with attributes if necessary) as its first argument, and a string containing the HTML text between the opening and closing tags as its second argument. It returns a string containing the opening tag, followed by that HTML text, followed by the closing tag. The closing tag is derived automatically from the opening one.

Not much help, you might think. But the power of this approach becomes obvious when you build expressions. You can concatenate pieces of HTML text by adding the results of htmltag::str calls. You can also nest one block of HTML text, with its opening and closing tags, correctly inside another by passing the result of one htmltag::str call as the second argument of another. So the HTML document of Figure 1 can be built with the C++ code of Listing 1. Note that htmltag::str does not take a std::ostream parameter, since its string result can be streamed directly to the HTML stream. Note also that this function was not designed for maximum speed, although you probably won't notice the difference.
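To illustrate the expression style, here is a sketch of the str idea written as a free function named html_str (a hypothetical name; in the article it is the static member htmltag::str). It assumes the same simple rule for finding the tag name: the text after the first '<' up to the first whitespace or '>'. The hypothetical small_table function shows both operations described above: nesting, by passing one result as the inner text of another call, and concatenation, by adding results with operator+.

```cpp
#include <stdexcept>
#include <string>

// Returns the opening tag, then the inner HTML, then the closing tag
// derived from the opening one, e.g. html_str("<B>", "bold") is "<B>bold</B>".
std::string html_str(const std::string& opening, const std::string& inner) {
    const std::string::size_type lt = opening.find('<');
    if (lt == std::string::npos)
        throw std::invalid_argument("no opening tag in: " + opening);
    // The tag name ends at the first whitespace or '>' after '<'.
    const std::string::size_type end =
        opening.find_first_of(" \t\r\n>", lt + 1);
    if (end == std::string::npos || end == lt + 1)
        throw std::invalid_argument("malformed opening tag: " + opening);
    return opening + inner + "</" + opening.substr(lt + 1, end - lt - 1) + ">";
}

// A 2x2 table built as a single expression: cells are concatenated into
// rows, rows are concatenated, and the result is nested in the table tag.
std::string small_table() {
    return html_str("<TABLE BORDER=1>\r\n",
                    html_str("<TR>", html_str("<TD>", "1") + html_str("<TD>", "2")) +
                    html_str("<TR>", html_str("<TD>", "3") + html_str("<TD>", "4")));
}
```

Each html_str call can only produce a balanced fragment, so any expression built from such calls is balanced too; the compiler's checking of parentheses takes over the job of matching tags.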

That's all! This htmltag class is very simple, very small, and does a very limited job, but it lets you give your HTML building code a logical structure as far as opening and closing tags are concerned. If you let this class handle all opening tags that need closing (some tags, like <HR> and <BR>, must not be closed and must not be fed to my class as opening tags), you can drastically reduce your chance of making mistakes when building HTML documents. For the rest, it leaves you total freedom in how you stream your HTML.

To use the htmltag class effectively, you should take a look at its implementation. It is not difficult, since it's only 50 lines of code. You can quickly see that whatever precedes and follows the first HTML tag in the opening-tag string passed to htmltag is put to the stream (or string) as is, so you can easily append an end-of-line ("\r\n") or any other text you like. You can also see that you can put any attributes you want in the opening-tag string.

Note that the parsing of the opening tag to build the closing one, and the related error handling, are very simple and would need to be improved for production code.
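As an example of the kind of tightening that production code might add, here is a sketch of a stricter closing-tag builder. The name closing_tag_for and the specific checks are assumptions for illustration, not part of the article's code: it ignores text before the first '<' and after the matching '>', skips attributes, and rejects strings with no tag, an unterminated tag, an empty name, a closing tag, a comment or declaration, or XML self-closing syntax. It still does not detect void elements such as <BR> or <HR>, which remain the caller's responsibility.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: derives "</NAME>" from an opening-tag string.
std::string closing_tag_for(const std::string& opening) {
    const std::string::size_type lt = opening.find('<');
    if (lt == std::string::npos)
        throw std::invalid_argument("no tag found: " + opening);
    const std::string::size_type gt = opening.find('>', lt);
    if (gt == std::string::npos)
        throw std::invalid_argument("unterminated tag: " + opening);

    // The tag name runs from just after '<' to the first whitespace or '>';
    // anything after it (attributes) is ignored.
    std::string::size_type end = lt + 1;
    while (end < gt && !std::isspace(static_cast<unsigned char>(opening[end])))
        ++end;
    const std::string name = opening.substr(lt + 1, end - lt - 1);

    // Reject empty names, closing tags, comments and declarations.
    if (name.empty() || name[0] == '/' || name[0] == '!' || name[0] == '?')
        throw std::invalid_argument("not an opening tag: " + opening);
    // Self-closing XML syntax such as <BR/> needs no closing tag.
    if (opening[gt - 1] == '/')
        throw std::invalid_argument("self-closing tag: " + opening);

    return "</" + name + ">";
}
```

For example, " <P ALIGN=center>\r\n" yields "</P>": the leading space and trailing "\r\n" are left for the caller to stream unchanged, and the attribute is skipped.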

Be careful of one mistake that is easy to make with these classes, shown in the following code:
std::ostrstream o;
htmltag HTML(o,"<HTML>");
// put other tags on the stream
// extract text from stream and send it
The ostrstream and htmltag objects are in the same scope, so they go out of scope together: the htmltag object is destroyed first and the ostrstream right after it, leaving you no chance to access the ostrstream after the htmltag destructor has put the closing </HTML> tag on it. The code above should have been:
std::ostrstream o;
{
   htmltag HTML(o,"<HTML>");
   // put other tags on the stream
}
// extract text from stream and send it
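The pitfall is easy to reproduce with a small test. This sketch uses std::ostringstream instead of the deprecated std::ostrstream, and a deliberately simplified htmltag that assumes the string starts with '<' and the tag name ends at the first space or '>'. The functions wrong_page and right_page are hypothetical. In wrong_page, the return value is computed before the local htmltag is destroyed, so the closing tag never makes it into the result; in right_page, the extra scope forces the destructor to run first.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Simplified htmltag, just enough to show the scoping issue.
class htmltag {
public:
    htmltag(std::ostream& o, const std::string& opening) : out_(o) {
        // Assumes the string starts with '<' and the name ends at ' ' or '>'.
        const std::string::size_type end = opening.find_first_of(" >", 1);
        closing_ = "</" + opening.substr(1, end - 1) + ">";
        out_ << opening;
    }
    ~htmltag() { out_ << closing_; }
    htmltag(const htmltag&) = delete;
    htmltag& operator=(const htmltag&) = delete;
private:
    std::ostream& out_;
    std::string closing_;
};

std::string wrong_page() {
    std::ostringstream o;
    htmltag HTML(o, "<HTML>");
    o << "text";
    return o.str();   // HTML's destructor has not run yet: no </HTML>
}

std::string right_page() {
    std::ostringstream o;
    {
        htmltag HTML(o, "<HTML>");
        o << "text";
    }                 // </HTML> is written here
    return o.str();   // the text is complete
}
```

wrong_page returns "<HTML>text", while right_page returns "<HTML>text</HTML>".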
This solution is so simple and obvious that I am sure many other C++ programmers have come up with something very similar, although I haven't seen it yet. At the beginning I had a more ambitious goal: designing classes HtmlHEAD, HtmlBODY, HtmlTABLE, etc., to encapsulate the use of each HTML tag and add type safety to the way attributes were specified, with some enums and so on. But I soon discovered I was adding more complexity than I was removing. Besides, such an approach would have been less flexible, more intrusive, and less intuitive for HTML experts, and it would have been tied to the HTML version I was using when designing the classes. So I aimed at a more general and simpler solution that works with HTML, DHTML, and even XML, and is not closely tied to the evolution of those languages.

You can download the code from the CUJ website. (See the instructions on page 3.) There's an HtmlTag class that uses the MFC CString, and an htmltag class that uses std::string. (See Listings 2 and 3.) Just pick the one you prefer, and extend it as needed.

Giovanni Bavestrelli lives in Milan and is a software engineer for Techint S.p.A., Castellanza, Italy. He has a degree in Electronic Engineering from the Politecnico di Milano, and writes automation software for Pomini Roll Grinding machines. He has been working in C++ under Windows since 1992, specializing in the development of reusable object-oriented libraries. He can be reached at giovanni.bavestrelli@pomini.it.