Cross-Platform Development


A C++ CGI Framework

Richard B. Lam, Ph.D.

It's handy to have users fill out a form on a web page. Processing that form can be tiresome, however, unless you have some tools to do the dirty work.


The high demand for Web server scripts follows from the popularity explosion of the Internet and World Wide Web (WWW). Most of these scripts are written in Perl using the Common Gateway Interface (CGI) specification. Various Perl libraries are available for authoring custom scripts, but Perl programmers and a Perl interpreter for the web server operating system are required. A few C, Fortran, and even Forth libraries are also available, but they are often not intended for cross-platform use.

This article discusses a C++ framework and helper classes to enable cross-platform authoring of CGI scripts. All source code is written to be portable through the use of the C++ Standard Template Library [1]. Complete source code for the framework is also available.

CGI Specification

The CGI specification [2] describes how a web server interfaces to server-based web programs or scripts. Typically, web scripts are used to process Hypertext Markup Language (HTML) forms [3] and imagemaps. Most web browsers can display HTML forms, and each form can contain a variety of input controls: single and multi-line text entry fields, single and multiple-selection list boxes, checkboxes, radiobuttons, and pushbuttons.

The ability of HTML forms to display controls makes them quite feasible for cross-platform user-interface development, albeit interfaces limited to the kinds of controls supported. While not as flexible as Java or JavaScript, forms-based web programming can be viewed as a type of cross-platform UI development environment.

HTML form documents that interact with a user are easy to generate. Listing 1 shows the HTML code for a simple form, with the result displayed as in Figure 1. Note that the two input controls have "firstname" and "lastname" specified as their respective control names. A user can type text into the two fields on the form and push the button to cause an action to take place. The action associated with this form is specified in the <FORM ... ACTION=""> tag. In this case, the web program example1.exe in subdirectory /scripts on the server k12web will be executed. The server runs the web program, passing data about the form in one of two ways. The older and less flexible way is to specify METHOD="GET" in the FORM tag. Then a string consisting of name-value pairs for each control on the form is appended to the URL and passed to the script through the QUERY_STRING environment variable. Today, METHOD="POST" is recommended, which passes the string of data directly to the script through the standard input (stdin in C or cin in C++).

Each CGI program reads the information from cin and parses it to get a collection of name-value pairs. In Listing 1, the string that is received by the example1.exe program is:

firstname=John&lastname=Doe&pbnext=Next+%3e%3e

The values for each control are based on what the user typed into the entry fields. Name-value pairs are separated by ampersands, each name is joined to its value by an equals sign, and spaces are converted to + characters. Special characters (such as >) are converted to a percent sign followed by the character's two-digit hexadecimal code (%3e).
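The decoding rules just described can be sketched in a few lines of standard C++. This is an illustration only, not the framework's own parser: the names hexValue and urlDecode are hypothetical, and malformed escapes are simply copied through unchanged, which is an assumption about how errors should be handled.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the framework): value of one hex digit, or -1.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: decode one URL-encoded name or value.
// '+' becomes a space and %XX becomes the character with hex code XX.
std::string urlDecode(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size()
                   && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];  // ordinary character, or a malformed escape kept as-is
        }
    }
    return out;
}
```

Applied to the pushbutton value from the example string, urlDecode("Next+%3e%3e") yields "Next >>".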

Once the program parses the input string, it carries out whatever processing is required and writes a return document to the standard output (stdout in C or cout in C++). The return document may be another HTML document or any other MIME (Multipurpose Internet Mail Extensions) type. The return type is specified by preceding all output with a special header line containing the MIME type (e.g., "Content-type: text/html"), followed by a blank line.

You can see from the amount of string processing required why most web programs are written in Perl [4] or other languages that are good at string manipulation (e.g., REXX).

A C++ Framework

I had a project with a potential need for many CGI programs that could be quickly generated. I did not have any desire to learn Perl, and cross-platform portability was important. C++ seemed like a good choice, but its string handling and lack of standard collection classes were drawbacks. However, the draft ANSI Standard for C++ includes support for strings and collection classes via the Standard Template Library (STL). Thus, a portable C++ framework was needed.

Why a framework? The primary difference between function libraries and frameworks is who calls whom. The functions in a typical static or dynamic link library are called by another program. A framework is a set of modules that have defined interactions with one another, and that can interact with (i.e., call or send messages to) user-provided functions or objects.

Most existing CGI libraries provide a library of functions to be called. In the case of CGI programs though, we have a predefined application architecture. The data comes either from environment variables or cin. The string data has a fixed format that can be parsed into a set of name-value pairs. Programs use this data to generate new MIME content for the browser to handle. The output of CGI programs is written to cout. The web server then delivers the output data from the CGI program back to the browser that submitted the request.

Thus, CGI programming is a good candidate for a framework architecture. The framework handles all the low-level details, letting the programmer concentrate on the end goal. Using STL allows the framework to be built with portable C++ code, not relying on any specific string or collection class libraries. The disadvantage to this approach is that not all C++ compilers are up to the task of supporting STL, but compiler vendors are addressing this need. In the meantime, I used the STL implementation from ObjectSpace [5].

Framework Classes

Figure 2 shows the overall architecture of CGIFramework. A template class, cgiTApplication<T>, is used to instantiate a cgiParser object and to create an instance of a user-supplied class defined by the template argument. An object of type cgiEnvironment is also constructed and queried to determine the origin ("GET" or "POST") and size of the data (from the environment variables REQUEST_METHOD and CONTENT_LENGTH). The application object then reads the CGI input data string from the environment or cin into a memory buffer (a strstream object).

The run method of cgiTApplication<T> calls the parse method for the cgiParser object to handle the input data from the strstream object. The parse method carries out the necessary string manipulations to extract the name-value pairs from the input stream. Plus characters are converted to spaces, and hexadecimal representations are converted to their appropriate characters. The name-value pairs are then stored in an associative array, using STL's map container.

An STL map object is a convenient container for this application because a value can be looked up directly by its name, which serves as the subscript. A map object allows associative array access through a syntax such as:

string value = mapdata[name];

Two typedefs are used to refer to the individual name and value strings as cgiName and cgiValue. The typedef cgiDataList refers to map<cgiName, cgiValue>.
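To make the container concrete, here is a small sketch that splits an input string into a cgiDataList and looks values up by name. The three typedef names come from the framework; splitPairs and valueOr are hypothetical helpers written for this illustration. The sketch performs only the splitting step, so names and values are stored still encoded; decoding of + and %XX sequences is assumed to happen separately.

```cpp
#include <map>
#include <string>

typedef std::string cgiName;
typedef std::string cgiValue;
typedef std::map<cgiName, cgiValue> cgiDataList;

// Hypothetical helper: split "name=value&name=value" into a map.
// Values are stored as received (still URL-encoded).
cgiDataList splitPairs(const std::string& input) {
    cgiDataList data;
    std::string::size_type start = 0;
    while (start <= input.size()) {
        std::string::size_type end = input.find('&', start);
        if (end == std::string::npos) end = input.size();
        const std::string field = input.substr(start, end - start);
        if (!field.empty()) {
            const std::string::size_type eq = field.find('=');
            if (eq == std::string::npos) {
                data[field] = "";  // a name with no '=' gets an empty value
            } else {
                data[field.substr(0, eq)] = field.substr(eq + 1);
            }
        }
        start = end + 1;
    }
    return data;
}

// Hypothetical helper: look up a name without inserting it.
// Note that data[name] on a missing name would add an empty entry.
cgiValue valueOr(const cgiDataList& data, const cgiName& name,
                 const cgiValue& fallback) {
    cgiDataList::const_iterator it = data.find(name);
    return it == data.end() ? fallback : it->second;
}
```

The subscript form is convenient when the name is known to be present; valueOr uses find so that probing for an optional control does not change the map.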

CGIFramework requires one user-supplied class that derives from another framework class, cgiProcessor. This is an abstract class that declares one pure virtual method named process. The user must provide a definition of this method in the derived class. The implementation of this method carries out the user's form-processing work, including the generation of a new HTML or other MIME-type document to return to the browser.

Internally, the framework simply calls the pure virtual method cgiProcessor::process. The C++ virtual function dispatch mechanism then invokes the user's derived class method. The cgiDataList of name-value pairs is passed as an argument to the method so the form data is accessible. Note that the framework hides all of the details regarding the data extraction. It only calls the user's process method once the data is available in a usable form.

If a problem arises inside any framework object, a cgiException is thrown. This exception is caught and the static method cgiProcessor::defprocess is called. This exception handler writes a standard return message to the browser indicating an error occurred in the CGI program.
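One way such a fallback could be arranged is sketched below. The class name cgiException is the framework's, but its definition here is assumed, and writeDefaultError and runGuarded are hypothetical names; writeDefaultError plays the role the article assigns to cgiProcessor::defprocess. The sketch also buffers the normal output so that a partial document is never sent ahead of the error message, which is a design choice of this illustration rather than something the article states.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed definition: the framework's actual exception class may differ.
class cgiException : public std::runtime_error {
public:
    explicit cgiException(const std::string& what)
        : std::runtime_error(what) {}
};

// Hypothetical default handler: a fixed error page for the browser.
void writeDefaultError(std::ostream& out) {
    out << "Content-type: text/html\n\n"
        << "<html><body><p>An error occurred in the CGI program.</p>"
        << "</body></html>\n";
}

// Hypothetical wrapper: run the work against a buffer, and forward the
// buffer to the real stream only if no cgiException was thrown.
template <typename Work>
void runGuarded(Work work, std::ostream& out) {
    std::ostringstream buffer;
    try {
        work(buffer);
        out << buffer.str();
    } catch (const cgiException&) {
        writeDefaultError(out);
    }
}
```

Only cgiException is caught here; any other exception type would propagate to the caller.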

Figure 2 shows one additional class, cgiHTMLHelper. This class contains methods and iostream manipulators for easily generating standard HTML tags. They are useful in CGI programs where dynamic web pages need to be constructed based on user interactions. For example, a CGI program might generate a new HTML document similar to that in Listing 2.

Inside the process method, the cout stream can be used in conjunction with the iostream manipulators and helper methods of cgiHTMLHelper to generate the document. Listing 3 shows the code that generates all of the tags in Listing 2.

Using CGIFramework

Listing 4 is a complete main program based on CGIFramework. All that is required is the instantiation of a cgiTApplication<T> template object, and a call to that object's run method.

The user class used as the template argument is derived from cgiProcessor. Each new CGI program needs to derive from this class and declare the new class name as the template argument in the main program above. Listing 5 shows the simplest derived class. This code together with the code from Listing 4 constitutes a complete CGI program.

Of course, this program is not very interesting because it just returns a "No response" message to the browser. A more useful program that can help you debug HTML forms is given in Listing 6 (which uses the same main program and header file as the previous listing). Note that this code declares an instance of cgiHTMLHelper which will write its output to cout. The contentType method defaults to "text/html" as the output MIME type. The header and trailer methods wrap the output lines in <html><body> and </body></html> tags.

The for loop uses an STL iterator over the map to access the entire list of name-value pairs sent to the program. Dereferencing the STL map iterator returns the item pointed to by the iterator as an STL pair of strings. Access to the names and values is via the first and second member variables of pair.

Calling the program in Listing 6 (using the form example from Figure 1) generates the HTML code shown in Listing 7. The resulting form appears in Figure 3.

Dynamic Form Generation

Let's write a more interesting CGI program which processes the form in Listing 1, but uses the name information to display a dynamically generated form. The code in Listing 8 gets the form user's name and puts it into another form which is generated to ask for payment information. Several new iostream manipulators and methods of cgiHTMLHelper are demonstrated here. htmlrule writes the tag for a horizontal rule line, and htmlformpost generates the <FORM> tag with "METHOD=POST". The listbox/endlistbox methods create a listbox control on the generated form with the contents of the listoptions. The entryfield method creates text entryfields on the form, with empty strings as default values. Optional arguments can be used (as in the expiration date field) to specify a default value for the field and a new size (in characters). Figure 4 shows the browser display of the resulting form generated by the CGI program.

This new form can call another CGI program to generate yet another form, ad infinitum. Note that all URL requests by a browser are stateless. However, each form can maintain the state of a sequence of user interactions using hidden name-value pairs. A series of questions and answers from the user can then cause some action by the web server based on all previous information from that user (without maintaining any state information at the server). Thus, a wizard, or series, of HTML forms can be generated to process an order, send a fax, complete an on-line survey, administer a test, etc.

Summary

This article presents a cross-platform C++ framework for writing CGI programs. The Standard Template Library was used to ensure portability of the framework to other platforms. Several example CGI programs built with the framework for processing HTML forms are discussed. The complete source code for the framework may be downloaded free of charge [6].

References

1. M. Nelson, C++ Programmer's Guide to the Standard Template Library, IDG Books Worldwide, Inc., 1995.

2. "The Common Gateway Interface," http://hoohoo.ncsa.uiuc.edu/cgi/primer.html.

3. A. Davison, "Coding with HTML Forms," Dr. Dobb's Journal, June 1995, p. 70.

4. S. Brenner, "CGI Form Handling in Perl," http://www.bio.cam.ac.uk/web/form.html.

5. "STL<Toolkit>," ObjectSpace, http://www.objectspace.com.

6. R.B. Lam, "CGIFramework," available at ftp@mfi.com and on the code disk (see p. 3 for details).

Richard B. Lam is a member of the Research Staff at the IBM T. J. Watson Research Center, where he manages the Learning Technologies group.