Using C++ with Word97 COM Interfaces

Jim Langseth

Imagine packing all of MS Word 97 into a C++ class. You can do it with a little knowledge of COM.


Introduction

While working on a recent project, I added a capability to my application to generate a report file. My first implementation simply generated a text file that the user saved or printed. For some time I considered the possibility that I could harness the text formatting power of Microsoft Word, which I had previously installed on the workstation as part of Office97. After all, COM was designed to provide just this sort of capability.

Since my application was based on MFC, I needed to find a way to use Word through C++ rather than the more common method of using Visual Basic for Applications, for which much documentation exists. Not surprisingly, using C++ is a bit trickier.

After finding little or no documentation or examples, I began piecing together the code to interface with Word through COM. I had not programmed COM prior to this project, so it also seemed a good introduction.

Before too long, I found the proper formula for programmatically directing Word to format a document. (I had to install the Office Service Release 1 before I had any success.) However, due to the complexity and volume of the COM interfaces provided by Word (and the rest of Microsoft Office), I decided to define a class to wrap the needed functionality into a simple and usable package.

While Word possesses a huge assortment of capabilities, I was interested in a simple stream-like design that would allow me to create a document, append text (calling an API similar to C's printf or Pascal's writeln), and set characteristics such as font size, underlining, boldness, and italics. The finished document could be printed or saved to a file.

This article describes the wrapper class I developed and shows how easy it is to use. You can use this class without knowing how to do COM programming. To understand how this class works, however, you'll need to be familiar with COM concepts. Interested readers may wish to consult [1].

Using Word with C++

CWordStream is the name of the class I designed to hide the complexity of the Word interfaces (see WordStream.hpp, Listing 1). If you examine the first section of the source file (Figure 1), you will see the #import statements that import the following type libraries:

C:\Program Files\Microsoft Office\Office\MSO97.DLL
C:\Program Files\Common Files\Microsoft Shared\VBA\VBEEXT1.OLB
C:\WINNT\system32\VEN2232.OLB
C:\Program Files\Microsoft Office\Office\MSWORD8.OLB

When the compiler detects these #import statements, it generates header files based on the type library information. These header files can be seen in the build directory after compilation. Each file imported causes two header files to be generated, a .tlh and a .tli. The .tli header file contains wrapper definitions for various interfaces found in the type library. The .tlh file contains object and class definitions as well as any required enumerations [2].

I found that there are approximately 77,000 lines of code included in these generated header files. Needless to say, you could probably make a career of learning all the aspects of Word's COM interfaces. Luckily, I was able to pick out the handful of items required to complete my document generator.

The no_namespace attribute suppresses the namespace that #import would otherwise generate from the type library. Removing this attribute will result in a failed compilation. The rename attribute renames identifiers defined in the type library to avoid conflicts. Finally, the application code uses the Word namespace to access the definitions from MSWORD8.OLB.

You can browse the header files to get a feel for what definitions are provided. For the discussion of my simple report generator I will focus on the following:

Objects

  • Document — represents a Word document object

Interfaces

  • _Document — basic document interface
  • Range — represents a range of contiguous text in the document
  • _ParagraphFormat — similar to Range
  • _Font — represents a text font

Pointers to Interfaces

  • RangePtr — pointer to a Range interface
  • _ParagraphFormatPtr — pointer to a _ParagraphFormat interface
  • _FontPtr — pointer to a _Font interface

With the exception of _Document, all the listed interfaces are accessed via pointers to interfaces (e.g. RangePtr to use the Range interface). These pointers are declared in the generated headers using the _COM_SMARTPTR_TYPEDEF macro (defined in comdef.h). This macro defines the pointer type using a template called _com_ptr_t. This so-called smart pointer provides simple access to interfaces by handling some of the more tedious details (such as calling COM's obligatory AddRef, Release, and QueryInterface).
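
To make concrete what such a smart pointer takes off the caller's hands, here is a heavily simplified, portable sketch. It is not the real _com_ptr_t: FakeInterface, SimplePtr, and CountWhileShared are hypothetical names invented for illustration, only AddRef and Release are modeled, and QueryInterface is left out entirely.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a COM interface. Only the reference-counting
// part of IUnknown is modeled; the object is owned by the caller and is
// never deleted here, unlike a real COM object.
struct FakeInterface {
    long refs = 0;
    long AddRef()  { return ++refs; }
    long Release() { return --refs; }
};

// Hypothetical, minimal smart pointer (an assumption, not _com_ptr_t).
template <typename T>
class SimplePtr {
public:
    explicit SimplePtr(T* p = nullptr) : p_(p) { if (p_) p_->AddRef(); }
    SimplePtr(const SimplePtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    SimplePtr& operator=(SimplePtr other) {   // copy-and-swap
        std::swap(p_, other.p_);
        return *this;
    }
    ~SimplePtr() { if (p_) p_->Release(); }
    T* operator->() const { return p_; }

private:
    T* p_;
};

// Returns the reference count while two smart pointers share the object.
long CountWhileShared(FakeInterface& obj) {
    SimplePtr<FakeInterface> a(&obj);   // AddRef: count becomes 1
    SimplePtr<FakeInterface> b = a;     // AddRef: count becomes 2
    return obj.refs;                    // a and b Release on return
}
```

The point is only that construction, copying, and destruction do the reference counting, so calling code never writes AddRef or Release itself.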

Besides a no-argument constructor, CWordStream offers a small set of services for adding text and setting text attributes such as font size, bold, italics, underline, and justification (see Listing 1). Prior to adding text to the document, simply set the attributes using the constant values defined in the header file.

Example

The following example generates a Word document, EXAMPLE.DOC, containing two lines of text. The first line uses single underlining, is bold, centered, and does not use italics. The second line is left-justified, not bold, not italic, and not underlined.

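    CWordStream w;

    // First line: single underline, bold, centered, no italics
    w.SetUnderline(WSTR_UL_SINGLE);
    w.SetBold(TRUE);
    w.SetJustification(WSTR_JU_CENTER);
    w.SetItalic(FALSE);
    w.AddText(
        "Single underline, bold, centered, no italics\n\n");

    // Second line: left-justified, not bold, not underlined
    // (italics remain off from the first line)
    w.SetUnderline(WSTR_UL_NONE);
    w.SetBold(FALSE);
    w.SetJustification(WSTR_JU_LEFT);
    w.AddText("Left justified, not bold, not italic\n");

    // Save the finished document
    w.SaveAs("c:\\example.doc");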
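
The example passes finished strings to AddText. If printf-style formatting is wanted on top of that, one possible approach is a small helper that builds the string first. The sketch below is an assumption, not part of CWordStream: FormatText is a hypothetical name, and it relies on vsnprintf and va_copy from a C++11 or later compiler rather than the compiler used for this article.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds a std::string from a printf-style format.
std::string FormatText(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    // First pass: measure the formatted length without writing anything.
    va_list copy;
    va_copy(copy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    std::string out;
    if (needed > 0) {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buf.data(), buf.size(), fmt, args);
        out.assign(buf.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return out;
}
```

Assuming AddText accepts a C string as in the example above, a call might then look like w.AddText(FormatText("Total: %d\n", total).c_str()), where total is an illustrative variable.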
    

Implementation Details

The CWordStream constructor creates a Document object but saves only a COM IUnknown interface for future reference. When needed (in function AddText, for example), the IUnknown interface is used to fetch a _Document interface. An alternative would be to store the _Document interface rather than IUnknown. I chose to store IUnknown so that the code using CWordStream does not need to know the definition of _Document.

The class members that set text attributes (e.g. SetUnderline) simply set member variables that are used in subsequent calls to AddText.

CWordStream uses the Range interface to set the italic, bold, and underline attributes of text added using AddText. Two Range interfaces are used in the AddText routine. Here are the basic steps employed by the routine:

1. Get a _Document interface.

2. Get a Range interface representing the entire document (all text inserted so far).

3. Insert the text after the range.

4. Get a Range interface representing the text just inserted.

5. Set Range attributes (italics, boldness, underlining, font size).

6. Set text alignment using _ParagraphFormat.

The only tricky part here is obtaining the Range interface representing the text that has just been inserted. The _Document interface provides a Range function that enables you to specify a beginning and ending offset. CWordStream must keep track of these offsets in order to get the necessary Range.
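
The offset bookkeeping can be illustrated without any COM at all. The following sketch is a mock, not the code from Listing 1: MockWordStream and FormattedRange are hypothetical names, a std::string stands in for the document, and a vector of records stands in for the Range and formatting calls. Word's own character counting (for example, the paragraph mark that ends a document) may require adjustments that this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one formatted span; Word's real Range is a COM
// interface, not a struct like this.
struct FormattedRange {
    std::size_t start;   // offset of the first character in the span
    std::size_t end;     // offset one past the last character
    bool bold;
    bool italic;
};

// Hypothetical mock of the wrapper's bookkeeping, with no COM involved.
class MockWordStream {
public:
    // Setters only remember the attribute until the next AddText call.
    void SetBold(bool on)   { bold_ = on; }
    void SetItalic(bool on) { italic_ = on; }

    void AddText(const std::string& text) {
        const std::size_t start = text_.size();  // end of all text so far
        text_ += text;                           // insert after that range
        const std::size_t end = text_.size();
        // Apply the remembered attributes to the new span only.
        ranges_.push_back({start, end, bold_, italic_});
    }

    const std::string& Text() const { return text_; }
    const std::vector<FormattedRange>& Ranges() const { return ranges_; }

private:
    std::string text_;
    std::vector<FormattedRange> ranges_;
    bool bold_ = false;
    bool italic_ = false;
};
```

Because each call records where the previous text ended, the span for the newly inserted text is always known, and earlier text keeps whatever formatting it was given.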

Notice that in function CWordStream::PrintOut a single parameter is passed to _Document::PrintOut. If you refer to the generated header files, you will see many more parameters available. In fact, these parameters appear to embody most (if not all) of the options available in Word's Print dialog.

If you have the Word application open while you execute the example code, you can actually see the documents created and the text being inserted. While this is not necessarily desirable behavior, there may be a way to circumvent it hidden somewhere in the type libraries.

Conclusion

Sifting through the type library information can be quite tedious, but the simple capabilities of CWordStream serve as a tribute to COM and Microsoft's dedication to supporting it through its applications.

I can only imagine what gems of capabilities are hidden away in the .tlh files. What about the other Office elements: Excel, PowerPoint, and Access? This programming approach, while not lacking in complexity, offers a great deal of opportunity for developers to enhance their products.

Notes and References

[1] Gregory Brill. "A Gentle Introduction to COM," CUJ, January 1998.

[2] It is worth noting that when these header files are compiled, several warnings are issued by my compiler (MSVC++ 5.0 under NT and 95). The code works despite these warnings, so I did not endeavor to eliminate them. That would probably be a fruitless task, since the headers are generated from the type libraries.

Jim Langseth graduated from NDSU in 1988 with a BSCS. He is currently employed through his freelance consulting business in Minneapolis and specializes in C++ development. Interests include language parsing, software test, and tool development.