The Component Object Model

The foundation for OLE services

Sara Williams and Charlie Kindel

Sara is a technical evangelist in the developer relations group at Microsoft. She can be reached at saraw@microsoft.com. Charlie is a program manager, software-design engineer, and technical evangelist in the developer relations group at Microsoft. He can be reached at ckindel@microsoft.com.

The Component Object Model (COM) is a component-software architecture designed by Microsoft that allows applications and systems to be built from components supplied by different software vendors. COM is the underlying architecture that forms the foundation for higher-level software services, like those provided by OLE; see Figure 1. OLE services span various aspects of component software, including compound documents, controls, interapplication programmability, data transfer, storage, naming, and other software interactions.

These services provide distinctly different functionality to the user. However, all OLE services share a fundamental requirement for a mechanism that allows binary software components (supplied by different software vendors) to connect to, and communicate with, each other in a well-defined manner. This mechanism is supplied by COM, a component-software architecture that:

Defines a binary standard for component interoperability.
Is programming-language independent.
Is provided on multiple platforms (Windows, Windows NT, Macintosh, UNIX).
Provides for robust evolution of component-based applications and systems.
Is extensible.

In addition, COM provides mechanisms for:

Communications between components, even across process and network boundaries.
Error and status reporting.
Dynamic loading of components.

COM is a general architecture for component software. While Microsoft is applying COM to address specific areas like those shown in Figure 1, any developer can take advantage of the structure and foundation that COM provides.

How does COM enable interoperability? What makes it such a useful and unifying model? To address these questions, it is helpful to examine the basic COM design principles and architectural concepts. In doing so, you will see the specific problems that COM was designed to solve, and how COM provides solutions for these problems. After this, turn to the article, "Application Integration with OLE," by Kraig Brockschmidt in this issue, to see how OLE provides higher-level services on top of the COM foundation. For an example implementation using COM and OLE, see the article, "Implementing Interoperable Objects," by Ray Valdés.

The Component-Software Problem

The fundamental question COM addresses is: How can a system be designed such that binary software components from different vendors, written in different parts of the world, at different times, are guaranteed to interoperate? To design such a system, four specific problems must be solved:

Basic interoperability. How can developers create their own unique components, yet be assured that these components will interoperate with other components built by different developers?

Versioning. How can one system component be upgraded without upgrading all the others?

Language independence. How can components written in different languages interoperate?

Transparent cross-process interoperability. How can developers write components to run in-process or cross-process (and eventually cross-network) using a single programming model?

These problems need to be solved without sacrificing performance. Achieving cross-process and cross-network transparency must be accomplished without adding undue system overhead to components interacting within the same address space. In-process components must be scalable down to small, lightweight pieces of software, equivalent in scope to C++ classes or GUI controls.

COM Fundamentals

The design of COM rests on fundamental concepts that:

Provide a binary standard for function calling between components.
Define groups of related functions into strongly typed interfaces and allow developers to define new interfaces.
Define a base interface that provides a way for components to dynamically discover the interfaces implemented by other components and tracks component instantiation by way of reference counting.
Define a mechanism to uniquely identify components and interfaces.
Provide a run-time library to establish and coordinate component interactions.

Binary Standard

To implement a binary standard for component invocations, COM defines a standard way to lay out (for each of several platforms) virtual function tables (known as "vtables") in memory, and a standard way to call a function in a vtable. Thus, any language that can call functions through double-pointer indirection (C, C++, Smalltalk, Ada, Basic, and many others) can be used to write components that can interoperate with other components written in any language that conforms to COM's binary standard.
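As a rough illustration of "double-pointer indirection," the sketch below builds a function table by hand in portable C++. The names (Counter, CounterVtbl, CallTwice) are hypothetical and are not part of COM; real COM also fixes calling conventions per platform, which this sketch omits.

```cpp
// Hypothetical illustration of a vtable-style binary layout; not the real COM headers.
struct Counter;  // forward declaration

// The "vtable": a plain table of function pointers.
struct CounterVtbl {
    int (*Increment)(Counter* self);
    int (*Get)(Counter* self);
};

// What an interface pointer points at: a structure whose first member
// is a pointer to the vtable. A call therefore follows two pointers.
struct Counter {
    const CounterVtbl* vtbl;
    int value;  // implementation state; clients are expected to ignore it
};

static int Counter_Increment(Counter* self) { return ++self->value; }
static int Counter_Get(Counter* self) { return self->value; }

static const CounterVtbl kCounterVtbl = { Counter_Increment, Counter_Get };

Counter MakeCounter() {
    Counter c = { &kCounterVtbl, 0 };
    return c;
}

// A client needs only the table layout, not the implementing language.
int CallTwice(Counter* p) {
    p->vtbl->Increment(p);     // pointer -> vtable -> function
    p->vtbl->Increment(p);
    return p->vtbl->Get(p);
}
```

Because the caller depends only on the layout of the table, the same call sequence could be emitted by any language able to follow two pointers and call through the result.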

An important distinction is made between objects and components. The word "object" indicates something different to everyone. In COM, an object is some piece of compiled code that provides some service to the rest of the system. To avoid confusion, a COM object here is referred to as a "Component Object," or simply a "component." This avoids confusing COM objects with source-code OOP objects, such as those used in C++ programs.

Interfaces

In COM, applications interact with each other and with the system through collections of functions (or methods) called "interfaces." Note that all OLE services are simply COM interfaces. A COM interface is a strongly typed contract between software components to provide a small, but useful, set of semantically related operations. An interface is the definition of an expected behavior and expected responsibilities. OLE's drag-and-drop support is a good example of COM interface usage. All the functionality that a component must implement to be a drop target is collected into the IDropTarget interface. All the drag-source functionality is in the IDropSource interface. Interface names begin with "I." OLE defines a number of interfaces for compound document interactions--these usually start with "IOle." Any developer can design custom interfaces to take advantage of COM to implement specific types of component integration and communication. Incidentally, a pointer to a Component Object is really a pointer to one of the interfaces that the Component Object implements. This means that you can use a Component-Object pointer only to call a method, not to modify data. Example 1 shows an interface definition for a simple phone-directory service, ILookup, which has two methods, LookupByName and LookupByNumber.

All Component Objects support a base interface called "IUnknown," along with any combination of other interfaces, depending on what functionality a Component Object chooses to expose. Unlike C++ objects, Component Objects always access other Component Objects through interface pointers. A Component Object can never access another Component Object's data. Only an object's interfaces are exposed to other objects; see Figure 2. This is a primary architectural feature of the Component Object Model. It allows COM to completely preserve encapsulation of data and processing, a fundamental requirement of a true component-software standard. It also allows for transparent remoting (cross-process or cross-network calling), since all component access is through well-defined interface methods that can exist in a proxy object that forwards the request and relays back the response.

Interface Attributes

An interface is a contractual way for a Component Object to expose its services. The key aspects of this design are:

An interface is a type, not a class. While a class can be instantiated to form a Component Object, an interface cannot be instantiated by itself because it carries no implementation. A Component Object must implement that interface and that Component Object must be instantiated for there to be an interface. Furthermore, different Component Object classes may implement an interface differently, so long as the behavior conforms to the interface definition (such as two objects that implement a hypothetical IStack where one uses an array and the other a linked list). Thus, the basic OO principle of polymorphism fully applies to Component Objects.

An interface is not a Component Object. An interface is just a related group of functions and is the mechanism through which clients and Component Objects communicate. The Component Object can be implemented in any language with any internal state representation, so long as it can provide pointers to the interfaces it implements.

Clients only interact with pointers to interfaces, not with pointers to objects. When a client has access to a Component Object, it actually has nothing more than a pointer through which it can access the functions in the interface--an interface pointer. This pointer is opaque. It hides all aspects of internal implementation. Your code cannot "see" the Component Object's data, unlike in C++ programs, in which a client can directly access an object's data by way of an object pointer. In COM, the client can only call methods of the interface to which it has a pointer. This encapsulation allows COM to provide the efficient binary standard that enables local/remote transparency.

Component Objects can implement multiple interfaces. A Component Object can--and typically does--implement more than one interface. That is, the class has more than one set of services to provide. For example, a class might support the ability to exchange data with clients, as well as the ability to save its persistent state information (the data it would need to reload to return to its current state) into a file at the client's request. Each of these abilities is expressed through a different interface (IDataObject and IPersistFile), so the Component Object implements two interfaces.

Interfaces are strongly typed. Every interface has its own interface identifier (known as a GUID), which eliminates any chance of collision that might occur with human-readable names. To create a new interface, the developer also must create an identifier for that interface. In using an interface, the developer must use the interface identifier to request a pointer to the interface. This explicit identification improves robustness by eliminating naming conflicts that would otherwise result in run-time failure.

Interfaces are immutable. They are never versioned, which means that version conflicts between new and old components are avoided. A new version of an interface (created by adding more functions or changing semantics) is an entirely new interface and is assigned a new, unique identifier. Therefore, a new interface does not conflict with an old interface, even if only the name has changed.
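The hypothetical IStack mentioned above can be sketched in portable C++ to show how one interface admits different implementations. This is an illustration only: IStack, ArrayStack, ListStack, and PushThreePopTwo are invented names, and a real COM interface would also derive from IUnknown and carry an interface identifier.

```cpp
#include <list>
#include <vector>

// Hypothetical interface: a type that carries no implementation.
struct IStack {
    virtual void Push(int value) = 0;
    virtual bool Pop(int* out) = 0;  // returns false when the stack is empty
};

// One implementation keeps its items in an array.
class ArrayStack : public IStack {
public:
    void Push(int value) override { items_.push_back(value); }
    bool Pop(int* out) override {
        if (items_.empty()) return false;
        *out = items_.back();
        items_.pop_back();
        return true;
    }
private:
    std::vector<int> items_;
};

// Another keeps them in a linked list.
class ListStack : public IStack {
public:
    void Push(int value) override { items_.push_front(value); }
    bool Pop(int* out) override {
        if (items_.empty()) return false;
        *out = items_.front();
        items_.pop_front();
        return true;
    }
private:
    std::list<int> items_;
};

// Client code is written against the interface and cannot tell the two apart.
int PushThreePopTwo(IStack& stack) {
    stack.Push(1);
    stack.Push(2);
    stack.Push(3);
    int first = 0, second = 0;
    stack.Pop(&first);    // last value pushed
    stack.Pop(&second);   // value pushed before it
    return first * 10 + second;
}
```

Both classes satisfy the same contract, so the client function behaves identically with either one.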

Figure 3(a) shows a diagram of a Component Object that supports three interfaces--A, B, and C. By convention, a standard pictorial representation is used for objects and their interfaces in which an interface is represented as a "plug-in jack." Figures 3(b) and 3(c) show how interfaces allow for both client/server and peer-to-peer relationships between components.

Interface Benefits

The unique use of interfaces in COM provides a number of benefits:

Application functionality can evolve over time. As you will see, IUnknown's QueryInterface method is used both to determine (at run time) which interfaces an object supports, and to request a pointer to a supported interface. When a component is upgraded to support a new interface, it will return a pointer to that interface (instead of NULL, as it did before it supported the interface) the next time its QueryInterface is called. Because this negotiation is done at run time, other system components do not have to be altered to be able to take advantage of the upgraded component's newly supported interface. Revising an object by adding new functionality will, therefore, not require any recompilation on the part of existing clients. By definition, COM interfaces are immutable, which solves the versioning problem and guarantees backward compatibility across upgrades. This guarantee is a fundamental requirement for fostering a commercial component-software market. By comparison, other proposed system object models generally allow developers to change existing interfaces, which ultimately leads to versioning problems as components are upgraded. Although other approaches seem to handle versioning, they don't really work. If version checking is done only at object-creation time, for example, subsequent uses of an instantiated object can fail because the object is of the right type, but the wrong version (and per-call version checking is impractical because of high overhead).

Object interaction is fast and simple. Once a client establishes a connection to an in-process object, calls to that object's services (interface methods) are simply indirect function calls through two memory pointers. As a result, the performance overhead of interacting with an in-process COM object (an object that is in the same address space as the calling code) is negligible. Calls between COM components in the same process are only a handful of processor instructions slower than a standard direct function call, and no slower than a compile-time-bound C++ object invocation. Interfaces are efficient even for cross-process objects, because the cost of negotiating capabilities at run time is minimized by negotiating interfaces, not individual functions (by using QueryInterface).

Interfaces can be reused. Design experience suggests that many sets of operations are useful across a broad range of components (for example, many components require a set of functions to read and write byte streams). This facilitates reuse of both code and of design. A programmer must learn an interface only once, and can apply that interface to many different components. For example, IDataObject is the sole interface used to move data between objects. Regardless of how the user requests that data be moved (cut/copy/paste, drag-and-drop), IDataObject is always used for the data transfer.

Local and remote calls are indistinguishable to the client. The binary standard allows COM to intercept an interface call to an object and to make a remote procedure call instead, to an object in another process or on another machine. From the caller's point of view, these calls are the same. Of course, a remote procedure call has more overhead, but no special code is necessary in the client to differentiate an in-process object from out-of-process objects. All objects are available to clients in a uniform, transparent fashion. Microsoft will later provide a distributed version of COM that requires no modification to existing components in order to gain distributed capabilities. Programmers can be isolated from dealing with networking issues, and components shipped today will operate in a distributed fashion when this future version of COM is released.

Component Objects are programming-language independent. Any programming language that can create structures of pointers and explicitly or implicitly call functions through pointers, can create and use Component Objects. Component Objects can be implemented in a number of different programming languages and used from clients that are written using completely different programming languages. Again, this is because COM (unlike an object-oriented programming language) represents a binary-object standard, not a source-code standard.

The IUnknown Interface

COM defines one special interface, IUnknown, to implement some essential functionality. All Component Objects are required to implement the IUnknown interface, and conveniently, all other COM and OLE interfaces derive from IUnknown. IUnknown has three methods: QueryInterface, AddRef, and Release; see Example 2. Since all interfaces derive from IUnknown, QueryInterface, AddRef, and Release can be called using any interface pointer.

AddRef and Release are simple reference-counting methods. An interface's AddRef is called when another Component Object makes a copy of a pointer to that interface. An interface's Release method is called when the other component no longer requires use of that interface. While the Component Object's reference count is nonzero, it must remain in memory. When the reference count becomes zero, the Component Object can safely unload itself, because no other components hold references to it.
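The reference-counting rule can be sketched in portable C++ as follows. IRefCounted and Widget are hypothetical stand-ins rather than real COM types, the destroyed flag exists only so the sketch can observe destruction, and the plain counter is not thread-safe.

```cpp
// Portable sketch; the names below are stand-ins, not the real COM headers.
struct IRefCounted {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};

class Widget final : public IRefCounted {
public:
    // The creator receives the first reference, so the count starts at 1.
    // destroyedFlag is a hypothetical hook used only to observe destruction.
    explicit Widget(bool* destroyedFlag) : refs_(1), destroyed_(destroyedFlag) {}

    unsigned long AddRef() override { return ++refs_; }

    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            *destroyed_ = true;
            delete this;         // no holders remain, so the object removes itself
        }
        return remaining;
    }

private:
    ~Widget() {}                 // private: only Release may destroy the object
    unsigned long refs_;
    bool* destroyed_;
};

// Two holders share one object; it goes away only after the second Release.
bool DestroyedOnlyAfterLastRelease() {
    bool destroyed = false;
    IRefCounted* first = new Widget(&destroyed);  // count is 1
    IRefCounted* second = first;                  // the pointer is copied...
    second->AddRef();                             // ...so the count becomes 2
    first->Release();                             // count is 1, object stays
    bool aliveAfterFirst = !destroyed;
    second->Release();                            // count is 0, object is deleted
    return aliveAfterFirst && destroyed;
}
```

Making the destructor private is one way to ensure that clients cannot bypass the count by deleting the object directly.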

QueryInterface is the mechanism that allows clients to dynamically discover (at run time) whether an interface is supported by a Component Object. At the same time, it is the mechanism that a client uses to get an interface pointer from a Component Object. When an application wants to use some function of a Component Object, it calls that object's QueryInterface, requesting a pointer to the interface that implements the desired function. If the Component Object supports that interface, it will return the appropriate interface pointer and a success code. If the Component Object doesn't support the requested interface, then it will return an error value. The application will then examine the return code. If successful, it will use the interface pointer to access the desired method. If the QueryInterface fails, the application will take some other action, letting the user know that the desired functionality is not available.

Example 3 shows a call to QueryInterface on the component PhoneBook. The code is asking this component, "Do you support the ILookup interface?" If the call returns successfully, then the component supports the desired interface and a pointer can be used to call methods contained in that interface (in this case, either LookupByName or LookupByNumber). Note that AddRef() is not explicitly called in this case because QueryInterface() increments the reference count before returning the interface pointer.
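Example 3 shows the client's side of the negotiation. The component's side might look roughly like the portable sketch below. Everything in it is a simplified stand-in chosen so that it compiles without the COM headers: HResult, the integer interface IDs (real COM uses 128-bit GUIDs), IUnknownLike, ILookupLike, and the sample phone number are all assumptions made for illustration.

```cpp
#include <cstring>

// Simplified stand-ins for COM types (assumptions, not the real headers).
typedef int HResult;
const HResult kOk = 0;
const HResult kNoInterface = -1;   // plays the role of an "interface not supported" error
const HResult kNotFound = -2;
inline bool Succeeded(HResult hr) { return hr >= 0; }

typedef unsigned int InterfaceId;  // real COM uses 128-bit GUIDs here
const InterfaceId kIidUnknown = 1;
const InterfaceId kIidLookup = 2;
const InterfaceId kIidDropTarget = 3;  // an interface this component lacks

struct IUnknownLike {
    virtual HResult QueryInterface(InterfaceId iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};

struct ILookupLike : public IUnknownLike {
    virtual HResult LookupByName(const char* name, const char** number) = 0;
};

class PhoneBook final : public ILookupLike {
public:
    PhoneBook() : refs_(1) {}

    HResult QueryInterface(InterfaceId iid, void** ppv) override {
        if (iid == kIidUnknown) {
            *ppv = static_cast<IUnknownLike*>(this);
        } else if (iid == kIidLookup) {
            *ppv = static_cast<ILookupLike*>(this);
        } else {
            *ppv = nullptr;          // unsupported: null pointer plus an error code
            return kNoInterface;
        }
        AddRef();                    // the caller now owns one more reference
        return kOk;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    HResult LookupByName(const char* name, const char** number) override {
        if (std::strcmp(name, "Daffy Duck") == 0) {
            *number = "555-0100";    // made-up directory entry
            return kOk;
        }
        *number = nullptr;
        return kNotFound;
    }

private:
    ~PhoneBook() {}
    unsigned long refs_;
};

// Asks for one supported and one unsupported interface.
// Adds 1 if the lookup worked and 10 if the unsupported request was refused.
int ProbePhoneBook() {
    IUnknownLike* unk = new PhoneBook();                       // count is 1
    int score = 0;
    void* raw = nullptr;
    if (Succeeded(unk->QueryInterface(kIidLookup, &raw))) {    // count is 2
        ILookupLike* lookup = static_cast<ILookupLike*>(raw);
        const char* number = nullptr;
        if (Succeeded(lookup->LookupByName("Daffy Duck", &number))) score += 1;
        lookup->Release();                                     // count is 1
    }
    if (!Succeeded(unk->QueryInterface(kIidDropTarget, &raw)) && raw == nullptr) score += 10;
    unk->Release();                                            // count is 0
    return score;
}
```

The same pattern is what lets an upgraded component start answering a request it previously refused, without any change to clients that never ask for the new interface.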

Identifying Interfaces

COM uses Globally Unique Identifiers (GUIDs) to identify every interface and every Component Object class. GUIDs are equivalent functionally to Universally Unique Identifiers (UUIDs), as defined in the Open Software Foundation's Distributed Computing Environment (OSF DCE). GUIDs are 128-bit integers that are guaranteed to be unique in the world across space and time. Human-readable names are assigned only for convenience and are locally scoped. This helps ensure that COM components do not accidentally connect to the "wrong" component or server, or try to use the "wrong" interface, even in networks with millions of Component Objects. GUIDs are embedded in the component binary itself, and are used by COM dynamically at bind time to ensure no false connections are made between components.

CLSIDs are GUIDs that refer to Component Object classes, and IIDs are GUIDs that refer to interfaces. Microsoft supplies a tool (uuidgen) that automatically generates GUIDs. Additionally, the CoCreateGuid function is part of the COM API. Thus, you can create your own GUIDs when you develop Component Object classes and custom interfaces. COM header files provide macros that allow you to assign a more readable name to your GUIDs. Example 4 shows two GUIDs. CLSID_PHONEBOOK identifies a Component Object class that gives users lookup access to a phone book. IID_ILOOKUP identifies a custom interface implemented by the PhoneBook class that accesses the phone book's database.
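To make the 128-bit layout concrete, here is a portable sketch of a GUID-shaped structure initialized with the two values from Example 4. The names Guid, SameGuid, kClsidPhoneBook, and kIidLookup are stand-ins invented for this sketch; the field grouping follows the argument order used by DEFINE_GUID.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in with the field grouping that DEFINE_GUID fills in:
// 32 + 16 + 16 + (8 x 8) bits = 128 bits.
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

static_assert(sizeof(Guid) == 16, "a GUID occupies 128 bits");

// Identity is a field-by-field comparison; names play no part in it.
inline bool SameGuid(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof a.data4) == 0;
}

// The two values from Example 4; they differ only in the last digit of data1.
const Guid kClsidPhoneBook = { 0xc4910d70, 0xba7d, 0x11cd,
                               { 0x94, 0xe8, 0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3 } };
const Guid kIidLookup      = { 0xc4910d71, 0xba7d, 0x11cd,
                               { 0x94, 0xe8, 0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3 } };
```

Two identifiers that look nearly identical to a human reader are still distinct values, which is why the class and the interface in Example 4 cannot be confused at bind time.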

Component Object Library

The Component Object Library is a system component that provides the mechanics of COM. This library provides the ability to make IUnknown calls across processes. It also encapsulates all the "legwork" associated with launching components and establishing connections between components, so that both clients and servers are insulated from location differences.

When an application wants to instantiate a Component Object, it passes the CLSID of that Component Object class to the Component Object Library. The library uses that CLSID to look up the associated server code in the registration database. If the server is an executable, COM launches the EXE and waits for it to register its class factory through a call to CoRegisterClassObject (a class factory is the mechanism in COM used to instantiate new Component Objects). If the associated server code happens to be a DLL, COM loads the DLL and calls the DLL's exported function DllGetClassObject. COM uses the class factory's IClassFactory interface to ask it to create an instance of the Component Object, and returns a pointer to the requested interface back to the calling application. The calling application neither knows nor cares where the server application is run. It just uses the returned interface pointer to communicate with the newly created Component Object. The Component Object Library is implemented in COMPOBJ.DLL on Windows and OLE32.DLL on Windows NT and Windows 95.
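The lookup-then-create sequence can be modeled in a few lines of portable C++. This is only a sketch of the idea: ComponentLibrary, RegisterFactory, the integer ClassId, and the in-memory map are invented here to stand in for the Component Object Library, the registration database, and real CLSIDs, and no EXE or DLL is actually loaded.

```cpp
#include <map>
#include <string>

typedef unsigned int ClassId;                 // stand-in for a 128-bit CLSID
const ClassId kClsidPhoneBook = 0xc4910d70;   // borrows the first field of Example 4's CLSID

struct ILookupLike {
    virtual std::string LookupByName(const std::string& name) = 0;
    virtual void Release() = 0;
};

// A class factory knows how to create instances of one component class.
struct IClassFactoryLike {
    virtual ILookupLike* CreateInstance() = 0;
};

class PhoneBook final : public ILookupLike {
public:
    std::string LookupByName(const std::string& name) override {
        return name == "Daffy Duck" ? "555-0100" : "";   // made-up entry
    }
    void Release() override { delete this; }  // single-owner simplification
private:
    ~PhoneBook() {}
};

class PhoneBookFactory final : public IClassFactoryLike {
public:
    ILookupLike* CreateInstance() override { return new PhoneBook(); }
};

// Hypothetical model of the library: maps a class ID to its factory.
class ComponentLibrary {
public:
    void RegisterFactory(ClassId clsid, IClassFactoryLike* factory) {
        factories_[clsid] = factory;
    }
    // Returns a null pointer when no server is registered for the class ID.
    ILookupLike* CreateInstance(ClassId clsid) const {
        auto it = factories_.find(clsid);
        if (it == factories_.end()) return nullptr;
        return it->second->CreateInstance();
    }
private:
    std::map<ClassId, IClassFactoryLike*> factories_;
};

// The client names a class by ID and receives only an interface pointer.
std::string LookUpThroughLibrary(ClassId clsid) {
    static PhoneBookFactory factory;
    ComponentLibrary library;
    library.RegisterFactory(kClsidPhoneBook, &factory);
    ILookupLike* lookup = library.CreateInstance(clsid);
    if (!lookup) return "class not registered";
    std::string number = lookup->LookupByName("Daffy Duck");
    lookup->Release();
    return number;
}
```

The client never names the PhoneBook class in code; it supplies an identifier and works with whatever interface pointer comes back.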

COM is designed to allow clients to transparently communicate with components, regardless of where those components are running. There is a single programming model for all types of Component Objects--for not only clients of those Component Objects, but also for the servers of those Component Objects. From a client's point of view, all Component Objects are accessed through interface pointers. A pointer must be in-process, and, in fact, any call to an interface function always reaches some piece of in-process code first. If the Component Object is in-process, the call reaches it directly. If the Component Object is out-of-process, then the call first reaches a "proxy" object provided by COM. This proxy generates the appropriate remote procedure call to the other process or the other machine. The client can thus transparently connect to objects that are in-process, cross-process, or remote.

From a server's point of view, all calls to a Component Object's interface functions are made through a pointer to that interface. Again, a pointer only has context in a single process, and so the caller must always be some piece of in-process code. If the Component Object is in-process, the caller is the client itself. Otherwise, the caller is a "stub" object provided by COM that picks up the remote procedure call from the proxy in the client process and turns it into an interface call to the server Component Object. As far as both clients and servers know, they always communicate directly with some other in-process code; see Figure 4.
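The proxy and stub roles can be sketched in portable C++ with an ordinary function call standing in for the remote procedure call. All names here (ILookupLike, LookupProxy, LookupStub, ClientLookup) and the newline-separated message format are invented for illustration and do not reflect how COM actually marshals calls.

```cpp
#include <string>

struct ILookupLike {
    virtual std::string LookupByName(const std::string& name) = 0;
};

// The real component, living on the "server" side.
class PhoneBook : public ILookupLike {
public:
    std::string LookupByName(const std::string& name) override {
        return name == "Daffy Duck" ? "555-0100" : "";   // made-up entry
    }
};

// Stub (server side): unpacks a request message and makes an ordinary
// in-process interface call on the real component.
class LookupStub {
public:
    explicit LookupStub(ILookupLike* target) : target_(target) {}
    std::string Dispatch(const std::string& request) {
        // Message format invented for this sketch: "<method>\n<argument>"
        std::string::size_type split = request.find('\n');
        if (split == std::string::npos) return "";
        std::string method = request.substr(0, split);
        std::string argument = request.substr(split + 1);
        if (method == "LookupByName") return target_->LookupByName(argument);
        return "";
    }
private:
    ILookupLike* target_;
};

// Proxy (client side): implements the same interface as the component,
// packs each call into a message, and forwards it.
class LookupProxy : public ILookupLike {
public:
    explicit LookupProxy(LookupStub* stub) : stub_(stub) {}
    std::string LookupByName(const std::string& name) override {
        std::string request = "LookupByName\n" + name;
        return stub_->Dispatch(request);   // stands in for the RPC transport
    }
private:
    LookupStub* stub_;
};

// Client code is written once, against the interface only.
std::string ClientLookup(ILookupLike& lookup) {
    return lookup.LookupByName("Daffy Duck");
}
```

ClientLookup produces the same answer whether it is handed the PhoneBook itself or a LookupProxy, which is the property the text calls local/remote transparency.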

The benefits of this local/remote transparency are:

The transparency provides a common solution to problems that are independent of the distance between client and server. For example, connection, function invocation, interface negotiation, feature evolution, and so forth, occur the same for components interoperating in the same process and components interoperating across global networks.
Programmers leverage their learning. New services are simply exposed through new interfaces, and once programmers learn how to deal with interfaces, they already know how to deal with new services that will be created in the future. This is a great improvement over environments where each service is exposed in a completely different fashion. For example, Microsoft is working with other ISVs to extend OLE services. These new services, which will be quite diverse in function, will all be very similar in their implementations because they will simply be sets of COM interfaces.
Systems implementation is centralized. The implementors of COM can focus on making the central process of providing this transparency as efficient and powerful as possible, thus benefiting every piece of code that uses COM.
Interface designers concentrate on design. In designing a suite of interfaces, the designers can spend their time on the essence of the design--the contracts between the parties--without having to think about the underlying communication mechanisms for any interoperability scenario. COM provides those mechanisms for free, including network transparency.

The Problem with Implementation Inheritance

Implementation inheritance--the ability of one component to "subclass" or inherit some of its functionality from another component--is a very useful technology for building applications. Implementation inheritance, however, can create many problems in a distributed, evolving object system.

The problem is that the "contract," or relationship between components in an implementation hierarchy, is not clearly defined; it is implicit and ambiguous. When the parent or child component changes its behavior unexpectedly, the behavior of related components may become undefined. This is not a problem when the implementation hierarchy is under the control of a defined group of programmers who can update the components simultaneously. But it is precisely this ability to control and change a set of related components simultaneously that differentiates an application, even a complex application, from a true distributed-object system. So while implementation inheritance can be a very good thing for building applications, it is not appropriate for a system object model that defines an architecture for component software.

In a system built of components provided by a variety of vendors, it is critical that a given component provider be able to revise, update, and distribute (or redistribute) his or her product without breaking existing code in the field that uses previous revisions of the component. In order to achieve this, it is necessary that the actual interface on the component (including both the actual semantic interface and the expected behavior) used by such clients be crystal clear to both parties. Otherwise, how can the component provider be sure to maintain that interface and thus not break the existing clients? From observation, the problem with implementation inheritance is that it is significantly easier for programmers to be unclear about the actual interface between a base and derived class than it is to be clear. This usually leads implementors of derived classes to require source code to the base classes; in fact, most application-framework development environments that are based on inheritance provide full source code for this exact reason.

The bottom line is that inheritance, while very powerful for managing source code in a project, is not suitable for creating a component-based system where the goal is for components to reuse each other's implementations without knowing any internal structures of the other objects. Inheritance violates the principle of encapsulation, the most important aspect of an object-oriented system.

--S.W. & C.K.

COM Reusability Mechanisms

The key to building reusable components is black-box reuse, which means that the piece of code attempting to reuse another component knows nothing, and does not need to know anything, about the internal structure or implementation of the component being used. In other words, the code attempting to reuse a component depends upon the behavior of the component and not the exact implementation--implementation inheritance does not achieve black-box reuse.

To achieve black-box reusability, COM supports two mechanisms through which one Component Object may reuse another: containment/delegation and aggregation. For convenience, the object being reused is called the "inner object" and the object making use of that inner object is the "outer object."

Containment/delegation. The outer object behaves like an object client to the inner object. The outer object "contains" the inner object, and when the outer object wishes to use the services of the inner object, it simply delegates implementation to the inner object's interfaces. In other words, the outer object uses the inner object's services to implement some (or possibly all) of its own functionality.

Aggregation. The outer object wishes to expose interfaces from the inner object as if they were implemented on the outer object itself. This is useful when the outer object would always delegate every call to one of its interfaces to the same interface of the inner object. Aggregation is a convenience to allow the outer object to avoid extra implementation overhead in such cases.

These two mechanisms are illustrated in Figure 5. The important part to both these mechanisms is how the outer object appears to its clients. As far as the clients are concerned, both objects implement interfaces A, B, and C. Furthermore, the client treats the outer object as a black box and thus does not care, nor does it need to care, about the internal structure of the outer object--the client only cares about behavior.

Containment is simple to implement for an outer object. The process is like a C++ object that itself contains a C++ string object. The C++ object would use the contained string object to perform certain string functions, even if the outer object is not considered a "string" object in its own right.
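A minimal sketch of containment/delegation in portable C++ follows. The outer object, called TrimmingPhoneBook here, is a hypothetical example: it exposes the lookup interface as its own, adds a small service (trimming spaces), and hands the real work to an inner object that it reaches only through an interface pointer. The interface and class names are stand-ins, not COM types.

```cpp
#include <string>

struct ILookupLike {
    virtual std::string LookupByName(const std::string& name) = 0;
};

// Inner object: an existing component that is being reused.
class PhoneBook : public ILookupLike {
public:
    std::string LookupByName(const std::string& name) override {
        return name == "Daffy Duck" ? "555-0100" : "";   // made-up entry
    }
};

// Outer object: acts as an ordinary client of the inner object and
// delegates to it, knowing nothing about how the inner object works.
class TrimmingPhoneBook : public ILookupLike {
public:
    explicit TrimmingPhoneBook(ILookupLike* inner) : inner_(inner) {}

    std::string LookupByName(const std::string& name) override {
        return inner_->LookupByName(Trim(name));   // delegate the real work
    }

private:
    // The outer object's own added value: strip leading and trailing spaces.
    static std::string Trim(const std::string& s) {
        std::string::size_type first = s.find_first_not_of(' ');
        if (first == std::string::npos) return "";
        std::string::size_type last = s.find_last_not_of(' ');
        return s.substr(first, last - first + 1);
    }

    ILookupLike* inner_;   // held only as an interface pointer
};
```

A client of TrimmingPhoneBook sees a single object implementing the lookup interface; whether another component sits behind it is invisible.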

Aggregation is almost as simple to implement. The trick here is for COM to preserve the function of QueryInterface for Component-Object clients even as an object exposes another Component-Object's interfaces as its own. The solution is for the inner object to delegate IUnknown calls in its own interfaces, but also allow the outer object to access the inner object's IUnknown functions directly. COM provides specific support for this solution. Both Containment/Delegation and Aggregation provide for reuse of components without violating the OO principle of encapsulation.

--S.W. & C.K.

Figure 1: Component Object Model serves as the foundation for component-software services.

Figure 2: Virtual function tables (vtables) are a binary standard for accessing component services.

Figure 3: (a) A typical component object that supports three interfaces A, B, and C; (b) interfaces extend toward the clients connected to them; (c) two applications may connect to each other's objects, in which case they extend their interfaces toward each other.

Figure 4: Clients always call in-process code; Component Objects are always called by in-process code. COM provides the underlying transparent RPC.

Figure 5: (a) Containment of an inner object and delegation to its interfaces; (b) aggregation of an inner object, where the outer object exposes one or more of the inner object's interfaces as its own.

Example 1: C++-style interface definition generated by the MIDL compiler for ILookup, a simple custom interface.

interface ILookup : public IUnknown
{
public:
    virtual HRESULT __stdcall LookupByName(LPTSTR lpName, WCHAR **lplpNumber) = 0;
    virtual HRESULT __stdcall LookupByNumber(LPTSTR lpNumber, WCHAR **lplpName) = 0;
};

Example 2: The IUnknown interface is supported by all Component Objects.

interface IUnknown
{
    virtual    HRESULT  __stdcall QueryInterface(const IID& iid, void** ppvObj) = 0;
    virtual    ULONG    __stdcall AddRef() = 0;
    virtual    ULONG    __stdcall Release() = 0;
};

Example 3: Calling QueryInterface() on the component PhoneBook.

ILookup *pLookup = NULL;             // receives the ILookup interface pointer
TCHAR szName[] = TEXT("Daffy Duck");
WCHAR *pszNumber = NULL;             // receives the number string from the component
HRESULT hRes;

// Call QueryInterface on the Component Object PhoneBook, asking for
// a pointer to the ILookup interface identified by a unique interface ID.
hRes = pPhoneBook->QueryInterface( IID_ILOOKUP, (void **)&pLookup );
if( SUCCEEDED( hRes ) )
{
    // use the ILookup interface pointer
    hRes = pLookup->LookupByName( szName, &pszNumber );
    // finished using the ILookup interface pointer
    pLookup->Release();
}
else
{
    // failed to acquire the ILookup interface pointer
}

Example 4: Two GUIDs, one CLSID for a phone-directory class, and an IID for a custom interface that retrieves phone-directory information.

DEFINE_GUID(CLSID_PHONEBOOK, 0xc4910d70, 0xba7d, 0x11cd, 0x94, 0xe8,
0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3);

DEFINE_GUID(IID_ILOOKUP, 0xc4910d71, 0xba7d, 0x11cd, 0x94, 0xe8,
0x08, 0x00, 0x17, 0x01, 0xa8, 0xa3);

Example 5: IDL file for a custom interface, ILookup, used by the PhoneBook project.

[
    object,
    uuid(c4910d71-ba7d-11cd-94e8-08001701a8a3), // IID of the ILookup interface
    pointer_default(unique)
]
interface ILookup : IUnknown // ILookup interface derives from IUnknown
{
    import "unknwn.idl";       // Bring in the supplied IUnknown IDL
    HRESULT LookupByName(      // Define member function LookupByName
             [in] LPSTR lpName,
             [out, string] WCHAR ** lplpNumber);
    HRESULT LookupByNumber(    // Define member function LookupByNumber
            [in] LPSTR lpNumber,
            [out, string] WCHAR ** lplpName);
}