Hector: Distributed Objects in Python

Dr. Dobb's Sourcebook January/February 1997

A framework for distributed environments

By David Arnold, Andy Bond, and Martin Chilvers

The authors are researchers at the CRC for Distributed Systems Technology, University of Queensland, Australia. They can be contacted at hector@dstc.edu.au.

Sidebar: About Python

Object-oriented design and programming has become a broadly accepted approach to computing. It is particularly beneficial in distributed environments where the encapsulation and subsequent independence of objects allows distribution of an application. However, programming distributed-object systems is difficult, complicated by the possibilities of heterogeneous machine architectures, physically distant locations, and independent component failures.

A number of distributed-processing environments have been developed that attempt to hide these problems from programmers, reducing the complexity of their task. Prominent among these are APM's ANSAware, the OSF's DCE, the OMG's CORBA, and Microsoft's DCOM. In addition, the ISO and ITU-T have invested significant effort in an international standard, known as the "Reference Model for Open Distributed Processing" (RM-ODP), addressing problems in this area of work. ANSAware is an early and influential implementation of some concepts from RM-ODP. Although now largely obsolete, it provides facilities only recently being matched by the more popular environments such as CORBA. These facilities include federated Traders, rudimentary type management, and a notification service.

The OSF's Distributed Computing Environment (DCE) provides a series of services for distributed programming, but does not support an explicitly object-oriented programming model. This has limited its acceptance, despite the fact that parts of its design and implementation are superior.

The Common Object Request Broker Architecture (CORBA) is a specification from the OMG now being broadly adopted, and despite its immaturity in some areas, its adoption of a fully object-oriented architecture seems set to ensure its dominance over DCE.

Finally, Microsoft's Distributed Component Object Model (DCOM) is an extension of their existing Object Linking and Embedding (now ActiveX) technology. DCOM provides a minimal environment with few services for programmers. Available for both Microsoft and third-party platforms, the DCOM specification is now managed by an industry consortium called the Active Group (http://www.activegroup.org/).

Unfortunately, none of these systems implement all the features described by RM-ODP: Each is basically a closed environment, allowing interoperation only with other objects from the same environment.

After examining these environments (with particular reference to RM-ODP), we set about implementation of an open distributed environment.

The result is Hector, a framework that sits above other distributed environments, providing open negotiation and interoperability of communication protocols, high-level description of component services and their requirements, a rich set of support services for objects, and an interaction framework that allows the description of workflow-like interactions between autonomous objects. The prototype is implemented almost entirely in Python, taking advantage of the language's powerful syntax and wide support for other services.

The Hector Environment

Programming distributed systems using traditional tools is complex and error prone. In an effort to reduce this difficulty, a range of transparencies are normally introduced. Transparencies hide complexity and allow the programmer to manipulate the abstraction that they implement.

An important attribute of transparencies is that they are removable -- a programmer can elect to manipulate the underlying functionality of transparencies directly if desired.

Hector provides application objects with a consistent environment, regardless of their physical location, through a series of transparencies. Designed with the goal of supporting a dynamic, global system of distributed objects, it embraces diversity through extensibility. While maintaining transparent use of object services, we specifically support:

Multiple parties in high-level interaction bindings.
Multiple object-implementation languages.
Multiple interaction models.
Multiple transport protocols.

Binding. Communication between distributed objects is traditionally modeled as remote method invocation in one of two flavors -- RPC and messaging -- depending on whether the method has a return value. The methods that a remote object exports are collectively called its interface. Typically, having selected a remote object (normally identified by some opaque address type), the programmer can invoke a method at the remote interface.

RM-ODP, and subsequently Hector, extend this notion with the formal concept of "bindings" -- a specification of a set of roles, and the communication patterns between those roles needed to perform some higher-level function. This allows complex, enterprise-level tasks to be described as binding types, similar to work-flow descriptions.

Hector provides an abstract Binding class that is specialized by application programmers. The kernel can then instantiate these derived classes. The instantiation process establishes the communication channels described by the binding type.

For example, the requirements of a distributed meeting scheduler can be specified at programming time as a binding type encompassing individuals' calendars, room and equipment reservation services, notification to a news service, and the scheduler component itself. The construction of the required peer-to-peer communication channels at run time is left to the infrastructure rather than being explicitly programmed.

Services. Objects require services, and in a distributed-environment, some services must be available via the network rather than from the local machine. Some of those services can be deemed fundamental, for reasons of bootstrapping, dependency, or as a means of ensuring a richer programming environment.

Hector guarantees to provide a fixed set of fundamental services at every location. The actual implementation of the services can vary, but the service interface provided to objects is standardized and guaranteed to be available.

We have selected eight fundamental services, with a particular emphasis on providing a rich environment for objects. These services are:

Notification. Generation of event information and subscription to receive notification of events using expressions over the event information. We are using the Elvin notification service being developed as a parallel activity at the DSTC.
Authentication of object identity. These credentials are used for authorization checks during binding.
Banking. Objects need to pay for use of services. In order to maintain their "credit" across invocations, a banking service is provided. Objects may open accounts and access them using identifying tokens.
Naming. This service provides a mechanism to associate information with a string name. This interface is modeled on the X/Open Federated Naming specification (XFN), but modified to support an object-oriented interface.
Type Management. This is a repository of type descriptions. A single type may have descriptions in many languages, from English commentary, through formal specifications, to executable specifications.
Trading. When an object requires the use of a service, it can locate instances of that service type in many ways. The Trading service provides a content-based (or "Yellow Pages") style of query, parameterized by service type and quality attributes. Service providers advertise their service, describing its attributes. Service consumers query the Trader, which returns a list of possible providers. The Trading service interface is modeled on the joint ISO ODP/OMG CORBA Trader specification.
Policy. Objects require configuration information. In a distributed system, traditional methods (such as a configuration file, environment variables, or X Window System resources) might not be available on all machines. The Hector Policy service provides a mechanism for retrieving an arbitrary string of information using a wild card key, similar to X resources. The resolution of this key is dependent upon a configurable hierarchy of policy contexts. If no match can be found within the initial context, resolution proceeds up a chain of nested contexts. The specification of the context ordering is configurable on a host, network, or enterprise basis.
Relationships. Objects frequently need to maintain records of relationships with other objects. This service provides a strongly typed repository of relationship instances that can be queried by objects.

Service implementations are bound to the standard interfaces during the initialization of the environment, as determined by a local-configuration file. This file can be set on a per-host, per-network, or other basis. An example of this configuration is to select an existing enterprise name service for use in Hector, such as the DCE CDS/GDS or the Solaris NIS+ system.

The Hector Architecture

Hector is structured as four layered components representing decreasing levels of abstraction. These layers are the Object Layer, Language Layer, Encapsulation (or Kernel) Layer, and Communications Layer, as shown in Figure 1.

During execution, the three lower layers combine to form a single entity, called the "Capsule," that provides a support environment for object execution. It encapsulates and insulates objects from the underlying machine and operating-system architectures.

The Capsule is an executable program, normally started by privileged users. Multiple objects can exist within a Capsule, each being allocated an initial thread (which may spawn more). Each object is associated with a security principal that controls its access to the resources of the Capsule and its host. Multiple Capsules can be run on a single host; the choice of whether to instantiate an object within an existing Capsule or to create a new one is a matter of performance tuning; see Figure 2.

Components of the Language and Communications Layers can be dynamically loaded at run time. Specific communication and language modules are typically configured when starting a Capsule, but they can be added or removed by the Capsule's owner at any time.

Object Layer. The Object Layer contains both application programs (objects) and services loaded by the Capsule during its initialization. Objects execute as separate entities within the Capsule process, each with at least one private thread.

The interface between the Object and Language Layers represents the complete set of facilities available to an object. (While it is currently possible for an object to directly access operating-system functions, this ability will be removed once replacement services are available and suitable security services are implemented. The Service class, derived from the Object class, will retain its access to the host's facilities and can be used to make them available to other objects.)

Objects are created from templates -- an object type that can be interpreted by the appropriate Language Layer to create instances. Objects are instantiated by the Capsule as a result of operations on the Capsule management interface.

Language Layer. The Hector kernel is designed to support objects written in different languages. A module that implements the mapping between the Capsule Kernel Layer (written in Python) and objects written in a particular language is called a "language binding." The set of language bindings loaded into a Capsule forms the Language Layer.

Each language binding performs three basic functions, the first of which is mandatory:

Encapsulate the Kernel Layer classes, making them accessible from the object language.
Enable instantiation of objects using language-specific templates.
Enable execution of the instantiated objects.

Wrapper Classes. Each language binding consists of a set of classes (see Figure 3) encapsulating the facilities provided by the Capsule kernel and the fundamental services. These classes are:

Message. These types are used or specialized by the object programmer to describe the basic units needed for communication between the application objects. The base Message class of a language binding must also map the native language data types onto those supported by the kernel. This issue has already been addressed (to different extents) by DCE, CORBA, and ILU; we intend to borrow heavily from their work in this area.
Interaction. These types constrain patterns of Message exchange between Interfaces. Standard Interactions (including RPC, streams, and multicast) are predefined classes, while other complex Message patterns can be specified by programmers.
Interface. These types are used or specialized by the object programmer to constrain the messages received and sent by an object. Hector Interfaces must explicitly describe all communication external to the object. Interfaces use Message types to specify the content of communicated data, while the ordering, permitted concurrency, and other qualities of service related to the communication are directly constrained by Interface types.
Role. Role is a subtype of Interface used in the specification of Interaction and Binding. It describes the required characteristics of Interfaces capable of fulfilling the nominated role in the Interaction or Binding. During instantiation of Bindings, Interface instances are required to satisfy the Role types.
Binding. Expressed in terms of Messages, Interactions, and Interfaces, Bindings constrain the connections between the objects used by an application. The use of Message, Interface, and Binding types allows a programmer to ensure the robustness of an object while allowing dynamic use of different service implementations and communication protocols. The kernel ensures that the abstract communication specified by the Binding occurs correctly or generates exceptions.
Object. The base Object class is specialized by the programmer to create Hector application components. Several subclasses of Object provide different levels of abstraction suited for different object types.
Offer. Objects interact by offering and consuming services. Objects wishing to offer a service create an instance of the Offer class, specifying the Interface type and their qualities of service. The Offer instance is created within the kernel and manages requests from other objects for use of the object.
Capsule. The Capsule class defines the Interface to the external functionality available to objects. It includes construction methods for the preceding classes and access to the instances of the service classes defined in Table 1. The fundamental services provided by the Capsule are represented as classes with standard APIs. The Service classes are an abstraction that hide details of the real services provided by a specific host/site. Each Capsule makes an instance of these classes available to its objects. Services themselves are a specialized subclass of Object implemented with careful negotiation of their mutual dependencies.

Each of these classes must be implemented in the target language. The target class methods are bound to the Hector system C API -- implemented as a Python C module. The classes must wrap the Python kernel classes and propagate callbacks from within Hector to the bound language.

Template Languages. Language bindings can provide support for additional "Template" formats if required. A Template is a description of a type with enough detail to allow its instantiation.

Python's language binding provides two Template formats: .py and .pyc-style strings. Some initial work has also been done on implementing a UNIX shared-object file-template format, with the intention that it be used for compiled languages. Additional compiled languages are unlikely to require a new template language since they can share the mechanism for loading shared-object file templates.

However, most interpreted languages have one or more executable formats. Typically the raw source or a preparsed/ checked bytecode is acceptable to the Interpreter. Both of these formats can be used as a template language.

Interpreters. Finally, Template languages other than OS/processor-specific machine code will require that Capsules load an interpreter module. This will normally require the interpreter to be available as a dynamically loadable module able to be called by the Capsule.

Languages that are dependent on a large run-time system (such as Smalltalk), or languages that are difficult to call from C are likely to be problematic. This is an area of ongoing work.

Python Language Binding. The initial language layer supports Python. It is available by default, since the visible kernel classes are actually written in Python, making the wrapper classes very simple.

For Python, Object Templates consist of either Python source code or the tokenized source used in .pyc files. The Python language layer uses a modified version of import to fetch imported modules from the Type service, before attempting to locate them locally. This allows host machines to provide only the basic Python installation. All other modules served from a LAN-based distribution service.

Kernel Layer. The Kernel Layer lies below the Language Layer and uses the Communications Layer for interaction. It provides implementations of the major Hector classes: Message, Interface, Role, Interaction, Binding, Offer, Object, Service, and Capsule. They are implemented in Python within the kernel, utilizing the underlying Communications Layer.

Communications Layer. The Communications Layer addresses transport-protocol transparency. The kernel is able to interact through multiple protocols while knowing nothing about the implementation of the transport protocol. Classes provided this layer include:

Endpoint. Endpoint instances maintain an actual communication-protocol endpoint within an abstract interface. Interface instances are implemented by one or more Endpoint instances, depending on their required connectivity to other objects. The actual mapping of an interface to a set of endpoints is performed by the Binding constructor using the initiating Object's policy setting to choose a preferred communication protocol for each physical connection.
Protocol. Protocol is an abstraction for an entity able to create Endpoint instances. It describes the kind of endpoints it can create using the abstract Message, Interface, and Interaction classes. This permits constraints on the data types of messages, their sequencing at the endpoint (for example, RPC announcements or streams) and their ability to connect to other parties (for example, peer to peer or multicast).
Address. Endpoint addresses are opaquely encoded in Address instances. Each Protocol supports one or more Address formats, registered with the Capsule to allow resolution of Interface endpoints.

The Communications Layer is comprised of the set of Protocol instances existing in a Capsule. Protocols can be dynamically loaded and unloaded using the Capsule's management interface.

Tools

The Hector environment provides several tools for programmers and users in addition to the basic Capsule executable, Services, and Object classes.

Description of Bindings and Interfaces has traditionally been performed using a combination of an IDL and its compiler with some program code. Hector provides a graphical editor for Interface and Binding type manipulation. It allows the description of the Interface events, their sequencing constraints, and the connection between Interfaces to form Bindings.

A Capsule Browser provides a means to manage the Capsules running across a network. The browser can display the state of the Capsule, Services, Communication modules, Language modules, loaded Templates, and instantiated Objects. Users can load or unload modules and templates, and create new objects from a loaded template.

The Capsule Browser uses the Capsule's management interface to perform these operations, and this interface is also available to any other objects with appropriate permissions.

Finally, the Capsule Browser also provides support for the fundamental Capsule services. The Trader, Type Manager, and Policy services are all accessible using specific control panels within the Capsule Browser.

Status

We are currently working on the second snapshot release of the core Hector system. The kernel is written entirely in Python. The current kernel and communication sources total just over 13,000 lines of Python code, with the services adding about half of that again.

The first release included a simple, UDP-based messaging protocol, a simple Trading service, a notification service, and supported objects written in Python.

The current release includes a notification service, naming service, Trading service, Type service, and Policy service.

Future

We have several major work items planned. Most important are different communication protocols to augment our current prototype.

While we have not yet chosen any particular protocols, those that interest us include:

SMTP e-mail.
HTTP.
CORBA IIOP.
DCE.
Sun (ONC) RPC.

We've also done some preliminary work on a language binding to C++. This will probably be integrated with the core system in the next release.

We are investigating support for Java (of course!) and ParcPlace VisualWorks Smalltalk (which is heavily used by another local project).

Additional plans include implementation of the Relationships and Banking services, together with a parallel project to provide the authentication service. These are expected to complete the fundamental services of the system.

In the future, work is planned on a customizable, distributed user-interface, as well as work on object migration and persistence services. More information about Hector can be found at http://www.dstc.edu .au/Hector.

References

Architecture Projects Management Ltd., ANSAware 4.1: Application Programming in ANSAware, February 1993.

Berners-Lee, T., R. Fielding, and F. Frystyk. Hypertext Transfer Protocol -- HTTP/1.0 INTERNET DRAFT <draft-ietf-http-v10-spec-05.html>.

Bond, A. and D. Arnold. "Visualizing Service Interactions in an Open Distributed System." Proceedings 1st International Workshop on Services in Distributed and Networked Environments, Prague, Czech Republic. June 1994.

Brown, N. and C. Kindel. "Distributed Component Object Model Protocol -- DCOM/1.0 INTERNET DRAFT <draft-brown-dcom-v1-spec-00.txt> May 1996.

Chilvers, et al. "What's in a name? A Distributed, Federated Naming System in Python," Proceedings 4th International Python Conference, to appear.

ISO/ITU-T, ITU Recommendation X.90x, ISO/IEC CD 10746-x. "Open Distributed Processing: Reference Model." Parts 1-4, 1994-5.

Linnington, P.F. "RM-ODP: The Architecture Open Distributed Processing, Experiences with distributed environments." Proceedings 3rd IFIP TC 6/WG 6.1 International Conference on Open Distributed Processing, ICODP95, Brisbane, Australia. March 1995.

Mansfield, T. "Customizable Presentation of Complex Data," Ph.D. Thesis. Queensland, Australia: University of Queensland, 1996.

McManis, C. and V. Samar. "Solaris ONC: Design and Implementation of Transport Independent RPC." Sun Microsystems Inc. 1991. IETF RFC 1831. ftp://nic.merit.edu/internet/documents/rfc/ rfc1831.txt/.

Object Management Group. "Common Object Request Broker Architecture 2.)." Object Management Group TC Document 96.03.04, July 1995. http://www .omg.org/docs/ptc/96-03-04.ps/.

Object Management Group. "Trading Object Service." Object Management Group Document orbos/96-07-08, July 1996. http://www .omg.org/docs/orbos/ 96-09-08.ps/.

Redhead, T. "A High-level Process Checkpointing and Migration Scheme for Heterogeneous Distributed Systems." Proceedings IFIP/IEEE International Conference on Distributed Platforms, IDCP 96, Dresden, Germany. February 1996.

Rosenbury, W., D. Kenney, and G. Fisher. Understanding DCE. Sebastapol, California: O'Reilly and Associates, September 1992.

Postel, J.B. "Simple Mail Transfer Protocol." IETF RFC 821, ftp://nic.merit.edu/ internet/documents/rfc/rfc0821.txt/.

X/Open CAE Specification. "Federated Naming: The XFN Specification." X/Open Document C403, ISBN 1-85912-052-0, 1995.

DDJ