Deploying DCE as an Infrastructure

One organization's experience with implementing DCE

Jack Danahy

Jack is with Hewlett-Packard's Chelmsford System Software Laboratory and can be contacted at jake@ch.hp.com.


Distributed computing tools--the enablers of a new generation of platforms and machines--provide transparent remote access to resources and services once restricted to local connections. System security, account management, clock synchronization, file storage, and others--once administered and delivered on a per-machine basis--are now available through platform-independent, distributed infrastructures such as Distributed Computing Environment (DCE) technology from the Open Software Foundation.

This relocation of essential resources and services, however, will almost inevitably create anxiety. Any deployment plan for a distributed infrastructure needs to consciously address a user's need for stability throughout the course of this migration. In this article, I'll describe the process we took in migrating Hewlett-Packard's Chelmsford Systems Software Laboratory to a DCE-based infrastructure. This deployment provides a road map of one successful path through the implementation of this new technology. I hope that new DCE users will benefit from the decisions and discoveries made through the exercise undertaken in our lab.

The Pre-DCE Environment

The Chelmsford Systems Software Laboratory is a group of more than 100 individuals engaged in the development of distributed-computing software and strategies, including DCE. Most of the group are engineers, joined by managers, marketing personnel, and a large documentation group. This mix represented a variety of skill sets, needs and usage patterns, levels of technical expertise, and degrees of willingness to migrate to new technology.

Before adopting DCE technology as our infrastructure, the lab employed distributed solutions already present: The Network File System (NFS) and the Andrew File System (AFS) provided distributed file services, while Kerberos authentication secured mail and mailing lists. Naming services were provided through the Network Information Service (NIS). While our patchwork solution provided the basic services required for a distributed infrastructure, the associated costs were high. Everyone was forced to authenticate several times, once into each of the various security domains. Administrative tasks were also undertaken separately in each of the named environments, increasing overhead.

Although most of us viewed the introduction of a single distributed framework as solving specific problems in our infrastructure (not as a wholesale change in operations), there were some staff members who were not yet taking advantage of the distributed services already in place. Consequently, we had to emphasize the supposed benefits of DCE: lower administrative costs, centralized backup and recovery, and a machine-independent view of the namespace, all of which contribute to a more consistent user environment, regardless of the current host.

The Approach to Deployment

While we all felt that thorough testing of a DCE project manager in a corporate environment was a significant, worthwhile goal, none of us would willingly accept reduced productivity, reliability, or stability in our everyday work environment during the transition. To compound this, the DCE software to be implemented in our lab was still in its developmental stages and not yet completely tested. Consequently, we decided to pursue a phased approach to implementation. Other sites will surely vary in their adoption model, and while the phases I describe here were appropriate for our implementation of a DCE infrastructure, another site (or technology) may find this approach too aggressive or too methodical. These phases are only an example of one ultimately successful migration.

Phase 1: Plan and Establish the Cell

In DCE terms, a "cell" is a secure administrative domain, containing interrelated services to provide a distributed environment to users. Services within a cell include security, naming, clock synchronization, and distributed file services.

These services may reside on a single machine, or they may be spread over multiple machines. The distribution of services and servers forms the configuration of the DCE cell.

We considered stability and performance most important in our cell configuration. As mentioned earlier, aggressive schedules and a production environment meant that we needed to achieve a high level of productivity in this deployment. To eliminate down time, we employed replicas of the security and naming services. When the master of either service fails, the replicas provide access to the information, and can be promoted to master in the event of catastrophic loss. The same replicas provide a performance benefit to the community. For most operations, clients bind against the first server they find, be it master or replica, decreasing the load on master servers and increasing responsiveness.
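The replica behavior described above can be sketched in a few lines. This is only an illustration, not DCE code: the Server class, the bind and promote_if_master_lost functions, and the server names are hypothetical, and in practice promotion is an administrative decision rather than an automatic one.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str          # "master" or "replica"
    up: bool = True


def bind(servers):
    """Return the first reachable server, whether master or replica."""
    for server in servers:
        if server.up:
            return server
    raise RuntimeError("no server reachable")


def promote_if_master_lost(servers):
    """Promote the first live replica when no live master remains.

    Returns the promoted server, or None when no promotion took place.
    """
    if any(s.up and s.role == "master" for s in servers):
        return None
    for server in servers:
        if server.up and server.role == "replica":
            server.role = "master"
            return server
    return None


# Hypothetical security servers; the master is lost, the replica takes over.
security = [Server("sec-a", "master"), Server("sec-b", "replica")]
security[0].up = False
print(bind(security).name)
print(promote_if_master_lost(security))
```

Because bind accepts whichever server answers first, lookups keep working while the master is down, which is the property we relied on to avoid down time.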

Adequate planning in the creation of the DCE cell facilitates the addition of more users as the cell and the community grow. Planning should address resource allocation and administrative overhead, as a cell is not of fixed size or capability. Establishing site-specific metrics for acceptable performance is recommended as a means of triggering increased server capacity or distribution. Performance evaluation derived from mean time of request for authentication or access into the namespace provides insight into the type of service that requires additional resources.
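A site-specific metric of this kind might look like the following sketch. The threshold values, the service names used as keys, and the sample timings are all assumptions chosen for illustration; they are not measurements from our lab.

```python
from statistics import mean

# Hypothetical site-specific limits: acceptable mean response time in seconds.
THRESHOLDS = {"authentication": 2.0, "namespace": 1.0}


def services_needing_capacity(samples, thresholds=THRESHOLDS):
    """Return {service: mean_time} for services whose mean exceeds its limit."""
    flagged = {}
    for service, times in samples.items():
        if not times:
            continue                      # nothing measured, nothing to flag
        average = mean(times)
        if average > thresholds.get(service, float("inf")):
            flagged[service] = average
    return flagged


# Illustrative request timings, in seconds.
samples = {
    "authentication": [1.2, 3.1, 2.9],
    "namespace": [0.4, 0.5, 0.3],
}
print(services_needing_capacity(samples))
```

A service that shows up in the result is a candidate for an additional replica or a move to a less loaded machine.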

Our deployment was ready to exit this first phase once the cell had been designed and created, and the responsibilities of daily administration had been assigned.

We made the following decisions in our deployment:

Phase 2: Create Common Resource Areas

In Phase 2 of the deployment of DCE, the objective was to create resource areas within the cell. These areas contain information that users want or need to access, and their existence within the cell encourages users to migrate for their own benefit. During this phase, users install DCE software on their systems and exercise the basic functionality of the distributed file services (DFS) and DCE core services.

The level of distributed-computing knowledge among the user community determines the ease with which a distributed infrastructure will be introduced in Phase 2. Phase 1 requires very little acceptance on the part of the user community, but Phase 2 depends on some active user participation. In cases where DCE is implemented as a solution to the complete lack of a distributed environment, good choices for migration are tools, games, and commonly read datafiles.

In our lab, data formerly distributed by other means (such as NFS and AFS) was relocated to DCE/DFS-based areas. Through the course of Phase 2, the gradual relocation of commonly used resources to the new areas provided users with a high comfort level, as they saw the new DCE technology providing support in areas where they were familiar with sharing resources. Maintaining copies of this data at the old locations enabled users to move to the new technology with confidence, as they always had the ability to retreat to the earlier solution in the event of catastrophic failure.

The resources chosen were those frequently accessed by the community, but seldom modified. Read-only resources do not require user authentication into the security domain of the DCE cell.

We were ready to exit Phase 2 once 75 percent of the lab members had installed the new DCE software and were accessing the common data through the DCE cell. This percentage represents an approximation of a median-usage case in the community. Any higher, and the threshold might never have been reached--any lower, the performance analysis and resource allocation might have proven inadequate as a model of a full adoption. This percentage provided an adequate measure of the new technology, but it did not impel the community to move wholesale.

Measuring this adoption rate, however, was less clear-cut. A strict measurement would require that users remove all but the DCE-based references to these common data areas. In practice, percentages were arrived at by checking the DCE namespace for the hosts incorporated as clients in the cell against the total host base in the lab.
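The host-based measurement amounts to a set comparison. The sketch below assumes the two host lists have already been gathered by some other means; the function name, the host names, and the way the lists are obtained are hypothetical.

```python
EXIT_THRESHOLD = 0.75   # the Phase 2 exit criterion of 75 percent


def adoption_rate(cell_hosts, lab_hosts):
    """Fraction of lab hosts that also appear as clients in the cell."""
    lab = set(lab_hosts)
    if not lab:
        raise ValueError("the lab host list is empty")
    # Hosts registered in the cell but not part of the lab are ignored.
    return len(lab & set(cell_hosts)) / len(lab)


lab_hosts = ["ws01", "ws02", "ws03", "ws04"]     # hypothetical host names
cell_hosts = ["ws01", "ws02", "ws03", "guest9"]
rate = adoption_rate(cell_hosts, lab_hosts)
print(rate, rate >= EXIT_THRESHOLD)
```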

Because we focused on providing preexistent resources in a new location, the new technology was rapidly accepted. Users took the time to install the DCE software and enter the cell because these tasks had been tailored into scripts to make the operation straightforward and consistent. The scripts loaded the DCE software and configured the host namespace to utilize DCE and DFS paths in the place of the previous non-DCE/DFS locations. The resources maintained under the previous technologies were linked into lesser-known paths for quick relocation in the event of a need to fall back.
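The path switch performed by such scripts can be sketched with symbolic links. This is not the script we used; it is a minimal illustration that assumes the common path is a symbolic link, and the function name and all path names are placeholders.

```python
import os
import tempfile


def repoint(link, new_target, fallback):
    """Point `link` at `new_target`; keep the old target under `fallback`."""
    old_target = os.readlink(link)
    if os.path.lexists(fallback):
        os.remove(fallback)
    os.symlink(old_target, fallback)      # lesser-known path to the old data
    staging = link + ".new"
    os.symlink(new_target, staging)
    os.replace(staging, link)             # rename over the old link


with tempfile.TemporaryDirectory() as root:
    tools = os.path.join(root, "tools")
    os.symlink("old-nfs-tools", tools)    # placeholder targets; need not exist
    repoint(tools, "new-dfs-tools", os.path.join(root, ".tools-fallback"))
    print(os.readlink(tools))
    print(os.readlink(os.path.join(root, ".tools-fallback")))
```

Falling back is then a matter of pointing the common path at the fallback target again.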

Administratively, this phase provided a smooth ramp for the corporate internal technical-support team, as they viewed the move as a switch in tools, not as a sweeping change in technology and architecture.

Phase 3: Utilize Security and Read/Write Resources

In Phase 3, the objectives were to populate the DCE registry with user-account data and to create read/write resource areas within the cell. The emphasis was on the establishment of security for the cell, thus the creation of more restrictive permissions for volumes that exist under the DFS. This phase performs several functions:

Having met the exit criteria for Phase 2, in Phase 3 our users had only to change their login habits. During Phase 3, users logged in daily to access the data in the cell. In DCE, administrators can set the length of time for which a DCE login authentication is valid, and this period is generally less than 24 hours. Setting the expiration at 24 hours acquainted users with a security scheme that requires daily interaction. This login habit was encouraged by placing adequate restrictions on read/write information in the cell. We were forced to authenticate to avoid the annoyance of a failed access. Logging into an additional secure domain was not resisted, as we were already logging into UNIX, AFS, and Kerberos daily.
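The effect of a 24-hour lifetime on login habits can be shown with a small check. The sketch models only the arithmetic of expiration; the names and dates are illustrative, and it does not touch the DCE registry.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(hours=24)   # the expiration chosen for our cell


def needs_login(issued_at, now, lifetime=LIFETIME):
    """True when credentials issued at `issued_at` have expired by `now`."""
    return now - issued_at >= lifetime


issued = datetime(1995, 1, 9, 8, 30)                        # illustrative date
print(needs_login(issued, datetime(1995, 1, 9, 17, 0)))     # False: same day
print(needs_login(issued, datetime(1995, 1, 10, 8, 30)))    # True: next morning
```

A user who logs in each morning therefore never meets an expired credential during the working day.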

The new read/write resources included those that had been read/write under a different distributed file system, and new resources that provided benefit to the user community. Status directories for project groups, commentary files on the progress of the deployment, and sign-up lists of various types were all created as common read/write areas.

To ensure actual interaction with the data under the DCE, each project group created its own status area within the DCE cell, to which it accorded permissions as it saw fit. Lab-wide data and administrative tips were collected and stored under more openly accessible areas. In this manner, there was a high level of traffic for both the read/write volumes in the common areas and the security and authentication services, as individual groups required their own levels of security.

The security replica created during Phase 1 provided uninterrupted security service for logging in, even when the primary server was being updated with new software. This made users comfortable with restricting the permissions on resources: They felt confident that they would not be locked out through an inability to log in to the security server.

Administrative personnel had to populate the registry with the user base and assign users to the correct project groups. This was simple, but time consuming. However, once the registry was populated, the control of individual files and directories was left to the groups that had created them, lessening the need for administrative overhead.

In Phase 3, the participation of the project-management teams led to a rapid acceptance of the technology. Specific project information, stored in the DCE/DFS areas, forced team members to both log in and access these resources, increasing their familiarity with and confidence in the product.

We were ready to exit Phase 3 when 75 percent of the lab was logging into the DCE cell daily. This percentage was measured by checking the requests made for login to the security server, and balancing this against the total lab population.

Phase 4: Completing the Transition

Phase 4 was the final transition of user services from existing distributed services (NIS, AFS, NFS) to DCE-based services for the same functions (CDS, Security, DFS). Once this transition is accomplished, DCE can be realistically considered as a solution for distributed-computing deployment and as a reliable infrastructure for application development.

As Phases 1, 2, and 3 were accomplished, cell administration became a background task. The natural progression of the phases created sufficient ramp time for both users and administrators to understand the DCE technology. Allowing the first three phases to reach their exit criteria simplified the final phase.

Phase 4 was divided into three separate activities, the first of which was the integration of login utilities. This integration let us simultaneously log in to AFS, DCE, UNIX, and Kerberos security from the normal system login. The new utilities were loaded and enabled on the client side, and the necessary security domains were individually activated. By replacing multiple login commands with a transparent acquisition of credentials, users automatically acquired all the credentials they needed at login.

These credentials were obtained through an extension of the existing login scheme. Code from the existing system commands was altered to change the path through which credentials were generated.

In the initialization phase, the process reads an authentication configuration file. An authorization policy is derived, based on the secure domains to be contacted. Calls through the authorization library create an internal data structure, containing field limits for user information, time-out limits based on the type of services, and a message list (typically for errors).

Once initialized, the process continues with the local-machine entitlement utilities, populating the data structure with the username and password as entered at entitlement. This information is then passed through the primary-registry login interface, as described in the authorization policy. In our case, this was the DCE registry. Once this authentication is obtained, all other registry authentication is performed sequentially, based on the authorization policy. Once all registries have been contacted, the following are returned: a group list, the password data structure, the environment for setup, and another message list.
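The sequential contact of registries can be sketched as a loop over the policy. Everything named here is hypothetical: LoginResult, integrated_login, and the registry callables are stand-ins for the real login interfaces, and the password data structure is omitted for brevity. Stopping when the primary registry refuses the user, while merely recording a failure for a secondary registry, is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LoginResult:
    groups: list = field(default_factory=list)
    environment: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)
    authenticated: list = field(default_factory=list)


def integrated_login(username, password, policy, registries):
    """Authenticate against the primary registry, then the rest in order.

    policy     -- ordered registry names; the first entry is the primary
    registries -- maps a name to a callable(username, password) that
                  returns (ok, groups, environment)
    """
    result = LoginResult()
    for position, name in enumerate(policy):
        ok, groups, env = registries[name](username, password)
        if not ok:
            result.messages.append(f"{name}: authentication failed")
            if position == 0:
                break                     # no primary, so nothing else is tried
            continue
        result.authenticated.append(name)
        result.groups.extend(g for g in groups if g not in result.groups)
        result.environment.update(env)
    return result


def accept(groups, env):
    """Build a stand-in registry that accepts any user."""
    return lambda username, password: (True, groups, env)


def reject(username, password):
    """Stand-in registry that refuses every user."""
    return (False, [], {})


registries = {"dce": accept(["staff"], {"PRIMARY": "dce"}),
              "afs": reject,
              "kerberos": accept(["staff", "mail"], {})}
outcome = integrated_login("jd", "secret", ["dce", "afs", "kerberos"], registries)
print(outcome.authenticated, outcome.groups, outcome.messages)
```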

The last phase is to set the group access list for the process and assign the process uid and gid to be the user's uid and gid; see Figure 1. At this point, the user has obtained the credential information necessary to proceed as an authenticated user in any of the configured domains.
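This final step maps onto three POSIX calls, which Python exposes in its os module. The sketch below is not our login code; the function name and the ops parameter are illustrative, with ops included only so the call order can be exercised without root privileges.

```python
import os


def assume_identity(uid, gid, groups, ops=os):
    """Switch the calling process to the user's identity.

    Order matters on POSIX systems: the group access list and the gid must
    be set while the process is still privileged, so setuid comes last.
    """
    ops.setgroups(list(groups))
    ops.setgid(gid)
    ops.setuid(uid)
```

Calling setuid first would drop the privileges needed for the other two calls, leaving the process with the wrong group membership.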

The second activity in Phase 4 was the movement of home directories into the DCE cell. The home directory is the directory that the user sees most frequently, and where most user modification takes place. For this reason, the movement of home directories to a central location can be subject to more resistance than the other phases of the deployment, but the benefits are many.

The first benefit was a consistent user environment, regardless of the physical machine. User permissions, files, and initialization processes remained consistent because our home directories were accessible from any node. This eliminated the down time caused by individual machine failure. Centrally located home directories enabled centralized backup. This decreased the network traffic during backup periods and the support load for administrators performing the backups. From an architectural viewpoint, centralized home directories focused the users on the client/server model and decreased personal administrative costs.

The move of home directories to DFS was a very basic change for us. With the loss of a home directory being so catastrophic, fallback solutions were key in encouraging our lab to migrate to an unreleased DCE product. We created new paths to the old data in home directories, and this provided us with an option to return to a known state and to get work done in the event that the DCE cell or DFS directories experienced prolonged downtime. To provide recovery of any directories that could be lost, a detailed backup plan was created by the technical-support group to ensure timely, restorable backup states for all DFS-based information.

This movement was the greatest leap of faith for some users, but it provided increased functionality and ease of use. We were now totally immersed in the DCE functionality, and only accessed multiple distributed environments to interact with other groups and other labs.

Monitoring DCE Performance in the Cell

The performance of the DCE cell was monitored throughout the deployment, as it was one of our highest priorities. The results were fed back to the administrative team and shaped the topology of the cell as it developed. Most of this monitoring took the form of constantly evaluating the satisfaction of our user community, as they were in the best position to determine an acceptable level of performance.

Both the performance and the reliability of the DCE cell proved much higher than we had originally expected, and users found themselves virtually unaware that the home directories had moved from AFS to DFS. Users of NFS were pleasantly surprised with the performance improvements over traditional NFS-mounted volumes and with the security of the DCE/DFS home directories. The ongoing performance monitoring did lead the administrative group to relocate highly used volumes onto separate servers, redistributing the load on the servers themselves, but these moves were obvious, and the improvements to performance, immediate.
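The kind of relocation decision described here can be sketched as a simple greedy rule: take the busiest volume on the most loaded server and move it to the least loaded one, provided that actually helps. The rule, the function names, the server and volume names, and the request counts are all assumptions for illustration; our administrators made these decisions by inspection.

```python
def server_loads(placement, traffic, servers):
    """Total request count per server, including servers with no volumes."""
    loads = {server: 0 for server in servers}
    for volume, server in placement.items():
        loads[server] = loads.get(server, 0) + traffic.get(volume, 0)
    return loads


def suggest_move(placement, traffic, servers):
    """Return (volume, source, target), or None when no move would help."""
    loads = server_loads(placement, traffic, servers)
    if len(loads) < 2:
        return None
    source = max(loads, key=loads.get)
    target = min(loads, key=loads.get)
    candidates = [v for v, s in placement.items() if s == source]
    if source == target or not candidates:
        return None
    volume = max(candidates, key=lambda v: traffic.get(v, 0))
    # Move only if the target stays below the source's current load.
    if loads[target] + traffic.get(volume, 0) >= loads[source]:
        return None
    return volume, source, target


placement = {"tools": "fs1", "home": "fs1", "docs": "fs2"}   # volume -> server
traffic = {"tools": 500, "home": 300, "docs": 100}           # requests per day
print(suggest_move(placement, traffic, ["fs1", "fs2", "fs3"]))
```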

Conclusions

The deployment of DCE in the Chelmsford Lab provided concrete benefits to the community in the areas of reduced administrative overhead, single login, and distributed file services. The model described here proved successful, and it can be applied to the introduction of any new technology into an existing environment. Rational steps with immediate and well-known fallbacks, in case of emergency, comfort an organization in change and lower the resistance to new technology.

Members of the DCE customer community can apply these results to their own needs, as appropriate. Solid reliability, minimal down time, reduced overhead, increased services, and a proven transition plan for the DCE services provided us with a usable framework for distributed development in a production environment.

Figure 1 Implementing integrated login.


Copyright © 1995, Dr. Dobb's Journal