Deploying DCE as an Infrastructure

One organization's experience with implementing DCE

We considered stability and performance most important in our cell configuration. As mentioned earlier, aggressive schedules and a production environment meant that we needed to achieve a high level of productivity in this deployment. To eliminate down time, we employed replicas of the security and naming services. When the master of either service fails, the replicas provide access to the information, and can be promoted to master in the event of catastrophic loss. The same replicas provide a performance benefit to the community. For most operations, clients bind against the first server they find, be it master or replica, decreasing the load on master servers and increasing responsiveness.
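The binding behavior described above can be sketched in a few lines. This is only an illustration: the server names and the `is_reachable` probe are hypothetical stand-ins, not DCE interfaces, and a real client locates servers through the cell's own mechanisms.

```python
def bind_first_available(servers, is_reachable):
    """Return the first server that answers, whether master or replica.

    servers      -- candidate server names in the order the client finds them
    is_reachable -- hypothetical probe: callable(name) -> bool
    """
    for server in servers:
        if is_reachable(server):
            return server
    raise ConnectionError("no security or naming server reachable")


# Illustrative data only: the master is down, so a replica answers.
up = {"sec-master": False, "sec-replica-1": True, "sec-replica-2": True}
chosen = bind_first_available(list(up), lambda name: up[name])
print(chosen)
```

Because any reachable server satisfies the request, the loss of the master does not interrupt reads, and the load is spread across the replicas.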

Adequate planning in the creation of the DCE cell facilitates the addition of more users as the cell and the community grow. Planning should address resource allocation and administrative overhead, as a cell is not of fixed size or capability. Establishing site-specific metrics for acceptable performance is recommended as a means of triggering increased server capacity or distribution. Performance evaluation derived from mean time of request for authentication or access into the namespace provides insight into the type of service that requires additional resources.
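As a rough illustration of such a site-specific metric, the sketch below averages sampled request times per service and flags any service whose mean exceeds its threshold. The service names, sample values, and thresholds are assumptions made up for the example; each site would choose its own.

```python
from statistics import mean


def services_needing_capacity(samples_ms, thresholds_ms):
    """Return {service: mean request time} for services over their threshold.

    samples_ms    -- maps a service name to a list of request times (ms)
    thresholds_ms -- maps a service name to the site's acceptable mean (ms)
    """
    flagged = {}
    for service, samples in samples_ms.items():
        if not samples:
            continue  # nothing measured yet for this service
        average = mean(samples)
        if average > thresholds_ms[service]:
            flagged[service] = average
    return flagged


# Hypothetical measurements: authentication is slow, namespace access is fine.
samples = {"authentication": [120, 180, 150], "namespace": [40, 55, 45]}
thresholds = {"authentication": 100, "namespace": 60}
flagged = services_needing_capacity(samples, thresholds)
print(flagged)
```

A flagged service points to the kind of server (security or naming) that is the first candidate for an additional replica.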

Our deployment was ready to exit this first phase once the cell had been designed and created, and the responsibilities of daily administration had been assigned.

We made the following decisions in our deployment:

Scalability/performance. The number of cell users (about 100) is well within the documented level of usage for a single server cell. Therefore, we didn't expect the traffic to necessitate the addition of more servers to handle requests at a level of performance consistent with the distributed services already in place.
Down time. Any instances of down time would have doomed the deployment, and possibly the timely release of the product as well. In order to mitigate the fear and likelihood of down time, replica security and naming servers were added.
File storage. During the first phase of the deployment, little data was contained within the file storage in the cell, so only one distributed file server was configured. A second machine resource, however, was reserved as a means of quickly expanding as the need arose.

Phase 2: Create Common Resource Areas

In Phase 2 of the deployment of DCE, the objective was to create resource areas within the cell. These areas contain information that users want or need to access, and their existence within the cell encourages users to migrate for their own benefit. During this phase, users install DCE software on their systems and exercise the basic functionality of the distributed file services (DFS) and DCE core services.

The level of distributed-computing knowledge among the user community determines the ease with which a distributed infrastructure will be introduced in Phase 2. Phase 1 requires very little acceptance on the part of the user community, but Phase 2 is keyed upon some active user participation. In cases where DCE is implemented as a solution to the complete lack of a distributed environment, good choices for migration are tools, games, and commonly read datafiles.

In our lab, data formerly distributed by other means (such as NFS and AFS) was relocated to DCE/DFS-based areas. Through the course of Phase 2, the gradual relocation of commonly used resources to the new areas provided users with a high comfort level, as they saw the new DCE technology providing support in areas where they were familiar with sharing resources. Maintaining copies of this data at the old locations enabled users to move to the new technology with confidence, as they always had the ability to retreat to the earlier solution in the event of catastrophic failure.

The resources chosen were those frequently accessed by the community, but seldom modified. Read-only resources do not require user authentication into the security domain of the DCE cell.

We were ready to exit Phase 2 once 75 percent of the lab members had installed the new DCE software and were accessing the common data through the DCE cell. This percentage represents an approximation of a median-usage case in the community. Any higher, and the threshold might never have been reached; any lower, and the performance analysis and resource allocation might have proven inadequate as a model of full adoption. This percentage provided an adequate measure of the new technology, but it did not impel the community to move wholesale.

Measuring this adoption rate, however, was not so clearly defined. A strict implementation requires that the users remove all but the DCE-based references to these common data areas. Percentages were arrived at by checking the DCE namespace for the hosts incorporated as clients in the cell against the total host base in the lab.
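The host-based calculation amounts to a set comparison, sketched below. How the two host lists are obtained (a listing of the cell namespace and a lab inventory) is left out, and the host names are invented for the example.

```python
def adoption_rate(cell_client_hosts, lab_hosts):
    """Percentage of lab hosts that appear as clients in the cell."""
    lab = set(lab_hosts)
    if not lab:
        raise ValueError("lab host list is empty")
    # Ignore cell clients that are not part of the lab's host base.
    adopted = lab & set(cell_client_hosts)
    return 100.0 * len(adopted) / len(lab)


# Hypothetical host names; "zulu" is in the cell but not in the lab.
lab_hosts = ["alpha", "bravo", "charlie", "delta"]
cell_hosts = ["alpha", "bravo", "charlie", "zulu"]
rate = adoption_rate(cell_hosts, lab_hosts)
print(f"{rate:.0f} percent of lab hosts are cell clients")
```

Note that this counts configured hosts, not users who have removed their non-DCE references, which is why the figure is looser than the strict definition.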

Because we focused on providing preexistent resources in a new location, the new technology was rapidly accepted. Users took the time to install the DCE software and enter the cell because these tasks had been tailored into scripts to make the operation straightforward and consistent. The scripts loaded the DCE software and configured the host namespace to utilize DCE and DFS paths in the place of the previous non-DCE/DFS locations. The resources maintained under the previous technologies were linked into lesser-known paths for quick relocation in the event of a need to fall back.

Administratively, this phase provided a smooth ramp for the corporate internal technical-support team, as they viewed the move as a switch in tools, not as a sweeping change in technology and architecture.

Phase 3: Utilize Security and Read/Write Resources

In Phase 3, the objectives were to populate the DCE registry with user-account data and to create read/write resource areas within the cell. The emphasis was on the establishment of security for the cell, thus the creation of more restrictive permissions for volumes that exist under the DFS. This phase performs several functions:

Read/write areas exercise and demonstrate the capabilities of the DFS.

Registry information is used to control access to the read/write areas.

Logging into the DCE acquaints users with the security component of the DCE and with the concepts of "principals" and "access-control lists."

Having met the exit criteria for Phase 2, in Phase 3 our users had only to change their login habits. During Phase 3, users logged in daily to access the data in the cell. DCE lets administrators set how long a login authentication remains valid, and this period is generally less than 24 hours. Setting the expiration at 24 hours acquainted users with a security scheme that requires daily interaction. This login habit was encouraged by placing adequate restrictions on read/write information in the cell. We were forced to authenticate to avoid the annoyance of a failed access. Logging into an additional secure domain was not resisted, as we were already logging into UNIX, AFS, and Kerberos daily.

The new read/write resources included those that had been read/write under a different distributed file system, and new resources that provided benefit to the user community. Status directories for project groups, commentary files on the progress of the deployment, and sign-up lists of various types were all created as common read/write areas.

To ensure actual interaction with the data under the DCE, each project group created its own status area within the DCE cell, to which it accorded permissions as it saw fit. Lab-wide data and administrative tips were collected and stored under more openly accessible areas. In this manner, there was a high level of traffic for both the read/write volumes in the common areas and the security and authentication services, as individual groups required their own levels of security.

The security replica created during Phase 1 provided uninterrupted security service for logging in, even when the primary server was being updated with new software. This made users comfortable with restricting the permissions on resources: they felt confident that they would not be locked out through an inability to log in to the security server.

Administrative personnel had to populate the registry with the user base and assign users to the correct project groups. This was simple, but time consuming. However, once the registry was populated, the control of individual files and directories was left to the groups that had created them, lessening the need for administrative overhead.

In Phase 3, the participation of the project-management teams led to a rapid acceptance of the technology. Specific project information, stored in the DCE/DFS areas, forced team members to both log in and access these resources, increasing their familiarity with and confidence in the product.

We were ready to exit Phase 3 when 75 percent of the lab was logging into the DCE cell daily. This percentage was measured by checking the requests made for login to the security server, and balancing this against the total lab population.

Phase 4: Completing the Transition

Phase 4 was the final transition of user services from existing distributed services (NIS, AFS, NFS) to DCE-based services for the same functions (CDS, Security, DFS). Once this transition is accomplished, DCE can realistically be considered as a solution for distributed-computing deployment and as a reliable infrastructure for application development.

As Phases 1, 2, and 3 were accomplished, cell administration became a background task. The natural progression of the phases created sufficient ramp time for both users and administrators to understand the DCE technology. Allowing the first three phases to reach their exit criteria simplified the final phase.

Phase 4 was divided into three separate activities, the first of which was the integration of login utilities. This integration let us log in to AFS, DCE, UNIX, and Kerberos security simultaneously from the normal system login. The new utilities were loaded and enabled on the client side, and the necessary security domains were individually activated. By replacing multiple login commands with a transparent acquisition of credentials, users automatically acquired all the credentials they needed at login.

These credentials were obtained through an extension of the existing login scheme. Code from the existing system commands was altered to change the path through which credentials were generated.

In the initialization phase, the process reads an authentication configuration file. An authorization policy is derived, based on the secure domains to be contacted. Calls through the authorization library create an internal data structure, containing field limits for user information, time-out limits based on the type of services, and a message list (typically for errors).

Once initialized, the process continues with the local-machine entitlement utilities, populating the data structure with the username and password as entered at entitlement. This information is then passed through the primary-registry login interface, as described in the authorization policy. In our case, this was the DCE registry. Once this authentication is obtained, all other registry authentication is performed sequentially, based on the authorization policy. Once all registries have been contacted, the following are returned: a group list, the password data structure, the environment for setup, and another message list.

The last phase is to set the group access list for the process and assign the process uid and gid to be the user's uid and gid; see Figure 1. At this point, the user has obtained the credential information necessary to proceed as an authenticated user in any of the configured domains.
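The three stages just described (initialize, authenticate against the primary registry and then the others in policy order, and set the process identity) can be sketched as follows. This is not the code we deployed: `LoginState`, `integrated_login`, and the registry callables are hypothetical names, and treating a secondary-registry failure as a reported but nonfatal event is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LoginState:
    """Hypothetical stand-in for the internal data structure."""
    username: str
    password: str
    groups: list = field(default_factory=list)
    environment: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)


def integrated_login(username, password, policy, registries):
    """Authenticate against each domain in policy order.

    policy     -- ordered domain names; the first is the primary registry
    registries -- maps a domain name to a callable(state) returning
                  (ok, groups, environment) for that domain
    Returns the populated LoginState, or None if the primary refuses.
    """
    state = LoginState(username, password)
    primary, *others = policy

    ok, groups, env = registries[primary](state)
    if not ok:
        return None
    state.groups.extend(groups)
    state.environment.update(env)

    for domain in others:
        ok, groups, env = registries[domain](state)
        if ok:
            state.groups.extend(groups)
            state.environment.update(env)
        else:
            # Assumption: a secondary failure is reported, not fatal.
            state.messages.append(f"{domain}: authentication failed")

    # A real login program would now set the group access list, gid, and
    # uid (os.setgroups, os.setgid, os.setuid), which requires privilege.
    return state


def accept(groups, env):
    """Build a fake registry that always authenticates (for the demo)."""
    return lambda state: (True, groups, env)


def reject(state):
    """Fake registry that always refuses (for the demo)."""
    return (False, [], {})


state = integrated_login(
    "pat", "secret",
    policy=["dce", "afs", "kerberos"],
    registries={
        "dce": accept(["lab"], {"HOME": "/home/pat"}),
        "afs": reject,
        "kerberos": accept(["staff"], {}),
    },
)
print(state.groups, state.messages)
```

Keeping the domain order in a policy list mirrors the role of the authentication configuration file: changing which registry is primary means changing data, not code.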

The second activity in Phase 4 was the movement of home directories into the DCE cell. This is the directory that the user sees most frequently, and where most user modification takes place. For this reason, the movement of home directories to a central location can be subject to more resistance than the other phases of the deployment, but the benefits are many.

The first benefit was a consistent user environment, regardless of the physical machine. User permissions, files, and initialization processes remained consistent because our home directories were accessible from any node. This eliminated the down time caused by individual machine failure. Centrally located home directories enabled centralized backup. This decreased the network traffic during backup periods and the support load for administrators performing the backups. From an architectural viewpoint, centralized home directories focused the users on the client/server model and decreased personal administrative costs.

The move of home directories to DFS was a very basic change for us. With the loss of a home directory being so catastrophic, fallback solutions were key in encouraging our lab to migrate to an unreleased DCE product. We created new paths to the old data in home directories, and this provided us with an option to return to a known state and to get work done in the event that the DCE cell or DFS directories experienced prolonged downtime. To provide recovery of any directories that could be lost, a detailed backup plan was created by the technical-support group to ensure timely, restorable backup states for all DFS-based information.

This movement was the greatest leap of faith for some users, but it provided increased functionality and ease of use. We were now totally immersed in the DCE functionality, and only accessed multiple distributed environments to interact with other groups and other labs.

Monitoring DCE Performance in Cell

The performance of the DCE cell was monitored throughout the deployment, as it was one of our highest priorities. The results served as input to the administrative team and shaped the topology of the cell as it developed. Most of this monitoring took the form of constantly evaluating the satisfaction of our user community, as they were in the best position to determine an acceptable level of performance.

Both the performance and the reliability of the DCE cell proved much higher than we had originally expected, and users found themselves virtually unaware that the home directories had moved from AFS to DFS. Users of NFS were pleasantly surprised with the performance improvements over traditional NFS-mounted volumes and with the security of the DCE/DFS home directories. The ongoing performance monitoring did lead the administrative group to relocate highly used volumes onto separate servers, redistributing the load on the servers themselves, but these moves were obvious, and the improvements to performance, immediate.

Conclusions

The deployment of DCE in the Chelmsford Lab provided concrete benefits to the community in the areas of reduced administrative overhead, single login, and distributed file services. The model described here proved successful, and it can be applied to the introduction of any new technology into an existing environment. Rational steps with immediate and well-known fallbacks, in case of emergency, comfort an organization in change and lower the resistance to new technology.

Members of the DCE customer community can apply these results to their own needs, as appropriate. Solid reliability, minimal down time, reduced overhead, increased services, and a proven transition plan for the DCE services provided us with a usable framework for distributed development in a production environment.

Figure 1: Implementing integrated login.