The Argument for AFS

Roger Feldman

As long as I have been aware of the Andrew File System (AFS), I have heard many arguments both for and against it. It seems to inspire passion in the architects and administrators who use and promote it, while others want to make sure it never gains a foothold in their IT environment. I first came in contact with AFS in the late 1990s, when it was being tested by a team of administrators known as the open source gurus of our organization. The other administrators shook their heads as the team tested Kerberos and AFS. It seemed like a game for techies who were mostly interested in playing with the latest cutting-edge software tools. What could this have to do with the practical needs of the organization?

I was a bit reluctant to learn about these Kerberos servers and somewhat daunted by the talk of implementing a strange file system that required learning a whole new set of skills. Many of our administrators didn't want to get involved with Kerberos and AFS and were happy with NFS and Windows file sharing. After some time, the members of the AFS testing team convinced a small portion of our customers to use AFS, and suddenly we had Kerberos and AFS servers in operation. Because I was known as one of the open-minded administrators, I was deputized as Kerberos administrator.

Two years later, our organization was being consolidated and we had to make decisions about how to distribute our application environment, which was then dominated by NFS file sharing. By this time, one of our departments was running a full-blown application tree using the AFS file system. The department using AFS was known for having the most complete and up-to-date application tree, so their AFS model was chosen as the consolidated application environment. The rest of the Unix IT department was then faced with learning about AFS and Kerberos. The model was not chosen for the writable project/user file shares, and I will explain why later in this article.

This won't be a "how to set up an AFS environment" article. Instead, I will show you, hands on, how a few things work on an AFS server and client as a way of exploring the main concepts, security features, and technical functions of AFS. This can be a starting point for comparing AFS to other file system technologies. There are many popular products on the market that, after many years of development, can do some of the special things for which AFS is known. AFS, for example, has always focused on security, distribution, and replication.

What Is AFS?

Most sites use a shared file system to distribute their applications, project data, home directories, and other data, using protocols and programs such as NFS, Samba, Windows CIFS, and NetWare. NFS is very commonly used at Unix sites. AFS offers an alternative to these well-known methods, yet it is surprisingly unknown to many in the Unix world. Even if you do not choose to implement AFS, it is worth investigating because it has influenced the development of other distributed file systems and NAS products.

The history of AFS began at Carnegie Mellon University; it later became a commercial product of Transarc Corporation, which was eventually bought by IBM. IBM subsequently released a branch of the AFS code as open source, and it became the OpenAFS project, which has its home at:

http://www.openafs.org 
            
The latest version is openafs-1.4.1.

The concept is that you create a cell, which resembles a domain. The cell's servers distribute a Unix directory tree built up of AFS volumes that can be accessed by machines running the OpenAFS client. Clients use Kerberos to identify themselves to the servers, and files and directories are protected by AFS's ACL permissions, which go beyond standard Unix permission bits and increase the level of security. OpenAFS clients maintain an advanced cache on their local disks, which makes often-used files and applications appear to be local. The cache can also be used should a server become temporarily unavailable.

The servers can easily replicate volumes to read-only replica servers at other sites, allowing fast access from remote cities, sites, and countries. AFS volumes can be moved to another server while users are working. OpenAFS clients can even access other cells. The namespace is location independent: a single global space that looks the same everywhere, which makes for structural consistency. Put this all together, and you have a distributed file system that can transparently allow safe access to data all over the world.

A Few Examples of Server and Volume Administration

The server infrastructure and administration can be a bit daunting at first. You may need Kerberos servers available, and setting those up is outside the scope of this article. You are not forced to use your own Kerberos servers as your authentication method, but they are often used and can increase security and stability. This article is based on a site that uses MIT Kerberos V as a replacement for AFS's native authentication server. Most sites that already run their own Kerberos servers for other security purposes feel safer keeping Kerberos on separate machines with a redundancy scheme. Please note that using the native authentication server and tools offers easier and more compact administration.

Let's dive in and take a look at some of the things that appear on an AFS server. Here are the disks from our "writable" AFS server that will be holding the AFS volumes:

Filesystem            1K-blocks   Used     Available Use% Mounted on 

/dev/afs_vg/vicepa_lv 103212320  80720000  21443744  80%  /vicepa 
/dev/afs_vg/vicepb_lv 103212320  90172744  11991000  89%  /vicepb 
/dev/afs_vg/vicepc_lv 103212320  94212124   7951620  93%  /vicepc 
/dev/afs_vg/vicepd_lv 103212320  90483344  11680400  89%  /vicepd 
As you can see, we have configured the disks with mountpoints named "vicep(a-d)". The "vicep" prefix is the standard prefix for AFS partitions, and you use "a-z" to name each disk (more partitions are available using two-letter combinations). Later, you will create AFS volumes in these "vicep" partitions. It is common to use a RAID disk system to hold them.

The directories, files, and data you create are not seen by the clients directly in the "vicep" partitions. You first create a logical container called a volume, which is stored in a "vicep" partition. After you create a volume, you choose where to mount it. Once you mount the AFS volume, it is available for use. The standard convention is to mount volumes under /afs/cellname, because the /afs mount is the view that the clients see. This is important because it allows all AFS cells to have the same structure, and you may be accessing data from several different cells. Let's look at the top structure of /afs in our cell:

% ls -al /afs 

.mydomain.com  
mydomain.com 
    
In this case, we have two AFS mounts because we have both read-only and writable mounts. The dot ".mydomain.com" is the writable mount, and "mydomain.com" is the read-only mount. To create a new directory, you would work in the /afs/.mydomain.com mountpoint, assuming you had the necessary account and permissions, which are stored in the "pts" account database. You will also need to be included in a special file that resides on the servers, named "UserList", which lists the users who may perform some of the administrative commands. Then you need to obtain a Kerberos ticket, which in turn allows you to obtain an AFS token. This is done using "kinit" to authenticate with Kerberos and "afslog" to get your AFS token. There are several variations (depending on whether you use your own Kerberos or the native authentication server) on how you authenticate, and I recommend that you make your own choices after reading about authentication at the OpenAFS site.
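
As a minimal sketch (the exact commands depend on your Kerberos distribution; some sites use aklog rather than afslog, and "jonesj" is simply a placeholder account), obtaining credentials from a client might look like this:

% kinit jonesj 
% afslog 
% tokens 

The tokens command lists the AFS tokens you currently hold and is a quick way to confirm that the authentication worked.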

With an AFS token in hand, you can now create a volume and mount it in the /afs tree. One of the most useful features of AFS administration is that you do not need to be logged onto the AFS servers to create volumes and administer many aspects of your cell. You can do most of your work from a client machine as long as you have the correct administrative credentials. This opens up a whole new way of thinking, because your physical location becomes much less relevant in the AFS world. You always see the same /afs namespace, regulated and filtered by your credentials in the cell. The concept is similar to having a file system with the global availability of the Web, and it explains why AFS has used the slogan: "No matter where you go, there you are".

Next, we'll create a volume and mount it under /afs/mydomain.com/progs. The vos command is used to create and manipulate volumes and is used quite often in AFS administration. We use the following arguments: create to create a volume, server01 to denote the server where the volume resides, vicepa to specify which vice partition to use, and finally app.newprog to denote the name of the new volume:

[/afs/.mydomain.com/progs] -> vos create server01 vicepa app.newprog 
    
Create the mount point under the progs directory with the fs command. The first argument after mkmount is the directory name where the clients will see the volume in the AFS tree, and the second is the volume to mount there:

[/afs/.mydomain.com/progs] -> fs mkmount newprog app.newprog 
You have now created the new directory "newprog", which is available to the clients in your cell. When a volume is replicated, you must also "release" it with the vos release command, which pushes the changes out to the read-only replica servers (more on replication below). Of course, I have created a simplified example here. In reality, you will have to consider an overall strategy for which layers of your file system structure will be actual volumes and which will simply be sub-directories inside volumes. If that sounds strange, don't worry; it is one of the concepts you will get used to, although it may take a bit of thought before it all sinks in. You will handle those issues in different ways depending on what kind of data you are creating, such as home directories, as opposed to the application hierarchy I have used in this example.
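
To confirm a mount point you have created, the fs lsmount command can be used; a quick check using the names from the example above:

[/afs/.mydomain.com/progs] -> fs lsmount newprog 
'newprog' is a mount point for volume '#app.newprog' 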

One other important feature of AFS is worth mentioning at this point. The "@sys" variable reflects the client OS architecture. To see the architecture name of your client, use the fs command. (Please do not confuse the AFS fs command with the earlier Unix fs command.)

% fs sysname 
    
This command returns the following output: "Current sysname is 'sun4x_58'". If you had run the command on a Linux machine with the 2.6 kernel, it would have answered with: "Current sysname is 'i386_linux26'".
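
A client's sysname can also be changed, which is occasionally handy when testing an installation intended for another architecture. A hedged example (check the fs sysname documentation for your OpenAFS version):

% fs sysname -newsys sun4x_59 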

You can take advantage of the "@sys" variable when creating installations for each architecture and OS release that you support. In this example, the installations for all architectures are placed under the ".sys" directory, which makes for some transparency:

/afs/mydomain.com/progs/newprog/.sys 
     
This allows you to create installations for several different architectures, such as Sun and Linux:

/afs/mydomain.com/progs/newprog/.sys/sun4x_58 
/afs/mydomain.com/progs/newprog/.sys/sun4x_59 
/afs/mydomain.com/progs/newprog/.sys/i386_linux24 
/afs/mydomain.com/progs/newprog/.sys/i386_linux26 
You then install each application release under the correct architecture directory, in a subdirectory named for the release number:

/afs/mydomain.com/progs/newprog/.sys/i386_linux26/1.2 
After you create a link directly under newprog, the AFS client automatically follows it to the correct architecture's version of the program: when the client sees the @sys variable in the link target, it translates @sys into the correct OS architecture name. Assuming that we have installed version 1.2 of an application named newprog, it looks like this:

[/afs/.mydomain.com/progs/newprog] -> ls -al  
 
drwxr-xr-x   3 root    root   14336 Nov 22 14:53 .. 
drwxr-xr-x   8 myid    mygrp   2048 Dec  8 13:19 .sys 
lrwxr-xr-x   1 myid    mygrp     15 Dec  8 13:16 1.2 -> .sys/@sys/1.2 
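
The 1.2 entry in the listing is an ordinary symbolic link; the @sys in its target is left for the client to resolve. Created from the newprog directory, it would look something like this:

[/afs/.mydomain.com/progs/newprog] -> ln -s .sys/@sys/1.2 1.2 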
After you have created the volume, you can use the fs command again to set the quota and the permissions on directories and files. The ACL permissions used by AFS are richer than standard Unix permissions and are considered one of the advantages of AFS. To check the current permissions on the current directory:

% fs listacl 
This command will return something like the following:

-------------------- 
Access list for . is 
Normal rights: 
  staff rlidwka 
  system:administrators rlidwka 
  system:anyuser rl 
-------------------- 
As you can see, the permissions are a combination of the possible rights "rlidwka", which correspond to:

'r' The read right 
'l' The lookup right 
'i' The insert right 
'd' The delete right 
'w' The write right 
'k' The lock right 
'a' The administer right 
    
The ACL of a directory is changed with the fs setacl command (abbreviated fs sa). By default, fs setacl adds to or alters the existing ACL rather than replacing it entirely; an example follows the list below. Let's look at two of the predefined protection groups that are quite helpful when you want to give large groups of users read access to certain directories:

  • system:anyuser -- The system:anyuser group grants access to anyone who can reach the cell, including unauthenticated users. It is quite handy for a site that wants to allow any AFS client to access and use applications without regard to getting tickets and tokens, thus reducing administration for users who won't be creating files in AFS.

  • system:authuser -- The system:authuser group requires that you have authenticated in the cell, thus granting access to everyone in your cell while excluding unauthenticated users and users from other cells who may have access to other areas of your site.
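
    As a small illustration of the fs setacl command mentioned above (using the group names from the earlier listacl output), granting read and lookup rights to everyone and the full set of rights to the staff group might look like this:

    % fs setacl . system:anyuser rl 
    % fs setacl . staff rlidwka 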

    The fs command is also used to set quotas. Here we use the sq (setquota) argument to set a quota of 100000 KB, roughly 100 MB:

    [/afs/.mydomain.com/progs/newprog] -> fs sq . 100000 
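
    You can verify the result with fs listquota; the figures below are only illustrative:

    [/afs/.mydomain.com/progs/newprog] -> fs listquota . 
    Volume Name                    Quota       Used %Used    Partition 
    app.newprog                   100000         10    0%           3% 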
        
    I chose these previous examples to give you a hands-on idea of how AFS works when creating and mounting a volume. I don't want to spend too much time describing the server daemon processes as you can get the details at:

    http://www.openafs.org/documentation 
    
    Note that AFS denotes each server process as a "server", and in reality several of these servers often reside on the same server machine. Here is a brief look at some of the AFS server processes:

    Database servers:

    Kaserver -- Handles authentication

    Ptserver -- Protection database; runs on database servers

    Vlserver -- Volume location server; keeps track of where volumes reside

    Buserver -- Backup server; needed to run the backup utility

    It is best practice to replicate the database servers onto some of your other AFS server machines; this is not to be confused with the replication of volumes. It covers you in the event of a crash, and one of the important AFS concepts is that clients can always contact another database server if the one they are using becomes unavailable. If you want the details on how the database servers stay synchronized, read up on the "ubik" algorithm. The AFS design consistently provides a form of failover, which gives a certain amount of high availability.

    Fileserver types:

    Fileserver -- Takes care of storing and delivering files

    Bosserver -- The basic overseer; runs on the servers and controls and monitors the other server processes

    Upserver -- Handles updates and distributes them

    Volserver -- Handles vos commands to manipulate volumes

    You will want to familiarize yourself with the bos command, because it is used both to get server status and to start and stop services; you can think of it as a command suite. If you have a large enterprise site, the actual server administration (physical machines, RAID disk maintenance, and processes) may be separated from the cell administration of volume creation, user administration, and user data.
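
    As a quick, hedged example (output abbreviated), checking the processes that the bosserver controls on the file server used earlier might look like this:

    % bos status server01 
    Instance fs, currently running normally. 
        Auxiliary status is: file server running. 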

    Volume replication is an important part of AFS, especially if you are distributing a lot of read-only data such as applications. The concept is that you install a replica AFS fileserver with disk partitions of the same size and use that machine as a read-only server at another site or location (or possibly at your own site) to handle a heavy load. This means that if you have a writable AFS server at your main site, you can simply replicate to a site in another location or country. Once you have set up your replication server, you add the sites to the volumes that are to be replicated, using the vos command with the addsite argument. The remote01 argument is the name of the remote read-only server, vicepa is the vice partition on the replication server, and app.newprog is the volume to be replicated:

    [/afs/.mydomain.com/progs] -> vos addsite remote01 vicepa app.newprog 
        
    We then run vos release to replicate to the remote read-only server:

    [/afs/.mydomain.com/progs] -> vos release app.newprog 
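
    You can confirm the replication with vos examine, whose output lists the read/write site and every read-only site of the volume:

    [/afs/.mydomain.com/progs] -> vos examine app.newprog 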
    
    I find it very interesting how tightly this mechanism is built into the AFS design. Once you have installed the replication server, it is simple to replicate. How much would that cost with a commercial product? And how long has it taken many commercial products to catch up with this idea?

    AFS Client

    Many aspects of the AFS client are never seen by the actual users. The administrator will have installed the AFS client, which hopefully includes the files needed for cell configuration. If your client does anything more than access read-only applications, you will also need Kerberos client configuration so users can obtain Kerberos tickets and AFS tokens. You will also need to educate your users so that they can request tickets and tokens if you have not made that a transparent part of the login process. The name of the default cell is stored in a file named "ThisCell" under /usr/vice/etc/. The same directory contains a file named "CellServDB", which lists the IP addresses and DNS names of the database server machines in the cells you need to contact; don't forget that you may be contacting several different cells. The fs command mentioned earlier can show information about your cells:

    % fs listcells 
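
    For reference, ThisCell contains nothing but the cell name, and CellServDB uses a simple format: a line beginning with ">" names a cell, and the lines that follow give its database servers. The addresses and names below are, of course, made up:

    >mydomain.com            #Our example cell 
    192.168.10.11            #afsdb1.mydomain.com 
    192.168.10.12            #afsdb2.mydomain.com 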
    
    One of the vital features of the client is the cache manager, which is loaded into the kernel and is another of the special features of AFS. AFS uses a pre-defined area of disk, preferably a separate partition, and you should allow at least several hundred megabytes for caching. Files fetched from the file servers are placed in the cache. You will often read about the "callback" concept, which means that the server promises to notify the client if a cached file has changed and a newer version must be fetched. I always found this concept hard to believe, but it does work, and we have been amazed that many distributed applications run like locally installed applications because of the cache manager. The file /usr/vice/etc/cacheinfo contains the following information to define a 500 MB cache located under /usr/vice/cache:

    /afs:/usr/vice/cache:500000 
        
    There are many ways to measure, monitor, and tweak your cache, and you can investigate those yourself if you decide to test AFS. Here are two quick samples with the fs command:

    % fs getcacheparms 
    % fs setcachesize 
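
    For instance, checking the cache statistics and raising the cache to roughly 800 MB (the size is given in 1K blocks, the usage figure below is only illustrative, and the cache partition must have room for the new size) could look like this:

    % fs getcacheparms 
    AFS using 51234 of the cache's available 500000 1K byte blocks. 
    % fs setcachesize 800000 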
        
    The fs command can also be used to list and set server preferences; this can be valuable if some servers are having problems.

    One final note about the client -- your site will have to consider a strategy on how to incorporate the /afs namespace into your local file system hierarchy namespace. This should not be a big problem as you have probably already tackled such issues while mounting and accessing other types of distributed file systems.

    Users and PTS

    At the beginning of this article, I mentioned that we chose not to use AFS for the bulk of our writable project data and home directories. There is a very positive side to using AFS for applications and read-only site configuration data: the clients can use the applications without much regard to the Kerberos and pts user databases, which decreases administration in AFS, Kerberos, and pts. If you choose to use AFS for writable project data, then all users must be administered in both Kerberos (or the native authentication server) and pts. Attention must also be paid to how you will synchronize user names, uid numbers, groups, and passwords with the regular Unix accounts and any existing password schemes, such as NIS, that may be running at your site. We do have some project- and user-writable data in AFS, and I expect that many readers may be interested in using AFS for home directories and project data. This means that your decisions about Kerberos and user administration will greatly affect how you administer the site.

    The documentation at OpenAFS (which seems to be derived from the IBM/Transarc docs) assumes that you are using the built-in authentication server and the kas admin suite. There are other utilities that take advantage of this, making user and Kerberos administration much easier. My main point is that there are decisions to be made, and you must thoroughly consider your site's needs before deciding on a strategy.

    Here are a few examples of how it looks when you administer the pts database. Pts tracks and registers users, client machines, and groups in AFS. You must be a member of the third pre-defined protection group, system:administrators, to create user and machine entries. Please note that it is standard practice to map a separate Kerberos "admin" account for each administrator who will be listed in system:administrators. Once you start to create your regular users, care should be taken to match the Unix login name and uid number with the pts user name and id. If you have a Unix user named "jonesj" with uid 1099, you could create the corresponding pts user with the following command:

    % pts createuser jonesj 1099 
    
    You will also need to create this same user in Kerberos (or in the native authentication server, with the "kas" command), as both entries are needed for AFS access. You can look at your pts entries using the "examine" argument:

    % pts examine jonesj 
        
    It is also interesting that you can define machine entries. Once machines are added to a group, that group can be used on ACLs. Add to that the fact that wildcards can be used in machine addresses, and you can fine-tune exactly which networks may access your data.
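
    A hedged sketch of the idea (the address and the group name are placeholders): a machine entry is created with pts createuser using an IP address, where a 0 octet acts as a wildcard for the whole network, and the entry is then placed in a group that can be used on ACLs:

    % pts createuser 192.168.10.0 
    % pts creategroup admin-hosts 
    % pts adduser 192.168.10.0 admin-hosts 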

    Creating groups is straightforward and can even be done at the user level because AFS has the concept of group ownership. You can see who owns a group with the following:

    % pts listowned <user or group name or id> 
    
    To create a group, use the "creategroup" argument:

    % pts creategroup jonesj:testers 
    
    Then use "adduser" to place other users in your group:

    % pts adduser -user smithb -group jonesj:testers 
    
    The pts membership command shows either the groups to which a certain user belongs or the members of a group. The pts listentries command dumps all pts entries for users or groups.
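
    For example, using the group created above (the group id shown is made up; AFS group ids are negative numbers):

    % pts membership jonesj:testers 
    Members of jonesj:testers (id: -210) are: 
      jonesj 
      smithb 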

    As you can see from these few short examples, combining pts users, machines, and groups with ACLs provides deep possibilities for fine-tuning access to sensitive information. This is just the tip of the iceberg, and you'll have to study for a while if you are used to working with standard Unix permissions.

    Backup

    There are different methods for backing up AFS data. Many well-known commercial backup products do not support AFS. One option is to use the AFS backup utility, which works in coordination with the buserver. The backup utility supports a whole series of commands to define dump devices, schedule dumps, and administer backups.

    Our site has opted to use the vos dump command, incorporated into a Perl script that handles both full and incremental dumps over a monthly period. The volumes are dumped to disk, where our commercial backup product then backs up the dump files. Please note that this is not the recommended backup method. It works for our site because we have someone who can program in Perl, and we have tested and confirmed that this method can consistently back up and restore our AFS data.
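
    As a rough sketch of the idea (not our actual script; paths and dates are placeholders), a full dump, an incremental dump of changes since a given date, and a restore to a new volume might look like this:

    % vos dump -id app.newprog -time 0 -file /backup/app.newprog.full 
    % vos dump -id app.newprog -time "11/01/2006" -file /backup/app.newprog.incr 
    % vos restore server01 vicepa app.newprog.restored -file /backup/app.newprog.full 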

    Conclusion

    In this article, I've mentioned both the good points and some of the difficulties of AFS. There are many topics and issues that I did not have room for in this short introduction; whole books have been written on the subject. I didn't focus on server installation, setup, and administration because I don't think that would have given you a feel for how AFS works in a more practical sense.

    I did not write this article on a mission to convert you to AFS. I think it is beneficial for enterprise Unix administrators and architects to be aware of the AFS model, because it has been a frontrunner in the distributed file system world. Choosing AFS is a major decision for a site and requires planning and education. Using OpenAFS requires that you have the administrative resources, support, and knowledge to run both OpenAFS and Kerberos. I hope you can see that it is worth investigating the advantages of AFS with regard to security, availability, operation, and administration.

    A book that I recommend on this topic is Managing AFS: The Andrew File System by Richard Campbell (Prentice Hall). Despite being 8 years old, with some of its information now obsolete, it is considered a very informative and well-written book on AFS.

    Roger Feldman is a jack-of-all-trades Unix administrator who has been involved in just about everything at one time or another. He is currently working as a consultant for a major company from his home in Stockholm, Sweden. He can be contacted at: roger.feldman@bostream.nu.