Article sep2006.tar

Network Device Configuration Management

Anshuman Kanwar

Your most elaborate disaster recovery plans are only as good as your backups. In the context of routers (and most firewalls), all configuration is normally stored as a plain-text file in flash memory or some sort of NVRAM. Creating a replica of a router in case of catastrophic failure is simply a matter of physically plugging in a cold standby and copying the configuration from some backup medium onto the new device.

Rancid (Really Awesome New Cisco confIg Differ) is a tool that automates backing up device configurations. In this article, I will discuss how to install and maintain rancid and explore the benefits of having a working, up-to-date rancid repository.

What Is Rancid?

Rancid is a combination of shell, Perl, and Expect scripts that work together to provide configuration management. Although the name implies Cisco-only support, the tool has grown to work with a multitude of devices from most major vendors. Adding extensions for new device classes is also fairly easy. Details about supported devices can be found at:

http://www.shrubbery.net/rancid/ 
            
Rancid takes as input a list of device names and login credentials. It cycles through this list, logs into each device, runs a predetermined set of CLI commands, and collects the responses. It then filters the output against a set of patterns to remove values that change on every run (such as uptime) and sensitive information; for instance, this step can mask plain-text passwords or IPSec keys in the config file [1]. The filtered output is compared, using diff, against the output of the previous run. The diffs are emailed to a set of admins, and, finally, the new file is checked into a version control system. Both CVS and Subversion are supported (as of version 2.3.1).
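To make the filter-then-diff step concrete, here is a toy shell version. It is a sketch, not rancid's code: the function names normalize_config and report_changes are hypothetical, and the sketch handles only two illustrative cases (dropping lines that contain "uptime is" and masking "password" lines), whereas rancid's real filters are far more extensive and vendor specific.

```shell
# Hypothetical sketch of rancid's filter-then-diff idea; not rancid's own code.

# normalize_config FILE: drop a volatile line (uptime) and mask password lines,
# writing the result to stdout. " password x" becomes "! password <removed>".
normalize_config() {
    sed -e '/uptime is/d' \
        -e 's/^\([[:space:]]*\)password .*/!\1password <removed>/' \
        "$1"
}

# report_changes OLD NEW: print a unified diff of the two normalized configs.
# Exit status follows diff(1): 0 means no change, 1 means the configs differ.
report_changes() {
    old_norm=$(mktemp) || return 2
    new_norm=$(mktemp) || return 2
    normalize_config "$1" > "$old_norm"
    normalize_config "$2" > "$new_norm"
    diff -u "$old_norm" "$new_norm"
    rc=$?
    rm -f "$old_norm" "$new_norm"
    return "$rc"
}
```

With this sketch, two captures that differ only in uptime produce no diff, while a changed hostname is reported; that is the behavior that keeps rancid's hourly mail free of noise.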

Benefits

There are many uses for these collected configs besides the obvious disaster recovery scenario. In the first place, it becomes easy to search for a line of config across multiple devices. As a simple example, suppose you want to find all interfaces that lack a "description" line. A more involved example is finding switch ports that are not trunks and have "spanning-tree portfast" enabled but lack "bpduguard". Because rancid keeps this information on a local file system, you can parse, filter, and analyze it with standard text-manipulation tools such as grep, awk, or Perl instead of logging into every router.
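As a sketch of the first example, the awk-based function below (the name missing_descriptions is hypothetical) lists interfaces that have no description line. It assumes IOS-style files in which an interface's sub-commands are indented and the stanza ends at the next unindented line (such as "!"); other vendors would need different patterns.

```shell
# missing_descriptions FILE...: hypothetical helper that prints "file: interface"
# for every IOS-style interface stanza lacking an indented "description" line.
missing_descriptions() {
    awk '
        # Report the stanza just finished, if it had no description.
        function end_stanza() {
            if (intf != "" && !has_descr) print file ": " intf
            intf = ""
        }
        FNR == 1          { end_stanza(); file = FILENAME }
        /^interface /     { end_stanza(); intf = substr($0, 11); has_descr = 0; next }
        /^[^ ]/           { end_stanza(); next }   # unindented line ends a stanza
        /^ description /  { has_descr = 1 }
        END               { end_stanza() }
    ' "$@"
}
```

You might point it at a whole group, e.g. `missing_descriptions /home/rancid/data/corp-routers/configs/*`; the configs/ subdirectory is an assumption about the repository layout, so check where your own files live.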

The second win is inventory control and patch management. Since all routers are already in the configuration management system, you can generate a report of the software version running on each of them. Depending on the device type, you can also list installed modules, allowing you to track, for instance, how many of your network blades are Fast Ethernet (10/100) and how many are Gigabit Ethernet (10/100/1000).
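A version report can be as simple as the sketch below. It assumes that each stored Cisco file records the software version on a comment line of the form `!Image: Software: <image>, <version>, ...`, which is how rancid's Cisco output usually looks; verify this against your own files before relying on it. The function name version_report is hypothetical.

```shell
# version_report FILE...: hypothetical helper that counts how many files report
# each software version. Assumes "!Image: Software: <image>, <version>, ..." lines.
version_report() {
    awk -F', ' '/^!Image: Software:/ { print $2 }' "$@" |
        sort | uniq -c |
        awk '{ print $1, $2 }'    # "count version", without uniq's padding
}
```

The output is one "count version" pair per line, which is easy to paste into a patch-planning spreadsheet.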

The third benefit is that the operator does not have to log in to the device to look at a config or to confirm how something is implemented; he or she can simply pull up the configuration as a Web page [2]. Every session avoided on a live CLI is one less chance to make an accidental change.

Fourth, change tracking comes for free. Hourly cron jobs can mail config diffs to senior network engineers so that every change is seen. At the end of a change window, the tool can be run manually to report deltas across all device configs.

Finally, rancid includes a looking-glass service. This means that status check commands (such as "sho ip route <ip>" or "sho ip bgp summary") can be run by filling in a Web form. No direct ssh login to the router is required. You can see some examples of looking glasses at:


http://bgp4.net/wiki/doku.php?id=tools:ipv4_looking_glasses 

Installing the Dependencies

The following install instructions are based on Fedora Core 5. You can modify them to function with your packaging system or flavor of *nix:

# make sure the prerequisites are installed 
su - 
yum install gcc 
yum install make 
yum install automake 
yum install cvs        # installs tcsh as a dependency 
yum install perl 
yum install expect     # installs tcl as a dependency 
yum install diffutils 
exit 
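Before moving on, you may want to confirm that the tools actually ended up on your PATH. The small helper below, check_prereqs (a hypothetical name), prints each command it cannot find and returns non-zero if any are missing; the command names in the example line are my assumption about which binaries matter.

```shell
# check_prereqs CMD...: hypothetical helper that reports every command not
# found on $PATH and returns 1 if any are missing, 0 otherwise.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (command names are an assumption, not a list from the rancid docs):
# check_prereqs gcc make automake cvs perl expect diff
```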
Installing Rancid

The actual installation is fairly easy:

wget ftp://ftp.shrubbery.net/pub/rancid/rancid-2.3.1.tar.gz 
tar -xzvf rancid-2.3.1.tar.gz 
cd rancid-2.3.1 
view README
./configure --prefix=/home/rancid --localstatedir=/home/rancid/data/ 
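su                                  # no "-": stay in the build directory for make install
/usr/sbin/groupadd rancid           # full paths: plain su may not put /usr/sbin on PATH
/usr/sbin/useradd -g rancid rancid 
make install 
chown -R rancid:rancid /home/rancid/ 
exit 

Component Overview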

At the end of the above steps, all rancid-related files are installed under /home/rancid. The collected configurations will live under /home/rancid/data. The main config file will be /home/rancid/etc/rancid.conf. All interesting scripts live under /home/rancid/bin. The login credentials will be stored in /home/rancid/.cloginrc.

The top-level script, bin/rancid-run, is typically run via cron. It calls bin/control_rancid, the main script, which reads each group's router.db file (the list of routers to be collected), launches a collection process per router (several routers can be collected simultaneously), and emails the resulting diffs once collection is finished. For each router, bin/control_rancid calls bin/rancid-fe. If you peruse this script, you will discover that it acts as a switchboard that launches the appropriate collection script for every platform. Here are a few lines from bin/rancid-fe:

elsif ($vendor =~ /^baynet$/i)     { exec('brancid', $router); } 
elsif ($vendor =~ /^cat5$/i)       { exec('cat5rancid', $router); } 
elsif ($vendor =~ /^cisco$/i)      { exec('rancid', $router); } 

This implies that every platform has a {$plat}rancid script that actually runs the collection commands [3]. These {$plat}rancid scripts are mostly written in Perl. The final pieces of the puzzle are the bin/{$plat}login scripts: Expect scripts that perform the actual password exchange and provide the authenticated CLI on which the {$plat}rancid scripts run their commands.
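The dispatch quoted above boils down to a lookup from device type to collector script. The shell function below restates just those three branches for illustration, including the case-insensitive match that Perl's /i modifier provides. collector_for is a hypothetical name; the real bin/rancid-fe is Perl and recognizes many more device types.

```shell
# collector_for TYPE: hypothetical shell restatement of three rancid-fe
# branches; prints the collector script name for a router.db device type.
collector_for() {
    vendor=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # like Perl's /i
    case $vendor in
        baynet) echo brancid ;;
        cat5)   echo cat5rancid ;;
        cisco)  echo rancid ;;
        *)      echo "unknown device type: $1" >&2; return 1 ;;
    esac
}
```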

Also note that in addition to grabbing the config (by running show running-config on Cisco, for example), the {$plat}rancid scripts run a number of informational commands, such as show module and show vlan. Because this output is purely informational and is not needed to restore the device configuration, each line of it is stored in the repository behind a comment character (e.g., "!" for Cisco).
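Because of that convention, stripping every line that starts with "!" from a stored Cisco file is a quick way to see only the configuration text, for example when preparing a replacement device. The sketch below (config_body is a hypothetical name) does exactly that; note that it also drops IOS's own "!" separator lines, which are comments anyway, and any password lines rancid has commented out, so those must be restored separately.

```shell
# config_body FILE: hypothetical helper that drops every line beginning with
# "!" (rancid's informational comments, IOS's "!" separators, and any password
# lines rancid commented out), leaving only configuration commands.
config_body() {
    grep -v '^!' "$1"
}
```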

Configuring Rancid

Once you have Rancid installed, you're ready for some configuration. To begin, edit the main config file. Only one line needs to be updated:

# as user rancid 
vi /home/rancid/etc/rancid.conf 
    
Uncomment the LIST_OF_GROUPS variable and set it to your own groups. You can group devices by geographic location, administrative boundary, or device type. For this example, I have created two groups, corp-routers and corp-firewalls:

LIST_OF_GROUPS="corp-routers corp-firewalls"

If you have a large number of devices, it may make sense to collect more than one at a time; to do this, raise the PAR_COUNT variable. Two other useful variables relate to security. FILTER_PWDS can be set to NO if you want to keep passwords visible in the stored configs. Conversely, setting NOCOMMSTR hides the SNMP community strings, which are otherwise kept. As an example of password blanking, the following:

line vty 0 4 
 ... 
 password S0mepMd% 
    
is saved in the repository as:

line vty 0 4 
 ... 
! password <removed> 
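Pulling these settings together, the relevant part of etc/rancid.conf might look like the fragment below. rancid.conf is read as Bourne shell, which is why the settings are plain assignments, exported so that child scripts see them. The values, in particular PAR_COUNT=5, are illustrative rather than recommendations; leave the rest of the stock file as shipped.

```shell
# Illustrative excerpt of /home/rancid/etc/rancid.conf (read as Bourne shell).
# Example values only; keep the rest of the stock file unchanged.
LIST_OF_GROUPS="corp-routers corp-firewalls"; export LIST_OF_GROUPS
PAR_COUNT=5; export PAR_COUNT         # collect up to five devices at a time
FILTER_PWDS=YES; export FILTER_PWDS   # mask passwords; NO keeps them visible
NOCOMMSTR=YES; export NOCOMMSTR       # also hide SNMP community strings
```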
For every group defined in LIST_OF_GROUPS, rancid requires two email addresses: rancid-<groupname>, which receives the config diffs, and rancid-admin-<groupname>, which receives administrative messages such as collection errors. This lets you route the two kinds of mail to different "noc" personnel. Create the required aliases according to your organizational structure:

  su - 
  cat - << END_ALIASES >> /etc/aliases 
rancid-admin-corp-routers: noc,bofh 
rancid-admin-corp-firewalls: noc,bofh 
rancid-corp-routers: noc 
rancid-corp-firewalls: noc 
noc: noc@<yourcompany.com>
bofh: <yourname>@<yourcompany.com>
END_ALIASES

 newaliases 
    
Manually send mail to these accounts to make sure the setup works.
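If you have many groups, typing the aliases by hand gets tedious. The hypothetical helper below, gen_aliases, prints the two entries for each group in a space-separated list; the recipients (noc,bofh and noc) are just the example values used above. The commented lines show how you might append its output and then send a test message with mail(1).

```shell
# gen_aliases "GROUPS" ADMIN_RCPTS DIFF_RCPTS: hypothetical helper that prints
# the rancid-admin-<group> and rancid-<group> alias lines for each group.
gen_aliases() {
    for group in $1; do                 # word-split the space-separated list
        echo "rancid-admin-$group: $2"
        echo "rancid-$group: $3"
    done
}

# Example (as root):
#   gen_aliases "corp-routers corp-firewalls" noc,bofh noc >> /etc/aliases
#   newaliases
#   echo test | mail -s "rancid alias test" rancid-corp-routers
```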

Login Credentials

Rancid needs a login name and password(s) to retrieve the configuration from each device. These credentials are entered in the file ~/.cloginrc. If you already run an AAA system, such as Cisco ACS or OpenRadius, create a user in that system and grant it enough privilege to run the commands listed in the corresponding bin/{$plat}rancid script. Assuming you have created a user named "rancid", this user must be allowed to run the following commands on a Cisco router (line 1010 onwards in bin/rancid):

@commands=( 
        "show version", 
        "show boot", 
        "show flash", 
        "dir bootflash:", 
        "dir slot0:", 
        "dir slot1:", 
        "dir sup-bootflash:", 
        "dir sup-microcode:", 
        "show module", 
        "show port ifindex", 
        "write term all", 
        "write term"
); 
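Rather than retyping that list into your AAA server, you can pull it out of the collector script. The sed one-liner below is a sketch that assumes the layout shown above: an `@commands=(` line, one double-quoted command per line, and a closing `);`. The function name list_commands is hypothetical, and other {$plat}rancid scripts may format their command lists differently.

```shell
# list_commands SCRIPT: hypothetical helper that prints each double-quoted
# command between "@commands=(" and ");" in a rancid collector, one per line.
list_commands() {
    sed -n '/@commands=(/,/);/ s/^[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example: list_commands /home/rancid/bin/rancid
```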
Here is how to set up the .cloginrc file:

 # as user rancid  
 cd ~ 
 cat - << END_CLOGIN >> .cloginrc 
add user * rancid 
add password routername1  LoginPwdA EnablePwdA 
add password *.sfo  LoginPwdB EnablePwdB 
add method routername3 ssh 
add method * telnet 
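END_CLOGIN 

As shown above, wildcards can be used to cover multiple devices. clogin uses the first entry that matches a device, so list specific devices before wildcard entries, as done here. This file must be accessible by its owner only: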

 chmod 600 .cloginrc 
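Since a world-readable .cloginrc exposes every device password, it is worth checking its mode periodically, for example from cron. The helper below is a hypothetical sketch (check_cloginrc is not part of rancid) that relies on GNU stat's `-c %a` format, as found on Fedora.

```shell
# check_cloginrc FILE: hypothetical check that FILE has mode 600 (owner
# read/write only). Relies on GNU stat's "-c %a" (octal permission bits).
check_cloginrc() {
    mode=$(stat -c %a "$1") || return 1
    if [ "$mode" != 600 ]; then
        echo "$1 has mode $mode; fix with: chmod 600 $1" >&2
        return 1
    fi
}

# Example: check_cloginrc "$HOME/.cloginrc"
```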
Test, Test, Test

At this point, you should be able to log into your devices using the bin/{$plat}login scripts. For Cisco, this script is bin/clogin. To test, run:

$ bin/clogin routername1 
 spawn ssh -c 3des -x -l rancid routername1 
 rancid@routername1's password: 
 routername1>enable 
 Password: 
 routername1# 

Now you are logged in as you would be by manually typing in your username and password. Type in a command or two to verify functionality:

 routername1#sho clock 
 *13:43:17.136 PDT Wed Apr 5 2006 
 routername1#exit 
 Connection to routername1  closed. 
$ 

Create the Repository

The next step is to create the initial directory structure that will store the configs. To do this, run the initialization command:

 # as user rancid 
 bin/rancid-cvs 

This command creates a separate directory under /home/rancid/data/ for every group listed in the LIST_OF_GROUPS variable in etc/rancid.conf. Now you can add the names of the actual devices to rancid. For example, to put routername1 and rtr2 in corp-routers and fw1 and fw2 in corp-firewalls, append them to each group's /home/rancid/data/<groupname>/router.db file:

 cat - << END_RTRS >>  /home/rancid/data/corp-routers/router.db 
routername1:cisco:up
rtr2:juniper:up
END_RTRS 

 cat - << END_FWS >>  /home/rancid/data/corp-firewalls/router.db 
fw1:netscreen:up
fw2:cisco:up
END_FWS 

Note that the file format is <devicename>:<devicetype>:<state>; only devices whose state is "up" are collected. The list of all possible device types can be determined by peering into the bin/rancid-fe script.
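To see which devices a router.db will actually collect, and to catch typos in the colon-separated format, you could use a small awk filter such as the hypothetical collect_list below. It follows the three-field layout described above, skips blank lines, and reports lines that do not have exactly three fields; it does not check whether the device type is one that rancid-fe knows.

```shell
# collect_list ROUTER_DB: hypothetical helper that prints the names of devices
# whose state is "up" and warns on stderr about lines without three fields.
collect_list() {
    awk -F: '
        NF == 0    { next }   # ignore blank lines
        NF != 3    { printf("line %d: expected name:type:state\n", NR) > "/dev/stderr"; next }
        $3 == "up" { print $1 }
    ' "$1"
}

# Example: collect_list /home/rancid/data/corp-routers/router.db
```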

Finally, you can run the rancid collection on all devices:

 # as user rancid 
 bin/rancid-run 
    
If there is a problem, check the log files in /home/rancid/data/logs:

 cd data/logs 
 tail -f "$(ls -tr | tail -1)" 

Note that a new log file is created per group per run.

Day-to-Day Operation

It is typical to run rancid every few hours (or hourly) via a cron job. A second cron job can be used to clean out old log files:

 # as user rancid 
 crontab -l 
 0 */4 * * * /home/rancid/bin/rancid-run 
 # clean out config differ logs 
 45 22 * * * /usr/bin/find /home/rancid/data/logs -type f -mtime +4 -exec rm {} \; 

After the initial install phase, adding a new router group is a simple process:
    1. Edit LIST_OF_GROUPS in etc/rancid.conf.
    2. Add the two corresponding aliases in /etc/aliases; run newaliases.
    3. Create subtree for this group using bin/rancid-cvs.
    4. Add the routers in router.db for the new group.
Then either run the collection by hand (bin/rancid-run) or wait for the next cron collection cycle.
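Step 1 is the easiest one to get wrong by hand. The hypothetical add_group function below appends a group name to LIST_OF_GROUPS and does nothing if the group is already listed. It assumes the variable is set on a single uncommented line with double quotes, as in the example earlier, and it rewrites the file through a temporary copy rather than relying on GNU sed's -i option.

```shell
# add_group CONF GROUP: hypothetical helper that appends GROUP to the
# LIST_OF_GROUPS="..." line in CONF unless it is already present.
add_group() {
    conf=$1 group=$2
    current=$(sed -n 's/^LIST_OF_GROUPS="\([^"]*\)".*/\1/p' "$conf")
    case " $current " in
        *" $group "*) return 0 ;;      # already listed; nothing to do
    esac
    tmp=$(mktemp) || return 1
    # Insert " GROUP" just before the closing quote of the assignment.
    sed "s/^\(LIST_OF_GROUPS=\"[^\"]*\)\"/\1 $group\"/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"           # cat preserves CONF's owner and mode
    rm -f "$tmp"
}

# Example (as user rancid): add_group /home/rancid/etc/rancid.conf lab-switches
```

The remaining steps (aliases, bin/rancid-cvs, router.db) still follow the list above.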

Also note that it is easy to collect individual routers by using -r <routername> as an argument to bin/rancid-run. This comes in useful to document config changes after a change window.

Adding a Repository Viewer

So far, we have deployed a great data collection system, but its only interface is the command line. To make the data easier to reach, we can add a Web front end. CVSweb is the perfect workhorse for this purpose. Here is how to install it on Fedora Core 5:

 yum install httpd 
 yum install cvsweb 
 yum install cvsgraph # optional 

The only required configuration is to edit cvsweb.conf and point it at the right repository:

 vi /etc/cvsweb/cvsweb.conf 
 # change the below line 
 'local'   => ['Local Repository', '/home/rancid/data/CVS'], 
 # now start httpd 
 /etc/init.d/httpd start 

You should now be able to point your browser at http://yourserver/cgi-bin/cvsweb.cgi and click through your configurations with ease. For added security, my own setup uses mod_ssl and mod_auth_ldap to demand authentication over an SSL channel before allowing access to either the data or the looking glass, which I will cover next.

Looking Glass

To install the looking-glass scripts, copy bin/lg.cgi and bin/lgform.cgi to the cgi-bin directory of your Web server. The simplest way to deal with permissions is to run Apache (or any other HTTP server) as the same user that owns the repository (i.e., rancid). The looking glass also needs the LockFile::Simple Perl module:

# as root 
perl -MCPAN -e 'install LockFile::Simple' 
cp /home/rancid/bin/lg* /var/www/cgi-bin/ 
# to run Apache as rancid, set "User rancid" and "Group rancid" in /etc/httpd/conf/httpd.conf
/etc/init.d/httpd restart

The config file for the looking glass is /home/rancid/etc/lg.conf. Edit this file and set LG_CACHE_DIR to /home/rancid/data/tmp and LG_CLOGINRC to /home/rancid/.cloginrc.

You should now be able to browse to http://yourserver/cgi-bin/lgform.cgi and take the looking glass for a drive.

Summary

Rancid combines a slew of open source tools to create a comprehensive, extensible, and compelling configuration management system for network devices. Kept up to date, the config repository will get you out of many a fix. I know it has saved me more than once!

Footnotes

1. If you use password masking, keep a secure copy of the masked passwords and keys elsewhere; they will not be in the repository.

2. This requires CVSweb in addition to rancid.

3. The Cisco script is simply called bin/rancid.

Anshuman Kanwar has dabbled in Unix administration, datacenter design, and computer security for the past five years, realizing in the process that perhaps the most valuable class he took in grad school was "Time Management -- expedited". Currently he works at Citrix Online as their Network Architect. He can be reached at: human@digitarchy.com.