Network Device Configuration Management
Anshuman Kanwar
Your most elaborate disaster recovery
plans are only as good as your backups. In the context of routers (and most
firewalls), all configuration is normally stored as a plain-text file in
flash memory or some sort of NVRAM. Creating a replica of a router in case
of catastrophic failure is simply a matter of physically plugging in a cold
standby and copying the configuration from some backup medium onto the new
device.
Rancid (Really Awesome New Cisco confIg Differ) is a
tool that automates the process of backing up
device configuration. In this article, I will discuss how to install and
maintain rancid and also explore the benefits that result from having a
working, up-to-date rancid repository.
What Is Rancid?
Rancid is a combination of shell, Perl, and Expect
scripts that work together to provide configuration management. Although
the name implies Cisco-only support, the tool has grown to work with a
multitude of devices from most major vendors. Adding extensions for new
device classes is also fairly easy. Details about supported devices can be
found at:
http://www.shrubbery.net/rancid/
Rancid takes as input a list of device names and login
credentials. It then cycles through this list and attempts to log into each
device. Then, it runs a pre-determined set of commands at the CLI and
collects the responses. Subsequently, it tries to match the generated
output against a template to filter out incrementing changes and sensitive
information. For instance, this step can mask out plain-text passwords or
IPSec keys from the config file [1]. This output is compared, using diff,
against the one generated via the prior run of the same process. These
diffs are emailed to a set of admins, and, finally, the new file is checked
into a version control system. Both CVS and Subversion are supported (as of
version 2.3.1).
Benefits
There are many uses for these collected configs
besides the obvious disaster recovery scenario. In the first place, it
becomes pretty easy to search for a line of config across multiple devices.
As a simple example, suppose you want to find all interfaces that lack a
"descr" line. A more involved example could be the case where
you want to find any switch ports that are non-trunk and have
"spanning-tree portfast" enabled but may not have "bpduguard"
enabled. With rancid, this information is available on a local file
system, so instead of logging into every router, you can parse, filter,
and analyze it using standard text-manipulation tools such as grep, awk,
or Perl.
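As a minimal sketch of the first example, the function below walks saved Cisco-style configs with awk and reports every interface block that has no description line. The function name and the assumption that sub-commands are indented under each "interface" line are illustrative; other platforms lay out their configs differently.

```shell
# List interfaces with no "description" line in saved Cisco-style configs.
# Usage: interfaces_without_description FILE...
interfaces_without_description() {
    awk '
        # Print the interface being tracked if no description was seen for it.
        function report() {
            if (iface != "" && !has_desc) print file ": " iface
            iface = ""
        }
        FNR == 1          { report() }    # a new input file begins
        /^interface /     { report(); iface = $2; file = FILENAME; has_desc = 0; next }
        /^[^ ]/           { report(); next }    # any unindented line ends the block
        /^ +description / { if (iface != "") has_desc = 1 }
        END               { report() }
    ' "$@"
}
```

It could be pointed at every saved config of a group at once, for example all files in a (hypothetical) /home/rancid/data/corp-routers/configs directory.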
The second win is for inventory control and patch
management. Since all routers are already in
the configuration management system, a report can be generated enumerating
the code version across them. Depending on the device type, a list of
modules can be created allowing you to track, for instance, how many of
your network blades are Fast Ethernet (10/100) and how many are Gigabit
Ethernet (10/100/1000).
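A rough sketch of such a version report follows. It assumes Cisco IOS-style configs, which carry a top-level "version X.Y" line; the function name is made up for illustration, and other platforms would need a different pattern.

```shell
# Count how many saved configs report each "version X.Y" line.
# Usage: tally_versions FILE...
tally_versions() {
    awk '/^version / { print $2 }' "$@" |
        sort |
        uniq -c |
        awk '{ print $2 ": " $1 }'    # reorder "count version" as "version: count"
}
```

Each output line is a version followed by the number of devices running it, which is often enough to spot the stragglers that still need an upgrade.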
The third benefit is that the operator does not have
to log in to the device to look at a config or to confirm how to implement
something. He or she can simply pull up the configuration as a Web page
[2]. Human error is therefore reduced in proportion to the decreased
number of operations performed directly on the CLI.
Fourth, change management and tracking are
automatically implemented. Hourly cron jobs can be set up to mail config
diffs to senior network engineers so that all changes can be tracked. After
the end of change windows, the tool can be run manually to report deltas
across all device configs.
Finally, rancid includes a looking-glass service.
This means that status check commands (such as "sho ip route
<ip>" or "sho ip bgp summary") can be run by
filling in a Web form. No direct ssh login to the router is required. You
can see some examples of looking glasses at:
http://bgp4.net/wiki/doku.php?id=tools:ipv4_looking_glasses
Installing the Dependencies
The following install instructions are based on Fedora
Core 5. You can modify them to function with your packaging system or
flavor of *nix:
# make sure the pre-requisites are installed
su -
yum install gcc
yum install make
yum install automake
yum install cvs # installs tcsh as a dependency
yum install perl
yum install expect # installs tcl as a dependency
yum install diffutils
exit
Installing Rancid
The actual installation is fairly easy:
wget ftp://ftp.shrubbery.net/pub/rancid/rancid-2.3.1.tar.gz
tar -xzvf rancid-2.3.1.tar.gz
cd rancid-2.3.1
view README
./configure --prefix=/home/rancid --localstatedir=/home/rancid/data/
su -
groupadd rancid
useradd rancid -g rancid
make install
chown -R rancid:rancid /home/rancid/
exit
Component Overview
At the end of the above steps, all rancid-related
files are installed under /home/rancid. The collected configurations will
live under /home/rancid/data. The main config file will be
/home/rancid/etc/rancid.conf. All interesting scripts live under
/home/rancid/bin. The login credentials will be stored in
/home/rancid/.cloginrc.
The top-level script, bin/rancid-run, is
typically run via cron. It in turn calls bin/control_rancid. This is the
main script responsible for launching a collection process per router. More
than one router can be collected simultaneously. This script manages the
router.db files, which define the lists of routers that need to be
collected; it also emails the diffs once data collection is done.
In turn, bin/control_rancid calls bin/rancid-fe. If you peruse rancid-fe, you will
discover that it acts as the switchboard that launches an appropriate
collection script for every platform. Here are a few lines from
bin/rancid-fe:
elsif ($vendor =~ /^baynet$/i) { exec('brancid', $router); }
elsif ($vendor =~ /^cat5$/i) { exec('cat5rancid', $router); }
elsif ($vendor =~ /^cisco$/i) { exec('rancid', $router); }
This implies that every platform has a {$plat}rancid script that
actually runs the collection commands [3]. These {$plat}rancid scripts are mostly
written in Perl. The final pieces of the puzzle are bin/{$plat}login scripts. These are
Expect scripts that perform the actual password exchange and provide the
authenticated CLI for the {$plat}rancid scripts to run commands on.
Also note that in addition to grabbing the config by
running sho running-config on Cisco, for example, the {$plat}rancid scripts run a multitude of informational commands, such as sho modules and sho vlan. As the output of
these commands is strictly informational in nature and is not required for
restoring device configuration, it is put in the repository preceded by a
comment character (e.g., a "!" for Cisco).
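Because the informational output is stored as comments, the restorable part of a saved Cisco config can be recovered by dropping the comment lines. The helper below is only a sketch with a made-up name, and it assumes "!" is the comment character:

```shell
# Print a saved Cisco-style config without its "!" comment lines.
# Usage: restorable_config FILE
restorable_config() {
    grep -v '^!' "$1"
}
```

Note that masked password lines are also stored as comments, so they are dropped too and must be restored from another source.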
Configuring Rancid
Once you have Rancid installed, you're ready for
some configuration. To begin, edit the main config file. Only one line
needs to be updated:
# as user rancid
vi /home/rancid/etc/rancid.conf
Uncomment the LIST_OF_GROUPS variable. You can organize devices as groups based on
geographical location or based on administrative boundaries or device
types. For this example, I have created two groups: corp-routers and
corp-firewalls:
LIST_OF_GROUPS="corp-routers corp-firewalls"
If you have a large number of devices, it may make
sense to collect more than one at a time. To do this, set the variable PAR_COUNT to a higher number.
Two other useful variables relate to security. FILTER_PWDS can be set to NO if you
want to keep the passwords visible in the configs. Conversely, NOCOMMSTR can be set if
you want to hide the SNMP community strings in the configs. As an example of
password blanking, the following:
line vty 0 4
...
password S0mepMd%
is saved in the repository as:
line vty 0 4
...
! password <removed>
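To make the effect concrete, here is a rough imitation of that blanking step written with sed. It is not Rancid's own filter, which lives in the Perl collection scripts and covers many more cases; the function name is hypothetical.

```shell
# Replace plain "password ..." lines with a commented-out placeholder.
# Usage: mask_passwords FILE...
mask_passwords() {
    sed -E 's/^[[:space:]]*password[[:space:]].*$/! password <removed>/' "$@"
}
```

Running a config through this function before sharing it gives output in the same shape as the repository copy shown above.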
For every group defined in LIST_OF_GROUPS, Rancid requires two
email addresses: rancid-admin-<groupname> and rancid-<groupname>. This allows flexibility in assigning
roles to different "noc" personnel. Create the required aliases
according to your organizational structure:
su -
cat - << END_ALIASES >> /etc/aliases
rancid-admin-corp-routers: noc,bofh
rancid-admin-corp-firewalls: noc,bofh
rancid-corp-routers: noc
rancid-corp-firewalls: noc
noc: noc@<yourcompany.com>
bofh: <yourname>@<yourcompany.com>
END_ALIASES
newaliases
Manually send mail to these accounts to make sure the
setup works.
Login Credentials
Rancid needs to be provided with a login name and
password(s) such that configuration can be obtained from the device. These
credentials are entered in the file ~/.cloginrc. If you have a AAA system,
such as Cisco ACS server or OpenRadius, already installed, create a user in
that system and allow it enough privilege to run commands listed in the
corresponding bin/{$plat}rancid script. Let's assume that you have created a
user named "rancid". Then this user "rancid" must
be allowed to run the following commands for a Cisco router (line 1010
onwards in bin/rancid):
@commands=(
"show version",
"show boot",
"show flash",
"dir bootflash:",
"dir slot0:",
"dir slot1:",
"dir sup-bootflash:",
"dir sup-microcode:",
"show module",
"show port ifindex",
"write term all",
"write term"
);
Here is how to set up the .cloginrc file:
# as user rancid
cd ~
cat - << END_CLOGIN >> .cloginrc
add user * rancid
add password routername1 LoginPwdA EnablePwdA
add password *.sfo LoginPwdB EnablePwdB
add method routername3 ssh
add method * telnet
END_CLOGIN
As shown above, wildcards can be used to specify
multiple devices. This file must be accessible by the owner only:
chmod 600 .cloginrc
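Since the file holds enable passwords, it may be worth checking the mode from a script as well, for instance before each collection run. The following is a small sketch with a made-up function name; it relies only on the standard find options -prune and -perm.

```shell
# Succeed only if FILE has exactly mode 600 (read/write for the owner only).
# Usage: cloginrc_is_private FILE
cloginrc_is_private() {
    # find prints the path only when the permission bits are exactly 600.
    [ -n "$(find "$1" -prune -perm 600)" ]
}
```

A wrapper could then refuse to start a collection with something like: cloginrc_is_private /home/rancid/.cloginrc || exit 1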
Test, Test, Test
At this point, you should be able to log into your
devices using the bin/{$plat}login scripts. For Cisco, this script is bin/clogin.
To test, run:
$ bin/clogin routername1
spawn ssh -c 3des -x -l rancid routername1
rancid@routername1's password:
routername1>enable
Password:
routername1#
Now you are logged in as you would be by manually
typing in your username and password. Type in a command or two to verify
functionality:
routername1#sho clock
*13:43:17.136 PDT Wed Apr 5 2006
routername1#exit
Connection to routername1 closed.
$
Create the Repository
The next step is to create the initial directory
structure that will store the configs. To do this, run the initialization
command:
# as user rancid
bin/rancid-cvs
Running this command creates a separate directory
under /home/rancid/data/ for every group specified in the LIST_OF_GROUPS variable in
etc/rancid.conf. Now you can add the names of the actual devices into
rancid. For example, to put routername1 and rtr2 in corp-routers and fw1
and fw2 in corp-firewalls, you need to edit the /home/rancid/data/$groupname/router.db files:
cat - << END_RTRS >> /home/rancid/data/corp-routers/router.db
routername1:cisco:up
rtr2:juniper:up
END_RTRS
cat - << END_FWS >> /home/rancid/data/corp-firewalls/router.db
fw1:netscreen:up
fw2:cisco:up
END_FWS
Note that the file format is
<devicename>:<devicetype>:<status>, where a device is collected only
if its status is "up". The list
of all possible device types can be determined by peering into the
bin/rancid-fe script.
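A typo in router.db is easy to make and only shows up later in the logs, so a quick format check can save a collection cycle. The sketch below is not part of Rancid; the function name is made up, and it assumes that "down" is the non-collecting status and that lines starting with "#" are comments.

```shell
# Report router.db lines that are not of the form name:type:up or name:type:down.
# Usage: check_router_db FILE   (exit status is non-zero if any line is bad)
check_router_db() {
    awk -F: '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines (assumed convention)
        NF != 3 || $1 == "" || $2 == "" || ($3 != "up" && $3 != "down") {
            printf "%s:%d: bad entry: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit (bad + 0) }
    ' "$1"
}
```

It checks only the shape of each line; whether the device type is one that bin/rancid-fe knows about still has to be verified by hand.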
Finally, you can run the rancid collection on all
devices:
# as user rancid
bin/rancid-run
If there is a problem, check the log files in /home/rancid/data/logs:
cd data/logs
tail -f `ls -tr | tail -1`
Note that a new log file is created per group per run.
Day-to-Day Operation
It is typical to run rancid every few hours (or
hourly) via a cronjob. A second cron job can be used to clean out old log
files:
# as user rancid
crontab -l
0 */4 * * * /home/rancid/bin/rancid-run
# clean out config differ logs
45 22 * * * /usr/bin/find /home/rancid/data/logs -type f -mtime +4 -exec rm {} \;
After the initial install phase, adding a new router
group is a simple process:
1. Edit LIST_OF_GROUPS in etc/rancid.conf.
2. Add the two corresponding aliases in /etc/aliases; run newaliases.
3. Create subtree for this group using bin/rancid-cvs.
4. Add the routers in router.db for the new group.
Then either run the collection by hand (i.e., bin/rancid-run) or wait
for the next cron collection cycle.
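The alias step is the one most easily forgotten, so it can help to generate the two lines rather than type them. This is a small sketch; the function name and its argument order are invented for illustration.

```shell
# Print the two mail aliases Rancid expects for a group.
# Usage: group_aliases GROUP ADMIN_RECIPIENTS DIFF_RECIPIENTS
group_aliases() {
    printf 'rancid-admin-%s: %s\n' "$1" "$2"
    printf 'rancid-%s: %s\n' "$1" "$3"
}
```

As root, the output can be appended to /etc/aliases and followed by newaliases, for example for a hypothetical corp-switches group: group_aliases corp-switches noc,bofh noc >> /etc/aliases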
Also note that it is easy to collect individual
routers by using -r <routername> as an argument to
bin/rancid-run. This is useful for documenting config changes after a
change window.
Adding a Repository Viewer
So far, we have deployed a great data collection
system, but the only interface it has is the CLI. To enhance usability and
access to the data, we can add a Web wrapper. CVSweb is the perfect
workhorse for this purpose. Here is how to install it on Fedora Core 5:
yum install httpd
yum install cvsweb
yum install cvsgraph # optional
The only required configuration is to edit cvsweb.conf
and point it at the right repository:
vi /etc/cvsweb/cvsweb.conf
# change the below line
'local' => ['Local Repository', '/home/rancid/data/CVS'],
# now start httpd
/etc/init.d/httpd start
You should be able to point your browser at http://yourserver/cgi-bin/cvsweb.cgi and click through your configurations with ease. For added security, my own
setup uses mod_ssl and mod_auth_ldap to demand authentication over an SSL
channel before allowing access to either the
data or the looking glass, which I will cover next.
Looking Glass
To install the looking-glass scripts, copy bin/lg.cgi
and bin/lgform.cgi to the cgi-bin directory of your Web server. The
simplest way to deal with permissions is to make sure that Apache (or any
other http server) is running as the same user that owns the repository
(i.e., rancid):
# as root
perl -MCPAN -e 'install LockFile::Simple'
cp /home/rancid/bin/lg* /var/www/cgi-bin/
The config file for the looking glass is /home/rancid/etc/lg.conf. Edit this file and change $LG_CACHE_DIR to /home/rancid/data/tmp and $LG_CLOGINRC to /home/rancid/.cloginrc.
You should now be able to browse to http://yourserver/cgi-bin/lgform.cgi and take the looking glass for a drive.
Summary
Rancid combines a slew of open source tools to create
a comprehensive, extensible, and compelling
configuration management system for network devices. If maintained
correctly, the config repository will save your chops in many a fix. I know
that it has saved mine many a time!
Footnotes
1. Make sure these are saved elsewhere manually, if
you decide to use password masking.
2. Cvsweb is needed in addition to rancid.
3. The Cisco script is simply called bin/rancid.
Anshuman Kanwar has dabbled in Unix administration,
datacenter design, and computer security for the past five years, realizing
in the process that perhaps the most valuable class he took in grad school
was "Time Management -- expedited". Currently he works at
Citrix Online as their Network Architect. He can be reached at: human@digitarchy.com.