Automating the Management of Network Devices through the Command Line
Alan Holt
Many communications devices, typically in ISP networks, can be managed via a command-line interface (CLI), either from a console port or remotely by logging in from a telnet client. Managing a large domain of devices, such as ATM switches, DSLAMs (Digital Subscriber Line Access Multiplexers), and routers, can be problematic [1], however, as it involves establishing a telnet session to each device and typing in a set of commands. Furthermore, there are security issues with using telnet for remote access, in that passwords are transmitted over the network in clear text.
Consider the scenario depicted in Figure 1, where management of an ISP's network is outsourced to an external organization. Direct telnet access, from the support organization's network, to the managed devices cannot be allowed because packets (containing password information) traverse the public Internet. This problem is resolved by establishing a secure shell session (using SSH [2]) to a jump server located within the ISP's network. A telnet client is then initiated from the jump server. Thus, remote management is a somewhat convoluted procedure involving the initiation of both SSH and telnet clients as well as multiple authentication procedures.
It would be useful to automate certain management functions such that they could be carried out on multiple machines. This can be done by piping a series of command statements through the telnet process. The command-line interface, however, does not really provide a convenient "API" for the purpose of scripting management functions. The output from commands will be presented in a format intended for viewing on the screen, which can make post-processing in a script difficult. In this article, I'll describe how to automate the remote management of a domain of network devices through their command-line interface using the Python programming language [3]. The scripts presented in the article are specific to the Lucent Stinger DSLAM but could be used, with some modifications, for other types of devices with command-line user interfaces.
Automated Management Using Bash
To begin, I will show the manual procedure for logging onto a DSLAM and issuing the date command. Rather than logging onto the jump server (with ssh) and then running a telnet client from the jump server to the DSLAM, I start an ssh tunnel. The diagram in Figure 2 depicts the telnet session through the ssh tunnel. The ssh tunnel is created with the command:
$ ssh -N jump.myisp.net -L 9012:st1.myisp.net:23 &
where jump.myisp.net is the jump server and st1.myisp.net is the DSLAM. In this case, the ssh tunnel is completed without a password. This is because I have my public-key configured on the jump server. Now I can telnet through the tunnel from my local machine (from within the firewall):
$ telnet localhost 9012
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
(City Stgr 1) Enter password:
User: myadm
Password:
myadm> date
Mon Feb 19 11:47:37 2007
myadm> quit
Connection closed by foreign host.
It is important to go through this process to ensure that you understand the login procedure for the type of device you're managing. We can see here, for the Stinger, that the user must supply a telnet password. On acceptance, the user is prompted for an account name and then another password (neither of the passwords are echoed to the screen). On successful completion of this procedure, the command-line prompt (myadm> in this case, which it is important to note) is displayed and the user can issue commands (date in this example). To clear down the ssh tunnel, simply kill the background process:
$ kill $!
Listing 1 shows the bash script get_info for delivering a sequence of commands to a number of DSLAM devices. The script reads lines from standard input (where it expects to receive a list of DSLAM addresses), then it creates an ssh tunnel to each DSLAM in turn (as a background process). The environment variable array LOGINSEQ contains the telnet password, user name, and user password sequence:
LOGINSEQ=('secret1' 'myadm' 'secret2')
These string values are delivered to the DSLAM (in sequence) in order to complete the telnet/user login process. This is done by echoing the strings to standard output, which is redirected to a telnet process. Note that the purpose of the (outer) for-loop of the script is merely to form a "unit" of code, the output of which is piped to the telnet client process. The CMDS array contains a list of commands issued to the DSLAM command line, in this case:
CMDS=('date' 'info' 'uptime' 'quit')
When you run the script, it waits for standard input. Enter the domain name (or IP address) of a network device (or alternatively redirect input from a file containing a list of devices):
$ ./get_info
st1.myisp.net
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
(City Stgr 1) Enter password:
User: myadm
Password: myadm> date
Thu Feb 15 11:06:26 2007
myadm> info
Platform : Lucent Stinger FS+
System Name : City Stgr 1
Serial Number : 1519531234
Software Version : TAOS 9.9.1 (stngrcm2)
Boot Version : TAOS 9.9.1
Installed Memory : 128MB
Controller Role : Primary
Hardware revision: 2.2 Model E - IP (Version B)
myadm> uptime
Current system time: 11:06:28
{ shelf-1 first-control-module } cm-v2 63 days 07:13:06
( PRIMARY )
myadm> quit
Connection closed by foreign host.
Automated Management with Python
The bash script mentioned previously has a number of limitations. Note the sleep commands that insert a delay (in seconds) between the echo of each string value. This is to give the DSLAM enough time to respond to a command (or password) before the script issues the next one. Thus, the script must assume that the DSLAM will respond within the specified sleep time. Here we have a trade-off between a delay that is too short (echoing the next value string before the DSLAM is ready), and a delay that is too long (thereby impacting on performance). In all honesty, issuing commands prematurely is not all that critical as device driver buffers will alleviate synchronization issues. However, it would be desirable to synchronize communication with the response of the device. For example, send the telnet password only after reception of the password prompt, instead of some after arbitrary delay.
With this bash solution, we cannot use the device output as feedback, because the flow of data through the pipeline is one way. In this section, I will present a solution using Python. The Python module telnetlib provides the class telnet, which is telnet client. Data can be sent over the telnet connection with the write method, and the response can be read with the read_until method. read_until will block forever, or until a timer expires if one is supplied as an argument. This gives us the opportunity to deal with error conditions if the expected response is not received.
Unfortunately, there is no corresponding telnetlib module for SSH (yet), but we can use the python-pexpect module. If you use a Debian-based Linux, then you can download and install python-pexpect with:
$ sudo apt-get install python-pexpect
The automgmt.py module (Listing 2) includes two classes, jump and stcon. The jump class establishes an ssh tunnel (to the DSLAM) through the jump server. The stcon class, which inherits the jump class (and calls its constructor), sets up a telnet session through the tunnel.
The jump class takes an argument, which is the name of a configuration file containing a number of parameters in the form key=value. If the argument is missing, the jump class assumes the name of the configuration file is myisp.conf. These parameter values (which, for the purpose of this article are purely fictitious) need to be modified to reflect your environment. My configuration file looks like this:
$ cat myisp.conf
TELPASSWD=secret1
USER=myadm
USRPASSWD=secret2
JMPSERVER=jump.myisp.net
JMPPROMPT=bash-2.05
PORT=9012
The private method _set_env reads the configuration file and stores the parameter values in env. env is a data dictionary, where env[key] = value.
The parameter env['JMPSERVER'] is the domain name (or IP address) of the jump server and env['PORT'] is the local TCP port of ssh tunnel. env['JMPPROMPT'] is the prompt we expect to see when the ssh tunnel is completed. env['TELPASSWD'] is the DSLAM Telnet password, env['USER'] is user name account and env['USRPASSWD'] is the account password.
I have my public-key installed on the jump server, so the ssh tunnel can be completed without a password. If you don't have a public-key set up, then jump will still work with a password. Merely add this line to the configuration file:
SSHPASSWD=secret3
The jump constructor method checks to see if the key SSHPASSWD exists. If it does, the private method _login is called and completes the ssh login session using password authentication. Next, I'll demonstrate jump. To begin, invoke the Python interpreter from the command line:
Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Import all the classes (that is jump and stcon) from the automgmt.py module:
>>> from automgmt import *
Declare a jump instance and assign it to s:
>>> s = jump("st1.myisp.net")
>>> print s
<automgmt.jump instance at 0xb7ddb22c>
The declaration of s causes an ssh tunnel to be initiated to the jump server. From another console window, you can telnet to the target device through the ssh tunnel:
$ telnet localhost 9012
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
(City Stgr 1) Enter password:
Close the ssh tunnel, which in turn will terminate the telnet sesssion:
>>> s.tunnel.close()
Now establish a telnet session by calling stcon:
>>> t=stcon("st1.myisp.net")
As the stcon contructor calls the jump constructor, the ssh tunnel is set up automatically. Now issue the date command using the do_cmd method, which is included in stcon:
>>> t.do_cmd("date")
['Fri Feb 23 16:09:45 2007']
Finally, terminate the telnet session and the ssh tunnel:
>>> t.session.close()
>>> t.tunnel.close()
Some Example Scripts
Listing 3 shows a Python script info.py that issues the info command on a number of DSLAMs. It reads in a list of DSLAM addresses from standard input. The lines of output from the command are stored in a list of lists. The file devices.lst contains a list of domains names of DSLAMs (in this case just two):
$ cat devices.lst
st1.myisp.net
st2.myisp.net
For each DSLAM, the script displays the output of the info command:
$ ./info.py <devices.lst
Platform : Lucent Stinger FS+
System Name : City Stgr 1
Serial Number : 1519531234
Software Version : TAOS 9.9.1 (stngrcm2)
Boot Version : TAOS 9.9.1
Installed Memory : 128MB
Controller Role : Primary
Hardware revision: 2.2 Model E - IP (Version B)
Platform : Lucent Stinger FS+
System Name : Rovers Stinger 0
Serial Number : 1526535678
Software Version : TAOS 9.9.1 (stngrcm2)
Boot Version : TAOS 9.9.1
Installed Memory : 128MB
Controller Role : Primary
Hardware revision: 2.2 Model E - IP (Version B)
We may wish to do more than merely display the output of a command. To carry out further processing of the output, it would be useful to store it in some data structure. The proc_info function extracts each line of the response from the info command and stores the details in a data dictionary. On each line, the string to the left of the ":" is used as the key and the string to the right is the value:
def proc_info(data,t):
dd = {'device' : t.stinger}
for line in data:
if re.search(':', line):
key,value = line.split(':')
dd[key.strip().lower()] = value.strip()
return dd
The two lines of code below, store the output of the info command as a list of data dictionaries (one dictionary per DSLAM):
output = t.do_cmd("info")
info.append(proc_info(output,t))
Thus, if we want the serial number of each DSLAM, we would use the code extract below:
for dslam in info:
print dslam['device'], dslam['serial number']
If we are merely issuing commands to the devices and reading the output, then we could have just used a bash script. Any post-processing of the output could have been done with awk. What we cannot do with the bash script, however, is respond to feedback from commands. For example, the Stinger command get slot-info reports information on the hardware in a particular slot in the DSLAM chassis. It requires, as an argument, a slot id. The show command reveals the slots populated with cards:
myadm> show
Controller { first-control-module } ( PRIMARY ):
Reqd Oper Slot Type
{ shelf-1 slot-1 0 } UP UP stngr-72-gs-adsl-card
{ shelf-1 slot-2 0 } UP UP stngr-72-gs-adsl-card
{ shelf-1 slot-3 0 } UP UP stngr-72-gs-adsl-card
{ shelf-1 slot-11 0 } UP UP ep-72-hs-gs-adsl2plus
{ shelf-1 slot-12 0 } UP UP ep-72-hs-gs-adsl2plus
{ shelf-1 slot-13 0 } UP UP ep-72-hs-gs-adsl2plus
Thus, if we want information about the fourth card (which is in slot 11), we issue the command:
myadm> get slot-info { shelf-1 slot-11 0 }
[in SLOT-INFO/{ shelf-1 slot-11 0 }]
slot-address* = { shelf-1 slot-11 0 }
serial-number = 1624369876
software-version = 9.9
software-revision = 1
software-level = ""
hardware-level = " 9"
software-release = ""
The function get_slots extracts the ids (e.g., { shelf-1 slot-11 0 }) of the slots that are populated with cards and returns them as a list:
def get_slots(data):
slots = []
for line in data:
if re.compile("shelf").search(line):
slot = line.split("}\")[0] + " }"
slots.append(slot)
return slots
The code extract below issues the show command and processes the output to get the slot ids:
output = t.do_cmd("show")
slots = get_slots(output)
The slot ids (for one DSLAM) are stored in the list slots:
['{ shelf-1 slot-1 0 }', '{ shelf-1 slot-2 0 }',
'{ shelf-1 slot-3 0 }', '{ shelf-1 slot-11 0 }',
'{ shelf-1 slot-12 0 }', '{ shelf-1 slot-13 0 }']
For each slot in the chassis, we issue the get slot-info command:
for s in slots:
slot_info = t.do_cmd("get slot-info " + s)
for i in slot_info:
print i
The complete listing for slots.py is shown in Listing 4. Running the following command produces a long list of slot information:
$ ./slots.py <devices.lst
[in SLOT-INFO/{ shelf-1 slot-1 0 }]
slot-address* = { shelf-1 slot-1 }
serial-number = 153213l415
software-version = 9.9
software-revision = 1
software-level = ""
hardware-level = " 4"
software-release = ""
[in SLOT-INFO/{ shelf-1 slot-2 0 }]
slot-address* = { shelf-1 slot-2 }
serial-number = 1537129878
(and so on...)
Summary
The command-line user interface of a network device does not provide a convenient API for scripting automated management procedures. However, I have described a Python module that runs a telnet client through an SSH tunnel for just such a purpose. While the solution presented here is specific to a particular network device (the Lucent DSLAM), it could be adopted for any type of device with a command-line front end. These scripts have proved useful for a number of tasks, such as inventory tracking, mass upgrades of devices, and gathering of performance statistics.
Acknowledgments
Thanks to Dan Hunt and Dr. Chi-Yu Huang for their comments and suggestions.
References
1. Huang, C., K. Cheng, and A. Holt. 2006. An integrated manufacturing network management framework by using mobile agent. International Journal of Advanced Manufacturing Technology.
2. Barrett, Daniel J. and Richard E. Silverman. 2001. SSH: The Secure Shell. O'Reilly: USA.
3. Beazley, David M. 2000. Python Essential Reference. New Riders Publishing: Indianapolis.
Alan Holt is a technical consultant with IP-Performance in the United Kingdom. He was a senior lecturer at the University of Waikato in New Zealand, and before that he was a telecommunications consultant with AT&T in the UK. Alan holds a B.Sc. (Hons) degree in Computer Science and a Ph.D. in Telecommunications. He can be contacted at: aholt@ip-performance.co.uk.
|