Article apr2006.tar

Getting to Know Your Network -- Part III

Luis Enrique Muñoz

As you may remember, in the previous parts of this series, I presented a tool called aconfig that allows sys admins to execute combinations of configuration commands and Perl code -- ascripts -- on network devices. I showed how to use this to harvest the version and configuration information from network devices, and then place the relevant parts of this information into a relational database.

This results in a very nice inventory, which can be updated automatically using cron. But this is not a complete view of your network, by far. Networks interconnect many things like desktops, laptops, printers, servers, etc. Let's call these things "endpoints". In many networks, most endpoints have dynamic addressing, and some even move among network segments, such as when a laptop is unplugged from one office and plugged into another, or when a device associates itself with a rogue access point.

Capturing Address Data

Our strategy in this article will be to "sample" the ARP tables on all of the network devices to see what we can find out. This will show us which endpoints have been talking to various devices and tell us their physical address (i.e., MAC or Ethernet address), as well as their current IP address. Astute or paranoid readers will know that MAC addresses can be changed. One way to do this is to change the network card. Another way is through software, instructing the hardware to use a different address. In my experience, this is only an issue in incidents where a malicious individual is trying to wreak havoc in your network and, by the time that individual has gained a position where he or she can spoof a MAC address, you are in much deeper trouble.

We will undertake all of this with the help of a nice ascript, aptly called ascripts/arp-capture, reproduced below:

$ cat ascripts/arp-capture
%INCLUDE ascripts/include/setup%
%INCLUDE ascripts/include/save-arp%
The juicy parts are in ascripts/include/save-arp, which is heavily based on the includes written in the previous article, with the difference that now we add a time stamp to the name of the file. This will come in handy later if we want to do command-line manipulations with the resulting files:

$ cat ascripts/include/save-arp
show arp
%{
    package save::arp;
    use POSIX qw(strftime);
    use IO::Zlib;
    my $fh = new IO::File './output/' . $main::ADDR . '-' .
    strftime("%Y%m%d%H%M%S", localtime) . '.arp', "w";
    die "Failed to create output file: $!\n"
      unless $fh;
    print $fh $main::LAST;
    close $fh;
    undef;
}%
We should set this to run every few hours under cron. Keep in mind that for very large networks, you may want to split the list of devices and run an instance of aconfig for each chunk. This allows for quick parallelization of the task, with minimal impact on the network. To run a single instance from the command line, we issue the now familiar command:

   $ ./aconfig -c ascripts/arp-capture aconfig.hosts
As this command runs, files with names like output/10.64.106.129-20051020114846.arp will be created, with contents that look like:

Protocol  Address      Age (min)  Hardware Addr   Type  Interface
Internet  10.2.237.101       163  0011.2537.e368  ARPA  FastEthernet0
Internet  10.2.237.100        58  00e0.7d42.66ce  ARPA  FastEthernet0
Internet  10.2.237.103       151  0000.8642.f43e  ARPA  FastEthernet0
...
The next step is to add this information to the database. Each endpoint, recognized by the MAC address, will be added to the "endpoint" table. We will record which IP address was assigned at the time, through the "assignment" table, which will also tell us in which subnet we are seeing the endpoint. We will also record the sighting on the device on a given interface through the "sighting" table.

To do this, we will use the "scripts/arp2db" script (Listing 1), which parses the .arp files and adds data to our database. This includes a nice touch: a mapping between MAC or Ethernet addresses and device vendors, which helps you locate specific brands of network cards. This is useful, again, to detect rogue access points or WiFi network cards that are signaling either a rogue access point in the subnet or interface where this card was seen, or a machine acting as a bridge between its wireless card and the wired card. I will not cover all the details of arp2db, as it is quite similar to config2db.

However, let's look at the main part of arp2db to get a better understanding of how to use this database schema and how to extract information into it. The main action happens in a loop between the lines 83 and 156, where all the lines in the ".arp" file are read. Line 86 splits the columns of each line. Lines 89 and 90 make sure we skip lines that do not convey information we want, such as the report headings, lines with a different structure, lines that contain an incomplete ARP, etc.:

 83: while (my $line = <>)
 84: {
 85:     chomp $line;
 86:     my ($proto, $ip, $age, $mac, $class, $if) = \
             split(/\s+/, $line, 6);
 87:     my $vendor = undef;
 88:
 89:     next unless $age eq '-' or $age =~ /\d+/;
 90:     next if $mac eq 'Incomplete';
 91:
    ...
156: }
Next, lines 92 and 93 normalize the format used to store the MAC address. We remove separator symbols that may be used and convert the string to lowercase. Note that this could also be done in the Class::DBI -- derived classes were placed in MyConfigCDBI.pm, using the "deflate" facilities provided by Class::DBI:

  92:     $mac =~ tr/-:. //ds;
  93:     $mac = lc $mac;
Line 95 converts the IP address associated with the entry we just read from our input into a NetAddr::IP object for easy conversion. Line 97 determines the correct device name to use in this case, based on the filename of the file we're reading. This technique is simple although fragile. Another, possibly more robust, approach would be to have the ascript write the $main::ADDR variable to the .arp file and then parse it in this script.

In any case, if it becomes impossible to determine which device the entry is taken from, the entry is skipped at lines 99 through 103:

 95:     $ip = NetAddr::IP->new($ip);

 97:     my ($devif, $time) = get_device_from_file($ARGV);

 99:     unless ($devif)
100:     {
101:         warn "$ARGV cannot be mapped to an interface. Skipping\n";
102:         next;
103:     }
Lines 105-109 are responsible for finding a vendor code in the %ethers translation table and printing a rudimentary progress indicator:

105:     my $submac = substr($mac, 0, 6);
106:     $vendor = $ethers{$submac} if exists $ethers{$submac};
107:
108:     print join(' ', $ARGV, $devif->device, $devif->interface,
109:                $if, $mac, $time), "\n";
Lines 111 through 121 ensure that we create the endpoint or update the endpoint access time, so that it becomes possible to purge endpoints that have not been seen on a long time. We can also detect fresh endpoints, possible visitors plugging equipment in the corporate network, or new devices:

111:     my $ep = MyConfig::CDBI::Endpoint->find_or_create
112:         (
113:          endpoint => $mac,
114:          ($vendor ? (vendor => $vendor) : ())
115:         );
116:
117:     unless ($ep->time)
118:     {
119:         $ep->time($time);
120:         $ep->update;
121:     }
Lines 123 through 129 record the sighting of the endpoint in the interface if this record does not already exist:

123:     my $sighting = MyConfig::CDBI::Sighting->find_or_create
124:          (
125:           endpoint => $ep->endpoint,
126:           time => $time,
127:           device => $devif->device,
128:           interface => lc $if,
129:          );
Lines 131 through 147 find which subnets contain the assigned IP address of this endpoint. Normally, only one subnet should match. Warnings are produced if no subnet in the database matches the endpoint, meaning that there are devices we're not seeing. If more than one subnet matches, it likely means that some devices have an incorrect netmask configured in one of the interfaces:

131:     my @subnet = MyConfig::CDBI::Subnet->search_where
132:         ({
133:             first => { '<=', scalar $ip->numeric },
134:             last => { '>=', scalar $ip->numeric },
135:          },
136:          { logic => 'AND' }
137:         );
138:
139:     if (@subnet == 0)
140:     {
141:         warn "$mac ($ip) matches no known subnet(!)\n";
142:     }
143:     elsif (@subnet > 1)
144:     {
145:         warn "$mac ($ip) matches more than one subnet(!)\n";
146:         warn '    ' . $_->cidr . "\n" for @subnet;
147:     }
Finally, the assignment is recorded in the database:

149:     my $assignment = MyConfig::CDBI::Assignment->find_or_create
150:         (
151:          ip => scalar $ip->numeric,
152:          time => $time,
153:          endpoint => $ep->endpoint,
154:          (@subnet > 0 ? (cidr => $subnet[0]->cidr) : ()),
155:         );
To run it, you could do something like this:

$ find ./output -type f -name '*-20051020??????.arp' \
  | xargs ./scripts/arp2db
./output/10.64.106.129-20051020110654.arp CT-TC-ZUL fastethernet0 ...
./output/10.64.106.129-20051020110654.arp CT-TC-ZUL fastethernet0 ...
./output/10.64.106.129-20051020110654.arp CT-TC-ZUL fastethernet0 ...
./output/10.64.106.129-20051020110654.arp CT-TC-ZUL fastethernet0 ...
...
You can also use more complex incantations of the find command to process the most recent ".arp" files, for example. You would also need to clean up the directory, removing those files with more than a few days of age. This keeps your space consumption relatively constant.

All this new information can give you a few new tricks. Let's start with a pet peeve of mine and pretend that you would like to find rogue or hidden access points in your network. Now you could simply find vendors that are likely to represent a wireless network card, such as:

sqlite> select count(*) from endpoint where vendor
   ...> like '%Linksys%' or vendor like '%D-link%'
   ...> or vendor like '%Netgear%';
15
Remember that you may not see the access point itself. Likely, it will be set up as a bridge, which makes it invisible in layers 2 and 3 with normal techniques. But, in this case, you'll still see hosts attached to the access point.

We could also ask where in our network we've seen those endpoints. This might give us an idea of where to look, because every device that has seen the endpoint recently should remember it in its ARP table. But, don't celebrate just yet. Because multiple sightings are kept, using the "time" column to record when they took place, we'll filter by using a Perl one-liner to calculate the correct Unix time for us:

$ perl -MDate::Parse -e 'print str2time("Oct 20, 2005"), "\n"'
1129780800
$ sqlite3 config.db
sqlite> select distinct s.device, s.interface from endpoint e,
   ...> sighting s where e.vendor like '%Linksys%' or vendor
   ...> like '%D-link%' or vendor like '%Netgear%' and
   ...> s.endpoint = e.endpoint and s.time > 1129780800;
AP-01-PB-ESTE-BLDG1|bvi1
SW-06-P3-BLDG1|vlan1
SW-07-P3-BLDG1|vlan1
SW-08-P3-BLDG1|vlan1
SW-09-P3-BLDG1|vlan1
SW-10-P3-BLDG1|vlan1
SW-11-P3-BLDG1|vlan1
...
In our case, we have a legitimate access point at AP-01-PB-ESTE-BLDG1, because we got in there to get its configuration file; its name follows our standard scheme and is within our inventory:

sqlite> select hardware, software from device where
   ...> device='AP-01-PB-ESTE-BLDG1';
cisco AIR-AP1231G-A-K9 ...|IOS (tm) C1200 Software (C1200-K9W7-M)
However, it seems that something fishy is going on at Building 1, where our switches are picking up suspicious ARP entries on the third floor (that's what P3-BLDG1 stands for). Just remember that we're simply looking at the vendor code of the network card. A vendor can make wireless and wired network cards that will share the same vendor code, so further research is required. However, this tool gives you a healthy start because it quickly tells you where to start looking.

Adding nmap

To improve our knowledge of the network, we will use nmap, an excellent tool to perform network scanning, to probe the endpoints we've found. I'll assume that you either have that tool installed or know how to do it yourself. In this case, let's go ahead and install Nmap::Scanner, a Perl module that allows us to drive nmap from a Perl script:

$ perl -MCPAN -e shell
cpan> install Nmap::Scanner
...
After installing this module, we'll need a script to automate the scanning for us. Let's call it scripts/nmap2db and, as usual, I'll describe its most important parts. You can download the complete script (Listing 2) from the Sys Admin Web http://www.sysadminmag.com:

12: use constant HOSTS => 100;

14: MyConfig::CDBI->connection('dbi:SQLite:dbname=config.db');

16: my @eps = MyConfig::CDBI::Endpoint->search_where(
17:     { os => \ "IS NULL" },
18: );
At line 12, we tell the script how many addresses we want to scan with nmap. This number should be small enough that it does not take an unacceptable amount of time. With 100 hosts, it takes my laptop about 8 minutes to do a scan. Line 14 sets up the connection to the database, as customary. Lines 16 to 18 fetch the list of endpoints that have a NULL os column:

20: my %list = ();
21: for my $ep (@eps)
22: {
23:     my @addr = MyConfig::CDBI::Assignment->search_where(
24:         { endpoint => { '==' => $ep->endpoint } },
25:         { order_by => 'time DESC' }
26:     );
27:     next unless @addr;
28:     $list{NetAddr::IP->new($addr[0]->ip)->addr} = $ep;
29:     last if keys %list == HOSTS;
30: }
Lines 20 to 30 build a list that links each endpoint found previously with its most recent IP assignment. We use the search_where method provided by Class::DBI::AbstractSearch in MyConfigCDBI.pm:

32: my $scanner = new Nmap::Scanner;
33: $scanner->tcp_syn_scan(1);
34: $scanner->add_scan_port('80,25,135,137,139,445,443,110,113,53');
35: $scanner->guess_os(1);
36: $scanner->max_rtt_timeout(200);
37: $scanner->add_target($_) for keys %list;
38: my $r = $scanner->scan;
Lines 32 to 37 configure our scanner. You may want to verify these parameters according to your own network, especially for the list of ports to scan. Line 38 starts the scan for all the hosts found in previous steps:

39: my $hl = $r->get_host_list();

41: while (my $h = $hl->get_next)
42: {
43:     my $os = join(', ',
44:                   map { join '/', grep { defined $_ } $_->type, $_->vendor,
45:                         $_->osgen, $_->osfamily, $_->accuracy . '%' }
46:                   $h->os->osclasses);
47:     for my $a ($h->addresses)
48:     {
49:         next unless exists $list{$a->addr};
50:         $list{$a->addr}->os($os);
51:         $list{$a->addr}->update;
52:         last;
53:     }
54: }
Line 39 gets the list of scanned hosts, which is iterated through at lines 41 to 54. These lines simply compose a very verbose value for the os column and update the endpoint entry in the database.

With this, you could leave this script running within a loop like the following. Note that you will need root privileges so that nmap can craft the packets it needs for guessing the remote operating system of the device:

$ su -
Password:
# while true; do ./scripts/nmap2db; sleep 10; done
Note that the endpoints are always returned in the same order. Eventually, all the returned endpoints will be the ones that do not respond to pings or that cannot be scanned. This means that periodically, you will need to erase the old, unreachable endpoints. That is why the time column is there.

A very simple workaround for this is to throw in a Schwartzian Transform at line 21, so that it reads:

21: for my $ep (map { $_->[1] } sort { $a->[0] <=> $b->[0] }
                map { [rand, $_] } @eps)
With this change, the returned rows will be in random order, minimizing the issue.

Conclusion

Now that we have these tools at our disposal, we could write much more interesting queries that will tell us with greater precision what is the potential impact of a vulnerability. In the next part of this article, I will show how to put all this information to use as we generate maps and diagrams of the network.

Luis has been working in various areas of computer science since the late 1980s. Some people blame him for conspiring to bring the Internet into his home country, where currently he spends most of his time teaching others about Perl and taking care of network security at the largest ISP there as its CISO. He also believes that being a sys admin is supposed to be fun.