Using DNSBLs to Monitor Network Security
Luis E. Muñoz
Many email administrators are turning to DNSBLs
-- DNS Block Lists -- as useful weapons in the arsenal against
spam. There are DNSBLs covering many aspects of the security spectrum
related to spam. A brief sample of the overall focus of the most common
lists includes:
- Open HTTP proxies
- Open SMTP proxies
- Zombies or trojaned machines
- Miscellaneous open proxies
- Hosts that send spam to spamtrap addresses
These lists continue to grow despite the efforts of
the community to educate the general public and, more importantly, the
administrators responsible for the operation or security of the network. No
matter how many security measures we implement in our networks, the reality
is that many computers on the public network and in our datacenters
are compromised each day.
This article will introduce another useful application
for the DNSBLs. I'll show how to use this valuable information source
to diagnose and monitor the overall security level of a given network.
I'll do so by generating a sort of "reputation" or index,
based on the information collected from the lists themselves.
The code I will use for this, although simply an
example, is available from the Sys Admin Web site:
http://www.sysadminmag.com
The Lists
One of the first things to do is research the existing
DNSBLs. To save you some time, I will be using the following lists in this
article. Be aware that you must thoroughly understand how each list works
and what a listing there really represents:
- l2.spews.org (SPEWS level 2 list): IP
ranges listed here usually have long-standing problems that have not been
properly addressed by their operators. Alternatively, the IP ranges were
associated with spamming operations. Level 1 and Level 2 are
"severity levels" associated with a listing. Quoting from the
SPEWS FAQ, "... A common practice is to bounce based on the SPEWS
Level 1 list, and tag based on the SPEWS Level 2 list."
A listing in SPEWS is probably something you need to
address immediately, as those listings have been known to grow progressively
into sizeable chunks of IP space. In this article I'll use the Level 2
list, which tends to include more IP space. In practice, this can work as an
early warning of an upcoming Level 1 listing.
- psbl.surriel.com (Passive Spam Block
List): This list includes hosts that have sent spam to certain spamtraps
with the rationale that anyone can quickly and easily delist them.
In my experience, many spammers tend to hit
psbl's spamtraps, so this may be a good early warning of problems.
However, your mileage may vary.
- list.dsbl.org (Distributed Sender
Blackhole List): This list specializes in detecting SMTP relay hosts. An
open SMTP relay will likely be listed in dsbl.org pretty quickly when one
of the testers receives spam coming from it and subsequently tests the
relay.
This particular version of the list contains hosts
that have been included by trusted users of dsbl.org. My experience shows
that this is the most common variant of the list used to block email. For
statistical purposes, you may want to include unconfirmed.dsbl.org and
multihop.dsbl.org, which can also glean useful information about what is
leaking out of your network.
Note that it is actually hard for an ISP not to have its
customer-serving mail servers listed in multihop, so this list is less
common in mail filtering.
- spam.dnsbl.sorbs.net (SORBS' Spam
Database): This is the "Database of hosts sending to SORBS'
spamtraps". The SORBS' FAQs provide a thorough description of
what is listable in this DNSBL. A quick summary is that any host sending
spam, in the vicinity of a spammer, or providing spam support can be listed.
- smtp.dnsbl.sorbs.net (Open SMTP Relay
Servers): A list of SMTP servers that can be used to relay spam, commonly
known as open relays.
- http.dnsbl.sorbs.net (Open HTTP Proxies):
A list of HTTP proxy servers that can be used by anyone without (or with
trivial) authentication.
- socks.dnsbl.sorbs.net (Open SOCKS
Proxies): A list of SOCKS proxy servers that can be used by anyone without
(or with trivial) authentication.
- web.dnsbl.sorbs.net (Web Servers with
Spammer-Abusable Vulnerabilities): It is a well-known fact that notorious
bugs in common Web applications can be exploited to send bulk email, making
it seem to come from the Web server hosting those faulty applications. This
list is designed to catch those hosts and prevent spam sent in this way
from spreading.
- misc.dnsbl.sorbs.net (Non-HTTP and
Non-SOCKS Open Proxies): Any other type of open proxy will be listed here.
This deals mostly with worm infections that cause a host to become a
"zombie". Zombies are commonly used to send huge amounts of
spam for the benefit of those who control them. Those zombies are also used
to orchestrate DDoS attacks.
- sbl.spamhaus.org (Spamhaus Block List):
Straight from the Spamhaus site, "The SBL is a realtime database of
IP addresses of verified spam sources ... maintained by the Spamhaus Project team and supplied as a free service to help
email administrators better manage incoming email streams."
A listing in the SBL, especially if it is associated
with a spammer in the Register of Known Spamming Operations (ROKSO), can be a very serious matter warranting prompt
attention.
- xbl.spamhaus.org (Exploits Block List):
Again, straight from the Spamhaus site, this list is "a realtime
database of IP addresses of illegal third party exploits, including open
proxies..., worms/viruses with built-in spam engines, and other types of
trojan-horse exploits."
The data in the XBL is composed from two public lists:
the CBL and the NJABL Open Proxy list. Listings in the XBL, just as
listings in other similar DNSBLs, are reliable indicators of host security
compromise.
Obtaining List Feeds
There are many more DNSBLs that work under different
guidelines, so do your own research before selecting one or more as part of
an indicator for your network. I've chosen these lists for my own
use, because they measure variables consistent with the indicators I need
to track in my job.
The next step is to obtain regular copies or
"feeds" of the lists. Those feeds are used when you want to set
up a mirror for the list for your own use and contain all the data in a
file you're supposed to pass to your DNS server. In our case,
we'll be concerned with using the information for our statistics.
While this process is specific to each list, most
include the use of either wget, curl, or rsync. These are examples of the
crontab entries I use to fetch some of these lists:
16 9 * * * cd $DNSBL_WORK_DIR && \
rsync -q psbl.surriel.com::psbl/psbl.txt ./psbl.surriel.com && \
cp ./psbl.surriel.com ../lists
21 7,20 * * * cd $DNSBL_WORK_DIR && \
curl -s -o ./l2.spews.org \
http://www.spews.org/spews_list_level2.txt && \
cut -f1 '-d ' l2.spews.org | egrep '^[0-9]' | \
sort --temporary-directory=/var/tmp | uniq > ../lists/l2.spews.org
3 10 * * * cd $DNSBL_WORK_DIR && \
for i in http misc smtp socks spam web; do \
rsync -q rsync://rsync.us.sorbs.net/rbldnszones/$i.dnsbl.sorbs.net . ;\
cut -f1 '-d ' $i.dnsbl.sorbs.net | egrep '^[0-9]' | \
sort --temporary-directory=/var/tmp | uniq > ../lists/$i.dnsbl.sorbs.net; done
Notice the seemingly odd times I use for the entries.
This is done to help randomize the hits on the sync servers operated by the
lists. Also, please make sure you obtain permission from the list operators
before mirroring their data. Each list usually has information about how to
request and obtain copies of its data that you should review before trying
to obtain their source.
As you can see, these crontab entries leave the list
data in a mnemonically named file in the $DNSBL_WORK_DIR/lists/ directory,
which I will be using throughout this article as the repository for list
information.
One of the most important metrics that can be
extracted from this list data is the proportion of our IP space that has
been listed on each DNSBL. This is in fact what this article is about.
Tracking this variable over time can give you useful insights about what is
really going on in the network and what are its most visible symptoms. More
importantly, it can show you whether what you're doing to secure your
network is working and to what degree.
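To make the metric concrete, here is a minimal sketch in Python (the article's own scripts are in Perl), using only the standard ipaddress module. The network block and the listing count are hypothetical figures, not real data:

```python
import ipaddress

# Hypothetical figures: our IP space, and the total number of listed
# addresses reported for one DNSBL on a given day.
our_space = [ipaddress.ip_network("200.11.128.0/17")]
listed = 523   # sum of the per-listing host counts for that DNSBL

# The metric: what fraction of our address space appears on the list.
total = sum(net.num_addresses for net in our_space)
proportion = listed / total
print(f"{listed} of {total} addresses listed ({100 * proportion:.3f}%)")
```

Tracked daily, this single number per DNSBL is what the rest of the article stores and graphs.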
Additionally, this kind of analysis can provide a
robust metric for security incidents, which can help justify the business
case for special tools or resources for your area. I've been able to
use information like this a few times.
We will calculate this variable once a day and store
this sample in a database. Later, we will produce a simple graph that shows
the progress of this variable over time. Let's begin with the tools
we'll need:
- sqlite3 (Database)
- rrdtool (Graph)
- Perl
- NetAddr::IP (Perl module to handle IP
addresses -- use the latest version)
- Class::DBI (an OO interface to databases
supported by Perl's DBI)
I'll assume you can install these packages if
you don't already have them. As for the database, the examples in
this article are quite portable and simple, so you can use whatever you
have around.
The first step is finding out which IP addresses are
included in which lists. A little script called "count-ips" can
take care of that. Let's see its interesting parts:
3 use strict;
4 use warnings;
6 use IO::File;
7 use File::Find;
10 use NetAddr::IP 4.00;
13 use Getopt::Std;
14 use vars qw/$opt_i $opt_s $opt_h/;
15 my $opts = 'hi:s:';
16 getopts($opts);
Lines 3-16 load the modules we will be using and parse the command-line options:
18 die <<HELP
19 Usage: count-ips [-h] [-i isp-ip-space] [-s source]
...
34 HELP
35 if $opt_h;
Lines 18-35 use the HEREDOC syntax for specifying
multi-line strings in Perl in order to produce some help text when count-ips is invoked with the -h option:
41 my @input = (); # The IP space itself
43 my $fh = IO::File->new($opt_i, "r")
44 or die "Unable to open input file $opt_i: $!\n";
Lines 41-44 prepare @input, the variable where the IP space we want to match against
the DNSBLs will be stored. This IP space will be read from a file that will
be available through the $fh filehandle:
46 while (my $l = $fh->getline)
47 {
48 $l =~ s!(?:\#.*$|\s+)!!g;
49 chomp $l;
50 next unless $l =~ /\S/;
51 my $ip = new NetAddr::IP $l;
52 unless ($ip)
53 {
54 warn "$opt_i:$.: Invalid IP address $l\n";
55 next;
56 }
58 push @input, $ip;
59 }
This loop at lines 46-59 reads one line at a time from
the given file, stripping whitespace and Perl comments. This allows for
some familiar formatting and documentation to be included in the file.
Blank lines are skipped. The rest -- hopefully a
CIDR block or lone IP address -- is passed to NetAddr::IP, which can
understand most formats for IPv4 network specification (CIDR, range
notation, etc.). If NetAddr::IP fails to recognize an IP address or subnet,
its ->new() method returns undef, which then causes a warn() to output a message referencing the bogus entry in the
input file. The $. variable
holds the line number of the last filehandle read, which allows the script
to generate a more useful message:
61 @input = NetAddr::IP::Compact @input;
62 my @ire = map { qr/$_/ } map { $_->re } @input;
After this is done, NetAddr::IP::Compact() is used to merge contiguous
ranges of IP space. That is, two contiguous /25s will become a single /24.
IP blocks will also be nicely sorted as a side effect. This happens at line
61.
Line 62 uses some map magic to convert each subnet to
a Perl regular expression that will match any IP address within it. This is
a handy feature for this application, and here is why: in many cases, the
DNSBLs contain lots of /32s, and the overhead of parsing each one out of
its textual representation just to call the NetAddr::IP->contains() method
is usually too high. Instead, the /32s can simply be matched against the
regular expressions, which is normally much faster.
Note that two maps are used at line 62. This is so
because in the future, ->re() might return an array of regular expressions. Whatever the
case, this leaves a list of regular expressions that will match any IP
address within our network in @ire.
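The read-and-compact steps can be sketched in Python with the standard ipaddress module (an illustration of the same idea, not the article's Perl; the input lines and file name are hypothetical):

```python
import ipaddress
import re
import sys

def read_ip_space(lines, filename="isp-networks"):
    """Read CIDR blocks, stripping comments and whitespace, and warn
    (with a line number) about anything that does not parse."""
    nets = []
    for lineno, line in enumerate(lines, start=1):
        line = re.sub(r"#.*$|\s+", "", line)
        if not line:
            continue
        try:
            nets.append(ipaddress.ip_network(line))
        except ValueError:
            print(f"{filename}:{lineno}: Invalid IP address {line}",
                  file=sys.stderr)
    return nets

nets = read_ip_space(["# our space\n", "192.0.2.0/25\n",
                      "192.0.2.128/25\n", "bogus\n"])

# Analogous to NetAddr::IP::Compact: contiguous ranges merge (the two
# /25s become a single /24) and the result comes back sorted.
nets = list(ipaddress.collapse_addresses(nets))
print(nets)
```

As with the Perl version, the bogus entry produces a warning naming the offending line, and the surviving blocks come out merged and sorted.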
Next comes the slightly complex matching function,
responsible for reading in each DNSBL source file and checking it against
the @input and @ire representations of our
IP space. Let's analyze it piece by piece:
66 sub slurp_n_check
67 {
68 my $file = $File::Find::name;
69 if ($file =~ m/[\[\]\(\)\{\}\$]/)
70 {
71 warn "Possibly dangerous filename: $file - Skipping\n";
72 return;
73 }
For paranoia's sake, lines 68-73 verify that
the file name supplied by File::Find is not dangerous. (We'll use
this module later to scan the directory where the DNSBLs sources are.)
Basically, the code looks for potential variable interpolations or
metacharacters in the file name and refuses to work with them:
75 my $fh = IO::File->new($file, "r");
77 unless ($fh)
78 {
79 warn "Failed to open $file: $!\n";
80 return;
81 }
If the filename is considered safe enough, it is
opened for reading with IO::File. If the open fails, a suitable warning is
printed and this DNSBL is skipped. This happens at lines 75-81.
The loop at lines 83-140 reads and processes each line
of the DNSBL source file. This seemingly simple task takes comparatively
more code, as a few tradeoffs are made for speed:
87 $l =~ s!(?:\s+|/32$)!!g;
Line 87 removes the (for us) useless /32 mask from some entries:
89 if ($l =~ m!/(\d{1,2})$!)
90 {
91 my $m_len = $1; # Cheap mask len
92 my $ip = new NetAddr::IP $l;
93 unless ($ip)
94 {
95 warn "$file:$.: Invalid IP spec <$l>\n";
96 next IP;
97 }
98
99 for my $n (@input)
100 {
101 if ($n->masklen < $m_len)
102 {
103 if ($n->contains($ip))
104 {
105 print "$file $ip ",
106 $ip->broadcast->numeric - $ip->network->numeric + 1,
107 "\n";
108 next IP;
109 }
110 }
111 else
112 {
113 if ($ip->contains($n))
114 {
115 print "$file $n ",
116 $n->broadcast->numeric - $n->network->numeric + 1,
117 "\n";
118 next IP;
119 }
120 }
121 }
122
123 }
Lines 89-123 take care of the case where the address
has a netmask. In this case, we parse it using NetAddr::IP and use its ->contains() method to
find out how to report the listing. Lines 93-97 take care of the
potentially corrupt entries that cannot be parsed by NetAddr::IP.
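The decision made by that mask-length comparison can be paraphrased in Python (a sketch with the standard ipaddress module instead of NetAddr::IP; the networks are hypothetical): whichever side has the longer prefix is the smaller block, and that is the block reported, together with its host count.

```python
import ipaddress

def report_overlap(ours, listed):
    """Mirror the masklen comparison in count-ips: if our network is
    larger (shorter prefix), check whether it contains the listing and
    report the listing; otherwise check the reverse and report our own
    block. Returns (block, host_count) or None on no overlap."""
    if ours.prefixlen < listed.prefixlen:
        if listed.subnet_of(ours):
            return listed, listed.num_addresses
    elif ours.subnet_of(listed):
        return ours, ours.num_addresses
    return None

ours = ipaddress.ip_network("200.11.128.0/17")   # hypothetical ISP block
hit = ipaddress.ip_network("200.11.130.0/24")    # hypothetical DNSBL entry
print(report_overlap(ours, hit))
```

Either way, the reported count is the number of our addresses affected by the listing, which is exactly what the per-day totals need.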
126 else
127 {
128 for my $re (@ire)
129 {
130 if ($l =~ m/$re/)
131 {
132 print "$file $l 1\n";
133 next IP;
134 }
135 }
136 }
Lines 126-136 take care of the simpler case of a lone
/32, now without a mask. This can simply be matched against the regular
expressions in @ire,
reporting in the same way as for the previous case:
139 close $fh;
Line 139 explicitly closes the $fh pointing to the DNSBL data file:
143 File::Find::find
144 (
145 {
146 no_chdir => 'yes',
147 wanted => \&slurp_n_check,
148 },
149 $opt_s
150 );
Finally, lines 143-150 summon File::Find to scan the
directory provided with the -s option to our script and process it with the function
discussed before. Now I will create the file isp-networks, with content
like this:
# This is a list of all my IP space
200.11.128.0/17
...
A simple run of count-ips would produce output as follows:
$ mkdir results
$ count-ips -i isp-networks -s lists > results/hits-`date '+%Y%m%d'`
$ head results/hits-20060910
lists/http.dnsbl.sorbs.net 200.11.177.138 1
lists/http.dnsbl.sorbs.net 200.11.178.204 1
lists/http.dnsbl.sorbs.net 200.11.181.28 1
lists/http.dnsbl.sorbs.net 200.11.182.113 1
lists/http.dnsbl.sorbs.net 200.11.182.114 1
lists/http.dnsbl.sorbs.net 200.11.182.115 1
lists/http.dnsbl.sorbs.net 200.11.182.187 1
lists/http.dnsbl.sorbs.net 200.11.182.58 1
lists/http.dnsbl.sorbs.net 200.11.182.59 1
lists/http.dnsbl.sorbs.net 200.11.182.62 1
In fact, I could now put this line in my crontab, as follows:
7 1 * * * cd $DNSBL_WORK_DIR && \
./count-ips -i ./isp-networks -s ./lists > \
./results/hits-`date '+%Y%m%d'`; \
find ./results/ -type f -mtime +10 | xargs rm
This would produce a hits-<date> file with each day's result. It probably
isn't useful to generate more than one datapoint per day. Note that
many DNSBLs place a limit on the frequency for downloading their sources,
so this may also limit the number of samples per day you can generate. In
practice, I've been working with a single sample per day. This
provides good results as the trends are usually a few weeks long.
Note the find | rm added at the end of the command. This should allow you
to keep the last 10 days of results, in case you need them. You can adjust
this value based on the amount of listings you have (and disk space).
These hits files are very useful. As an example, here
is a script -- warn-count -- that can alert you whenever a critical part of your
network appears in one of your lists. The beginning of the script, up to
line 44 is very similar to count-ips, so I won't discuss it. The first relevant code is
found below:
46 my @C = (); # The critical IP space
48 my $fh = IO::File->new($opt_c, "r")
49 or die "Unable to open critical file $opt_c: $!\n";
50
51 while (my $l = $fh->getline)
52 {
53 $l =~ s!\#.*$!!g;
54 chomp $l;
55 next unless $l =~ /\S/;
56 my ($n, $d) = split(m/\s+/, $l, 2);
57 my $ip = new NetAddr::IP $n;
58 unless ($ip)
59 {
60 warn "$opt_c:$.: Invalid IP address $n\n";
61 next;
62 }
63
64 push @C, { ip => $ip, desc => $d };
65 }
Lines 46-65 are responsible for reading in the
"critical" file. This is a file in a format similar to the
input file of count-ips, which specifies network ranges that are critical. For instance,
your mail servers should be there. After the IP subnet specification, which
must be the first whitespace-separated column, a description must be added.
This description should be mnemonic, so that the warnings make more sense.
The critical space is stored as a list of hashrefs in @C:
70 @C = sort { $b->{ip}->masklen <=> $a->{ip}->masklen } @C;
Line 70 sorts @C so that the network specifications are arranged from
most specific to least specific. In some cases, you may want to have an
entry for a whole datacenter and then more specific entries for groups of
servers:
75 while (<>)
76 {
77 print $_ if $opt_f;
78 chomp;
79 my ($bl, $n, $num) = split(/\s+/, $_, 3);
80 my $ip = new NetAddr::IP $n;
81 unless ($ip)
82 {
83 warn "Unrecognized IP spec $n (line $.) - Ignoring\n";
84 next;
85 }
86
87 # Iterate through the critical networks
88 for my $c (@C)
89 {
90 if ($c->{ip}->contains($ip))
91 {
92 warn "Listing $bl ($ip) [$num hosts] matches $c->{desc}\n";
93 last;
94 }
95 elsif ($ip->contains($c->{ip}))
96 {
97 warn "Listing $bl ($ip) [$num hosts] contains $c->{desc}\n";
100 }
101 }
102 }
Lines 75-102 iterate through the hits being read and
match the critical networks. The sorting done at line 70 allows the last in
line 93 to break the loop early, avoiding unnecessary work.
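That sorting-plus-early-exit pattern can be sketched in Python (with the standard ipaddress module rather than the article's Perl; the critical ranges here are hypothetical). With the early exit in place, only the most specific matching entry is reported:

```python
import ipaddress

# Hypothetical critical ranges, as (network, description) pairs.
critical = [
    (ipaddress.ip_network("200.11.128.0/20"), "IP space for internal use"),
    (ipaddress.ip_network("200.11.130.0/24"), "Misc public servers"),
    (ipaddress.ip_network("200.11.130.0/25"), "Public mail servers"),
]

# Sort longest prefix first, as warn-count does at line 70, so the
# first containing network found is also the most specific one.
critical.sort(key=lambda c: c[0].prefixlen, reverse=True)

def most_specific_match(hit, critical):
    for net, desc in critical:
        if hit.subnet_of(net):
            return desc   # like the Perl 'last': stop at the first hit
    return None

print(most_specific_match(ipaddress.ip_network("200.11.130.10/32"),
                          critical))
```

Removing the early exit would instead report every enclosing range, from mail servers up to the whole datacenter block.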
Now, I'll create a "critical" file with the following contents:
# Our datacenter space
200.11.128.0/20 IP space for internal use
200.11.128.0/24 Firewalls
200.11.130.0/24 Misc public servers
200.11.132.0/24 Access servers
200.11.134.0/24 Misc private servers
200.11.130.0/25 Public mail servers
...
Warn-count could be run as in this example:
$ cd $DNSBL_WORK_DIR
$ ./warn-count -c critical ./results/hits-20060910
Listing lists/spam.dnsbl.sorbs.net (200.11.130.10/32) [1 hosts]
matches Public mail servers
Listing lists/spam.dnsbl.sorbs.net (200.11.130.10/32) [1 hosts]
matches Misc public servers
Listing lists/spam.dnsbl.sorbs.net (200.11.130.10/32) [1 hosts]
matches IP space for internal use
The report contains the input file, the IP block that
was found and the number of IP addresses it contains. If placed in the
proper crontab file, it would cause this information to be sent via email
to the person in charge of dealing with listings of critical
infrastructure. The example above would tell you that one of your public
mail servers was listed in spam.dnsbl.sorbs.net.
At this point, we know how to generate daily samples
of our IP space listings in the DNSBLs we're tracking. Now it's
time to store that information in a database, so that we can easily build
reports out of it.
Let's start with defining a schema like the one
shown in Figure 1. The "dnsbl" table will hold the name of each
of the DNSBLs we will consider. The "sample" table will hold
each of the individual samples that will describe our variable over time.
The file count.sql (Listing 1) contains the SQL
description of this schema in SQLite syntax.
To create the database in the file
$DNSBL_WORK_DIR/count.db, the following command can be used:
$ cd $DNSBL_WORK_DIR
$ sqlite3 count.db < count.sql
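Listing 1 itself is not reproduced here, but a schema consistent with the columns the Class::DBI classes declare later would look roughly like this (a Python/sqlite3 sketch for illustration; the actual count.sql may differ in detail):

```python
import sqlite3

# A schema sketch matching the columns used by Net::Count::Dnsbl
# (id, name) and Net::Count::Sample (dnsbl_id, sample_utime, source,
# count). This is an assumption, not the article's count.sql.
SCHEMA = """
CREATE TABLE dnsbl (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    dnsbl_id     INTEGER NOT NULL REFERENCES dnsbl(id),
    sample_utime INTEGER NOT NULL,
    source       TEXT NOT NULL,
    count        INTEGER NOT NULL,
    PRIMARY KEY (dnsbl_id, sample_utime, source)
);
"""

db = sqlite3.connect(":memory:")   # use count.db on disk in practice
db.executescript(SCHEMA)
tables = sorted(r[0] for r in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

The composite primary key on (dnsbl_id, sample_utime, source) guarantees at most one sample per list, per source, per timestamp.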
That's all there is to it, so now let's
move on to filling the database with samples. To make that job even easier,
let's use Class::DBI. This provides a very nice OO wrapper around
the database, making the code considerably cleaner.
Our Class::DBI-derived class will be in $DNSBL_WORK_DIR/lib/Net/Count.pm and the module will be
called "Net::Count" for lack of a better name. Within this
file, I will define three classes: Net::Count (the base class inheriting
from Class::DBI), Net::Count::Dnsbl, and Net::Count::Sample. Let's
see some code:
9 package Net::Count;
10 use base 'Class::DBI';
Lines 9 and 10 are pretty much it for Net::Count.
However, this allows for a handy place to put common code, if the need
arises. Most of my Class::DBI hierarchies have a common base like this,
just in case:
12 package Net::Count::Dnsbl;
13 use base 'Net::Count';
15 __PACKAGE__->table('dnsbl');
16 __PACKAGE__->columns(All => qw/id name/);
17 __PACKAGE__->has_many(samples => 'Net::Count::Sample');
Lines 12-17 define Net::Count::Dnsbl base properties:
The name of the table, the columns that will be managed by Class::DBI, and
the relationship with the Net::Count::Sample class, expressed in this case
with a call to ->has_many():
19 sub normalize_column_values
20 {
21 my $self = shift; # Object or string - Careful!
22 my $r = shift;
23 $r->{name} =~ s/\W/_/g;
24 }
Since we want to be sure that the names of the lists
do not contain dangerous characters, I provide a normalize_column_values() function. This is invoked by Class::DBI automatically every
time a row is to be inserted or updated in the table, and provides for an
opportunity to alter the data:
26 __PACKAGE__->set_sql(names => qq{
27 SELECT DISTINCT name
28 FROM __TABLE__
29 });
30
31 sub unique_names
32 {
33 my $sth = Net::Count::Dnsbl->sql_names();
34 $sth->execute;
35 map { $_->[0] } @{$sth->fetchall_arrayref(['name'])};
36 }
Finally, lines 26-36 use the extended ->set_sql()
provided by Class::DBI to provide a custom ->unique_names() method returning a list of unique DNSBL names in the
database. This will be useful later, when we try to generate the graphs
from our data:
38 package Net::Count::Sample;
39 use base 'Net::Count';
40
41 __PACKAGE__->table('sample');
42 __PACKAGE__->columns(Primary => qw/dnsbl_id sample_utime source/);
43 __PACKAGE__->columns(Other => qw/count/);
44 __PACKAGE__->has_a(dnsbl_id => 'Net::Count::Dnsbl');
Lines 38-44 define the basic details of the Net::Count::Sample. The relationship with the dnsbl table,
through the Net::Count::Dnsbl class, is expressed by Class::DBI->has_a().
Note the calls to ->columns(), which allow the specification of multi-column primary keys
as well as regular columns:
53 __PACKAGE__->set_sql(sources => q{
54 SELECT DISTINCT source
55 FROM __TABLE__
56 });
58 __PACKAGE__->set_sql(historic_data => q{
59 SELECT __ESSENTIAL__
60 FROM dnsbl, __TABLE__
61 WHERE
62 dnsbl.name = ?
63 AND sample.source = ?
64 AND sample.sample_utime >= ?
65 AND dnsbl.id = sample.dnsbl_id
66 ORDER BY sample.sample_utime
67 });
69 sub unique_sources
70 {
71 my $sth = Net::Count::Sample->sql_sources();
72 $sth->execute;
73 map { $_->[0] } @{$sth->fetchall_arrayref(['source'])};
74 }
Lines 52-74 use ->set_sql() again to produce the list of unique sources and also
to query the database looking for the latest samples corresponding to a
given DNSBL and source.
These 73 lines of code completely abstract the
interaction with the database and allow our data-loading script, dbi-count,
to be much simpler. Let's see its relevant parts:
47 while (<>)
48 {
49 chomp;
50 my ($bl, $num) = (split(/\s+/, $_, 3))[0, 2];
51 $T{$bl} += $num;
52 }
The first step is to load and summarize all the hits
in each DNSBL. This is done by accumulating the totals in a simple hash,
at lines 47-52:
55 Net::Count->connection($opt_d, undef, undef);
Line 55 initializes the Class::DBI machinery using
either the argument to the -d command-line flag or a default DSN specified earlier in the
code:
58 for my $bl (sort keys %T)
59 {
60 my $db_bl = Net::Count::Dnsbl->find_or_create(name => $bl);
61 my $db_sample = Net::Count::Sample->insert
62 (
63 {
64 dnsbl_id => $db_bl,
65 sample_utime => $time,
66 source => $opt_s,
67 count => $T{$bl},
68 }
69 );
70 # Some fancy output
71 print "$opt_s\t$bl\t$T{$bl}\n";
72 }
Lines 58-72 trivially insert each sample into the
database. Line 60 uses Class::DBI->find_or_create() to
either find or create the DNSBL entry
corresponding to the total we intend to store. This allows our database
to adapt automatically as we incorporate new DNSBLs into our monitoring,
sparing us from having to remember to add columns or modify scripts.
Lines 61-69 insert the sample corresponding to this
list, with this source, at the time the program is running.
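The find_or_create-then-insert sequence maps to plain SQL like this (a Python/sqlite3 sketch of the pattern, not the actual dbi-count code; the table layout is assumed from the ->columns() declarations, and the totals and timestamp are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dnsbl  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sample (dnsbl_id INTEGER, sample_utime INTEGER,
                     source TEXT, count INTEGER,
                     PRIMARY KEY (dnsbl_id, sample_utime, source));
""")

def find_or_create_dnsbl(db, name):
    """Return the id for a DNSBL name, inserting a row only when it is
    missing -- the same idea as Class::DBI's find_or_create()."""
    row = db.execute("SELECT id FROM dnsbl WHERE name = ?",
                     (name,)).fetchone()
    if row:
        return row[0]
    return db.execute("INSERT INTO dnsbl (name) VALUES (?)",
                      (name,)).lastrowid

totals = {"lists/psbl.surriel.com": 42}   # hypothetical per-list totals
now = 1157846400                          # hypothetical sample time
for bl, num in sorted(totals.items()):
    db.execute("INSERT INTO sample VALUES (?, ?, ?, ?)",
               (find_or_create_dnsbl(db, bl), now, "default", num))

print(db.execute("SELECT count FROM sample").fetchone()[0])
```

A second run with the same list name reuses the existing dnsbl row rather than creating a duplicate, which is what lets the database grow new lists on its own.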
Thanks to having used Class::DBI, our script is
extremely portable. It will support any database already supported by the
Perl DBI. Running the script could not be easier:
$ cd $DNSBL_WORK_DIR
$ ./dbi-count ./results/hits-20060910
Alternatively, you could modify the line in your crontab file where you invoke count-ips like this:
7 1 * * * cd $DNSBL_WORK_DIR && \
./count-ips -i ./isp-networks -s ./lists > \
./results/hits-`date '+%Y%m%d'`; \
./dbi-count -d $DSN ./results/hits-`date '+%Y%m%d'`; \
find ./results/ -type f -mtime +10 | xargs rm
This causes the results to be loaded to the database
as soon as the scanning of the DNSBL data is finished.
All that's left now is producing the eye candy
by converting the data stored in the database to a nice graph we can show
the boss. We will use rrdtool for this task. This process has two parts:
populating the RRD data and generating the graph.
To accomplish the first task, let's take a look
at count-rrd, a simple script that takes the results stored in the
database, creates the missing RRD files, and updates them. This makes the
process of maintaining the RRDs automatic, thus reducing our workload:
48 my @dnsbls = map { s!\W+!_!g; $_ } Net::Count::Dnsbl->unique_names;
49 my @sources = map { s!\W+!_!g; $_ } Net::Count::Sample->unique_sources;
Lines 48-49 use the ->unique_* methods defined in Net::Count::* to obtain the list
of DNSBLs and sample sources in the database. Note the use of map { ... }
as an additional measure against dangerous names coming from the database:
52 for my $d (@dnsbls)
53 {
54 for my $s (@sources)
55 {
56 my $name = "$path/$d-$s.rrd";
57 if (-f $name)
58 {
59 update_rrd($name, $d, $s, 86400);
60 }
61 else
62 {
63 init_rrd($name, $d, $s);
64 }
65 }
66 }
Lines 52-66 iterate through all possible combinations
of DNSBL and sample source. If the RRD corresponding to that combination
is missing, the init_rrd() function is called. Otherwise, update_rrd() is invoked:
72 sub init_rrd
73 {
74 my $file = shift;
75 my $dnsbl = shift;
76 my $source = shift;
77
78 # Which data sources and aggregator functions to
79 # include in each RRD
80
81 my @dss = (qw/listings/);
82 my @agg = (qw/AVERAGE MIN MAX/);
83
84 # Create the RRDs
85 RRDs::create
86 (
87 $file,
88 '--start' => '-1 year',
89 '--step' => 86400,
90 (map { "DS:$_:GAUGE:172800:0:U" } @dss),
91 (map { "RRA:$_:0.5:1:365" } ('LAST', @agg)),
92 (map { "RRA:$_:0.5:5:365" } ('LAST', @agg)),
93 (map { "RRA:$_:0.5:30:365" } ('LAST', @agg)),
94 );
...
98 # Verify creation errors
99 my $error = RRDs::error;
100 die "Failed to create RRD $file: $error\n"
101 if $error;
102
103 # Perform the update
104 update_rrd($file, $dnsbl, $source, 31536000);
105 }
Lines 72-105 define the init_rrd() function. This is a rather thin wrapper around RRDs::create() that
supplies the parameters required to create the RRD.
This is precisely what allows count-rrd to
automatically adapt to new DNSBLs or sample sources. RRDs for new
combinations can be created on the fly. In fact, you can erase the RRDs and
rerun count-rrd. This will re-create the erased files:
107 sub update_rrd
108 {
...
115 my @samples = Net::Count::Sample->search_historic_data($dnsbl,
116 $source,
117 $time);
118 # Now update the RRD file
119 for my $sample (@samples)
120 {
121 # The actual update command
122 RRDs::update(
123 $file,
124 '-t' => 'listings',
125 $sample->sample_utime . ":" . $sample->count
126 );
127
128 # Verify update errors
129 my $error = RRDs::error;
130 warn "Failed to update RRD $file: $error\n"
131 if $error;
132 }
133 }
Lines 107-133 obtain data for a given period of time
using the Net::Count::Sample->search_historic_data() method; the
data is then passed to RRDs::update() to update the corresponding RRD file.
I can now update all my RRD files by invoking the following command:
$ cd $DNSBL_WORK_DIR
$ mkdir rrd
$ ./count-rrd rrd
This will create all the missing RRDs and update their
information. Of course, I could also add this step to the crontab file. The
new line would look a lot like this:
7 1 * * * cd $DNSBL_WORK_DIR && \
./count-ips -i ./isp-networks -s ./lists > \
./results/hits-`date '+%Y%m%d'`; \
./dbi-count -d $DSN ./results/hits-`date '+%Y%m%d'`; \
./count-rrd -d $DSN ./rrd; \
find ./results/ -type f -mtime +10 | xargs rm
All that's left is converting the RRDs into
graphs that can be put on Web pages -- probably with rrdcgi -- or
added to reports. For demonstration purposes, I prepared graph.sh, which
creates a very simple area graph with the lists I've discussed in
this article:
#!/bin/bash
rrdtool graph 'rrd/summary.svg' --imgformat SVG \
--start '-6 months' --end '-1 second' --step 86400 \
--width 600 --height 240 --lower-limit 0 \
--slope-mode \
--title 'Listing summary' --watermark 'SysAdmin Magazine' \
DEF:var0=rrd/lists_http_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var1=rrd/lists_smtp_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var2=rrd/lists_socks_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var3=rrd/lists_spam_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var4=rrd/lists_web_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var5=rrd/lists_misc_dnsbl_sorbs_net-default.rrd:listings:LAST \
DEF:var6=rrd/lists_list_dsbl_org-default.rrd:listings:LAST \
DEF:var7=rrd/lists_psbl_surriel_com-default.rrd:listings:LAST \
DEF:var8=rrd/lists_sbl_spamhaus_org-default.rrd:listings:LAST \
DEF:var9=rrd/lists_xbl_spamhaus_org-default.rrd:listings:LAST \
AREA:var0#FF0000:'rrd/lists_http_dnsbl_sorbs_net-default.rrd' \
AREA:var1#FFFF00:'rrd/lists_smtp_dnsbl_sorbs_net-default.rrd':STACK \
AREA:var2#FF00FF:'rrd/lists_socks_dnsbl_sorbs_net-default.rrd':STACK \
AREA:var3#CCDD22:'rrd/lists_spam_dnsbl_sorbs_net-default.rrd':STACK \
AREA:var4#0000FF:'rrd/lists_web_dnsbl_sorbs_net-default.rrd':STACK \
AREA:var5#0044EE:'rrd/lists_misc_dnsbl_sorbs_net-default.rrd':STACK \
AREA:var6#CCAAFF:'rrd/lists_list_dsbl_org-default.rrd':STACK \
AREA:var7#3322DD:'rrd/lists_psbl_surriel_com-default.rrd':STACK \
AREA:var8#888800:'rrd/lists_sbl_spamhaus_org-default.rrd':STACK \
AREA:var9#882288:'rrd/lists_xbl_spamhaus_org-default.rrd':STACK
Please see the documentation for rrdtool, as this
excellent package includes lots of options for customizing your graphs. As
shown, this script produced a graph similar to Figure 2 when fed with data
for a few months.
Another example of the look you can achieve can be
seen in Figure 3. This graph, in Spanish, shows the results of applying
this technique in a real network.
And this eye candy provides an excellent excuse to
finish this article, so you can go play with the rrdtool options for
changing how your graphics look. I hope this information proves as useful
to you as it has been for me.
Luis has been working in various areas of computer
science since the late 1980s. Some people blame him for conspiring to bring
the Internet into his home country, where currently he spends most of his
time teaching others about Perl and taking care of network security at the
largest ISP there as its CISO. He also believes that being a sys admin is
supposed to be fun.