Secure Your Code With Taint Checking

The Perl Journal September, 2004

By Andy Lester

Andy manages programmers for Follett Library Resources in McHenry, IL. In his spare time, he evangelizes about automated testing, shepherds the Phalanx project (http://qa.perl.org/phalanx/), and leads the Chicago Perl Mongers. He can be contacted at andy@petdance.com.

Your web code is under constant attack. You may not notice it, but go look in your server logs. You'll find lines like this:

139.114.236.27 - - [03/Sep/2004:15:03:23 -0500] "POST /cgi-bin/formmail.pl HTTP/1.0" 404 285

Automated robots, unleashed by no-good crackers, hunt for commonly known scripts and throw bad data at whatever they find, trying to poke holes in security.

So what can happen if the bad guys find a poorly written program on a web server? Consider this overly simple example of a guestbook application. Somewhere on my web site, I have a page, guestbook.html, that looks like this:

<FORM METHOD="POST" ACTION="guestbook.cgi">
Please leave a message and today's date<BR>
Date (MM-DD-YYYY): <INPUT TYPE="text" NAME="date"><BR>
Message: <INPUT TYPE="text" NAME="message"><BR>
<INPUT TYPE="submit">
</FORM>

The form posts to guestbook.cgi.

My intent was to have a user visit the page, type "09-08-2004" in the date field, and "Hi, Andy, this is a great page you've got here" in the message field. Then, my handy little guestbook.cgi script processes the input for me and writes it into a date-stamped log file:

#!perl -w
use strict;
use CGI qw(param);
my $date = param('date');
my $logfile = "/var/guestbook/log-$date";
open( my $fh, ">>", $logfile )
    or die "Can't open $logfile: $!";
print {$fh} param('message'), "\n";
close $fh;

This script takes the date, builds a filename from it, and appends the message to that file, effectively creating a log file for each day that someone leaves a message. But what if my visitor is not so nice? What if, instead of a date, he passes these values:

date="../../../../../etc/passwd"

and

message="badguy:$1$kG66hi8F$VY3ch9Iyii5BypV0c0SZn1:0:0::/root:/bin/bash"

My little program will build $logfile as

/var/guestbook/log-../../../../../etc/passwd

which is my /etc/passwd file. Then, my program will happily append that line to the password file, creating a user called badguy with user ID 0, which means root privileges on my box. Because I didn't tell my script to check the format of its inputs, it has given the bad guy complete access to my computer. That's no fun for anyone except him.

What happened? My program got data that I wasn't expecting and hadn't planned for. I assumed the date field would hold a date, and that assumption let the program write data to places I never intended. Perl's taint checking would have stopped this attack cold. All it takes is adding the -T command-line switch to the shebang line in guestbook.cgi, like so:

#!perl -Tw

With that change, Perl would have stopped my script as soon as it tried to open the file:

Insecure dependency in open while running with -T switch
at guestbook.cgi line 6.

This tells me that the program is trying to write to a file where the filename is tainted, and thus insecure. Thank Perl for my handy seat belt!
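Taint checking stops the attack, but the script still needs a real fix. Here is a minimal sketch of one, assuming the MM-DD-YYYY format the form asks for. The helper name logfile_for_date is hypothetical; the technique, an anchored regex whose capture becomes the untainted value, is the one described under Untainting.

```perl
use strict;
use warnings;

# Hypothetical helper: build the log file name only from a date that is
# exactly MM-DD-YYYY. Anything else (including "../../etc/passwd") is rejected.
sub logfile_for_date {
    my ($date) = @_;
    return unless defined $date;
    return unless $date =~ /\A(\d\d-\d\d-\d\d\d\d)\z/;
    # $1 is untainted, so Perl will allow this path to be opened under -T.
    return "/var/guestbook/log-$1";
}

for my $input ( '09-08-2004', '../../../../../etc/passwd' ) {
    my $file = logfile_for_date($input);
    print defined $file ? "OK: $file\n" : "Rejected: $input\n";
}
```

In guestbook.cgi itself, an undef return would be the cue to print an error page instead of opening anything.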

Turning on Taint Checking

Taint checking is a feature built into Perl. Since it's part of the language's runtime, I don't have to do anything other than turn it on; Perl takes care of the rest. However, because taint checking is so restrictive, and because it doesn't cover everything you might think, it's important to know the details of how tainting works.

Perl turns on taint checking automatically whenever a program runs with differing real and effective user or group IDs, as happens when a setuid or setgid program runs on behalf of someone other than its owner. This prevents a program from inadvertently performing unsafe actions with someone else's privileges. When taint checking is on, any unsafe operation throws a fatal error, and the program stops.

I can force taint checking to be enabled with the -T command-line switch. There's also a -t command-line switch, where unsafe operations generate a warning instead of an error, but I never use it. Taint checking is for security; what good is having your program tell you "Hey, I just let you do something bad"? It may have a purpose, but I've never seen it.

Tainting is either on or off for the whole run of the program. Unlike warnings, which you can switch off with the no warnings pragma or by modifying $^W, taint checking can't be turned off while your program is running. It also must be turned on with a command-line switch when the program starts. I can do that by calling my program like so:

$ perl -T taint-safe

or putting the -T on the shebang line at the top of the program:

#!perl -T
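Because the mode is fixed at startup, a program can only ask which mode it got. A small sketch using the special variable ${^TAINT}, available since Perl 5.8; the helper name taint_mode is just for illustration.

```perl
use strict;
use warnings;

# ${^TAINT} reports the taint mode chosen at startup:
# 1 under -T, -1 when only taint warnings are on (-t), 0 when it is off.
sub taint_mode {
    return ${^TAINT} == 1  ? 'on (-T)'
         : ${^TAINT} == -1 ? 'warnings only (-t)'
         :                   'off';
}

print "Taint checking is ", taint_mode(), "\n";
```

A program that should never run unprotected could instead begin with: die "Run me with perl -T\n" unless ${^TAINT} == 1;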

If a program has -T on its shebang line but I run it through perl without -T, I get the notorious "Too late for -T" error:

$ cat shebanged
#!perl -T
....
$ perl shebanged
Too late for "-T" option at shebanged line 1.

Perl has to know immediately upon starting that taint checking is on, because the environment must get properly tainted. To get around this, either invoke Perl with the -T:

$ perl -T shebanged

or run the program directly, so that the -T on its shebang line takes effect:

$ ./shebanged

"Too late for -T" isn't the clearest error message, and I've been trying to get a more obvious one in future versions of Perl. It's also possible to get this error if the -T isn't the first switch on the command line or shebang line. Always put -T first.

The Rules of Tainting

The principles of tainting are simple:

Any data from outside the program is tainted. That means anything that didn't originate in the program itself: command-line arguments, environment variables, and data read from files, sockets, or even directory entries from readdir().
Tainted data may not be used in any operation that modifies files, directories, or processes, in an eval, or when invoking a subshell.
Only scalars may be tainted. Arrays, lists, and hashes cannot be tainted, but individual elements can. Hash keys are never tainted, even if they are created from tainted scalars. Also, undef is never tainted.
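To see the first rule in action, here is a small sketch using tainted() from Scalar::Util, a core module since Perl 5.8. The file name sources.pl is arbitrary. Each sample is assigned in its own statement on purpose, because Perl tracks taint per statement and a literal copied in the same statement as a tainted value can pick up the taint too.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Run as:  perl -T sources.pl some-argument
# Without -T, Perl marks nothing as tainted, so every line reports "clean".
my %samples;
$samples{'environment variable PATH'}     = $ENV{PATH};
$samples{'first command-line argument'}   = $ARGV[0];
$samples{'string literal in the program'} = 'hello';

for my $label ( sort keys %samples ) {
    my $value = $samples{$label};
    next unless defined $value;    # e.g. no argument was given
    printf "%-30s %s\n", $label, tainted($value) ? 'tainted' : 'clean';
}
```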

Taintedness is pervasive. Any tainted data in an expression renders the whole expression tainted. That's obvious in a case like this:

my $filename = $tainted_dir . "/" . $tainted_filename;

However, the taintedness rules apply even when the result can't logically depend on the tainted value:

my $zero = $tainted_data * 0;
# $zero is tainted, even though it's effectively constant.

This rule comes in handy when you want to check a value's taintedness. Here's the canonical is_tainted() function from the perlsec man page:

sub is_tainted {
    local $@;   # Don't pollute the caller's $@
    return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
}

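Note how all the parameters are joined into one string, from which a zero-length substring is extracted. Even though that string is always empty, it is tainted if any element of @_ is tainted, and passing a tainted string to eval is a fatal error under taint checking. The outer eval catches that error, so is_tainted() returns true.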

The exception to the "any tainted data taints the entire expression" rule is the ternary conditional operator, so that in this case:

$x = $tainted_value ? "foo" : "bar";

$x will not be tainted, since this is the same as:

if ( $tainted_value ) {
    $x = "foo";
} else {
    $x = "bar";
}
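Here is a sketch that checks the pervasiveness rule and the ternary exception with the is_tainted() function above. Each derived value gets its own statement, and the report helper (a name chosen for this example) checks one value per call, so the statement-level taint tracking doesn't blur the results. Run it under perl -T; without -T, everything reports clean.

```perl
use strict;
use warnings;

# is_tainted() from perlsec, with "local $@" so the caller's $@ survives.
sub is_tainted {
    local $@;
    return ! eval { eval( "#" . substr( join( "", @_ ), 0, 0 ) ); 1 };
}

# Under perl -T, anything from %ENV is tainted (if PATH is set).
my $tainted = $ENV{PATH} || 'no PATH in the environment';

my $joined  = $tainted . '/bin';           # tainted under -T
my $empty   = substr( $tainted, 0, 0 );    # tainted under -T, though empty
my $choice  = $tainted ? 'foo' : 'bar';    # not tainted: the ?: exception
my $literal = 'hello';                     # never tainted

report( joined  => $joined );
report( empty   => $empty );
report( choice  => $choice );
report( literal => $literal );

sub report {
    my ( $label, $value ) = @_;
    printf "%-8s %s\n", $label, is_tainted($value) ? 'tainted' : 'clean';
}
```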

Untainting

Once I have tainted data, I want to untaint it to use it. More precisely, I extract the good data from the tainted data using regular expressions and the numeric group match variables.

When I successfully match a string against a regular expression and use the parentheses to group an element of the regex, each subexpression matched is placed into the special variables $1, $2, and so on.

my $graffiti = "867-5309 Jenny";
my $phone;
if ( $graffiti =~ /(\d\d\d-\d\d\d\d)/ ) {
    $phone = $1;
    # $phone contains "867-5309"
}

In this example, I'm looking in a string for a simple phone number of three digits, a dash, and four digits. If it's found, it's put into $1. These capture variables are never tainted, and so can be used as part of my extraction of untainted data.

use CGI;
my $q = CGI->new;
my $user_input = $q->param( 'phone' ); # tainted
my $phone;
if ( $user_input =~ /^(\d\d\d-\d\d\d\d)$/ ) {
    $phone = $1;
    # Do something with the phone number
}

Note that you must check the return value of the regex match to see if the string matched. It is not sufficient to check to see if $1 has a value in it, since a failed regex match will not clear the value in $1. If the prior example was written as:

# BAD EXAMPLE: Do not do this
$user_input =~ /^(\d\d\d-\d\d\d\d)$/;
if ( $1 ) {
    $phone = $1;
    # Do something with the phone number
}

then if the regex did not match, $phone would get whatever $1 held from the most recent successful match.

When I untaint, I want my regex to be as restrictive as possible. Note how my regexes have all been anchored at the beginning and end with the ^ and $ anchors (or \A and \Z if you prefer). Without them, my phone number regex would match a string like "something 867-5309 something else."
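A sketch of the difference anchors make, wrapped in a hypothetical untaint_phone() helper that returns undef for anything that isn't exactly a phone number. It uses \A and \z; unlike $ and \Z, \z also refuses a trailing newline.

```perl
use strict;
use warnings;

# Return an untainted phone number, or undef unless the whole input
# is exactly NNN-NNNN.
sub untaint_phone {
    my ($input) = @_;
    return unless defined $input && $input =~ /\A(\d\d\d-\d\d\d\d)\z/;
    return $1;
}

for my $input ( '867-5309', 'something 867-5309 something else' ) {
    my $loose  = $input =~ /(\d\d\d-\d\d\d\d)/ ? 'match' : 'no match';
    my $strict = defined untaint_phone($input) ? 'match' : 'no match';
    print "'$input': unanchored $loose, anchored $strict\n";
}
```

The unanchored pattern matches both strings; only the anchored one rejects the junk around the number.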

If I'm concerned with security, I don't want to take the approach of "take the good stuff out and leave the rest." This is especially true in the case of validating form fields that I've created myself. If they're not in the format I'm expecting, it must be because someone is being naughty.

How Not to Untaint

There are two bad ways to untaint. The first is to use an overly permissive regex, like /(.*)/. That will match the entire string and put it in $1. The result will be untainted, but it might as well still be tainted, because nothing will have been checked with regard to its content. Don't fall prey to this trap: Write your untainting expressions to be strict. The time will pay off later in added security.

The other bad way to untaint is to use the value as the key to a hash, and then retrieve the key. Perl's hash keys aren't full-blown scalars but a special case of string, and thus have no taintedness associated with them. Therefore, doing something like:

# BAD EXAMPLE: Do not do this
my @commands = <STDIN>; # tainted input from STDIN
chomp @commands;

# Hash keys aren't tainted
my %commands;
$commands{$_} = 1 for @commands;

foreach ( sort keys %commands ) {
    # Execute the erroneously untainted command
    # VERY dangerous!
    system( $_ );
}

is extremely dangerous. This code reads commands from standard input, strips their taint by using them as hash keys, and then executes them via system(). It has removed the tainting without checking the content of the strings at all.
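A safer shape for the same job is a whitelist: the input is only used to look up a command, and the command itself, with its arguments, is written into the program. This is a sketch under assumptions: the %allowed table, the command_for helper, and the command paths are all hypothetical, and the %ENV cleanup is the one perlsec recommends before calling system() under -T.

```perl
use strict;
use warnings;

# Hypothetical whitelist: the only commands this program will run, each
# with a fixed argument list (paths are assumptions; adjust for your system).
my %allowed = (
    uptime => [ '/usr/bin/uptime' ],
    date   => [ '/bin/date' ],
);

# Under -T, system() also insists on a safe PATH and environment.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Look the request up instead of executing it; undef means "not allowed".
sub command_for {
    my ($request) = @_;
    return unless defined $request;
    chomp $request;
    return $allowed{$request};
}

for my $request ( "uptime\n", "rm -rf /\n" ) {
    my $cmd = command_for($request);
    if ($cmd) {
        # A real program would run it here: system(@$cmd) == 0 or warn ...
        print "would run: @$cmd\n";
    }
    else {
        print "refusing: $request";
    }
}
```

The tainted string never reaches system(); only values the program itself supplied do, and the list form of system() skips the shell entirely.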

Web and Tainting

Now that you know how taint checking watches your back, you'll want to use it in as many applications as possible. For CGI programs, it's simple: Put a -T in the shebang line of your program, and Perl will respect it.

If you're using mod_perl, you'll need to set the PerlTaintCheck directive to On in your httpd.conf. This turns on taint checking for every mod_perl handler under Apache. Recall that taint checking must be turned on at the time the Perl interpreter starts. Since mod_perl embeds the interpreter in the Apache httpd executable, Perl starts when Apache starts, so the setting applies to the whole server. This can cause problems with multiple developers, or with multiple applications on the same server, where one app is careful with its untainting and the other isn't.

Taint Checking with CGI::Untaint

Although you may be able to get by with the aforementioned is_tainted() function, a number of CPAN modules are available to help you on your journey to better security.

Untainting is most commonly used in CGI programs, so it's no surprise to find CGI::Untaint at the top of the pack. CGI::Untaint combines the extracting of query parameters and untainting into one operation. For example:

use CGI;
use CGI::Untaint;

my $q = CGI->new;
my $untaint = CGI::Untaint->new( $q->Vars );
my $id = $untaint->extract( -as_integer => 'id' );

The $untaint object is initialized with all the CGI parameters. Then, the extract method looks for a parameter called "id" and validates it as an integer. If the id parameter is present and matches the integer pattern, extract returns the value, untainted. Otherwise, it returns undef. You can also check for formats like printable characters, hexadecimal numbers, and so on.

The beauty of CGI::Untaint is its extensibility. The framework allows a programmer to add other formats of data to check for validity. The CPAN has add-on modules for data formats such as ISBNs, host names, IP addresses, and e-mail addresses. You can also write your own for your specific formats.
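As a sketch of what writing your own might look like: as far as I know, CGI::Untaint maps -as_foo to a class named CGI::Untaint::foo that inherits from CGI::Untaint::object and supplies an _untaint_re method returning a pattern whose first capture is the untainted value. The us_phone format below is hypothetical; check the CGI::Untaint documentation before relying on these details.

```perl
package CGI::Untaint::us_phone;    # hypothetical format name
use strict;
use warnings;
use base 'CGI::Untaint::object';

# CGI::Untaint matches the parameter against this pattern; the value
# captured by the first group becomes the untainted result.
sub _untaint_re { qr/\A(\d\d\d-\d\d\d\d)\z/ }

1;
```

With that module installed somewhere in @INC, a handler could then ask for $untaint->extract( -as_us_phone => 'phone' ).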

Even if you use mod_perl, you can still use CGI::Untaint by passing the args from your Apache request:

sub handler {
    my $request = shift;

    my $untaint = CGI::Untaint->new( $request->args );

    # Untaint as shown before...
}

Taint Checking with DBI and Class::DBI

Databases cause some problems for the programmer who's not careful with taint checking. Since "tainted" has no meaning outside of Perl, writing tainted data into a database and then reading it back out effectively untaints the data without checking it for validity. Worse, the data stays in the database for an indeterminate amount of time after your program finishes running, so other programs might come along later and use it, not knowing that it never passed stringent checking. Fortunately, DBI 1.31 and higher provide support for tainting.

For each DBI handle, either database or statement, you can set the TaintIn or TaintOut attributes. If the TaintIn attribute is true, data being sent into DBI will be checked for taintedness. Currently, this includes "all the arguments to most DBI method calls," but that may change in the future. The converse is the TaintOut attribute, which taints "most data fetched from the database" before it gets returned to your program. This most closely resembles the general principle of "any outside data is tainted." Finally, the Taint attribute is a combination of both TaintIn and TaintOut. These checks only take effect when Perl itself is running with taint checking on.
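A minimal sketch of setting these attributes when connecting. It assumes DBD::SQLite is installed and uses an in-memory database; any other driver should accept the same attributes. The table and message are made up for the example.

```perl
use strict;
use warnings;
use DBI;

# Assumes DBD::SQLite is available; the attributes are driver-independent.
my $dbh = DBI->connect(
    'dbi:SQLite:dbname=:memory:', '', '',
    {
        RaiseError => 1,
        TaintIn    => 1,   # under -T, die if tainted data is passed to DBI methods
        TaintOut   => 1,   # under -T, mark fetched data as tainted
        # Taint => 1 would set both at once.
    },
);

$dbh->do('CREATE TABLE guestbook (message TEXT)');
$dbh->do( 'INSERT INTO guestbook (message) VALUES (?)', undef, 'Hi, Andy' );

my ($message) = $dbh->selectrow_array('SELECT message FROM guestbook');
print "Fetched: $message\n";    # tainted when run under perl -T
```

Fetched values then have to go through the same regex-based untainting as any other outside data before they can be used in anything risky.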

If you're wrapping your DBI calls with the excellent Class::DBI module, or its base class Ima::DBI, your tainting checks are taken care of for you. They turn on DBI's Taint attribute automatically.

Tainting for Module Authors and Test::Taint

Module authors should make sure their modules properly handle tainted data. If a module can't handle running under taint mode, it might be useless to a user who wants to run it in a taint-enabled application.

If you want to make sure that your module untaints data properly, or you have code that returns data the user expects to be tainted, you'll want to look at Test::Taint. Based on the standard Test::Builder framework, Test::Taint provides functions for verifying how your module handles tainted data.

For example, say I wrote a module that untaints hex values and I want a test file to verify its behavior. First, the test file will make sure that it's running under taint checking, create a test value, taint that value, and verify it's tainted:

#!perl -T
use strict;
use warnings;
use Test::More tests => 4;
use Test::Taint;

taint_checking_ok();

# Make a tainted ID
my $hex = "deadbeef";
taint( $hex );
tainted_ok( $hex );

Then the real testing happens. The script calls validate_hex, the function I wrote, and verifies that the returned $newhex value is untainted and correct:

# Check your validator
my $newhex = validate_hex( $hex );
untainted_ok( $newhex );
is( $newhex, 'deadbeef' );

Test::Taint provides all of these test functions and makes checking a breeze.
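The validate_hex function under test isn't shown above. Here is a minimal sketch of what it might look like, in a hypothetical module named My::Hex; the name, the export setup, and the rule of accepting only a non-empty run of hex digits are all assumptions for illustration.

```perl
package My::Hex;    # hypothetical module name
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = ('validate_hex');

# Return an untainted copy of $value if it consists only of hex digits;
# return undef for anything else, including undef and the empty string.
sub validate_hex {
    my ($value) = @_;
    return unless defined $value && $value =~ /\A([0-9a-fA-F]+)\z/;
    return $1;
}

1;
```

With the module saved as lib/My/Hex.pm, the test file would add use My::Hex qw(validate_hex); and could be run with prove -lT so the tests execute under taint checking.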

Wrap-up

Since web-based attacks are trivial to implement and indiscriminate in their targets, it makes sense to use Perl's built-in taint checking by default in all your web code. Taint checking also provides a simple way to remind you not to do unsafe things with your code. If you write code with use warnings and use strict by default, start adding -T to those defaults.

TPJ