Writing Email Auto-Responders

Hal Pomeranz

Email auto-responders are a quick and easy way to extend the functionality of your email system. They can provide instant feedback for potential customers, help guide existing customers toward appropriate support resources, interface with CRM and helpdesk systems, and just generally alleviate a lot of grunt work that would otherwise have to be done by a human being. However, these programs typically run with elevated privileges, so extreme care must be taken when implementing such a system.

This article covers the basics of writing a simple auto-responder program and provides working Perl code that you can use as a template for your own auto-responders. Along the way, I'll also go into some detail about the pitfalls you face when developing these kinds of programs, which should help when you're trying to write such a system on your own. The Perl code is simple enough to port easily to your preferred programming language, but I find Perl particularly effective for this task because it is such a good text manipulation language.

High-Level Overview

Auto-responder programs are generally invoked via an alias, such as:

sales: |/usr/local/bin/info

You may also specify command-line arguments to the auto-responder program as part of the alias, but these arguments are hard-coded into each alias and cannot vary from one message to the next. Still, command-line arguments are sometimes useful. For example, we might use the same auto-responder program for two different aliases, but change the content of the outgoing reply message by specifying a different directory of attachments in each alias:

sales: "|/usr/local/bin/info -d /var/attach/sales" 
support: "|/usr/local/bin/info -d /var/attach/support"

Note the use of double quotes to group the command-line arguments with the rest of the alias.
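Inside the program, an argument like -d can be picked up with the standard Getopt::Std module. The following is a minimal sketch; the parse_args() name and the /var/attach/default fallback directory are hypothetical choices for illustration, not part of any real setup.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical fallback used when the alias gives no -d argument.
my $DEFAULT_DIR = '/var/attach/default';

# parse_args: return the attachment directory named by "-d <dir>" on the
# alias command line, or the fallback directory if -d was not given.
sub parse_args {
    local @ARGV = @_;    # getopts() works on the global @ARGV
    my %opts;
    getopts('d:', \%opts)
        or die "usage: info [-d attachment_directory]\n";
    return defined $opts{d} ? $opts{d} : $DEFAULT_DIR;
}
```

In the auto-responder itself you would call something like my $attach_dir = parse_args(@ARGV); near the top of the program.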

The program invoked by the alias generally runs with the privileges of the mail server (MTA) process, typically root. Thus, it is extremely important to code these auto-responders carefully to avoid both programmer error and exploits resulting from malicious user input. Note that the standard Sendmail distribution includes "smrsh" (the SendMail Restricted SHell), a safer execution environment for these kinds of aliases, but configuring smrsh is outside of the scope of this article.

Auto-responder programs should exit with an exit status of zero, unless an error has occurred. If you don't explicitly exit with zero status, you will typically see "Unknown mailer error" messages in your postmaster email and mail server logs.

Parsing the Incoming Message

The auto-responder program will be handed the incoming message via its standard input. The incoming message will be in the typical "Unix mailbox" format, which you can think of as having three major components. The first line of the incoming message carries information from the message envelope, typically the sender address and a time/date stamp. Following the envelope information will be a variable number of lines of mail headers. The header information is complete when you encounter a blank line. After the blank line, all other lines in the message are considered to be part of the message body.

Here is some simple Perl code to break the incoming message into its component chunks:

$envelope = <STDIN>; 

while (<STDIN>) { 
    last if (/^$/);       # look for blank line 
    push(@headers, $_); 
} 

@msg_body = <STDIN>;

Actually, if you're concerned that the incoming message might contain large attachments, you may not want to "slurp" the entire body of the message into memory as we do on the last line of the example above. In fact, in most cases auto-responder programs can safely ignore the body of the incoming message entirely. The exceptions would be programs like mailing list managers that are looking for commands (subscribe, unsubscribe, etc.) embedded in the body of the email messages. But for simple auto-responder programs, this isn't an issue.
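If you do want to get through the whole message without holding a large body in memory, one approach is to read the body a line at a time and throw each line away. Here is a minimal sketch; read_message() is a hypothetical helper that takes any filehandle (pass \*STDIN in the real program), and the byte count exists only so the caller could log the message size. Reading to the end of the input rather than simply stopping is a conservative choice, since the MTA is still writing the message into the pipe.

```perl
use strict;
use warnings;

# read_message: split a Unix-mailbox-format message into the envelope line
# and a list of raw header lines, then read and discard the body so that a
# large attachment never sits in memory. Returns the body size in bytes.
sub read_message {
    my ($fh) = @_;
    my $envelope = <$fh>;
    my @headers;
    while (my $line = <$fh>) {
        last if $line =~ /^\r?$/;    # blank line (LF or CRLF) ends headers
        push @headers, $line;
    }
    my $body_bytes = 0;
    while (my $line = <$fh>) {
        $body_bytes += length $line;  # count it, but don't keep it
    }
    return ($envelope, \@headers, $body_bytes);
}
```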

Dealing with Headers

The initial line of envelope information generally looks like "From <sender> <date>". For example:

From hal@deer-run.com Tue Sep 26 07:36:45 2006

Note that the envelope line uses "From<space>", which is different from the "From: " ("From<colon><space>") header found later in the header lines. Really the only thing interesting in the envelope line is the sender email address, which we can extract easily as follows:

$envelope = <STDIN>; 
($sender_addr) = $envelope =~ /^From\s+(\S+)/;

Here we're exploiting the fact that the Perl pattern match operator, when used in a list context, returns a list of the sub-expressions (things in parentheses) in the regular expression.

The remaining headers are of the form "<header>: <value>", as in:

From: Hal Pomeranz <hal@deer-run.com>

Headers may also span multiple lines, as in:

Received: from somehost.example.com (...) 
        by mail.deer-run.com (...) ... 
        for <hal@deer-run.com>; ...

Notice that all lines after the first line of the multi-line header must begin with whitespace.
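The header loop shown earlier stores each physical line separately, so a folded header arrives as several array elements. If you need the full value of a header, you can "unfold" the continuation lines first. This is a minimal sketch; the unfold_headers() name and the hash layout (lowercased header name mapped to an array of values, since headers like Received: repeat) are assumptions for illustration.

```perl
use strict;
use warnings;

# unfold_headers: join continuation lines (those starting with a space or
# tab) onto the preceding header, then build a hash keyed on the
# lowercased header name. Each value is an array ref, so repeated
# headers such as Received: are all kept, in order.
sub unfold_headers {
    my @lines = @_;
    my @unfolded;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                 # strip the line ending
        if ($line =~ /^[ \t]/ && @unfolded) {
            $unfolded[-1] .= $line;           # continuation of previous header
        } else {
            push @unfolded, $line;
        }
    }
    my %headers;
    for my $h (@unfolded) {
        my ($name, $value) = $h =~ /^([^:\s]+):\s*(.*)$/ or next;
        push @{ $headers{lc $name} }, $value;
    }
    return \%headers;
}
```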

But which headers are significant or important? Obviously, for an auto-responder you must work out the sender address you want your reply to go to. In my auto-responders, I generally will only reply to the sender address found in the envelope. Yes, the From: header is supposed to contain the sender's address and there may even be a Reply-to: header with a different email address where you should direct responses (RFC2822 says that you SHOULD obey the Reply-to: header, but it doesn't say that you MUST do so). However, as we will see shortly, forging email headers is trivial. Forged headers are one common vector for malicious outsiders who are trying to exploit your auto-responder program. It is just slightly more difficult to forge the envelope information, and so you may choose to regard this information as more trustworthy. Wherever you take the sender address from, you should still carefully verify this address using the techniques we will discuss shortly.

One header that may be useful to note is the Precedence: header. This header was originally created in Sendmail as a mechanism for ordering the mail queue so that more important messages got delivered first. However, other software -- particularly mailing list management software -- has adopted this header to tag traffic generated via a mailing list or other bulk email program. In general, you would like your auto-responder to avoid spamming large Internet mailing lists, so you would probably like to recognize the Precedence: values list, junk, and bulk, and simply not reply if you find one of them.
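A minimal sketch of that check follows, assuming the headers have already been collected into a hash keyed on the lowercased header name, with each value an array reference (a hypothetical layout; adapt it to however you store headers). Header names are case-insensitive, and the sketch matches the values case-insensitively as well.

```perl
use strict;
use warnings;

# is_bulk: true if any Precedence: header carries list, junk or bulk.
# $headers is a hash ref of lowercased header name => array ref of values.
sub is_bulk {
    my ($headers) = @_;
    for my $value (@{ $headers->{precedence} || [] }) {
        return 1 if $value =~ /^\s*(?:list|junk|bulk)\b/i;
    }
    return 0;
}
```

The auto-responder would then simply exit(0) when is_bulk() returns true.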

Sanity Checking Sender Addresses

The biggest worry you have with these sorts of auto-responder programs is a malicious user attempting to inject shell commands and shell metacharacters into the sender address in order to trigger bad behavior in your auto-responder program. One simple approach is to just filter for obviously bad characters:

die "Hostile address: $sender_addr\n"
    if ($sender_addr =~ /[<>|&;\\]/);

Actually, the right-hand side of an email address must be a valid domain name or something that looks like a fully qualified hostname. This means that the right-hand side of the email address should contain only letters, digits, periods, and hyphens (Perl's \w class, used below, also admits the underscore, which is harmless here). Thus, we can make our filtering expression a bit more aggressive:

die "Hostile address: $sender_addr\n"
    unless ($sender_addr =~ /^[^<>|&;\\]+@[-\w.]+$/);

So here we're saying that everything to the left of the @ sign cannot contain any of the following characters: <, >, |, &, ;, and \. On the right-hand side of the @ sign we have to have something that looks like a valid domain name or hostname. While this doesn't cover every possible valid email address, it's "good enough" for 99% of the email addresses you'll encounter on the modern Internet, while still filtering out the bad stuff we're trying to avoid.
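Note that the blacklist still lets through characters that matter just as much to a shell, such as $, backquotes, quotes, and parentheses: an envelope sender like `id`@example.com passes the check above, and the backquotes would run a command if that address ever reached a shell. If you can live with rejecting some legal but unusual addresses, a whitelist is the safer design. This is a minimal sketch; is_safe_address() is a hypothetical name, and the allowed character sets (plus the requirement for at least one dot in the domain) are assumptions rather than anything mandated by a standard.

```perl
use strict;
use warnings;

# is_safe_address: whitelist check for an envelope sender. The local part
# may contain only letters, digits and . _ + - ; the domain must be
# dot-separated labels of letters, digits and hyphens. Quoted local parts,
# "!" or "%" routing and address literals are rejected on purpose.
sub is_safe_address {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /\A[A-Za-z0-9._+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\z/
        ? 1 : 0;
}
```

The \A and \z anchors matter: unlike $, \z does not allow a trailing newline to slip through.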

The other thing you want to avoid is generating an auto-responder reply to somebody else's auto-responder or to an error message from another mail server. Otherwise, you run the risk of getting into an endless "shouting match" between two auto-responder programs that are mindlessly firing emails back and forth to one another. Auto-responder messages and mail system error messages are supposed to use a sender address of MAILER-DAEMON, often represented simply as the sender address <>. When we receive an email from this sender address, our auto-responder should simply exit without replying. Also, when we generate a reply to a legitimate email address, we should make sure our messages go out using MAILER-DAEMON as the message sender.

With all of this in mind, here's a slightly updated version of our original example:

$envelope = <STDIN>;
($sender_addr) = $envelope =~ /^From\s+(\S+)/;
exit(0) if ($sender_addr eq '<>' ||
            $sender_addr =~ /^mailer-daemon/i);
die "Hostile address: $sender_addr\n"
    unless ($sender_addr =~ /^[^<>|&;\\]+@[-\w.]+$/);

while (<STDIN>) {
    last if (/^$/);
    # Header names are case-insensitive, so match with /i.
    exit(0) if (/^Precedence:\s+(list|junk|bulk)/i);
    push(@headers, $_);
}

Generating the Outgoing Message

Now that we've safely gathered the sender's email address for our reply and have parsed through the header information, it's time to think about actually generating the response email. I find that the easiest thing to do is to simply invoke the sendmail binary directly and feed the outgoing email into the sendmail process on the standard input (though I'm using Sendmail for this example, the principles are the same for other mail systems). One of the advantages to this approach is that it allows you to generate your own custom mail headers.

You could use the standard Perl open() syntax to start the Sendmail process:

open(MAIL, 
     "|/usr/sbin/sendmail -f '<>' $sender_addr");

Note the use of -f '<>' to specify the sender address for the envelope of our outgoing email message. Our $sender_addr is specified on the command line as the recipient of the email message; whatever you put in the To: header in the message itself has no effect on where the message is delivered.

However, the problem with this syntax is that the command line you specify in the open() call first gets passed to the shell, which means that if we've somehow failed to filter out all of the dangerous shell metacharacters from $sender_addr we could be opening ourselves up for a possible compromise. If you're really paranoid like I am, you can use the following alternate syntax to accomplish the same goal without ever invoking the shell:

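$pid = open(MAIL, "|-");

# Check for fork() failure first: if $pid is undef there is no child,
# and we must not fall into the child's block below.
die "fork() failed: $!\n" unless (defined($pid));

# The child executes this block.
unless ($pid) {
    exec('/usr/sbin/sendmail',
         '-f', '<>', $sender_addr);
    die "exec() failed: $!\n";
}

# If we get here, we're the parent process.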

The open(..., "|-") tells Perl to fork(), which means to make an exact duplicate copy (usually referred to as a child process) of the original process (the parent). Both processes continue executing from the instant after the open() call. However, in the parent process the open() call returns the non-zero PID of the child process (undef is returned if the fork() fails for some reason). In the child process the open() call returns 0.

In the example above, the child process recognizes this return value and calls exec(), which causes the child process to go away and be replaced by the sendmail command line specified in the exec() call. Specifying the sendmail command line as a list of separate arguments is a Perl peculiarity that forces exec() not to invoke the shell, but instead simply run our command line directly.

The other important detail of the open(..., "|-") syntax is that the file handle created in the parent process (MAIL in our example) becomes the standard input of the child process -- even after the exec() call replaces the original child process with the new sendmail process. So we can compose our outgoing email message by simply writing data to the MAIL file handle, just as if we had used the normal Perl open() syntax. When composing the outgoing email message, you need to specify your mail headers (From:, To:, Subject:, etc.), followed by a blank line, followed by whatever message body you wish to send. For example:

print MAIL <<"EoMessage";
From: Santa\@Northpole.com
To: Little Elf <$sender_addr>
Subject: You'd better watch out!

You've been a naughty, naughty child this year!
EoMessage

close(MAIL);
exit(0);

You can make up any header information or other message content you wish, which is why I'm so dubious about headers like From: and Reply-to: in incoming messages received by my auto-responders. Note that if you don't include a Date: or Message-Id: header, your mail system will typically add them for you automatically.

When you're done composing the outgoing email message, simply close() the MAIL file handle, which in turn closes the standard input in the child process and causes sendmail to send the outgoing email and exit. You may then exit(0) in the parent process.
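Closing a piped file handle also waits for the child to exit and leaves its wait status in $?, so the parent can find out whether sendmail actually accepted the message. The following minimal sketch wraps the fork/exec pattern in a reusable function; send_reply() is a hypothetical name, and the mailer command is passed in as a list so the same code can be exercised without a real sendmail. It uses the block form exec { $mailer[0] } @mailer, which never invokes the shell, even when the list has only one element.

```perl
use strict;
use warnings;

# send_reply: fork via open(..., "|-"), exec the mailer given as a list
# (no shell involved), write $message to its standard input, and return
# the mailer's exit status (0 on success).
sub send_reply {
    my ($message, @mailer) = @_;
    my $pid = open(my $mail, '|-');
    die "fork() failed: $!\n" unless defined $pid;
    unless ($pid) {
        # Child: our STDIN is the read end of the pipe.
        exec { $mailer[0] } @mailer;
        die "exec() failed: $!\n";
    }
    print $mail $message;
    # close() waits for the child; its wait status ends up in $?.
    close($mail);
    return $? >> 8;
}
```

In the auto-responder you would call it as send_reply($message, '/usr/sbin/sendmail', '-f', '<>', $sender_addr) and log or die if the result is non-zero.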

Wrapping Up

For a somewhat more interesting example, please see Listing 1, which is the code for a little auto-responder I wrote to let people know that a particular user has left the company and optionally what his new email address is. The standard invocation for the program is:

smith: "|/usr/local/bin/natabot smith@other.org"

The email address argument is optional -- if provided, the outgoing auto-reply tells the sender that this is the new email address for the user he was trying to reach. Also note that the auto-responder includes a copy of the original email in the outgoing response, just in case the sender doesn't automatically make copies of his outgoing email.
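Listing 1 is not reproduced here, but the reply-composition step it describes can be sketched roughly as follows. This is not the Listing 1 code: compose_reply(), the postmaster@example.com From: address, and the wording of the message are all hypothetical. The returned text is what you would print to the MAIL file handle.

```perl
use strict;
use warnings;

# compose_reply: build a "no longer here" reply. $new_addr is optional
# (undef when the alias supplies no argument); $original is the incoming
# message text, quoted back to the sender with "> " on each line.
# The From: address below is a hypothetical placeholder.
sub compose_reply {
    my ($sender, $new_addr, $original) = @_;
    my $text = "From: Postmaster <postmaster\@example.com>\n"
             . "To: $sender\n"
             . "Subject: This address is no longer in use\n"
             . "\n"
             . "The person you wrote to is no longer with the company.\n";
    $text .= "Their new email address is $new_addr\n" if defined $new_addr;
    $text .= "\nA copy of your original message follows:\n\n";
    (my $quoted = $original) =~ s/^/> /mg;
    return $text . $quoted;
}
```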

Beyond this simple example, it's really a case of "the sky's the limit". Your auto-responder can interact with external databases, just like the Unix vacation program keeps track of sender email addresses in a database so that it doesn't reply to a given mailbox more than once per week. Or the auto-responder can automatically open up a support ticket in your helpdesk software or generate a "potential sales lead" notice in the Sales CRM system for your company. You can use MIME encoding tools from the CPAN and other sources to send binary attachments with your auto-responder messages. Or you can embed commands in the incoming email message and have the auto-responder parse the message and take various actions (I recommend at least using a digital signature technology like GPG for validation if you contemplate something like this). Once you've got the basics down, this really is quite a powerful tool to have at your disposal. Happy coding!
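As one example of that kind of bookkeeping, a vacation-style "once per week" rule can be sketched with a hash of last-reply times. This is a minimal sketch, not how vacation itself is implemented; should_reply() is a hypothetical name, addresses are lowercased as a simplifying assumption, and no file locking is shown. To keep the data between runs you could tie the hash to a DBM file, for example with the core SDBM_File module (tie(%seen, 'SDBM_File', $path, O_RDWR|O_CREAT, 0600), with O_RDWR and O_CREAT from Fcntl).

```perl
use strict;
use warnings;

# Minimum interval between replies to the same address: one week.
use constant REPLY_INTERVAL => 7 * 24 * 60 * 60;

# should_reply: $seen maps lowercased addresses to the time of our last
# reply. Returns 1 (and records $now) if we have not replied within
# REPLY_INTERVAL seconds, otherwise 0.
sub should_reply {
    my ($seen, $addr, $now) = @_;
    my $key  = lc $addr;
    my $last = $seen->{$key};
    return 0 if defined $last && $now - $last < REPLY_INTERVAL;
    $seen->{$key} = $now;
    return 1;
}
```

In the real program, $now would simply be time().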

Hal Pomeranz (hal@deer-run.com) is really just a very clever email auto-responder program that sometimes masquerades as an independent consultant and the Technical Editor for Sys Admin magazine.