Writing Email Auto-Responders
Hal Pomeranz
Email auto-responders are a quick and easy way to
extend the functionality of your email system. They can provide instant
feedback for potential customers, help guide existing customers toward
appropriate support resources, interface with CRM and helpdesk systems, and
just generally alleviate a lot of grunt work that would otherwise have to
be done by a human being. However, these programs typically run with
elevated privileges so extreme care must be taken when implementing such a
system.
This article covers the basics of writing a simple
auto-responder program and provides working Perl code that you can use as a
template for your own auto-responders. Along the way I'll also go
into some detail about the pitfalls you face when developing
auto-responders, which should help when you write such a system on your
own. The Perl code is so trivial that it's
easily ported to your preferred programming language, but I find Perl
particularly effective for this task because it is such a good text
manipulation language.
High-Level Overview
Auto-responder programs are generally invoked via an alias, such as:
sales: |/usr/local/bin/info
You may also specify command-line arguments to the
auto-responder program as part of the alias. Keep in mind that these
command-line arguments are hard-coded for each alias and do not vary on a
per-message basis. Still, command-line arguments are sometimes useful. For
example, we might use the same auto-responder program for two different
aliases, but change the content of the outgoing reply message by specifying
a different directory of attachments in each alias:
sales: "|/usr/local/bin/info -d /var/attach/sales"
support: "|/usr/local/bin/info -d /var/attach/support"
Note the use of double quotes to group the
command-line arguments with the rest of the alias.
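Inside the program, the -d argument from the aliases above could be picked up with the standard Getopt::Std module. This is only a sketch: the sub name attach_dir and the fallback directory are made up for illustration and are not part of the info program described here.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical fallback directory (an assumption, not from the article).
my $DEFAULT_DIR = '/var/attach/default';

# Return the directory named by -d, or the fallback if -d is absent.
sub attach_dir {
    my @args = @_;
    local @ARGV = @args;              # getopts() parses @ARGV
    my %opts;
    getopts('d:', \%opts) or die "usage: info [-d directory]\n";
    return defined $opts{d} ? $opts{d} : $DEFAULT_DIR;
}

my $dir = attach_dir(@ARGV);
```

Because the arguments come from the alias file rather than from the incoming message, they can be trusted to the same degree as the alias file itself.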
The program invoked by the alias generally runs with
the privileges of the mail server (MTA) process, typically root. Thus, it
is extremely important to code these auto-responders carefully to avoid
both programmer error and exploits resulting from malicious user input.
Note that the standard Sendmail distribution includes "smrsh" (the SendMail
Restricted SHell), a safer execution
environment for these kinds of aliases, but configuring smrsh is outside of the scope of this article.
Auto-responder programs should exit with an exit
status of zero, unless an error has occurred. If you don't explicitly
exit with zero status, you will typically see "Unknown mailer error" messages in your
postmaster email and mail server logs.
Parsing the Incoming Message
The auto-responder program will be handed the
incoming message via its standard input. The incoming message will be in
the typical "Unix mailbox" format, which you can think of as
having three major components. The first line of the incoming message will
be information from the message envelope -- typically the sender address and a time/date stamp. Following the envelope information will be a variable
number of lines of mail headers. The header
information is complete when you encounter a blank line. After the blank
line, all other lines in the message are considered to be part of the
message body.
Here is some simple Perl code to break the incoming
message into its component chunks:
$envelope = <STDIN>;
while (<STDIN>) {
    last if (/^$/); # look for blank line
    push(@headers, $_);
}
@msg_body = <STDIN>;
Actually, if you're concerned that the incoming
message might contain large attachments, you may not want to
"slurp" the entire body of the
message into memory as we do on the last line of the example above. In
fact, in most cases auto-responder programs can safely ignore the body of
the incoming message entirely. The exceptions
would be programs like mailing list managers that are looking for commands (subscribe, unsubscribe, etc.) embedded in the body of
the email messages. But for simple auto-responder programs, this
isn't an issue.
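If you would rather not hold the body in memory at all, one option is to read it line by line and throw each line away. The sketch below assumes that approach. The sub name read_envelope_and_headers is invented for the example, and draining the rest of the input is a precaution on my part, since some mail servers may complain if the program exits while there is still unread data in the pipe.

```perl
use strict;
use warnings;

# Read the envelope line and the headers from a filehandle, then
# discard the body one line at a time so it never accumulates.
sub read_envelope_and_headers {
    my ($fh) = @_;
    my $envelope = <$fh>;
    my @headers;
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^$/;        # blank line ends the headers
        push @headers, $line;
    }
    while (defined(my $discard = <$fh>)) {
        # body line read and dropped
    }
    return ($envelope, \@headers);
}
```

In a real auto-responder you would call it as read_envelope_and_headers(\*STDIN).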
Dealing with Headers
The initial line of envelope information generally
looks like "From <sender> <date>".
For example:
From hal@deer-run.com Tue Sep 26 07:36:45 2006
Note that the envelope line uses "From<space>", which is different from
the "From: " ("From<colon><space>") header found later in the header lines. Really the only thing
interesting in the envelope line is the sender email address, which we can extract easily as follows:
$envelope = <STDIN>;
($sender_addr) = $envelope =~ /^From\s+(\S+)/;
Here we're exploiting the fact that the Perl
pattern match operator, when used in a list context, returns a list of the
sub-expressions (things in parentheses) in the regular expression.
The remaining headers are of the form "<header>: <value>", as in:
From: Hal Pomeranz <hal@deer-run.com>
Actually you are allowed to have multi-line headers such as:
Received: from somehost.example.com (...)
by mail.deer-run.com (...) ...
for <hal@deer-run.com>; ...
Notice that all lines after the first line of the
multi-line header must begin with whitespace.
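If you need to examine header values, it can help to join these continuation lines back onto the line they belong to first. A minimal sketch, assuming the header lines have already been collected into a list as in the earlier example; the sub name unfold_headers is hypothetical:

```perl
use strict;
use warnings;

# Join continuation lines (those starting with whitespace) onto the
# header line that precedes them. Returns one string per header,
# without line endings.
sub unfold_headers {
    my @unfolded;
    for my $line (@_) {
        (my $text = $line) =~ s/\r?\n\z//;   # strip the line ending
        if ($text =~ /^\s/ && @unfolded) {
            $text =~ s/^\s+/ /;              # collapse leading whitespace
            $unfolded[-1] .= $text;
        } else {
            push @unfolded, $text;
        }
    }
    return @unfolded;
}
```

After unfolding, a pattern such as /^Received:/ sees the whole header rather than just its first line.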
But which headers are significant or important?
Obviously, for an auto-responder you must work out the sender address you want your reply to go to. In my auto-responders, I
generally will only reply to the sender address found in the envelope. Yes, the From: header is supposed to contain the sender's address and there may even be a Reply-to: header with a different
email address where you should direct responses (RFC2822 says that you
SHOULD obey the Reply-to: header, but it doesn't say that you MUST do so). However, as we
will see shortly, forging email headers is trivial. Forged headers are one
common vector for malicious outsiders who are trying to exploit your
auto-responder program. It is just slightly more difficult to forge the
envelope information, and so you may choose to regard this information as
more trustworthy. Wherever you take the sender address from, you should
still carefully verify this address using the techniques we will discuss shortly.
One header that may be useful to note is the Precedence: header. This
header was originally created in Sendmail as a mechanism for ordering the
mail queue so that more important messages got delivered first. However,
other software -- particularly mailing list management software
-- has adopted this header to tag traffic
generated via a mailing list or other bulk email program. In general, you
would like your auto-responder to avoid spamming large Internet mailing
lists, so you would probably like to recognize the Precedence: tags list, junk, or bulk and simply not reply if you find one of them.
Sanity Checking Sender Addresses
The biggest worry you have with these sorts of
auto-responder programs is a malicious user attempting to inject shell
commands and shell metacharacters into the sender address in order to
trigger bad behavior in your auto-responder program. One simple approach is
to just filter for obviously bad characters:
die "Hostile address: $sender_addr\n"
    if ($sender_addr =~ /[<>|&;\\]/);
Actually, the right-hand side of an email address must
be a valid domain name or something that looks like a fully qualified
hostname. This means that the right-hand side of the email address should
only contain alphanumeric characters, period, and hyphen. Thus, we can make
our filtering expression a bit more aggressive:
die "Hostile address: $sender_addr\n"
    unless ($sender_addr =~ /^[^<>|&;\\]+@[-\w.]+$/);
So here we're saying that everything to the left
of the @ sign cannot contain any of the following characters: <, >, |, &, ;, and \. On the right-hand side of the @ sign we have to have
something that looks like a valid domain name or hostname. While this
doesn't cover every possible valid email address, it's
"good enough" for 99% of the email addresses you'll
encounter on the modern Internet, while still filtering out the bad stuff
we're trying to avoid. Be aware that this list does not cover every
shell metacharacter (backticks and dollar signs, for example, still get
through on the left-hand side), so the filter should not be your only
line of defense.
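It is worth wrapping the check in a small sub so that you can try it against a few sample addresses before trusting it. The sketch below uses the same pattern as above, with two small changes of my own: the @ is escaped, and \z is used in place of $ so that a trailing newline is rejected too. The sub name address_ok is made up for the example.

```perl
use strict;
use warnings;

# True if the address passes the filter: none of < > | & ; \ on the
# left of the @, and only word characters, periods, and hyphens on
# the right. Note that, like the pattern it mirrors, this does not
# reject backticks or dollar signs on the left-hand side.
sub address_ok {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /^[^<>|&;\\]+\@[-\w.]+\z/ ? 1 : 0;
}
```

For example, address_ok('hal@deer-run.com') is true, while address_ok('hal@deer-run.com; rm -rf /') and address_ok('a|b@example.com') are both false.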
The other thing you want to avoid is generating an
auto-responder reply to somebody else's auto-responder or to an error
message from another mail server. Otherwise, you run the risk of getting
into an endless "shouting match" between two auto-responder
programs that are mindlessly firing emails back and forth to one another.
Auto-responder messages and mail system error messages are supposed to use
a sender address of MAILER-DAEMON, often represented simply as the sender address <>. When we receive an
email from this sender address, our auto-responder should simply exit
without replying. Also, when we generate a reply to a legitimate email
address, we should make sure our messages go out using MAILER-DAEMON as the message
sender.
With all of this in mind, here's a slightly updated version of our original example:
$envelope = <STDIN>;
($sender_addr) = $envelope =~ /^From\s+(\S+)/;
exit(0) if ($sender_addr eq '<>' ||
            $sender_addr =~ /^mailer-daemon/i);
die "Hostile address: $sender_addr\n"
    unless ($sender_addr =~ /^[^<>|&;\\]+@[-\w.]+$/);
while (<STDIN>) {
    last if (/^$/);
    # header names are case-insensitive, hence the /i
    exit(0) if (/^Precedence:\s+(list|junk|bulk)/i);
    push(@headers, $_);
}
Generating the Outgoing Message
Now that we've safely gathered the
sender's email address for our reply and have parsed through the
header information, it's time to think about actually generating the
response email. I find that the easiest thing to do is to simply invoke the sendmail binary
directly and feed the outgoing email into the sendmail process on the standard input (though I'm using
Sendmail for this example, the principles are the same for other mail
systems). One of the advantages to this approach is that it allows you to
generate your own custom mail headers.
You could use the standard Perl open() syntax to start the
Sendmail process:
open(MAIL,
     "|/usr/sbin/sendmail -f '<>' $sender_addr");
Note the use of -f '<>' to specify the sender address for
the envelope of our outgoing email message. Our $sender_addr is specified on the
command line as the recipient of the email message -- whatever you put
in the To: header
in the message itself doesn't matter.
However, the problem with this syntax is that the
command line you specify in the open() call first gets passed to the shell, which means that if
we've somehow failed to filter out all of the dangerous shell
metacharacters from $sender_addr we could be opening ourselves up for a possible
compromise. If you're really paranoid like I am, you can use the
following alternate syntax to accomplish the same goal without ever
invoking the shell:
$pid = open(MAIL, "|-");
# undef means the fork() failed; check before testing for the child.
die "fork() failed: $!\n" unless (defined($pid));
# The child executes this block.
unless ($pid) {
    exec('/usr/sbin/sendmail',
         '-f', '<>', $sender_addr);
    die "exec() failed: $!\n";
}
# If we get here, we're the parent process.
The open(..., "|-") tells Perl to fork(), which means to make an
exact duplicate copy (usually referred to as a child process) of the original process (the parent). Both processes continue
executing from the instant after the open() call. However, in the parent process the open() call returns the non-zero PID
of the child process (undef is returned if the fork() fails for some reason). In the child process the open() call returns 0.
In the example above, the child process recognizes
this return value and calls exec(), which causes the child process to go away and be replaced
by the sendmail command line specified in the exec() call. Specifying the sendmail command line as a list of separate arguments is a Perl
peculiarity that forces exec() not to invoke the shell, but instead simply run our
command line directly.
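Perl 5.8 and later also accept a list form of the piped open() that, like the list form of exec(), runs the command without going through the shell. The following is a sketch of that alternative rather than the approach used in this article. The sub name open_mailer is invented for the example, and the list form is not available on platforms without a real fork().

```perl
use strict;
use warnings;

# Open a write pipe to a command given as a list. When the command
# is followed by at least one separate argument, Perl runs the
# program directly, so shell metacharacters in the arguments are
# never interpreted.
sub open_mailer {
    my @cmd = @_;
    die "need a command and at least one argument\n" if @cmd < 2;
    open(my $fh, '|-', @cmd) or die "cannot run $cmd[0]: $!\n";
    return $fh;
}

# Intended use (not run here):
#   my $mail = open_mailer('/usr/sbin/sendmail', '-f', '<>', $sender_addr);
#   print $mail $message;
#   close($mail) or die "sendmail failed, status $?\n";
```

Checking the return value of close() on a pipe also tells you whether the child process exited with a non-zero status.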
The other important detail of the open(..., "|-") syntax is
that the file handle created in the parent process (MAIL in our example) becomes the
standard input of the child process -- even after the exec() call replaces the
original child process with the new sendmail process. So we can compose our outgoing email message by
simply writing data to the MAIL file handle, just as if we had used the normal Perl open() syntax. When composing
the outgoing email message, you need to specify your mail headers (From:, To:, Subject:, etc.), followed by a blank line, followed by
whatever message body you wish to send. For example:
print MAIL <<"EoMessage";
From: Santa\@Northpole.com
To: Little Elf <$sender_addr>
Subject: You'd better watch out!

You've been a naughty, naughty child this year!
EoMessage
close(MAIL);
exit(0);
You can absolutely make up any header information or
other message content you wish, which is why I'm so dubious about
headers like From: and Reply-to: in
incoming messages received by my auto-responders. Note that if you
don't include a Date: or Message-Id: header, they will typically be added for you automatically
by your mail system.
When you're done composing the outgoing email
message, simply close() the MAIL file handle, which in turn closes the standard input in the child process
and causes sendmail to send the outgoing email and exit. You may then exit(0) in the parent process.
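If your replies are assembled from variables rather than typed into a fixed block of text, it may be convenient to build the message as a string first and print it to the file handle afterwards. A minimal sketch, assuming that approach: the sub name build_message is hypothetical, and the check for newlines in header values is an extra precaution of mine that stops a value from adding header lines of its own.

```perl
use strict;
use warnings;

# Assemble a message: header lines, one blank line, then the body.
# $headers is an array ref of [name, value] pairs.
sub build_message {
    my ($headers, $body) = @_;
    my $msg = '';
    for my $pair (@$headers) {
        my ($name, $value) = @$pair;
        die "newline in $name header\n" if $value =~ /[\r\n]/;
        $msg .= "$name: $value\n";
    }
    return $msg . "\n" . $body;       # the blank line separates the body
}
```

Building the string separately also makes it easy to inspect the exact text of a reply without sending any mail.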
Wrapping Up
For a somewhat more interesting example, please see
Listing 1, which is the code for a little auto-responder I wrote to let
people know that a particular user has left the company and optionally what
his new email address is. The standard invocation for the program is:
smith: "|/usr/local/bin/natabot smith@other.org"
The email address argument is optional -- if
provided, the outgoing auto-reply tells the sender that this is the new
email address for the user he was trying to reach. Also note that the
auto-responder includes a copy of the original email in the outgoing
response, just in case the sender doesn't automatically make copies
of his outgoing email.
Beyond this simple example, it's really a case
of "the sky's the limit". Your auto-responder can
interact with external databases, just like the Unix vacation program keeps
track of sender email addresses in a database so that it doesn't
reply to a given mailbox more than once per week. Or the auto-responder can
automatically open up a support ticket in your helpdesk software or
generate a "potential sales lead" notice in the Sales CRM
system for your company. You can use MIME encoding tools from the CPAN and
other sources to send binary attachments with your auto-responder messages.
Or you can embed commands in the incoming email message and have the
auto-responder parse the message and take various actions (I recommend at
least using a digital signature technology like GPG for validation if you
contemplate something like this). Once you've got the basics down,
this really is quite a powerful tool to have at your disposal. Happy
coding!
Hal Pomeranz (hal@deer-run.com) is really just a very clever email auto-responder program that sometimes masquerades as an independent consultant and the Technical Editor for Sys Admin magazine.