Sendmail as Gatekeeper

Michael Schwager

If you have an Internet email account, you have likely encountered the problem of "spam", or "Unsolicited Commercial (or Bulk) Email" (UCE or UBE). It may already be a big problem for your organization or company. This month, Sys Admin begins a three-part series on Sendmail and configuring it to deny spam. In this issue, we'll give you the basic foundation for understanding Sendmail. Next month, we'll cover the important anti-spam features; and in March, we'll wrap it up by discussing ways to be even more proactive in stopping mail before it enters your Intranet.

You and your users may already be combatting spam with filters on your mail clients. However, it would be most efficient if you could stop spam before it enters your domain. Sendmail, as the Internet's foremost Mail Transfer Agent (MTA), is becoming smarter at denying spam-suspicious email. Here are the facilities that Sendmail, as of version 8.9, contains to deny spam mail:

Disallowing relaying (the delivery of mail to addresses not in your domain, from a client host not located in your domain).
Blocking mail from envelope (MAIL FROM:) addresses that do not have a valid domain name.
Blocking mail based on the envelope sender's user and/or hostname.
Denying mail from hosts that are known to harbor spammers, or that are misconfigured such that spammers use them as relays, by means of the "Realtime Blackhole List" (discussed next month).
Blocking mail based on the contents of the message headers; for example, if the To: address is "friend@public.com" (used by some popular spam generating packages).

The first four items are set up when you configure Sendmail "out of the box"; the last item requires you to write the rules that enable it. All of the items need at least some configuration to allow or disallow certain emails.

Sendmail Operation

Compiling and installing the Sendmail binary is liable to give you a false sense of bravado. The real "guts" of Sendmail are found in the configuration file. Indeed, it can seem intimidating. However, as with any great work, much of Sendmail's complexity comes from its genuine simplicity.

If you think of the Sendmail program itself as a black box, it does the following four things:

Receives a mail message.
Processes addresses (they may be changed somehow).
Figures out where the message should go next.
Delivers the message.

To do these things, it uses the sendmail.cf file. In it:

Operators make up a rule.
Rules are applied to addresses.
Rules make up rulesets.
Addresses get rewritten, and a mail delivery method is chosen.

Rulesets are traditionally referred to by a simple number (e.g., "Ruleset 3" or "Ruleset 0"), but as of Sendmail 8.7, named rulesets have been available (e.g., "Ruleset check_mail").

Sendmail has a definite order for doing things. This order is pre-ordained in the source code. There is a famous "sequence of rule sets" diagram that bears repeating. It shows the basic paths that an address may take through Sendmail's rules:

The sequence of rule sets Diagram

For in-depth information about the program, I suggest buying a copy of Sendmail, Second Edition, by Bryan Costales and Eric Allman from O'Reilly and Associates (ISBN 1-56592-222-0). It is truly the Sendmail bible. Within the constraints of this article, however, I'll discuss Sendmail's operation by looking at a typical SMTP conversation. The client connects to the server at port 25 where our Sendmail daemon is listening:

Speaking?	Conversation	Type of address
client:	MAIL FROM: some_address@somewhere.com	Sender
us:	250 some_address@somewhere.com...
client:	RCPT TO: valid_recipient@valid_host.com	Recipient
us:	250 valid_recipient@valid_host.com...
client:	DATA
us:	354 Enter mail, end with "." on a line by itself
client:	sends its message, including headers and body.
	The last line is comprised of a single '.':
	From: some_address@somewhere.com	Sender
	To: valid_recipient@valid_host.com	Recipient
	Subject: Hi

	Hello!
	.
us:	250 Message accepted for delivery
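
To make the addresses in this conversation easier to pick out, here is a minimal Python sketch that sorts what the client sends into envelope and header addresses. It is only an illustration, not part of Sendmail: the function name classify_addresses and its labels are hypothetical, and it assumes a single recipient and the simple one-line headers shown above.

```python
def classify_addresses(client_lines):
    """Sort the client's side of the dialog into envelope and header addresses."""
    found = {}
    in_data = False      # True once the client has sent DATA
    in_headers = False   # True until the blank line that ends the headers
    for line in client_lines:
        if not in_data:
            upper = line.upper()
            if upper.startswith("MAIL FROM:"):
                found["envelope sender"] = line[len("MAIL FROM:"):].strip()
            elif upper.startswith("RCPT TO:"):
                found["envelope recipient"] = line[len("RCPT TO:"):].strip()
            elif upper == "DATA":
                in_data = in_headers = True
        elif line == ".":
            break        # a lone dot ends the message
        elif in_headers:
            if line == "":
                in_headers = False
            elif line.startswith("From:"):
                found["header sender"] = line[len("From:"):].strip()
            elif line.startswith("To:"):
                found["header recipient"] = line[len("To:"):].strip()
    return found


# The client's lines from the conversation above.
CLIENT_LINES = [
    "MAIL FROM: some_address@somewhere.com",
    "RCPT TO: valid_recipient@valid_host.com",
    "DATA",
    "From: some_address@somewhere.com",
    "To: valid_recipient@valid_host.com",
    "Subject: Hi",
    "",
    "Hello!",
    ".",
]

for kind, address in classify_addresses(CLIENT_LINES).items():
    print(kind, "=", address)
```

The envelope addresses come from the SMTP commands, while the header addresses come from the message itself; nothing forces the two pairs to agree.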

That's four addresses. Here's what happens (refer to the sequence of rulesets): The envelope sender address (the MAIL FROM: in the SMTP dialog) is run through Ruleset 3, then Ruleset 1, then the ruleset named by the S= flag of the mailer specification chosen by Ruleset 0, and finally through Ruleset 4. It is rewritten based on the rules in those rulesets, and the modified form is passed from one ruleset to the next. Thus, Ruleset 3 modifies the address, and the output of its work becomes the input to Ruleset 1, and so on.

Now for the envelope recipient (the RCPT TO: in the SMTP dialog). In reality, this address is one of the first to be processed. The address first gets processed by Ruleset 3. Since it is the envelope recipient address, it is then sent through Ruleset 0. The job of Ruleset 0 is to look into the future, and determine where this message is going. Is it a local or Internet user? Are there special delivery conditions? Ruleset 0 figures it out and selects a mailer. A mailer is just the method used to deliver this mail. The method includes the host or program we need to interact with, and any special rules that should be applied to the email addresses on this mail message, by filling in the blanks after the R= and S=.

Let's say Sendmail has determined that R=21 and S=11. So, the envelope recipient address continues (after Ruleset 3 and after determining the mailer) to go through and be processed by Rulesets 2, then 21, then 4.

Similarly, the header To: address is processed by Rulesets 3, 2, 21, and 4. The header From: address is a sender address, so it is processed by Rulesets 3, then 1, 11, and 4. After all this rewriting is accomplished, the mail message is written to a queue file (so that delivery can be retried if necessary) and sent.
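
The paths just described can be summarized in a few lines of Python. This is only a sketch of the bookkeeping, not of Sendmail's internals: the function ruleset_path and the table of labels are hypothetical names, and the S=11 and R=21 values are simply the ones assumed in the text.

```python
def ruleset_path(kind, s_ruleset, r_ruleset):
    """Return the rulesets, in order, that rewrite an address of this kind.

    kind is "sender" or "recipient"; s_ruleset and r_ruleset stand for the
    S= and R= values of the mailer that Ruleset 0 selected.
    """
    if kind == "sender":
        return [3, 1, s_ruleset, 4]
    if kind == "recipient":
        return [3, 2, r_ruleset, 4]
    raise ValueError(f"unknown address kind: {kind!r}")


# The four addresses of the SMTP conversation and the kind each one is.
# (The envelope recipient additionally visits Ruleset 0 right after
# Ruleset 3, which is how the mailer, and so S= and R=, gets chosen.)
ADDRESSES = {
    "envelope sender (MAIL FROM:)": "sender",
    "envelope recipient (RCPT TO:)": "recipient",
    "header From:": "sender",
    "header To:": "recipient",
}

for label, kind in ADDRESSES.items():
    print(label, "->", ruleset_path(kind, s_ruleset=11, r_ruleset=21))
```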

Installing Sendmail

In order to learn Sendmail, it's best to have a running copy. You can download Sendmail by pointing your Web browser to http://www.sendmail.org and following the instructions there. Sendmail runs on many versions of UNIX, including Linux; see:

http://www.sendmail.org/faq/section3.html#3.26

A Windows NT version is available from www.metainfo.com. I have not used that version; it claims to have a specialized GUI for configuration although I understand it uses a standard sendmail.cf file.

Additionally, you may want to get the Berkeley DB package. It comes highly recommended and is easy to install and implement. The most recent version is available at http://www.sleepycat.com.

Once you obtain the source code, you'll need to compile it and install it. I will assume that you are installing the Berkeley DB package, as well. Installing Sendmail goes like this:

Uncompress and extract the tar files.
Install DB according to the instructions.
Create BuildTools/Site/site.config.m4. Below is a sample. (When you are editing or creating m4 files, be sure that you do not create extraneous blank spaces, or you will get strange results.)

APPENDDEF(`confENVDEF',`-DMAP_REGEX')
define(`confCC',`gcc')
define(`confEBINDIR',`/usr/local/sendmail-r8.9/lib')
define(`confHFDIR',`/usr/local/sendmail-r8.9/lib')
define(`confMANROOT',`/usr/local/man/man')
define(`confMANDOC',`-man')
define(`confMBINDIR',`/usr/local/sendmail-r8.9/lib')
define(`confNROFF',`nroff -Tlp')
define(`confSBINDIR',`/usr/local/sendmail-r8.9/bin')
define(`confUBINDIR',`/usr/local/sendmail-r8.9/bin')
define(`confLINKS',`${UBINDIR}/newaliases ${UBINDIR}/mailq \
  ${UBINDIR}/hoststat ${UBINDIR}/purgestat')
define(`confLIBDIRS',`-L/usr/local/lib')
define(`confMAPDEF', `')

A description of the above values can be found in the src/README file in the Sendmail source tree, and in the BuildTools/README file.

Note that I am not installing Sendmail in the usual place. This practice goes against Eric Allman's recommendation, but I always leave the root partition alone as much as possible. Because of the myriad different files associated with the program, I install it in /usr/local/sendmail-r8.X (where X is the current revision). When I upgrade to a new version, it will be easy for me to link /usr/lib/sendmail to the new Sendmail. As soon as I am confident the new program is running well, I simply delete the old structure.

To build Sendmail:

Create the directories /usr/local/sendmail-r8.X, /usr/local/sendmail-r8.X/bin, /usr/local/sendmail-r8.X/lib, and /usr/local/sendmail-r8.X/databases.
cd to the src subdirectory in the Sendmail source hierarchy.
Run sh Build -c to build Sendmail (the -c overrides any previous configuration).
cd obj* and do make install. You may have errors installing the man pages into the man8 subdirectory. Edit the Makefile to correct this.
cd ../../makemap. Run sh Build -c to build makemap. cd to the obj* directory and do make install.
Move your old version of Sendmail to something like /usr/lib/sendmail.bak, and link /usr/local/sendmail-r8.X/lib/sendmail to /usr/lib/sendmail.

A Short Tutorial on Sendmail's Rules
Sendmail's rules look like this:

Rlhs <tab> rhs <tab> comment

Note that there are three fields: a left-hand side (LHS), a right-hand side (RHS), and a comment. They are separated by tabs; this is very important! If you are ever cutting and pasting when you edit the sendmail.cf, you will lose the tabs to spaces. Be very careful, because this is an easy mistake to make. The comment is optional. The letter R just means, "This is a Rule." Sendmail's rules have a short little algorithm, which works like this:

begin:
if (lhs matches the workspace) then
  rhs rewrite
  goto begin
else
  next rule

First the input (an email address) and LHS are tokenized (separated into component pieces). Tokens are divided by delimiting characters: .:%@!^=/[] also <>; and the space (which is why Sendmail needs a tab to separate fields). Then the LHS is pattern matched against the input (case insensitive); this is not a regular expression.

For example, tokenize the input:

<fuu@bar.spammer.org> -> becomes < fuu @ bar . spammer . org >

Tokenize the patterns:

$*<$*>$*<$*>$* -> becomes $* < $* > $* < $* > $*

$=w . $=D ! $+ -> becomes $=w . $=D ! $+

(Note that the spaces around our separators only make them easier to read. They don't add more separation.)

$=W$*<@$=X.$X>$* -> becomes $=W $* < @ $=X . $X > $*

Trick question! I have not shown you the building blocks, or operators, of the LHS. These operators follow; they are each tokens.

The LHS

LHS Building blocks:

$*	Match 0 or more tokens
$+	Match 1 or more tokens
$-	Match exactly 1 token
$@	Match exactly 0 tokens
$=letter	Match any token in class letter
$~letter	Match any token not in class letter
$letter	Match any token in macro letter
Anything else	Match token for token
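
For readers who like to experiment, tokenizing can be approximated with a short Python sketch. This is my own illustration, not Sendmail's code: the names DELIMITERS, TOKEN_RE, and tokenize are hypothetical, the delimiter set is my reading of the list given above, and only the LHS operators in this table are recognized.

```python
import re

# Delimiter characters, as listed in the article; each is a token by itself.
DELIMITERS = ".:%@!^=/[]<>;"

TOKEN_RE = re.compile(
    r"\$[=~][A-Za-z]"                          # $=letter and $~letter
    r"|\$[*+\-@]"                              # $*  $+  $-  $@
    r"|\$[A-Za-z]"                             # $letter (a macro)
    r"|[" + re.escape(DELIMITERS) + r"]"       # one delimiter character
    r"|[^\s$" + re.escape(DELIMITERS) + r"]+"  # a run of ordinary characters
)


def tokenize(text):
    """Split an address or an LHS pattern into tokens; whitespace is dropped."""
    return TOKEN_RE.findall(text)


for sample in ("<fuu@bar.spammer.org>",
               "$*<$*>$*<$*>$*",
               "$=w . $=D ! $+",
               "$=W$*<@$=X.$X>$*"):
    print(sample, "->", " ".join(tokenize(sample)))
```

Note that the operator alternatives come before the delimiter alternative in the expression, so that $=W is kept whole rather than being split at the = sign.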

Tokens in the LHS match the minimum number of tokens possible in the input (including matching nothing). If the token does not match, you back up and retry, adding a token to the input and trying to match. The simplified algorithm goes like this:

begin:
tokenize
loop:
if (all tokens are used up) match!
if (current tokens match) next tokens
else backup lhs
goto loop:

Matching "nothing" in the input is a trivial case, so it may be convenient to think of "nothing" as a "token". For example:

Match LHS: $*yyy$*
with input: Xxx.yyy.zzz

Work methodically, and apply the algorithm.
Tokenize:

$* yyy $*          Xxx . yyy . zzz

Match:

$*	with	nothing	? Yes, it matches.
yyy	with	Xxx	? No, it does not. So back up LHS:
$*	with	Xxx	? Yes, it matches.
yyy	with	.	? No, so back up LHS:
$*	with	Xxx .	? Yes, it matches.
yyy	with	yyy	? Yes, it matches.
$*	with	nothing	? Yes, it matches. We're out of LHS, so we stay at the $*. Since we still have input left, we continue.
$*	with	.	? Yes, it matches.
$*	with	. zzz	? Yes, it matches.
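
The same walk-through can be reproduced with a small backtracking matcher in Python. Treat it as a sketch under simplifying assumptions rather than Sendmail's actual algorithm: the function name match is hypothetical, both arguments are already tokenized lists, and only $*, $+, $-, and literal tokens are handled (classes, macros, and $@ are left out).

```python
def match(pattern, tokens):
    """Match a tokenized LHS against tokenized input.

    Returns one list of input tokens per wildcard operator, in order,
    or None when the pattern does not match. Each wildcard first takes
    as few tokens as it can and is given one more on every backup.
    """
    def attempt(p, t):
        if p == len(pattern):
            # Out of LHS: it is a match only if the input is used up too.
            return [] if t == len(tokens) else None
        op = pattern[p]
        remaining = len(tokens) - t
        if op in ("$*", "$+", "$-"):
            fewest = 0 if op == "$*" else 1
            most = min(1, remaining) if op == "$-" else remaining
            for count in range(fewest, most + 1):
                rest = attempt(p + 1, t + count)
                if rest is not None:
                    return [tokens[t:t + count]] + rest
            return None
        # Anything else must match token for token, ignoring case.
        if remaining and tokens[t].lower() == op.lower():
            return attempt(p + 1, t + 1)
        return None

    return attempt(0, 0)


print(match(["$*", "yyy", "$*"], ["Xxx", ".", "yyy", ".", "zzz"]))
# prints [['Xxx', '.'], ['.', 'zzz']]
```

The first $* ends up holding "Xxx ." and the second ". zzz", which agrees with the hand trace above.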

Before we look at the RHS, let's look at some more constructs that can be used in the config file:

Classes and Macros

Classes are used only in the LHS. Macros are used in the LHS or RHS. A macro is defined thus:

Dletterstring

It is evaluated as the configuration file is read. It is called as follows: $letter.

A class is defined in the following manner:

Cletter string1 string2 string3

It is evaluated when it is reached. It is called as follows: $=letter. Prior to V8, Sendmail looked up only a single token to match a class. Classes are stored in a hash table; lookup is efficient.

File classes are very similar to regular C classes. C and F build on each other; simply use them when you need a longer list of strings. A file class is defined like: FXfile. For example:

FA/usr/local/sendmail-r8.X/lib/subdomains
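
To see how C and F lines build on each other, here is a rough Python sketch that collects both into sets. It is an illustration only: the function load_classes is hypothetical, it handles single-letter class names only, it takes the first word of each line of a class file, and it lowercases entries on the assumption that matching is case insensitive, as noted earlier. The temporary file merely stands in for a real file such as the subdomains file above.

```python
import os
import tempfile
from collections import defaultdict


def load_classes(cf_lines):
    """Collect C (inline) and F (file) class definitions into sets."""
    classes = defaultdict(set)
    for line in cf_lines:
        if len(line) < 2:
            continue
        letter = line[1]
        if line.startswith("C"):
            # Cletter string1 string2 ...
            classes[letter].update(word.lower() for word in line[2:].split())
        elif line.startswith("F"):
            # Fletterfile: one entry per line of the named file
            with open(line[2:].strip()) as handle:
                for entry in handle:
                    words = entry.split()
                    if words:
                        classes[letter].add(words[0].lower())
    return classes


# Demo: a throwaway file plays the part of the class file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("sales\nengineering\n")

classes = load_classes(["Cw localhost mailhost", "FA" + tmp.name])
os.unlink(tmp.name)

print(sorted(classes["w"]), sorted(classes["A"]))
```

Storing each class as a set mirrors the point made above: membership is a hash lookup, so a long list costs no more to search than a short one.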

Database classes are defined like this:

Kname class args

Class specifies what kind of database it is, and args is the name of the file it is stored in. Example:

Kmailertable dbm /usr/local/sendmail-r8.X/lib/mailertable

Database classes are called in the RHS only, like:

$( name key $)

That string is replaced with the result of the key lookup in database "name". Each entry in the database has two parts, the key and the data. If the key matches, the data is returned; otherwise, the key is returned. However, you can have an additional construct, the $:. It looks like this:

$( name key $: default $)

This means that if the key is not found, the token following the $: is returned, in this case, "default".
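
The lookup behavior is easy to mimic with a Python dictionary. The sketch below is only an analogy: db_lookup is a hypothetical helper, and the sample key and data are made up for illustration rather than taken from a real mailertable.

```python
def db_lookup(database, key, default=None):
    """Mimic $( name key $) and $( name key $: default $).

    If the key is found, its data is returned. If not, the default is
    returned when one was given; otherwise the key itself comes back.
    """
    if key in database:
        return database[key]
    return key if default is None else default


# A stand-in for a database; the entry is invented for this example.
mailertable = {"internal.com": "smtp:relay.internal.com"}

print(db_lookup(mailertable, "internal.com"))            # data for the key
print(db_lookup(mailertable, "elsewhere.org"))           # key comes back
print(db_lookup(mailertable, "elsewhere.org", "?"))      # default comes back
```

Returning a marker such as "?" as the default gives a later rule something unambiguous to test for when the key was not found.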

The RHS

The purpose of the RHS is to rewrite the workspace. The workspace is a temporary buffer to which a rule's actions are applied. The way a rule works is this: The email address is copied to the workspace. The workspace is compared against the LHS. If it matches, it is rewritten according to the RHS. After it is rewritten, the rule is tried again, and this repeats until the workspace no longer matches. Here is the algorithm:

while (lhs matches workspace); do
      rewrite workspace (according to rhs)
done

The operators of the RHS:

$digit	Copy by position
$:	Rewrite once (prefix)
$@	Rewrite and return (prefix)
$>set	Rewrite through another ruleset
$#	Specify a delivery agent
$[ $]	Canonicalize hostname
$( $)	Database lookup
$letter	Rewrite with macro
anything else	Rewrite

Now we have all the building blocks needed to write a rule. So, let's go through a simple rule:

Rab$+.$*	de.$2

Our input is ab.c.def.
Tokenize:

LHS	input
ab $+ . $*	ab . c . def

Match:	Match:
ab	matches	ab
$+	matches	.
.	does not match	c	so backup
$+	matches	. c
.	matches	.
$*	matches	def

Yes, it matches! Now, we rewrite ab.c.def:

The $+ match (. c) is put into $1 and the $* match (def) is put into $2; "copy by position" means that the nth pattern from the left-hand side is used when writing to the workspace. So, the output from our first run through this rule is: de.def. But, we're not done. Recall that Sendmail rules loop, so we try again with the new input of de.def:
Tokenize:

LHS	input
ab $+ . $*	de . def

Match:	Match:
ab	does not match	de

This rule is through because we do not match. We go down to the next rule.
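
The whole exercise (tokenize, match, rewrite, and try again) fits in a short Python sketch. It is a simplified model under stated assumptions, not Sendmail's implementation: all function names are hypothetical, only $*, $+, $-, $digit, and literal tokens are supported, the RHS prefixes $: and $@ are ignored, and the limit argument is my own safety cap against endless loops.

```python
import re

# Tokens: $* $+ $-, $digit, a single delimiter, or a run of other characters.
TOKEN_RE = re.compile(
    r"\$[*+\-]|\$\d|[.:%@!^=/\[\]<>;]|[^\s$.:%@!^=/\[\]<>;]+"
)


def tokenize(text):
    return TOKEN_RE.findall(text)


def match(pattern, tokens, p=0, t=0):
    """Return the tokens matched by each wildcard, or None on failure."""
    if p == len(pattern):
        return [] if t == len(tokens) else None
    op = pattern[p]
    remaining = len(tokens) - t
    if op in ("$*", "$+", "$-"):
        fewest = 0 if op == "$*" else 1
        most = min(1, remaining) if op == "$-" else remaining
        for count in range(fewest, most + 1):   # shortest match first
            rest = match(pattern, tokens, p + 1, t + count)
            if rest is not None:
                return [tokens[t:t + count]] + rest
        return None
    if remaining and tokens[t].lower() == op.lower():
        return match(pattern, tokens, p + 1, t + 1)
    return None


def rewrite(rhs, groups):
    """Build the new workspace: $n copies the nth wildcard's tokens."""
    workspace = []
    for token in rhs:
        if re.fullmatch(r"\$\d", token):
            workspace.extend(groups[int(token[1]) - 1])
        else:
            workspace.append(token)
    return workspace


def apply_rule(lhs, rhs, address, limit=100):
    """Apply one rule over and over until the LHS stops matching."""
    lhs_tokens, rhs_tokens = tokenize(lhs), tokenize(rhs)
    workspace = tokenize(address)
    for _ in range(limit):
        groups = match(lhs_tokens, workspace)
        if groups is None:
            break
        workspace = rewrite(rhs_tokens, groups)
    return "".join(workspace)


print(apply_rule("ab$+.$*", "de.$2", "ab.c.def"))   # prints de.def
```

The first pass rewrites ab.c.def to de.def; the second pass fails at the very first token, so the loop ends and the next rule would be tried.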

Delivery Rules

There is a special type of rule called a delivery rule. Its job is not merely to rewrite, but to actually define an action: How is this message to be delivered? A delivery rule will tell you.

Delivery rules are distinguished by their RHS only. They are used in Ruleset 0 to set the delivery agent or exit on an error, and in the anti-spamming rulesets to deny mail. The only delivery agent that makes sense in the anti-spamming rulesets is the error agent ($#error).

The RHS in a delivery rule looks like: $#agent $@host $:user

The LHS is as usual. For example:

R$+<$+>$*	$#smtp $@internal.com $:$1<@$2>

Here, the RHS says that if the LHS matches, this message will be sent using the smtp delivery agent to the host internal.com, and the user will be written using the first and second patterns that the LHS matched.
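
The three parts of a delivery RHS can be pulled apart with a few lines of Python. This is just a sketch for the three-part form shown above: parse_delivery is a hypothetical helper, and it does not attempt forms that omit the host or user part.

```python
def parse_delivery(rhs):
    """Split '$#agent $@host $:user' into (agent, host, user)."""
    rhs = rhs.strip()
    if not rhs.startswith("$#"):
        raise ValueError("not a delivery RHS: " + rhs)
    agent, found_host, rest = rhs[2:].partition("$@")
    host, found_user, user = rest.partition("$:")
    if not (found_host and found_user):
        raise ValueError("expected $#agent $@host $:user, got: " + rhs)
    return agent.strip(), host.strip(), user.strip()


print(parse_delivery("$#smtp $@internal.com $:$1<@$2>"))
# prints ('smtp', 'internal.com', '$1<@$2>')
```

The user part still contains $1 and $2 at this stage; they are filled in from the LHS match in the same way as in any other rewrite.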

Delivery Agent

The actual mail delivery for the message in question is handled by the delivery agent. It's not a rule; it is an action. Delivery agents look like: Mname, equate, equate, equate, ... For example:

Msmtp, P=[IPC], F=mDFMuX, S=11/31, R=61, E=\r\n, L=990, A=IPC $h

Only the name, the P= equate, and the A= equate are required.
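
As a way of reading a delivery agent definition, the following Python sketch splits an M line into its name and equates and checks for the required ones. It is a naive illustration: parse_mailer is a hypothetical helper, and it assumes that no equate value contains a comma.

```python
def parse_mailer(line):
    """Split 'Mname, equate, equate, ...' into (name, {letter: value})."""
    if not line.startswith("M"):
        raise ValueError("not a delivery agent definition: " + line)
    name, *fields = [part.strip() for part in line[1:].split(",")]
    equates = {}
    for field in fields:
        key, equals, value = field.partition("=")
        if not equals:
            raise ValueError("malformed equate: " + field)
        equates[key.strip()] = value.strip()
    # The article notes that the name, P=, and A= are required.
    missing = [key for key in ("P", "A") if key not in equates]
    if not name or missing:
        raise ValueError("missing name or required equates: " + str(missing))
    return name, equates


MAILER = r"Msmtp, P=[IPC], F=mDFMuX, S=11/31, R=61, E=\r\n, L=990, A=IPC $h"

name, equates = parse_mailer(MAILER)
print(name)
for key, value in equates.items():
    print(" ", key, "=", value)
```

The S= and R= values that turn up here are the same ones that complete the sender and recipient paths described earlier.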

Rulesets

A single rule can only do so much. To get any real work done, rules are grouped together in sets. Within the group, the output of the previous rule becomes the input of the next rule.

Rulesets begin with S and a name or number, such as S3. Consider the following ruleset:

######################################################################
###  LookUpDomain - search for domain in access database
###
###     Parameters:
###             <$1> - key (domain name)
###             <$2> - default (what to return if not found in db)
###             <$3> - passthru (additional data passed unchanged through)
######################################################################

SLookUpDomain
R<$+> <$+> <$*>	$: < $(access $1 $: ? $) > <$1> <$2> <$3>
R<?> <$+.$+> <$+> <$*>	$@ $>LookUpDomain <$2> <$3> <$4>
R<?> <$+> <$+> <$*>	$@ <$2> <$3>
R<$*> <$+> <$+> <$*>	$@ <$1> <$4>

This is a fairly simple ruleset found in your Sendmail config file. Each rule in the ruleset is applied to the input. The input of each rule is the result of the rule above it in this ruleset.
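
If the rules are hard to follow at first, it may help to see my reading of what LookUpDomain does, written as a Python function. This is an interpretation, not a translation that Sendmail would recognize: look_up_domain is a hypothetical name, a plain dictionary stands in for the access database, and the sample entry is invented for the example.

```python
def look_up_domain(access, key, default, passthru):
    """Look a domain up in the access data, trying parent domains in turn.

    Returns (result, passthru), where result is the data found for the
    domain or one of its parents, or else the default.
    """
    # Rule 1: look the key up; "?" marks "not found".
    result = access.get(key.lower(), "?")
    if result == "?":
        first, dot, rest = key.partition(".")
        if first and dot and rest:
            # Rule 2: drop the leftmost component and call the ruleset again.
            return look_up_domain(access, rest, default, passthru)
        # Rule 3: nothing left to strip, so return the default.
        return default, passthru
    # Rule 4: the key was found, so return its data.
    return result, passthru


# Invented sample data for illustration only.
access = {"spammer.org": "REJECT"}

print(look_up_domain(access, "bar.spammer.org", "OK", "passthru"))
print(look_up_domain(access, "example.com", "OK", "passthru"))
```

In the first call, bar.spammer.org is not found, so the lookup is repeated for spammer.org, which is; in the second, neither example.com nor com is found, so the default comes back.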

Summary

In preparation for our Sendmail-based war on spam, we've learned that Sendmail uses rules to determine destination and to rewrite addresses. Rules have a left-hand side (LHS) and a right-hand side (RHS). Pattern matching is used in the LHS. Sendmail's function can be summarized as tokenize, then rewrite or deliver. We've also seen that in accomplishing that task, Sendmail uses operators, classes, and macros. Now that we have a solid foundation for using Sendmail, next month we will delve into the specifics of using the program to fight spam.

(E-Media Manager's Note: This is the first article of a three part series. Check the February and March, 1999 issues for continued content.)

About the Author

Mike Schwager has been a Systems Administrator since he struggled with his 2-Megabyte disk quota on a Vax 11/780. He is going to use some of the proceeds from this article for a new 10-Gig hard disk on his Linux box. Write him at: Michael@Schwager.com and visit http://www.enteract.com/~schwager.