Sendmail as Gatekeeper

Michael Schwager

If you have an Internet email account, you have likely encountered "spam", also known as Unsolicited Commercial Email (UCE) or Unsolicited Bulk Email (UBE). It may already be a big problem for your organization or company. This month, Sys Admin begins a three-part series on configuring Sendmail to deny spam. In this issue, we'll give you the basic foundation for understanding Sendmail. Next month, we'll cover the important anti-spam features; and in March, we'll wrap up by discussing ways to be even more proactive in stopping mail before it enters your intranet.

You and your users may already be combatting spam with filters on your mail clients. However, it is more efficient to stop spam before it enters your domain. Sendmail, the Internet's foremost Mail Transfer Agent (MTA), is becoming smarter at denying suspected spam. Here are the facilities that Sendmail, as of version 8.9, contains to deny spam mail:

The first four items are set up when you configure Sendmail "out of the box"; the last item requires you to set up the rules that enable it. All of the items need at least some configuration to allow or disallow certain emails.

Sendmail Operation

Compiling and installing the Sendmail binary is liable to give you a false sense of bravado. The real "guts" of Sendmail are found in the configuration file. Indeed, it can seem intimidating. However, as with any great work, much of Sendmail's complexity comes from its genuine simplicity. If you think of the Sendmail program itself as a black box, it does the following four things:
To do these things, it uses the sendmail.cf file. In it:
Rulesets are traditionally referred to by a simple number (e.g., "Ruleset 3" or "Ruleset 0"), but as of Sendmail 8.7, named rulesets have been available (e.g., "Ruleset check_mail"). Sendmail has a definitive order for doing things. This order is pre-ordained in the source code. There is a famous "sequence of rule sets" diagram that bears repeating. It shows the basic paths that an address may take through Sendmail's rules:
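As a rough textual stand-in for the diagram, the orderings described in this article's walk-through can be written down as a small Python sketch. This is a simplified model, not Sendmail's own code: the function name `ruleset_path` and its parameters are hypothetical, and the mailer-specific rulesets (the S= and R= values that become known once Ruleset 0 has selected a mailer) are simply passed in as numbers.

```python
def ruleset_path(kind, s_ruleset, r_ruleset, envelope=False):
    """Return the rulesets an address visits, in order (simplified model).

    kind      -- "sender" or "recipient"
    s_ruleset -- the mailer's S= ruleset, applied to sender addresses
    r_ruleset -- the mailer's R= ruleset, applied to recipient addresses
    envelope  -- True for envelope addresses, False for header addresses
    """
    path = [3]  # every address is first run through Ruleset 3
    if kind == "sender":
        path += [1, s_ruleset]
    elif kind == "recipient":
        if envelope:
            # Ruleset 0 sees the envelope recipient and selects the mailer;
            # its result picks R= and S= rather than feeding Ruleset 2.
            path.append(0)
        path += [2, r_ruleset]
    else:
        raise ValueError("kind must be 'sender' or 'recipient'")
    path.append(4)  # every address is finally run through Ruleset 4
    return path


if __name__ == "__main__":
    # Illustrative mailer values: S=11 and R=21.
    for kind, envelope in [("sender", True), ("recipient", True),
                           ("sender", False), ("recipient", False)]:
        label = ("envelope " if envelope else "header ") + kind
        print(label, ruleset_path(kind, 11, 21, envelope))
```

The only asymmetry in the model is Ruleset 0, which is consulted for the envelope recipient because that is the address that decides where the message goes.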
For in-depth information about the program, I suggest buying a copy of Sendmail, Second Edition, by Bryan Costales and Eric Allman from O'Reilly & Associates (ISBN 1-56592-222-0). It is truly the Sendmail bible.

Within the constraints of this article, however, I'll discuss Sendmail's operation by looking at a typical SMTP conversation. The client connects to the server at port 25 where our Sendmail daemon is listening:
That's four addresses. Here's what happens (refer to the sequence of rulesets):

The envelope FROM: address (in the SMTP dialog) is run through Ruleset 3, then Ruleset 1, then the ruleset named by S= in the mailer specification (as determined by Ruleset 0), then through Ruleset 4. It is rewritten based on the rules in those rulesets, and the modified form is passed from one ruleset to the next. Thus, Ruleset 3 modifies the address, the output of its work becomes the input to Ruleset 1, and so on.

Now for the envelope recipient (the RCPT TO: in the SMTP dialog). In reality, this address is one of the first to be processed. The address first gets processed by Ruleset 3. Since it is the envelope recipient address, it is then sent through Ruleset 0. The job of Ruleset 0 is to look into the future and determine where this message is going. Is it a local or Internet user? Are there special delivery conditions? Ruleset 0 figures it out and selects a mailer.

A mailer is just the method used to deliver this mail. The method includes the host or program we need to interact with, and any special rules that should be applied to the email addresses on this mail message, by filling in the blanks after the R= and S=. Let's say Sendmail has determined that R=21 and S=11. So, the envelope recipient address continues (after Ruleset 3 and after determining the mailer) to be processed by Rulesets 2, then 21, then 4. Similarly, the header To: address is processed by Rulesets 3, 2, 21, and 4. The header From: address is a sender address, so it is processed by Rulesets 3, then 1, 11, and 4.

After all this rewriting is accomplished, the mail message is written to a file and queued for redelivery (if necessary) and sent.

Installing Sendmail

In order to learn Sendmail, it's best to have a running copy. You can download Sendmail by pointing your Web browser to http://www.sendmail.org and following the instructions there.
Sendmail runs on many versions of UNIX, including Linux; see http://www.sendmail.org/faq/section3.html#3.26. A Windows NT version is available from www.metainfo.com. I have not used that version; it claims to have a specialized GUI for configuration, although I understand it uses a standard sendmail.cf file.

Additionally, you may want to get the Berkeley DB package. It comes highly recommended and is easy to install and implement. The most recent version is available at http://www.sleepycat.com.

Once you obtain the source code, you'll need to compile and install it. I will assume that you are installing the Berkeley DB package as well. Installing Sendmail goes like this:
APPENDDEF(`confENVDEF',`-DMAP_REGEX')
define(`confCC',`gcc')
define(`confEBINDIR',`/usr/local/sendmail-r8.9/lib')
define(`confHFDIR',`/usr/local/sendmail-r8.9/lib')
define(`confMANROOT',`/usr/local/man/man')
define(`confMANDOC',`-man')
define(`confMBINDIR',`/usr/local/sendmail-r8.9/lib')
define(`confNROFF',`nroff -Tlp')
define(`confSBINDIR',`/usr/local/sendmail-r8.9/bin')
define(`confUBINDIR',`/usr/local/sendmail-r8.9/bin')
define(`confLINKS',`${UBINDIR}/newaliases ${UBINDIR}/mailq \
A description of the above values can be found in the src/README file in the Sendmail source tree, and in the BuildTools/README file.

Note that I am not installing Sendmail in the usual place. This practice goes against Eric Allman's recommendation, but I always leave the root partition alone as much as possible. Because of the myriad different files associated with the program, I install it in /usr/local/sendmail-r8.X (where X is the current revision). When I upgrade to a new version, it will be easy for me to link /usr/lib/sendmail to the new Sendmail. As soon as I am confident the new program is running well, I simply delete the old structure.

To build Sendmail:
A Short Tutorial on Sendmail's Rules
Rlhs <tab> rhs <tab> comment

Note that there are three fields: a left-hand side (LHS), a right-hand side (RHS), and a comment. They are separated by tabs; this is very important! If you cut and paste while editing sendmail.cf, the tabs can be silently converted to spaces. Be very careful, because this is an easy mistake to make. The comment is optional. The letter R just means, "This is a Rule."

Each Sendmail rule follows a short algorithm, which works like this:

begin:
    if (lhs matches the workspace) then
        rhs rewrite
        goto begin
    else
        next rule

First, the input (an email address) and the LHS are tokenized (separated into component pieces). Tokens are divided by delimiting characters: .:%@!^=/[] also <>; and the space (which is why Sendmail needs a tab to separate fields). Then the LHS is pattern matched against the input (case insensitive); this is not a regular expression.

For example, tokenize the input:

<fuu@bar.spammer.org> -> becomes < fuu @ bar . spammer . org >

Tokenize the patterns:

$*<$*>$*<$*>$* -> becomes $* < $* > $* < $* > $*

$=w . $=D ! $+ -> becomes $=w . $=D ! $+

(Note that the spaces around our separators only make them easier to read. They don't add more separation.)

$=W$*<@$=X.$X>$* -> becomes $=W $* < @ $=X . $X > $*

Trick question! I have not shown you the building blocks, or operators, of the LHS. These operators follow; they are each tokens.

The LHS

LHS building blocks:
Tokens in the LHS match the minimum number of tokens possible in the input (including matching nothing). If the token does not match, you back up and retry, adding a token to the input and trying to match. The simplified algorithm goes like this:
begin:
    tokenize
loop:
    if (all tokens are used up) match!
    if (current tokens match) next tokens
    else backup lhs
    goto loop

Matching "nothing" on the RHS is a trivial case, so it may be convenient to think of "nothing" as a "token". For example:
Match LHS: $*yyy$*
Work methodically, and apply the algorithm.
LHS tokens:   $* yyy $*
Input tokens: Xxx . yyy . zzz

Match:
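The tokenize-then-match procedure can also be tried out in a few lines of Python. What follows is a minimal sketch under stated assumptions, not Sendmail's implementation: the function names `tokenize` and `match` are hypothetical, the delimiter list is the one given in this article, and only three LHS operators are handled ($* for zero or more tokens, $+ for one or more, $- for exactly one). Recursion stands in for the "back up and retry" step, and each operator tries the shortest candidate first.

```python
# Delimiter characters as listed in the article; each is a token by itself.
DELIMITERS = ".:%@!^=/[]<>;"
# Only this subset of LHS operators is handled in the sketch.
OPERATORS = ("$*", "$+", "$-")


def tokenize(text):
    """Split an address or an LHS pattern into a list of tokens."""
    tokens, word, i = [], "", 0
    while i < len(text):
        pair, ch = text[i:i + 2], text[i]
        if pair in OPERATORS or ch in DELIMITERS or ch.isspace():
            if word:                      # close the word collected so far
                tokens.append(word)
                word = ""
            if pair in OPERATORS:
                tokens.append(pair)
                i += 2
                continue
            if not ch.isspace():          # spaces separate but are not tokens
                tokens.append(ch)
        else:
            word += ch
        i += 1
    if word:
        tokens.append(word)
    return tokens


def match(pattern, tokens):
    """Return what each operator matched ($1, $2, ...) or None on failure."""
    if not pattern:
        return [] if not tokens else None  # all input must be used up
    head, rest = pattern[0], pattern[1:]
    if head in OPERATORS:
        fewest = 0 if head == "$*" else 1
        most = min(1, len(tokens)) if head == "$-" else len(tokens)
        for n in range(fewest, most + 1):  # minimum number of tokens first
            tail = match(rest, tokens[n:])
            if tail is not None:
                return [tokens[:n]] + tail
        return None
    if tokens and tokens[0].lower() == head.lower():  # case insensitive
        return match(rest, tokens[1:])
    return None


if __name__ == "__main__":
    print(tokenize("<fuu@bar.spammer.org>"))
    print(match(tokenize("$*yyy$*"), tokenize("Xxx.yyy.zzz")))
```

For the example above, the first $* ends up holding the tokens Xxx and the dot that follows it, and the second $* holds the remaining dot and zzz, because the whole input has to be consumed for the LHS to match.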
Before we look at the RHS, let's look at some more constructs that can be used in the config file:

Classes and Macros

Classes are used only in the LHS. Macros are used in the LHS or RHS. A macro is defined thus:

Dletterstring

It is evaluated as the configuration file is read. It is called as follows: $letter.

A class is defined in the following manner:

Cletter string1 string2 string3

It is evaluated when it is reached. It is called as follows: $=letter. Prior to V8, Sendmail looked up only a single token to match a class. Classes are stored in a hash table; lookup is efficient.

File classes are very similar to regular C classes. C and F build on each other; simply use them when you need a longer list of strings. A file class is defined like: FXfile. For example:

FA/usr/local/sendmail-r8.X/lib/subdomains

Database classes are defined like this:

Kname class args

Class specifies what kind of database it is, and args is the name of the file it is stored in. Example:

Kmailertable dbm /usr/local/sendmail-r8.X/lib/mailertable

Database classes are called in the RHS only, like:

$( name key $)

That string is replaced with the result of the key lookup in database "name". Each entry in the database has two parts, the key and the data. If the key matches, the data is returned; otherwise, the key is returned. However, you can have an additional construct, the $:. It looks like this:

$( name key $: default $)

This means that if the key is not found, the token following the $: is returned, in this case, "default".

The RHS

The purpose of the RHS is to rewrite the workspace. The workspace is a temporary buffer to which a rule's actions are applied. The way a rule works is this: The email address is copied to the workspace. The workspace is compared against the LHS. If it matches, it is rewritten according to the RHS. After it is rewritten, you simply try again until it does not match. Here is the algorithm:
while (lhs matches workspace); do
    rewrite workspace (according to rhs)
done
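The same loop, written as a short Python sketch. The names are hypothetical and the rule is a stand-in: `try_rewrite` represents "match the LHS and, if it matches, rewrite by the RHS", and `strip_trailing_dot` is an invented rule used only to show why repeating a rule matters. The `max_passes` limit is an assumption added for this sketch so that a rule that always matches cannot spin forever.

```python
def apply_rule(workspace, try_rewrite, max_passes=100):
    """Apply one rule repeatedly until its LHS stops matching.

    try_rewrite(workspace) returns the rewritten workspace when the LHS
    matches, or None when it does not.
    """
    for _ in range(max_passes):
        rewritten = try_rewrite(workspace)
        if rewritten is None:
            return workspace       # no match: go on to the next rule
        workspace = rewritten      # match: the output becomes the new input
    raise RuntimeError("rule still matches after %d passes" % max_passes)


def strip_trailing_dot(workspace):
    """Hypothetical rule: if the workspace ends in a dot, drop that dot."""
    if len(workspace) > 1 and workspace.endswith("."):
        return workspace[:-1]
    return None


if __name__ == "__main__":
    # Two passes rewrite the workspace; the third does not match.
    print(apply_rule("host.example..", strip_trailing_dot))
```

A rule whose output always matches its own LHS would never leave this loop, which is worth keeping in mind when writing rules by hand.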
The operators of the RHS:
Now we have all the building blocks needed to write a rule. So, let's go through a simple rule:
Rab$+.$* de.$2
Our input is ab.c.def.
Yes, it matches! Now, we rewrite ab.c.def:
The $* match is put into $2; "copy by position" means that the n'th pattern from the left-hand side is used when writing to the workspace. So, the output from our first run through this rule is: de.def. But, we're not done. Recall that Sendmail rules loop, so we try again with the new input of de.def:
This rule is through because we do not match. We go down to the next rule.

Delivery Rules

There is a special type of rule called a delivery rule. Its job is not merely to rewrite, but to actually define an action: How is this message to be delivered? A delivery rule will tell you. Delivery rules apply to the RHS only. They are used in ruleset 0 to set the delivery agent or exit on an error, and in the anti-spamming rulesets to deny mail. The only delivery rule that makes sense in the anti-spamming rulesets is the #error agent. The RHS in a delivery rule looks like:

$#agent $@host $:user

The LHS is as usual. For example:

R$+<$+>$* $#smtp$@internal.com $:$1<@$2>

Here, the RHS says that if the LHS matches, this message will be sent using the smtp delivery agent to the host internal.com, and the user will be written using the first and second patterns that the LHS matched.

Delivery Agent

The actual mail delivery for the message in question is handled by the delivery agent. It's not a rule; it is an action. Delivery agents look like:

Mname, equate, equate, equate, ...

For example:

Msmtp, P=[IPC], F=mDFMuX, S=11/31, R=61, E=\r\n, L=990, A=IPC $h

Only the name, the P= equate, and the A= equate are required.

Rulesets

A single rule can only do so much. To get any real work done, rules are grouped together in sets. Within the group, the output of the previous rule becomes the input of the next rule. Rulesets begin with S and a name or number, such as S3. Consider the following ruleset:
######################################################################
###  LookUpDomain - search for domain in access database
###
###  Parameters:
###    <$1> - key (domain name)
###    <$2> - default (what to return if not found in db)
###    <$3> - passthru (additional data passed unchanged through)
######################################################################

SLookUpDomain
R<$+> <$+> <$*>	$: < $(access $1 $: ? $) > <$1> <$2> <$3>
R<?> <$+.$+> <$+> <$*>	$@ $>LookUpDomain <$2> <$3> <$4>
R<?> <$+> <$+> <$*>	$@ <$2> <$3>
R<$*> <$+> <$+> <$*>	$@ <$1> <$4>

This is a fairly simple ruleset found in your Sendmail config file. Each rule in the ruleset is applied to the input. The input of each rule is the result of the rule above it in this ruleset.

Summary

In preparation for our Sendmail-based war on spam, we've learned that Sendmail uses rules to determine destination and to rewrite addresses. Rules have a left-hand side (LHS) and a right-hand side (RHS). Pattern matching is used in the LHS. Sendmail's function can be summarized as tokenize, then rewrite or deliver. We've also seen that in accomplishing that task, Sendmail uses operators, classes, and macros. Now that we have a solid foundation for using Sendmail, next month we will delve into the specifics of using the program to fight spam.

(E-Media Manager's Note: This is the first article of a three-part series. Check the February and March 1999 issues for continued content.)
About the Author

Mike Schwager has been a Systems Administrator since he struggled with his 2-Megabyte disk quota on a Vax 11/780. He is going to use some of the proceeds from this article for a new 10-Gig hard disk on his Linux box. Write him at: Michael@Schwager.com and visit http://www.enteract.com/~schwager.