Dr. Dobb's Journal May 2004
You'll try filters, you'll try rules, you'll try just about anything to stop e-mail spam from getting through, but as hard as you try, something always slips past. The latest trick from our friends in black hats is to use HTML to send spam. Lately, for example, I've been inundated by spam that says I can get free cable TV. Now, I like free things as much as the next guy, but in my experience "free" usually means paying lots of money and getting little in return. No problem, I thought as I added "Free Cable" to my spam filter list. Thinking I'd fixed the problem, I went about my business only to receive four more e-mails offering me free cable. Deciding to get to the root of the problem, I started poking around and discovered that the e-mail was HTML encoded. Viewing the source, I found this:
<enjoy>F</enjoy>r<babble>ee</babble> C<thingie>abl</thingie>e
What's happening here is that, because the viewer in Outlook is actually an embedded instance of Internet Explorer (IE), it renders this line and ignores the tags that aren't part of the HTML standard. Once you remove them, you get "Free Cable." Thinking there must be some way around this with a rule, I poked around some more and ended up trying a VBA macro solution. The macro (Listing One) strips anything between "<" and ">" and then checks what is left for specific words or phrases. This worked, but it came with a gotcha, as you can see in Figure 1.
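To make the trick concrete, here is a minimal C# sketch of why a plain phrase rule misses the message and why stripping the tags brings the phrase back. It is not the macro from Listing One: it uses a regular expression as a stand-in for the stripping step, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ObfuscationDemo
{
    // Removes everything between '<' and '>'.
    // A regex stand-in for the stripping step, not the article's own code.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<[^>]*>", "");
    }

    public static void Main()
    {
        string raw = "<enjoy>F</enjoy>r<babble>ee</babble> C<thingie>abl</thingie>e";

        // The phrase never appears intact in the raw source, so a text rule misses it.
        Console.WriteLine(raw.Contains("Free Cable"));            // False

        // With the tags removed, the phrase is back in one piece.
        Console.WriteLine(StripTags(raw));                        // Free Cable
        Console.WriteLine(StripTags(raw).Contains("Free Cable")); // True
    }
}
```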
It turns out that, at one time, Microsoft enabled automation of its Office products and, for a while, it was good. Then somebody figured out that a script placed inside an e-mail message could read the user's address list and send itself to everyone on it, which would make a nasty little worm, and it did. Seeing the error of its ways, Microsoft tightened security. This is fine if you're only running a script once but, in this case, I wanted to use it every time a new e-mail arrived. While there are probably other solutions to this issue, the one I chose was to convert my code to a COM add-in.
Microsoft Visual Studio .NET 2003 contains a nifty wizard for building Office add-ins. This wizard guides you through creating a project and ends up generating code. On the File menu, choose New -> Project, select Other Projects, then Extensibility Projects. The one you want is Shared Add-in (Figure 2). Click Next on the first page of the wizard (it's pretty, but not very useful), then choose Create an Add-in using Visual C# on the second page and hit Next.
For purposes here, we're only interested in creating an add-in for Microsoft Outlook, so uncheck all the rest and click Next. Enter a name and description for the add-in. In my case, it's "HTMLSpamFilter." Once you're done, click Next.
The next step is important: Click the checkbox beside the option "I would like my Add-in to load when the host application loads," but leave the other one blank (Figure 3). The reason for this is that the add-in becomes an option in the COM add-in manager in Outlook only if you haven't marked it available to all users. If this isn't important to you, then choose the second option as well. Click Finish on the next page and you're presented with the skeleton code that loads the add-in every time Outlook loads. At the bottom of this code are two important objects:
private object applicationObject;
private object addInInstance;
You're going to change the applicationObject to be an instance of Outlook.Application; but first, you need to add Outlook as a reference in the project. You do this by going to the Project, Add Reference menu and selecting Microsoft Outlook from the COM tab (Figure 4). Also add a reference to System.Windows.Forms.dll in the .NET tab, because the listings use MessageBox. Once this is done, add this to the top of the file:
using System.Windows.Forms;
using Outlook;
Then, at the bottom of the file, change private object applicationObject; to private Outlook.Application applicationObject;. You connect the applicationObject to Outlook in the OnConnection function, changing this function as in Listing Two. This casts the application object that the host passes in to an Outlook Application object and stores it. You can safely do this because, in the wizard, you chose only Outlook as the add-in host. You also have to add some clean-up code to the OnDisconnection function (Listing Three) to ensure that things are dereferenced properly.
The final important addition to the skeleton code is in the OnStartupComplete function, which ties the NewMailEx event to a function that you want called when a new mail item is received; see Listing Four. The last thing to do is to add the function stub (Listing Five) that gets called every time new mail is received.
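The wiring step is ordinary C# event subscription, so it can be tried without Outlook. The sketch below uses a hypothetical FakeMailSource class and NewMailExHandler delegate in place of the real Outlook types; only the += pattern corresponds to what the add-in does.

```csharp
using System;

public static class EventWiringSketch
{
    // Hypothetical stand-in for the delegate type the Outlook interop assembly supplies.
    public delegate void NewMailExHandler(string entryId);

    // Hypothetical stand-in for the Outlook application object; only the event matters here.
    public class FakeMailSource
    {
        public event NewMailExHandler NewMailEx;

        // Simulates the host announcing a newly arrived item.
        public void Deliver(string entryId)
        {
            if (NewMailEx != null) NewMailEx(entryId);
        }
    }

    // Records the last ID the handler saw, so the effect is observable.
    public static string LastSeen;

    // Plays the role of the add-in's NewMail function.
    private static void NewMail(string mailId)
    {
        LastSeen = mailId;
        Console.WriteLine("New Mail Received: " + mailId);
    }

    public static void Main()
    {
        FakeMailSource source = new FakeMailSource();
        // Same shape as the add-in's subscription: event += new Handler(method).
        source.NewMailEx += new NewMailExHandler(NewMail);
        source.Deliver("0000ABCD");
    }
}
```

Once the handler is attached, every call that raises the event reaches NewMail with the ID of the new item, which is what the add-in needs in order to fetch the message.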
If you build the solution and install it, the next time you run Outlook, you'll get a message box stating, "New Mail Received" whenever new messages are received. Neat, but not particularly useful.
To filter the spam, you need to access the actual message content. However, to do that, you first need to retrieve the message from the store. You do that by calling the function GetItemFromID like so:
Outlook.MailItem item =
    (Outlook.MailItem)outlookNamespace.GetItemFromID(mailId, null);
Use null as the second parameter (the StoreID) to tell it to use the default message store. Once you have the item, you can determine if it's in a nonplain format by testing the BodyFormat of the item like so:
if ( item.BodyFormat != Outlook.OlBodyFormat.olFormatPlain ) {
If it's not plaintext, then you'll want to strip out any HTML. You can do this with a simple state machine (Listing Six). While this won't catch everything (entire libraries are written for that), it should suffice.
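One example of what a plain tag stripper does not catch is text hidden behind character references such as &#70;. The following sketch is an illustration, not part of the add-in: it strips tags with the same skip-between-brackets idea and then decodes entities with System.Net.WebUtility.HtmlDecode. That class ships with .NET Framework 4.0 and later, so using it is an assumption that goes beyond the toolset described here, and the class and method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Text;

public static class EntityDemo
{
    // Skips everything between '<' and '>', keeping the rest.
    public static string StripTags(string source)
    {
        StringBuilder builder = new StringBuilder();
        bool skipping = false;
        foreach (char c in source)
        {
            if (skipping)
            {
                if (c == '>') skipping = false;
            }
            else if (c == '<')
            {
                skipping = true;
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

    // Strips tags first, then decodes references such as &#70; or &amp;.
    public static string Flatten(string html)
    {
        return WebUtility.HtmlDecode(StripTags(html));
    }

    public static void Main()
    {
        string html = "<b>&#70;ree</b> C&#97;ble";
        Console.WriteLine(StripTags(html)); // &#70;ree C&#97;ble
        Console.WriteLine(Flatten(html));   // Free Cable
    }
}
```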
All that's left is to test for the inclusion of specific phrases. You do this by hard-coding the phrases into an array, such as:
private String[] supressList = { "free cable", "prescription", "mortgage" };
If you find one of these bad boys, then move the message off to another folder (for this example, I'm using the "Junk E-mail" folder created by Outlook in the Personal Folders). Listing Seven is the entire NewMail function.
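The phrase test itself can be tried in isolation. This is a minimal sketch under the assumption that matching should ignore case, since spam rarely matches the capitalization in the list. The class name, the FirstMatch helper, and the sample strings are hypothetical.

```csharp
using System;

public static class PhraseCheck
{
    // Mirrors the example phrases used in the text.
    private static readonly string[] SuppressList =
        { "free cable", "prescription", "mortgage" };

    // Returns the first listed phrase found in the text, ignoring case,
    // or null when none of them appear.
    public static string FirstMatch(string text)
    {
        foreach (string phrase in SuppressList)
        {
            if (text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return phrase;
            }
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("Get FREE CABLE today") ?? "(clean)"); // free cable
        Console.WriteLine(FirstMatch("Lunch on Friday?") ?? "(clean)");     // (clean)
    }
}
```

Stopping at the first hit matters in the add-in as well: once a message has been moved to the junk folder, there is no reason to keep testing it against the remaining phrases.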
Now I'll sit back and wait until somebody comes up with some new way to bug me. That's only a matter of time.
DDJ
Listing One
' Function that will move e-mail that contains HTML spam to a specific folder
Public Sub FilterHTMLSpam(opMail As MailItem)
    Dim slBody As String
    On Error GoTo ErrorHandler
    If opMail.BodyFormat <> olFormatPlain Then
        Set myOlApp = CreateObject("Outlook.Application")
        Set myNameSpace = myOlApp.GetNamespace("MAPI")
        Set myPersonalFolders = myNameSpace.Folders("Personal Folders")
        Set myDestFolder = myPersonalFolders.Folders("Junk E-mail")
        ' We have an HTML message. Flatten it
        slBody = stripHTML(opMail.HTMLBody)
        ' If after we strip and trim it's empty then it's VERY likely spam
        If (Len(Trim(slBody)) < 1) Then
            opMail.Move myDestFolder
        Else
            ' See what we've got
            If Contains(slBody, "descramble", "Medication", "Doctor", _
                        "porn", "viagra", "cash", "Prescribes", _
                        "naked", "adult", "%", "sell") Then
                opMail.Move myDestFolder
            End If
        End If
    End If
    Exit Sub
ErrorHandler:
    Msg = "Error # " & Str(Err.Number) & " was generated by " _
          & Err.Source & Chr(13) & Err.Description
    MsgBox Msg, , "Error", Err.HelpFile, Err.HelpContext
End Sub
' Function that will return true if any of the values in the ParamArray
' are contained within the Body text
Private Function Contains(spBody, ParamArray spText() As Variant) As Boolean
    Dim slText As Variant
    ' Compare in lowercase so the match ignores case
    tempTest = LCase(spBody)
    For Each slText In spText()
        If InStr(tempTest, LCase(slText)) > 0 Then
            Contains = True
            Exit For
        End If
    Next
End Function
' Simple state machine to remove html tags from the input string
Function stripHTML(HTMLString As String) As String
    result = ""
    pos = 1
    skipping = False
    ' Use <= so that the last character of the string is processed too
    While (pos <= Len(HTMLString))
        enable = False
        current = Mid(HTMLString, pos, 1)
        If (current = "<") Then skipping = True
        If (current = ">") Then enable = True
        If (Not skipping) Then result = result & current
        ' Resume copying only after the closing ">" itself has been skipped
        If (enable) Then skipping = False
        pos = pos + 1
    Wend
    stripHTML = result
End Function
Listing Two
public void OnConnection(object application,
        Extensibility.ext_ConnectMode connectMode,
        object addInInst, ref System.Array custom) {
    // Safe cast: Outlook is the only host selected in the wizard
    applicationObject = (Outlook.Application)application;
    addInInstance = addInInst;
    if(connectMode != Extensibility.ext_ConnectMode.ext_cm_Startup) {
        OnStartupComplete(ref custom);
    }
}
Listing Three
public void OnDisconnection(Extensibility.ext_DisconnectMode
        disconnectMode, ref System.Array custom) {
    if(disconnectMode != Extensibility.ext_DisconnectMode.ext_dm_HostShutdown) {
        OnBeginShutdown(ref custom);
    }
    applicationObject = null;
}
Listing Four
public void OnStartupComplete(ref System.Array custom) {
    // Setup a function to be called on the NewMailEx Event
    Outlook.NameSpace outlookNamespace =
        applicationObject.GetNamespace("MAPI");
    try {
        outlookNamespace.Application.NewMailEx +=
            new Outlook.ApplicationEvents_11_NewMailExEventHandler(this.NewMail);
    } catch (System.Exception ex) {
        System.Windows.Forms.MessageBox.Show(ex.ToString());
    }
}
Listing Five
private void NewMail( String mailId ) {
    System.Windows.Forms.MessageBox.Show( "New Mail Received" );
}
Listing Six
private string stripHTML( string source ) {
    System.Text.StringBuilder builder = new System.Text.StringBuilder();
    int pos = 0;
    bool skipping = false;
    while ( pos < source.Length ) {
        if ( !skipping ) {
            if ( source[pos] == '<' ) {
                skipping = true;
            } else {
                builder.Append( source[pos] );
            }
        } else {
            if ( source[pos] == '>' ) skipping = false;
        }
        pos ++;
    }
    return builder.ToString();
}
Listing Seven
// The list of words or phrases that we want to suppress/Mark as spam
private String[] supressList = { "free cable", "prescription", "mortgage" };
private void NewMail( String mailId ) {
    try {
        // First obtain the MAPI namespace from the Outlook Application
        Outlook.NameSpace outlookNamespace =
            applicationObject.GetNamespace("MAPI");
        // Now retrieve the MailItem from the default store.
        Outlook.MailItem item =
            (Outlook.MailItem)outlookNamespace.GetItemFromID( mailId, null );
        // Figure out where we want to put the junk mail
        Outlook.MAPIFolder personalFolders =
            outlookNamespace.Folders["Personal Folders"];
        Outlook.MAPIFolder junkFolder = personalFolders.Folders["Junk E-mail"];
        // See if the message is in a non plain format
        if ( item.BodyFormat != Outlook.OlBodyFormat.olFormatPlain ) {
            // Strip the body text and lowercase it so the comparison ignores case
            string body = stripHTML( item.HTMLBody ).ToLower();
            for ( int i = 0; i < supressList.Length; i ++ ) {
                if ( body.IndexOf( supressList[i].ToLower() ) >= 0 ) {
                    // We've got spam...move it to the junk folder
                    item.UnRead = true;
                    item.Move( junkFolder );
                    // Stop after the first hit so the item is moved only once
                    break;
                }
            }
        }
    } catch ( System.Exception e ) {
        System.Windows.Forms.MessageBox.Show( e.ToString() );
    }
}